From mogul  Fri Aug 20 15:26:54 1999
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA10061; Fri, 20 Aug 1999 15:26:54 -0700
Message-Id: <9908202226.AA10061@youra.pa.dec.com>
To: http-delta
Subject: Announcing/testing the http-delta mailing list
Date: Fri, 20 Aug 99 15:26:54 -0700
From: mogul
X-Mts: smtp

As I wrote yesterday, I've created a new mailing list,
	http-delta@pa.dec.com
for further discussion of delta encoding in HTTP.  (My
hope is that this list will have a very short active
lifetime, but who knows?)

Usual administrivia: DON'T SEND (UN)SUBSCRIPTION REQUESTS TO
THE ENTIRE LIST!  Send them to
	http-delta-request@pa.dec.com

FYI, it's possible that (due to the highly distributed
implementation of mailing lists here), you won't be able
to post messages yourselves for several hours.  Please
don't all rush to send mail!

My next step will be to forward several people's mail
to the entire mailing list, so that we all see the messages
and so that they end up in the archive.  (Sorry, for the
time being I don't have an easy way to mirror the archive
onto a public Web site - you can ask me to email it to you.)

-Jeff

From mogul  Fri Aug 20 15:30:05 1999
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA10103; Fri, 20 Aug 1999 15:30:05 -0700
Message-Id: <9908202230.AA10103@youra.pa.dec.com>
To: http-delta
Subject: Delta encoding in HTTP
From: Clifford Heath <cjh@osa.com.au>
X-Original-Date: Wed, 28 Jul 1999 11:10:53 +1000
Date: Fri, 20 Aug 99 15:30:05 -0700
Sender: mogul
X-Mts: smtp


Folk,

I am one of the designers of OSA's netDeploy product. OSA has experimented
with various forms of delta encoding for several years now, including
various forms of rsync-like protocols over HTTP. We wish to contribute
some of this experience to strengthening your work towards an RFC for
delta encodings in HTTP.

Rsync in HTTP (standardised perhaps differently to how Andrew Tridgell
recently suggested) offers the possibility of removing the need for the
server to store either old versions of a resource, or (precomputed or
cached) delta files for updating between specific versions. It does this
however at the cost of requiring on-the-fly difference computation on the
web server - this cost may be problematic in some situations. There is
also an additional cost in increasing the size of the HTTP request.
The response from an rsync-enabled web server could be encoded using a
format similar to vcdiff, with only minor encoding changes.
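
A minimal sketch of the rsync-style exchange described above. The block
size, the use of MD5 as the only per-block checksum, and all function
names are assumptions made for illustration. Real rsync also uses a
rolling weak checksum so that the server need not hash every window, and
nothing here is the rproxy or netDeploy wire format.

```python
import hashlib

BLOCK = 4  # toy block size (assumption; real systems use far larger blocks)

def signatures(old: bytes, block: int = BLOCK) -> dict:
    """Client side: map the hash of each full block of the old copy to its index."""
    sigs = {}
    for i in range(0, len(old) - block + 1, block):
        sigs.setdefault(hashlib.md5(old[i:i + block]).digest(), i // block)
    return sigs

def make_delta(new: bytes, sigs: dict, block: int = BLOCK) -> list:
    """Server side: emit ("copy", block_index) or ("data", literal_bytes) ops."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block]
        # Only a full-size window can match a block the client already holds.
        idx = sigs.get(hashlib.md5(window).digest()) if len(window) == block else None
        if idx is None:
            literal.append(new[pos])
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", idx))
            pos += block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops

def apply_delta(old: bytes, ops: list, block: int = BLOCK) -> bytes:
    """Client side: rebuild the new instance from the old copy plus the ops."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * block:(arg + 1) * block] if kind == "copy" else arg
    return bytes(out)
```

Here signatures() stands for the data that enlarges the request, and
make_delta() for the on-the-fly computation on the server. The server
needs only the current instance.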

However we have also invented and filed a patent application for a
technology which has additional advantages over rsync in an HTTP context,
and which we believe avoids conflict with Pyne's patent.  We expect to be
able to get our board of directors to approve disclosure of this invention
and to allow its use within the terms required by the IETF for an RFC.

Specifically, the invention allows:
 - distributed byte-level differencing (server only requires current
   instances, no precomputed deltas or previous instance versions, as for
   rsync),
 - difference computation is performed by the client, relieving the server
   of this additional load,
 - difference files are cachable by unmodified existing web caches.
 - no modification is required to either existing web servers or to web
   caches, although there is some management advantage if changes can
   be made. A web cache can be independently fitted with enhanced
   differencing capability without needing servers to also be enhanced,
   and vice versa.

We also have a means whereby this differencing can be performed on
a compressed file, generating compressed differences, again without
requiring server or cache modifications including addition of transfer
encodings.

The cost for these benefits is an additional HTTP request per transfer,
meaning an extra network round-trip.  In some situations this additional
cost is unacceptable (your draft rules out the use of additional
requests), but in many situations it has no real impact.

I understand that there is no formal working group for your proposals.
Please reply indicating your interest in discussing our work and the
processes by which we can contribute to formulating an RFC that includes
some of the advantages I have mentioned.

------------------------------------------------------------
Clifford Heath                    http://www.osa.com.au/~cjh
Open Software Associates Limited       mailto:cjh@osa.com.au
29 Ringwood Street / PO Box 4414       Phone  +613 9871 1694
Ringwood VIC 3134      AUSTRALIA       Fax    +613 9871 1711
------------------------------------------------------------
Proven Solution Deployment for the Global Enterprise

From mogul  Fri Aug 20 15:31:07 1999
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA10055; Fri, 20 Aug 1999 15:31:07 -0700
Message-Id: <9908202231.AA10055@youra.pa.dec.com>
To: http-delta
Subject: Re: Delta encoding in HTTP
From: Andrew Tridgell <tridge@linuxcare.com>
X-Original-Date: Wed, 28 Jul 1999 13:06:10 +1000
Date: Fri, 20 Aug 99 15:31:07 -0700
Sender: mogul
X-Mts: smtp

Clifford,

Thanks for raising this.

for those of you who haven't seen the rproxy paper I gave at CALU you
can grab a copy at ftp://samba.org/pub/tridge/rproxy/ or look at a
"very alpha" prototype implementation at
ftp://samba.org/pub/unpacked/rproxy/

> It does this however at the cost of requiring on-the-fly difference
> computation on the web server - this cost may be problematic in some
> situations.

Luckily it turns out to be not all that expensive. A simple
implementation easily saturates my 10MBit lan at home on a low-end PC
and I'm sure it could be made a lot faster. If you include compression
then it gets a lot slower, but still is much faster than is needed for
most people's internet links. It won't win a specweb benchmark but it
doesn't aim to :)

> There is also an additional cost in increasing the size of the HTTP
> request.

I think that isn't too much of a problem. In the current rproxy it
adds about 500 bytes to the request which leaves the request a long
way short of the common 1500 MTU, and thus within one IP segment. As
long as the request stays as a single packet I don't think the
overhead is excessive, especially when server-side signatures are used
as that ensures that signatures are only sent when both ends of the
link understand the protocol extension.
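
The single-packet argument can be checked with simple arithmetic. A
sketch, assuming minimal IPv4 and TCP headers of 20 bytes each (no
options) and a hypothetical size for the request before the signature
is added:

```python
MTU = 1500           # common Ethernet MTU, as cited above
IP_TCP_HEADERS = 40  # assumption: 20-byte IPv4 + 20-byte TCP headers, no options

def fits_in_one_packet(base_request_bytes: int, signature_bytes: int = 500) -> bool:
    """True if the request plus the signature overhead still fits one packet."""
    return base_request_bytes + signature_bytes <= MTU - IP_TCP_HEADERS

# A 400-byte request (hypothetical) plus ~500 bytes of signatures still fits.
print(fits_in_one_packet(400))
```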

Still, it would be interesting to explore ways of avoiding this
overhead. Paul and I have come up with a couple of ways of doing this
but they involve server-side storage (not much storage, but some)
which we have been trying to avoid.

Our basic rules have been that we want no server storage, no extra
round-trips and using existing HTTP infrastructure whenever possible.

> However we have also invented and filed a patent application for a
> technology which has additional advantages over rsync in an HTTP context,
> and which we believe avoids conflict with Pyne's patent.  We expect to be
> able to get our board of directors to approve disclosure of this invention
> and to allow its use within the terms required by the IETF for an RFC.

would that allow 3rd party implementation without permission from your
company? 

> Specifically, the invention allows:
>  - distributed byte-level differencing (server only requires current
>    instances, no precomputed deltas or previous instance versions, as for
>    rsync),

ummm, rsync doesn't require precomputed deltas or previous instance
versions. Maybe I didn't make that clear enough in the paper?

>  - difference computation is performed by the client, relieving the server
>    of this additional load,

that's certainly interesting 

>  - difference files are cachable by unmodified existing web caches.

we have a way of doing that in rproxy (using a content-encoding trick)
although I'm the first to admit it's a bit of a hack. I'll be
interested to see what your solution is.

>  - no modification is required to either existing web servers or to web
>    caches, although there is some management advantage if changes can
>    be made. A web cache can be independently fitted with enhanced
>    differencing capability without needing servers to also be enhanced,
>    and vice versa.

that is also the case with rproxy. I currently run:

netscape -> rproxy -> modem link -> rproxy -> squid -> world

and I get the delta benefit over the modem link. 

One interesting choice is minimal path versus maximal path delta
encoding. In a minimal path system you do delta encoding between each
nearest pair of delta-enabled elements in the path. This makes
intermediate caching easier (less hackish) but means everyone pays
the computational cost. In maximal path delta encoding the two
furthest apart elements in the chain do the encode/decode. Personally
I prefer maximal path encoding (and that is what rproxy implements)
but Peter Barker prefers minimal path encoding (Peter is the co-author
of rproxy).
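
The two choices can be sketched as follows. The path representation
and the function name are hypothetical, chosen only to make the
distinction concrete; this is not how rproxy is structured internally.

```python
def delta_spans(path, mode):
    """Return (start, end) index pairs between which delta encoding is done.

    path is a list of (name, delta_enabled) tuples, ordered client to origin.
    """
    capable = [i for i, (_, enabled) in enumerate(path) if enabled]
    if len(capable) < 2:
        return []  # fewer than two delta-enabled elements: nothing to do
    if mode == "maximal":
        # The two furthest-apart delta-enabled elements encode and decode.
        return [(capable[0], capable[-1])]
    if mode == "minimal":
        # Every nearest pair of delta-enabled elements encodes and decodes.
        return list(zip(capable, capable[1:]))
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical path with three delta-enabled elements.
path = [("client", True), ("proxy-a", True), ("cache", False),
        ("proxy-b", True), ("origin", False)]
print(delta_spans(path, "maximal"))
print(delta_spans(path, "minimal"))
```

With maximal path encoding only the two end elements pay the
computational cost; with minimal path encoding every delta-enabled
element in between does as well.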

> We also have a means whereby this differencing can be performed on
> a compressed file, generating compressed differences, again without
> requiring server or cache modifications including addition of transfer
> encodings.

Do you mean using existing compressed files (eg. gzip, zip, bzip2) or
do you mean using a modified compression algorithm? 

> The cost for these benefits is an additional HTTP request per transfer,
> meaning an extra network round-trip.

ugggh, that is a really big downside. It would break the normal flow
of HTTP which is a serious price to pay. 

> I understand that there is no formal working group for your proposals.
> Please reply indicating your interest in discussing our work and the
> processes by which we can contribute to formulating an RFC that includes
> some of the advantages I have mentioned.

I'd certainly be interested in further discussions. 

My plan at this stage is to play some more with the design of rproxy
then to implement it as a patch to squid, apache and mozilla. I'll
then get it deployed on some really large sites and see how it stands
up to a real battering. I haven't yet started to look into a
standardisation process because I wanted the protocol extensions to be
well and truly tested before going down that path.

Cheers, Tridge

From mogul  Fri Aug 20 15:33:15 1999
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA10154; Fri, 20 Aug 1999 15:33:15 -0700
Message-Id: <9908202233.AA10154@youra.pa.dec.com>
To: http-delta
Subject: Comments on Delta Encoding in HTTP
From: danielh@crosslink.net
X-Original-Date: Thu, 19 Aug 1999 00:52:03 -0400
Date: Fri, 20 Aug 99 15:33:15 -0700
Sender: mogul
X-Mts: smtp

18 Aug 1999

From: Daniel Hellerstein (danielh@crosslink.net)
To: Jeffrey Mogul

Re: Comments on "Delta Encoding in HTTP"

I've read (a few times) the "Delta Encoding in HTTP" Internet-Draft, and
would like to make a few comments.  Since I'm not sure what the
appropriate forum for such comments may be, I figure it's reasonable to send
them to you for a preliminary vetting.

Let me start by commending the quality of the writing; it's generally
quite good.  Aside from my major comment, most of my comments reflect
problems I had comprehending the complete picture.

Lastly, I'm considering an experimental implementation of this proposal
in my "SRE-http" web server. Do you anticipate any major changes to this
proposal (abstracting from the major change I mention below!)?



Major Comment:

The proposed use of templates is problematic. In particular, an
additional GET is required for each DTemplate, with no guarantee that the
results of this GET will ever be used. Even if a well designed user-agent
makes these requests in a way that does not affect the client's perceived
response time, these extra requests will reduce available bandwidth for
everyone else.

Was any thought given to a scheme where the template is returned first,
after which the client requests a delta against this template?
Alternatively, the template and the delta could be returned as a
multi-part document.

Perhaps this would use a new status code, say 228 Delta Template. Also,
the client could signal its unwillingness to accept this "delta template
response" via a new request header, or with a "no-template"
Accept-encoding?


Minor comments:

1) Page 7, definition of instance.

It would be useful to further clarify the relationship between  resources,
instances, and entities.  For example:

  "One can think of an instance as a snapshot in the life of
   a resource.  Diagrammatically:
       a resource  -- yields --> an instance,
       an instance -- yields --> an entity.
  where the entity incorporates the effects of content-encodings and range
  extractions."
  
2) Page 9, point 5.

   the phrase
      "... and an appropriately encoded body"
   is a bit terse. Perhaps
      "... and the appropriate range(s) from the possibly encoded body."

3) Page 10, before section 5.
  
   It might be useful to remind the reader that the client should decode
   using the reverse order of methods listed in the Content-encoding
   header. Thus, given a response header of
       Content-encoding: vcdiff,gzip
   the client should first decompress, and then apply the reverse delta
   encoding.

4) Page 14, para 3.

  The sentence
     (Presumably, the interrupted response used the same delta encoding,
      if any.)
  seems too weak; it is hard to imagine any other circumstance for which
  a client would want a range of a delta content-encoding (since
  the entity body returned by the server is a piece of the delta).
  Hence, instead of "Presumably", how about "It is expected that".

  Basically, it took me more than one read to understand that a range of a
  delta content-encoding means something like "return a piece of the
  output of vcdiff"; implying that the client supplied byte range has
  little to do with the byte range of the newly created instance.
  Reiterating this may be overkill, but not letting people think
  otherwise is important.

5) 226 and 227 status codes

   Why use two new status codes, instead of one?  That
   is, use 226 for both delta and range of delta, and
   add a new response header (or modify an existing response
   header) to indicate that "this is a range response".
   I would guess that using 227 provides a useful hint to
   clients (that they should look for a Content-Range header)?


6) Page 18, second paragraph

  For clarity's sake, how about this change:

     Suppose that the server's current instance has entity tag "B", and
     that the server also has retained a copy of the instance with entity
     tag "A". Then, the server could compute the difference between
     "B" and "A" and respond with:

7) Top of page 19.

   Further stressing the concept of range of delta content encoding,
   how about:

       selection, and returns a 227 (Range of Delta) response with
       an entity body containing bytes 900 to 999 of the vcdiff
       computed difference.

8) Page 22
   
   LRU is never defined (least recently used?)

9) Page 24.
   It might be useful to add this example:

   Suppose:
    a) the client requests /help/foo.bar,
       and the server responds with:
         HTTP/1.1 200 OK
         Etag: "abc"

    b) the client then requests /help/fun.bar,
       and the server responds with:

         HTTP/1.1 200 OK
         Etag: "efg"
         DCluster: "//bar.example.net/help/"

    c) Then, if the client re-requests /help/foo.bar, it should add
       a request header of:
         If-none-match: "abc", "efg"

 
10) Page 25.

    It might be useful to add (after the "It would not make sense"
    paragraph) a reminder that use of broad uniqueness scope may also
    increase the work the server must do to ensure that no two URIs yield
    the same entity tag.
  
  
11) Page 28.

   In the example containing
      DTemplate: "http://bar.example.net/foo.tplt"/etag="pqr"
   What if an etag of "zyx" is returned by the server in response to a
   request for foo.tplt? Should the next request for foo.html
   use If-none-match: "pqr" or If-none-match: "zyx"?  I'd think the
   latter, but the example does not make that clear.

   Also, the following modification may be useful:
       This means that for any Request-URI matching the prefix specified
       in the Dcluster header field, the URI specified in the DTemplate
       field is an appropriate template; and If-none-match should use
       "pqr" (assuming that "pqr" is the etag for foo.tplt).

12) Page 32, 12.3.1

    The phrase
        "... or if the uniqueness scope for an entity tag of any instance
        of the requested resource has ever included another resource"
    seems unnecessary.  That is, if the client only included one etag
    in the If-none-match, and the server didn't make any uniqueness scope
    errors, why would there ever be any ambiguity (since the client has
    the entity associated with this single etag, as does the server)?

13) Page 32, 12.3.2

    The description of Dcluster never mentions its prime purpose -- to
    identify an "etag" to use for a set of Request-Uris.  Instead, 12.3.2
    is framed in terms of uniqueness scopes.  While these are important
    constraints affecting DCluster, it's not the reason one would use
    DCluster.
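
To illustrate minor comment 3 above (decode in the reverse of the order
listed in Content-encoding), here is a minimal sketch. Python's
standard library has no vcdiff codec, so a toy prefix-based delta
stands in for it; that toy format and all function names are
assumptions for illustration only, not the vcdiff format or anything
specified by the draft.

```python
import gzip

def toy_delta_encode(base: bytes, new: bytes) -> bytes:
    """Toy stand-in for vcdiff: shared-prefix length, then the differing tail."""
    n = 0
    while n < min(len(base), len(new)) and base[n] == new[n]:
        n += 1
    return n.to_bytes(4, "big") + new[n:]

def toy_delta_decode(base: bytes, delta: bytes) -> bytes:
    """Inverse of toy_delta_encode, given the same base instance."""
    n = int.from_bytes(delta[:4], "big")
    return base[:n] + delta[4:]

def decode_body(body: bytes, content_encoding: str, base: bytes) -> bytes:
    """Undo the listed codings in reverse order (the last one applied comes off first)."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding == "gzip":
            body = gzip.decompress(body)
        elif coding == "vcdiff":
            body = toy_delta_decode(base, body)  # placeholder for a real vcdiff decoder
        else:
            raise ValueError(f"unsupported coding: {coding}")
    return body

# The sender applied the delta first, then gzip: "vcdiff,gzip".
base, new = b"hello world", b"hello there"
wire = gzip.compress(toy_delta_encode(base, new))
assert decode_body(wire, "vcdiff,gzip", base) == new
```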



-----------------------------------------------------------
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------

From mogul  Fri Aug 20 15:43:43 1999
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA10428; Fri, 20 Aug 1999 15:43:43 -0700
Message-Id: <9908202243.AA10428@youra.pa.dec.com>
To: http-delta
Subject: Re: Delta encoding in HTTP
From: Fred Douglis <douglis@research.att.com>
Date: Fri, 20 Aug 99 15:43:43 -0700
Sender: mogul
X-Mts: smtp

And speaking of the I-D, you should probably know about a patent that we
(myself, Misha, Gaurav, Phong, and Jagadish) filed way back when we first
did the optimistic delta work.  It issued earlier this month:

``Method for reducing the delay between the time a data page is requested
and the time the data page is displayed,'' 
U.S. patent 5,931,904, August 3, 1999
(http://www.patents.ibm.com/details?patent_number=5931904).

Its impact on the I-D is left as an exercise for the reader -- after all 
this time and multiple iterations, I'm not even sure I know myself what 
this thing claims!  But I think it's fair to say that AT&T will be 
reasonable about it and wants to see this technology through the IETF.  I 
can't speak for AT&T in an official capacity as far as licensing goes, 
however.

Fred

From mogul  Fri Aug 20 15:51:10 1999
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA10490; Fri, 20 Aug 1999 15:51:10 -0700
Message-Id: <9908202251.AA10490@youra.pa.dec.com>
To: http-delta
Subject: FYI: IETF rules on patents
Date: Fri, 20 Aug 99 15:51:10 -0700
From: mogul
X-Mts: smtp

Since the issue of patents relevant to delta encoding has come up:

From RFC2026, "The Internet Standards Process -- Revision 3"

10.3.2. Standards Track Documents

   (A)  Where any patents, patent applications, or other proprietary
      rights are known, or claimed, with respect to any specification on
      the standards track, and brought to the attention of the IESG, the
      IESG shall not advance the specification without including in the
      document a note indicating the existence of such rights, or
      claimed rights.  Where implementations are required before
      advancement of a specification, only implementations that have, by
      statement of the implementors, taken adequate steps to comply with
      any such rights, or claimed rights, shall be considered for the
      purpose of showing the adequacy of the specification.

(I'm not sure if this is the only IETF rule regarding patents
relevant to proposals for standards!)

At any rate, I'd encourage Fred or one of the other AT&T people
to take a look at the claims in their patent, at some point not
too far in the future, to see if there is anything there that
would affect the proposal we've been working on.  It may be that
the difference between "optimistic deltas" claimed in the AT&T
patent, and the "non-optimistic" approach in the current
proposal, avoids problems.

Likewise, I'm going to assume that the Open Software Associates
Limited patent doesn't conflict with this proposal, unless
someone from OSA thinks we should assume otherwise.

-Jeff

From yarong@exchange.microsoft.com  Sat Aug 21 15:48:54 1999
Received: from pobox1.pa.dec.com by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA19608; Sat, 21 Aug 1999 15:48:54 -0700
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA01399; Sat, 21 Aug 1999 15:48:54 -0700
Received: from doggate.exchange.microsoft.com (doggate.exchange.microsoft.com [131.107.88.55])
	by mail2.digital.com (8.9.2/8.9.2/WV2.0g) with ESMTP id PAA27202;
	Sat, 21 Aug 1999 15:48:53 -0700 (PDT)
Received: by doggate.exchange.microsoft.com with Internet Mail Service (5.5.2232.9)
	id <QGYGNLFV>; Sat, 21 Aug 1999 15:48:10 -0700
Message-Id: <078292D50C98D2118D090008C7E9C6A603C964EF@STAY.platinum.corp.microsoft.com>
From: "Yaron Goland (Exchange)" <yarong@exchange.microsoft.com>
To: "'mogul@pa.dec.com'" <mogul@pa.dec.com>, http-delta@pa.dec.com
Subject: Patent Issues
Date: Sat, 21 Aug 1999 15:47:58 -0700
Mime-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2232.9)
Content-Type: text/plain;
	charset="iso-8859-1"

Folks, I really don't want to know about these patents. So far I have managed
to delete every piece of mail involving any reference to a patent, potential
or issued. 

I don't know what the patents cover and I don't want to know. I leave that
nonsense to the lawyers. 

So if you could please do me the kindness of putting the word "patent" into
the subject line so I can delete your e-mail unread, I and my attorneys would
appreciate it.

	Thanks,

		Yaron

From cjh@osa.com.au  Sun Aug 22 18:13:21 1999
Received: from pobox1.pa.dec.com by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA20382; Sun, 22 Aug 1999 18:13:21 -0700
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA14159; Sun, 22 Aug 1999 18:13:20 -0700
Received: from osa.osa.com.au (osa.osa.com.au [203.6.130.129])
	by mail1.digital.com (8.9.2/8.9.2/WV2.0g) with ESMTP id SAA01439
	for <http-delta@pa.dec.com>; Sun, 22 Aug 1999 18:13:14 -0700 (PDT)
Received: (from uucp@localhost) by osa.osa.com.au (8.8.5/8.6.9) id LAA21644 for <http-delta@pa.dec.com>; Mon, 23 Aug 1999 11:13:09 +1000
Received: from UNKNOWN(15.16.33.1), claiming to be "redgum.osa.com.au"
 via SMTP by osa.osa.com.au, id smtpda21582; Mon Aug 23 01:13:07 1999
Received: from magpie.osa.com.au (magpie.osa.com.au [15.16.36.3]) by redgum.osa.com.au (8.6.9/8.6.9) with ESMTP id LAA13148 for <http-delta@pa.dec.com>; Mon, 23 Aug 1999 11:10:25 +1000
Received: from magpie.osa.com.au ([127.0.0.1]) by magpie.osa.com.au
	 with esmtp (ident cjh using rfc1413) id m11Iicz-0001frC
	(Debian Smail-3.2.0.102 1998-Aug-2 #2); Mon, 23 Aug 1999 11:10:25 +1000 (EST)
Message-Id: <m11Iicz-0001frC@magpie.osa.com.au>
To: http-delta@pa.dec.com
Subject: Re: FYI: IETF rules on patents 
In-Reply-To: Your message of "Fri, 20 Aug 1999 15:51:10 MST."
             <9908202251.AA10490@youra.pa.dec.com> 
Date: Mon, 23 Aug 1999 11:10:25 +1000
From: Clifford Heath <cjh@osa.com.au>

> Likewise, I'm going to assume that the Open Software Associates
> Limited patent doesn't conflict with this proposal, unless
> someone from OSA thinks we should assume otherwise.

Nothing in the current draft conflicts with our patent application.
If I propose material that would conflict, it will be under open
licencing terms (to be decided).  But at present, I just want some
tweaks that generalise the proposed standard to make it more effective
for rsync-like delta computation.

------------------------------------------------------------
Clifford Heath                    http://www.osa.com.au/~cjh
Open Software Associates Limited       mailto:cjh@osa.com.au
29 Ringwood Street / PO Box 4414       Phone  +613 9871 1694
Ringwood VIC 3134      AUSTRALIA       Fax    +613 9871 1711
------------------------------------------------------------
Proven Solution Deployment for the Global Enterprise

From mogul@pa.dec.com  Mon Aug 23 14:55:13 1999
Received: from pobox1.pa.dec.com by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA23517; Mon, 23 Aug 1999 14:55:13 -0700
Received: from youra.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA27124; Mon, 23 Aug 1999 14:55:13 -0700
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA23515; Mon, 23 Aug 1999 14:55:11 -0700
Message-Id: <9908232155.AA23515@youra.pa.dec.com>
To: Clifford Heath <cjh@osa.com.au>
Cc: http-delta@pa.dec.com
Subject: Relationship between current Delta Encoding draft and OSA's design
In-Reply-To: Your message of "Fri, 20 Aug 99 15:30:05 PDT."
             <9908202230.AA10103@youra.pa.dec.com> 
Date: Mon, 23 Aug 99 14:55:11 -0700
From: mogul@pa.dec.com
X-Mts: smtp

Clifford Heath <cjh@osa.com.au> wrote:

    I understand that there is no formal working group for your
    proposals.  Please reply indicating your interest in discussing our
    work and the processes by which we can contribute to formulating an
    RFC that includes some of the advantages I have mentioned.

and also

    But at present, I just want some tweaks that generalise the
    proposed standard to make it more effective for rsync-like delta
    computation.

If you have specific changes that you would like to propose
to the current draft (draft-mogul-http-delta-01.txt), please
suggest them on this mailing list.

My inclination would be to suggest that any major changes be
proposed in the context of another document, rather than as
a revision to draft-mogul-http-delta-01.txt -- it sounds from
your relatively vague description that you are proposing a
distinctly different mechanism.

Our main concern at this point is to avoid putting anything
in the Delta Encoding spec that would create significant problems
for other extensions to HTTP, such as the OSA design or the
"Rsync in HTTP" design.  At the moment, my assumption is that
there is no such conflict.

-Jeff

From mogul@pa.dec.com  Mon Aug 23 15:04:16 1999
Received: from pobox1.pa.dec.com by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA23718; Mon, 23 Aug 1999 15:04:16 -0700
Received: from youra.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA11173; Mon, 23 Aug 1999 15:04:16 -0700
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA23662; Mon, 23 Aug 1999 15:04:14 -0700
Message-Id: <9908232204.AA23662@youra.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Relationship between Rsync and Delta Encoding in HTTP 
In-Reply-To: Your message of "Fri, 20 Aug 99 15:31:07 PDT."
             <9908202231.AA10055@youra.pa.dec.com> 
Date: Mon, 23 Aug 99 15:04:14 -0700
From: mogul@pa.dec.com
X-Mts: smtp

Andrew Tridgell <tridge@linuxcare.com> writes:
   for those of you who haven't seen the rproxy paper I gave at CALU you
   can grab a copy at ftp://samba.org/pub/tridge/rproxy/

Thanks for the pointer; can you define "CALU" for those of us
not aware of this?

I gathered from this paper that your proposal involves adding
a new content-coding ("rsync") and a new HTTP header ("Rsync-Signature",
although perhaps you should think of a shorter name?)  The
paper doesn't give a careful specification, and you also wrote:

    >  - difference files are cachable by unmodified existing web caches.
    we have a way of doing that in rproxy (using a content-encoding
    trick) although I'm the first to admit it's a bit of a hack. I'll
    be interested to see what your solution is.

so it sounds like there are protocol details that aren't obvious
from the paper.

Again, I would ask whether you see any potential conflict between
the draft-mogul-http-delta-01.txt specification, and your own work.
If not, we should probably not try to couple the two proposals.

    My plan at this stage is to play some more with the design of
    rproxy then to implement it as a patch to squid, apache and
    mozilla. I'll then get it deployed on some really large sites and
    see how it stands up to a real battering. I haven't yet started to
    look into a standardisation process because I wanted the protocol
    extensions to be well and truly tested before going down that
    path.

I agree, don't try to standardize too soon.  However, it might
help other people to critique your design if you could provide
a preliminary specification.

-Jeff

From tridge@samba.anu.edu.au  Thu Aug 26 21:12:53 1999
Received: from pobox1.pa.dec.com by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM)
	id AA03292; Thu, 26 Aug 1999 21:12:53 -0700
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA26460; Thu, 26 Aug 1999 21:12:51 -0700
Received: from samba.anu.edu.au (samba.anu.edu.au [150.203.164.44])
	by mail2.digital.com (8.9.2/8.9.2/WV2.0g) with ESMTP id VAA20215;
	Thu, 26 Aug 1999 21:12:39 -0700 (PDT)
Received: (from localhost user: 'tridge', uid#148) by samba.anu.edu.au
	id <S12869326AbPH0DPH>; Fri, 27 Aug 1999 13:15:07 +1000
Sender: Andrew Tridgell <tridge@samba.anu.edu.au>
From: <tridge@linuxcare.com>
To: mogul@pa.dec.com
Cc: http-delta@pa.dec.com
In-Reply-To: <9908232204.AA23662@youra.pa.dec.com> (mogul@pa.dec.com)
Subject: Re: Relationship between Rsync and Delta Encoding in HTTP
Reply-To: tridge@linuxcare.com
References:  <9908232204.AA23662@youra.pa.dec.com>
Message-Id: <19990827031517Z12869326-13538+45@samba.anu.edu.au>
Date:   Fri, 27 Aug 1999 13:15:07 +1000

Sorry for the slow reply on this, I've been at a couple of
US conferences. I'm on a plane on the way back now :)

> Thanks for the pointer; can you define "CALU" for those of us
> not aware of this?

CALU is "Conference of Australian Linux Users". I know it wasn't
exactly the best forum for introducing this stuff, it just happened to
be the first conference I was going to after Peter and I did the work.

> I gathered from this paper that your proposal involves adding
> a new content-coding ("rsync") and a new HTTP header ("Rsync-Signature",
> although perhaps you should think of a shorter name?)  The
> paper doesn't give a careful specification, and you also wrote:

yes, that's right. I'm not fussed about the name of the header, and I
quite deliberately don't tie down the exact spec just yet as I'm
looking for comments on the general method rather than precisely what
each bit on the wire should mean. 

> so it sounds like there are protocol details that aren't obvious
> from the paper.

that is certainly true. Everything we've actually implemented is
available in the rproxy cvs area (also available as
ftp://samba.org/pub/unpacked/rproxy/) so you can see specifics there,
but I should once again point out that although the implementation
does work (and a few people actively use it) it is far from complete.

> Again, I would ask whether you see any potential conflict between
> the draft-mogul-http-delta-01.txt specification, and your own work.
> If not, we should probably not try to couple the two proposals.

I'll answer that in a separate email when I get home and have a copy
of the draft in front of me.

> I agree, don't try to standardize too soon.  However, it might
> help other people to critique your design if you could provide
> a preliminary specification.

the code provides that to a large extent (it is deliberately kept very
simplistic) but I do plan on writing a better spec once I've finished
with the few conferences I'm doing at the moment.

Cheers, Tridge

From mogul@pa.dec.com  Fri Oct  1 15:55:59 1999
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA01209; Fri, 1 Oct 1999 15:55:59 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA19715; Fri, 1 Oct 1999 15:55:58 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA16282; Fri, 1 Oct 1999 15:55:58 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <199910012255.PAA16282@wera.pa.dec.com>
To: danielh@crosslink.net
Cc: http-delta@pa.dec.com
Subject: Re: Comments on Delta Encoding in HTTP 
In-Reply-To: Your message of "Fri, 20 Aug 1999 15:33:15 PDT."
             <9908202233.AA10154@youra.pa.dec.com> 
Date: Fri, 01 Oct 1999 15:55:58 -0700
X-Mts: smtp

I'm really sorry that it took me 6+ weeks to respond to your
message.  My only excuse is that this isn't exactly my "day job".

    Let me start by commending the quality of the writing, it's
    generally quite good.  Aside from my major comment, most of my
    comments reflect problems  I had comprehending the complete
    picture.

Thanks.

    Lastly, I'm considering an experimental implementation of this
    proposal in my "SRE-http" web server. Do you anticipate any major
    changes  to this proposal (abstracting from the major change I
    mention below!)

No.

    Major Comment:

    The proposed use of templates is problematic. In particular, an
    additional GET is required for each DTemplate, with no guarantee
    that the results of this GET will ever be used. Even if a well
    designed user-agent makes these requests in a way that does not
    affect the client's perceived response time, these extra requests
    will reduce available bandwidth for everyone else.

    Was any thought given to a scheme where the template is returned
    first, after which the client requests a delta against this
    template?  Alternatively, the template and the delta could be
    returned as a multi-part document.

The template mechanism is intended to support approaches such
as HTML Macros:
   9.  Fred Douglis, Antonio Haro, and Michael Rabinovich.  HPP: HTML
   Macro-Preprocessing to Support Dynamic Document Caching.  Proc.
   USENIX Symposium on Internet Technologies and Systems, USENIX,
   Monterey, CA, December, 1997, pp. 83-94.
and you should read that paper to get a better sense of why it
might or might not pay off.  The basic idea is to significantly
reduce the bandwidth requirements for repeated accesses to a group
of similar pages, but this approach does NOT try to minimize the
number of HTTP requests.

    Perhaps this would use a new status code, say 228 Delta Template.
    Also, the client could signal its unwillingness to accept this
    "delta template response" via a new request header, or with a
    "no-template" Accept-encoding?
    
I think we already addressed this issue with:

   Note that an origin server ought not necessarily send a DTemplate
   header field on every response; doing so could waste network
   bandwidth, if the recipient is not delta-capable.  Instead, the
   server should employ heuristics to decide whether to send this header
   field.  For example, it might be worth sending it whenever the
   client's request message indicates its willingness to accept a
   delta-encoded response, and when the If-None-Match field in the
   request does not already specify the entity-tag of the template
   resource.

I'd encourage you to work with Fred Douglis and his co-authors
on the HTML Macros paper, if you think that this part of the
draft is badly designed.  Note that it's completely optional,
though.
    
    Minor comments:
    
    1) Page 7, definition of instance.
    
    It would be useful to further clarify the relationship between resources,
    instances, and entities.  For example:
    
      "One can think of an instance as a snapshot in the life of 
       a resource.  Diagrammatically:
	   a resource  -- yields --> an instance,  
	   an instance -- yields --> an entity.
      where the entity incorporates the effects of content-encodings and range
      extractions.

I'll try to add some clarification here.
      
    2) Page 9, point 5.
    
       the phrase
	  "... and an appropriately encoded body"
       is a bit terse. Perhaps
	  "... and the appropriate range(s) from the possibly encoded body."

Ditto.
    
    3) Page 10, before section 5.
      
       It might be useful to remind the reader that the client should decode
       using the reverse order of methods listed in the Content-encoding
       header. Thus, given a response header of 
	   Content-encoding: vcdiff,gzip
       the client should first decompress, and then apply the reverse delta
       encoding.
    
Good point - not that we think that client implementors are stupid,
but it's probably a good idea to make this explicit.
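
A minimal Python sketch of the ordering rule discussed above: the
recipient undoes the listed content-codings from right to left.  The
function names and the DECODERS table are illustrative assumptions,
not part of the draft.  Because Python has no standard vcdiff decoder,
the round trip is shown with two codings from the standard library
(gzip and deflate); a delta decoder would slot into the table the same
way.

```python
import gzip
import zlib


def decoding_order(content_encoding: str) -> list:
    """Return the listed codings in the order a recipient should undo them.

    Codings are listed in the order they were applied, so decoding
    walks the list from right to left.
    """
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    return list(reversed(codings))


# Hypothetical decoder table; only stdlib codings are filled in here.
DECODERS = {
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,
}


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo every listed coding, last-applied first."""
    for coding in decoding_order(content_encoding):
        body = DECODERS[coding](body)
    return body
```

For the header in the comment above, decoding_order("vcdiff,gzip")
gives ["gzip", "vcdiff"]: decompress first, then apply the delta.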

    4) Page 14, para 3.
    
      The sentence
	 (Presumably, the interrupted response used the same delta encoding,
	  if any.)
      seems too weak; it is hard to imagine any other circumstance for which
      a client would want a range of a delta content-encoding (since
      the entity body returned by the server is a piece of the delta).
      Hence, instead of "Presumably", how about "It is expected that ".

That's basically the dictionary definition of "presumably".
    
      Basically, it took me more than one read to understand that a
      range of a delta content-encoding means something like "return a
      piece of the output of vcdiff"; implying that the client supplied
      byte range has little to do with the byte range of the newly
      created instance.  Reiterating this may be overkill, but not
      letting people think otherwise is important.
    
I don't pretend that this is simple stuff, but I'm not sure what
to say that hasn't already been said.

    5) 226 and 227 status codes
    
       Why use two new status codes, instead of one?  That
       is, use 226 for both delta and range of delta, and
       add a new response header (or modify an existing response
       header) to indicate that "this is a range response".
       I would guess that using 227 provides a useful hint to
       clients (that they should look for a Content-Range header)?

Yes, this is by analogy with the 200/206 distinction for supporting
range responses without deltas.  I think it is generally better to
make things as explicit as possible, especially if this can be
done without actually adding more fields to the headers.    
    
    6) Page 18, second paragraph
    
      For clarity's sake, how about this change:
    
	 Suppose that the server's current instance has entity tag "B", and
	 that the server also has retained a copy of the instance with entity
	 tag "A". Then, the server could compute the difference between
	 "B" and "A" and respond with:

Good suggestion.
    
    7) Top of page 19.
    
       Further stressing the concept of range of delta content encoding,
       how about:
    
	   selection, and returns a 227 (Range of Delta) response with
	   an entity body containing bytes 900 to 999 of the vcdiff
	   computed difference.
    
OK.

    8) Page 22
       
       LRU is never defined (least recently used?)
    
Sorry (and you're right about the definition).

    9) Page 24.
       It might be useful to add this example:
    
       Suppose :
	a)the client requests /help/foo.bar,
	  and the server responds with:
	    HTTP/1.1 200 OK
	    Etag: "abc"
    
	b)the client then requests /help/fun.bar,
	  and the server responds with:
    
	   HTTP/1.1 200 OK
	   Etag: "efg"
	   DCluster: "//bar.example.net/help/"
    
	c) Then, if the client re-requests /help/foo.bar, it should add
	   a request header of:
	     If-none-match:"abc","efg"
    
Actually, I'm not sure this necessarily makes sense, but it points
out a question that is currently only implicit in the spec: for
what period of time is a DCluster header value valid?

We don't want the DCluster header applicability to expire
when the response that it came with expires, because this would
severely limit the utility of delta encoding.  However, a
server knows at least that, at the time that it first sends a
DCluster header, it has to start maintaining the constraints on
entity tags implied by the header value (i.e., that entity
tags issued for resources covered by the header field are
unique).

But what can we say about its meaning with respect to entity
tags that were put into the cache *before* the DCluster header
was received?  In other words, in your example above, suppose
step (a) takes place many days (or months or years) before
step (b).  And note that if the client is re-requesting /help/foo.bar
in step (c), that's probably because the cache entry created in
step (a) has expired.

So we are in a situation where there is no obvious way to guarantee
that the constraint on entity-tag values implied by the DCluster
header received at step (b) actually applies to the entity tag
received in step (a).  It might, but it might not, and if we
allow an arbitrarily long gap here, then we make it impossible
for a server administrator to forget about any previously-issued
entity tag, no matter how long ago.

So I see two solutions: come up with a mechanism that lets the
server specify how far back in time to go in applying a DCluster
header, or simply to say that it never applies to previously
received cache entries.

I'd vote for the latter, and I'll add something to the effect
that

	The uniqueness scope specified by a DCluster header is valid
	only for entity tags received in the same response or in
	subsequent responses, never for entity tags received in
	previous responses.

and, by analogy

	The URI specified by a DTemplate header is valid only for use
	with entity tags received in the same response or in subsequent
	responses, never for use with entity tags received in previous
	responses.

How about that?
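
A small Python sketch of the proposed rule, assuming a client cache
that stamps each response with an arrival sequence number.  The class
and field names are hypothetical, and matching the DCluster value
against a URL path by simple prefix is a simplification.  It only
answers whether a cached entity tag falls under a given DCluster
scope; it does not decide what the client should put in
If-None-Match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedTag:
    url_path: str   # path of the resource the tag was received for
    etag: str
    seq: int        # arrival order of the response that carried the tag


@dataclass(frozen=True)
class ClusterScope:
    prefix: str     # path prefix taken from the DCluster header value
    seq: int        # arrival order of the response that carried DCluster


def in_scope(tag: CachedTag, scope: ClusterScope) -> bool:
    """True if the tag arrived with, or after, the DCluster header.

    Tags cached before the header was received are never covered.
    """
    return tag.url_path.startswith(scope.prefix) and tag.seq >= scope.seq
```

With the example from comment 9, "abc" (step a) arrived before the
DCluster header and is not covered, while "efg" (step b) arrived in
the same response and is.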
     
    10) Page 25.
    
	It might be useful to add (after the "It would not make sense"
	paragraph) a reminder that use of broad uniqueness scope may also
	increase the work the server must do to ensure that no two URIs
	yield the same entity tag.

I think this is such a small point, in comparison to the other reason
for not having an over-broad uniqueness scope, that it isn't worth
saying.  Also, IETF specs typically concentrate on issues of
interoperability and network efficiency, and give implementors
as much freedom as they want to create work for themselves.
      
    11) Page 28.
    
       In the example containing
	  DTemplate: "http://bar.example.net/foo.tplt"/etag="pqr"
       What if an etag of "zyx" is returned by the server in response to a 
       request for foo.tplt? Should the next request for foo.html
       use If-none-match: "pqr" or If-none-match: "zyx"?  I'd think the
       latter, but the example does not make that clear.

I think it should be obvious that if you wait too long, the server
might replace the base instance and delete the original one (i.e.,
make the instance with etag="pqr" unavailable).  The issue here,
then, isn't whether the client should use one or the other in its
If-None-Match header (I think the protocol needs to work right in
either case), but rather what the server should return if
the client says If-none-match: "pqr", and that instance no longer
exists.  And the answer to that is obvious, you get back a status-200
response.
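
A rough Python sketch of the server-side choice described above,
under simplifying assumptions: choose_status, its parameters, and the
set of retained base instances are all hypothetical names, and ranges
(206/227) are ignored.  The point is only that an unavailable base
instance falls through to an ordinary 200 response.

```python
def choose_status(current_etag, client_etags, retained, delta_capable):
    """Pick a status code and, for a delta, the base entity tag.

    current_etag  -- entity tag of the server's current instance
    client_etags  -- entity tags from the request's If-None-Match
    retained      -- entity tags of base instances the server still holds
    delta_capable -- whether the request signalled delta support
    """
    if current_etag in client_etags:
        return 304, None                 # client already has the current instance
    if delta_capable:
        for tag in client_etags:
            if tag in retained:
                return 226, tag          # delta against a retained base instance
    return 200, None                     # base gone (or no delta support): full response
```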

Again, the IETF practice is to avoid specifying stuff that isn't
required for interoperability or overall performance, so I think
we can leave the client implementors some freedom in this respect.
    
       Also, the following modification may be useful:
	   This means that for any Request-URI matching the prefix
	   specified in the Dcluster header field, the URI specified in
	   the DTemplate field is an appropriate template; and
	   If-none-match should use "pqr" (assuming that "pqr" is the
	   etag for foo.tplt).
    
I'll add something like that.

    12) Page 32, 12.3.1
    
	The phrase
	    "... or if the uniqueness scope for an entity tag of any instance
	    of the requested resource has ever included another resource".
	seems unnecessary.  That is, if the client only included one etag
	in the If-none-match, and the server didn't make any uniqueness scope
	errors, why would there ever be any ambiguity (since the client has
	the entity associated with this single etag, as does the server)?

I'm not 100% sure of the reasoning behind this (even though I probably
wrote it myself), but I think the concern was what would happen if
the response ended up in a proxy cache.  It would then be hard to
know whether it could be safely used later on.  I suppose we could
give the origin server the option of either including the
Delta-Base header or a "Vary: if-none-match" header (except that
the Vary header would have to list a number of fields that could
potentially carry the request etag!).  Again, I think it's safer
to make this information explicit, since if there is any ambiguity
it is likely to lead to cache transparency errors sooner or later.
    
    13) Page 32, 12.3.2
    
	The description of Dcluster never mentions its prime purpose --
	to identify an "etag" to use for a set of Request-Uris.
	Instead, 12.3.2 is framed in terms of uniqueness scopes.  While
	these are important constraints affecting DCluster, it's not
	the reason one would use DCluster.

Huh?  A DCluster header *never* specifies an etag, it *always*
specifies a set of Request-URIs.  Which is, by definition,
a uniqueness scope.

-Jeff

From mogul  Thu Oct  7 13:34:45 1999
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA28488; Thu, 7 Oct 1999 13:34:45 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <199910072034.NAA28488@wera.pa.dec.com>
To: http-delta
Subject: Slightly revised HTTP Delta Encoding draft 
Date: Thu, 07 Oct 1999 13:34:45 -0700
X-Mts: smtp

Based on the comments from several of you, I've made some minor
revisions to the HTTP Delta Encoding draft; the revised version
is temporarily on:
    ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-delta-02.txt

If nobody objects, I'll submit this draft to the IETF within
the next few days.  Then maybe Bala can take care of asking
the IESG to bless this as a proposed standard.

We'll probably also need to issue revised versions of the
digest and vcdiff drafts, since they have both technically
expired.

Thanks
-Jeff

P.S.: Yes, I know that reference 10 in the draft above may be wrong;
I've fixed it but it takes a while to propagate the new version
through our firewall.

From mogul  Mon Oct 25 18:01:50 1999
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA03501; Mon, 25 Oct 1999 18:01:50 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <199910260101.SAA03501@wera.pa.dec.com>
To: http-delta
Subject: draft-mogul-http-delta-02.txt released by IETF
Date: Mon, 25 Oct 1999 18:01:50 -0700
X-Mts: smtp

The latest revision of "Delta encoding in HTTP" is available at
    http://www.ietf.org/internet-drafts/draft-mogul-http-delta-02.txt

I believe that Bala intends to ask the IESG to consider this
version as a Proposed Standard.

-Jeff

From mogul  Tue Dec  7 15:35:00 1999
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA29555; Tue, 7 Dec 1999 15:35:00 -0800 (PST)
Message-Id: <199912072335.PAA29555@wera.pa.dec.com>
To: http-delta
From: <danielh@crosslink.net>
X-original-Date: Fri, 12 Nov 1999 14:11:28 -0500
X-originally-To: mogul@pa.dec.com
Subject: a delta encoding and range conundrum...
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v1.61 b62 
Date: Tue, 07 Dec 1999 15:35:00 -0800
Sender: mogul
X-Mts: smtp


While implementing delta encoding, this transfer delta encoding conundrum arose:

Consider a  request:
  get /foo/bar.html  http/1.1
  host: mysite.net
  If-none-match: "ver1"
  TE: diff-e
where we assume that "ver1" is a prior instance of /foo/bar.html

Let's suppose that the current instance of foo.bar is different than the 
"ver1" instance, and let's assume that it has an etag of "ver2".  Let's assume
that bytes 500 to 500 of the "ver2" instance are different than the "ver1" 
instance. Then the response could be something like:

  http/1.1 226 Delta
  Delta-base: "ver1"
  Transfer-encoding: diff-e
  Content-length: 98
  Etag: "ver2"

Instead, suppose the client requests a range, using:
  get /foo/bar.html  http/1.1
  host: mysite.net
  If-none-match: "ver1"
  Range: bytes=200-299
  TE: diff-e

This range hasn't changed; hence the DIFF -e is empty. That is, bytes 200-299
of "ver1" are the same as bytes 200-299 of "ver2".

So what should the server do? I can see 3 possibilities:
a) return an empty response, and assume the client will take this as meaning
    "no difference"
b) return a 304, with an etag of "ver1", and hope the client will assume that 
   this means that the requested range has not changed (with no implications 
   as  to the rest of the resource).
c) Avoid the hassle,  and send the whole thing (don't do any encoding) 

Solution b makes the most sense, but it does depend on the client 
agreeing to this interpretation (in the context of a range request).


-----------------------------------------------------------
danielh@crosslink.net
-----------------------------------------------------------

From mogul  Tue Dec  7 15:37:03 1999
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA06083; Tue, 7 Dec 1999 15:37:03 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <199912072337.PAA06083@wera.pa.dec.com>
To: <danielh@crosslink.net>
cc: http-delta
Subject: Re: a delta encoding and range conundrum... 
In-reply-to: Your message of "Fri, 12 Nov 1999 14:11:28 EST."
             <199911121952.OAA12695@lycanthrope.crosslink.net> 
Date: Tue, 07 Dec 1999 15:37:03 -0800
X-Mts: smtp

Sorry it took me so long to reply to this.  I've been postponing
work on the Delta draft because I had other deadlines that had
to be settled first.

You wrote:

    While implementing delta encoding, this transfer delta encoding
    conundrum arose:
    
    Consider a  request:
      get /foo/bar.html  http/1.1
      host: mysite.net
      If-none-match: "ver1"
      TE: diff-e
    where we assume that "ver1" is a prior instance of /foo/bar.html
    
    Let's suppose that the current instance of foo.bar is different
    than the "ver1" instance, and let's assume that it has an etag of
    "ver2".  Let's assume that bytes 500 to 500 of the "ver2" instance
    are different than the "ver1" instance. Then the response could be
    something like:
    
      http/1.1 226 Delta
      Delta-base: "ver1"
      Transfer-encoding: diff-e
      Content-length: 98
      Etag: "ver2"
    
That example is actually illegal, because of this text from section 4.4
of RFC2616:

   The transfer-length of a message is the length of the message-body as
   it appears in the message; that is, after any transfer-codings have
   been applied. When a message-body is included with a message, the
   transfer-length of that body is determined by one of the following
   (in order of precedence):

   [...]
   2.If a Transfer-Encoding header field (section 14.41) is present and
     has any value other than "identity", then the transfer-length is
     defined by use of the "chunked" transfer-coding (section 3.6),
     unless the message is terminated by closing the connection.

   3.If a Content-Length header field (section 14.13) is present, its
     decimal value in OCTETs represents both the entity-length and the
     transfer-length. The Content-Length header field MUST NOT be sent
     if these two lengths are different (i.e., if a Transfer-Encoding
     header field is present). If a message is received with both a
     Transfer-Encoding header field and a Content-Length header field,
     the latter MUST be ignored.

That is, you can't send both Transfer-Encoding and Content-Length!
However, I think this bug in your example is unrelated to the main
issue.
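
A minimal Python sketch of how a recipient might apply the two quoted
rules.  The function name and return values are illustrative
assumptions, and the rules elided by "[...]" above are not modelled.
It shows only that a non-identity Transfer-Encoding takes precedence
and that any Content-Length sent alongside it is ignored.

```python
from typing import Optional, Tuple


def body_framing(headers: dict) -> Tuple[str, Optional[int]]:
    """Decide how the transfer-length of a message body is determined.

    Returns a (method, length) pair; length is set only when
    Content-Length is the deciding field.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    # Rule 2: a Transfer-Encoding other than "identity" wins.
    coding = lowered.get("transfer-encoding", "").strip().lower()
    if coding and coding != "identity":
        return "chunked-or-close", None   # any Content-Length is ignored

    # Rule 3: otherwise Content-Length gives the length in octets.
    if "content-length" in lowered:
        return "content-length", int(lowered["content-length"])

    return "not-covered-here", None       # remaining rules omitted in this sketch
```

Applied to the headers of the example response, the result is
("chunked-or-close", None): the "Content-length: 98" field plays no
part.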

    Instead, suppose the client requests a range, using:
      get /foo/bar.html  http/1.1
      host: mysite.net
      If-none-match: "ver1"
      Range: bytes=200-299
      TE: diff-e
    
    This range hasn't changed; hence the DIFF -e is empty. That is,
    bytes 200-299 of "ver1" are the same as bytes 200-299 of "ver2".

    So what should the server do? I can see 3 possibilities:
    a) return an empty response, and assume the client will take
     this as meaning "no difference"
    b) return a 304, with an etag of "ver1", and hope the client
     will assume that this means that the requested range has not
     changed (with no implications as  to the rest of the resource).
    c) Avoid the hassle,  and send the whole thing (don't do any encoding) 
    
    Solution b makes the most sense, but it does depend on the client 
    agreeing to this interpretation (in the context of a range request).

I'll start by reminding you about section 4 of the draft (titled
"Relationship between content-coding, transfer-coding, and ranges").
    
Remember that transfer-codings, unlike Content-codings, are hop-by-hop.
This is a key distinction, because if you really are talking about
using a transfer-coding here, then the decision about whether
to apply the delta transfer-coding MUST be made after all of the
other relevant decisions (in particular, the choice between 200, 226,
and 304 status codes).  So one important tool in thinking about this
is that the example has to work if you remove all of the transfer
coding stuff.

I think (b) makes no sense at all in this case, since the underlying
resource variant has definitely changed (by the way you set up the
example).  If you used no transfer codings, it would be entirely
wrong to send 304, and adding a transfer coding isn't allowed to
change this.

(c) is always legal, but it seems like a cop out.  So what is
the "right" way to use a delta transfer-coding in this example?

We start by constructing the response that the server would
send without the transfer-coding:

	HTTP/1.1 206 Partial Content
	Etag: "ver2"
	Content-type: text/html
	Content-Length: 100
	Content-Range: bytes=200-299/1234
	Date: whatever

	<100 bytes of content>

This is the result of steps 1-6 in section 4 of the draft.
Now, because we want to try to apply a delta transfer-coding
(step 7), we would do the following:

    (7a) identify the base instance for the delta, which in
    this case is "ver1"
    
    (7b) make sure that we have a copy of that base instance;
    this might not be possible if the transfer-coding is
    being originally applied at an intermediate proxy cache!
    
    (7c) generate the required sub-range (bytes 200-299) of
    the base instance ("ver1").
    
    (7d) compute the delta between the result of step 7c
    and the "<100 bytes of content>" resulting from step 6.
    I'll assume you want to use diff-e here.  Let's assume
    that the result of this requires 17 bytes.
    
    (7e) replace the body resulting from step 6 with the output
    from 7d, using a chunked encoding (because of the
    requirements of section 4.4 of RFC2616), add these fields to
    the response headers:
	Transfer-encoding: diff-e
        Delta-base: "ver1"
    and (because of RFC2616, section 4.4), remove the
    Content-length header.

So, the result would be

    	HTTP/1.1 206 Partial Content
	Etag: "ver2"
	Content-type: text/html
	Content-Range: bytes=200-299/1234
	Transfer-encoding: diff-e
        Delta-base: "ver1"
	Date: whatever

	17
	<17 bytes of diff-e result>
	0

Note that section 4 of the draft says:
    Ranges are used for two main purposes:

      1. to complete a partial response after a premature
         termination of a message

      2. to obtain just selected sections of an instance.

   The former use of Range is consistent with the use of delta encoding
   as a content-coding; the latter requires the use of delta encoding as
   a transfer-coding.

Implicitly, your example falls into case (2), since it doesn't
make sense to use If-none-match in case (1) - you'd probably
be using If-Range in that case, or perhaps If-Match (in the
subcase where the client doesn't want to receive anything if
the underlying resource has changed).

All of this brings out one point, which is perhaps implicit
in the current delta draft, but which probably needs to be
made explicit.  This is the use of "Delta-base" in a response
using a delta transfer-coding.

In your example, since the request only identifies one possible
base version, the Delta-base response header is superfluous;
the requester knows what the only possible base version is.
But if the requester had sent, e.g.,

      get /foo/bar.html  http/1.1
      host: mysite.net
      If-none-match: "ver1", "ver0"
      Range: bytes=200-299
      TE: diff-e

then it would not be possible to use a delta encoding without
sending Delta-Base.
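
A short Python sketch of that condition.  The function names are
hypothetical, the entity-tag parser is deliberately simple (quoted
tags with an optional W/ prefix), and the scope_ever_shared flag
stands in for the uniqueness-scope clause of section 12.3.1, which
the server would have to track by some other means.

```python
import re


def parse_entity_tags(value: str) -> list:
    """Extract the quoted entity tags from an If-None-Match value."""
    return re.findall(r'(?:W/)?"[^"]*"', value)


def delta_base_required(if_none_match: str, scope_ever_shared: bool = False) -> bool:
    """True if the response must name its base instance in Delta-Base.

    With a single candidate tag and no shared uniqueness scope, the
    requester already knows the only possible base.
    """
    return len(parse_entity_tags(if_none_match)) > 1 or scope_ever_shared
```

For If-none-match: "ver1" this returns False; for
If-none-match: "ver1", "ver0" it returns True.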

Currently, the delta draft spec (section 12.3.1) only discusses
the use of Delta-Base in conjunction with delta content-codings.
But there doesn't seem to be any reason not to include it in responses
using delta transfer-codings, as long as the recipient strips
the Delta-Base header if it also strips the delta transfer-coding.

That is, I would change this, in section 12.3.1:

   A Delta-Base header field MUST be included in a 226 (Delta) or 227
   (Range of Delta) response if the request included more than one
   entity tag in its If-None-Match header field, or if the uniqueness
   scope for an entity tag of any instance of the requested resource has
   ever included another resource.  Any 226 or 227 response MAY include
   a Delta-base header.

to this:

   A Delta-Base header field MUST be included in a 226 (Delta) or
   227 (Range of Delta) response, or in a response that uses a
   delta transfer-coding, if the request included more than one
   entity tag in its If-None-Match header field, or if the
   uniqueness scope for an entity tag of any instance of the
   requested resource has ever included another resource.

   Any 226 or 227 response MAY include a Delta-base header.  A
   Delta-Base header MAY be included in a response using a delta
   transfer-coding, but if so, and if a forwarding agent also
   removes the delta transfer-coding, the Delta-Base header MUST
   be removed before the message is forwarded.

OK?

-Jeff

From mogul  Wed Dec  8 15:08:52 1999
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA04727; Wed, 8 Dec 1999 15:08:52 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <199912082308.PAA04727@wera.pa.dec.com>
To: danielh@crosslink.net
cc: http-delta
Subject: Re: a delta encoding and range conundrum... 
In-reply-to: Your message of "Wed, 08 Dec 1999 00:09:11 EST."
             <199912080525.AAA21225@lycanthrope.crosslink.net> 
Date: Wed, 08 Dec 1999 15:08:52 -0800
X-Mts: smtp

    >Remember that transfer-codings, unlike Content-codings, are hop-by-hop.
    >This is a key distinction, because if you really are talking about using
    >a transfer-coding here, then the decision about whether to apply the
    >delta transfer-coding MUST be made after all of the other relevant
    >decisions (in particular, the choice between 200, 226, and 304 status
    >codes).  So one important tool in thinking about this is that the example
    >has to work if you remove all of the transfer coding stuff.
    
    I definitely missed that: that 226 and 227 are ONLY used when a
    delta content-encoding has been applied.  I think that should be
    stated explicitly somewhere (if you'd like, I'll look for a good
    place to put such a reminder).

Please feel free to suggest something (the more specific, the better,
although I might want to edit it).

    >So, the result would be
    >    	HTTP/1.1 206 Partial Content
    >	Etag: "ver2"
    >	Content-type: text/html
    >	Content-Range: bytes=200-299/1234
    >	Transfer-encoding: diff-e
    >   Delta-base: "ver1"
    >	Date: whatever
    
    >	17
    >	<17 bytes of diff-e result>
    >	0
    
    Shouldn't that be:
	    Transfer-encoding: diff-e,chunked
    
Yup, my mistake.

    One small point -- on my platform,
       DIFF -e  foo.1 copy_of_foo.1
    yields an empty string (a 0 length response).  So you'd end up chunking
    an empty string and hope the recipient figures out that a
    chunked "empty string" means "no change".
    
Presumably, all users of a delta coding (including diff -e) agree
on the meaning of all legal coding outputs.
    
Thanks
-Jeff

From mogul  Fri Mar 10 10:07:39 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA16314; Fri, 10 Mar 2000 10:07:39 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200003101807.KAA16314@wera.pa.dec.com>
To: http-delta
Subject: Delta-encoding: revised drafts submitted to the IETF
Date: Fri, 10 Mar 2000 10:07:39 -0800
X-Mts: smtp

It took us way too long, but ...

Almost 4 hours before the deadline for submitting Internet-Drafts
prior to the next IETF meeting, we've submitted the following
revised drafts to the IETF:

    draft-mogul-http-delta-03.txt ("Delta encoding in HTTP")
    
    draft-mogul-http-digest-02.txt ("Instance Digests in HTTP")
    
    draft-korn-vcdiff-01.txt
	("The VCDIFF Generic Differencing and Compression Data Format")

These three documents constitute the core of the Delta-encoding
specification.

Once these have made it through the queue of pending I-D
announcements, the plan is to issue a "last call" for
advancing delta-encoding on the IETF standards track, and
then to submit a request to the IESG.  I believe that Bala
had volunteered to do these steps.

-Jeff

From mogul  Mon Mar 20 18:06:56 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA26378; Mon, 20 Mar 2000 18:06:56 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200003210206.SAA26378@wera.pa.dec.com>
To: http-delta
Subject: URLs for the revised Delta-encoding Internet-Drafts
Date: Mon, 20 Mar 2000 18:06:56 -0800
X-Mts: smtp

In case anyone needs these URLs for reference purposes:

	Title		: Instance Digests in HTTP
	Author(s)	: J. Mogul, A. van Hoff
	Filename	: draft-mogul-http-digest-02.txt
	Pages		: 12
	Date		: 14-Mar-00
	
	http://www.ietf.org/internet-drafts/draft-mogul-http-digest-02.txt

and

	Title		: Delta encoding in HTTP
	Author(s)	: J. Mogul, B. Krishnamurthy, Y. Goland, A. van Hoff,
                          F. Douglis, A. Feldmann 
	Filename	: draft-mogul-http-delta-03.txt
	Pages		: 45
	Date		: 14-Mar-00
	
	http://www.ietf.org/internet-drafts/draft-mogul-http-delta-03.txt

Note that Yaron Goland is no longer at Microsoft (I discovered this
after the I-D submission deadline), but may be reached as yaron@goland.org.
I will update this in the next draft.

-Jeff

From danielh@crosslink.net  Mon Mar 20 21:37:14 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id VAA02902; Mon, 20 Mar 2000 21:37:14 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA06803; Mon, 20 Mar 2000 21:37:14 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id VAA05549
	for <http-delta@pa.dec.com>; Mon, 20 Mar 2000 21:37:13 -0800 (PST)
Received: from smtp.crosslink.net (dyn56.c5200-1.springfield.236.crosslink.net [207.199.142.57]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id AAA31305 for <http-delta@pa.dec.com>; Tue, 21 Mar 2000 00:37:11 -0500
Message-Id: <200003210537.AAA31305@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Tue, 21 Mar 2000 00:34:00 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200003210206.SAA26378@wera.pa.dec.com>
Subject: A possible problem: when are etags assigned
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Since I'm currently revising the delta-encoding module in my server, the
following conundrum is of immediate concern...

My reading of draft-mogul-http-delta-03.txt indicates that there may be a
problem concerning the proper moment to assign an etag to an instance.

a) Draft 3 suggests that an etag should be assigned
   BEFORE content encoding (such as GZIP compression),
   and also before range extraction.

b) It is not clear whether this reading agrees with the sense of
   RFC2616. Others (for example, Koen Holtman) have suggested
   that they read RFC2616 to dictate that an etag be assigned after
   content-encoding, but before range extraction.

So, am I just misunderstanding draft 3 (in which case it would be
useful to add a note clarifying this point)?  Or is there really an
ambiguity?

Personally, I like the idea of assigning an etag before content-encoding.
An etag that identifies a content-encoded (which may mean delta-encoded)
"entity body" is unlikely to be of future interest to the client, whereas
an etag that identifies the unencoded instance may be of great interest
(for use in a future If-None-Match).  Roughly speaking, this does mean
that the user-agent should cache the de-content-encoded entity body, but
the de-content-encoding has to be done anyway.

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Tue Mar 21 15:56:22 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA03494; Tue, 21 Mar 2000 15:56:22 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA03399; Tue, 21 Mar 2000 15:56:21 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA03351; Tue, 21 Mar 2000 15:56:21 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003212356.PAA03351@wera.pa.dec.com>
To: danielh@crosslink.net
Cc: http-delta@pa.dec.com
Subject: Re: A possible problem: when are etags assigned 
In-Reply-To: Your message of "Tue, 21 Mar 2000 00:34:00 EST."
             <200003210537.AAA31305@lycanthrope.crosslink.net> 
Date: Tue, 21 Mar 2000 15:56:21 -0800
X-Mts: smtp

Ouch.  I think you're right, there is a problem here.
(On the one hand: why couldn't you have pointed this
out a year ago!  On the other hand: this is exactly
why the IETF process values "working code", and your
efforts to implement something have certainly proved
useful to this process.)

I can't say that I'm entirely sure I've figured this
out (I've spent several sessions wandering around in
the fresh air and trying to think this through), but
here's my current take.

The best way that I know how to resolve this kind of
question is to look at all the possible choices, and
then for each choice, see whether I can construct
a plausible scenario that leads to a bad result (or
to a contradiction).

It's quite clear that the entity tag must be assigned
*before* a delta content coding.  Otherwise, the entity
tag would be useless in deciding how to combine a delta
with a previous instance.

However, you write:
   Others (for example, Koen Holtman) have suggested that they
   read RFC2616 to dictate that an etag be assigned after
   content-encoding, but before range extraction.

I'm not always in agreement with Koen, but this time I think
he may be right.

Consider this scenario:
	(1) Content author creates foo.html
	(2) some software does "gzip -c foo.html >foo.html.gz"

Should foo.html and foo.html.gz have the same entity tag?

On the one hand, one could argue that these two files represent
identical content, but one of them is encoded differently.

On the other hand, we have three practical issues that suggest
that these two files should not have the same entity tag:

(A) RFC2616 section 3.11 (Entity Tags) says:
   A "strong entity tag" MAY be shared by two entities of a
   resource only if they are equivalent by octet equality.

Unfortunately, since RFC2616 uses an ambiguous definition
for "entity", it's a little hard to pin down what this means.
Strictly speaking, this might not even allow two different
ranges of the same instance to share an entity tag, but that
seems preposterous (and Koen seems to agree).  However, it
does argue against assigning the same entity tag to foo.html
and foo.html.gz

(B) As a practical matter, I believe that most (all?) existing
servers would not recognize that foo.html and foo.html.gz are
different encodings of the same content (for one thing, it might
be computationally expensive to verify this), and so it would
be difficult to get these servers to assign the same entity tag.

(C) Although we expect the ultimate client (e.g., browser) that
receives a message to be able to decode a content-coding, we
can't in general have the same expectation for intermediate
proxies - they might not be able to decode all content-codings.
And so it would be confusing if a server first sends foo.html
with entity tag "XYZZY", and then later sends a range of the
same file, with the same entity tag, but with a different
content-coding having been applied.  I believe that this is
not actually an error situation - the proxy could revert to
being a non-caching tunnel in this case - but it shows how
complex things get if we allow entity tags to be assigned before
content-codings.

So it looks like we have a contradiction: the entity tag must
be assigned before a delta content-coding, but after content-codings
in general.  Ouch.

There are three ways to resolve this contradiction: by kludging
(e.g., making delta encoding a special kind of content-coding
that is applied after the entity tag is assigned - yuck!), by
banning the combination of delta content-coding with any other
content-codings (this is probably not a useful approach), or
by realizing that it was a mistake to treat delta encoding as
a form of content-encoding, after all.

The kludge approach might require only minor tweaks to the
document, but I think it would lead to a big mess.

The last approach seems cleanest, but would require a number of
changes, including but certainly not limited to these:

(1) Section 4 (Relationship between content-coding, transfer-coding,
and ranges) needs to be changed to make it clear that the instance
is the result of a possible content coding, not an input to it.

(2) Applying similar changes to the I-D on instance digests
(we need to be consistent about assigning entity tags and
instance digests at the same point!)

(3) Creating a new message header (e.g., "DE") so that we would
send:
      HTTP/1.1 226 Delta
      ETag: "1acl059"
      DE: vcdiff
      Delta-base: "337pey"
      Date: Tue, 25 Nov 1997 18:30:05 GMT
and a new non-terminal, e.g., delta-encoding, and changing the
BNF so that "vcdiff", etc. are examples of delta-encoding, not
content-coding.

(4) Various changes to related text, examples, etc.
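
As an illustration only (this is not part of any draft), the client side
of the header-based exchange in item (3) might be sketched in Python as
follows.  The toy delta format, the dictionary used as a cache, and the
function names are all assumptions made for the example; a real client
would need an actual vcdiff decoder in place of apply_toy_delta.

```python
# Hypothetical sketch: none of these names come from the drafts.

def apply_toy_delta(base: bytes, delta: list) -> bytes:
    """Toy delta format (NOT vcdiff): a list of ("copy", offset, length)
    and ("add", data) instructions applied against the base instance."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        elif op[0] == "add":
            out += op[1]
        else:
            raise ValueError("unknown delta instruction: %r" % (op[0],))
    return bytes(out)


def handle_response(cache: dict, status: int, headers: dict, body):
    """Return the reassembled instance and cache it under its entity tag."""
    if status == 226 and "DE" in headers:
        # Raises KeyError if the base instance is no longer cached.
        base = cache[headers["Delta-base"]]
        instance = apply_toy_delta(base, body)
    else:
        instance = body
    # The entity tag names the reassembled instance, not the delta.
    cache[headers["ETag"]] = instance
    return instance


cache = {'"337pey"': b"Hello, old world!\n"}
headers = {"ETag": '"1acl059"', "DE": "toy", "Delta-base": '"337pey"'}
delta = [("copy", 0, 7), ("add", b"new"), ("copy", 10, 8)]
new_instance = handle_response(cache, 226, headers, delta)
```

The point of the sketch is only the bookkeeping: the delta is applied to
the instance named by Delta-base, and the result is stored under the tag
in ETag so that it can serve as a base for a later request.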

Does this make sense to the rest of you?  I guess I have some
work cut out.  Fortunately (?) the IETF won't be publishing any
new I-Ds for about two weeks.

-Jeff

From mogul  Wed Mar 22 10:46:07 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA12789; Wed, 22 Mar 2000 10:46:06 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200003221846.KAA12789@wera.pa.dec.com>
To: http-delta
Subject: PLEASE send Delta-related messages to http-delta@pa.dec.com
Date: Wed, 22 Mar 2000 10:46:06 -0800
X-Mts: smtp

NOT just to me.

I'll be resending a bunch of on-topic messages that others have sent to me.

Thanks,
-Jeff

P.S.: Remember, http-delta-request@pa.dec.com for mailing list
additions/deletions/changes.

From mogul  Wed Mar 22 10:48:01 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA10135; Wed, 22 Mar 2000 10:48:01 -0800 (PST)
Message-Id: <200003221848.KAA10135@wera.pa.dec.com>
To: http-delta
From: <danielh@crosslink.net>
Reply-To: danielh@crosslink.net
Original-Date: Tue, 21 Mar 2000 23:42:50 -0500
In-Reply-To: <200003212356.PAA03351@wera.pa.dec.com>
Subject: Re: A possible problem: when are etags assigned
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 
Date: Wed, 22 Mar 2000 10:48:01 -0800
Sender: mogul
X-Mts: smtp

>Ouch.  I think you're right, there is a problem here.
>(On the one hand: why couldn't you have pointed this
>out a year ago!  On the other hand: this is exactly
>why the IETF process values "working code", and your
>efforts to implement something have certainly proved
>useful to this process.)

Alas, I've been laboring under some etag misconceptions.
For example, I completely missed the part about etags being assigned
before range extraction; which now makes sense to me
(i.e.; it allows for range extraction of an index which can then be used
to retrieve selected chapters; in the acrobat-selectively-
reading-a-large-pdf sense).

The realization about content-encoding
and etags only came when I read draft 3 and pondered the
significance of ...geez, I can't remember just what sub-clause got me
wondering.    Whatever, it's fortuitous that the
latest draft happened to come around just about the time I
was tinkering with the delta module (for reasons that had
nothing to do with etags!)

>It's quite clear that the entity tag must be assigned
>*before* a delta content coding.  Otherwise, the entity
>tag would be useless in deciding how to combine a delta
>with a previous instance.

Perhaps not useless, but crippled. Allow me to belabor the point, just to
make sure...

Consider the case:
  a) at 1PM, the client requests foo.html, receives a response with an
     etag of "def"
  b) at 8PM, the client re-requests foo.html, with If-None-Match: "def"
     and Accept-encoding: Gdiff
     He receives a delta-content-encoded response, with an etag of "ghi"
If "ghi" refers to the "pre-encoded instance" from step b, then
  c) at 9PM, the client can re-re-request foo.html, with
     If-None-Match: "def","ghi"
     The server then can use "ghi" (the instance used in step b), which
     is probably more similar to the current (9PM) instance.
However, if "ghi" refers to the actual "entity body" (the difference file
returned at 8PM), then "ghi" is almost certainly useless as a
base-instance.
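
To make step c concrete, here is a minimal Python sketch of how a server
might pick a base instance from the entity tags a client offers.  The
names, the (timestamp, bytes) store, and the "newest shared instance is
probably the most similar" rule are assumptions for illustration, not
anything the draft specifies.

```python
def parse_if_none_match(value: str) -> list:
    """Split a comma-separated list of quoted entity tags.
    Simplified: ignores weak tags and "*"."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def choose_delta_base(if_none_match: str, stored: dict):
    """stored maps entity tag -> (timestamp, instance bytes) for the
    instances the server still holds.  Returns the newest tag that the
    client also has, or None when no delta is possible."""
    candidates = [t for t in parse_if_none_match(if_none_match)
                  if t in stored]
    if not candidates:
        return None
    return max(candidates, key=lambda t: stored[t][0])


# Hours of the day stand in for timestamps.
stored = {
    '"def"': (13, b"1PM instance"),
    '"ghi"': (20, b"8PM instance"),
}
base_tag = choose_delta_base('"def", "ghi"', stored)
```

This only works if "ghi" names the 8PM instance itself; if it named the
difference file, the server would have nothing useful stored under it.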

>I'm not always in agreement with Koen, but this time I think he may be
>right.
>Consider this scenario:
>	(1) Content author creates foo.html
>	(2) some software does "gzip -c foo.html >foo.html.gz"
>Should foo.html and foo.html.gz have the same entity tag?
>On the one hand, one could argue that these two files represent identical
>content, but one of them is encoded differently.

I like that notion ... but it does require some tortured parsing of what
an "entity" is (versus an "entity body" and "entity contents")

>On the other hand, we have three practical issues that suggest that these
>two files should not have the same entity tag:
>(A) RFC2616 section 3.11 (Entity Tags) says:
>   A "strong entity tag" MAY be shared by two entities of a
>   resource only if they are equivalent by octet equality.
>Unfortunately, since RFC2616 uses an ambiguous definition
>for "entity", it's a little hard to pin down what this means. 
>Strictly speaking, this might not even allow two different ranges of the same
>instance to share an entity tag, but that seems preposterous (and Koen
>seems to agree).  However, it does argue against assigning the same
>entity tag to foo.html and foo.html.gz

So one could justify tortured parsing... and even cite precedent.  But I
agree, it's not a pleasing argument

>(B) As a practical matter, I believe that most (all?) existing servers
>would not recognize that foo.html and foo.html.gz are different encodings
>of the same content (for one thing, it might be computationally expensive
>to verify this), and so it would be difficult to get these servers to
>assign the same entity tag.

That may not be a disaster -- there's nothing saying that consecutive
responses for the same resource must have the same etag; they strongly
SHOULD, but it's not illegal if they don't.  So if the default behavior of
a server is to assign an etag based on file name (and date/size/whatever),
then these would get different etags. Admittedly, this does limit how
frequently delta encoding will succeed, but I don't see other major
problems.

>(C) Although we expect the ultimate client (e.g., browser) that receives
>a message to be able to decode a content-coding, we can't in general have
>the same expectation for intermediate
>proxies - they might not be able to decode all content-codings. And so it
>would be confusing if a server first sends foo.html with entity tag
>"XYZZY", and then later sends a range of the same file, with the same
>entity tag, but with a different content-coding having been applied.  I
>believe that this is not actually an error situation - the proxy could
>revert to being a non-caching tunnel in this case - but it shows how
>complex things get if we allow entity tags to be assigned before
>content-codings.

That sounds like a trump argument -- overburdening may break a possibly
fragile system of proxies.

>So it looks like we have a contradiction: the entity tag must be assigned
>before a delta content-coding, but after content-codings in general. 
>Ouch.

Yeah. More complications.

>There are three ways to resolve this contradiction: by kludging (e.g.,
>making delta encoding a special kind of content-coding that is applied
>after the entity tag is assigned - yuck!),

As an implementor, let me second that. Especially considering all the
emphasis put on "you must encode in the same order as the  accept-encoding
lists".

> by banning the combination of
>delta content-coding with any other content-codings (this is probably not
>a useful approach,

More than not useful, but potentially lethal -- I suspect that the average
response would benefit more from GZIP than from GDIFF (as an example).
Having both is crucial.

BTW: the emphasis on vcdiff is frustrating for me -- unless the situation
has changed, I can find no samples of a vcdiff encoder. At least GDIFF was
easy to understand and fairly easy to implement (given that I had
implemented Rsync independently)

>by realizing that it was a mistake to treat delta encoding as a form of
>content-encoding, after all.

The conflict between squeezing more stuff into a box, versus expanding it.

>The kludge approach might require only minor tweaks to the
>document, but I think it would lead to a big mess.

It might not be all that hard to do quick & dirty, but it's got a real
spaghetti code flavor to it.

>The last approach seems cleanest, but would require a number of changes,
>including but certainly not limited to these:

>(1) Section 4 (Relationship between content-coding, transfer-coding, and
>ranges) needs to be changed to make it clear that the instance is the
>result of a possible content coding, not an input to it.

Isn't the instance an "input" to content coding,
whereas the entity is an "output" from content coding?

That is, from section 4 the sequence is:
  a) use request string to match a resource
  b) use request headers, etc. to select a variant of the resource
After step b, we have an "instance" -- and it would be useful to assign it
an "etag" (though precedent suggests that this is not practical)
  c) Content-encoding (including delta encoding) is done
We now have an instance, that can be subject to
  d) Range extraction.
  e) Transfer encoding
  f) etc.
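
The difference between tagging before and after step c can be seen in a
small Python sketch.  The tag() scheme here (a truncated SHA-1 of the
bytes) is purely hypothetical; real servers derive entity tags in many
other ways.

```python
import gzip
import hashlib


def tag(data: bytes) -> str:
    """Hypothetical entity-tag scheme: a truncated SHA-1 of the bytes."""
    return '"' + hashlib.sha1(data).hexdigest()[:12] + '"'


instance = b"<html><body>chapter text</body></html>\n" * 20

# Step c: content-coding (mtime=0 keeps the gzip output deterministic).
encoded = gzip.compress(instance, mtime=0)

# Step d: range extraction over the bytes that will actually be sent.
first_range = encoded[0:16]

tag_before_coding = tag(instance)   # same for identity and gzip responses
tag_after_coding = tag(encoded)     # differs from the identity response

# Either way the tag is fixed before step d, so every range shares it.
# Decoding recovers the instance that the "before" tag names.
assert gzip.decompress(encoded) == instance
```

With the "before" rule, an identity response and a gzip response carry
the same tag; with the "after" rule they carry different ones, which is
the behavior Koen reads into RFC2616.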


>(2) Applying similar changes to the I-D on instance digests
>(we need to be consistent about assigning entity tags and
>instance digests at the same point!)

I haven't read up on instance digests yet... better do that.

>(3) Creating a new message header (e.g., "DE") so that we would send:
>      HTTP/1.1 226 Delta
>      ETag: "1acl059"
>      DE: vcdiff
>      Delta-base: "337pey"
>      Date: Tue, 25 Nov 1997 18:30:05 GMT
>and a new non-terminal, e.g., delta-encoding, and changing the BNF so
>that "vcdiff", etc. are examples of delta-encoding, not content-coding.

I don't think that's sufficient -- following Koen's rules, the etag from
above would be assigned to the "vcdiff'ed" output, not to the current
instance (i.e.; to whatever vcdiff compared 337pey to). There needs to be
some way to tell the client "here's an identifier for the current
instance". For example, one could also add:

   Itag: "447pey"

Or perhaps modify the above:

    DE: vcdiff="447pey"

This would mean "a vcdiff delta encoding was applied to an un-encoded
instance which has an etag of "447pey"".

Then, in a future request for the same resource (or for a resource in the
same uniqueness scope) the client will know that "447pey" is a good
candidate to include in an If-None-Match.  Of course, the client will also
have to store the de-vcdiff'ed response as "447pey".

Actually, perhaps one could have :
  content-encoding: diff-e;"447pey",gzip     

Which might save a few bytes.  But I think I like the DE: idea better, it
makes it clear that the client should:
  a) use the entity body, and the delta-base, in a "de-vcdiff" step
  b) cache the results of this de-vcdiff, using an entity tag of "447pey"
  c) upon re-re-request, include "447pey" in an If-None-Match
  d) 1acl059 refers to the "difference file" (contained in the "entity
body"). It's a bit funny -- announcing the availability of an entity tag
for something that never left the server -- that clients/proxies/et-al
have to reconstruct.  But that's no worse than reconstructing an entity
from various parts.

One question: how could 1acl059 be used in the future?  It's possible that
the client could ask for two deltas --- one delta of current instance
against a commonly held base instance, and a second delta of this
first-delta against 1acl059.   But that strikes me as overkill.   Hence,
1acl059 is not very useful, but it does preserve the prevailing notion of
what an etag is supposed to be.

>(4) Various changes to related text, examples, etc.

>Does this make sense to the rest of you? 
Basically, though discovering all the little sub clauses where confusion
may lurk should be fun.

> I guess I have some work cut out.  Fortunately (?) the IETF won't be
>publishing any new I-Ds for about two weeks.

I'll help where  I can.

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


------- End of Forwarded Message


From mogul  Wed Mar 22 10:48:42 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA10056; Wed, 22 Mar 2000 10:48:42 -0800 (PST)
Message-Id: <200003221848.KAA10056@wera.pa.dec.com>
To: http-delta
From: <danielh@crosslink.net>
Reply-To: danielh@crosslink.net
Original-Date: Wed, 22 Mar 2000 00:52:46 -0500
In-Reply-To: <200003212356.PAA03351@wera.pa.dec.com>
Subject: Re: A possible problem: when are etags assigned
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 
Date: Wed, 22 Mar 2000 10:48:42 -0800
Sender: mogul
X-Mts: smtp

This is an important paragraph to revise or emphasize (in delta-03 and
digest-02)

   It is convenient to think of an entity tag, in HTTP/1.1, as being
   associated with an instance, rather than an entity.  That is, for a
   given resource, two different response messages might include the
   same entity tag, but two different instances of the resource should
   never be associated with the same (strong) entity tag.

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------

From mogul  Wed Mar 22 10:49:22 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA12274; Wed, 22 Mar 2000 10:49:22 -0800 (PST)
Message-Id: <200003221849.KAA12274@wera.pa.dec.com>
To: http-delta
From: <danielh@crosslink.net>
Reply-To: danielh@crosslink.net
Original-Date: Wed, 22 Mar 2000 00:56:02 -0500
In-Reply-To: <200003212356.PAA03351@wera.pa.dec.com>
Subject: Re: A possible problem: when are etags assigned
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 
Date: Wed, 22 Mar 2000 10:49:22 -0800
Sender: mogul
X-Mts: smtp

Also, from digest-02

     Note: the digest is computed before the application of any
      content-coding, because if a delta-content-coding [8] is used,
      the computation of the digest after the computation of the
      delta would not provide a digest useful for checking the
      integrity of the reassembled instance.

You might want to add:
      content-coding or any range extraction, because if a
      delta-content-coding [8] is used,


-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------

From mogul  Wed Mar 22 10:49:56 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA12870; Wed, 22 Mar 2000 10:49:56 -0800 (PST)
Message-Id: <200003221849.KAA12870@wera.pa.dec.com>
To: http-delta
From: <danielh@crosslink.net>
Reply-To: danielh@crosslink.net
Original-Date: Wed, 22 Mar 2000 00:59:39 -0500
In-Reply-To: <200003212356.PAA03351@wera.pa.dec.com>
Subject: Re: A possible problem: when are etags assigned
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 
Date: Wed, 22 Mar 2000 10:49:56 -0800
Sender: mogul
X-Mts: smtp

Content-md5 is defined against the post content encoded, but pre-transfer
encoded entity. Is it defined before or after range extraction?

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------

From mogul@pa.dec.com  Wed Mar 22 10:58:10 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA11701; Wed, 22 Mar 2000 10:58:10 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA04052; Wed, 22 Mar 2000 10:58:10 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA12112; Wed, 22 Mar 2000 10:58:10 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003221858.KAA12112@wera.pa.dec.com>
To: danielh@crosslink.net
Cc: http-delta@pa.dec.com
Subject: Re: A possible problem: when are etags assigned 
In-Reply-To: Your message of "Wed, 22 Mar 2000 10:49:56 PST."
             <200003221849.KAA12870@wera.pa.dec.com> 
Date: Wed, 22 Mar 2000 10:58:10 -0800
X-Mts: smtp

    Content-md5 is defined against the post content encoded, but
    pre-transfer encoded entity. Is it defined before or after
    range extraction?

Darned if I know.  I've repeatedly tried to make the point that
the term "entity", as defined in RFC2616, is misleading and
possibly ambiguous.  [I made the argument during the drafting
of the spec, but I was voted down.]

I believe that because the spec says that Content-MD5:
   is an MD5 digest of the entity-body for the purpose of providing an
   end-to-end message integrity check (MIC) of the entity-body. 

and because "entity-body" is used in the BNF as follows:

       message-body = entity-body
                    | <entity-body encoded as per Transfer-Encoding>

that it is strictly after range extraction (since range extraction
is end-to-end and so clearly isn't a transfer-coding).  But this
is a tenuous inference, and I'm sure people will implement it
both ways.
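
The practical consequence is easy to demonstrate.  In the Python sketch
below (an illustration only; the helper name and sample data are made up),
a digest computed over the entity-body of a partial response cannot match
one computed over the whole instance, so a recipient has to know which of
the two the sender meant.

```python
import base64
import hashlib


def content_md5(entity_body: bytes) -> str:
    """Base64 of the 128-bit MD5 digest, the Content-MD5 value format."""
    digest = hashlib.md5(entity_body).digest()
    return base64.b64encode(digest).decode("ascii")


instance = b"0123456789" * 10
partial = instance[0:50]          # body of a 206 response for bytes=0-49

md5_of_instance = content_md5(instance)
md5_of_range = content_md5(partial)

# The two values differ, so "before" and "after" range extraction are
# not interchangeable readings of the header.
assert md5_of_instance != md5_of_range
```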

This makes Content-MD5 effectively useless for ensuring integrity
in the presence of ranges and delta encodings, which is why we
wrote the Digest I-D - to more carefully define a header.

I have to admit that I didn't understand this failing of Content-MD5
during the drafting of RFC2616, or else I would certainly have
used it as ammunition in my fight against the term "entity".  But
I figured it out too late.

-Jeff

From danielh@crosslink.net  Wed Mar 22 11:44:06 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA14504; Wed, 22 Mar 2000 11:44:06 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA06206; Wed, 22 Mar 2000 11:44:06 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id LAA25756
	for <http-delta@pa.dec.com>; Wed, 22 Mar 2000 11:44:05 -0800 (PST)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id OAA06558 for <http-delta@pa.dec.com>; Wed, 22 Mar 2000 14:44:04 -0500
Message-Id: <200003221944.OAA06558@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Wed, 22 Mar 2000 14:43:30 -0500
To: http-delta@pa.dec.com
Subject: weak etags?
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

I'm wondering if weak-etags may offer a possible solution to the 
"etag before or after delta coding" conundrum. 

I suspect not, but I don't have a deep understanding of  weak etags.


 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Wed Mar 22 11:52:18 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA25588; Wed, 22 Mar 2000 11:52:18 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA08834; Wed, 22 Mar 2000 11:52:18 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA15524; Wed, 22 Mar 2000 11:52:17 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003221952.LAA15524@wera.pa.dec.com>
To: danielh@crosslink.net
Cc: http-delta@pa.dec.com
Subject: Vcdiff
In-Reply-To: Your message of "Wed, 22 Mar 2000 10:48:01 PST."
             <200003221848.KAA10135@wera.pa.dec.com> 
Date: Wed, 22 Mar 2000 11:52:17 -0800
X-Mts: smtp

Daniel writes:
    BTW: the emphasis on vcdiff is frustrating for me -- unless the
    situation has changed, I can find no samples of a vcdiff encoder.
    At least GDIFF was easy to understand and fairly easy to implement
    (given that I had implemented Rsync independently)

We may have to re-evaluate that as the specification moves along
the IETF standards track.  A lot of us are frustrated about the
lack of available code, but since the vcdiff spec is going to
follow the IETF standards track, it will have to meet the usual
requirement for two independently-developed interoperable
implementations, and that (I hope) will include at least one
open-source version.

If this hasn't happened by the time that the delta spec reaches
Draft Standard status, we'll almost certainly have to remove any
dependency on vcdiff.  For now, we can view this as a lever to
"encourage" an open source implementation of vcdiff, which is
about the best known coding format on purely technical grounds.

-Jeff

From mogul@pa.dec.com  Wed Mar 22 12:06:26 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id MAA15567; Wed, 22 Mar 2000 12:06:26 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA22621; Wed, 22 Mar 2000 12:06:25 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id MAA15788; Wed, 22 Mar 2000 12:06:25 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003222006.MAA15788@wera.pa.dec.com>
To: danielh@crosslink.net
Cc: http-delta@pa.dec.com
Subject: Re: A possible problem: when are etags assigned 
In-Reply-To: Your message of "Wed, 22 Mar 2000 10:48:01 PST."
             <200003221848.KAA10135@wera.pa.dec.com> 
Date: Wed, 22 Mar 2000 12:06:25 -0800
X-Mts: smtp

    Alas, I've been laboring under some etag misconceptions.
    For example, I completely missed the part about etags being assigned
    before range extraction; which now makes sense to me
    (i.e.; it allows for range extraction of an index which can then be used
    to retrieve selected chapters; in the acrobat-selectively-
    reading-a-large-pdf sense).

Even for byte-range retrievals, the ordering (entity tag stays
constant for various different range retrievals) was obvious
almost from the start - hence the If-Range header in HTTP/1.1.
Sorry if that wasn't clear :-)
    
    The realization about content-encoding
    and etags only came when I read draft 3 and pondered the
    significance of ...geez, I can't remember just what sub-clause got me
    wondering.    Whatever, it's fortuitous that the
    latest draft happened to come around just about the time I
    was tinkering with the delta module (for reasons that had
    nothing to do with etags!)
    
I should note that this part hasn't changed since draft-00,
as far as I can remember.

    >It's quite clear that the entity tag must be assigned
    >*before* a delta content coding.  Otherwise, the entity
    >tag would be useless in deciding how to combine a delta
    >with a previous instance.
    
    Perhaps not useless, but crippled. Allow me to belabor the point,
    just to make sure...
    
    Consider the case:
      a) at 1PM, the client requests foo.html, receives a response with an
         etag of "def"
      b) at 8PM, the client re-requests foo.html, with If-None-Match: "def"
         and Accept-Encoding: Gdiff
         He receives a delta-content-encoded response, with an etag of "ghi"
    If "ghi" refers to the "pre-encoded instance" from step b, then
      c) at 9PM, the client can re-re-request foo.html, with
         If-None-Match: "def","ghi"
         The server then can use "ghi" (the instance used in step b),
         which is probably more similar to the current (9PM) instance.
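
To make step (c) concrete, here is a minimal sketch in Python of how a
server might pick a base instance from the entity tags a client lists in
its conditional request.  The `Instance` record and the `choose_base`
helper are hypothetical names used only for illustration; nothing in the
draft prescribes this selection policy.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    etag: str      # entity tag naming the whole instance
    body: bytes    # the instance itself (not a delta, not a range)
    created: int   # hour of day, used here only to order instances


def choose_base(held, client_tags):
    """Return the newest held instance whose etag the client listed, or None."""
    candidates = [held[tag] for tag in client_tags if tag in held]
    if not candidates:
        return None  # no shared base: the server has to send a full response
    return max(candidates, key=lambda inst: inst.created)


# Hypothetical server state at 9PM: it still holds the 1PM and 8PM instances.
held = {
    "def": Instance("def", b"<p>news at 1PM</p>", 13),
    "ghi": Instance("ghi", b"<p>news at 8PM</p>", 20),
}

base = choose_base(held, ["def", "ghi"])
print(base.etag)
```

This only works if "ghi" names the 8PM instance itself; if it named the
difference file, the server would have nothing useful to look up.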

Right - but what if another client requests, at 8pm, foo.html
without delta-encoding, and gets the same instance as (b).  Does
that client receive "Etag: ghi" or some other entity tag?  And if
it is another entity tag, then we can have the perverse situation
where an intermediate cache is storing the same instance under
two entity tags.
	
    However, if "ghi" refers to the actual "entity body" (the difference file
    returned at 8PM), then  "ghi" is almost certainly useless as a
    base-instance

Please banish any thoughts about entity tags being associated with
"actual entity bodies"!  The word "entity" is going to give us all
headaches.
    
    >I'm not always in agreement with Koen, but this time I think he may be
    >right.
    >Consider this scenario:
    >	(1) Content author creates foo.html
    >	(2) some software does "gzip -c foo.html >foo.html.gz"
    >Should foo.html and foo.html.gz have the same entity tag?
    >On the one hand, one could argue that these two files represent identical
    >content, but one of them is encoded differently.
    
    I like that notion ... but it does require some tortured parsing of what
    an "entity" is (versus an "entity body" and "entity contents").
    
Note that I strenuously avoid using the term entity, precisely because
of this confusion.

    >(B) As a practical matter, I believe that most (all?) existing servers
    >would not recognize that foo.html and foo.html.gz are different encodings
    >of the same content (for one thing, it might be computationally expensive
    >to verify this), and so it would be difficult to get these servers to
    >assign the same entity tag.
    
    That may not be a disaster -- there's nothing saying that consecutive
    responses for the same resource must have the same etag; they strongly
    SHOULD, but it's not illegal if they don't.  So if the default behavior of
    a server is to assign an etag based on file name (and date/size/whatever),
    then these would get different etags. Admittedly, this does limit how
    frequently delta encoding will succeed, but I don't see other major
    problems.
    
But delta encoding is only worth doing if it is likely to succeed
often enough to amortize the protocol overheads (and implementation
overheads).  Note that our SIGCOMM paper suggests that aggregating
references from lots of clients is an important way to improve
the performance of delta encoding, so losing this aggregation by
assigning multiple entity tags to the same instances is probably
a mistake.

    >The last approach seems cleanest, but would require a number of changes,
    >including but certainly not limited to these:
    
    >(1) Section 4 (Relationship between content-coding, transfer-coding, and
    >ranges) needs to be changed to make it clear that the instance is the
    >result of a possible content coding, not an input to it.
    
    Isn't the instance an "input" to content coding,
    whereas the entity is an "output" from content coding?
    
I think the problem is that the original HTTP/1.1 layering is:

	variant
		apply content-coding
	[unnamed thing]
		apply range selection
	entity
		apply transfer-coding
	message-body

We want to insert delta encoding in two places; one is as a
hop-by-hop transfer-coding, which still seems to be working.
The other is as an end-to-end coding, which would fit in like
this:

	variant
		apply content-coding
	[unnamed thing 1]
		apply delta encoding
	[unnamed thing 2]
		apply range selection
	entity
		apply transfer-coding
	message-body

We tried to cram end-to-end delta encoding into the content-coding
bucket, which meant that I got fuzzy about using the term "instance"
to describe "unnamed thing 1" and/or "unnamed thing 2".

One possible way to resolve this might be to add a general
"instance manipulation" layer, i.e.,

	variant
		apply content-coding
	instance
		apply instance manipulation:
			(delta encoding, range selection, etc.)
	entity
		apply transfer-coding
	message-body

And then I think it becomes clear that the "instance" is really
the output of the content-coding, contrary to what I wrote in
the I-D.  (But I tried to make it the input to the content-coding
because it has to be the input to the delta encoding, and we thought
that we could make end-to-end delta encoding a content-coding.)
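
A rough sketch of the revised layering, in Python, may help.  It assumes
zlib as a stand-in for the content-coding and uses range selection as the
only instance manipulation; a delta encoder would take the same `instance`
as its input.  All function names here are illustrative, not taken from
any specification.

```python
import zlib


def content_code(variant: bytes) -> bytes:
    """Content-coding step; zlib stands in for gzip or deflate here."""
    return zlib.compress(variant)


def select_range(instance: bytes, first: int, last: int) -> bytes:
    """Instance manipulation: an inclusive byte range of the instance."""
    return instance[first:last + 1]


def chunked(entity: bytes, size: int = 16) -> bytes:
    """Transfer-coding step: HTTP/1.1 chunked framing."""
    out = b""
    for i in range(0, len(entity), size):
        chunk = entity[i:i + size]
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    return out + b"0\r\n\r\n"


variant = b"<html>" + b"hello world " * 20 + b"</html>"
instance = content_code(variant)              # output of the content-coding
entity_a = select_range(instance, 0, 9)       # one instance manipulation
entity_b = select_range(instance, 10, len(instance) - 1)
message_body = chunked(entity_a)              # hop-by-hop framing

# Two different entities, both cut from the same instance.
print(entity_a + entity_b == instance)
print(zlib.decompress(instance) == variant)
```

The point of the sketch is that the entity tag would name `instance`, so
it stays the same whichever manipulation produced the entity on the wire.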

Then the question arises: is it useful (or even correct) to
describe both end-to-end delta encoding and range selection
as part of the same "instance manipulation" layer?  Or is
this excess generality?  Also, we have to deal with the fact
that the headers for range support are already defined, so
it's not as if we could now glom these two things together into
a common HTTP header mechanism.  However, it's possible that
some other instance manipulations might be proposed later
(there's a paper on "cache-based compaction" which might
possibly fit in here; see 
    M. C. Chan and T. Woo.  Cache-based Compaction: A New Technique for
    Optimizing Web Transfer.  In Proc. IEEE Infocom '99, pages
    117-125.  New York, NY, March, 1999.
for more details.)

    >(3) Creating a new message header (e.g., "DE") so that we would send:
    >      HTTP/1.1 226 Delta
    >      ETag: "1acl059"
    >      DE: vcdiff
    >      Delta-base: "337pey"
    >      Date: Tue, 25 Nov 1997 18:30:05 GMT
    >and a new non-terminal, e.g., delta-encoding, and changing the BNF so
    >that "vcdiff", etc. are examples of delta-encoding, not content-coding.
    
    I don't think that's sufficient -- following Koen's rules, the etag from
    above would be assigned to the "vcdiff'ed" output, not to the current
    instance (i.e., to whatever vcdiff compared 337pey to). There needs to be
    some way to tell the client "here's an identifier for the current
    instance".

To repeat my point: the word "entity" causes endless confusion.
An "entity tag" REALLY IS "an identifier for the current instance."
We just got the terms wrong in RFC2616.

And Koen's rule doesn't apply if we declare that delta encoding
is NOT a content-coding, and so uses the instance as an input.
In which case I think the details are pretty straightforward.

-Jeff

From mogul@pa.dec.com  Wed Mar 22 12:11:45 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id MAA15925; Wed, 22 Mar 2000 12:11:45 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA22073; Wed, 22 Mar 2000 12:11:45 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id MAA15691; Wed, 22 Mar 2000 12:11:44 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003222011.MAA15691@wera.pa.dec.com>
To: <danielh@crosslink.net>
Cc: http-delta@pa.dec.com
Subject: Re: weak etags? 
In-Reply-To: Your message of "Wed, 22 Mar 2000 14:43:30 EST."
             <200003221944.OAA06558@lycanthrope.crosslink.net> 
Date: Wed, 22 Mar 2000 12:11:44 -0800
X-Mts: smtp

    I'm wondering if weak-etags may offer a possible solution to the 
    "etag before or after delta coding" conundrum. 
    
    I suspect not, but I don't have a deep understanding of  weak etags.
    
"Not" is correct.  Weak etags aren't much use for delta-encoding,
because you can assign the same weak etag to two different
octet-strings.  (E.g., if they are the same HTML file except for
an advertising banner URL.)

This makes delta DEcoding impossible, because if you can't be
sure about whether you have exactly the right input strings
(base instance and delta), you might generate garbage when you
combine them in the decoding phase.
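
As an illustration, here is a small Python sketch with a toy
prefix/suffix delta (a deliberately simple stand-in, not gdiff or
vcdiff).  The weak tag value and the two page bodies are made up; the
only assumption is that both bodies were assigned the same weak etag
because they differ just in the banner URL.

```python
def make_delta(base: bytes, new: bytes):
    """Toy delta: keep a common prefix and suffix of base, replace the middle."""
    limit = min(len(base), len(new))
    prefix = 0
    while prefix < limit and base[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and base[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(base: bytes, delta) -> bytes:
    """Rebuild the new instance from a base and a toy delta."""
    prefix, suffix, middle = delta
    return base[:prefix] + middle + base[len(base) - suffix:]


weak_etag = 'W/"v7"'  # hypothetical: both bases below share this weak tag
server_base = b'<a href="/ads/1">ad</a><p>old news</p>'
client_base = b'<a href="/ads/2">ad</a><p>old news</p>'
new = b'<a href="/ads/1">ad</a><p>fresh news</p>'

delta = make_delta(server_base, new)
print(apply_delta(server_base, delta) == new)  # right base: exact result
print(apply_delta(client_base, delta) == new)  # wrong base: not the instance
```

With a strong entity tag the two bases could not share a tag, so the
client would never apply the delta to the wrong one.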

-Jeff


From danielh@crosslink.net  Wed Mar 22 12:34:30 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id MAA07291; Wed, 22 Mar 2000 12:34:30 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA03036; Wed, 22 Mar 2000 12:34:29 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id MAA18579
	for <http-delta@pa.dec.com>; Wed, 22 Mar 2000 12:34:28 -0800 (PST)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id PAA24947 for <http-delta@pa.dec.com>; Wed, 22 Mar 2000 15:34:23 -0500
Message-Id: <200003222034.PAA24947@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Wed, 22 Mar 2000 15:32:15 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200003222011.MAA15691@wera.pa.dec.com>
Subject: Re: weak etags?
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

>    I'm wondering if weak-etags may offer a possible solution to the 
>    "etag before or after delta coding" conundrum. 
>    I suspect not, but I don't have a deep understanding of  weak etags.
>    
>"Not" is correct.  Weak etags aren't much use for delta-encoding, because
>you can assign the same weak etag to two different
>octet-strings.  (E.g., if they are the same HTML file except for an
>advertising banner URL.)

Which is not the same as differences that are due
to an application of content-encoding.
So forget weak etags.






 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Wed Mar 22 13:03:35 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA17805; Wed, 22 Mar 2000 13:03:34 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA12807; Wed, 22 Mar 2000 13:03:34 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA18108; Wed, 22 Mar 2000 13:03:34 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003222103.NAA18108@wera.pa.dec.com>
To: danielh@crosslink.net
Cc: http-delta@pa.dec.com, mogul@pa.dec.com
Subject: Re: A possible problem: when are etags assigned 
In-Reply-To: Your message of "Wed, 22 Mar 2000 10:49:22 PST."
             <200003221849.KAA12274@wera.pa.dec.com> 
Date: Wed, 22 Mar 2000 13:03:34 -0800
X-Mts: smtp

Daniel writes:
    Also, from digest-02

     Note: the digest is computed before the application of any
      content-coding, because if a delta-content-coding [8] is used,
      the computation of the digest after the computation of the
      delta would not provide a digest useful for checking the
      integrity of the reassembled instance.

    You might want to add:
	  content-coding or any range extraction, because if a
    delta-content-coding [8] is used,

Actually, by clarifying things so that the "instance" is
the output of the content-coding (possibly the identity
coding), I can simplify the Digest spec by changing this
note to be something like:

     Note: the digest is computed after the application of any
      content-coding, but before the application of any
      end-to-end delta coding[8], or any range extraction.
      The computation of the digest after the computation of the delta
      or range would not provide a digest useful for checking the
      integrity of the reassembled instance.
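
A short Python sketch of what this ordering buys the recipient.  It
assumes zlib as the content-coding and a base64 SHA-1 as the digest; both
are choices made for the example, and `instance_digest` is a hypothetical
helper name.

```python
import base64
import hashlib
import zlib


def instance_digest(instance: bytes) -> str:
    """Base64 SHA-1 over the whole instance (hash choice is an assumption)."""
    return base64.b64encode(hashlib.sha1(instance).digest()).decode("ascii")


variant = b"<html>" + b"some page text " * 10 + b"</html>"
instance = zlib.compress(variant)      # after the content-coding
digest = instance_digest(instance)     # computed once, over the instance

# Two range responses carry different bytes but the same digest value.
half = len(instance) // 2
first, second = instance[:half], instance[half:]

reassembled = first + second
print(instance_digest(reassembled) == digest)  # checks the whole instance
print(instance_digest(first) == digest)        # a lone range does not match
```

The same check applies after delta decoding: the recipient digests the
reassembled instance, never the delta or the range it received.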

Thanks
-Jeff

From danielh@crosslink.net  Wed Mar 22 13:35:33 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA14891; Wed, 22 Mar 2000 13:35:33 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA13715; Wed, 22 Mar 2000 13:35:33 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA24458
	for <http-delta@pa.dec.com>; Wed, 22 Mar 2000 13:35:33 -0800 (PST)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id QAA14083 for <http-delta@pa.dec.com>; Wed, 22 Mar 2000 16:35:32 -0500
Message-Id: <200003222135.QAA14083@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Wed, 22 Mar 2000 16:28:42 -0500
To: http-delta@pa.dec.com
Subject: what is the instance?
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

>>    Also, from digest-02
>>     Note: the digest is computed before the application of any
>>      content-coding, because if a delta-content-coding [8] is used,
>>      the computation of the digest after the computation of the
>>      delta would not provide a digest useful for checking the
>>      integrity of the reassembled instance.
>>    You might want to add:
>>	  content-coding or any range extraction, because if a
>>             delta-content-coding [8] is used,

>Actually, by clarifying things so that the "instance" is
>the output of the content-coding (possibly the identity
>coding), I can simplify the Digest spec by changing this
>note to be something like:
>     Note: the digest is computed after the application of any
>      content-coding, but before the application of any
>      end-to-end delta coding[8], or any range extraction.
>      The computation of the digest after the computation of the delta
>      or range would not provide a digest useful for checking the
>      integrity of the reassembled instance.

But that's a major redefinition of "instance", as defined by:

   instance         The entity that would be returned in a status-200
                   response to a GET request, at the current time, for
                   the selected variant of the specified resource, but
                   without the application of any content-coding or
                   transfer-coding.

Under the proposed change, if gzip is used as a content coding, then the
instance is the "compressed" output.  And this compressed output is
basically useless when used as a base for future differences.  And doesn't
it make more sense for the "instance tag" to ignore such transient concerns
as what form of compression (or lack of compression) was applied to the
content of interest?

So I'd argue that "instance" should retain its meaning, and some other
term (say, "encoded-instance") be used for the results of content-coding.


-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From danielh@crosslink.net  Wed Mar 22 13:38:14 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA04945; Wed, 22 Mar 2000 13:38:14 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA22622; Wed, 22 Mar 2000 13:38:13 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA00790
	for <http-delta@pa.dec.com>; Wed, 22 Mar 2000 13:38:13 -0800 (PST)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id QAA15013 for <http-delta@pa.dec.com>; Wed, 22 Mar 2000 16:38:11 -0500
Message-Id: <200003222138.QAA15013@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Wed, 22 Mar 2000 16:37:28 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200003222006.MAA15788@wera.pa.dec.com>
Subject: Re: A possible problem: when are etags assigned
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

>    For example, I completely missed the part about etags being assigned
>    before range extraction; which now makes sense to me.....
>Even for byte-range retrievals, the ordering (entity tag stays constant
>for various different range retrievals) was obvious almost from the start
>- hence the If-Range header in HTTP/1.1. Sorry if that wasn't clear :-)

That's why I like to see RFCs err on the side of clarifying
the seemingly obvious (though this does lead to bulkier documents).

>>>It's quite clear that the entity tag must be assigned
>>>*before* a delta content coding.  Otherwise, the entity
>>>tag would be useless in deciding how to combine a delta
>> >with a previous instance.
>>    Perhaps not useless, but crippled. Allow me to belabor the point,
>>    just to make sure...
   
>> Consider the case:
>>      a) at 1PM, the client requests foo.html, receives a response with an
>>         etag of "def"
>>      b) at 8PM, the client re-requests foo.html, with If-None-Match: "def"
>>         and Accept-Encoding: Gdiff
>>         He receives a delta-content-encoded response, with an etag of "ghi"
>> If "ghi" refers to the "pre-encoded instance" from step b, then
>>      c) at 9PM, the client can re-re-request foo.html, with
>>         If-None-Match: "def","ghi"
>> The server then can use "ghi" (the instance used in step b),
>> which is probably more similar to the current (9PM) instance.

>Right - but what if another client requests, at 8pm, foo.html without
>delta-encoding, and gets the same instance as (b).  Does that client
>receive "Etag: ghi" or some other entity tag?  And if it is another
>entity tag, then we can have the perverse situation where an intermediate
>cache is storing the same instance under two entity tags.

As currently structured, the non-encoded response (to client 2) should get
etag: "ghi".  An intermediate cache will then store the response to the
first client (the difference file) as "333pey", and the response to the
second client (the unencoded stuff) as "ghi".  Note that a smart
intermediate cache that happened to have a copy of "abc" should be able
to use "333pey" to generate a copy of "ghi".

>    However, if "ghi" refers to the actual "entity body" (the difference
>   file  returned at 8PM), then  "ghi" is almost certainly useless as a
>    base-instance
>Please banish any thoughts about entity tags being associated with
>"actual entity bodies"!  The word "entity" is going to give us all
>headaches.

Sounds good to me, but it's going to be a constant headache explaining it
to others -- it's such an obvious conclusion to jump to!
    
>>>(B) As a practical matter, I believe that most (all?) existing servers
>>>would not recognize that foo.html and foo.html.gz are different ...
>    
>>    That may not be a disaster -- there's nothing saying that consecutive
>>    responses for the same resource must have the same etag; they
>>   strongly  SHOULD, but it's not illegal if they don't.  So if the default  ...
>But delta encoding is only worth doing if it is likely to succeed often
>enough to amortize the protocol overheads (and implementation overheads). 
>Note that our SIGCOMM paper suggests that aggregating references from
>lots of clients is an important way to improve the performance of delta
>encoding, so losing this aggregation by assigning multiple entity tags to
>the same instances is probably a mistake.

I agree -- I was just throwing out a possible palliative, one that I'm happy
to see shot down.
   
>>    Isn't the instance an "input" to content coding,
>>    whereas the entity is an "output" from content coding?
>    
>I think the problem is that the original HTTP/1.1 layering is:
>	variant
>		apply content-coding
>	[unnamed thing]
Also, at this point:
		assign etag
>		apply range selection
>	entity
>		apply transfer-coding
>	message-body


>We want to insert delta encoding in two places; one is as a
>hop-by-hop transfer-coding, which still seems to be working. The other is
>as an end-to-end coding, which would fit in like this:

>	variant
>		apply content-coding
>	[unnamed thing 1]
>		apply delta encoding
>	[unnamed thing 2]
>		apply range selection
>	entity
>		apply transfer-coding
>	message-body
>We tried to cram end-to-end delta encoding into the content-coding
>bucket, which meant that I got fuzzy about using the term "instance" to
>describe "unnamed thing 1" and/or "unnamed thing 2".

What about when the order is gdiff,gzip -- one first does a delta
encoding, and then a more traditional content encoding?  This order is
much more likely to be useful than gzip,gdiff (gzip on the "snapshot",
followed by gdiff of this compressed "file" against something held in
common by server and client).

>One possible way to resolve this might be to add a general
>"instance manipulation" layer, i.e.,

>	variant
>		apply content-coding
>	instance
>		apply instance manipulation:
>			(delta encoding, range selection, etc.)
>	entity
>		apply transfer-coding
>	message-body
>And then I think it becomes clear that the "instance" is really the
>output of the content-coding, contrary to what I wrote in the I-D.

This seems to contradict:

   instance         The entity that would be returned in a status-200
                   response to a GET request, at the current time, for
                   the selected variant of the specified resource, but
                   without the application of any content-coding or
                   transfer-coding.

That is, the "instance" is what exists BEFORE content encoding
of any kind.

Am I confused? If so, what name should be given to

       a *snapshot* in the life of a resource?

>I tried to make it the input to the content-coding because it has to be
>the input to the delta encoding, and we thought that we could make
>end-to-end delta encoding a content-coding.)

>>>(3) Creating a new message header (e.g., "DE") so that we would send:
>>>      HTTP/1.1 226 Delta
>>>      ETag: "1acl059"
>>>      DE: vcdiff
>>>      Delta-base: "337pey"
>>>      Date: Tue, 25 Nov 1997 18:30:05 GMT
>>>and a new non-terminal, e.g., delta-encoding, and changing the BNF so
>>>that "vcdiff", etc. are examples of delta-encoding, not content-coding.
>    
>>    I don't think that's sufficient -- following Koen's rules, the etag from
>>    above would be assigned to the "vcdiff'ed" output, not to the current
>>    instance (i.e.; to whatever vcdiff compared 337pey to). There needs to be
>>    some way to tell the client "here's an identifier for the current
>>    instance".
In the sense of: "I am retaining a full, un-encoded copy of the current instance,
and you can refer to it using the following identifier"

>To repeat my point: the word "entity" causes endless confusion. An
>"entity tag" REALLY IS an "an identifier for the current instance." We
>just got the terms wrong in RFC2616.

Again, how are we defining "current instance" -- is it before or after
old-fashioned (i.e., gzip) content encoding?  If after, then the "entity tag"
is an identifier of the current instance.  If before, then the "entity tag"
does NOT identify the current instance.

I'm agnostic on received terminology (having been a marginal contributor to
RFC2616).  But I do like the notion of the "current instance" as meaning
"pre-encoded" -- the body of a response that you would send if you had
instantaneous communication and slow processing (though how headers fit
into this puzzle requires some thought).

Which means that terms are needed for
a) what is produced by differencing the current instance against a commonly
   held instance
b) what is produced by content-encoding a "current instance", or by
   content-encoding the results of a differencing
c) what is produced by range extraction

>And Koen's rule doesn't apply if we declare that delta encoding is NOT a
>content-coding, and so uses the instance as an input. In which case I
>think the details are pretty straightforward.

Koen's rule:  "etag is assigned after content encoding, but before range extraction".

But where would delta occur -- before or after content coding?  Should that be
flexible (say, based on order of appearance in Accept-Encoding), or should it
be dictated to always occur before content coding (which would simplify
implementation)?

 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Wed Mar 22 15:11:04 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA23436; Wed, 22 Mar 2000 15:11:04 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA14258; Wed, 22 Mar 2000 15:11:04 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA23720; Wed, 22 Mar 2000 15:11:03 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003222311.PAA23720@wera.pa.dec.com>
To: <danielh@crosslink.net>
Cc: http-delta@pa.dec.com
Subject: Re: what is the instance? 
In-Reply-To: Your message of "Wed, 22 Mar 2000 16:28:42 EST."
             <200003222135.QAA14083@lycanthrope.crosslink.net> 
Date: Wed, 22 Mar 2000 15:11:03 -0800
X-Mts: smtp

I wrote:
>Actually, by clarifying things so that the "instance" is
>the output of the content-coding (possibly the identity
>coding), I can simplify the Digest spec by changing this
>note to be something like:
>     Note: the digest is computed after the application of any
>      content-coding, but before the application of any
>      end-to-end delta coding[8], or any range extraction.
>      The computation of the digest after the computation of the delta
>      or range would not provide a digest useful for checking the
>      integrity of the reassembled instance.

Daniel wrote:
   But that's a major redefinition of "instance", as defined by:

      instance     The entity that would be returned in a status-200
                   response to a GET request, at the current time, for
                   the selected variant of the specified resource, but
                   without the application of any content-coding or
                   transfer-coding.

Correct.  So I need to change that definition to

      instance     The entity that would be returned in a status-200
                   response to a GET request, at the current time, for
                   the selected variant of the specified resource,
		   with the application of zero or more content-codings,
		   but without the application of any end-to-end
		   delta-encoding, range selection, or transfer-coding.

Or perhaps replace "any end-to-end delta-encoding, range selection,"
with "any instance manipulation".

    Thus, if gzip is used as a content coding, then instance is the
    "compressed" output.  And this compressed output is basically
    useless when used as a base for future differences.

I'm not sure I understand this.  If you are taking the delta
between two versions (instances) of foo.html.gz, it should work.
What wouldn't work is if you wanted to cache only the uncompressed
representation - but content-coding is end-to-end, and so caches
aren't supposed to store the decoded version unless they can
transparently restore the coding (or store a second, encoded
copy as well).

    And doesn't it
    make more sense for the "instance tag" to ignore such transient
    concerns as what form of compression (or lack of compression) was
    applied to the content of interest?

It depends what you mean by "transient".  Transfer-codings are
definitely transient, but as a practical matter, many servers
implement Content-codings by storing the coded version, not the
plaintext.   Which makes these not so transient.

However, one could certainly argue that an origin server should
be able to generate either a compressed content-coding of a
resource (for a cache's first retrieval), or an end-to-end
delta encoded representation (for a second retrieval).  And
since (as far as I know) differencing algorithms such as vdelta
don't do as well if the inputs are compressed, what you might
really want is a form of "vcdiff" where both the encoder
and the decoder decompress the base instance before doing
the differencing or reconstitution.  But then we need a protocol
syntax to specify that the receiver needs to do this step.

The problem is made trickier because *in theory* there are
potentially two kinds of Content-codings: those that are
effectively compressions (and so one might want to remove
them before computing a delta), and those that don't
interfere with computing a delta, and so don't need to be
removed.  In practice, no content-coding of the latter
class has yet been defined.

    So I'd argue that "instance" should retain its meaning; and some
    other term (say, "encoded-instance") be used for the results of
    content-coding.

Well, this raises the question of what to associate the entity
tag with.  Is it the instance or the content-coded-instance?
I don't think it's feasible to modify the entity tag mechanisms
already in RFC2616!

I'll think about this some more.  Any suggestions from other
people?

-Jeff

From danielh@crosslink.net  Wed Mar 22 19:57:32 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id TAA01710; Wed, 22 Mar 2000 19:57:32 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA21671; Wed, 22 Mar 2000 19:57:32 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id TAA26973
	for <http-delta@pa.dec.com>; Wed, 22 Mar 2000 19:57:31 -0800 (PST)
Received: from smtp.crosslink.net (dyn37.c5200-1.springfield.236.crosslink.net [207.199.142.38]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id WAA28060 for <http-delta@pa.dec.com>; Wed, 22 Mar 2000 22:57:28 -0500
Message-Id: <200003230357.WAA28060@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Wed, 22 Mar 2000 22:54:03 -0500
To: http-delta@pa.dec.com
Subject: What is the instance?
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 


Jeff wrote
>  So I need to change that definition to
>      instance     The entity that would be returned in a status-200
>                   response to a GET request, at the current time, for
>                   the selected variant of the specified resource,
>		   with the application of zero or more content-codings,
>		   but without the application of any end-to-end
>		   delta-encoding, range selection, or transfer-coding.

This needs careful thought -- it's a big change from the prior meaning of
instance (as the content BEFORE any content coding).  More below on this
...


>>    Thus, if gzip is used as a content coding, then instance is the
>>    "compressed" output.  And this compressed output is basically
>>    useless when used as a base for future differences.
>I'm not sure I understand this.  If you are taking the delta between two
>versions (instances) of foo.html.gz, it should work. What wouldn't work
>is if you wanted to cache only the uncompressed representation 
>- but content-coding is end-to-end, and so caches aren't supposed to store the
>decoded version unless they can transparently restore the coding (or
>store a second, encoded copy as well).

The "useless" refers to the difficulty (as Jeff notes below) of getting a
useful delta between two gzipped (or otherwise compressed) versions of
nearly the same thing.  Using a common denominator (both client and
server cache the decoded versions) makes it easier to generate deltas
in the future.
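
To make the point concrete, here is a rough Python sketch. It is an
illustration only: make_delta/apply_delta are a toy prefix/suffix
differencer invented for this example (not vdelta or vcdiff), the page
content is made up, and zlib stands in for gzip so that no timestamp
header perturbs the comparison. It prints how many literal bytes the
delta must carry when the base is the decoded instance versus the
compressed one; on typical inputs one would expect the second number to
be close to the size of the whole compressed file.

```python
import zlib


def make_delta(base: bytes, new: bytes):
    """Toy delta: keep the shared prefix and suffix, send only the middle."""
    limit = min(len(base), len(new))
    prefix = 0
    while prefix < limit and base[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and base[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(base: bytes, delta) -> bytes:
    """Rebuild the new instance from the base plus the toy delta."""
    prefix, suffix, middle = delta
    return base[:prefix] + middle + base[len(base) - suffix:]


# Two nearly identical instances of a hypothetical foo.html.
inserted = "<p>One new paragraph.</p>\n"
paragraphs = [f"<p>Paragraph {i} of the page.</p>\n" for i in range(200)]
old_plain = "".join(paragraphs).encode()
paragraphs.insert(100, inserted)
new_plain = "".join(paragraphs).encode()

# The same two instances after a compressing content-coding.
old_coded = zlib.compress(old_plain)
new_coded = zlib.compress(new_plain)

plain_delta = make_delta(old_plain, new_plain)
coded_delta = make_delta(old_coded, new_coded)

print("literal bytes, delta of decoded instances:   ", len(plain_delta[2]))
print("literal bytes, delta of compressed instances:", len(coded_delta[2]))
print("size of the whole compressed new instance:   ", len(new_coded))
```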

Given this, some identifier is needed for these un-encoded versions;
something similar to etag.  The trick is for the server to send two tags:
a standard "etag" for the content contained in the current response (such
as a difference file, a gzipped file, a range of either of these, etc.),
and an "itag" (or "o-etag"?) that identifies the original (un-encoded and
un-differenced) content, a copy of which the server is presumably
committed to retaining for a while.

And I don't see why this "wouldn't work".
Yes, this does complicate matters for an intermediate cache --
it should retain the response as sent (that is, encoded),
identified by the "etag", and also a decoded version identified by the "itag".
This allows the cache to perform a delta. However, if the cache does not attempt to
save a decoded version, you are no worse off: since the "itag" does not match
the encoded version, the cache will pass through a request that contains an
If-None-Match: "itag_value" request header.

>However, one could certainly argue that an origin server should be able
>to generate either a compressed content-coding of a resource (for a
>cache's first retrieval), or an end-to-end delta encoded representation
>(for a second retrieval).  And since (as far as I know) differencing
>algorithms such as vdelta don't do as well if the inputs are compressed,
>what you might really want is a form of "vcdiff" where both the encoder
>and the decoder decompress the base instance before doing
>the differencing or reconstitution.  But then we need a protocol syntax
>to specify that the receiver needs to do this step.

>The problem is made trickier because *in theory* there are
>potentially two kinds of Content-codings: those that are
>effectively compressions (and so one might want to remove
>them before computing a delta), and those that don't
>interfere with computing a delta, and so don't need to be
>removed.  In practice, no content-coding of the latter
>class has yet been defined.

Hence my point about use of un-encoded versions as the "common denominator".


>    So I'ld argue that "instance" should retain it's meaning; and some
>    other term (say, "encoded-instance") be used for the results of
>    content-coding.

>Well, this begs the question of what to associate the entity tag with. 
>Is it the instance or the content-coded-instance? I don't think it's
>feasible to modify the entity tag mechanisms already in RFC2616!

It seems we're stuck with the etag being assigned after content-coding.
Hence my idea of an "itag", which is really an etag for something (I want
to say an entity, but I'm afraid to) that was never sent, but which
knowledgeable parties can readily create.

>I'll think about this some more.  Any suggestions from other people.

!

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------







From mogul@pa.dec.com  Thu Mar 23 11:32:36 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA29071; Thu, 23 Mar 2000 11:32:36 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA06686; Thu, 23 Mar 2000 11:32:36 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA31495; Thu, 23 Mar 2000 11:32:36 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003231932.LAA31495@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: What is the instance? 
In-Reply-To: Your message of "Wed, 22 Mar 2000 22:54:03 EST."
             <200003230357.WAA28060@lycanthrope.crosslink.net> 
Date: Thu, 23 Mar 2000 11:32:36 -0800
X-Mts: smtp

 <danielh@crosslink.net> writes:
    Jeff wrote
    >  So I need to change that definition to
    >      instance     The entity that would be returned in a status-200
    >                   response to a GET request, at the current time, for
    >                   the selected variant of the specified resource,
    >		   with the application of zero or more content-codings,
    >		   but without the application of any end-to-end
    >		   delta-encoding, range selection, or transfer-coding.
    
    This needs careful thought -- it's a big change from the prior
    meaning  of instance (as the content BEFORE any content coding).

Right - but I invented the definition of "instance" as part of
the design of the delta encoding specification.  I don't think
anyone else has based any designs on this definition.

So if the definition is wrong, then I think it's appropriate to
change it.

    Given this, some identifier is needed for these un-encoded versions; 
    something similar to etag.  The trick is for the server to send two tags,
    a standard "etag" for the  content contained in the current response (such
    as a difference file, a gzipped file,  a range of either of these, etc.),
    and an "itag" (or "o-etag"?) that identifies  the  original (un encoded &
    un  differenced) content, a copy of which the server is presumably
    committed to retaining for awhile.
    
I think adding another identifier is a major increase in complexity.
It seems to me that if we could avoid this step, without having to
do something ugly, it would be worth something.  I'm still trying to
think this through, though.

-Jeff


From danielh@crosslink.net  Thu Mar 23 12:24:25 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id MAA01169; Thu, 23 Mar 2000 12:24:24 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA32012; Thu, 23 Mar 2000 12:24:24 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id MAA17251
	for <http-delta@pa.dec.com>; Thu, 23 Mar 2000 12:24:23 -0800 (PST)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id PAA16902 for <http-delta@pa.dec.com>; Thu, 23 Mar 2000 15:24:18 -0500
Message-Id: <200003232024.PAA16902@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Thu, 23 Mar 2000 14:49:25 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200003231932.LAA31495@wera.pa.dec.com>
Subject: Instances again
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Jeff wrote:
>>>  So I need to change that definition to
>> >      instance     The entity that would be returned in a status-200
>> >                   response to a GET request, at the current time, for
>> >                   the selected variant of the specified resource,
>> >		   with the application of zero or more content-codings,
>> >		   but without the application of any end-to-end
>> >		   delta-encoding, range selection, or transfer-coding.

Daniel wrote:   
>>    This needs careful thought -- it's a big change from the prior
>>    meaning  of instance (as the content BEFORE any content coding).

>Right - but I invented the definition of "instance" as part of the design
>of the delta encoding specification.  I don't think anyone else has based
>any designs on this definition.
>So if the definition is wrong, then I think it's appropriate to change
>it.
Of course. However, I am arguing that the prior meaning should be retained.

Daniel wrote:
>>    Given this, some identifier is needed for these un-encoded versions; 
>>    something similar to etag.  The trick is for the server to send two
>>    tags,  a standard "etag" for the  content contained in the current response
>>    (such  as a difference file, a gzipped file,  a range of either of these, etc.),
>>    and an "itag" (or "o-etag"?) that identifies  the  original (unencoded &
>>    un  differenced) content, a copy of which the server is presumably
>>    committed to retaining for awhile.
>    
>I think adding another identifier is a major increase in complexity. It
>seems to me that if we could avoid this step, without having to do
>something ugly, it would be worth something.  I'm still trying to think
>this through, though.

I don't see it as being that big an increase in complexity.  And I don't see any
way around the need for some way for the client & server to identify a
"snapshot of the resource" that can be used for future deltas.
And there will be plenty of cases (i.e., when a content coding has
been applied) in which the etag just won't work.

Consider a scenario in which a server maintains a set of "pre-encoded"
versions -- e.g., foo.html and foo.html.gz are both available. Then, a request of

     GET /foo.html HTTP/1.1
     Accept-Encoding: vcdiff

would cause the server to send foo.html using

      HTTP/1.1 200 Okay
      Etag: "abc0"
      Date: Tue, 15 Mar 2000 18:30:05 GMT
      Content-Length: 2000

In contrast, a request of
     GET /foo.html HTTP/1.1
     Accept-Encoding: diff-e, gzip
   
would cause the server to return foo.html.gz:
      HTTP/1.1 200 Okay
      Etag: "abcG"
      O-etag: "abc0"
      Date: Tue, 15 Mar 2000 18:30:08 GMT
      Content-Length: 1251
      Content-Encoding: gzip

The only complication is that the server must assign two tags -- one
for foo.html.gz, and one for foo.html.  But that's a minor hassle, given that
the server has to know that foo.html.gz is the "gzip encoding" of foo.html.

Note that the client, upon receipt of this, should unGzip the response, and cache both
the response (identified with the "abcG" etag) and this unGzipped version (identified
with "abc0").

If, an hour later, the client wants a new version, he could use:

     GET /foo.html HTTP/1.1
     Accept-Encoding: vcdiff, gzip
     If-None-Match: "abc0"

Note the O-etag (or "itag", or whatever) header would only be added when there is evidence that
the client is delta aware (say, because vcdiff is included in Accept-Encoding). Alternatively,
special etags could be sent -- say "abcG;;abc0" -- the first semi-colon used to indicate
a variant-list validator, and the second to indicate the "un-content-encoded instance" tag.
Well, this might break some caches' abilities to do content negotiation, but perhaps there is
some other syntax that would work.
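
As a rough illustration of the client side of this two-tag idea, here
is a small Python sketch. Everything in it is hypothetical: the
TwoTagCache class and its method names are invented for this example,
and "O-etag" is only the placeholder header name used in the scenario
above, not an existing HTTP header.

```python
import gzip


class TwoTagCache:
    """Hypothetical client cache that keeps both forms of one response."""

    def __init__(self):
        self.by_tag = {}  # tag string (quotes included) -> stored bytes

    def store(self, headers: dict, body: bytes) -> None:
        # Keep the response exactly as sent, keyed by the standard etag.
        self.by_tag[headers["Etag"]] = body
        # If the server also named the un-encoded original, keep that too,
        # so that it can serve as the base for a later delta.
        o_etag = headers.get("O-etag")
        if o_etag is not None and headers.get("Content-Encoding") == "gzip":
            self.by_tag[o_etag] = gzip.decompress(body)

    def next_request_headers(self, o_etag: str) -> dict:
        # Only offer a delta base that this cache really holds.
        if o_etag in self.by_tag:
            return {"Accept-Encoding": "vcdiff, gzip", "If-None-Match": o_etag}
        return {"Accept-Encoding": "gzip"}


# Replay of the second response in the scenario above.
original = b"<html>version one</html>"
response_headers = {
    "Etag": '"abcG"',
    "O-etag": '"abc0"',
    "Content-Encoding": "gzip",
}
response_body = gzip.compress(original)

cache = TwoTagCache()
cache.store(response_headers, response_body)
print(cache.next_request_headers('"abc0"'))
```

A cache that never stores the decoded copy simply never offers the
O-etag value, which matches the "no worse off" argument made earlier in
the thread.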




 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From avh@marimba.com  Thu Mar 23 12:30:51 2000
Return-Path: <avh@marimba.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id MAA01518; Thu, 23 Mar 2000 12:30:50 -0800 (PST)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA14909; Thu, 23 Mar 2000 12:30:50 -0800
Received: from cobra.marimba.com (acheron.marimba.com [207.126.123.64])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id MAA22660;
	Thu, 23 Mar 2000 12:30:50 -0800 (PST)
Received: by cobra with Internet Mail Service (5.5.2650.21)
	id <HM5XKNKS>; Thu, 23 Mar 2000 12:28:34 -0800
Message-Id: <68C8F96D4999D311B0550008C71AA8AFA4D74F@cobra>
From: Arthur van Hoff <avh@marimba.com>
To: "'Jeffrey Mogul'" <mogul@pa.dec.com>, http-delta@pa.dec.com
Subject: RE: What is the instance? 
Date: Thu, 23 Mar 2000 12:28:33 -0800
Mime-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain;
	charset="iso-8859-1"


Hi Jeff,

Here are my 2 cents. In our products we use delta 
encoding purely based on MD5 checksums. This could 
reduce the confusion somewhat. In my opinion a delta 
should be specified to be the delta between two clearly
identified versions of an original file/resource/instance. 
This is easily and unambiguously done using checksums instead 
of etags. I've never really liked this use of etags because they
have such a confusing definition. Would it be possible to
use the instance digest header from draft-mogul-http-digest-02.txt
instead of inventing additional headers besides etag?

Have fun,

	Arthur van Hoff


> -----Original Message-----
> From: Jeffrey Mogul [mailto:mogul@pa.dec.com]
> Sent: Thursday, March 23, 2000 11:33 AM
> To: http-delta@pa.dec.com
> Subject: Re: What is the instance? 
> 
> 
>  <danielh@crosslink.net> writes:
>     Jeff wrote
>     >  So I need to change that definition to
>     >      instance     The entity that would be returned in 
> a status-200
>     >                   response to a GET request, at the 
> current time, for
>     >                   the selected variant of the specified 
> resource,
>     >		   with the application of zero or more content-codings,
>     >		   but without the application of any end-to-end
>     >		   delta-encoding, range selection, or transfer-coding.
>     
>     This needs careful thought -- it's a big change from the prior
>     meaning  of instance (as the content BEFORE any content coding).
> 
> Right - but I invented the definition of "instance" as part of
> the design of the delta encoding specification.  I don't think
> anyone else has based any designs on this definition.
> 
> So if the definition is wrong, then I think it's appropriate to
> change it.
> 
>     Given this, some identifier is needed for these 
> un-encoded versions; 
>     something similar to etag.  The trick is for the server 
> to send two tags,
>     a standard "etag" for the  content contained in the 
> current response (such
>     as a difference file, a gzipped file,  a range of either 
> of these, etc.),
>     and an "itag" (or "o-etag"?) that identifies  the  
> original (un encoded &
>     un  differenced) content, a copy of which the server is presumably
>     committed to retaining for awhile.
>     
> I think adding another identifier is a major increase in complexity.
> It seems to me that if we could avoid this step, without having to
> do something ugly, it would be worth something.  I'm still trying to
> think this through, though.
> 
> -Jeff
> 

From danielh@crosslink.net  Thu Mar 23 13:15:24 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA01930; Thu, 23 Mar 2000 13:15:24 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA00769; Thu, 23 Mar 2000 13:15:24 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA24219
	for <http-delta@pa.dec.com>; Thu, 23 Mar 2000 13:15:23 -0800 (PST)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id QAA01529 for <http-delta@pa.dec.com>; Thu, 23 Mar 2000 16:15:22 -0500
Message-Id: <200003232115.QAA01529@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Thu, 23 Mar 2000 16:09:08 -0500
To: http-delta@pa.dec.com
In-Reply-To: <68C8F96D4999D311B0550008C71AA8AFA4D74F@cobra>
Subject: RE: What is the instance?
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Arthur wrote:
>Here are my 2 cents. In our products we use delta 
>encoding purely based on MD5 checksums. This could 
>reduce the confusion somewhat. In my opion a delta 
>should be specified  to be the delta between two clearly
>identified versions of an original file/resource/instance. 
>This is easily and unambigously done using checksums instead  of etags.
>I've never really liked this use of etags because they have such a
>confusing definition. Would it be possible to use the instance digest
>header from draft-mogul-http-digest-02.txt instead of inventing
>additional headers besides etag?

But the content-md5 header is defined on the post content-encoded (and
pre transfer-encoded) contents of a response (dare I say entity body). 
Thus, it suffers from the same problem as the etag -- it does not
identify the original file/resource/instance.

So, are you recommending the value of the instance digest header be used as
an "instance tag"? Which would mean that the instance digest header would
become an inseparable part of delta encoding.



-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From avh@marimba.com  Thu Mar 23 13:50:19 2000
Return-Path: <avh@marimba.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA05779; Thu, 23 Mar 2000 13:50:19 -0800 (PST)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA20235; Thu, 23 Mar 2000 13:50:18 -0800
Received: from cobra.marimba.com (acheron.marimba.com [207.126.123.64])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA29118
	for <http-delta@pa.dec.com>; Thu, 23 Mar 2000 13:50:18 -0800 (PST)
Received: by cobra with Internet Mail Service (5.5.2650.21)
	id <HM5XKNT7>; Thu, 23 Mar 2000 13:48:03 -0800
Message-Id: <68C8F96D4999D311B0550008C71AA8AFA4D751@cobra>
From: Arthur van Hoff <avh@marimba.com>
To: "'danielh@crosslink.net'" <danielh@crosslink.net>, http-delta@pa.dec.com
Subject: RE: What is the instance?
Date: Thu, 23 Mar 2000 13:48:02 -0800
Mime-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain;
	charset="iso-8859-1"


Hi Daniel,

In our products there is no content-encoding as such,
only delta-encoding and transfer-encoding. Delta encoding
is done first, followed by optional compression. We do support
range requests, but not combined with delta-encoding; that got
too complicated too quickly. In our case the content-encoding 
of the resource is implied by a mime type, and as a result
we can use the MD5 which was computed over the content-encoded
resource that is ultimately being transmitted. 

Jeff wrote a while back:
> Consider this scenario:
>	(1) Content author creates foo.html
>	(2) some software does "gzip -c foo.html >foo.html.gz"
>
> Should foo.html and foo.html.gz have the same entity tag?

In our case we assume that content author creates foo.html
or foo.html.gz, but that there is no further automatic content
encoding. In that scenario there is only one original resource,
either foo.html or foo.html.gz, and thus they have different 
checksums. 

Now this takes me back to Jeff's original comment:

> So it looks like we have a contradiction: the entity tag must
> be assigned before a delta content-coding, but after content-codings
> in general.  Ouch.

It appears that we need to define delta-encoding as something
which happens after content-encoding and before transfer-encoding.
If that is the case, then could we use the content-md5 header?

Have fun,

	Arthur van Hoff



> -----Original Message-----
> From: danielh@crosslink.net [mailto:danielh@crosslink.net]
> Sent: Thursday, March 23, 2000 1:09 PM
> To: http-delta@pa.dec.com
> Subject: RE: What is the instance?
> 
> 
> Athur wrote:
> >Here are my 2 cents. In our products we use delta 
> >encoding purely based on MD5 checksums. This could 
> >reduce the confusion somewhat. In my opion a delta 
> >should be specified  to be the delta between two clearly
> >identified versions of an original file/resource/instance. 
> >This is easily and unambigously done using checksums instead 
>  of etags.
> >I've never really liked this use of etags because they have such a
> >confusing definition. Would it be possible to use the instance digest
> >header from draft-mogul-http-digest-02.txt instead of inventing
> >additional headers besides etag?
> 
> But the content-md5 header is defined on the post content-encoded (and
> pre transfer-encoded) contents of a response (dare I say 
> entity body). 
> Thus, it suffers from the same problem as the etag -- it does not
> identify the original file/resource/instance.
> 
> So, are you recommending the value of the instance digest 
> header be used as
> an "instance tag"? Which would mean that the instance digest 
> header would
> become an inseperable part of delta encoding.
> 
> 
> 
> -----------------------------------------------------------
> Daniel Hellerstein
> danielh@crosslink.net
> http://www.srehttp.org
> -----------------------------------------------------------
> 

From mogul@pa.dec.com  Thu Mar 23 15:05:05 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA07374; Thu, 23 Mar 2000 15:05:05 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA01977; Thu, 23 Mar 2000 15:05:05 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA06634; Thu, 23 Mar 2000 15:05:05 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003232305.PAA06634@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: What is the instance? 
In-Reply-To: Your message of "Thu, 23 Mar 2000 12:28:33 PST."
             <68C8F96D4999D311B0550008C71AA8AFA4D74F@cobra> 
Date: Thu, 23 Mar 2000 15:05:05 -0800
X-Mts: smtp

Arthur van Hoff <avh@marimba.com> writes:

    Here are my 2 cents. In our products we use delta encoding
    purely based on MD5 checksums. This could reduce the
    confusion somewhat. In my opion a delta should be specified
    to be the delta between two clearly identified versions of an
    original file/resource/instance.  This is easily and
    unambigously done using checksums instead of etags. I've
    never really liked this use of etags because they have such a
    confusing definition. Would it be possible to use the
    instance digest header from draft-mogul-http-digest-02.txt
    instead of inventing additional headers besides etag?

First of all, I don't think there's a need to invent extra
identification headers - I'm working up a set of scenarios to
show how things should work without that, but I'm not sure I'll
finish that today.

Second, if you think about it, the "entity tag" carried in the
ETag: header is really an "instance tag".  That is, a unique
(strong) entity tag is associated with a unique instance.  But
the instance digest has to be assigned at precisely the same
point in the processing pipeline, because it's also unique per
instance.

So if you are computing an MD5 digest to produce a Digest:
header, then you can use the same string as the entity tag.
However, there is no requirement that these strings be the same.

In short, you can use essentially the mechanism you are already
using in your product - compute an MD5 digest of the instance -
and stick it in the ETag: header.  If you want to add the ability
to do an end-to-end integrity check, you would need to send the
same string in a Digest: header, because a client is required to
treat the ETag: header value as an opaque value (i.e., it cannot
assume that this value is a digest of the instance!)  Which means
that there is a slight inefficiency, with the protocol headers
carrying the same string twice.

On the other hand, an existing server that uses some other scheme
to construct entity tags (for example, I believe that IIS/5.0
isn't using an MD5 value, since their ETag is apparently a
20-nibble hex encoding of an 80-bit value) would not have to
modify its entity-tag creation code in order to support delta
encoding.  I.e., it's certainly not required to use an MD5 digest
as the entity tag.
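
A minimal Python sketch of the option described above, for a server
that chooses to use an MD5 digest of the instance as its entity tag.
The function names are invented for this illustration, and the
"md5=<base64>" layout of the Digest: value is an assumption made here
for the example; check the digest draft for the exact syntax.

```python
import base64
import hashlib


def instance_headers(instance: bytes) -> dict:
    """Build ETag and Digest headers from one MD5 pass over the instance."""
    digest = base64.b64encode(hashlib.md5(instance).digest()).decode("ascii")
    return {
        # Opaque to clients: they may compare it, never interpret it.
        "ETag": f'"{digest}"',
        # The same string again, this time with defined meaning, so that
        # a client can run an end-to-end integrity check.
        "Digest": f"md5={digest}",
    }


def verify_instance(headers: dict, instance: bytes) -> bool:
    """Integrity check that relies on Digest only, never on ETag."""
    algorithm, _, expected = headers["Digest"].partition("=")
    if algorithm.lower() != "md5":
        raise ValueError("this sketch only understands md5")
    actual = base64.b64encode(hashlib.md5(instance).digest()).decode("ascii")
    return actual == expected


headers = instance_headers(b"<html>hello</html>")
print(headers)
print(verify_instance(headers, b"<html>hello</html>"))
```

The duplication of the string in two headers is the slight
inefficiency mentioned above; a server with some other entity-tag
scheme would keep its ETag code unchanged and add only the Digest line.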

Daniel Hellerstein adds:
    But the content-md5 header is defined on the post
    content-encoded (and pre transfer-encoded) contents of a
    response (dare I say entity body).  Thus, it suffers from the
    same problem as the etag -- it does not identify the original
    file/resource/instance.

That's a little confused.  The Content-MD5 header does provide a
digest of the "entity-body", not of the instance, so you're right
that it isn't sufficient for our purposes.  However, I'm pretty
sure that Arthur was referring to a "Digest:" header digest, not
a "Content-MD5:" header digest - and we defined the "Digest:"
header to be an instance digest, specifically for this purpose.

Remember, just because it is called an "entity tag" does not mean
that it has any actual connection with the "entity-body"!  (I
must be getting rather tiresome on this point by now.)

So the "entity tag", which is really an "instance tag", is indeed
the right thing for identifying an instance.  But so is an
instance digest - however, I think we want to continue to define
the delta encoding protocol in terms of entity tags, so that
servers aren't required to send both the entity tag and the
instance digest.

-Jeff

From mogul@pa.dec.com  Thu Mar 23 15:10:53 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA28283; Thu, 23 Mar 2000 15:10:53 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA23062; Thu, 23 Mar 2000 15:10:53 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA06595; Thu, 23 Mar 2000 15:10:53 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003232310.PAA06595@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: What is the instance? 
In-Reply-To: Your message of "Thu, 23 Mar 2000 13:48:02 PST."
             <68C8F96D4999D311B0550008C71AA8AFA4D751@cobra> 
Date: Thu, 23 Mar 2000 15:10:53 -0800
X-Mts: smtp

Arthur van Hoff <avh@marimba.com> writes:

    In our products there is no content-encoding as such,
    only delta-encoding and transfer-encoding. Delta encoding
    is done first, following by optional compression.

Am I correct in assuming that this optional compression is
conceptually part of the delta encoding, rather than a separate
coding step?  I.e., something like the output of the pipeline
	diff -e | gzip
or like the vcdiff format, which inherently combines compression
with the delta encoding?

    In our case we assume that content author creates foo.html
    or foo.html.gz, but that there is no further automatic content
    encoding. In that scenario there is only one original resource,
    either foo.html or foo.html.gz, and thus they have different 
    checksums. 

This is consistent with treating the output of the content-coding
as the instance - i.e., the point at which the entity tag is
assigned and the instance digest is computed.  In some cases
(e.g., foo.html) the content-coding is the identity transformation.
    
    Now this takes me back to Jeff's original comment:
    
    > So it looks like we have a contradiction: the entity tag must
    > be assigned before a delta content-coding, but after content-codings
    > in general.  Ouch.
    
    It appears that we need to define delta-encoding as something
    which happens after content-encoding and before transfer-encoding.

Right.  I'm leaning towards calling this the instance-manipulation
step, which seems like it might include other things besides delta
encoding.  Perhaps it conceptually includes range selection, although
we can't actually touch this because it's already specified in
HTTP/1.1.
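
The ordering can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the function names are made up,
zlib output (the HTTP "deflate" coding) stands in for any compressing
content-coding, the entity tag is derived from an MD5 of the instance
purely for convenience, and a simple range selection stands in for the
instance-manipulation step.

```python
import hashlib
import zlib


def content_code(resource: bytes, coding: str) -> bytes:
    """Step 1: apply the content-coding; the result is the instance."""
    if coding == "identity":
        return resource
    if coding == "deflate":
        return zlib.compress(resource)
    raise ValueError(f"unsupported content-coding: {coding}")


def entity_tag(instance: bytes) -> str:
    """The tag names the instance, i.e. the output of step 1."""
    return '"' + hashlib.md5(instance).hexdigest() + '"'


def chunked(body: bytes, chunk_size: int = 1024) -> bytes:
    """Step 3: chunked transfer-coding, applied last, hop by hop."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out += f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"
    return bytes(out)


def respond(resource: bytes, coding: str, manipulate=None):
    instance = content_code(resource, coding)
    etag = entity_tag(instance)
    # Step 2: instance manipulation (delta encoding, range selection, ...).
    # It changes the bytes on the wire but not the tag.
    body = manipulate(instance) if manipulate else instance
    return etag, chunked(body)


full_etag, full_wire = respond(b"x" * 10, "identity")
part_etag, part_wire = respond(b"x" * 10, "identity", lambda inst: inst[:4])
print(full_etag == part_etag, part_wire)
```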

    If that is the case, then could we use the content-md5 header?
    
No, for the reasons I gave in the previous message.

-Jeff

From danielh@crosslink.net  Thu Mar 23 15:27:51 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA09221; Thu, 23 Mar 2000 15:27:51 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA04170; Thu, 23 Mar 2000 15:27:51 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id PAA29260
	for <http-delta@pa.dec.com>; Thu, 23 Mar 2000 15:27:50 -0800 (PST)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id SAA10524 for <http-delta@pa.dec.com>; Thu, 23 Mar 2000 18:27:45 -0500
Message-Id: <200003232327.SAA10524@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Thu, 23 Mar 2000 18:19:35 -0500
To: http-delta@pa.dec.com
Subject: instance redux
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 


Arthur said ...
>In our case we assume that content author creates foo.html
>or foo.html.gz, but that there is no further automatic content encoding.
>In that scenario there is only one original resource, either foo.html or
>foo.html.gz, and thus they have different  checksums. 
>....
>It appears that we need to define delta-encoding as something 
>which happens after content-encoding and before transfer-encoding. 
>If that is the case, then could we use the content-md5 header?


But that's equivalent to the etag -- the etag is also defined after
content-encoding, but before transfer-encoding (and before range
extraction)  [this belabors Jeff's point].

The problem isn't that the etag may not be globally unique, it's that if 
further content-coding occurs (in particular, compression)  it identifies 
something (the "compressed" stuff that forms the response body) that may
be of little use  to future delta-aware clients.   

                -----------------------------

I've been advocating a "two tag" solution, but there is an alternative:
    
    whenever a delta is asked for an instance that was content-encoded, the
    server should compute the delta against its decoded version
    of the instance, and the client should apply the resulting difference to
    its copy of the decoded instance.

This requires that the server and client:
  a) can identify which etags come from responses that 
     have been content-encoded
  b) use decoded instances to perform deltas WHENEVER this sort of etag 
     is used in a delta enabled request,
  c) use the encoded instance when these etags are used in a plain vanilla 
     conditional GET.


Consider the following sequence:

1) Client requests:
    GET /foo.html HTTP/1.1
    Accept-encoding: diff-e,gzip

2) Server finds the "foo.html" file, on-the-fly GZIP's it,
assigns an etag, and sends this gzip'ed file back with the following:
    HTTP/1.1 200 OK
    Date: Wed, 14 Mar 2000 14:00:00 GMT
    Content-Encoding: gzip
    Etag: "gz_abc"
 
3) After the client unGZIP's the response, it saves the results
   to foo.html.ver1.
   A while later, the client requests:
    GET foo.html http/1.1
    Accept-encoding: diff-e,gzip
    If-None-Match: "gz_abc"

4) The server again finds foo.html, gzip's it, and assigns an
etag. If the etag is "gz_abc" (the gzip'ed file has not changed),
it returns:

    HTTP/1.1 304 Not Modified
    

5) On the other hand, suppose foo.html has changed.
   The server finds foo.html, gzip's it, and assigns an etag.
   Since the etag has changed, a 304 response does not occur.
   Since the server knows that the etag in If-None-Match is associated
   with an instance that had gzip content-encoding (say, because of the
   "gz_" prefix), the server will compute a delta between the latest
   version of foo.html and the version it sent in step 2.
   This difference file is then returned, along with:
    HTTP/1.1 226 Delta
    Date: Wed, 14 Mar 2000 14:00:00 GMT
    Content-Encoding: gdiff
    Delta-base: "gz_abc"
    Etag: "gz_abc_d1"

Or, the server could also gzip the difference file and return:
    HTTP/1.1 226 Delta
    Date: Wed, 14 Mar 2000 14:00:00 GMT
    Content-Encoding: gdiff,gzip
    Delta-base: "gz_abc"
    Etag: "gz_abc_d1"

6) The client receives this delta, possibly unGZIP's it, and uses
   it to reconstruct the newest version of foo.html. This reconstruction
   uses "foo.html.ver1" from step 3.

There are probably holes in this, but it doesn't require changing the
definition of ETag, or adding a new tag.  
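
The six steps above can be sketched roughly as follows. This is only a toy
model, not a proposal: ToyServer, make_delta and apply_delta are made-up
names, the "delta" is a trivial common-prefix/common-suffix format standing
in for gdiff or diff-e, and the server is assumed to remember the decoded
body it sent for each etag.

```python
import gzip


def make_delta(base, new):
    """Toy delta: (common prefix length, common suffix length, new middle)."""
    limit = min(len(base), len(new))
    p = 0
    while p < limit and base[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and base[-1 - s] == new[-1 - s]:
        s += 1
    return p, s, new[p:len(new) - s]


def apply_delta(base, delta):
    """Rebuild the new instance from the DECODED base and a toy delta."""
    p, s, middle = delta
    return base[:p] + middle + base[len(base) - s:]


class ToyServer:
    """Hypothetical server: gzips on the fly, deltas against decoded bodies."""

    def __init__(self, body):
        self.body = body
        self.version = 0
        self.sent = {}  # etag -> decoded body that was sent under that etag

    def update(self, body):
        self.body = body
        self.version += 1

    def get(self, if_none_match=None):
        etag = '"gz_v%d"' % self.version
        if if_none_match == etag:
            return 304, {}, None                      # step 4
        base = self.sent.get(if_none_match)
        self.sent[etag] = self.body
        if base is None:                              # step 2: full response
            headers = {"Content-Encoding": "gzip", "ETag": etag}
            return 200, headers, gzip.compress(self.body)
        headers = {"Delta-base": if_none_match, "ETag": etag}
        return 226, headers, make_delta(base, self.body)  # step 5
```

The client side is then: gunzip the 200 body and keep it (step 3), and on a
226 call apply_delta() on that saved, decoded copy (step 6).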

(I still like the two tag proposal better)





 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Thu Mar 23 16:26:45 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA09658; Thu, 23 Mar 2000 16:26:45 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA17669; Thu, 23 Mar 2000 16:26:45 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA11712; Thu, 23 Mar 2000 16:26:45 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003240026.QAA11712@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: instance redux 
In-Reply-To: Your message of "Thu, 23 Mar 2000 18:19:35 EST."
             <200003232327.SAA10524@lycanthrope.crosslink.net> 
Date: Thu, 23 Mar 2000 16:26:45 -0800
X-Mts: smtp

<danielh@crosslink.net> writes:
   I've been advocating a "two tag" solution, but there is an alternative:
    
    whenever a delta is asked for an instance that was content-encoded,
    the server should compute the delta against its decoded version of
    the instance, and the client should apply the resulting difference
    to its copy of the decoded instance.

I've been working on a similar approach, but with a slight twist.
I've added clarifying comments in [] brackets, and the twist is
in () parentheses:

    whenever a delta is asked for a [base] instance that was 
    [originally received] content-encoded, (AND when the delta
    response does not carry a content-coding, then)
    the server should compute the delta against its decoded version of
    the instance, and the client should apply the resulting difference
    to its copy of the decoded instance.

The difference is that this allows a simpler implementation of
the scenario where the server always stores the encoded version
(e.g., foo.html.gz) in its file system, and so it computes the
delta between two different instances of foo.html.gz, rather
than foo.html.  Which might or might not be the most efficient
in terms of coding density, but I think it's a potentially
useful extension of your rule.

I think it may also make sense to use a "deferred evaluation"
approach, at the receiving cache, to the decoding stage (if
necessary).  I.e., the cache should store the responses as
received, not after content decoding, and any necessary content
decoding is then done at the last minute.

    This requires that the server and client:
      a) can identify which etags come from responses that 
	 have been content-encoded

No problem, because the cache (client or proxy) stores the
response as sent by the server, which includes the Content-encoding:
header.  And the server consistently assigns the entity tag
after the content-coding step (which might be an identity coding).

I don't think there is any need to mark the entity tag as
being associated with a content-coded response - the responses
themselves are marked with Content-encoding: headers.

      b) use decoded instances to perform deltas WHENEVER this sort of
         etag is used in a delta enabled request,

Right, except that I would say "whenever this sort of response",
not "this sort of entity tag" - and (per my twist above) not
when both the original (200) response and the delta (226) response
are marked with the same Content-Encoding: header.

      c) use the encoded instance when these etags are used in a plain
      vanilla conditional GET.

Not an issue, if you accept my point of view that the entity tag
is assigned to the output of the (possibly identity) content coding.

-Jeff


From mogul  Fri Mar 24 11:30:32 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA16694; Fri, 24 Mar 2000 11:30:32 -0800 (PST)
Message-Id: <200003241930.LAA16694@wera.pa.dec.com>
To: http-delta
From: <danielh@crosslink.net>
Reply-To: danielh@crosslink.net
Original-Date: Thu, 23 Mar 2000 22:47:51 -0500
In-Reply-To: <200003240026.QAA11712@wera.pa.dec.com>
Subject: Re: instance redux
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 
Date: Fri, 24 Mar 2000 11:30:32 -0800
Sender: mogul
X-Mts: smtp

danielh wrote:
>   I've been advocating a "two tag" solution, but there is an
> alternative:
>    whenever a delta is asked for an instance that was content-encoded,
>    the server should compute the delta against its decoded version of
>    the instance, and the client should apply the resulting difference
>    to its copy of the decoded instance.

Jeff responded:
>>I've been working on a similar approach, but with a slight twist. I've
>>added clarifying comments in [] brackets, and the twist is in ()
>>parentheses:
>>    whenever a delta is asked for a [base] instance that was 
>>    [originally received] content-encoded, (AND when the delta
>>    response does not carry a content-coding, then)
>>    the server should compute the delta against its decoded version of
>>    the instance, and the client should apply the resulting difference
>>    to its copy of the decoded instance.

Okay, I give up. Forget the two tag solution (even though I kind of like
it), this approach is reasonable. That said ...

I'm having trouble understanding the meaning of the above. I assume the
first part means:
    whenever a client requests a delta against a base
    instance that was originally received (by this client),
    and this base instance was content encoded, the server should
    compute a delta of the current variant against a decoded
    version of this base instance.

but what does
    (AND when the delta response does not carry a content-coding, then)
mean?  My guess is:
    If the server then sends a delta response that also carries another
    content coding, then the delta response is against the base instance
    as it was originally sent (that is, without decoding).
If that's what you mean, I can't agree -- that would break a
   content-encoding: diff-e,gzip
response (that is, a gzip of a diff-e of the current contents against the
decoded base instance).

(more on this below, I think)

>The difference is that this allows a simpler implementation of the
>scenario where the server always stores the encoded version (e.g.,
>foo.html.gz) in its file system, and so it computes the delta between two
>different instances of foo.html.gz, rather than foo.html.  Which might or
>might not be the most efficient in terms of coding density, but I think
>it's a potentially useful extension of your rule.

Since most encodings are going to break delta applications, it's sort of
pointless to bother with supporting delta for pre-encoded files that are
not identified as such.
That is: If the server were to deliver foo.html.gz WITHOUT a
content-encoding (and hope the client intuits the need to unGZIP it), then
the server is implicitly punting on future deltas. If the server includes
an explicit Content-Encoding: gzip, then the rule of "undo the list of
content-codings before delta computation" is straightforward
(though its speed may depend on how clever the server is about retaining
both the unencoded and encoded versions).

>I think it may also make sense to use a "deferred evaluation" approach,
>at the receiving cache, to the decoding stage (if necessary).  I.e., the
>cache should store the responses as
>received, not after content decoding, and any necessary content decoding
>is then done at the last minute.

I leave that to implementors to determine -- it will probably depend on
how popular delta requesting becomes!

>>    This requires that the server and client:
>>      a) can identify which etags come from responses that 
>>	 have been content-encoded

>No problem, because the cache (client or proxy) stores the
>response as sent by the server, which includes the Content-encoding:
>header.  And the server consistently assigns the entity tag after the
>content-coding step (which might be an identity coding).

An aside: I think the conclusion is that the etag does identify the
instance, given your new definition of instance (as the stuff AFTER
content coding, but before range extraction and before transfer coding).

{Perhaps that should be made explicit -- clients and servers who support
delta are agreeing that the etag is associated with the
post-content-encoded, pre range extract, and pre transfer encoded,
contents.}

Therefore: In addition to the server (and client) associating
resource/etag pairs to "cached" copies of base instances, the server (and
client) should also retain the set of codings used to create this
instance.  That is, the association is between a "resource/etag" pair and
a "base-instance/content-codings-used-to-create-this-base-instance" pair.

>I don't think there is any need to mark the entity tag as
>being associated with a content-coded response - the responses themselves
>are marked with Content-encoding: headers.

I agree.  My suggestion was just a hack to implement the
latter association (that is, the etag would contain a summary of the
Content-encoding headers).

>      b) use decoded instances to perform deltas WHENEVER this sort of
>         etag is used in a delta enabled request,

>Right, except that I would say "whenever this sort of response", not
>"this sort of entity tag" - and (per my twist above) not when both the
>original (200) response and the delta (226) response are marked with the
>same Content-Encoding: header.

I don't quite get this point -- unless you are saying that the delta
encoding info should NO LONGER be included in the content-encoding (or
accept-encoding) headers.  

I don't think that's necessary -- so long as both clients and servers
adhere to the "use content-encoding header to decode base instances before
computing/applying differences" rule, there is no need to change the
spec.

Which is the biggest advantage of this approach over the two tag approach.

>      c) use the encoded instance when these etags are used in a plain
>      vanilla conditional GET.
>Not an issue, if you accept my point of view that the entity tag is
>assigned to the output of the (possibly identity) content coding.

Yes -- this just clarifies that when there is no delta encoding (or if the
client or server are not delta savvy), then the
usual rules (of how to deal with If-None-Match) apply (regardless of the
presence or absence of a content-encoding header).

BTW: in the above, I assume that there will never be "deltas against
deltas", that a client will not request a delta against "ddd", where "ddd"
was the etag of a Content-encoding: gdiff response.

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------

From mogul@pa.dec.com  Fri Mar 24 12:02:17 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id MAA13957; Fri, 24 Mar 2000 12:02:17 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA10011; Fri, 24 Mar 2000 12:02:17 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id MAA12626; Fri, 24 Mar 2000 12:02:16 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003242002.MAA12626@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: instance redux 
In-Reply-To: Your message of "Fri, 24 Mar 2000 11:30:32 PST."
             <200003241930.LAA16694@wera.pa.dec.com> 
Date: Fri, 24 Mar 2000 12:02:16 -0800
X-Mts: smtp

<danielh@crosslink.net> wrote:

    >>I've been working on a similar approach, but with a slight twist. I've
    >>added clarifying comments in [] brackets, and the twist is in ()
    >>parentheses:
    >>    whenever a delta is asked for a [base] instance that was 
    >>    [originally received] content-encoded, (AND when the delta
    >>    response does not carry a content-coding, then)
    >>    the server should compute the delta against its decoded version of
    >>    the instance, and the client should apply the resulting difference
    >>    to its copy of the decoded instance.

    I'm having trouble understanding the meaning of the above. I assume
    the first part means:
	whenever a client requests a delta against a base
	instance that was originally received (by this client),
	and this base instance was content encoded, the server should
	compute a delta of the current variant against a decoded
	version of this base instance.
    
    but what does      
	(AND when the delta  response does not carry a content-coding,
	then)
    mean.  My guess is:
	If the server then sends a delta response that also carries
	another content coding, then the delta response is against the
	base instance as it was originally sent (that is, without decoding).

Actually, I think I may have botched the wording slightly, and
I certainly made it more confusing than it should be.  How about
this:
    If the base instance response and the current delta response carry
    DIFFERENT content-codings, then the server computes the delta based
    on the UN-ENCODED representations of both the base instance and the
    current instance.

    If both the base instance response and the current delta response
    carry the SAME set of content-coding(s), then the server computes
    delta based on the ENCODED representations of both the base
    instance and the current instance.  (If the set of content-codings
    == {}, then there is no difference between "encoded" and
    "un-encoded" representations.)

So basically the server either has to remember what content-coding
it used when sending a base instance response, or (easier to
implement) it simply has to follow a deterministic rule that
doesn't depend on extraneous parameters.  The latter could be
slightly complicated because if, for a given URL, the server
has a choice of content-codings based on the client's Accept-*
headers, then it probably does have to remember what it sent
for a given entity tag.
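
As a rough illustration of the SAME/DIFFERENT rule above, here is a minimal
sketch.  The function names are made up for this example, and comparing the
codings as an ordered, case-insensitive list (rather than as an unordered
set) is an assumption on top of the wording of the rule.

```python
def parse_codings(header_value):
    """Split a Content-Encoding value into a list of lowercase tokens."""
    if not header_value:
        return []
    return [t.strip().lower() for t in header_value.split(",") if t.strip()]


def delta_operands(base_content_encoding, delta_content_encoding):
    """Decide which representations the delta is computed between.

    Same codings on the base response and the delta response: use the
    ENCODED representations.  Different codings: use the UN-ENCODED ones.
    With no codings on either side the two cases coincide.
    """
    if parse_codings(base_content_encoding) == parse_codings(delta_content_encoding):
        return "encoded"
    return "decoded"
```

Both ends need only the Content-Encoding of the base instance response and
of the delta response to evaluate this, which is what makes the rule
deterministic.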

But if the content-coding is deterministically based on the
Request-URI (e.g., "/foo.html" vs. "/foo.html.gz"), then this
isn't a problem.

The client has to remember what content-coding came with the
base instance response, but that's easy because it's right there
in the Content-Encoding header.

Later in your message, you write:
    Therefore: In addition to the server (and client) associating
    resource/etag pairs to "cached" copies of base instances, the
    server (and client) should also retain the set of codings
    used to create this instance.  That is, the association is
    between a "resource/etag" pair and a
    "base-instance/content-codings-used-to-create-this-base-instance"
    pair.

I think we're in agreement, if you'll allow "if the server
can reliably recompute the pair later on, then it doesn't have to
store it."
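
A small sketch of "remember it, or recompute it", purely as an illustration:
the lookup table and the ".gz" suffix rule are assumptions for this example,
not anything the draft specifies.

```python
def codings_for(uri, etag, remembered):
    """Return the content-codings used for the instance named by (uri, etag).

    `remembered` maps (uri, etag) -> tuple of codings for responses where
    the server had a choice (e.g. it negotiated on Accept-Encoding).  When
    nothing was recorded, fall back to a deterministic rule based only on
    the Request-URI, so no per-response state is needed.
    """
    key = (uri, etag)
    if key in remembered:
        return remembered[key]
    return ("gzip",) if uri.endswith(".gz") else ()
```

Either branch yields the same kind of answer, so the rest of the delta logic
does not need to know which one was used.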

You wrote:
    If that's what you mean, I can't agree -- that would break a
       content-encoding: diff-e,gzip
    response (that is, a gzip of a diff-e of the current contents against the
    decoded base instance).

Right, but the whole point of this week's discussion is that we
have realized (thanks to you!) that delta encodings cannot be
described as content-codings.  Therefore, if you want the
output of diff-e to be compressed, then we need to express this
either as
	IM: diff-e
	Transfer-coding: gzip
(that is, do the compression hop-by-hop, which is probably not
what you meant), or define a new delta encoding format that
includes compression:
	IM: diff-e-gzip
or possibly we could define the IM header in such a way that
it allows an ordered series of instance manipulation steps,
including compression:
	IM: diff-e, gzip
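
To make the "ordered series of steps" option concrete, here is a minimal
sketch.  Everything in it is hypothetical: the header syntax is just the
comma-separated form shown above, and "toy-delta" is a stand-in format
(4-byte common-prefix length followed by the new tail), not diff-e.

```python
import gzip


def parse_im(value):
    """Split an IM-style header value into an ordered list of step names."""
    return [t.strip().lower() for t in value.split(",") if t.strip()]


def toy_delta_encode(instance, base):
    n = 0
    while n < min(len(instance), len(base)) and instance[n] == base[n]:
        n += 1
    return n.to_bytes(4, "big") + instance[n:]


def toy_delta_decode(payload, base):
    n = int.from_bytes(payload[:4], "big")
    return base[:n] + payload[4:]


ENCODE = {"toy-delta": toy_delta_encode,
          "gzip": lambda data, base: gzip.compress(data)}
DECODE = {"toy-delta": toy_delta_decode,
          "gzip": lambda data, base: gzip.decompress(data)}


def apply_steps(steps, instance, base):
    """Sender side: apply the manipulations in the order listed."""
    for name in steps:
        instance = ENCODE[name](instance, base)
    return instance


def undo_steps(steps, payload, base):
    """Receiver side: undo the manipulations in reverse order."""
    for name in reversed(steps):
        payload = DECODE[name](payload, base)
    return payload
```

The point of the ordering is only that the receiver can undo the steps
mechanically, last one first, without any other knowledge of how the
response was built.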

I believe that the Marimba approach that Arthur described falls
into the "encoding format includes compression step" model.
    
    >The difference is that this allows a simpler implementation of the
    >scenario where the server always stores the encoded version (e.g.,
    >foo.html.gz) in its file system, and so it computes the delta between two
    >different instances of foo.html.gz, rather than foo.html.  Which might or
    >might not be the most efficient in terms of coding density, but I think
    >it's a potentially useful extension of your rule.
    
    Since most encodings are going to break delta applications,
    it's sort of pointless to bother with supporting delta for
    pre-encoded files that are not identified as such.

I almost agree with you - but I think it would be a good exercise
to think through all of the possible corner cases, to make sure
we haven't left any other bugs in the protocol design.  And remember
that although all existing content-codings are some form of
compression, this isn't necessarily true for the future.
(For example, one might plausibly imagine an encoding that takes
more bytes but simplifies the parsing of some content-type.)

I think if we can make the protocol work (i.e., make the spec
unambiguous) for all cases, without adding a lot of complexity,
then we are better off than if we just try to make it work for
the cases we think are likely.

    BTW: in the above, I assume that there will never be "deltas
    against deltas", that a client will not request a delta against
    "ddd", where "ddd" was the etag of a Content-encoding: gdiff
    response.

No!  I'm pretty sure that the results from our SIGCOMM paper
support the opposite choice.  Unlike lightning, delta opportunities
tend to strike multiple times in the same place.  That is,
one is likely to see a sequence of references to a given
URL, each resulting in a new instance, and each one expressible
as a small delta from the previous one.

The trick is that if the client has to decode the instances
before applying a delta, when it caches the result, it should
(in effect) re-encode the new instance before storing it.
That then should give a consistent interpretation to the
Content-Encoding header of the cached new instance - it has the
same value as the Content-Encoding header of the 226 (Delta)
response.
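
A sketch of that decode / apply / re-encode cycle at the cache, under some
loud assumptions: gzip is the only coding handled, the delta is passed in as
an opaque callable `patch`, and update_cache_entry is a made-up name.

```python
import gzip


def update_cache_entry(cached_body, cached_codings, patch, response_codings):
    """Apply a delta to a cached instance and return the new cache entry.

    cached_body      -- bytes exactly as stored (possibly gzip-encoded)
    cached_codings   -- tuple of codings on the cached base instance
    patch            -- callable: decoded base bytes -> decoded new bytes
    response_codings -- tuple of codings named on the 226 response
    """
    base = gzip.decompress(cached_body) if "gzip" in cached_codings else cached_body
    new = patch(base)
    # Re-encode so the stored body matches the Content-Encoding we record.
    stored = gzip.compress(new) if "gzip" in response_codings else new
    return stored, response_codings
```

One thing this glosses over: re-compressing at the cache is not guaranteed
to reproduce byte-for-byte what the origin server would have sent, so the
sketch only keeps the stored body and its recorded coding consistent with
each other.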

One might object that this is wasteful because it consumes
CPU time at the cache (client or proxy).  But we would
generally like to trade CPU time for bytes-on-the-wire,
because Moore's law continues to make CPU time cheaper
relative to bandwidth.

-Jeff

From danielh@crosslink.net  Fri Mar 24 15:37:54 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA28772; Fri, 24 Mar 2000 15:37:54 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA00260; Fri, 24 Mar 2000 15:37:54 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id PAA25211
	for <http-delta@pa.dec.com>; Fri, 24 Mar 2000 15:37:53 -0800 (PST)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id SAA09558 for <http-delta@pa.dec.com>; Fri, 24 Mar 2000 18:37:52 -0500
Message-Id: <200003242337.SAA09558@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Fri, 24 Mar 2000 18:36:55 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200003242002.MAA12626@wera.pa.dec.com>
Subject: Re: instance redux**2
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

I comment on Jeff's most recent message below, but here are my current
thoughts and questions (mostly as a result of struggling with Jeff's comments)....

1) Define instance as being
   a) after content-codings
   b) before range extraction and before transfer codings

? Is an instance defined before, or after, delta codings? This may depend
  on whether an "instance manipulation" step (see 3 below) is added.

   Base-instances are instances returned from prior requests for a given 
   resource. They may be content (and delta?) encoded.
   The server and client will cache these base instances, 
   and will associate a url/etag pair to each base instance. In addition, 
   each base instance will be associated with the set of content-encodings 
   (and possibly the set of delta encodings) used in 
   its creation (this set of content-codings may be implicit; say, as
   based on file extensions).

  The current instance is the instance of the current request, which may be 
  content encoded 
? Can it be delta-encoded?

? What do we call the "current snapshot of the resource" --
    the pre-content-encoded, pre-delta-encoded thing?

2) Etags are assigned to the results AFTER content and delta codings, but before 
   range extraction and transfer codings.

3) ISSUE: should delta codings be

    **  combined with content-codings, or 
    **  treated as part of an "instance manipulation" step that occurs after 
         content-coding.

*   Adding an "instance manipulation" step may allow for greater 
     generalizations.
*   Keeping the current idea, of delta-encoding as a content-encoding, may be
     adequate for now.

4) Either approach should assume that in many cases future delta 
    enabled-requests   may refer to a base instance  that is content-encoded.  
    In these cases, the  server (and client) should either:
     a) un-encode the base instance, and use it to form a delta against the 
       "current snapshot of the resource"
     b) compute a delta against the (possibly encoded) base instance, and 
         the (possibly encoded)  current instance.

? In case b, instance is defined as "pre-delta-encoding". 

Case b will often be useless. For example, the delta of two gzipped files
may differ substantially, even though their unGzipped contents may 
be nearly identical.
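
A quick way to see the effect for oneself.  This is only an experiment
sketch: zlib's preset-dictionary feature is used here as a crude stand-in
for a real delta algorithm, and the test document (200 lines of hash output
with one line edited) is invented for the purpose.

```python
import gzip
import hashlib
import zlib


def zdict_delta(base, new):
    """Crude delta: deflate `new` with `base` as the preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=base)
    return comp.compress(new) + comp.flush()


def zdict_patch(base, delta):
    """Inverse of zdict_delta: needs the same `base` as dictionary."""
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(delta) + decomp.flush()


def make_doc(changed_line=None):
    """Build a document that does not compress away to nothing."""
    lines = [hashlib.sha256(str(i).encode()).hexdigest() for i in range(200)]
    if changed_line is not None:
        lines[changed_line] = "this line was edited"
    return "\n".join(lines).encode()


old, new = make_doc(), make_doc(changed_line=10)
old_gz = gzip.compress(old, mtime=0)
new_gz = gzip.compress(new, mtime=0)

raw_delta = zdict_delta(old, new)        # case a: delta of decoded instances
gz_delta = zdict_delta(old_gz, new_gz)   # case b: delta of gzipped instances

print("decoded instance:", len(new), "bytes, delta:", len(raw_delta))
print("gzipped instance:", len(new_gz), "bytes, delta:", len(gz_delta))
```

Comparing the two printed delta sizes against the instance sizes shows how
much (or how little) of the similarity survives gzip for this input.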

However, automatically doing case a prevents delta computation of
future content-codings that may yield useful deltas (say, a content-coding 
that  facilitates parsing). Thus, the client needs some means of telling the 
server which case should be applied.

It may be easier to do this if an "instance manipulation step" is added --
see 7 below for more discussion .

5) In addition to referring to base-instances that are content-encoded,
   clients can use "delta encoded" responses as base instances. If they do,
    prior to computing a delta the server (and client) will have to
    "undifference", as well as possibly un-encode, the base instance.
    This requires another piece of information -- the identity of an
    "earlier base instance".
    Furthermore, this process could be convoluted and compute intensive --
    as when a sequence of delta-encoded base instances is generated 
    (with later instances pointing to earlier instances). 

     Thus, clients should  always include at least one non-delta-encoded 
     instance when forming a delta-enabled request.
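
To show why a chain of delta-encoded base instances gets expensive, here is
a sketch of the "undifference" walk.  The store layout and the toy delta
(common-prefix length plus new tail) are assumptions for illustration only.

```python
def make_delta(base, new):
    """Toy delta: (length of common prefix, new tail)."""
    n = 0
    while n < min(len(base), len(new)) and base[n] == new[n]:
        n += 1
    return n, new[n:]


def reconstruct(etag, store):
    """Rebuild the full instance named by `etag`.

    `store` maps etag -> (base_etag, payload).  A base_etag of None means
    the payload is a full instance; otherwise the payload is a toy delta
    against the instance named by base_etag.
    """
    pending = []
    base_etag, payload = store[etag]
    while base_etag is not None:        # walk back to a full instance
        pending.append(payload)
        base_etag, payload = store[base_etag]
    instance = payload
    for keep, tail in reversed(pending):  # replay deltas, oldest first
        instance = instance[:keep] + tail
    return instance
```

The work grows with the length of the chain, which is the cost being
described above; keeping at least one full (non-delta) instance around
bounds how far back the walk has to go.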

6) Some provision for end-to-end encoding of difference files is necessary.
    Currently, this is signaled by adding content-codes (such as GZIP)
    after the delta-codes in the Content-encoding header.  Something
    similar should be specifiable under the "instance manipulation" scenario.

7) In the instance manipulation scenario, compression (or other
    content-like-codings) could occur in the content-coding stage,
    or in the "instance manipulation" stage.
     Thus, the following rule is a possible replacement for  5:
       i)  encoding that is done in the "instance manipulation step" should 
          be removed  prior to computation of a delta
      ii)  encoding that is done in the "content-coding" step should NOT be 
           removed prior  to computation of a delta.

? For initial responses to delta aware clients, there needs to be some way
   of specifying 5i -- the content is encoded, and in a future delta request
    decoding should be done prior to computing/applying the delta.
    However, this specification needs to be compatible with
    non-delta aware requests.

Maybe two tags isn't such a bad idea after all :]



----------------------------------------------------------------------------------------
Jeff wrote
>Actually, I think I may have botched the wording slightly, and I
>certainly made it more confusing than it should be.  How about this:
>    If the base instance response and the current delta response carry
>    DIFFERENT content-codings, then the server computes the delta based
>    on the UN-ENCODED representations of both the base instance and the
>    current instance.
>    If both the base instance response and the current delta response
>    carry the SAME set of content-coding(s), then the server computes
>    delta based on the ENCODED representations of both the base
>    instance and the current instance.  (If the set of content-codings
>    == {}, then there is no difference between "encoded" and
>    "un-encoded" representations.)

I'm having a hard time understanding the above, and I think it's because
I'm not straight on the presuppositions.

For example:
  a) are you assuming that delta-encoding is a separate, post content-encoding, step?
  b) what is the current delta response --- 
        If the "current delta response" is the "difference file
      computed by comparing a base instance against a current instance",
       then how can the base instance response and the delta response
       be comparable (one is original content, the other is a difference file)?


>>So basically the server either has to remember what content-coding it
>>used when sending a base instance response, or (easier to implement) it
>>simply has to follow a deterministic rule that doesn't depend on
>>extraneous parameters.  The latter could be slightly complicated because
>>if, for a given URL, the server has a choice of content-codings based on
>>the client's Accept-* headers, then it probably does have to remember
>>what it sent for a given entity tag.
>
>But if the content-coding is deterministically based on the
>Request-URI [e.g., "/foo.html" vs. "/foo.html.gz"), then this isn't a
>problem.

B) How the server "remembers" is unimportant -- it can retain a physical
record of a transaction, or it can use rules like the above (personally, I use the
latter whenever possible in my implementations).

However, I would add that in this example:
     if the server applies a "this is gzip content encoded, so decode
     before differencing" rule to a base instance (that is to be used in
     a delta-enabled response for foo.html.gz), it MUST be the case that
     a "content-encoding: gzip" response header was included in all
     responses for foo.html.gz.

>The client has to remember what content-coding came with the base
>instance response, but that's easy because it's right there in the
>Content-Encoding header.
Yep.

>Later in your message, you write:
>    Therefore: In addition to the server (and client) associating
>    resource/etag pairs to "cached" copies of base instances, the
>    server (and client) should also retain the set of codings
>    used to create this instance.  That is, the association is
>    between a "resource/etag" pair and a
>    "base-instance/content-codings-used-to-create-this-base-instance"
>    pair.
>I think we're in agreement, if you'll allow "if the server
>can reliably recompute the pair later on, then it doesn't have to store
>it."

C) Yes.  How the server maintains these associations, or whether it stores or
regenerates decoded versions of a base instance,  is not a concern of the rfc.

>You wrote:
>    If that's what you mean, I can't agree -- that would break a
>       content-encoding: diff-e,gzip
>    response (that is, a gzip of a diff-e of the current contents against
>the  decoded base instance).

>Right, but the whole point of this week's discussion is that we have
>realized (thanks to you!) that delta encodings cannot be described as
>content-codings.  

D) I'm not sure I agree with that : \  .... see E below..

[upon re-reading -- I'm starting to see the value in your logic, but I'm still 
not entirely convinced -- see F below]

>Therefore, if you want the output of diff-e to be
>compressed, then we need to express this either as
>	IM: diff-e
>	Transfer-coding: gzip
>(that is, do the compression hop-by-hop, which is probably not what you
>meant), or define a new delta encoding format that includes compression:
>	IM: diff-e-gzip
>or possibly we could define the IM header in such a way that it allows an
>ordered series of instance manipulation steps,
>including compression:
> IM: diff-e, gzip

E) IM -- instance manipulation?   

Maybe  this extra layer is necessary.  
However, I think that for delta,  this new header is not necessary.

My original notion was to use a second tag -- an "original" tag (itag or o-etag).
But since that idea wasn't very popular, I (along with Jeff) divined that so long
as the client and server agree to de-code (to reverse the content-encodings applied to)
their respective copies of a base instance before computing or applying the delta, 
then things would work fine.

Given that,  one should be able to treat delta as a content-encoding.
There are some implementation concerns when the delta-encoding was
applied to a base instance (see H below)

>I believe that the Marimba approach that Arthur described falls into the
>"encoding format includes compression step" model.
My reading of Arthur is that compression is an optional step, after encoding?

    
>>>The difference is that this allows a simpler implementation of the
>>>scenario where the server always stores the encoded version (e.g.,
>>>foo.html.gz) in its file system, and so it computes the delta between two
>>>different instances of foo.html.gz, rather than foo.html.  Which might or
>>>might not be the most efficient in terms of coding density, but I
>>>think it's a potentially useful extension of your rule.
>    
>>    Since most encodings are going to break delta applications,
>>    it's sort of pointless to bother with supporting delta for
>>    pre-encoded files that are not identified as such.

>I almost agree with you - but I think it would be a good exercise to
>think through all of the possible corner cases, to make sure we haven't
>left any other bugs in the protocol design.  And remember that although
>all existing content-codings are some form of compression, this isn't
>necessarily true for the future. (For example, one might plausibly
>imagine an encoding that takes more bytes but simplifies the parsing of
>some content-type.)

F) That's a good point.  So the delta spec should allow for cases where the
delta is computed against the instances (where, to reiterate, the instance
is what one has after content-encoding), and not against the "decoded"
instances.  

Oh brother, how does one do that?  You'd have to have some way for the
client to tell the server at what point to stop "encoding" (and to stop decoding).
In particular, if we have an html-parse,gzip encoding, then the client would have
to tell the server to "html-parse the current snapshot, unGzip the base instance,
compute the delta, and send it to me".

Perhaps this is why you are leaning toward an IM header, which can become more complex.

>I think if we can make the protocol work (i.e., make the spec
>unambiguous) for all cases, without adding a lot of complexity, then we
>are better off than if we just try to make it work for the cases we think
>are likely.

G) Sometimes I wish I could disagree.

>    BTW: in the above, I assume that there will never be "deltas
>    against deltas", that a client will not request a delta against
>    "ddd", where "ddd" was the etag of a Content-encoding: gdiff
>    response.

>No!  I'm pretty sure that the results from our SIGCOMM paper support the
>opposite choice.  Unlike lightning, delta opportunities tend to strike
>multiple times in the same place.  That is, one is likely to see a
>sequences of references to a given URL, each resulting in a new instance,
>and each one expressable as a small delta from the previous one.

H) Okay, after thinking about it,  I think I agree. Here's what I think is happening...

Suppose the server  regenerates "unencoded" versions of base instances as needed. 
If  a given base instance was delta-encoded (that is, the base instance was a 
difference file from an earlier 226 response), then this "unencoding" will involve
undifferencing.

Thus, as long as the "base instance for this delta-encoded base instance" is available,
using "delta responses" as base instances for future delta responses
is okay, and does not break the "use unencoded instances when 
computing/applying deltas" rule.

But it looks awfully messy -- if you get a sequence of these, to compute the
"unencoded" base instance,  you might have to do recursive undifferencing 
of a set of base instances (where all but the first base instance of the set was
delta encoded).  Lose one of them, and you are out of luck.
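
A minimal Python sketch of the undifferencing chain described above. The
cache layout, the materialize() name, and the append-only toy delta are
all assumptions made for illustration; a real implementation would plug in
vcdiff, gdiff, or diff -e instead.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical cache entry: (etag of the delta's base instance, or None when
# the stored body is a complete instance; the stored bytes).
Entry = Tuple[Optional[str], bytes]


def toy_apply_delta(base: bytes, delta: bytes) -> bytes:
    """Toy delta format (assumption): the delta is just bytes to append."""
    return base + delta


def materialize(
    etag: str,
    cache: Dict[str, Entry],
    apply_delta: Callable[[bytes, bytes], bytes] = toy_apply_delta,
) -> bytes:
    """Rebuild the complete instance named by etag.

    Walks back through delta-encoded bases until a complete instance is
    found, then re-applies the deltas oldest first.
    """
    pending: List[bytes] = []
    current: Optional[str] = etag
    while current is not None:
        if current not in cache:
            # "Lose one of them, and you are out of luck."
            raise KeyError(f"missing base instance {current!r}")
        base_etag, body = cache[current]
        pending.append(body)
        current = base_etag
    instance = pending.pop()  # the only complete instance in the chain
    while pending:
        instance = apply_delta(instance, pending.pop())
    return instance


cache: Dict[str, Entry] = {
    "abc": (None, b"v1"),    # complete instance
    "def": ("abc", b"+v2"),  # delta against "abc"
    "ghi": ("def", b"+v3"),  # delta against "def"
}
print(materialize("ghi", cache))
```

The loop is iterative rather than recursive, so a long chain costs CPU
time but not stack depth. It does not guard against a cyclic chain.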

>The trick is that if the client has to decode the instances
>before applying a delta, when it caches the result, it should (in effect)
>re-encode the new instance before storing it.
>That then should give a consistent interpretation to the
>Content-Encoding header of the cached new instance - it has the same
>value as the Content-Encoding header of the 226 (Delta) response.

I) Depends if you'll ever use the "encoded" base instance.  It seems 
a lot cheaper to retain the unencoded base instance, and risk losing 
some flexibility.

>One might object that this is wasteful because it consumes
>CPU time at the cache (client or proxy).  But we would
>generally like to trade CPU time for bytes-on-the-wire,
>because Moore's law continues to make CPU time cheaper
>relative to bandwidth.

J) If the recursion gets deep enough, it could be a lot of cpu time!

Personally, in the near future I won't be adding support for delta encoding 
"when the base instance is a delta encoded response" -- it's kind of
scary thinking about the details! Perhaps that loses some efficiency.

This "personal evidence" suggests a proviso:
     When a client specifies (through If-None-Match) base-instances 
     that may be used to form a delta response, it may include
     base-instances that are delta-encoded, and it SHOULD include at 
     least one non-delta-encoded base instance.



-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Fri Mar 24 16:10:54 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA28224; Fri, 24 Mar 2000 16:10:54 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA30609; Fri, 24 Mar 2000 16:10:54 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA26440; Fri, 24 Mar 2000 16:10:54 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003250010.QAA26440@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: instance redux**2 
In-Reply-To: Your message of "Fri, 24 Mar 2000 18:36:55 EST."
             <200003242337.SAA09558@lycanthrope.crosslink.net> 
Date: Fri, 24 Mar 2000 16:10:54 -0800
X-Mts: smtp

<danielh@crosslink.net> writes:

    I comment on jeff's most recent message below, but here's what my
    current thoughts and questions (mostly as a result of struggling
    with Jeff's comments)....
    
I think it's not really profitable for us to continue exchanging
lengthy email messages until I can put my re-design into a
self-contained, clear statement.  You shouldn't have to struggle
with it.

So rather than trying to address your comments, I'm off working
on that.  It probably won't be ready until early next week.

-Jeff

From mogul  Mon Mar 27 15:22:53 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA15063; Mon, 27 Mar 2000 15:22:53 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200003272322.PAA15063@wera.pa.dec.com>
To: http-delta
Subject: Proposed redesign of delta spec to account for recent bug
Date: Mon, 27 Mar 2000 15:22:53 -0800
X-Mts: smtp

This is a sketch of what I propose to do to fix the delta encoding
(and instance digest) specifications, in order to fix the ambiguity
about when the entity tag is assigned.

Note that this is still a sketch, not an actual set of changes,
so there will probably be other details that come up.  Also, at
this point I don't see any need to change the treatment of
delta-transfer-codings (i.e., hop-by-hop delta encodings),
so that's not covered here.

Please read the whole message before replying!

-Jeff

========================================================================
New/Changed definitions:

My original definition for "instance" was wrong, and should be
replaced by these two definitions:

   instance         The entity that would be returned in a status-200
                   response to a GET request, at the current time, for
                   the selected variant of the specified resource, with
                   the application of zero or more content-codings, but
                   without the application of any instance manipulations
                   or transfer-codings.

   instance manipulation
                   An operation on one or more instances which may
                   result in an instance being conveyed from server to
                   client in parts, or in more than one response
                   message.  For example, a range selection or a delta
                   encoding.  Instance manipulations are end-to-end, and
                   often involve the use of a cache at the client.

========================================================================
HTTP response generation pipeline:

Once the definition for "instance" is cleared up, the processing
pipeline for HTTP responses is:

	datatype	operation leading to next datatype
	========	==================================
        variant
	            |   apply content-coding
		    v
        instance
        	    |   apply instance manipulation:
                    v	        (delta encoding, range selection, etc.)
        entity-body
	            |   apply transfer-coding
		    v
        message-body

The entity tag is associated with a specific instance, not with
either a variant or an entity-body.  For strong entity tags,
at least, a given entity tag value is uniquely associated with
an instance, within a uniqueness scope.
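
The pipeline can be sketched in Python roughly as follows. Everything
here is an illustrative assumption: the function names are hypothetical,
the hash-derived entity tag is just one way a server might assign tags,
and the chunked encoder is a bare-bones stand-in for a transfer-coding.

```python
import gzip
import hashlib


def apply_content_coding(variant: bytes, coding: str) -> bytes:
    """variant -> instance."""
    if coding == "identity":
        return variant
    if coding == "gzip":
        return gzip.compress(variant, mtime=0)
    raise ValueError(f"unsupported content-coding: {coding}")


def make_etag(instance: bytes) -> str:
    """Tag the instance, i.e. after content-coding and before any
    instance manipulation. Hashing is an assumption, not a requirement."""
    return '"' + hashlib.sha1(instance).hexdigest()[:8] + '"'


def select_range(instance: bytes, first: int, last: int) -> bytes:
    """instance -> entity-body, using range selection as the manipulation."""
    return instance[first:last + 1]


def chunked(entity_body: bytes) -> bytes:
    """entity-body -> message-body, as a single chunk plus terminator."""
    out = b""
    if entity_body:
        out += b"%x\r\n%s\r\n" % (len(entity_body), entity_body)
    return out + b"0\r\n\r\n"


variant = b"<html>hello</html>"
instance = apply_content_coding(variant, "gzip")
etag = make_etag(instance)
entity_body = select_range(instance, 0, 9)
message_body = chunked(entity_body)
```

Because the tag is computed from the instance, the gzip and identity
codings of the same variant get different entity tags, while a range or
delta taken from one instance shares that instance's tag.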

========================================================================
New (semi-correct) BNF:

This is a rough idea of the new BNF necessary to support end-to-end
delta encoding, now that delta encoding is NOT being done as a form
of content-coding:

    instance-manipulation = "vcdiff" | "gdiff" | "diffe" | "gzip" | token
    
    IM = "IM" ":" #(instance-manipulation)
    
    A-IM = ("A-IM" | "Accept-IM") ":" #(instance-manipulation)

It may be possible, in theory, to describe range selection as a form
of instance manipulation, although this would require quite a bit
more syntax (to specify ranges).  Doing so might make it possible
to explicitly control whether range selection is done before or
after other instance manipulations.  I'm not sure this is worth
the effort.

It might also be worth thinking about allowing parameters to be
associated with instance-manipulation tokens, in case a particular
coding function can be parameterized.  For example,
	IM: mydiff;windowsize=37
Again, I'm not sure if this is worth the effort.

Note that instance-manipulations may be combined, so that
	IM: diffe, gzip
means that the server applied gzip to the output of diff -e.
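
A hedged Python sketch of how the ordering in an IM header might be
honoured: the sender applies the listed manipulations left to right, and
the recipient undoes them right to left. The "toydiff" token and its
prefix-based format are hypothetical stand-ins for diffe; parse_im(),
encode() and decode() are illustrative names, not part of any spec.

```python
import gzip
from typing import List


def parse_im(value: str) -> List[str]:
    """Split an IM / A-IM header value into lower-cased tokens."""
    return [tok.strip().lower() for tok in value.split(",") if tok.strip()]


def toy_delta(base: bytes, new: bytes) -> bytes:
    """Toy delta: length of the shared prefix, ':', then the rest of new."""
    n = 0
    while n < min(len(base), len(new)) and base[n] == new[n]:
        n += 1
    return str(n).encode("ascii") + b":" + new[n:]


def toy_apply(base: bytes, delta: bytes) -> bytes:
    n, _, tail = delta.partition(b":")
    return base[:int(n.decode("ascii"))] + tail


def encode(base: bytes, new: bytes, im: List[str]) -> bytes:
    """Server side: apply the manipulations in header order."""
    body = new
    for step in im:
        if step == "toydiff":
            body = toy_delta(base, body)
        elif step == "gzip":
            body = gzip.compress(body, mtime=0)
        else:
            raise ValueError(f"unknown instance-manipulation: {step}")
    return body


def decode(base: bytes, body: bytes, im: List[str]) -> bytes:
    """Client side: undo the manipulations in reverse header order."""
    for step in reversed(im):
        if step == "gzip":
            body = gzip.decompress(body)
        elif step == "toydiff":
            body = toy_apply(base, body)
        else:
            raise ValueError(f"unknown instance-manipulation: {step}")
    return body


base = b"<p>hello</p>"
new = b"<p>hello world</p>"
im = parse_im("toydiff, gzip")
wire = encode(base, new, im)
print(decode(base, wire, im) == new)
```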

========================================================================
Derivation rule:

Here is some pseudo-code that explains how a client, upon receiving
a 226 (Delta) response, should interpret it, based on the headers
in that response and in the cached base-instance response.

I have reviewed this once or twice for correctness, but since
it's pseudo-code I obviously haven't tested it.

    // variables and their meanings
    Rreceived : Just-received response [input]
    Inew : new instance derived from Rreceived [output]
    Rnew : new cachable response derived from Rreceived [output]
    Rold : some previously received response [temporary]
    Ccoding : content-coding [temporary]
    // end of variables.

    if status_code(Rreceived) == 226_Delta then
        // find cached base-instance response
	Rold = find_response(delta_base(Rreceived));
	if (Rold == NULL) then
	    error("Missing base instance!");
	endif

	if Content_encoding(Rold) == Content_encoding(Rreceived) then
	    Inew = apply_delta(body(Rold), body(Rreceived));
	    Rnew = Inew;	// keeps Content-Encoding hdr from Rreceived
	else
	    Ccoding = Content_encoding(Rreceived);
	    Inew = apply_delta(content_decode(Rold),content_decode(Rreceived));
	    if Ccoding == identity then
		Rnew = Inew;
	    else
		Rnew = apply_content_coding(Inew, Ccoding);
	    endif
	endif
    endif

    // Note: content_decode() applies the identity transformation if
    // "Content-Encoding" header is empty or missing.
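
For readers who prefer something executable, here is a rough Python
transcription of the pseudo-code above. It is a sketch under several
assumptions: the Response class and helper names are hypothetical, only
gzip and identity content-codings are handled, and a toy prefix-based
delta stands in for vcdiff. The else-branch mirrors the pseudo-code
literally.

```python
import gzip
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Response:
    status: int
    etag: str
    body: bytes
    content_encoding: str = "identity"
    delta_base: Optional[str] = None


def content_decode(body: bytes, coding: str) -> bytes:
    # Identity transformation when there is no content-coding.
    return gzip.decompress(body) if coding == "gzip" else body


def content_encode(body: bytes, coding: str) -> bytes:
    return gzip.compress(body, mtime=0) if coding == "gzip" else body


def toy_delta(base: bytes, new: bytes) -> bytes:
    """Toy delta: length of the shared prefix, ':', then the rest of new."""
    n = 0
    while n < min(len(base), len(new)) and base[n] == new[n]:
        n += 1
    return str(n).encode("ascii") + b":" + new[n:]


def apply_delta(base: bytes, delta: bytes) -> bytes:
    n, _, tail = delta.partition(b":")
    return base[:int(n.decode("ascii"))] + tail


def derive(received: Response, cache: Dict[str, Response]) -> Response:
    """Turn a 226 response into a new cachable response (Rnew)."""
    if received.status != 226:
        return received
    old = cache.get(received.delta_base) if received.delta_base else None
    if old is None:
        raise LookupError("Missing base instance!")
    if old.content_encoding == received.content_encoding:
        # Same content-coding: apply the delta to the stored bytes as-is.
        new_body = apply_delta(old.body, received.body)
    else:
        decoded = apply_delta(
            content_decode(old.body, old.content_encoding),
            content_decode(received.body, received.content_encoding),
        )
        new_body = content_encode(decoded, received.content_encoding)
    return Response(200, received.etag, new_body, received.content_encoding)


old_text, new_text = b"<p>hello</p>", b"<p>hello world</p>"
gz_old = gzip.compress(old_text, mtime=0)
gz_new = gzip.compress(new_text, mtime=0)
cache = {'"abc"': Response(200, '"abc"', gz_old, "gzip")}
same = Response(226, '"def"', toy_delta(gz_old, gz_new), "gzip", '"abc"')
diff = Response(226, '"ghi"', toy_delta(old_text, new_text), "identity", '"abc"')
```

Here `same` has the same content-coding as the cached base, so the delta
is applied directly to the gzipped bytes, while `diff` has a different
one and takes the decode path.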

========================================================================
Examples:

Here are several examples, starting with an initial (non-delta)
request/response, and then continuing with a number of different
ways of getting the same new content.  Note that example 2 yields
entity-tag "def", while examples 3 & 4 yield entity-tag "ghi",
because the former has a non-identity content-coding while the
latter two do not.  Even so, the decoded content is the same in
all three of those examples.

Again, I have fixed all the bugs I can find in these examples,
but that doesn't mean that I found them all.

(1) At time 14:00:00:
    GET /example.com/foo.html HTTP/1.1
    Host: example.com
    Accept-encoding: gzip
    
    HTTP/1.1 200 OK
    Date: Wed, 24 Dec 1997 14:00:00 GMT
    Etag: "abc"
    Content-encoding: gzip

etag = abc for instance = gzip(foo.html/14:00:00)

body is gzip(foo.html/14:00:00)

(2) At time 14:01:00 - alternative #1:
    GET /example.com/foo.html HTTP/1.1
    Host: example.com
    If-none-match: "abc"
    Accept-encoding: gzip
    A-IM: vcdiff

    HTTP/1.1 226 Delta
    Date: Wed, 24 Dec 1997 14:01:00 GMT
    Etag: "def"
    Delta-base: "abc"
    Content-encoding: gzip
    IM: vcdiff

etag = def for instance = gzip(foo.html/14:01:00)

message-body is
    vcdiff_delta(gzip(foo.html/14:00:00), gzip(foo.html/14:01:00))

new cache entry is stored with Content-encoding: gzip

(3) At time 14:01:00 - alternative #2:
    GET /example.com/foo.html HTTP/1.1
    Host: example.com
    If-none-match: "abc"
    Accept-encoding: gzip
    A-IM: vcdiff

    HTTP/1.1 226 Delta
    Date: Wed, 24 Dec 1997 14:01:00 GMT
    Delta-base: "abc"
    Etag: "ghi"
    IM: vcdiff

etag = ghi for instance = identity(foo.html/14:01:00)

message-body is
    vcdiff_delta(gunzip(gzip(foo.html/14:00:00)), foo.html/14:01:00)

new cache entry is stored with Content-encoding: identity

(4) At time 14:01:00 - alternative #3:
    GET /example.com/foo.html HTTP/1.1
    Host: example.com
    If-none-match: "abc"
    Accept-encoding: gzip
    A-IM: diffe, gzip

    HTTP/1.1 226 Delta
    Date: Wed, 24 Dec 1997 14:01:00 GMT
    Delta-base: "abc"
    Etag: "ghi"
    IM: diffe, gzip

etag = ghi for instance = identity(foo.html/14:01:00)

message-body is
   gzip(diffe_delta(gunzip(gzip(foo.html/14:00:00)), foo.html/14:01:00))

new cache entry is stored with Content-encoding: identity

========================================================================

From danielh@crosslink.net  Mon Mar 27 20:51:54 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id UAA30063; Mon, 27 Mar 2000 20:51:54 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA11557; Mon, 27 Mar 2000 20:51:54 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id UAA25709
	for <http-delta@pa.dec.com>; Mon, 27 Mar 2000 20:51:53 -0800 (PST)
Received: from smtp.crosslink.net (dyn11.c5200-3.springfield.236.crosslink.net [207.199.145.76]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id XAA24402 for <http-delta@pa.dec.com>; Mon, 27 Mar 2000 23:51:46 -0500
Message-Id: <200003280451.XAA24402@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Mon, 27 Mar 2000 23:46:27 -0500
To: http-delta@pa.dec.com
Subject: redesign of delta -- some comments
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Some comments on Jeff Mogul's 3/27/00 proposal.

I) Reiterate the definition of instance

>The entity tag is associated with a specific instance, not with
>either a variant or an entity-body.  For strong entity tags,
>at least, a given entity tag value is uniquely associated with
>an instance, within a uniqueness scope.

Since it's a crucial point that RFC2616 skimps on, I'd add...
  Thus, the entity tag is assigned AFTER content coding, but BEFORE delta
  encoding, range manipulation, etc.

II) Restating the rules

The rules that I read from your pseudo code and your examples are:

[... first, some definitions: 
    base-instance: a copy of a prior instance of a resource; which can be
                   retained by both client and server
]

The client side rules:
   a) Instances are stored, and are identified by etags.
      Per definition, instances are always post content-coding, but pre
      delta encoding.
      In addition to storing the body of the instance, the
      content-encoding must also be stored.
   b) When a delta-response is received, the undifferencing rules are:
      i) compare the content-encoding of the client's copy of the base
         instance and the content-encoding of the delta-response.
      ii.a) If they are the same, recreate the current instance by
            applying the difference (that is, the body of the
            delta-response) to the base instance -- do NOT decode the
            base instance beforehand.
      ii.b) If they are different,
        ii.b.1) Decode the base instance
        ii.b.2) Recreate an unencoded version of the current instance by
                applying the difference to this decoded base instance,
        ii.b.3) Content-encode (using the response's content-encoding)
                this recreated current instance, and save (using the
                current etag)
  Note that step a necessitates step ii.b.3
  Also note that applying the "im" decoding may be a several step process
  -- for example, the response may first need to be gunzip'ed, and then
  undifferenced.

When the server is forming a delta-response, it has some flexibility.
For example, if the original response had a Gzip content-encoding, the
server can:
  a) choose Gzip as a content-encoding, and compute a difference between two
     Gzipped instances
  b) choose Gzip as an IM, and compute a difference between un-encoded
     instances.

III) A possible shortcoming

Content-encoding wise, this is an all-or-nothing strategy. The server either
uses the instances with all its content-codings, or with none of them.
This may miss some opportunities. For example:
   Consider a "qparse" content-encoding, that (in contrast to most
   compression algorithms) is amenable to differencing.
   Hence, it might be optimal to compute the delta after qparsing 
   (but before compression).
   Furthermore, assume "qparse" actually increases the size of 
   the response, hence should be combined with compression.

Example:
(1) At time 14:00:00:
    GET /example.com/foo.html HTTP/1.1
    Host: example.com
    Accept-encoding: qparse,gzip
    
    HTTP/1.1 200 OK
    Date: Wed, 24 Dec 1997 14:00:00 GMT
    Etag: "abc"
    Content-encoding: qparse,gzip

  etag = abc for instance = gzip(qparse(foo.html/14:00:00))

  message-body is gzip(qparse(foo.html/14:00:00))

How might the client and server agree to compute a delta on a qparsed response?
For example, to return:
    diffe_delta(gunzip(gzip(qparse(foo.html/14:00:00))),
                qparse(foo.html/14:01:00))

I don't see how it can be done with content codings.  
Perhaps if qparse could be an IM? 

Also, if qparse is an expensive operation that a powerful server 
is willing to undertake (but thin clients may wish to avoid),
it may be burdensome to require that the client
reconstruct the "instance" with all of the content-encodings.



-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Tue Mar 28 15:26:06 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA18628; Tue, 28 Mar 2000 15:26:06 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA08567; Tue, 28 Mar 2000 15:26:05 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA19123; Tue, 28 Mar 2000 15:26:05 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003282326.PAA19123@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: redesign of delta -- some comments 
In-Reply-To: Your message of "Mon, 27 Mar 2000 23:46:27 EST."
             <200003280451.XAA24402@lycanthrope.crosslink.net> 
Date: Tue, 28 Mar 2000 15:26:05 -0800
X-Mts: smtp

Daniel and I seem to have more or less converged, but there
are a few items from his most recent message that I should
respond to:

    The client side rules:
       a) Instances are stored, and are identified by etags.
	  Per definition, instances are always post content-coding, but pre 
	  delta encoding.
	  In addition to storing the body of the instance, the
	  content-encoding must also be stored.
       b) When a delta-response is received, the undifferencing rules are:
	  i) compare the content-encoding of the client's copy of the base 
	     instance and the content-encoding of the delta-response. 
	  ii.a) If they are the same, recreate the current instance by
		applying the difference (that is, the body of the
		delta-response)	to the base instance -- do NOT decode
		the base instance beforehand. 
	  ii.b) If they are different,
	    ii.b.1) Decode the base instance
	    ii.b.2) Recreate an unencoded version of the current instance by 
		    applying the difference to this decoded base instance, 
	    ii.b.3) Content-encode (using the response's content-encoding)
		    this recreated current instance, and save (using the
		    current etag)

This seems generally right.  (I might even steal some of this for
the rewrite of the spec, once I get around to it.)

      Note that step a necessitates step ii.b.3

Well, not exactly.  If the client doesn't put the result of ii.b.2
into a cache, but only uses it to render a page, then there is no
need to go through step ii.b.3.  Further, even a cache can avoid
this step by deferring it until the next use of the cached response,
and (because many cache entries are never re-used) so might avoid
ever doing the re-encoding.

However, a proxy cache that applies the delta decoding before
forwarding the response to a client has to restore the content-coding
as sent by the origin server - i.e., has to deliver exactly the
instance that the client would have received had delta encoding
not been used.  This could be important, for example, if the client
later receives a delta or range via a different proxy, to be applied
to this instance.
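
The deferred re-encoding mentioned above might look like the following
Python sketch. The LazyEntry class and its fields are hypothetical, and
only gzip and identity are handled. It also assumes that re-encoding
reproduces the bytes the origin server would have sent; the sketch does
not verify that, and different gzip implementations or settings can
produce different output for the same input.

```python
import gzip
from dataclasses import dataclass
from typing import Optional


@dataclass
class LazyEntry:
    """Cache entry that keeps the decoded result of step ii.b.2 and only
    performs step ii.b.3 (re-encoding) when the instance is needed."""
    etag: str
    content_encoding: str
    decoded: bytes
    _encoded: Optional[bytes] = None
    encode_calls: int = 0  # bookkeeping for the example only

    def instance(self) -> bytes:
        """Return the instance as named by the etag, encoding on demand."""
        if self._encoded is None:
            self.encode_calls += 1
            if self.content_encoding == "gzip":
                self._encoded = gzip.compress(self.decoded, mtime=0)
            else:
                self._encoded = self.decoded
        return self._encoded


entry = LazyEntry('"def"', "gzip", b"<p>hello world</p>")
# Rendering can use entry.decoded directly; nothing has been encoded yet.
first = entry.instance()   # first re-use pays for the re-encoding
second = entry.instance()  # later uses get the memoised bytes
```

An entry that is never re-used never pays for the re-encoding at all,
which is the saving described above.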

    III) A possible shortcoming
    
    Content-encoding wise, this is an all-or-nothing strategy. The
    server either uses the instances with all its content-codings, or
    with none of them.

I'm not quite sure what you mean by that.

    This may miss some opportunities. For example:
       Consider a "qparse" content-encoding, that (in contrast to most    
       compression algorithms) is amenable to differencing. 
       Hence, it might be optimal to compute the delta after qparsing 
       (but before compression).
       Furthermore, assume "qparse" actually increases the size of 
       the response, hence should be combined with compression.
    
    Example:
    (1) At time 14:00:00:
	GET /example.com/foo.html HTTP/1.1
	Host: example.com
	Accept-encoding: qparse,gzip
	
	HTTP/1.1 200 OK
	Date: Wed, 24 Dec 1997 14:00:00 GMT
	Etag: "abc"
	Content-encoding: qparse,gzip
    
      etag = abc for instance = gzip(qparse(foo.html/14:00:00))
    
      message-body is gzip(qparse(foo.html/14:00:00))
    
    How might the client and server agree to compute a delta on a
    qparsed response?

One possibility would be to take advantage of an apparent loophole
in the sketchy specification I suggested yesterday; do the initial
request like so:

    (1) At time 14:00:00:
	GET /example.com/foo.html HTTP/1.1
	Host: example.com
	Accept-encoding: qparse,gzip
	A-IM: gzip
	
	HTTP/1.1 200 OK
	Date: Wed, 24 Dec 1997 14:00:00 GMT
	Etag: "abc"
	Content-encoding: qparse
	IM: gzip
	Vary: A-IM
    
      etag = abc for instance = qparse(foo.html/14:00:00)
    
      message-body is gzip(qparse(foo.html/14:00:00))
    
I.e., the server has a choice (based on the client's Accept-*
headers) whether to apply the gzip as a content-coding or as
an instance-manipulation.  I hadn't originally thought of a
good reason for the server to apply gzip as an IM without
first having done a compressible delta encoding, but here it
seems to make sense.

Note, however, that in order to prevent a cache from accidentally
forwarding this status-200 response to a client that doesn't
understand IM, it has to be labelled "Vary: A-IM".  (This probably
should be done for all of my examples, although in the other cases,
the 226 response status code will prevent incorrect treatment of
misdirected delta responses.)

Once this initial response has been received, and the client
wants to get a delta as an update, we have:

    (2) At time 14:01:00:
	GET /example.com/foo.html HTTP/1.1
	Host: example.com
	Accept-encoding: qparse,gzip
	A-IM: diffe,gzip
	
	HTTP/1.1 226 Delta
	Date: Wed, 24 Dec 1997 14:01:00 GMT
	Etag: "mno"
	Content-encoding: qparse
	IM: diffe,gzip
	Vary: A-IM
    
      etag = mno for instance = qparse(foo.html/14:01:00)
    
      message-body is
	gzip(diffe_delta(qparse(foo.html/14:00:00), qparse(foo.html/14:01:00)))

I.e., because the original message used gzip as an IM, not as a
content-coding, we don't need to take that into account when
computing or applying the delta.  The cache is free to store the
initial response with the "IM: gzip" encoding, but it has to
decode this before using the response to apply a subsequent delta.
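
The two exchanges above can be walked through in a short Python sketch.
Since "qparse" is a made-up content-coding, toy_qparse() below is an
equally made-up stand-in (it puts each tag on its own line, which grows
the text), and a toy prefix-based delta stands in for diffe.

```python
import gzip


def toy_qparse(doc: bytes) -> bytes:
    """Hypothetical stand-in for qparse: one tag per line."""
    return doc.replace(b">", b">\n")


def toy_delta(base: bytes, new: bytes) -> bytes:
    """Toy delta: length of the shared prefix, ':', then the rest of new."""
    n = 0
    while n < min(len(base), len(new)) and base[n] == new[n]:
        n += 1
    return str(n).encode("ascii") + b":" + new[n:]


def toy_apply(base: bytes, delta: bytes) -> bytes:
    n, _, tail = delta.partition(b":")
    return base[:int(n.decode("ascii"))] + tail


old_doc = b"<p>hello</p>"
new_doc = b"<p>hello world</p>"

# (1) 200 response: Content-encoding: qparse, IM: gzip
base_instance = toy_qparse(old_doc)                 # what etag "abc" names
wire_200 = gzip.compress(base_instance, mtime=0)    # message-body

# (2) 226 response: Content-encoding: qparse, IM: <delta>,gzip
new_instance = toy_qparse(new_doc)                  # what etag "mno" names
wire_226 = gzip.compress(toy_delta(base_instance, new_instance), mtime=0)

# Client: undo "IM: gzip" on the cached response before using it as a
# base, then undo the IM steps of the 226 response in reverse order.
cached_base = gzip.decompress(wire_200)
rebuilt = toy_apply(cached_base, gzip.decompress(wire_226))
```

Both responses carry the same content-coding (qparse), so the client
never has to run or reverse qparse itself to apply the delta.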
    
-Jeff

From mogul  Wed Mar 29 13:13:19 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA31199; Wed, 29 Mar 2000 13:13:19 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200003292113.NAA31199@wera.pa.dec.com>
To: http-delta
Subject: Another simplification(?): removing delta transfer-codings
Date: Wed, 29 Mar 2000 13:13:18 -0800
X-Mts: smtp

You may recall that the current Internet-Drafts for delta encoding
specify ways to do it either as a content-coding or as a
transfer-coding.

There are two stated motivations for using a delta-transfer-coding
instead of a delta-content-coding:
    (1) allow hop-by-hop deltas.
    (2) allow deltas to be applied after a Range selection.

My recollection is that #2 was the real reason.

Assume we are going to replace the use of deltas as
content-codings with deltas as instance manipulations.  How does
this affect the decision to support deltas as transfer-codings?

I'm not sure there is much importance to reason #1.  After all,
either way, we have no mechanism to allow a client (or proxy)
to know whether the next-hop proxy/server on the path towards
the origin server supports any kind of delta, so I can't see
any obvious way for the client to know which form to ask for.

If a proxy does modify a request to imply support for deltas
(e.g., adding "A-IM: vcdiff" to a request that doesn't already
have it), then it's pretty clear that the proxy should apply
any delta response it receives, and convert the forwarded
response to a status-200 format.  I don't think this is any
harder than applying hop-by-hop deltas as transfer-codings.

Reason #2 was more important: we needed a way to distinguish
between "delta encoding before Range selection" and "Range
selection before delta encoding."  Under the old (erroneous)
definition of "instance", delta-content-coding always came
before Range selection, and delta transfer-coding always came
after, so this allowed the client to make its intentions clear
by specifying one or the other kind of delta encoding in its
request.

Now that delta encoding is NOT a content-coding, but rather
a form of instance manipulation, this approach doesn't work
(and I'm not sure I really ever thought through all of the
implications of Ranges and delta-transfer-codings - it might
not have worked anyway).  This forces us to find another way
to allow a client to specify which order delta encoding and range
selection should be done in.  I.e., we have to face the problem
head-on, and the trick of using transfer-codings won't help.

So I'm proposing the following mechanism, which I think is a
simplification overall (even though it has somewhat of a kludgey
flavor):

(a) Delta encoding is *always* a form of instance-manipulation,
never a content-coding or transfer-coding.

(b) Range selection is explicitly defined as a form of instance
manipulation.

(c) We define a "range" literal as part of the registered set
of instance manipulations.

(d) If a client's request includes both a Range header and
an "A-IM: <some form of delta>" request header, then in order
to specify an ordering between these two instance manipulations,
the client must include the "range" literal in the A-IM header.
For example,

	GET /foo.html HTTP/1.1
	If-None-Match: "abc"
	Host: example.com
	Range: bytes=1-100
	A-IM: vcdiff,gdiff,range

specifies that if the server does use a delta-encoding, then
it must be applied BEFORE the range selection.  The response
would then be:

	HTTP/1.1 227 Range of Delta
	Etag: "def"
	Content-Range: bytes 1-100/12345
	IM: vcdiff,range
	Vary: A-IM

to make it clear to the recipient what order the instance
manipulations were applied.  Similarly, a client that
wants the delta encoding to be applied after the Range
selection would instead send:	

	A-IM: range,vcdiff,gdiff

in its request.

If the request contains something contradictory like

 	A-IM: vcdiff,range,vcdiff

then I would argue that this is "undefined", and the server
can do whatever it wants - i.e., we don't need to specify
all of the possible bogus combinations.
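
As a rough illustration of rule (d), a server could classify an A-IM
value along these lines. This is a Python sketch only: im_order(), its
return strings, and the DELTA_FORMATS set are hypothetical names, and the
treatment of the contradictory case simply reports it as undefined.

```python
DELTA_FORMATS = {"vcdiff", "gdiff", "diffe"}


def im_order(a_im: str) -> str:
    """Classify the requested ordering of delta encoding and range
    selection from the position of the "range" literal in A-IM."""
    tokens = [t.strip().lower() for t in a_im.split(",") if t.strip()]
    if "range" not in tokens:
        return "unspecified"
    i = tokens.index("range")
    delta_before = any(t in DELTA_FORMATS for t in tokens[:i])
    delta_after = any(t in DELTA_FORMATS for t in tokens[i + 1:])
    if delta_before and delta_after:
        return "undefined"  # e.g. "vcdiff,range,vcdiff"
    if delta_before:
        return "delta-before-range"
    if delta_after:
        return "range-before-delta"
    return "range-only"


for value in ("vcdiff,gdiff,range", "range,vcdiff,gdiff", "vcdiff,range,vcdiff"):
    print(value, "->", im_order(value))
```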

Note: It might not be necessary for the server to return
the "range" token in the IM: response header.  I suspect
that the use of "Vary: A-IM" prevents any ambiguities. But
I have to think more about this issue.

Making this change greatly simplifies the Internet-Draft,
although it does introduce some slight additional mechanism.

-Jeff

From danielh@crosslink.net  Wed Mar 29 13:51:17 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA07627; Wed, 29 Mar 2000 13:51:16 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA29565; Wed, 29 Mar 2000 13:51:16 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA11823
	for <http-delta@pa.dec.com>; Wed, 29 Mar 2000 13:51:15 -0800 (PST)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id QAA30697 for <http-delta@pa.dec.com>; Wed, 29 Mar 2000 16:51:14 -0500
Message-Id: <200003292151.QAA30697@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Wed, 29 Mar 2000 16:40:14 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200003292113.NAA31199@wera.pa.dec.com>
Subject: Re: Another simplification(?): removing delta transfer-codings
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Jeffrey Mogul said:
> .... There are two stated motivations for using a delta-transfer-coding
>instead of a delta-content-coding:
>    (1) allow hop-by-hop deltas.
>    (2) allow deltas to be applied after a Range selection.
>My recollection is that #2 was the real reason.

>Assume we are going to replace the use of deltas as
>content-codings with deltas as instance manipulations.  How does this
>affect the decision to support deltas as transfer-codings?

>I'm not sure there is much important to reason #1. ....
>If a proxy does modify a request to imply support for deltas (e.g.,
>adding "A-IM: vcdiff" to a request that doesn't already have it), then
>it's pretty clear that the proxy should apply any delta response it
>receives, and convert the forwarded response to a status-200 format.  I
>don't think this is any harder than applying hop-by-hop deltas as
>transfer-codings.

I concur.  I note that during my first round of implementation of
delta, I found that it was easier to start with delta as a transfer encoding
(mostly because the rules seemed simpler).  Under the new proposal,
that's no longer a concern.

>Reason #2 was more important: we needed a way to distinguish between
>"delta encoding before Range selection" and "Range selection before delta
>encoding."  Under the old (erroneous) definition of "instance",
>delta-content-coding always came before Range selection, and delta
>transfer-coding always comes after, so this allows the client to make its
>intentions clear by specifying one or the other kind of delta encoding in
>its request.....
>So I'm proposing the following mechanism, which I think is a
>simplification overall (even though it has somewhat of a kludgey flavor):
>(a) Delta encoding is *always* a form of instance-manipulation, never a
>content-coding or transfer-coding.
>(b) Range selection is explicitly defined as a form of instance
>manipulation.
>(c) We define a "range" literal as part of the registered set of instance
>manipulations.
>(d) If a client's request includes both a Range header and
>an "A-IM: <some form of delta>" request header, then in order to specify
>an ordering between these two instance manipulations, the client must
>include the "range" literal in the A-IM header. For example,

It's a lot LESS kludgey than the old version (in the "I'll be happy to rework
my current implementation" sense).

>..... 
>Note: It might not be necessary for the server to return
>the "range" token in the IM: response header.  I suspect
>that the use of "Vary: A-IM" prevents any ambiguities. But
>I have to think more about this issue.

It seems dangerous to try and finesse the need to include Range in IM, since 
ordering (whether it is before or after a delta-code) has major importance.

 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From danielh@crosslink.net  Wed Mar 29 21:19:00 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id VAA02374; Wed, 29 Mar 2000 21:19:00 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA08434; Wed, 29 Mar 2000 21:19:00 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id VAA30340
	for <http-delta@pa.dec.com>; Wed, 29 Mar 2000 21:18:59 -0800 (PST)
Received: from smtp.crosslink.net (dyn05.c5200-1.springfield.236.crosslink.net [207.199.142.6]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id AAA21893 for <http-delta@pa.dec.com>; Thu, 30 Mar 2000 00:18:52 -0500
Message-Id: <200003300518.AAA21893@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Thu, 30 Mar 2000 00:08:28 -0500
To: http-delta@pa.dec.com
Subject: adding an IM content coding
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

It may not be necessary, but I'm wondering if there are cases where
http/1.1-but-not-IM-aware clients (or intermediate proxies) may receive
delta-coded (or otherwise IM coded) responses.

If so, the content-coding will look normal, but they won't
be able to make sense of the response. 

I suspect that if all actors are well behaved, such a receipt will not
occur. In particular, I think the Vary: A-IM should prevent such events.
But what if there is a subtle case that Vary doesn't cover, or (more
likely) a not-quite-kosher proxy?

To account for such mishaps, a special "IM" coding could be
added at the end of the content-coding. This would be discarded by
IM-aware clients. However, as an unrecognized coding, it would signal to
non-IM-aware clients that there is more here than a normal content-encoded
response.


-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul  Thu Mar 30 10:08:55 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA01412; Thu, 30 Mar 2000 10:08:55 -0800 (PST)
Message-Id: <200003301808.KAA01412@wera.pa.dec.com>
To: http-delta
From: <danielh@crosslink.net>
Reply-To: danielh@crosslink.net
Original-Date: Tue, 28 Mar 2000 21:54:27 -0500
In-Reply-To: <200003282326.PAA19123@wera.pa.dec.com>
Subject: Re: redesign of delta -- some comments
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 
Date: Thu, 30 Mar 2000 10:08:55 -0800
Sender: mogul
X-Mts: smtp

Daniel said:
>>    The client side rules: ....

Jeffrey  said:

>This seems generally right.  (I might even steal some of this for the
>rewrite of the spec, once I get around to it.)

Please do.

>>      Note that step a necessitates step ii.b.3
>Well, not exactly.  If the client doesn't put the result of ii.b.2 into a
>cache, but only uses it to render a page, then there is no need to go
>through step ii.b.3.  Further, even a cache can avoid this step by
>deferring it until the next use of the cached response, and (because many
>cache entries are never re-used) so might avoid ever doing the
>re-encoding.

That does loosen up a lot of possible problems -- the client doesn't have
to cache, or doesn't have to "reperform" the content codings, and doesn't
have to use the current instance as a future base instance.

>However, a proxy cache that applies the delta decoding before forwarding
>the response to a client has to restore the content-coding as sent by the
>origin server - i.e., has to deliver exactly the instance that the client
>would have received had delta encoding not been used.  This could be
>important, for example, if the client later receives a delta or range via
>a different proxy, to be applied to this instance.

When would that happen (given that content-coding and IM are end-to-end)?

>    This may miss some opportunities. For example:
>       Consider a "qparse" content-encoding, that (in contrast to most   
>       compression algorithms) is amenable to differencing.
>       Hence, it might be optimal to compute the delta after qparsing 
>       (but before compression).  ....
>    How might the client and server agree to compute a delta on a
>    qparsed response?

>One possibility would be to take advantage of an apparent loophole in the
>sketchy specification I suggested yesterday; do the initial request like
>so:
>    (1) At time 14:00:00:
>	GET /example.com/foo.html HTTP/1.1
>	Host: example.com
>	Accept-encoding: qparse,gzip
>	A-IM: gzip
>	
>	HTTP/1.1 200 OK
>	Date: Wed, 24 Dec 1997 14:00:00 GMT
>	Etag: "abc"
>	Content-encoding: qparse
>	IM: gzip
>	Vary: A-IM

That notion had crossed my mind, but I was concerned about
interoperability with non-IM-aware clients & servers.
Allowing gzip to be both an accept-encoding and an A-IM, the use
of Vary, and the 226 response seem to allay this concern.

>I.e., the server has a choice (based on the client's Accept-* headers)
>whether to apply the gzip as a content-coding or as an
>instance-manipulation.  I hadn't originally thought of a good reason for
>the server to apply gzip as an IM without
>first having done a compressible delta encoding, but here it seems to
>make sense.

Actually, Jeff came up with the idea of a content-coding that is designed
to facilitate parsing (thus, "quick-parse").
Note that since this sort of coding is used to speed up processing on the
client end (rather than to reduce bandwidth requirements) it would be
counterproductive to expect the client to completely decode, and then
recode, a base instance.  Hence the rationale for allowing a delta against
a "partially" decoded instance.

>Note, however, that in order to prevent a cache from accidentally
>forwarding this status-200 response to a client that doesn't understand
>IM, it has to be labelled "Vary: A-IM".  (This probably should be done
>for all of my examples, although in the other cases, the 226 response
>status code will prevent incorrect treatment of misdirected delta
>responses.)

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------

From mogul@pa.dec.com  Thu Mar 30 10:22:55 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA31599; Thu, 30 Mar 2000 10:22:55 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA18813; Thu, 30 Mar 2000 10:22:54 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA31734; Thu, 30 Mar 2000 10:22:54 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003301822.KAA31734@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: adding an IM content coding 
In-Reply-To: Your message of "Thu, 30 Mar 2000 00:08:28 EST."
             <200003300518.AAA21893@lycanthrope.crosslink.net> 
Date: Thu, 30 Mar 2000 10:22:54 -0800
X-Mts: smtp

<danielh@crosslink.net> writes:

    It may not be necessary, but I'm wondering if there are cases where
    http/1.1-but-not-IM-aware clients (or intermediate proxies) may
    receive delta-coded (or otherwise IM coded) responses.

    If so, the content-coding will look normal, but they won't be able
    to make sense of the response.

    I suspect that if all actors are well behaved, such a receipt will
    not occur. In particular, I think the Vary: A-IM should prevent
    such events.  But what if there is a subtle case that Vary doesn't
    cover, or (more likely) a not-quite-kosher proxy?

That's the main reason why we use the 226 (Delta) or 227 (Range
of Delta) response codes.  HTTP caches do not store responses if
they don't understand the response code (this is explicit in
HTTP/1.1, and apparently true for known implementations of
HTTP/1.0).

"Vary: A-IM" is not actually sufficient, since an HTTP/1.0
cache would ignore that.

    To account for such mishaps, a special "IM" coding could be added
    at the end of the content-coding. This would be discarded by
    IM-aware clients. However, as an unrecognized coding, it would
    signal to non-IM-aware clients that there is more here than a
    normal content-encoded response.

I don't think that is necessary.

By the way:

I'm toying with the idea of collapsing the two response status
codes from the existing draft ("Delta" and "Range of Delta") into
one code; e.g., 226 (Instance Manipulation Applied).  This would
require that a delta-aware proxy (one that *is* willing to cache
a Delta response) is also at least "aware" of Range, although it
would not actually need to implement Range.  So the "new" 226
would simply mean that the recipient needs to look for any of a
set of instance-manipulation-related headers in the response,
including both "IM" and "Content-Range".

The tricky part is that this wouldn't actually be sent if
the only instance manipulation were the Range selection,
which should instead yield the traditional 206 (Partial
Content) response.  But we're already stuck doing something
a little complex for combining Ranges and Deltas.
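
A minimal sketch of the status-code choice described here, assuming the
single collapsed 226 code.  The helper name `choose_status` and the
list-of-names representation are hypothetical and appear in no draft.

```python
def choose_status(applied_ims):
    """Pick a response status from the instance manipulations applied.

    `applied_ims` is the ordered list of instance-manipulation names the
    server actually applied, e.g. ["vcdiff", "range"] (an illustrative
    representation, not a wire format).
    """
    if not applied_ims:
        return 200  # nothing applied: ordinary response
    if all(im == "range" for im in applied_ims):
        return 206  # Range selection alone keeps the traditional Partial Content
    return 226      # any other manipulation, with or without a Range


print(choose_status(["range"]))            # 206
print(choose_status(["vcdiff", "range"]))  # 226
```

The recipient of a 226 would then look at both "IM" and "Content-Range"
to find out what was actually done.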

-Jeff

From danielh@crosslink.net  Thu Mar 30 11:05:50 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA19582; Thu, 30 Mar 2000 11:05:50 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA07644; Thu, 30 Mar 2000 11:05:50 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id LAA20825
	for <http-delta@pa.dec.com>; Thu, 30 Mar 2000 11:05:49 -0800 (PST)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id OAA11099 for <http-delta@pa.dec.com>; Thu, 30 Mar 2000 14:05:48 -0500
Message-Id: <200003301905.OAA11099@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Thu, 30 Mar 2000 13:56:36 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200003301822.KAA31734@wera.pa.dec.com>
Subject: Re: adding an IM content coding
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Jeff said

>I'm toying with the idea of collapsing the two response status codes from
>the existing draft ("Delta" and "Range of Delta") into one code; e.g.,
>226 (Instance Manipulation Applied).  This would require that a
>delta-aware proxy (one that *is* willing to cache a Delta response) is
>also at least "aware" of Range, although it would not actually need to
>implement Range.  So the "new" 226 would simply mean that the recipient
>needs to look for any of a set of instance-manipulation-related headers
>in the response, including both "IM" and "Content-Range".

I have not been a strong believer in the need for two codes, so collapsing
it to just a 226 doesn't bother me.

BTW
Reminder: If a
  Content-range: bytes=whatever
appears, then
  a) if IM: includes "range,vcdiff", the server is saying:
           "I computed a vcdiff delta, and then I extracted a range of this delta."
  b) if IM: includes "vcdiff", the server is saying:
         "I extracted a range from the base and current instances (perhaps after
         decoding), and then computed a delta of these ranges."

 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul  Thu Mar 30 14:57:29 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA26837; Thu, 30 Mar 2000 14:57:29 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200003302257.OAA26837@wera.pa.dec.com>
To: http-delta
Subject: Another little bug in the delta spec: when to use Vary
Date: Thu, 30 Mar 2000 14:57:29 -0800
X-Mts: smtp

The other day, I wrote a message containing this example:

        HTTP/1.1 200 OK
        Date: Wed, 24 Dec 1997 14:00:00 GMT
        Etag: "abc"
        Content-encoding: qparse
        IM: gzip
        Vary: A-IM
    
and wrote:

    Note, however, that in order to prevent a cache from
    accidentally forwarding this status-200 response to a client
    that doesn't understand IM, it has to be labelled "Vary:
    A-IM".  (This probably should be done for all of my examples,
    although in the other cases, the 226 response status code
    will prevent incorrect treatment of misdirected delta
    responses.)

That's actually sort-of-wrong in two different ways.

First of all, the response status code probably ought to be
(following my suggestion earlier today)
	HTTP/1.1 226 Instance Manipulation Applied
or perhaps, for brevity (no sense in wasting bytes!)
	HTTP/1.1 226 IM Used
This is better than using Vary to prevent a proxy cache from
incorrectly forwarding the response - especially, since Vary
does not work for HTTP/1.0 proxies!

Second, the part about "this probably should be done for all of
my examples" was half right.  In the case where the instance
manipulation is something like gzip, I don't think you need
a Vary header.  This is probably also true if the request is
a simple Range selection, since the response is self-identifying.

But if the instance manipulation is a delta encoding, then the
result implicitly depends on the entity tag in the request's
If-None-Match header.

For example, consider this sequence of events (with some
of the mandatory headers elided for simplicity):

(1) Client A sends, via a proxy
	GET /foo.html HTTP/1.1
	If-None-Match: "abc"
	A-IM: vcdiff

(2)  and the origin server responds
	HTTP/1.1 226 IM Used
	IM: vcdiff
	Etag: "ghi"

which is forwarded to A and also stored by the proxy cache.

(3) Client B sends, via the same proxy
	GET /foo.html HTTP/1.1
	If-None-Match: "def"
	A-IM: vcdiff

Can the proxy use its cached response to reply to client B?
No, because the delta is computed from the wrong base instance.

But there is nothing in the origin server's response (in step
#2) that would prevent this error.
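
The failure can be reproduced with a toy model.  This is only a sketch:
`NaiveCache` is a hypothetical cache keyed on the URL alone, and the
"delta_base" field is bookkeeping for the illustration, not a header
carried by the response in step #2.

```python
class NaiveCache:
    """Cache keyed only by URL; it ignores which base instance a delta assumes."""

    def __init__(self):
        self.entries = {}

    def store(self, url, response):
        self.entries[url] = response

    def lookup(self, url):
        return self.entries.get(url)


cache = NaiveCache()

# Steps 1-2: client A holds instance "abc"; the delta is relative to it.
cache.store("/foo.html", {"status": 226, "IM": "vcdiff", "ETag": "ghi",
                          "delta_base": "abc"})

# Step 3: client B holds instance "def" and asks for the same URL.
client_b_base = "def"
hit = cache.lookup("/foo.html")

# The naive cache reports a hit although the delta cannot be applied to "def".
wrong_base = hit is not None and hit["delta_base"] != client_b_base
print(wrong_base)  # True
```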

We could make up elaborate rules on how caching proxies handle
226 responses, but they would effectively end up being equivalent
in effect to requiring the origin server to send
	Vary:If-None-Match
in its response at step #2.  Which, unfortunately, adds 20
bytes of header to a delta-encoding response, but seems to
be the Right Thing to do.

Note that it does not appear to be sufficient to require
the use of Delta-Base in the origin server's response.  This
does allow the recipient to check the results (i.e., it
makes the responses self-identifying), but it would still
require adding special-purpose rules to the proxy implementation.
Which, I believe, is not the Right Thing.  Although we
might add a rule allowing a proxy to ignore the "Vary: If-None-Match"
if it is willing to interpret the Delta-Base header, since this
could allow a high cache hit ratio when two clients send
overlapping but non-identical lists of entity tags in their
If-None-Match headers.

For some reason, I didn't get this right in the previous
I-Ds for delta encoding.  Oops.  Although maybe it was
obvious to anyone who tried to implement origin server
support for delta encoding?

-Jeff

From danielh@crosslink.net  Thu Mar 30 19:01:46 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id TAA21888; Thu, 30 Mar 2000 19:01:46 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA07565; Thu, 30 Mar 2000 19:01:45 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id TAA20629
	for <http-delta@pa.dec.com>; Thu, 30 Mar 2000 19:01:45 -0800 (PST)
Received: from smtp.crosslink.net (dyn60.c5200-2.springfield.236.crosslink.net [207.199.142.189]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id WAA32336 for <http-delta@pa.dec.com>; Thu, 30 Mar 2000 22:01:38 -0500
Message-Id: <200003310301.WAA32336@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Thu, 30 Mar 2000 21:52:21 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200003302257.OAA26837@wera.pa.dec.com>
Subject: when to use Vary
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

>We could make up elaborate rules on how caching proxies handle 226
>responses, but they would effectively end up being equivalent in effect
>to requiring the origin server to send
>	Vary:If-None-Match
>in its response at step #2.  Which, unfortunately, adds 20
>bytes of header to a delta-encoding response, but seems to
>be the Right Thing to do.

Could an abbreviation be used? Say ...
   Vary: IFN
Non delta-aware proxies would find no match, and properly not use a cached
response.  Delta-aware proxies would know that IFN means
"if-none-match".

>For some reason, I didn't get this right in the previous
>I-Ds for delta encoding.  Oops.  Although maybe it was
>obvious to anyone who tried to implement origin server
>support for delta encoding?

Didn't think of it either. Which suggests the need for a paragraph or two
listing
 "what origin servers should do to ensure that IM-containing responses are
   properly handled by proxies of various vintages"

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Fri Mar 31 09:56:07 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id JAA09375; Fri, 31 Mar 2000 09:56:07 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA18191; Fri, 31 Mar 2000 09:56:07 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id JAA23331; Fri, 31 Mar 2000 09:56:07 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200003311756.JAA23331@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: when to use Vary 
In-Reply-To: Your message of "Thu, 30 Mar 2000 21:52:21 EST."
             <200003310301.WAA32336@lycanthrope.crosslink.net> 
Date: Fri, 31 Mar 2000 09:56:07 -0800
X-Mts: smtp

<danielh@crosslink.net> writes:
    
    >We could make up elaborate rules on how caching proxies handle 226
    >responses, but they would effectively end up being equivalent in effect
    >to requiring the origin server to send
    >	Vary:If-None-Match
    >in its response at step #2.  Which, unfortunately, adds 20
    >bytes of header to a delta-encoding response, but seems to
    >be the Right Thing to do.
    
    Could an abbreviation be used? Say ...
       Vary: IFN
    Non delta-aware proxies would find no match, and properly not use a
    cached response.  Delta-aware proxies would know that IFN means
    "if-none-match".
    
No, this is not how Vary works.  Vary says "If the request contained
the named header, then if you cache this response, you cannot use
it to answer a subsequent request unless the named header and
its value match exactly the header & value for the current request."
(Sorry, that's probably not the most elegant way to put it).
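
A rough sketch of that matching rule, with simplifying assumptions: the
function name `vary_allows_reuse` is hypothetical, header values are
compared as exact strings, a header absent from both requests counts as
matching, and "Vary: *" is not handled.

```python
def vary_allows_reuse(vary, stored_request, new_request):
    """True if every header named in Vary has the same value in both requests.

    Requests are plain dicts with lower-case header names.
    """
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    return all(stored_request.get(n) == new_request.get(n) for n in names)


request_a = {"if-none-match": '"abc"', "a-im": "vcdiff"}
request_b = {"if-none-match": '"def"', "a-im": "vcdiff"}

print(vary_allows_reuse("If-None-Match", request_a, request_b))  # False
print(vary_allows_reuse("IFN", request_a, request_b))            # True
```

Under this reading, a name such as "IFN" that appears in neither request
constrains nothing, so it would not keep the cached delta away from a
client holding a different base instance.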

And we can't use a different request header than If-None-Match,
since we want a non-delta-capable server to do a traditional
conditional request.

So it has to be "Vary: If-None-Match", period.

Or we need to insist that a proxy that caches a 226 response
needs to be fully aware of the matching rules, which might
be a possible option, but it's more complex to specify.
And we have to make the decision now, during the protocol design
phase, not after anything has been deployed.

-Jeff

From mogul@pa.dec.com  Wed Apr  5 16:45:02 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA21895; Wed, 5 Apr 2000 16:45:02 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA29356; Wed, 5 Apr 2000 16:45:02 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA23630; Wed, 5 Apr 2000 16:45:02 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200004052345.QAA23630@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Still wrestling with: when/whether to use Vary 
Date: Wed, 05 Apr 2000 16:45:02 -0700
X-Mts: smtp

I am almost done with a re-write of the Delta specification.
The one significant problem that I am still trying to resolve
is how to handle the caching of 226 (IM Used) responses.

Last week, I wrote:
    We could make up elaborate rules on how caching proxies handle
    226 responses, but they would effectively end up being equivalent
    in effect to requiring the origin server to send
	    Vary:If-None-Match
    in its response at step #2.  Which, unfortunately, adds 20
    bytes of header to a delta-encoding response, but seems to
    be the Right Thing to do.
    
    Note that it does not appear to be sufficient to require
    the use of Delta-Base in the origin server's response.  This
    does allow the recipient to check the results (i.e., it
    makes the responses self-identifying), but it would still
    require adding special-purpose rules to the proxy implementation.
    Which, I believe, is not the Right Thing.

But I gave this some more thought, and I decided that the right
approach was to either let the cache implement the "elaborate
rules" (which aren't TOO elaborate), or simply not cache 226
responses.  (A cache could store the result of decoding a 226
response, as a 200 or 206 cache entry.)

So, here are the rules that I came up with:

   A status-226 cache entry MUST NOT be used in response to a subsequent
   request under any of these conditions (a cache that never stores
   status-226 responses may ignore these tests):

      1. If any of the instance-manipulation values from the IM
         header field in the cached response do not appear in the
         subsequent request's A-IM header field.  The comparison
         between the headers is done using an exact match on each
         instance-manipulation value including any associated
         inparams values (see section 12.1).

      2. If the order of instance-manipulation values appearing in
         the cached IM header field differs from the order of that
         set of instance-manipulations in the A-IM header field of
         the subsequent request.

      3. If the cache implementation is not aware of the
         specification of any of the instance-manipulation values
         in the cached IM header field.

      4. If any of the instance-manipulation values in the cached
         IM header field is a delta-coding, and the cache entry
         includes a Base-Instance header field, and that
         Base-Instance entity tag is not one of the entity tags
         listed in an If-None-Match header field of the subsequent
         request.

      5. If any of the instance-manipulation values in the cached
         IM header field is a delta-coding, the cache entry does
         not include a Base-Instance header field, and the
         If-None-Match header field of the request that led to that
         cache entry does not match the If-None-Match header field
         of the subsequent request.

Rule #3 is the key rule - it allows us to add new instance-manipulations,
including delta-codings, rsync, and yet-to-be-determined technologies,
without worrying about whether caches would inappropriately store
the results.

Rules #1 and #2 allow a cache more flexibility (and hence potentially
a higher hit rate) than simply requiring "Vary: A-IM".  Rules #4 and
#5 allow more flexibility than simply requiring "Vary: If-None-Match".

So, for example, if Client A sends
	GET /foo.html HTTP/1.1
	If-None-Match: "abc", "def"
	A-IM: vcdiff, gdiff
and gets the response
	HTTP/1.1 226 IM Used
	Base-Instance: "def"
	ETag: "pqr"
	IM: vcdiff
which is then cached, and then Client B sends
	GET /foo.html HTTP/1.1
	If-None-Match: "def", "ghi"
	A-IM: diffe, vcdiff
the cache entry is usable as a reply.
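
The five tests can be sketched as a single predicate.  Everything named
here is an assumption made for illustration: the function `entry_usable`,
the `KNOWN_IMS` and `DELTA_CODINGS` sets, the "request-If-None-Match"
bookkeeping key, and the reading of "does not match" in rule 5 as
"different list of entity tags".  Header values are split naively on
commas.  Rule 4 is applied literally: a cached delta is reusable only if
its Base-Instance tag is among the new request's If-None-Match tags, so
the sketch checks one request that lists "def" and one that does not.

```python
KNOWN_IMS = {"vcdiff", "gdiff", "diffe", "gzip", "range"}  # assumed: what this cache understands
DELTA_CODINGS = {"vcdiff", "gdiff", "diffe"}              # assumed: which of those are deltas


def split_list(value):
    """Split a comma-separated header value into stripped, non-empty items."""
    return [item.strip() for item in (value or "").split(",") if item.strip()]


def im_name(item):
    """Name of an instance-manipulation with any ";param" suffix removed."""
    return item.split(";", 1)[0].strip()


def entry_usable(entry, request):
    """Return True unless one of the five MUST NOT conditions holds."""
    cached_ims = split_list(entry.get("IM"))
    accepted = split_list(request.get("A-IM"))
    new_tags = split_list(request.get("If-None-Match"))

    # Rule 1: every cached IM value (parameters included) must appear in A-IM.
    if any(im not in accepted for im in cached_ims):
        return False
    # Rule 2: those values must appear in the same relative order in A-IM.
    if [im for im in accepted if im in cached_ims] != cached_ims:
        return False
    # Rule 3: the cache must understand every cached IM value.
    if any(im_name(im) not in KNOWN_IMS for im in cached_ims):
        return False
    if any(im_name(im) in DELTA_CODINGS for im in cached_ims):
        base = entry.get("Base-Instance")
        if base is not None:
            # Rule 4: the base instance must be one the new client offers.
            if base not in new_tags:
                return False
        # Rule 5: with no Base-Instance, the If-None-Match fields must match.
        elif split_list(entry.get("request-If-None-Match")) != new_tags:
            return False
    return True


entry = {"IM": "vcdiff", "Base-Instance": '"def"', "ETag": '"pqr"',
         "request-If-None-Match": '"abc", "def"'}
lists_def = {"If-None-Match": '"def", "ghi"', "A-IM": "diffe, vcdiff"}
omits_def = {"If-None-Match": '"abc", "xyz"', "A-IM": "diffe, vcdiff"}

print(entry_usable(entry, lists_def))  # True
print(entry_usable(entry, omits_def))  # False
```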

If we were to require the use of Vary, e.g.:
	HTTP/1.1 226 IM Used
	Base-Instance: "def"
	ETag: "pqr"
	IM: vcdiff
	Vary:A-IM,If-None-Match
then the cached response to A's request would not be useful
for B's request, and it would be 25 bytes longer, too.
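
The 20- and 25-byte figures are consistent with counting each Vary line
as written (no space after the colon) plus its terminating CRLF.  That
this is how the figures were derived is an assumption; the helper below
is only an illustrative check.

```python
def header_line_bytes(line):
    """Bytes a header line occupies on the wire, including the trailing CRLF."""
    return len((line + "\r\n").encode("ascii"))


print(header_line_bytes("Vary:If-None-Match"))       # 20
print(header_line_bytes("Vary:A-IM,If-None-Match"))  # 25
```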
    
Also, I would expect that in many cases the 226 response
would not be cached in any case, which makes the Vary
headers superfluous overhead.

However, this part is tricky enough that I would really
appreciate it if other people could review it carefully
(either now or in a day or so, when I have a full draft
ready to go).

-Jeff

From danielh@crosslink.net  Wed Apr  5 19:03:11 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id TAA07130; Wed, 5 Apr 2000 19:03:11 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA15174; Wed, 5 Apr 2000 19:03:10 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id TAA11480
	for <http-delta@pa.dec.com>; Wed, 5 Apr 2000 19:03:09 -0700 (PDT)
Received: from smtp.crosslink.net (dyn48.c5200-1.springfield.236.crosslink.net [207.199.142.49]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id WAA05290 for <http-delta@pa.dec.com>; Wed, 5 Apr 2000 22:03:05 -0400
Message-Id: <200004060203.WAA05290@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Wed, 05 Apr 2000 21:38:21 -0300
To: http-delta@pa.dec.com
In-Reply-To: <200004052345.QAA23630@wera.pa.dec.com>
Subject: Re: Still wrestling with: when/whether to use Vary
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Jeffrey wrote:
>Last week, I wrote:
>    We could make up elaborate rules on how caching proxies handle
>    226 responses, but they would effectively end up being equivalent
>......
>was to either let the cache implement the "elaborate rules" (which aren't
>TOO elaborate), or simply not cache 226 responses.  (A cache could store
>the result of decoding a 226 response, as a 200 or 206 cache entry.)

>So, here are the rules that I came up with:

>   A status-226 cache entry MUST NOT be used in response to a subsequent
>   request under any of these conditions (a cache that never stores
>   status-226 responses may ignore these tests):

>      1. If any of the instance-manipulation values from the IM
>         header field in the cached response do not appear in the
>         subsequent request's A-IM header field.  The comparison
>         between the headers is done using an exact match on each
>         instance-manipulation value including any associated
>         inparams values (see section 12.1).
Assuming that "inparams values" are things like "mode=1" in
"foodiff;mode=1"

>      2. If the order of instance-manipulation values appearing in
>         the cached IM header field differs from the order of that
>         set of instance-manipulations in the A-IM header field of
>         the subsequent request.
Interesting -- you are carrying forward the position of the earlier draft
that "the server must adhere to the ordering of encodings supplied by the
client".
That is, if the client provides (let's assume that there are no
content-encodings)
    A-IM: gzip,gdiff
then if the server wants to use both encodings, it MUST first gzip, and
then gdiff (even though that may yield a much larger response than gdiff
first, followed by gzip).

I never much liked that requirement, so I want to be sure that you really
think it should be maintained (I'm willing to cede to your judgement, so
long as it is a "considered judgement").


>      3. If the cache implementation is not aware of the
>         specification of any of the instance-manipulation values
>         in the cached IM header field.
Good point.
For example: if "rsync" were included as an A-IM, it might be used along
with a new header (say, Rsync-Signature) to pass crucial information
(that is, instead of in If-None-Match).  By requiring the proxy to know
such intricacies of rsync as an IM coding, the IM-aware but rsync-unaware
proxy (that knows nothing about checking "Rsync-Signature") will never
inappropriately return the cached response.


>      4. If any of the instance-manipulation values in the cached
>         IM header field is a delta-coding, and the cache entry
>         includes a Base-Instance header field, and that
>         Base-Instance entity tag is not one of the entity tags
>         listed in an If-None-Match header field of the subsequent
>         request.

What if there is only one etag in the original If-None-Match, and no
Base-Instance was returned (say, since the server figures it wasn't
necessary)?  The lack of Base-Instance etag should not disallow caching. 
Your rule 5 covers some of these cases, but not all of them.

Hmm, actually, you could amend the above and say
   "If the response contained no Base-Instance header, but the
    request contained only one etag in its If-None-Match header,
    then the server may implicitly add (for internal use only) a
    Base-Instance header containing this single etag's value."

  Which would take care of this special case.

>      5. If any of the instance-manipulation values in the cached
>         IM header field is a delta-coding, the cache entry does
>         not include a Base-Instance header field, and the
>         If-None-Match header field of the request that led to that
>         cache entry does not match the If-None-Match header field
>         of the subsequent request.

Is this strictly necessary, given my above "addendum"?  I mean, will there
EVER be a case where the server does NOT include a Base-Instance header
when the request contained more than one etag in If-None-Match?


>Rule #3 is the key rule - it allows us to add new instance-manipulations,
>including delta-codings, rsync, and yet-to-be-determined technologies,
>without worrying about whether caches would inappropriately store the
>results.
Yes.  But it might be necessary to add:

    These caching rules apply to the use of delta encoding, and
    simpler compressions (such as gzip and deflate) as an Instance
    Manipulation. Future Instance Manipulations (such as rsync)
    may require their own additional rules (such as the need to
    check the rsync-signature header).

>Rules #1 and #2 allow a cache more flexibility (and hence potentially a
>higher hit rate) than simply requiring "Vary: A-IM".  Rules #4 and #5
>allow more flexibility than simply requiring "Vary: If-None-Match".
Good point.


>However, this part is tricky enough that I would really
>appreciate it if other people could review it carefully
>(either now or in a day or so, when I have a full draft
>ready to go).


One last issue, and it's more of a philosophical/political point. I've
come to see that the notion of "instance manipulation" is useful and
proper. But I'm just one lonely (and rather untested) voice.  I wonder
what longer-term players in the http spec endeavour would say.  After all,
it is a significant addition to how a request should be handled, and that
might make folks nervous (or worse, lead them to treat it as not worth the
risk of muddying the waters).

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Thu Apr  6 14:53:31 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA30617; Thu, 6 Apr 2000 14:53:31 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA27491; Thu, 6 Apr 2000 14:53:31 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA29119; Thu, 6 Apr 2000 14:53:31 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200004062153.OAA29119@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: Still wrestling with: when/whether to use Vary 
In-Reply-To: Your message of "Wed, 05 Apr 2000 21:38:21 -0300."
             <200004060203.WAA05290@lycanthrope.crosslink.net> 
Date: Thu, 06 Apr 2000 14:53:31 -0700
X-Mts: smtp

<danielh@crosslink.net> writes:
    >      2. If the order of instance-manipulation values appearing in
    >         the cached IM header field differs from the order of that
    >         set of instance-manipulations in the A-IM header field of
    >         the subsequent request.
    Interesting -- you are carrying forward the position of the earlier
    draft that "the server must adhere to the ordering of encodings
    supplied by the client".
    That is, if the client provides (let's assume that there are no
    content-encodings):
        A-IM: gzip,gdiff
    then if the server wants to use both encodings, it MUST first gzip,
    and then gdiff (even though that may yield a much larger response
    than gdiff first, followed by gzip).

Well, I would say that in this case, even if the server "wants"
to use both encodings, it needs to make a reasonable choice
between using just one (which avoids the ordering issue) or
using both, but in a non-optimal order.  My guess is that using
just one is the right thing to do here.

The problem is that if we don't give the client a means to
insist on an ordering, then the client cannot control the
semantics of the resulting delta, and so what the server "wants"
to do here could be useless to the client.
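
[Editorial sketch, not part of the original message.] The ordering
concern can be illustrated with a toy example.  Everything here is an
assumption made for illustration: toy_delta/toy_patch are a made-up
prefix-based delta (NOT gdiff or vcdiff), and zlib stands in for gzip.
The point is only that the two orders produce different bodies, and that
the client has to undo them in the matching reverse order.

```python
import zlib

def toy_delta(base: bytes, new: bytes) -> bytes:
    """Hypothetical delta: length of the shared prefix, then the differing tail."""
    n = 0
    while n < min(len(base), len(new)) and base[n] == new[n]:
        n += 1
    return n.to_bytes(4, "big") + new[n:]

def toy_patch(base: bytes, delta: bytes) -> bytes:
    """Inverse of toy_delta: rebuild the new instance from base plus delta."""
    n = int.from_bytes(delta[:4], "big")
    return base[:n] + delta[4:]

def encode(order, base: bytes, new: bytes) -> bytes:
    """Server side: apply the manipulations in the listed order."""
    if order == ("gzip", "delta"):
        # Compress both instances first, then diff the compressed forms.
        return toy_delta(zlib.compress(base), zlib.compress(new))
    if order == ("delta", "gzip"):
        # Diff the plain instances first, then compress the delta.
        return zlib.compress(toy_delta(base, new))
    raise ValueError(order)

def decode(order, base: bytes, body: bytes) -> bytes:
    """Client side: undo the manipulations in the reverse of the listed order."""
    if order == ("gzip", "delta"):
        return zlib.decompress(toy_patch(zlib.compress(base), body))
    if order == ("delta", "gzip"):
        return toy_patch(base, zlib.decompress(body))
    raise ValueError(order)
```

A body built with one order only decodes correctly when the client
assumes that same order, which is why the client needs a way to know (or
insist on) the order the server used.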

    I never much liked that requirement, so I want to be sure that you
    really think it should be maintained (I'm willing to cede to your
    judgement, so long as it is a "considered judgement").
    
I suppose one option would be to provide a way for the client
to express "ordering doesn't matter to me" - except that I think
in most cases, it actually does matter (and I haven't come up
with a non-kludgey way of expressing this).  So I'm inclined
to treat this as a "considered judgement".
    
    >      4. If any of the instance-manipulation values in the cached
    >         IM header field is a delta-coding, and the cache entry
    >         includes a Base-Instance header field, and that
    >         Base-Instance entity tag is not one of the entity tags
    >         listed in an If-None-Match header field of the subsequent
    >         request.
    
    >      5. If any of the instance-manipulation values in the cached
    >         IM header field is a delta-coding, the cache entry does
    >         not include a Base-Instance header field, and the
    >         If-None-Match header field of the request that led to that
    >         cache entry does not match the If-None-Match header field
    >         of the subsequent request.
    
    What if there is only one etag in the original If-None-Match, and
    no Base-Instance was returned (say, since the server figures it
    wasn't necessary)?  The lack of Base-Instance etag should not
    disallow caching.  Your rule 5 covers some of these cases, but not
    all of them.
    
Which cases aren't covered by rules #4 and #5?  Your "what if"
example seems to be:
    Original client sends:
	GET /foo.html HTTP/1.1
	If-None-Match: "abc"
	A-IM:vcdiff
    Server responds:
	HTTP/1.1 226 IM Used
	Etag: "pqr"
	IM:vcdiff

and then the new request (the one that *might* be a cache hit) is
    New client sends:
	GET /foo.html HTTP/1.1
	If-None-Match: "abc"
	A-IM:vcdiff

This is allowed as a cache hit by rule 4 (since there is no Delta-Base
[which is the correct header name, I screwed up when I wrote these
rules] in the cache entry), and also by rule 5 (since the If-None-Match
headers match).

Remember, rules #1-5 are the conditions where cache hits are NOT
allowed, so any situation that does NOT match any of these rules
is OK as far as caching goes.
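
[Editorial sketch, not part of the original message.] Read as code, the
five rules are five independent "no cache hit" tests.  The sketch below
assumes the headers have already been parsed into Python lists; the
function and parameter names are hypothetical, and the rule #5 comparison
treats the two If-None-Match fields as sets of entity tags, which is one
possible reading of "does not match".

```python
from typing import List, Optional, Set

def im_name(value: str) -> str:
    """Strip any inparams from an IM value: 'foodiff;mode=1' -> 'foodiff'."""
    return value.split(";", 1)[0].strip()

def cache_hit_allowed(cached_im: List[str],
                      cached_delta_base: Optional[str],
                      cached_request_inm: List[str],
                      request_a_im: List[str],
                      request_inm: List[str],
                      known_ims: Set[str],
                      delta_codings: Set[str]) -> bool:
    """Return True only if none of rules #1-#5 forbids using the cached 226."""
    # Rule 1: every cached IM value (inparams included) must appear in A-IM.
    if any(v not in request_a_im for v in cached_im):
        return False
    # Rule 2: those values must appear in A-IM in the same relative order.
    if [v for v in request_a_im if v in cached_im] != cached_im:
        return False
    # Rule 3: the cache must know the specification of every IM value.
    if any(im_name(v) not in known_ims for v in cached_im):
        return False
    uses_delta = any(im_name(v) in delta_codings for v in cached_im)
    if uses_delta and cached_delta_base is not None:
        # Rule 4: the Delta-Base etag must be listed in If-None-Match.
        if cached_delta_base not in request_inm:
            return False
    if uses_delta and cached_delta_base is None:
        # Rule 5: without Delta-Base, the If-None-Match fields must match.
        if set(cached_request_inm) != set(request_inm):
            return False
    return True

# The example above: same If-None-Match, no Delta-Base in the cache entry.
hit = cache_hit_allowed(["vcdiff"], None, ['"abc"'],
                        ["vcdiff"], ['"abc"'],
                        known_ims={"vcdiff", "gzip"},
                        delta_codings={"vcdiff"})
```

Here no rule applies, so hit is True.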

    Hmm, actually, you could amend the above [rule #4] and say
       "If the response contained no Base-Instance header, but the
        request contained only one etag in its If-None-Match header,
        then the server may implicitly add (for internal use only) a
        Base-Instance header containing this single etag's value."
    
      Which would take care of this special case.

"Server"? or do you mean "cache"?

Anyway, that's not necessary, since rule #5 already allows a
cache hit in this case.

    Is this [rule #5] strictly necessary, given my above "addendum"?  I
    mean, will there EVER be a case where the server does NOT include a
    Base-Instance header when the request contained more than one etag in
    If-None-Match?
    
No (well, not if I had remembered to call it "Delta-Base" instead
of Base-Instance!).  Delta-Base MUST be included if there is more
than one entity-tag in the If-None-Match (this has always been
in the Delta spec).
    
    >Rule #3 is the key rule - it allows us to add new instance-manipulations,
    >including delta-codings, rsync, and yet-to-be-determined technologies,
    >without worrying about whether caches would inappropriately store the
    >results.
    Yes.  But it might be necessary to add:
    
	These caching rules apply to the use of delta encoding, and
	simpler compressions (such as gzip and deflate) as an Instance
	Manipulation. Future Instance Manipulations (such as rsync)
	may require their own additional rules (such as the need to
	check the rsync-signature header).
    
I'll add something like that.

-Jeff

From mogul  Thu Apr  6 16:04:00 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA11687; Thu, 6 Apr 2000 16:04:00 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200004062304.QAA11687@wera.pa.dec.com>
To: http-delta
Subject: New Delta-encoding draft for your review
Date: Thu, 06 Apr 2000 16:04:00 -0700
X-Mts: smtp

This is NOT yet ready for submission to the IETF, but I wanted
to give the members of the http-delta list a chance to review this:
 ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-delta-04.6april2000.txt

It's actually longer than the previous (-03) draft, partly because
I added some of the explanations, clarifications, and examples
that seemed to be helpful on the mailing list discussion.   And
partly because there are some newish concepts.

Some stuff got removed, however.

-Jeff

From danielh@crosslink.net  Thu Apr  6 19:54:51 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id TAA11570; Thu, 6 Apr 2000 19:54:50 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA32054; Thu, 6 Apr 2000 19:54:49 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id TAA07746
	for <http-delta@pa.dec.com>; Thu, 6 Apr 2000 19:54:49 -0700 (PDT)
Received: from smtp.crosslink.net (dyn36.c5200-3.springfield.236.crosslink.net [207.199.145.101]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id WAA21199 for <http-delta@pa.dec.com>; Thu, 6 Apr 2000 22:54:47 -0400
Message-Id: <200004070254.WAA21199@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Thu, 06 Apr 2000 22:52:48 -0400
To: http-delta@pa.dec.com
Subject: comments on draft 4
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

I'm impressed -- it's quite coherent for a first revision!
I do have the following comments.

Comments:

On page 6 there is

  One can think of an instance as a snapshot in the life of a resource.

I think this might be confusing -- it confused me in the earlier draft. In
particular, when I read "snapshot", I think of "pre content-encoding" --
it's the actual contents that the client will see (or hear, or execute),
irrespective of content-encoding that may have been applied for
transmission/parsing/whatever efficiencies.

Somehow, we need wording that emphasizes that this "life moment" is after
content encoding (after all, content encoding can be applied on the fly).
If this is too cumbersome to say (and I can't think of a good way to say
it), we'd be better off dropping this otherwise pithy-in-a-good-way
statement.


Page 14:
      - It already has a cached response for that resource, whose
        entity tag is ``123xyz''.

To remind the reader that "instance" is the relevant concept, how about
      - It already has a cached response for that resource (that is,
        a cached instance), whose entity tag is ``123xyz''.

Page 16:

   transmission of unnecessary bytes, and this Reason-phase should not

should that be "reason-phrase"??


Page 21

  This response tells the client to apply the delta to the cached
   response with entity tag ``337pey'', and to associate the entity tag
   ``1acl059'' with the result.

It might be a bit redundant, but I'd add:

  This response tells the client to apply the delta to the cached
   response with entity tag ``337pey'', and to associate the entity tag
   ``1acl059'' with the result.

        Note that ``1acl059'' refers to the result of applying the delta
        to the cached response; it does NOT refer to the delta itself.
        That is, ``1acl059'' refers to the actual instance the server
        would have sent if delta-encoding had not been attempted.


Page 24

   Once a cluster-eligible response is cached, when the client is about
   to make a subsequent request, it would match the request-URI against
   all of the URL-prefixes in its cache.  The ``If-None-Match'' field in
   its request could then list the entity tags for all of the matching
   entries.  In some cases, it might be more efficient to list only a
   subset (such as the most recently received cache entries), to avoid
   excessive request header lengths.

If I correctly recollect earlier discussion of this point, then the  above
doesn't read broadly enough. How about...

   Once cluster-eligible responses are cached, when the client is about
   to make a subsequent request, it would:
     a) match the request-URI against all of the URL-prefixes
        in its cache;
     b) find all cache entries that start with one of these
        URL-prefixes, using only cached entries received AFTER the
        URL-prefix was identified (that is, after the response
        containing the DCluster that identifies the URL-prefix).
   The ``If-None-Match'' field in ......



page 36;

      4. If any of the instance-manipulation values in the cached
         IM header field is a delta-coding, and the cache entry
         includes a Delta-Base header field, and that Delta-Base
         entity tag is not one of the entity tags listed in an
         If-None-Match header field of the subsequent request.


I would advocate modifying 4, to say

      4. If any of the instance-manipulation values in the cached
         IM header field is a delta-coding, and the cache entry
         includes a Delta-Base header field, and that Delta-Base
         entity tag is not one of the entity tags listed in an
         If-None-Match header field of the subsequent request.
              
         In some cases, a cache may implicitly define a Delta-Base header
         when the server neglects to add one. In particular, this applies
         when the server sends a delta response to a request that
         specified only one etag in an If-None-Match request header.
              

page 38
     1. If both the new (delta) response and the cached response
         have exactly the same set of content-codings, the client
         applies the delta response to the cached response without
         removing the content-codings from either response.

Might it be better to say "instance" instead of "response"?

    2. If the new (delta) response and the cached response have a
         different set of content-codings, the client decodes the
         content-codings from both the delta response and the
         cached response, before applying the delta.

I'd add:

    2. If the new (delta) response and the cached response have a
         different set of content-codings, the client decodes the
         content-codings from both the delta response and the
         cached response, before applying the delta. This implies
         that the server created the delta by first content-decoding the
         current and base instances, and then computing the delta.

Page 39
   The body of this response would be the result of
   VCDIFF_DELTA(GUNZIP(GZIP(foo.html;"abc")), foo.html;"ghi"), or more
   simply VCDIFF_DELTA(foo.html;"abc", foo.html;"ghi").  The client
   would store as a new cache entry the entity foo.html;"ghi" (i.e.,
   without any content-coding), after recovering that entity by applying
   the delta to its previous cache entry.

I'd add:

   The body of this response would be the result of
   VCDIFF_DELTA(GUNZIP(GZIP(foo.html;"abc")), foo.html;"ghi"), or more
   simply VCDIFF_DELTA(foo.html;"abc", foo.html;"ghi").  The client
   would store as a new cache entry the entity foo.html;"ghi" (i.e.,
   without any content-coding), after recovering that entity by applying
   the delta to the un-GZIP'ed version of its previous cache entry.

pg 41
   Note that a client might accept compression either as a
   content-coding or as an instance-manipulation.  For example:

       Accept-Encoding: gzip
       A-IM: gzip, diffe

Since diffe doesn't work on binary files, I'd change diffe to gdiff.




-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul  Fri Apr 14 13:06:28 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA07652; Fri, 14 Apr 2000 13:06:28 -0700 (PDT)
Message-Id: <200004142006.NAA07652@wera.pa.dec.com>
To: http-delta
From: <danielh@crosslink.net>
X-Originally-To: <mogul@pa.dec.com>
Reply-To: danielh@crosslink.net
X-Original-Date: Thu, 06 Apr 2000 20:48:24 -0400
In-Reply-To: <200004062153.OAA29119@wera.pa.dec.com>
Subject: Re: Still wrestling with: when/whether to use Vary
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 
Date: Fri, 14 Apr 2000 13:06:28 -0700
Sender: mogul
X-Mts: smtp

[Note from Jeff: I'm remailing this to the list, because it
appears that Daniel meant to send it to everyone.  Sorry
for the delay, I've been swamped this week.]

Daniel said 
>>    ....then if the server wants to use both encodings, it MUST first gzip,
>>    and then gdiff (even though that may yield a much larger response
>>    than gdiff first, followed by gzip).

Jeffrey responded:
>Well, I would say that in this case, even if the server "wants" to use
>both encodings, it needs to make a reasonable choice between using just
>one (which avoids the ordering issue) or using both, but in a non-optimal
>order.  My guess is that using just one is the right thing to do here.

That's what I did before (just use one).

>....I suppose one option would be to provide a way for the client to express
>"ordering doesn't matter to me" - except that I think in most cases, it
>actually does matter (and I haven't come up with a non-kludgey way of
>expressing this).  So I'm inclined to treat this as a "considered
>judgement".

Good enough.  Sensible clients should be aware of these issues anyways.


>    
>    >      4. If any of the instance-manipulation values in the cached
>    >         IM header field is a delta-coding, and the cache entry
>    >         includes a Base-Instance header field, and that
>    >         Base-Instance entity tag is not one of the entity tags
>    >         listed in an If-None-Match header field of the subsequent
>    >         request.
>    
>    >      5. If any of the instance-manipulation values in the cached
>    >         IM header field is a delta-coding, the cache entry does
>    >         not include a Base-Instance header field, and the
>    >         If-None-Match header field of the request that led to that
>    >         cache entry does not match the If-None-Match header field
>    >         of the subsequent request.
>    
>>    What if there is only one etag in the original If-None-Match, and
>>    no Base-Instance was returned (say, since the server figures it
>>    wasn't necessary)?  The lack of Base-Instance etag should not
>>    disallow caching.  Your rule 5 covers some of these cases, but not
>>    all of them.
    
>Which cases aren't covered by rules #4 and #5?  Your "what if" example
>seems to be:
>    Original client sends:
>	GET /foo.html HTTP/1.1
>	If-None-Match: "abc"
>	A-IM:vcdiff
>    Server responds:
>	HTTP/1.1 226 IM Used
>	Etag: "pqr"
>	IM:vcdiff
>and then the new request (the one that *might* be a cache hit) is
>    New client sends:
>	GET /foo.html HTTP/1.1
>	If-None-Match: "abc"
>	A-IM:vcdiff

>is allowed as a cache hit by rule 4 (since there is no Delta-Base [which
>is the correct header name, I screwed up when I wrote these rules] in the
>cache entry), and also by rule 5 (since the If-None-Match headers match).

>Remember, rules #1-5 are the conditions where cache hits are NOT allowed,
>so any situation that does NOT match any of these rules is OK as far as
>caching goes.

At a later date, a client sends:
	GET /foo.html HTTP/1.1
	If-None-Match: "abc","cba"  
	A-IM:vcdiff

Here, the If-None-Match headers do NOT match, but the cache should respond
(since the implicit Delta-Base is "abc"). Which would be solved by ...

>>    Hmm, actually, you could amend the above [rule #4] and say
>>       "If the response contained no Base-Instance header, but the
>>        request contained only one etag in its If-None-Match header,
>>        then the server may implicitly add (for internal use only) a
>>        Base-Instance header containing this single etag's value."
>>    
>>      Which would take care of this special case.

>"Server"? or do you mean "cache"?
Oops, you are right. I meant "cache" (or "proxy" server)



-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


------- End of Forwarded Message


From mogul  Fri Apr 14 13:14:07 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA16910; Fri, 14 Apr 2000 13:14:06 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200004142014.NAA16910@wera.pa.dec.com>
To: http-delta
Subject: Forwarded from Koen Holtman: comments on new delta draft
Date: Fri, 14 Apr 2000 13:14:06 -0700
X-Mts: smtp


------- Forwarded Message

Return-Path: koen@win.tue.nl
From: koen@win.tue.nl (Koen Holtman)
Message-Id: <200004121825.UAA09007@wsooti09.win.tue.nl>
Subject: comments on new delta draft
To: mogul@pa.dec.com (Jeffrey Mogul)
Date: Wed, 12 Apr 2000 20:25:41 +0200 (MET DST)

Hi Jeff,

I have read about half of the new delta encoding draft (intermediate
version dated 6 april) now, and want to give some advance comments.
Please forward these to the appropriate discussion list.

Overall, what I have seen of the design looks sound.  To my taste it
is quite heavy for an optimisation mechanism, I would tend towards
simplifying the thing by not handling all range scenarios, but that is
my personal feeling so you don't need to pay any attention to that.

Section 3 looks like a sound formalisation of response production in
HTTP/1.1 to me, I don't expect that there will be any interoperability
problems if you go with this model.  Your model implies that a
'variant' is a thing that does not have a content encoding but may
have one applied to it later on -- I think it is closer to the
original intention of HTTP/1.1 if you describe a `variant' as
something that already has a content-encoding or not -- i.e. the
content-encoding choice gets made during variant selection.  But in
any case the exact meaning of the term 'variant' has no impact on what
your specification requires on the wire, as far as I can see, so this
is not a critical thing to get right.

The delta mechanism allows a new response to be generated by merging
two cached responses (or one cached and one current).  In that case
the question arises what age and which cache-control header the
new response should have.  You should formalise this somewhere,
e.g. specify that the age should be the max() of the two ages, and the
cache control headers are those of the oldest(?) response.  Finding the
best rules may require some thought.  All this may of course be in the
part of the draft I have not read yet.

I am unhappy with the security considerations section.  I think
it is essential that the draft _requires_ all parties to implement a
watertight protection against the spoofing attacks you describe.   The
current language makes security optional.

I have been thinking about how to prevent spoofing.  The cryptographic
checksum method you describe protects against some things, though I
have not yet closely studied it for holes.  It seems to me that it
does not protect against the following attack:

1) victim.org has copyrighted content at victim.org/x.html

2) an end user does a request on attacker.org/y.html.  

3) attacker.org wants the user to see some of the copyrighted content
at victim.org/x.html as part of its own web page attacker.org/y.html,
without the attacker.org site ever literally sending this copyrighted
content from victim.org in a HTTP response it generates.  This might
make attacker.org immune to prosecution for copyright violation by
victim.org, while attacker.org still has the benefit of putting its
brand and advertising on material created by victim.org.

4) it looks to me that 3) can be achieved by attacker.org sending a
delta response with a Dcluster header that includes victim.org/x.html,
and the etag of the 'current' victim.org/x.html response, together
maybe with the right cryptographic checksum.  The browser (or
intermediate proxy), if it has cached victim.org/x.html earlier of
course, will then take that copyrighted material, apply the delta, and
display it to the user.

In any case, I think it is easy to have spoofing protection against
many attacks, including the one above, using the following rule:

- if a cache has a delta response from URL X and wants to apply this
  to a response from URL Y, then this MUST only be done if

  a) X and Y are octet-wise identical

  or

  b) X has a Dcluster or Dtemplate header that points to or includes Y 
     _AND_
     Y has a Dcluster or Dtemplate header that points to or includes X

So in case b), material can only be merged if both sources explicitly
recognise the existence of the other as a 'partner'.
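
[Editorial sketch, not part of the original message.] The proposed rule
amounts to a mutual-recognition check.  The sketch below assumes that
"points to or includes" can be modelled as a URL-prefix match against the
prefixes each response advertises in its Dcluster or Dtemplate header;
the function names and that matching model are hypothetical.

```python
from typing import Iterable

def lists_partner(prefixes: Iterable[str], url: str) -> bool:
    """True if any advertised URL-prefix covers the given URL."""
    return any(url.startswith(p) for p in prefixes)

def may_apply_delta(x_url: str, x_prefixes: Iterable[str],
                    y_url: str, y_prefixes: Iterable[str]) -> bool:
    """Decide whether a delta response from X may be applied to a response from Y."""
    # Case a): X and Y are octet-wise identical URLs.
    if x_url == y_url:
        return True
    # Case b): X names Y as a partner AND Y names X as a partner.
    return lists_partner(x_prefixes, y_url) and lists_partner(y_prefixes, x_url)

# The attack above: attacker.org names victim.org, but not the other way round.
allowed = may_apply_delta("http://attacker.org/y.html", ["http://victim.org/"],
                          "http://victim.org/x.html", [])
```

Because victim.org never names attacker.org as a partner, the check
fails and the delta is not applied.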

That is all I have for now.

Koen.  


------- End of Forwarded Message


From mogul@pa.dec.com  Fri Apr 14 14:00:27 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA20036; Fri, 14 Apr 2000 14:00:27 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA25494; Fri, 14 Apr 2000 14:00:27 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA18249; Fri, 14 Apr 2000 14:00:26 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200004142100.OAA18249@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: Still wrestling with: when/whether to use Vary 
In-Reply-To: Your message of "Fri, 14 Apr 2000 13:06:28 PDT."
             <200004142006.NAA07652@wera.pa.dec.com> 
Date: Fri, 14 Apr 2000 14:00:26 -0700
X-Mts: smtp

Regarding this:
>    
>    >      4. If any of the instance-manipulation values in the cached
>    >         IM header field is a delta-coding, and the cache entry
>    >         includes a Base-Instance header field, and that
>    >         Base-Instance entity tag is not one of the entity tags
>    >         listed in an If-None-Match header field of the subsequent
>    >         request.
>    
>    >      5. If any of the instance-manipulation values in the cached
>    >         IM header field is a delta-coding, the cache entry does
>    >         not include a Base-Instance header field, and the
>    >         If-None-Match header field of the request that led to that
>    >         cache entry does not match the If-None-Match header field
>    >         of the subsequent request.
>    
>>    What if there is only one etag in the original If-None-Match, and
>>    no Base-Instance was returned (say, since the server figures it
>>    wasn't necessary)?  The lack of Base-Instance etag should not
>>    disallow caching.  Your rule 5 covers some of these cases, but not
>>    all of them.
    
>Which cases aren't covered by rules #4 and #5?  Your "what if" example
>seems to be:
>    Original client sends:
>	GET /foo.html HTTP/1.1
>	If-None-Match: "abc"
>	A-IM:vcdiff
>    Server responds:
>	HTTP/1.1 226 IM Used
>	Etag: "pqr"
>	IM:vcdiff
>and then the new request (the one that *might* be a cache hit) is
>    New client sends:
>	GET /foo.html HTTP/1.1
>	If-None-Match: "abc"
>	A-IM:vcdiff

>is allowed as a cache hit by rule 4 (since there is no Delta-Base [which
>is the correct header name, I screwed up when I wrote these rules] in the
>cache entry), and also by rule 5 (since the If-None-Match headers match).

>Remember, rules #1-5 are the conditions where cache hits are NOT allowed,
>so any situation that does NOT match any of these rules is OK as far as
>caching goes.

<danielh@crosslink.net> writes:

    At a later date, a client sends:
	    GET /foo.html HTTP/1.1
	    If-None-Match: "abc","cba"  
	    A-IM:vcdiff
    
    Here, the If-None-Match headers do NOT match, but the cache should respond
    (since the implicit Delta-Base is "abc"). Which would be solved by ...

Ah, good point.  This is an optimization that might be useful
in some cases.  I see several fairly straightforward ways to
solve this:
	(1) say that servers "SHOULD" send Delta-Base (instead
	of "MAY") for responses where the base instance is
	unambiguously implicit in the request headers.
	(2) say that proxies "MAY" add a Delta-Base header,
	using the implied base-instance, to responses that
	they store or forward.

I've been avoiding #1 since the first draft of the document,
since this adds on-the-wire overhead.  I'm leaning towards #2,
since I can't think of any reason why a proxy that complies
with the spec shouldn't do this.

In either case, the example you offered becomes covered by
my rule #4, without modifications.
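
[Editorial sketch, not part of the original message.] Option (2) could
look roughly like the following on the proxy side.  This assumes that
"unambiguously implicit" means the request carried exactly one entity tag
in If-None-Match, that header names are already in canonical case, and
that the proxy knows which IM values are delta-codings; the function name
and parameters are hypothetical.

```python
from typing import Dict, List, Set

def add_implied_delta_base(request_inm: List[str],
                           status: int,
                           response_headers: Dict[str, str],
                           delta_codings: Set[str]) -> Dict[str, str]:
    """Return a copy of the headers, with Delta-Base added if it is implied."""
    headers = dict(response_headers)
    # Names of the instance-manipulations used, with any inparams stripped.
    im_names = [v.split(";", 1)[0].strip()
                for v in headers.get("IM", "").split(",") if v.strip()]
    is_delta = status == 226 and any(n in delta_codings for n in im_names)
    # Only fill in Delta-Base when the base instance is unambiguous.
    if is_delta and "Delta-Base" not in headers and len(request_inm) == 1:
        headers["Delta-Base"] = request_inm[0]
    return headers

# The earlier example: one etag in the request, no Delta-Base in the response.
stored = add_implied_delta_base(['"abc"'], 226,
                                {"Etag": '"pqr"', "IM": "vcdiff"},
                                delta_codings={"vcdiff"})
```

With Delta-Base: "abc" recorded in the cache entry, a later request
listing both "abc" and "cba" in If-None-Match falls under rule #4 rather
than rule #5.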

-Jeff

From mogul@pa.dec.com  Fri Apr 14 14:31:15 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA22327; Fri, 14 Apr 2000 14:31:15 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA21450; Fri, 14 Apr 2000 14:31:14 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA23628; Fri, 14 Apr 2000 14:31:14 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200004142131.OAA23628@wera.pa.dec.com>
To: http-delta@pa.dec.com
Cc: koen@win.tue.nl (Koen Holtman)
Subject: Re: Forwarded from Koen Holtman: comments on new delta draft 
In-Reply-To: Your message of "Fri, 14 Apr 2000 13:14:06 PDT."
             <200004142014.NAA16910@wera.pa.dec.com> 
Date: Fri, 14 Apr 2000 14:31:14 -0700
X-Mts: smtp

[I'm going to address Koen's security-related comments in a
separate message.]

    Overall, what I have seen of the design looks sound.  To my taste
    it is quite heavy for an optimisation mechanism, I would tend
    towards simplifying the thing by not handling all range scenarios,
    but that is my personal feeling so you don't need to pay any
    attention to that.

We've already run into a few design bugs that resulted from
not considering a wide enough range of scenarios.  I'd rather
take the time now to cover all of the possibilities, rather
than have to fix it again (or to discover, after systems are
deployed, that we did something wrong that can't be fixed.)

    Section 3 looks like a sound formalisation of response production
    in HTTP/1.1 to me, I don't expect that there will be any
    interoperability problems if you go with this model.  Your model
    implies that a 'variant' is a thing that does not have a content
    encoding but may have one applied to it later on -- I think it is
    closer to the original intention of HTTP/1.1 if you describe a
    `variant' as something that already has a content-encoding or not
    -- i.e. the content-encoding choice gets made during variant
    selection.  But in any case the exact meaning of the term 'variant'
    has no impact on what your specification requires on the wire, as
    far as I can see, so this is not a critical thing to get right.

I'm not too concerned about sticking to the original intention
of HTTP/1.1, because I think a lot of that was (and still is)
somewhat muddled.  The "content-coding" dimension of variant
representations (as discussed in section 12 of RFC2616) never
seemed to me to belong with the other dimensions (language,
character set, format), since all of the currently-defined
content-codings are loss-free and automatically invertible,
whereas there is (for the foreseeable future) no automatic
way for a client to convert from English to Danish or vice
versa.

It might not be necessary to specify that content-coding happens
after variant selection (although compression codings surely
are applied after the *generation* of a text in a particular
natural language or character set, a selection between several
pre-compressed variant files could conceivably be done without
ordering these two steps).  So maybe my sequence is slightly
overdetermined here.  But I think it would just complicate things
to get all of these nuances into the document.

    The delta mechanism allows a new response to be generated by
    merging two cached responses (or one cached and one current).  In
    that case the question arises what age and which cache-control
    header the new response should have.  You should formalise this
    somewhere, e.g. specify that the age should be the max() of the two
    ages, and the cache control headers are those of the oldest(?)
    response.  Finding the best rules may require some thought.  All
    this may of course be in the part of the draft I have not read
    yet.

Good point.  Actually, I think section 13.5.3 (Combining Headers)
of RFC2616 already covers this, and (after a quick read) I don't
think the Delta I-D needs to do anything except to refer the
reader to that section of RFC2616.  But I'll give it a closer
look.

-Jeff

From danielh@crosslink.net  Fri Apr 14 14:46:23 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA20104; Fri, 14 Apr 2000 14:46:23 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA23226; Fri, 14 Apr 2000 14:46:23 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id OAA23809
	for <http-delta@pa.dec.com>; Fri, 14 Apr 2000 14:46:22 -0700 (PDT)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id RAA09797 for <http-delta@pa.dec.com>; Fri, 14 Apr 2000 17:46:21 -0400
Message-Id: <200004142146.RAA09797@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Fri, 14 Apr 2000 17:45:25 -0300
To: http-delta@pa.dec.com
In-Reply-To: <200004142014.NAA16910@wera.pa.dec.com>
Subject: Re: Forwarded from Koen Holtman: comments on new delta draft
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Koen said:
>The delta mechanism allows a new response to be generated by merging two
>cached responses (or one cached and one current).  In that case the
>question arises what age and which cache-control header the new response
>should have.  You should formalise this somewhere, e.g. specify that the
>age should be the max() of the two ages, and the cache control headers
>are those of the oldest(?) response.  Finding the best rules may require
>some thought.  All this may of course be in the part of the draft I have
>not read yet.

Consider that a non-delta-aware client would receive the current response.
This suggests that the cache-control should be that of the current response.
Similarly, the age should also be that of the current response.

Basically, there is no reason that a stale (or otherwise unfresh) cached instance
cannot be used as a delta-base, given that etags are unique across time (within
a cluster). So why worry about disposition, so long as the proxy or cache knows
that use of stale responses in delta requests does NOT change any
of the usual conditional-GET restrictions?
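
To make the two candidate rules concrete, here is a minimal sketch. The
Response type and both function names are hypothetical, and reading
"oldest" as "the response with the larger age" is an assumption; neither
rule is taken from the draft or from RFC2616.

```python
from dataclasses import dataclass


@dataclass
class Response:
    age: int            # seconds, as carried in the Age header
    cache_control: str  # raw Cache-Control value


def merged_oldest_rule(base, delta):
    """Koen's tentative rule: max() of the ages, Cache-Control of the older one."""
    older = base if base.age >= delta.age else delta
    return Response(age=max(base.age, delta.age),
                    cache_control=older.cache_control)


def merged_current_rule(base, delta):
    """The alternative above: take both values from the current (delta) response."""
    return Response(age=delta.age, cache_control=delta.cache_control)
```

Under the second rule a stale base instance has no effect on the freshness
of the reassembled response, which is the point being argued here.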

>I am unhappy with the security considerations section.  I think it is
>essential that the draft _requires_ all parties to implement a watertight
>protection against the spoofing attacks you describe.   The current
>language makes security optional.

>I have been thinking about how to prevent spoofing.  The cryptographic
>checksum method you describe protects against some things, though I have
>not yet closely studied it for holes.  It seems to me that it does not
>protect against the following attack:
>1) victim.org has copyrighted content at victim.org/x.html
>2) an end user does a request on attacker.org/y.html.  
>3) attacker.org wants the user to see some of the copyrighted content at
>victim.org/x.html as part of its own web page attacker.org/y.html,
>without the attacker.org site ever literally sending this copyrighted
>content from victim.org in a HTTP response it generates.  This might make
>attacker.org immune to prosecution for copyright violation by victim.org,
>while attacker.org still has the benefit of putting its brand and
>advertising on material created by victim.org.
>4) it looks to me that 3) can be achieved by attacker.org sending a delta
>response with a Dcluster header that includes victim.org/x.html, and the
>etag of the 'current' victim.org/x.html response, together maybe with the
>right cryptographic checksum.  The browser (or intermediate proxy), if it
>has cached victim.org/x.html earlier of course, will then take that
>copyrighted material, apply the delta, and display it to the user.

I'm not sure that would be a hole.  For the "attacker.org/y.html" response to be used
as a delta, the client (or proxy) would have had to have asked for a
delta (a Dcluster only defines future options).

What would be a problem is:
  a) client gets victim.org/x.html, with etag "abc"
  b) client gets attacker.org/y.html, with etag "bad1"
      attacker.org returns a dcluster that includes victim.org/x.html
  c) client asks for victim.org/x.html again, and sends if-none-match that
      includes "bad1" (since these two urls are now defined to be in the same cluster)
  d) if "bad1" happens to be an earlier etag of victim.org/x.html, then the server would apply
    a delta against its version of "bad1".  The client would then un-delta, using attacker.org's
    version of "bad1", which could cause victim.org's contents to be intermingled with
    attacker.org's contents.

A question: I can't recollect what the draft says, but it's my belief that:
  a) if a response to URL1 defines a cluster that includes URL2,
  b) and a later request to URL2 occurs (say, with etag of "pyx"),
  c) then a subsequent re-request for URL1 will NOT include "pyx".
That is, a dcluster returned with the latest instance of resourceX defines what other
responses this instance can be used as delta base for,
but does NOT define what other instances can be used as future bases for resourceX.

>In any case, I think it is easy to have spoofing protection against many
>attacks, including the one above, using the following rule:

>- - if a cache has a delta response from URL X and wants to apply this
>  to a response from URL Y, then this MUST only be done if
>  a) X and Y are octet-wise identical
>  or
>  b) X has a Dcluster or Dtemplate header that points to or includes Y 
>     _AND_
>     Y has a Dcluster or Dtemplate header that points to or includes X

Would you insist on rule b when X and Y are on the same server?

>So in case b), material can only be merged if both sources explicitly
>recognise the existence of the other as a 'partner'.
Is this a MUST, or a strong SHOULD?



 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From danielh@crosslink.net  Fri Apr 14 14:47:14 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA09014; Fri, 14 Apr 2000 14:47:14 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA11775; Fri, 14 Apr 2000 14:47:14 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id OAA20488
	for <http-delta@pa.dec.com>; Fri, 14 Apr 2000 14:47:13 -0700 (PDT)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id RAA09993 for <http-delta@pa.dec.com>; Fri, 14 Apr 2000 17:47:12 -0400
Message-Id: <200004142147.RAA09993@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Fri, 14 Apr 2000 17:45:42 -0300
To: http-delta@pa.dec.com
In-Reply-To: <200004142100.OAA18249@wera.pa.dec.com>
Subject: Re: Still wrestling with: when/whether to use Vary
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

>    At a later date, a client sends:
>	    GET /foo.html HTTP/1.1
>	    If-None-Match: "abc","cba"  
>	    A-IM:vcdiff
>    
>    Here, the If-None-Match headers do NOT match, but the cache should respond
>    (since the implicit Delta-Base is "abc"). Which would be solved by
>...

>Ah, good point.  This is an optimization that might be useful in some
>cases.  I see several fairly straightforward ways to solve this:
>	(1) say that servers "SHOULD" send Delta-Base (instead
>	of "MAY") for responses where the base instance is
>	unambiguously implicit in the request headers.
>	(2) say that proxies "MAY" add a Delta-Base header,
>	using the implied base-instance, to responses that
>	they store or forward.

>I've been avoiding #1 since the first draft of the document, since this
>adds on-the-wire overhead.  I'm leaning towards #2, since I can't think
>of any reason why a proxy that complies with the spec shouldn't do this.

I agree -- #2 should be trivial to implement.

 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Fri Apr 14 18:04:36 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA30215; Fri, 14 Apr 2000 18:04:35 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA04728; Fri, 14 Apr 2000 18:04:35 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA24689; Fri, 14 Apr 2000 18:04:35 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200004150104.SAA24689@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: Forwarded from Koen Holtman: SECURITY comments on new delta draft 
In-Reply-To: Your message of "Fri, 14 Apr 2000 13:14:06 PDT."
             <200004142014.NAA16910@wera.pa.dec.com> 
Date: Fri, 14 Apr 2000 18:04:35 -0700
X-Mts: smtp

Koen Holtman writes:

    I am unhappy with the security considerations section.  I think it
    is essential that the draft _requires_ all parties to implement a
    watertight protection against the spoofing attacks you describe.
    The current language makes security optional.

I think we need to differ on this.  I think there are probably
plenty of circumstances where the risk of successful spoofing
is less than the cost of sending the digests.  I'm open to
making this a SHOULD-level requirement, but I think making
it a MUST-level requirement does not meet the IETF's criteria
in RFC2119 (which, I admit, are somewhat ambiguous).

At any rate, I think we should probably submit for Proposed
Standard without stricter requirements, then make sure that
we get an expert security review before progressing.  I'd
rather not assume that this spoofing stuff is the only security
issue!

    I have been thinking about how to prevent spoofing.  The
    cryptographic checksum method you describe protects against some
    things, though I have not yet closely studied it for holes.  It
    seems to me that it does not protect against the following attack:

    1) victim.org has copyrighted content at victim.org/x.html

    2) an end user does a request on attacker.org/y.html.

    3) attacker.org wants the user to see some of the copyrighted
    content at victim.org/x.html as part of its own web page
    attacker.org/y.html, without the attacker.org site ever literally
    sending this copyrighted content from victim.org in a HTTP response
    it generates.  This might make attacker.org immune to prosecution
    for copyright violation by victim.org, while attacker.org still has
    the benefit of putting its brand and advertising on material
    created by victim.org.
    
I'm not sure about copyright law outside the US, although I 
believe that it's now generally the same in most industrialized
countries.  In the US, I'm pretty sure there's a legal concept
called "contributory infringement" that would certainly make
attacker.org liable (at one point many years ago, I was warned
against publishing a document with the FTP address of an
RFC repository, since at the time the copyright status of
RFCs was hazy, and by helping people download them we could
have been guilty of contributory infringement).

It's also not clear that victim.org would be liable for copyright
infringement; it depends on whether the law requires "specific
intent" (meaning: victim.org intended to violate copyright) or
"general intent" (meaning: victim.org did some action that led
to a violation, without any intention of breaking the rules).

If you really care, I can check with our lab's lawyer, who
specializes in intellectual property law.

    In any case, I think it is easy to have spoofing protection against
    many attacks, including the one above, using the following rule:
    
    - - if a cache has a delta response from URL X and wants to apply this
      to a response from URL Y, then this MUST only be done if
    
      a) X and Y are octet-wise identical
    
      or
    
      b) X has a Dcluster or Dtemplate header that points to or includes Y 
	 _AND_
	 Y has a Dcluster or Dtemplate header that points to or includes X
    
    So in case b), material can only be merged if both sources explicitly
    recognise the existence of the other as a 'partner'.
    
I'm not sure if I understand how, in general, case (b) could arise.
Can you give a *plausible* scenario with specific message headers?

-Jeff

From mogul@pa.dec.com  Fri Apr 14 18:06:03 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA24663; Fri, 14 Apr 2000 18:06:03 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA01073; Fri, 14 Apr 2000 18:06:03 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA29602; Fri, 14 Apr 2000 18:06:02 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200004150106.SAA29602@wera.pa.dec.com>
To: koen@win.tue.nl (Koen Holtman)
Cc: http-delta@pa.dec.com
Subject: Re: Forwarded from Koen Holtman: SECURITY comments on new delta draft
Date: Fri, 14 Apr 2000 18:06:02 -0700
X-Mts: smtp

[Oops, I forgot to send this to Koen; sorry for the duplication!]

Koen Holtman writes:

    I am unhappy with the security considerations section.  I think it
    is essential that the draft _requires_ all parties to implement a
    watertight protection against the spoofing attacks you describe.
    The current language makes security optional.

I think we need to differ on this.  I think there are probably
plenty of circumstances where the risk of successful spoofing
is less than the cost of sending the digests.  I'm open to
making this a SHOULD-level requirement, but I think making
it a MUST-level requirement does not meet the IETF's criteria
in RFC2119 (which, I admit, are somewhat ambiguous).

At any rate, I think we should probably submit for Proposed
Standard without stricter requirements, then make sure that
we get an expert security review before progressing.  I'd
rather not assume that this spoofing stuff is the only security
issue!

    I have been thinking about how to prevent spoofing.  The
    cryptographic checksum method you describe protects against some
    things, though I have not yet closely studied it for holes.  It
    seems to me that it does not protect against the following attack:

    1) victim.org has copyrighted content at victim.org/x.html

    2) an end user does a request on attacker.org/y.html.

    3) attacker.org wants the user to see some of the copyrighted
    content at victim.org/x.html as part of its own web page
    attacker.org/y.html, without the attacker.org site ever literally
    sending this copyrighted content from victim.org in a HTTP response
    it generates.  This might make attacker.org immune to prosecution
    for copyright violation by victim.org, while attacker.org still has
    the benefit of putting its brand and advertising on material
    created by victim.org.
    
I'm not sure about copyright law outside the US, although I 
believe that it's now generally the same in most industrialized
countries.  In the US, I'm pretty sure there's a legal concept
called "contributory infringement" that would certainly make
attacker.org liable (at one point many years ago, I was warned
against publishing a document with the FTP address of an
RFC repository, since at the time the copyright status of
RFCs was hazy, and by helping people download them we could
have been guilty of contributory infringement).

It's also not clear that victim.org would be liable for copyright
infringement; it depends on whether the law requires "specific
intent" (meaning: victim.org intended to violate copyright) or
"general intent" (meaning: victim.org did some action that led
to a violation, without any intention of breaking the rules).

If you really care, I can check with our lab's lawyer, who
specializes in intellectual property law.

    In any case, I think it is easy to have spoofing protection against
    many attacks, including the one above, using the following rule:
    
    - - if a cache has a delta response from URL X and wants to apply this
      to a response from URL Y, then this MUST only be done if
    
      a) X and Y are octet-wise identical
    
      or
    
      b) X has a Dcluster or Dtemplate header that points to or includes Y 
	 _AND_
	 Y has a Dcluster or Dtemplate header that points to or includes X
    
    So in case b), material can only be merged if both sources explicitly
    recognise the existence of the other as a 'partner'.
    
I'm not sure if I understand how, in general, case (b) could arise.
Can you give a *plausible* scenario with specific message headers?

-Jeff



From koen@win.tue.nl  Sat Apr 15 15:46:10 2000
Return-Path: <koen@win.tue.nl>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA24047; Sat, 15 Apr 2000 15:46:10 -0700 (PDT)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA08183; Sat, 15 Apr 2000 15:46:09 -0700
Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id PAA00320;
	Sat, 15 Apr 2000 15:46:07 -0700 (PDT)
Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3)
	  id AAA03499. Sun, 16 Apr 2000 00:42:21 +0200 (MET DST)
From: koen@win.tue.nl (Koen Holtman)
Message-Id: <200004152242.AAA03499@wsooti09.win.tue.nl>
Subject: Re: Forwarded from Koen Holtman: SECURITY comments on new delta draft
In-Reply-To: <200004150106.SAA29602@wera.pa.dec.com> from Jeffrey Mogul at "Apr 14, 2000  6: 6: 2 pm"
To: mogul@pa.dec.com (Jeffrey Mogul)
Date: Sun, 16 Apr 2000 00:42:20 +0200 (MET DST)
Cc: koen@win.tue.nl, http-delta@pa.dec.com
X-Mailer: ELM [version 2.4ME+ PL43 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

>[Oops, I forgot to send this to Koen; sorry for the duplication!]
>
>Koen Holtman writes:
>
>    I am unhappy with the security considerations section.  I think it
>    is essential that the draft _requires_ all parties to implement a
>    watertight protection against the spoofing attacks you describe.
>    The current language makes security optional.
>
>I think we need to differ on this.  I think there are probably
>plenty of circumstances where the risk of successful spoofing
>is less than the cost of sending the digests.  I'm open to
>making this a SHOULD-level requirement, but I think making
>it a MUST-level requirement does not meet the IETF's criteria
>in RFC2119 (which, I admit, are somewhat ambiguous).

I think a MUST on spoofing prevention might be essential if the
protocol is to pass an IETF security review.  In any case I am unhappy
with the security/efficiency tradeoff you are currently making.

[...]
>
>    In any case, I think it is easy to have spoofing protection against
>    many attacks, including the one above, using the following rule:
>    
>    - - if a cache has a delta response from URL X and wants to apply this
>      to a response from URL Y, then this MUST only be done if
>    
>      a) X and Y are octet-wise identical
>    
>      or
>    
>      b) X has a Dcluster or Dtemplate header that points to or includes Y 
>	 _AND_
>	 Y has a Dcluster or Dtemplate header that points to or includes X
>    
>    So in case b), material can only be merged if both sources explicitly
>    recognise the existence of the other as a 'partner'.
>    
>I'm not sure if I understand how, in general, case (b) could arise.
>Can you give a *plausible* scenario with specific message headers?

Oops, I failed to mention that the above rule would come paired with a
new requirement that a delta response from an url X, which refers to a
base entity that the browser could have gotten from another URL Y,
SHOULD include a Dcluster or Dtemplate header that points to Y.

To adapt an example from the draft:

1. first request/response:

  GET /foo?p=1 HTTP/1.1


      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "abc"
      DCluster: "//bar.example.net/foo?"

2. second request/response:


      GET /foo?p=2 HTTP/1.1
      Host: bar.example.net
      If-None-Match: "abc"
      A-IM: vcdiff

     HTTP/1.1 226 IM used
     Etag: "def"
     Base-etag: "abc"
     IM: vcdiff
     DCluster: "//bar.example.net/foo?"  <---new

Note the last header in the second response; this one is new.  By
sending this header, the resource /foo?p=2 acknowledges that it is
willing to use responses from /foo?p=1 as base instances, and that it
considers /foo?p=1 to be 'friendly' and in the same uniqueness domain.

With this in place the security check b) I wrote above uses the header
to make sure that /foo?p=2 is not spoofed by unfriendly servers like
xx.attacker.org.  To spoof p=2 with this in place, an attacker would
have to alter a message from p=1 or p=2 in transit, or compromise the
origin server part that is responsible for the resource p=1.

My claim is that the above rule and requirement would reduce the
spoofing risks to the usual ones of transport integrity and
man-in-the-middle attacks.  So the delta mechanism would add no new
spoofing holes to the existing web, and this is good (and in my opinion
essential).
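
The check in rule b) can be sketched as follows. This is an illustration
under stated assumptions, not draft text: a DCluster value is treated as a
quoted, scheme-less URL prefix (as in the example above), "points to or
includes" is read as a prefix match, case a) is read as the two URLs being
identical, and Dtemplate is left out because its syntax does not appear in
this thread. The function names are hypothetical.

```python
def cluster_includes(headers, url):
    """True if the DCluster value in headers is a prefix of the scheme-less url."""
    value = headers.get("DCluster")
    if not value:
        return False
    return url.startswith(value.strip('"'))


def may_apply_delta(delta_url, delta_headers, base_url, base_headers):
    """Apply a delta from delta_url to a base from base_url only if allowed."""
    if delta_url == base_url:  # case a): same URL
        return True
    # case b): each side must name the other as a partner
    return (cluster_includes(delta_headers, base_url)
            and cluster_includes(base_headers, delta_url))
```

For the two responses in the example, both carry the same DCluster value,
so the check passes; a response from xx.attacker.org that merely names a
victim URL fails because the victim's response never names the attacker.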


>
>-Jeff


Koen.


From bala@research.att.com  Sun Apr 16 11:26:26 2000
Return-Path: <bala@research.att.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA29859; Sun, 16 Apr 2000 11:26:26 -0700 (PDT)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA20012; Sun, 16 Apr 2000 11:26:26 -0700
Received: from mail-green.research.att.com (H-135-207-30-103.research.att.com [135.207.30.103])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id LAA19019
	for <http-delta@pa.dec.com>; Sun, 16 Apr 2000 11:26:25 -0700 (PDT)
Received: from raptor.research.att.com (raptor.research.att.com [135.207.23.32])
	by mail-green.research.att.com (Postfix) with ESMTP id 65D761E016
	for <http-delta@pa.dec.com>; Sun, 16 Apr 2000 14:25:39 -0400 (EDT)
Received: from research.att.com (raptor.research.att.com [135.207.23.32])
	by raptor.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id OAA58194
	for <http-delta@pa.dec.com>; Sun, 16 Apr 2000 14:25:38 -0400 (EDT)
Message-Id: <200004161825.OAA58194@raptor.research.att.com>
To: http-delta@pa.dec.com
Subject: new version
Date: Sun, 16 Apr 2000 14:25:38 -0400
From: Balachander Krishnamurthy <bala@research.att.com>


i found a few minutes to read through it. here are some concerns.
some of these could have been expressed earlier but i just did not have
the time and am sorry about that.

. am not happy with the definition of 'instance' - for one it is circular.

   instance         The entity that would be returned in a status-200
                   response to a GET request, at the current time, for
                   the selected variant of the specified resource, with
                   the application of zero or more content-codings, but
                   without the application of any instance manipulations
                   or transfer-codings.

instance manipulation is only defined later. at this point of defining
instance we should leave instance manipulations out.

.  "One can think of an instance as a snapshot in the life of a resource."

 if no one heard the tree fall in the forest, then is it an instance?
 a resource may have multiple existences but zero instance if no one 
 requested it?
 i.e., a snapshot in the life of a resource only if the snapshot was taken.
 i would leave this sentence out or correct it since the definition requires
 an entity to be requested. maybe picky, but i thought we were trying to 
 avoid 2616 problems.

. page 10 "This formalization of the HTTP message"

well it is not really formalization of THE HTTP message since instances are
not discussed in 2616. is it clear that we are talking about our notion of
HTTP message? we say just prior to instance definition
	It is too late to fix the terminological failure in the HTTP/1.1
	specification, so we instead define a new term, for use in this
	document:
should the interpretation be that everything is relative to this document
and not, say, 2616?

my specific concern with "the HTTP message" is in relation to the following
sentence in Section 4
   This formalization of the HTTP message generation sequence has not
   previously been described. 
this would lead readers (it led me) to believe that we are talking about
generic HTTP (2616) message generation sequence. of course it could
not have been described this way since 'instance' didn't exist.
please note that am not complaining about introduction of 'instance'.

. section 5.2
	However, based on the new ordering constraint proposed in
        section 12.4.5,

  new compared to what? and 12.4.5 comes later, so maybe say
	However, based on the ordering constraint discussed in section 12.4.5

. section 5.2
   We note that if a client indicates it is willing to accept deltas,
   but the server does not support this form of instance-manipulation,
   the server will simply ignore this aspect of the request.  (HTTP
   always allows an implementation to ignore a header it does not
   understand, and the specification of ``A-IM'' allows the server to
   ignore an instance-manipulation it does not understand.)

 since the server can ignore the entire header, it does not seem important
 that it is permitted to ignore  an instance-manipulation it does not understand.
 are we saying that the server can send back a 200 if 
	it doesn't understand the header
	it doesn't understand an instance-manipulation
	either of the above two

. section 5.4 line 880
   A response using delta encoding must be identified as such.  This is
   done using the ``IM'' header, specified in section 12.4.4.

 change to

   A response using delta encoding must be identified as such.  This is
   done using the ``IM'' response-header, specified in section 12.4.4. 
                         ^^^^^^^^^

. section 5.4 line 886 just a nit
   Because the Internet is full of HTTP/1.0 caches, which
   might never be entirely replaced, and because the HTTP specifications
  
 change to
   Because the Internet has a significant number of HTTP/1.0 caches, which...
	or 'overwhelmingly large' - not necessarily "full"?

.  section 5.4 line 902 - this human user noticed the typo

   transmission of unnecessary bytes, and this Reason-phase should not
   normally be seen by human users.)  

  change to
   transmission of unnecessary bytes, and this Reason-phRase should not
   normally be seen by human users.)                    

.  section 5.4 line 906
    Existing proxies apparently forward responses with unknown
    status codes, and do not attempt to cache them.

  is this a "known thing" kind of statement? do we know in practice if this
  is true? the only 1.1 proxy contact i had left that company. can someone
  check with 1.0 proxy folks? i don't mean 'experimental' proxies but products.

. section 5.6 line 957 nit
   We used this example in section 5.2: the client sends:
  to
   We used this example in section 5.2: 
    The client sends:


. section 6 line 1118
	A
      recent study suggests that ``vdelta'' is the best
      overall delta algorithm [16].

  what is the statute of limitations for 'recency' - study was done in '96
  maybe last known study?

. section 8, line 1354
    http://quote.yahoo.com/q?s=DEC&d=f
 yields
	No such ticker symbol. 
  DEC -> CPQ. if we wait long enough maybe T will change to T + AWE
  

. section 8, line 1406 
     In order to use this approach to clustering, we need to impose one
     important constraint.  HTTP/1.1 requires so-called ``strong'' entity
     tags to be unique for a given URI, but does not impose any broader
     uniqueness requirements.
  what is a 'uniqueness requirement'. sounds colloquial.
 
. section 10, line 1644
      - When the proxy receives a request from a non-delta-capable
        client, it might convert this into a delta request before
        forwarding it to the server, and then (after applying a
        resulting delta response to one of its own cache entries)
        it would return a full-body response to the client.

   (0) is assumption that non-delta-capable means client can't handle ranges?
   (1) request may not be forwarded
   (2) full-body response may not be returned (could be a 304/206 and no/partial body)

. section 12.5, line 2081
	inparams --> imparams
                      ^
that's all for now
cheers,
bala

From danielh@crosslink.net  Sun Apr 16 13:57:05 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA14426; Sun, 16 Apr 2000 13:57:05 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA27794; Sun, 16 Apr 2000 13:57:05 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA25637
	for <http-delta@pa.dec.com>; Sun, 16 Apr 2000 13:57:04 -0700 (PDT)
Received: from smtp.crosslink.net (dyn41.c5200-1.springfield.236.crosslink.net [207.199.142.42]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id QAA24106 for <http-delta@pa.dec.com>; Sun, 16 Apr 2000 16:56:58 -0400
Message-Id: <200004162056.QAA24106@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Sun, 16 Apr 2000 16:50:38 -0400
To: http-delta@pa.dec.com
In-Reply-To: <200004161825.OAA58194@raptor.research.att.com>
Subject: Re: new version
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

>   instance         The entity that would be returned in a status-200
>                   response to a GET request, at the current time, for
>                   the selected variant of the specified resource, with
>                   the application of zero or more content-codings, but
>                   without the application of any instance manipulations
>                   or transfer-codings.
>instance manipulation is only defined later. at this point of defining
>instance we should leave instance manipulations out.

I'm not sure that is a problem -- it alerts the reader to an important
feature of instances (that they may be subject to "manipulation").

And a definition of  Instance Manipulation is only a few paragraphs
further down!

>>.  "One can think of an instance as a snapshot in the life of a resource."
> if no one heard the tree fall in the forest, then is it an instance? ...

I really like the phrase, but it's too imprecise. Alas, it should be
removed.

>. section 8, line 1406 
>     In order to use this approach to clustering, we need to impose one
>     important constraint.  HTTP/1.1 requires so-called ``strong'' entity
>     tags to be unique for a given URI, but does not impose any broader
>     uniqueness requirements.
>  what is a 'uniqueness requirement'. sounds colloquial.

(that is, the etag does NOT have to be unique across different URIs). 


-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From danielh@crosslink.net  Tue Apr 18 20:37:33 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id UAA04841; Tue, 18 Apr 2000 20:37:33 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA12896; Tue, 18 Apr 2000 20:37:33 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id UAA05210
	for <http-delta@pa.dec.com>; Tue, 18 Apr 2000 20:37:32 -0700 (PDT)
Received: from smtp.crosslink.net (dyn46.c5200-2.springfield.236.crosslink.net [207.199.142.175]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id XAA16713 for <http-delta@pa.dec.com>; Tue, 18 Apr 2000 23:37:29 -0400
Message-Id: <200004190337.XAA16713@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Tue, 18 Apr 2000 23:31:18 -0300
To: http-delta@pa.dec.com
Subject: dcluster and spoofing
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

A suggestion regarding dcluster and spoofing:

The following is an alternative to Koen's solution, which 
involves some sort of verification scheme -- a scheme that
seems to me to negate much of the benefit of clustering (since at least
one non-delta response is required from "victim.org" before clustering can
begin).

By way of review ...

 The problem of spoofing with dcluster (and dtemplate) occurs when
   a) victim.org "happens" to have a base instance identified by an
      etag of "pey" that lies in the uniqueness scope of
      victim.org/foo.bar
   b) "pey" is also used as an etag for a response returned from
      malicious.org to client x
   c) a Dcluster in this response (from malicious.org) identified
      victim.org/foo.bar as being in its uniqueness scope.
 When client x then asks for victim.org/foo.bar, victim.org may
 return a delta response against its "pey" base-instance,
 which is not the same base instance as the client's
 "pey"-from-malicious.org base-instance.

One way of avoiding this problem is for the client to identify the source
of all etags that did not come from a prior request to 
victim.org/foo.bar.

Then, victim.org could choose whether these were legitimate
etags (in the sense of having been generated by servers or intra-
server content providers that are truly in the uniqueness scope of
victim.org/foo.bar).

For example, client x could provide:
   GET /foo.bar HTTP/1.1
   Host: victim.org
   A-IM: vcdiff
   If-None-Match: "def","pey","arf"
   DCluster: malicious.org="pey", malicious.org/foo.bar="arf",
             victim2.org="def"

Alternatively, to reduce the size a bit:
   DCluster: malicious.org="pey", /foo.bar="arf", victim2.org="def"

(or A-Dcluster: .... )

Where malicious.org identified victim.org/foo.bar in a Dcluster on
two previous requests, and victim2.org identified it on one request.
Presuming that victim2.org is legit, victim.org would ignore "arf"
and "pey", but use "def" (if it is available).

This does increase the size of delta requests. However, in legitimate
cases this extra header will only be sent to delta aware servers, with the
strong expectation that a delta response will be generated.
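
A rough sketch of the server-side filtering described above, in Python.
Everything in it is an assumption made for illustration: the function
names, the trusted_hosts set, and the way the DCluster request value is
parsed (only the long host="etag" form is handled, not the shortened
/foo.bar="arf" form).  It is not taken from any draft or implementation.

```python
import re


def parse_dcluster_request(value):
    """Parse 'source="etag", ...' pairs into a dict mapping etag -> source."""
    pairs = re.findall(r'([^\s,=]+)="([^"]*)"', value)
    return {etag: source for source, etag in pairs}


def usable_etags(if_none_match, dcluster_value, trusted_hosts):
    """Return the etags from If-None-Match that the server may delta against.

    An etag with no declared source is assumed to come from a prior
    request to the same URI.  An etag with a declared source is kept
    only if the host part of that source is in trusted_hosts
    (a hypothetical, server-configured set).
    """
    sources = parse_dcluster_request(dcluster_value)
    usable = []
    for etag in re.findall(r'"([^"]*)"', if_none_match):
        source = sources.get(etag)
        if source is None:
            usable.append(etag)
        elif source.split("/", 1)[0] in trusted_hosts:
            usable.append(etag)
    return usable
```

With the example request above and trusted_hosts = {"victim2.org"},
only "def" survives the filter; "pey" and "arf" are ignored.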



-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From koen@win.tue.nl  Wed Apr 19 10:41:11 2000
Return-Path: <koen@win.tue.nl>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA10223; Wed, 19 Apr 2000 10:41:10 -0700 (PDT)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA22290; Wed, 19 Apr 2000 10:41:09 -0700
Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id KAA07065
	for <http-delta@pa.dec.com>; Wed, 19 Apr 2000 10:41:08 -0700 (PDT)
Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3)
	  id TAA05872. Wed, 19 Apr 2000 19:37:18 +0200 (MET DST)
From: koen@win.tue.nl (Koen Holtman)
Message-Id: <200004191737.TAA05872@wsooti09.win.tue.nl>
Subject: Re: dcluster and spoofing
In-Reply-To: <200004190337.XAA16713@lycanthrope.crosslink.net> from "danielh@crosslink.net" at "Apr 18, 2000 11:31:18 pm"
To: danielh@crosslink.net
Date: Wed, 19 Apr 2000 19:37:18 +0200 (MET DST)
Cc: http-delta@pa.dec.com
X-Mailer: ELM [version 2.4ME+ PL43 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

>A suggestion regarding dcluster and spoofing:
>
>The following is an alternative to Koen's solution, which 
>involves some sort of verification scheme -- a scheme that
>seems to me to negate much of the benefit of clustering (since at least
>one non-delta response is required from "victim.org" before clustering can
>begin).

No, it is not the intention of my scheme to hold back
clustering-related actions by a cache until the first non-delta
response from victim.org.  If a cache gets a GET request on victim.org,
and victim.org was included in a Dcluster on a response from
attack.org with an etag E, it is my intention that the cache goes
ahead and transforms the request on victim.org into a delta request,
using the etag E.  However, the cache should only *use* the delta
response obtained from this request if it has a Dcluster or Dtemplate
pointing to attack.org, the source of the etag E.  If this
anti-spoofing check fails, then the cache will have made the delta
request for nothing, and will have to retry without the etag E which
is now known to be from a spoofing attack, but this is an
inefficiency one can live with.  Note that an initial non-delta
response from victim.org is not needed for this anti-spoofing
mechanism to work.
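
A minimal sketch of this check and the retry, in Python.  The names,
the dict-shaped response, and the send() callable are hypothetical, and
treating the Dcluster/Dtemplate values as simple string prefixes is an
assumption; none of it comes from the draft.

```python
def delta_is_verified(request_uri, etag_source_uri, response_prefixes):
    """Return True if a delta response for request_uri may be applied.

    etag_source_uri: URI of the cached response that supplied etag E.
    response_prefixes: uri-prefix values taken from the Dcluster or
    Dtemplate header on the delta response (empty if none was sent).
    """
    if etag_source_uri == request_uri:
        return True  # base instance came from the same resource
    return any(etag_source_uri.startswith(p) for p in response_prefixes)


def fetch_with_cluster_etag(send, request_uri, etag, etag_source_uri):
    """send(uri, etags) is a hypothetical callable returning a dict with
    'status' (int) and 'cluster_prefixes' (list of str)."""
    response = send(request_uri, [etag])
    if response["status"] == 226 and not delta_is_verified(
            request_uri, etag_source_uri, response["cluster_prefixes"]):
        # Check failed: E is treated as spoofed, so retry without it.
        response = send(request_uri, [])
    return response
```

The cost of a failed check is one wasted round trip, which matches the
inefficiency described above.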


>[...your discussion of an alternative system snipped to save space...]

Your alternative system would also work, I believe.

I'm usually not too worried about adding bytes to headers, but to make
a comparison, my method requires adding bytes to the delta responses
with clustering only, which looks more economical than adding bytes to
every request that could trigger a delta response with clustering.

I'm currently waiting for Jeff to publish a revised security section.
I don't care much which anti-spoofing mechanism is used in the end; my
main concern is that use of such a mechanism should be required by
default, and that it is cheap enough that people are willing to stick
to this requirement.

>
>
>-----------------------------------------------------------
>Daniel Hellerstein
>danielh@crosslink.net
>http://www.srehttp.org
>-----------------------------------------------------------
>

Koen.


From mogul  Fri Apr 21 19:03:22 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id TAA14142; Fri, 21 Apr 2000 19:03:22 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200004220203.TAA14142@wera.pa.dec.com>
To: http-delta
Subject: Status report: Delta draft
Date: Fri, 21 Apr 2000 19:03:22 -0700
X-Mts: smtp

It probably shouldn't be a surprise to anyone that I've fallen
behind on all of the comments people have been providing 
re: the latest revisions to the Delta encoding draft, still:
    ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-delta-04.6april2000.txt

Life is like that (i.e., my actual job takes priority).

Anyway, I've applied a lot of obvious edits (thanks to
all who commented), and I've created an "issues list" to
handle the rest of them.  This "issues list" mechanism
worked very well while we were editing the HTTP/1.1
specification - it ensures that open issues aren't lost,
and that one doesn't have to search through zillions of
email messages to find them.

Jim Gettys had a nice HTML-tables format for his HTTP/1.1
issues list.  I'm lazy; it's in ASCII text.  Sorry.

The current draft of the issues list is:
    ftp://ftp.digital.com/pub/DEC/WRL/mogul/issues-00.txt

and is broken down into "substantive issues" and "editorial
issues".  "Editorial issues" are purely questions of how to put
what we mean into words; the "substantive issues" are the
trickier ones (i.e., what should the specification actually
specify).  For some of these, a few of us have been having
private email conversations to work out the details, but
we'll certainly encourage wider discussion if necessary.

For discussion of issues: please put the Issue-name
in the Subject line of your message, and try to keep
the discussions on-topic.  That is, one issue per email
message (unless two or more are related), and use a
new Subject: line for a new issue.

If you want to suggest a new issue, I'd appreciate it if
you use the "Template for issues list items" at the end
of the file.  Try to keep them narrower than 80 columns.
Send new items directly to me, or to the entire http-delta
list.  Minor editorial stuff (spelling errors, incorrect
citations, etc.) should go directly to me, rather than
burdening the mailing list.

Thanks,
-Jeff

From danielh@crosslink.net  Sun Apr 23 10:44:20 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA03327; Sun, 23 Apr 2000 10:44:20 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA00649; Sun, 23 Apr 2000 10:44:20 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id KAA22149
	for <http-delta@pa.dec.com>; Sun, 23 Apr 2000 10:44:19 -0700 (PDT)
Received: from smtp.crosslink.net (dyn14.c5200-1.springfield.236.crosslink.net [207.199.142.15]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id NAA01847 for <http-delta@pa.dec.com>; Sun, 23 Apr 2000 13:44:14 -0400
Message-Id: <200004231744.NAA01847@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Sun, 23 Apr 2000 13:43:06 -0400
To: http-delta@pa.dec.com
Subject: DCLUSTER-ORDERING (issue)
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

The following is a possible section 12.4.2.a; which defines just how a
client should use Dcluster information to determine what base-instance may
be useable (hence, what etags to include in an If-None-Match).


---------------
12.4.2.a: Determining the base-instances in a uniqueness scope

When sending a delta-enabled request, a client should
identify all base-instances that may be useable. Essentially, the problem
is finding all base-instances in the same
uniqueness scope as the request-URI.

The first step is simple:

   i) any available base-instance, from a prior response from the 
     same request-URI, MAY be used

The next step is to find any base-instance that is explicitly associated
with a "matching" URL-prefix.

   ii) any available base-instance, from a prior response from a URI that
       contained a DCluster that prefix-matches the request-URI, MAY be
       used

Then, the set of matching URL-prefixes should be determined.

   iii) The request-URI should be compared to all available DCluster
        information. This comparison will yield a set of matching
        URL-prefixes, and the dates of their definition.

Using the results of step iii, base-instances that are implicitly in the
request-URI's uniqueness scope can be found:

    iv) Every available base-instance is compared to each member of the
        set of matching URL-prefixes. If a match is found, and the date
        of the matching URL-prefix is before the date of the
        base-instance, then the base instance MAY be used.

If a client's cache is large, following all these steps may be overly
time-consuming.  Thus, these steps are NOT required -- they are meant to
define the largest set of useable base-instances, but not necessarily the
optimal set.

Notes:

   * step ii may define base-instances that do NOT prefix-match
     the request-URI.

   * the "available" base-instances are affected by expiration
     concerns.  Expiration of base-instances may be due to constraints
     on the size of the client's cache, or may be dictated by
     the server (say, due to Cache-control: retain response headers)

   * the date-of-definition rule is used to prevent accidents with very
     old cache entries
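
Steps i through iv can be sketched roughly as follows, in Python.  The
cache layout (a list of dicts with 'uri', 'etag', 'date' and 'dcluster'
keys), the numeric dates, and the choice to keep the earliest
definition date per URL-prefix are all assumptions made for the
example, not part of the proposed text.

```python
def candidate_base_instances(request_uri, cache):
    """Return etags of cached base-instances that MAY be offered
    in a delta request for request_uri (steps i to iv)."""
    usable = []

    def add(entry):
        if entry not in usable:
            usable.append(entry)

    # Step i: prior responses from the same request-URI.
    for entry in cache:
        if entry["uri"] == request_uri:
            add(entry)

    # Steps ii and iii: responses whose DCluster prefix-matches the
    # request-URI, plus the set of matching prefixes and their dates.
    prefix_dates = {}
    for entry in cache:
        for prefix in entry.get("dcluster", ()):
            if request_uri.startswith(prefix):
                add(entry)
                earliest = prefix_dates.get(prefix, entry["date"])
                prefix_dates[prefix] = min(earliest, entry["date"])

    # Step iv: entries that fall under a matching prefix and are newer
    # than the definition of that prefix.
    for entry in cache:
        for prefix, defined in prefix_dates.items():
            if entry["uri"].startswith(prefix) and defined < entry["date"]:
                add(entry)

    return [entry["etag"] for entry in usable]
```

Each step is a full pass over the cache, which is why a client with a
large cache may prefer to stop early or index its DCluster data.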


-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From danielh@crosslink.net  Sun Apr 23 10:51:36 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA21711; Sun, 23 Apr 2000 10:51:36 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA30908; Sun, 23 Apr 2000 10:51:36 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id KAA09200
	for <http-delta@pa.dec.com>; Sun, 23 Apr 2000 10:51:35 -0700 (PDT)
Received: from smtp.crosslink.net (dyn14.c5200-1.springfield.236.crosslink.net [207.199.142.15]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id NAA02845 for <http-delta@pa.dec.com>; Sun, 23 Apr 2000 13:51:34 -0400
Message-Id: <200004231751.NAA02845@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Sun, 23 Apr 2000 13:46:46 -0400
To: http-delta@pa.dec.com
Subject: New issue: implicit delta-base
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Issue-Name: Implicit Delta Base

Document-section: page 36, point 4

Reported-By: Daniel Hellerstein
Reported-Date: 23 Apr 2000

Description: Clarification of caching rules when Delta-Base is not 
specified.

Suggested resolution:

 Modify point 4 on the "when to not cache rules", to include:
    "If a delta response is returned without
    a delta-base, as may happen if If-None-Match contains a single etag,
    the proxy MAY create a Delta-base header for internal use
    (with a value equal to the single Etag contained in the
     request's If-None-Match header)."
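
 A small sketch of how a proxy might apply this rule, in Python.  The
 function name and the use of plain dicts for headers are assumptions
 for illustration only.

```python
import re


def effective_delta_base(request_headers, response_headers):
    """For a delta response, return the entity tag to treat as its
    delta-base, or None if it cannot be determined.

    An explicit Delta-Base in the response wins.  Otherwise, if the
    request's If-None-Match carried exactly one entity tag, that tag
    is used as the implicit delta-base.
    """
    explicit = response_headers.get("Delta-Base")
    if explicit is not None:
        return explicit
    etags = re.findall(r'(?:W/)?"[^"]*"',
                       request_headers.get("If-None-Match", ""))
    if len(etags) == 1:
        return etags[0]
    return None
```

 When the result is None, the proxy has no way to tell which base
 instance the delta applies to.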

Resolution-Date:

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From danielh@crosslink.net  Sun Apr 23 10:56:20 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA31821; Sun, 23 Apr 2000 10:56:20 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA25698; Sun, 23 Apr 2000 10:56:20 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id KAA30986
	for <http-delta@pa.dec.com>; Sun, 23 Apr 2000 10:56:19 -0700 (PDT)
Received: from smtp.crosslink.net (dyn14.c5200-1.springfield.236.crosslink.net [207.199.142.15]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id NAA03386 for <http-delta@pa.dec.com>; Sun, 23 Apr 2000 13:56:18 -0400
Message-Id: <200004231756.NAA03386@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Sun, 23 Apr 2000 13:51:57 -0400
To: http-delta@pa.dec.com
Subject: New issue: Client
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Issue-Name: Client-initiated Dcluster (NEW ISSUE)
Document-section: Section 8
Reported-By: Daniel Hellerstein
Reported-Date: 23 April 2000

Description: Expanding the use of Dcluster

I propose that a client can add a Dcluster request header.

This would be used to indicate a client's "guess" as to an enhanced
uniqueness scope that may also be available to the server. The client
could then include etags in an If-None-Match that are associated with
instances from the Dcluster.  That is, with instances from URI's that
prefix-match the argument contained in the Dcluster request header, and
that would otherwise NOT be in the uniqueness scope of the request-URI.


Suggested resolution:

In essence, this provides a way for the client and server to coordinate on
the use of "augmented caches".  For example, there may be sites that
specialize in commonly used "base instances", and these sites may be
readily accessible by both client and server. Alternatively, clients  (and
servers) may have out-of-band means of adding instances (and their URIs
and Etags) to their cache; for example, on an installation CD used  to
install access to a full-service ISP.

This capability might exacerbate possible spoofing problems; but nothing
that would not be solved by Koen's solution (of adding the appropriate 
Dcluster response-header to any response that uses a base-instance  that
is not from the request-URI; that is, that is from the  "extended"
uniqueness-scope).


Example:

  GET /personal/biography.html HTTP/1.1
  Host: joeblow.umess.edu
  Dcluster: baseinstances.umess.edu/personal/
  A-IM: vcdiff
  If-None-Match: "std_biography"

Assuming that joeblow.umess.edu has quick out-of-band access to instances
generated by baseinstances.umess.edu, and that
joeblow.umess.edu will never use "std_biography" as an etag for
/personal/biography.html, it could respond with:

   HTTP/1.1 226 IM Used
   Etag: "joe_bio1c"
   Delta-base: "std_biography"
   Dcluster: baseinstances.umess.edu
   IM: vcdiff
   
Notes:
  * the response Delta-base and Dcluster are optional. In
    particular, Dcluster is not needed for security reasons -- the
    inclusion of a Dcluster by the client gives both sides enough
    information to detect spoofing.

  * this example makes sense when the client (say, janedoe.umess.edu)
    also has quick out-of-band access to baseinstances.umess.edu, or
    when it has made prior requests to umess.edu which resulted
    in acquisition of the "std_biography" instance (say, via a DTemplate).





Resolution-Date:

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From koen@win.tue.nl  Wed Apr 26 11:48:09 2000
Return-Path: <koen@win.tue.nl>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA18340; Wed, 26 Apr 2000 11:48:08 -0700 (PDT)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA18397; Wed, 26 Apr 2000 11:48:08 -0700
Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id LAA30841
	for <http-delta@pa.dec.com>; Wed, 26 Apr 2000 11:48:07 -0700 (PDT)
Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3)
	  id UAA03796. Wed, 26 Apr 2000 20:44:20 +0200 (MET DST)
From: koen@win.tue.nl (Koen Holtman)
Message-Id: <200004261844.UAA03796@wsooti09.win.tue.nl>
Subject: Re: more thoughts on dcluster (fwd)
To: http-delta@pa.dec.com
Date: Wed, 26 Apr 2000 20:44:20 +0200 (MET DST)
Cc: koen@win.tue.nl (Koen Holtman)
X-Mailer: ELM [version 2.4ME+ PL43 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit


[This should have gone to the list too, forwarded on request, sorry for
the delay]

----- Forwarded message from Real Name -----

>From danielh@crosslink.net Fri Apr 21 16:04:21 2000
X-Really-To: <koen@win.tue.nl>
From: danielh@crosslink.net (Real Name)
Subject: Re: more thoughts on dcluster
To: koen@win.tue.nl (Koen Holtman)
X-Mailer: CommuniGate Pro Web Mailer v.3.1
Date: Fri, 21 Apr 2000 10:04:17 -0400
Message-ID: <web-10300172@mailserver1.crosslink.net>
In-Reply-To: <200004191737.TAA05872@wsooti09.win.tue.nl>


Consider a case where there exists a large & fast
public repository of "base instances" (say, 
www.baseinstances.net) which contains a myriad of commonly
used templates-like base instances -- say one for a 
"typical home page", one for a typical "send me your comments page",
etc.

Assuming these can be used as base instances 
by a variety of sites; it is likely that a client will
have used one of them recently (I'm abstracting from
how). Hence, on future requests the client can tell
a server that it has copies of a (or of several) likely 
base-instances from such a repository, which could
make delta much more effective.

Dcluster (or dtemplate) can handle this now, but it does
require that the server inform the client first of what
URIs are in a uniqueness scope. I'm wondering
if the opposite would also work -- the client "guessing"
what uniqueness scope the uri might fall into.

In particular, that the client would include a
Dcluster: that points to www.baseinstances.net, along with
several appropriate etags. The server can then use these
base-instances (if readily available).

Actually, the client may never have contacted www.baseinstances.net --
it may have "pre-loaded" its cache from an installation or update
cd-rom; or otherwise used out-of-band means to load base instances
(and their etag & uris) into its cache.

And how does that affect spoofing?  I'm thinking that either
Koen's or my notion -- of a server providing a dcluster, or
the client providing it -- can work. That is, either mechanism
can be used to indicate the source of an etag; both need not
be done. And the above example shows one case where "client
provided" dclusters can do double duty -- as a way of "guessing"
a uniqueness scope, and as a way of protecting against spoofing.





----- End of forwarded message from Real Name -----

From mogul@pa.dec.com  Fri Apr 28 15:27:42 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA32165; Fri, 28 Apr 2000 15:27:42 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA08114; Fri, 28 Apr 2000 15:27:42 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA32291; Fri, 28 Apr 2000 15:27:42 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200004282227.PAA32291@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: DCLUSTER-ORDERING (issue) 
In-Reply-To: Your message of "Sun, 23 Apr 2000 13:43:06 EDT."
             <200004231744.NAA01847@lycanthrope.crosslink.net> 
Date: Fri, 28 Apr 2000 15:27:42 -0700
X-Mts: smtp

<danielh@crosslink.net> writes:

    The following is a possible section 12.4.2.a; which defines just
    how a client should use Dcluster information to determine what
    base-instance may be useable (hence, what etags to include in a
    If-None-Match)

Thanks for suggesting this resolution.  I think I will adopt
the basic outline of your suggestion (but I plan to place it
as section 12.10, more or less).  I had to spend some time
working through the various cases, and so I ended up
with a somewhat different way of stating the rules, but I
think they are fairly precise now.  Also, I worked in the
anti-spoofing rules that Koen wanted to see (as far as I
understand things), although there will still need to be
some more language elsewhere about this.

-Jeff

+---+
12.10 Rules for matching cache entries with DCluster headers

  Normally, when a client does a cache lookup to find an
  entry matching the URL of a resource, it checks for an
  exact match.  A client that supports the DCluster header
  (section XXX) MAY use a more complex matching rule when
  formulating a request for a delta-encoded response,
  allowing the client to list entity tags from multiple
  resources.
  
  Assuming that a client is about to make a request for a
  delta-encoded response for a given Request-URI URL1, the
  request MAY include the entity tag from a cache entry for
  URL2 if the cache entry for URL1 does not contain a
  DTemplate header (section YYY) specifying a resource other
  than URL2, and if at least one of the following conditions holds:

    (1) URL2 is URL1.

    (2) The cache entry for URL1 includes a DCluster header
    field, and at least one of the uri-prefix values in
    that field is a prefix of URL2, and the Date header
    field in the cache entry for URL1 is no newer than the
    Date header field in the cache entry for URL2.  (See
    section 14.2 for privacy considerations.)
    
	Note: a cache that includes multiple entries for URL1
	might have several with DCluster field values identical
	to the value in the most recent entry.  If so, the constraint
	on Date header values may be satisfied by the oldest
	such cache entry for URL1.  In practice, an implementation
	might choose to record, in the cache entry for URL1,
	the Date value from the last response that changed
	the DCluster value for URL1, rather than storing the
	actual prior cache entries.
	
	>>>QUESTION: the spoofing attack is not possible in case 2, right?<<<

    (3) The cache entry for URL2 includes a DCluster header
    field, and at least one of the uri-prefix values in
    that field is a prefix of URL1, and (to protect against
    the spoofing attack described in section 14.1)
    at least one of these conditions holds:
	(a) The host part (and port, if specified) of URL1
	and URL2 are identical.
	(b) Condition (2) above also holds.
	(c) The client intends to reject any delta response
	without a secure means to detect spoofing, such
	as an instance digest.
	(d) The client implementation has been explicitly
	configured to disable protection against spoofing.

  The matching rules in this section define the maximal set
  of cache entries, and thus entity tags, that a client MAY
  use in a request for a delta-encoded response.  In general,
  clients SHOULD further prune the set to avoid sending
  excessively large headers.  The precise details of this
  pruning operation are left to the individual implementation,
  but pruning SHOULD be consistent with these rules:
    (1) If the cache entry for URL2 includes a "retain"
    cache-directive, this entry SHOULD NOT be used if the
    optional delta-seconds value is larger than the entry's age.

    (2) Otherwise, cache entries with "retain" cache-directives
    SHOULD be preferred over other entries.

    (3) Newer entries MAY be preferred over older entries.
    
+---+
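
A rough Python sketch of the matching rule in 12.10, to make the
interaction of conditions (1) to (3) easier to follow.  The cache record
layout (dicts with 'date', 'dcluster' and optional 'dtemplate' keys),
the two boolean flags, and the assumption that uri-prefix values are
absolute URLs compared as plain string prefixes are all hypothetical;
the pruning rules are not covered.

```python
from urllib.parse import urlsplit


def may_send_etag(url1, entry1, url2, entry2,
                  rejects_unverified_delta=False,
                  spoof_protection_disabled=False):
    """Decide whether the entity tag cached for url2 may be sent in a
    delta request for url1.

    entry1 / entry2 are hypothetical cache records (dicts) with keys
    'date' (comparable timestamp), 'dcluster' (list of uri-prefix
    strings) and optionally 'dtemplate' (a URL).  entry1 may be None
    when nothing is cached for url1.
    """
    # Precondition: a DTemplate on url1's entry must not name a
    # resource other than url2.
    if entry1 is not None:
        dtemplate = entry1.get("dtemplate")
        if dtemplate is not None and dtemplate != url2:
            return False

    # Condition (1): same resource.
    if url2 == url1:
        return True

    # Condition (2): url1's DCluster covers url2, and url1's entry is
    # no newer than url2's entry.
    if (entry1 is not None
            and any(url2.startswith(p) for p in entry1.get("dcluster", ()))
            and entry1["date"] <= entry2["date"]):
        return True

    # Condition (3): url2's DCluster covers url1, plus an anti-spoofing
    # safeguard.  (3b) needs no test here because condition (2) has
    # already returned True above when it holds.
    if any(url1.startswith(p) for p in entry2.get("dcluster", ())):
        same_host = urlsplit(url1).netloc == urlsplit(url2).netloc
        return (same_host                      # (3a)
                or rejects_unverified_delta    # (3c)
                or spoof_protection_disabled)  # (3d)
    return False
```

In the spoofing scenario discussed earlier, url1 is on victim.org, url2
is on malicious.org, and only url2's entry carries a DCluster, so the
function returns False unless the client verifies deltas by some secure
means or has protection switched off.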


From mogul@pa.dec.com  Fri Apr 28 15:43:30 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA02367; Fri, 28 Apr 2000 15:43:30 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA07385; Fri, 28 Apr 2000 15:43:30 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA28917; Fri, 28 Apr 2000 15:43:30 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200004282243.PAA28917@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: New issue: Client-initiated Dcluster
In-Reply-To: Your message of "Sun, 23 Apr 2000 13:51:57 EDT."
             <200004231756.NAA03386@lycanthrope.crosslink.net> 
Date: Fri, 28 Apr 2000 15:43:30 -0700
X-Mts: smtp

<danielh@crosslink.net> writes:

    I propose that a client can add a Dcluster request header.

    This would be used to indicate a client's "guess" as to an enhanced
    uniqueness scope that may also be available to the server. The
    client could then include etags in an If-None-Match that are
    associated with instances from the Dcluster.  That is, with
    instances from URI's that prefix-match the argument contained in
    the Dcluster request header, and that would otherwise NOT be in the
    uniqueness scope of the request-URI.
    
I think I'm going to reject this for a few reasons:

First, and most important, while this might be an interesting
and useful extension to the delta encoding mechanism, I don't
think it represents an identifiable problem with the existing draft.
I'd suggest waiting until we've reached some sort of closure
on the basic mechanism, then writing this up as a separate
Internet-Draft.  As far as I can tell, since it should be
an optional mechanism, it can be described in a separate
document.

Second, I'd strongly recommend against using the same header
name in both a request and a response unless it really means
*exactly* the same thing.  We've had some confusion in HTTP/1.1
because, for example, some of the cache-directive names
are valid in both requests and responses, but mean subtly
different things.

Finally, I'd suggest thinking carefully about whether the
mechanism you propose would actually work correctly, without
some additional details.  I don't think you could actually
safely allow deltas using entity tags that aren't definitely
in the same uniqueness scope (this is just a hunch).

-Jeff

From mogul  Fri Apr 28 18:23:09 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA17481; Fri, 28 Apr 2000 18:23:09 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200004290123.SAA17481@wera.pa.dec.com>
To: http-delta
Subject: new issue: DELTA+IF-RANGE
Date: Fri, 28 Apr 2000 18:23:09 -0700
X-Mts: smtp

Just thinking about all of the likely header combinations ...

-Jeff

Issue-Name: DELTA+IF-RANGE
Document-section: needs new subsection of section 12?
Reported-By: Jeff Mogul <mogul@pa.dec.com>
Reported-Date: Fri, 28 Apr 2000
Description:
	The spec needs to provide some guidance on how the server
	should interpret a request that allows delta encoding
	and also includes an If-Range header.

	HTTP/1.1 says If-Range means:
	   if the entity is unchanged, send me the part(s) that I
	   am missing; otherwise, send me the entire new entity
	When combined with a request for a delta, the meaning could
	either be:
	   if the entity is unchanged, send me the part(s) OF THE
	   DELTA that I am missing; otherwise, send me the entire
	   new entity
	or it could be:
	   if the entity is unchanged, send me the part(s) that I
	   am missing; otherwise, send me A DELTA-ENCODED RESPONSE
	   FOR the entire new entity
	or it could be:
	   if the entity is unchanged, send me the part(s) OF THE
	   DELTA that I am missing; otherwise, send me A DELTA-ENCODED
	   RESPONSE FOR the entire new entity
Suggested resolution:
	The third choice seems to be the only useful interpretation.

	The first choice seems odd (why would one only want to
	apply delta-encoding to the previous response [the one
	that was prematurely terminated and that is being
	filled in with an If-Range], but not to the current one?).

	The second choice also seems not to work (the prematurely
	terminated response could not have been delta-encoded,
	because trying to fill it in using a Range of the
	non-delta-encoded instance wouldn't work in that case,
	but then why ask for a delta now if we didn't ask for
	it the last time?)
	
	So a legal example of this combination (choice #3) would
	be something like:
		GET /foo.html HTTP/1.1
		Host: example.com
		Range: bytes=1024-	// get the rest of the response
		A-IM: vcdiff, range	// apply the delta, then the range
		If-Range: "abc"		// Etag for partial prior response
		If-None-Match: "pqr"	// Etag for prior base instance

	Perhaps the spec should say that if the request carries
	an If-Range header, and the A-IM header lists "range"
	prior to any delta-coding, then the server SHOULD ignore
	the delta-coding?
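
A rough sketch of the ordering check suggested above, in Python.  The
DELTA_CODINGS set and the function name are assumptions made for
illustration; only "vcdiff" and "range" appear in this message.

```python
# Assumed names of delta-codings; only "vcdiff" is named in the message.
DELTA_CODINGS = {"vcdiff", "diffe", "gdiff"}


def use_delta(a_im: str, has_if_range: bool) -> bool:
    """Decide whether the server should honour the delta-coding."""
    # Split the A-IM value into lower-cased names, dropping any parameters.
    names = [t.split(";")[0].strip().lower()
             for t in a_im.split(",") if t.strip()]
    deltas = [i for i, n in enumerate(names) if n in DELTA_CODINGS]
    if not deltas:
        return False
    # "range" listed before any delta-coding, together with If-Range:
    # ignore the delta-coding.
    if has_if_range and "range" in names and names.index("range") < deltas[0]:
        return False
    return True
```

With the example request above, use_delta("vcdiff, range", True) is true,
while reversing the A-IM order makes it false.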

Resolution-Date:

From danielh@crosslink.net  Fri Apr 28 21:36:47 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id VAA21938; Fri, 28 Apr 2000 21:36:47 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA29078; Fri, 28 Apr 2000 21:36:47 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id VAA28573
	for <http-delta@pa.dec.com>; Fri, 28 Apr 2000 21:36:46 -0700 (PDT)
Received: from smtp.crosslink.net (dyn31.c5200-1.springfield.236.crosslink.net [207.199.142.32]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id AAA11505 for <http-delta@pa.dec.com>; Sat, 29 Apr 2000 00:36:44 -0400
Message-Id: <200004290436.AAA11505@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Sat, 29 Apr 2000 00:34:33 -0400
To: http-delta@pa.dec.com
In-Reply-To: <200004282329.QAA19758@wera.pa.dec.com>
Subject: Re: New issue: implicit delta-base
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Reported-By: Daniel Hellerstein
Reported-Date: 23 Apr 2000
>    
>    Description: Clarification of caching rules when Delta-Base is not 
>    specified.
>    
>    Suggested resolution:
>    
>     Modify point 4 on the "when to not cache rules", to include:
>	"If a delta response is returned without
>	a delta-base, as may happen if If-None-Match contains a single etag,
>	the proxy MAY create a Delta-Base header for internal use
>	(with a value equal to the single Etag contained in the 
>	 request's If-None-Match header).
>
Jeff replied    
>How about a slightly different modification: at the end of
>section 12.4.1 (Delta-Base), add this:
>   A cache that receives a delta-encoded response that lacks
>   a Delta-base header MAY add a Delta-Base header whose value
>   is the entity tag given in the If-None-Match field of the
>   request (but only if that field lists exactly one entity
>   tag).

>This kills two birds with one stone: it solves your problem, and it also
>allows a caching proxy to forward the implicit
>Delta-base to another client.

That's fine by me.
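
A small sketch of the proposed 12.4.1 rule, in Python.  Headers are modelled
as plain dicts with exact-case names, and the presence of an IM header is
taken to mean "delta-encoded"; both are simplifying assumptions, and the
function name is hypothetical.

```python
import re
from typing import Dict


def add_implicit_delta_base(req: Dict[str, str],
                            resp: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of resp, adding Delta-Base when it is implicit."""
    out = dict(resp)
    # Only touch delta-encoded responses that lack a Delta-Base header.
    if "IM" not in out or "Delta-Base" in out:
        return out
    # Collect the entity tags (optionally weak) listed in If-None-Match.
    tags = re.findall(r'(?:W/)?"[^"]*"', req.get("If-None-Match", ""))
    # Only if the field lists exactly one entity tag.
    if len(tags) == 1:
        out["Delta-Base"] = tags[0]
    return out
```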


>Alternatively, we could change 12.4.1 from
>   Any response with an IM header that includes a delta-coding MAY
>   include a Delta-Base header.
>to
>   Any response with an IM header that includes a delta-coding SHOULD
>   include a Delta-Base header.
>as suggested in the note in that section, which would render your issue
>superfluous.  Any comments?

I prefer the first solution.


-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From danielh@crosslink.net  Fri Apr 28 22:28:08 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id WAA27640; Fri, 28 Apr 2000 22:28:08 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA12409; Fri, 28 Apr 2000 22:28:08 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id WAA30392
	for <http-delta@pa.dec.com>; Fri, 28 Apr 2000 22:28:07 -0700 (PDT)
Received: from smtp.crosslink.net (dyn31.c5200-1.springfield.236.crosslink.net [207.199.142.32]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id BAA18916 for <http-delta@pa.dec.com>; Sat, 29 Apr 2000 01:28:05 -0400
Message-Id: <200004290528.BAA18916@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Sat, 29 Apr 2000 01:23:08 -0400
To: http-delta@pa.dec.com
In-Reply-To: <200004290123.SAA17481@wera.pa.dec.com>
Subject: Re: new issue: DELTA+IF-RANGE
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

>	So a legal example of this combination (choice #3) would
>	be something like:
>		GET /foo.html HTTP/1.1
>		Host: example.com
>		Range: bytes=1024-	// get the rest of the response
>		A-IM: vcdiff, range	// apply the delta, then the range
>		If-Range: "abc"		// Etag for partial prior response
>		If-None-Match: "pqr"	// Etag for prior base instance

Which means:
   If foo.html's current-instance has an etag of "abc", then
      a) compute a delta (using vcdiff) between "abc" and "pqr"
      b) return bytes 1024- of this delta
   If it's NOT "abc" then (say, it's "xyz")
      a) compute a delta between "xyz" and "pqr", and return this delta 
         (ignore the range)
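
The two branches above could be sketched like this in Python.  The Plan
tuple and the function name are made up for illustration; the sketch only
records which delta to compute and whether the range applies.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    delta_from: str             # etag of the base instance (If-None-Match)
    delta_to: str               # etag of the current instance
    first_byte: Optional[int]   # offset into the delta, None = whole delta


def plan(current: str, if_range: str, base: str, first_byte: int) -> Plan:
    """Choose between 'rest of the delta' and 'entire delta'."""
    if current == if_range:
        # Instance unchanged: send the missing part of the same delta.
        return Plan(base, current, first_byte)
    # Instance changed: send a whole delta against the base, ignore the range.
    return Plan(base, current, None)
```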

>	Perhaps the spec should say that if the request carries
>	an If-Range header, and the A-IM header lists "range"
>	prior to any delta-coding, then the server SHOULD ignore
>	the delta-coding?

Makes sense -- what would the vcdiff (of the 1024- range) be computed
against?

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From danielh@crosslink.net  Fri Apr 28 22:28:44 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id WAA06345; Fri, 28 Apr 2000 22:28:44 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA01434; Fri, 28 Apr 2000 22:28:44 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id WAA23476
	for <http-delta@pa.dec.com>; Fri, 28 Apr 2000 22:28:43 -0700 (PDT)
Received: from smtp.crosslink.net (dyn31.c5200-1.springfield.236.crosslink.net [207.199.142.32]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id BAA19007 for <http-delta@pa.dec.com>; Sat, 29 Apr 2000 01:28:41 -0400
Message-Id: <200004290528.BAA19007@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Sat, 29 Apr 2000 01:27:41 -0400
To: http-delta@pa.dec.com
In-Reply-To: <200004290011.RAA13020@wera.pa.dec.com>
Subject: Re: another thought re: client-initiated Dcluster
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Jeff said
>I think maybe what you are getting at is something a little
>different from DCluster.  Maybe I'm reading this the wrong
>way, but I don't think this is best thought of as a case of
>expanding the uniqueness scope for a URL.

>Rather, I think what you really want to do is to have the
>client's request express the concept that
>    I have ALL of the base instances in this [large] set,
>    so I don't intend to list them all in my If-None-Match
>    header.
>So the sequence of events might be that server umess.edu
>tells the client that its uniqueness scope includes
>"http://baseinstances.com" or perhaps "cdrom://baseinstances".

>Then the client has a choice of a million entity tags
>(or however many things are in this repository) to specify
>in its If-None-Match header - this is clearly not feasible.

The notion was that a client would only use the instances that are somehow
functionally related to the resource it's about to request -- where
"functionally related" would be determined by name and location.  For
example, a rule of the sort
   "/hello.htm is used as a 'welcome to my site' page, hence
   we should use the base-instances for this type of page"
would be used to choose a limited set of etags.

I now think that this may be overly restrictive, in the
pain-in-the-neck-to-implement sense -- getting servers and clients to
agree on these "rules" being the first obstacle.

So your suggestion does have a certain logic.

Your suggestion may be a bit extreme, since it seems to require the client
to have "all" the base instances in the named repository, not just a
"useful collection".  But with carefully organized "sets", say as
specified using subdirectories, this may not be such an onerous problem.

>So instead, the client could send something like
>  GET /personal/daniel.html HTTP/1.1
>  Host: bios.umess.edu
>  Base-Instance-sets: http://baseinstances.com/,
>		cdrom://baseinstances
>  A-IM: vcdiff
>with no If-None-Match, because (in this example) it has never received
>your biography before.
>And the server could respond
>  HTTP/1.1 226 IM Used
>  IM: vcdiff
>  DTemplate: "http://baseinstances.com/biotemplate/version97"
>  Delta-Base: "whateveretag"
>  Etag: "adjklaskdjasd"

Interesting: the response indicates a template that the server believes
the client "already has", based on the client's inclusion of a
Base-Instance-sets header.  The Delta-Base may not be strictly necessary,
but it's probably a good idea, since there may be more than one "base
instance" associated with the named template (assuming that the Delta-Base
is for this named template).

>with the body that is the delta between one of the many
>files in one of those repositories and the current version
>of your biography.

Or maybe my assumption is not what you were thinking (that the delta base 
points to a base instance of
http://baseinstances.com/biotemplate/version97)? Which I think doesn't
make sense.


>This isn't fully worked through - I'm leaving that to you :-). But I
>think the first step is to clearly define what problem you are trying to
>solve, and I think the issue with a
>repository-based approach is not how to define the uniqueness scope, but
>how to limit the number of entity tags in the
>request headers.

The problem is how to use delta-encoding on the first response to a
client.  The idea applies when a client knows (or can reasonably guess) that
the origin server has quick access to a repository of "base instances"
-- a repository that the client also has quick access to.  For example,
this repository may be a very fast (or widely mirrored) site, or it may
be local data distributed via CD-ROM (say, in an installation package
used for all clients at a university).

At this moment, I can't think of any major flaws in your proposal.   The
points I would make are:
  a) the client needs some unspecified means to determine when to include
     a Base-Instance-sets request header
  b) the Delta-Base that is returned should refer to the "uniqueness scope"
     of the URI included in the DTemplate; that is, the base instance used
     by the client to decode the response should be identified by the
     Delta-Base etag of the named DTemplate.
  c) the Base-Instance-sets value can be specified down to the
     "subdirectory" level; it need not refer to an "entire site".
  d) I would not allow Base-Instance-sets values of the form
        cdrom://baseinstances
     Instead, the client could use
        baseinstances.com/cdrom_OCT99/
     which both client and server would presumably have a "local" copy of.
     That is, express everything as a URL-prefix that points to a server,
     and let the client and server take advantage of whatever clever
     "caches" they may have.

So -- I guess I've been volunteered to write a section.... assuming that
the rest of the group deems this a worthy addition (or at least one person
thinks so, and no one else disagrees).


-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From koen@win.tue.nl  Sat Apr 29 13:20:02 2000
Return-Path: <koen@win.tue.nl>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA05058; Sat, 29 Apr 2000 13:20:02 -0700 (PDT)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA11777; Sat, 29 Apr 2000 13:20:01 -0700
Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA19482;
	Sat, 29 Apr 2000 13:20:00 -0700 (PDT)
Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3)
	  id WAA08791. Sat, 29 Apr 2000 22:16:14 +0200 (MET DST)
From: koen@win.tue.nl (Koen Holtman)
Message-Id: <200004292016.WAA08791@wsooti09.win.tue.nl>
Subject: Re: DCLUSTER-ORDERING (issue)
In-Reply-To: <200004282227.PAA32291@wera.pa.dec.com> from Jeffrey Mogul at "Apr 28, 2000  3:27:42 pm"
To: mogul@pa.dec.com (Jeffrey Mogul)
Date: Sat, 29 Apr 2000 22:16:14 +0200 (MET DST)
Cc: http-delta@pa.dec.com, koen@win.tue.nl (Koen Holtman)
X-Mailer: ELM [version 2.4ME+ PL43 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit


Jeff writes:
><danielh@crosslink.net> writes:
>
>    The following is a possible section 12.4.2.a; which defines just
>    how a client should use Dcluster information to determine what
>    base-instance may be useable (hence, what etags to include in a
>    If-None-Match)
>
>Thanks for suggesting this resolution.  I think I will adopt
>the basic outline of your suggestion (but I plan to place it
>as section 12.10, more or less).  I had to spend some time
>working through the various cases, and so I ended up
>with a somewhat different way of stating the rules, but I
>think they are fairly precise now.  Also, I worked in the
>anti-spoofing rules that Koen wanted to see (as far as I
>understand things), although there will still need to be
>some more language elsewhere about this.


The proposed language below looks OK at first sight, but I have not
done a detailed analysis.  

However I am a bit concerned about the direction that is taken here in
expanding the draft.  I don't think that it is necessary that the
draft spells out, at a MUST/MAY/SHOULD level, an exact algorithm for
selecting the etags to send.  I believe that it is safe to leave the
invention of the algorithm up to the implementers.  (Most of the text
below could be helpful as an appendix though -- that helps implementers
without running the risk that the real spec becomes self-contradictory
or develops a hole because we forgot a case.) 

The draft _must_ be very exact in the algorithm for deciding when a
base instance (with an etag X) and a delta response can be merged.
From this merging decision algorithm it will follow that it will never
make sense to send certain etags in the request, because it is known
in advance that they can never be a valid X in the merging step.  The
implementer, however, can make the necessary deductions about which
etags make sense here.  If the implementer gets it wrong and writes
an algorithm that sends too many etags, this will result in an
inefficiency (sometimes the response obtained will fail the test that
allows merging) but it will never result in incorrect content being
delivered.
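
To illustrate the point, here is a hypothetical client-side check in
Python.  The real merging rule is not spelled out in this thread, so the
test used here (the response's Delta-Base must name an etag that was sent
and is still cached) is only an assumed stand-in, and pick_base is an
invented name.

```python
from typing import Dict, List, Optional


def pick_base(sent_etags: List[str], cache: Dict[str, bytes],
              delta_base: Optional[str]) -> Optional[bytes]:
    """Return the cached body to apply the delta to, or None to refetch."""
    # No Delta-Base in the response: it is only implicit when exactly one
    # etag was sent in the request.
    if delta_base is None:
        delta_base = sent_etags[0] if len(sent_etags) == 1 else None
    if delta_base in sent_etags and delta_base in cache:
        return cache[delta_base]
    # Merge test failed: discard the delta and fetch the full instance.
    return None
```

However many etags the request carried, a failed check costs one extra
fetch and never yields wrong content.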

I could see some room for the draft making suggestions on certain
classes of etags that should not be sent, because servers would never
be expected to use them -- e.g. the etag of a base instance for which
revalidation failed in the past.

>
>-Jeff
>
>+---+
>12.10 Rules for matching cache entries with DCluster headers
>
>  Normally, when a client does a cache lookup to find an
>  entry matching the URL of a resource, it checks for an
>  exact match.  A client that supports the DCluster header
>  (section XXX) MAY use a more complex matching rule when
>  formulating a request for a delta-encoded response,
>  allowing the client to list entity tags from multiple
>  resources.
>  
>  Assuming that a client is about to make a request for a
>  delta-encoded response for a given Request-URI URL1, the
>  request MAY include the entity tag from a cache entry for
>  URL2 if the cache entry for URL1 does not contain a
>  DTemplate header (section YYY) specifying a resource other
>  than URL2, and if at least one of the following conditions holds:
>
>    (1) URL2 is URL1.
>
>    (2) The cache entry for URL1 includes a DCluster header
>    field, and at least one of the uri-prefix values in
>    that field is a prefix of URL2, and the Date header
>    field in the cache entry for URL1 is no newer than the
>    Date header field in the cache entry for URL2.

I believe the 'no newer' above is too restrictive in the template
case.  If URL2 were a template it would generally be very old, right?
(I assume the 'dcluster' above is supposed to mean 'dcluster and
dtemplate'.)

>  (See
>    section 14.2 for privacy considerations.)
>    
>	Note: a cache that includes multiple entries for URL1
>	might have several with DCluster field values identical
>	to the value in the most recent entry.  If so, the constraint
>	on Date header values may be satisfied by the oldest
>	such cache entry for URL1.  In practice, an implementation
>	might choose to record, in the cache entry for URL1,
>	the Date value from the last response that changed
>	the DCluster value for URL1, rather than storing the
>	actual prior cache entries.
>	
>	>>>QUESTION: the spoofing attack is not possible in case 2, right?<<<

On the spoofing attack: it looks like the attack is prevented here,
yes, but I would have to see the merging rule to see if the attack is
prevented everywhere.  I would like to see all possible attacks being
prevented by the merging rule, as I think it will be in what you will
write.  The etag sending rules above would then only affect
efficiency, not security, if they happened to have a hole.


>
>    (3) The cache entry for URL2 includes a DCluster header
>    field, and at least one of the uri-prefix values in
>    that field is a prefix of URL1, and (to protect against
>    the spoofing attack described in section 14.1)
>    at least one of these conditions holds:
>	(a) The host part (and port, if specified) of URL1
>	and URL2 are identical.
>	(b) Condition (2) above also holds.
>	(c) The client intends to reject any delta response
>	without a secure means to detect spoofing, such
>	as an instance digest.
>	(d) The client implementation has been explicitly
>	configured to disable protection against spoofing.

I am really uncomfortable with the (a)-(d) list above because it
greatly expands the number of cases one needs to consider to determine 
if the spoofing protection is watertight.  As I said above, I would
like the protection to be centralised in the merging rule part of the
spec; this reduces the cases above to only case (c), so that
everything from 'and (to protect against...' can be deleted.
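
For reference, conditions (1)-(3) as quoted can be sketched in Python as
follows.  The Entry class, the parameter names, and the use of plain string
prefix tests and an unnormalised host:port comparison are all assumptions
of this sketch, not draft text.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlsplit


@dataclass
class Entry:
    url: str
    date: float                      # Date header as a timestamp
    dcluster: List[str] = field(default_factory=list)  # uri-prefix values
    dtemplate: Optional[str] = None  # resource named by DTemplate, if any


def may_send_etag(e1: Optional[Entry], url1: str, e2: Entry,
                  require_digest: bool = False,
                  spoof_check_disabled: bool = False) -> bool:
    """May a request for url1 list the etag of cache entry e2?"""
    # Precondition: URL1's entry must not name a DTemplate other than URL2.
    if e1 is not None and e1.dtemplate and e1.dtemplate != e2.url:
        return False
    if e2.url == url1:                                      # (1)
        return True
    if (e1 is not None
            and any(e2.url.startswith(p) for p in e1.dcluster)
            and e1.date <= e2.date):                        # (2), also (3)(b)
        return True
    if any(url1.startswith(p) for p in e2.dcluster):        # (3)
        same_host = urlsplit(url1).netloc == urlsplit(e2.url).netloc  # (a)
        return same_host or require_digest or spoof_check_disabled    # (c), (d)
    return False
```

Dropping everything after the prefix test in (3), as argued above, would
reduce that last branch to a plain "return True".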

>
>  The matching rules in this section define the maximal set
>  of cache entries, and thus entity tags, that a client MAY
>  use in a request for a delta-encoded response.  In general,
>  clients SHOULD further prune the set to avoid sending
>  excessively large headers.  The precise details of this
>  pruning operation are left to the individual implementation,
>  but pruning SHOULD be consistent with these rules:
>    (1) If the cache entry for URL2 includes a "retain"
>    cache-directive, this entry SHOULD NOT be used if the
>    optional delta-seconds value is larger than the entry's age.
>
>    (2) Otherwise, cache entries with "retain" cache-directives
>    SHOULD be preferred over other entries.
>
>    (3) Newer entries MAY be preferred over older entries.

There should probably be something about Dcluster in the above
discussion too.



Koen.

From danielh@crosslink.net  Sat Apr 29 16:59:11 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA12199; Sat, 29 Apr 2000 16:59:10 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA22131; Sat, 29 Apr 2000 16:59:10 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id QAA00393
	for <http-delta@pa.dec.com>; Sat, 29 Apr 2000 16:59:09 -0700 (PDT)
Received: from smtp.crosslink.net (dyn09.c5200-1.springfield.236.crosslink.net [207.199.142.10]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id TAA26045 for <http-delta@pa.dec.com>; Sat, 29 Apr 2000 19:59:01 -0400
Message-Id: <200004292359.TAA26045@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Sat, 29 Apr 2000 19:45:54 -0400
To: http-delta@pa.dec.com
In-Reply-To: <200004292016.WAA08791@wsooti09.win.tue.nl>
Subject: Re: DCLUSTER-ORDERING (issue)
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Koen wrote
>However I am a bit concerned about the direction that is taken here in
>expanding the draft.  I don't think that it is necessary that the draft
>spells out, at a MUST/MAY/SHOULD level, an exact algorithm for selecting
>the etags to send.  I believe that it is safe to leave the invention of
>the algorithm up to the implementers.  (Most of the text below could be
>helpful as an appendix though -- that helps implementers without running
>the risk that the real spec becomes self-contradictory or develops a hole
>because we forgot a case.) 

The current paragraph
    The matching rules in this section define the maximal set
    of cache entries, and thus entity tags, that a client MAY
    use in a request for a delta-encoded response.  In general,
    clients SHOULD further prune the set to avoid sending
    excessively large headers.  The precise details of this
    pruning operation are left to the individual implementation,
    but pruning SHOULD be consistent with these rules:

is fairly weak -- except for the limitation on what the
"maximal" set should be.

If I read correctly, you are advocating that the concept of the "maximal
set" not be prominent: that a client can send any etag, even if there
is no direct evidence that the base-instance associated with one of these
"any etags" is from the request-URI's uniqueness scope.  In other words,
including such etags may be very inefficient, but it's not unacceptable
practice.


>prevented by the merging rule, as I think it will be in what you will
>write.  The etag sending rules above would then only affect efficiency,

Is the "merging rule"
  a) IF a server uses a base-instance from a request-URI's uniqueness
     scope, but not from the actual request-URI
  b) THEN it MUST include a Dcluster pointing to the URI for which this
     etag is (associated with) a base-instance
?

If so, that would alleviate some concerns about clients exceeding the
"maximal set".

But I still prefer the language the way it is (maximal sets, defined
using SHOULD and MAY), since it will encourage careful practice.

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From koen@win.tue.nl  Sun Apr 30 13:03:24 2000
Return-Path: <koen@win.tue.nl>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA10763; Sun, 30 Apr 2000 13:03:23 -0700 (PDT)
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA24355; Sun, 30 Apr 2000 13:03:23 -0700
Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA08730
	for <http-delta@pa.dec.com>; Sun, 30 Apr 2000 13:03:22 -0700 (PDT)
Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3)
	  id VAA10021. Sun, 30 Apr 2000 21:59:34 +0200 (MET DST)
From: koen@win.tue.nl (Koen Holtman)
Message-Id: <200004301959.VAA10021@wsooti09.win.tue.nl>
Subject: Re: DCLUSTER-ORDERING (issue)
In-Reply-To: <200004292359.TAA26045@lycanthrope.crosslink.net> from "danielh@crosslink.net" at "Apr 29, 2000  7:45:54 pm"
To: danielh@crosslink.net
Date: Sun, 30 Apr 2000 21:59:34 +0200 (MET DST)
Cc: http-delta@pa.dec.com
X-Mailer: ELM [version 2.4ME+ PL43 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

>Koen wrote
>>However I am a bit concerned about the direction that is taken here in
>>expanding the draft.  I don't think that it is necessary that the draft
>>spells out, at a MUST/MAY/SHOULD level, an exact algorithm for selecting
>>the etags to send.  I believe that it is safe to leave the invention of
>>the algorithm up to the implementers.  (Most of the text below could be
>>helpful as an appendix though -- that helps implementers without running
>>the risk that the real spec becomes self-contradictory or develops a hole
>>because we forgot a case.) 
>
>The current paragraph
>    The matching rules in this section define the maximal set
>    of cache entries, and thus entity tags, that a client MAY
>    use in a request for a delta-encoded response.  In general,
>    clients SHOULD further prune the set to avoid sending
>    excessively large headers.  The precise details of this
>    pruning operation are left to the individual implementation,
>    but pruning SHOULD be consistent with these rules:
>
>is fairly weak -- except for the limitation on what the
>"maximal" set should be.  

Yes.

>
>If I read correctly, you are advocating that the concept of the "maximal
>set" not be prominent.  That  a client can  send any etag, even if there
>is no direct evidence  that the base-instance associated with one of these
>"any etags" is from the request-URI's  uniqueness scope. In other words,
>including such etags may be very inefficient, but it's not unacceptable
>practice.  

Yes, exactly.  The position I am advocating is perhaps more editorial,
or related to 'protocol complexity', than it will affect what goes on
on the wire.  However I do feel that editorial concerns from people
other than the editor carry some weight in as far as they affect
(the complexity of) the security related language.

>
>
>>prevented by the merging rule, as I think it will be in what you will
>>write.  The etag sending rules above would then only affect efficiency,
>Is the "merging rule" 
>  a) IF a server uses a base-instance from a request-URI's uniqueness
>scope, 
>    but not from the actual request-URI
>  b) THEN it MUST include a Dcluster pointing to the URI for which this
>      etag is (associated with) a base-instance
>?

What I have been calling the "merging rule" is the thing above, plus
all other things that need to be checked on merging, e.g. also if the
base instance said that the request-URI was in its uniqueness scope.

>
>If so, that would alleviate some concerns about clients exceeding the
>"maximal set".
>
>But I still prefer the language the way it is (maximal sets, defined 
>using SHOULD and MAY) --
>since it will encourage careful practice.
>
>-----------------------------------------------------------
>Daniel Hellerstein
>danielh@crosslink.net
>http://www.srehttp.org
>-----------------------------------------------------------
>

Koen.

From mogul@pa.dec.com  Mon May  1 14:04:15 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA31473; Mon, 1 May 2000 14:04:15 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA24893; Mon, 1 May 2000 14:04:15 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA30649; Mon, 1 May 2000 14:04:14 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200005012104.OAA30649@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: another thought re: client-initiated Dcluster 
In-Reply-To: Your message of "Sat, 29 Apr 2000 01:27:41 EDT."
             <200004290528.BAA19007@lycanthrope.crosslink.net> 
Date: Mon, 01 May 2000 14:04:14 -0700
X-Mts: smtp

<danielh@crosslink.net> writes:

    So -- I guess I've been volunteered to write a section.... assuming
    that the rest of the group deems this a worth addition (or at least
    one person thinks so, and no one else disagrees).

Actually, I would strongly suggest doing this as a separate document
(Internet-Draft), NOT as another section for the Delta specification.
The current document is already too long/complex, and I don't think
the extension you're proposing needs to be in the same document;
it should be possible to layer it as an extension.

We need to make progress on getting a Delta I-D finished, and while
there may be many interesting bells and whistles that we could
add, I'd argue against almost anything else at this point.

-Jeff

From mogul@pa.dec.com  Mon May  1 14:44:39 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA04715; Mon, 1 May 2000 14:44:39 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA05593; Mon, 1 May 2000 14:44:39 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA11072; Mon, 1 May 2000 14:44:39 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200005012144.OAA11072@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: DCLUSTER-ORDERING (issue) 
In-Reply-To: Your message of "Sun, 30 Apr 2000 21:59:34 +0200."
             <200004301959.VAA10021@wsooti09.win.tue.nl> 
Date: Mon, 01 May 2000 14:44:39 -0700
X-Mts: smtp

Koen Holtman writes:
    The position I am advocating is perhaps more editorial, or related
    to 'protocol complexity', than it will affect what goes on on the
    wire.  However I do feel that editorial concerns from people other
    than the editor carry some weight in as far as they affect (the
    complexity of) the security related language.

That's a fair criticism - if neither you nor Daniel can figure
out what I meant to say in that section, then it's probably not
good enough for the world at large.  Let me take another stab
at this DCLUSTER-ORDERING issue (although it should probably
now be called DCLUSTER-MATCHING).

When writing rules for the client (cache) to use in determining
which entity tags to send in a delta-eligible request, we need
to consider three orthogonal requirements:
	(1) will correct, non-malicious implementations that
	follow these rules always deliver the right content
	to the user?
	(2) do these rules make the most efficient use of
	shared resources (the Internet, servers, proxies, etc.)?
	(3) do these rules protect against the known spoofing
	attack?
I think we all agree that we should not unduly limit the behavior
or design of implementations beyond these three requirements
(although anything we specify ought to be plausibly implementable!)

The "maximal set" approach is primarily meant to address the
first requirement.  If the set of entity tags that a client
generates is entirely contained in this maximal set, we can
guarantee the right answer from non-malicious servers.  Rules
3(a)-3(d) are designed to address the third requirement,
anti-spoofing.  The "pruning rules" are designed to address
the second requirement, efficiency.

I'll try to write an introductory paragraph to explain that.

We have a separate issue (SPOOFING) open, about whether
the anti-spoofing rules are sound and strong enough.  I'd
like to continue to handle that as a separate issue, please!

-Jeff

From danielh@crosslink.net  Mon May  1 14:55:13 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA03063; Mon, 1 May 2000 14:55:13 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA32551; Mon, 1 May 2000 14:55:13 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id OAA17894
	for <http-delta@pa.dec.com>; Mon, 1 May 2000 14:55:12 -0700 (PDT)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id RAA18256 for <http-delta@pa.dec.com>; Mon, 1 May 2000 17:55:06 -0400
Message-Id: <200005012155.RAA18256@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Mon, 01 May 2000 17:51:55 -0400
To: http-delta@pa.dec.com
In-Reply-To: <200005012144.OAA11072@wera.pa.dec.com>
Subject: Re: DCLUSTER-ORDERING (issue)
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

>When writing rules for the client (cache) to use in determining which
>entity tags to send in a delta-eligible request, we need to consider
>three orthogonal requirements:
>	(1) will correct, non-malicious implementations that
>	follow these rules always deliver the right content
>	to the user?
>	(2) do these rules make the most efficient use of
>	shared resources (the Internet, servers, proxies, etc.)?
>	(3) do these rules protect against the known spoofing
>	attack?
>I think we all agree that we should not unduly limit the behavior or
>design of implementations beyond these three requirements (although
>anything we specify ought to be plausibly implementable!)

Seems right to me.

>The "maximal set" approach is primarily meant to address the first
>requirement.  If the set of entity tags that a client generates is
>entirely contained in this maximal set, we can guarantee the right answer
>from non-malicious servers.  Rules 3(a)-3(d) are designed to address the
>third requirement,
>anti-spoofing.  The "pruning rules" are designed to address
>the second requirement, efficiency.
>I'll try to write an introductory paragraph to explain that.

I've got no further comments right now... 
I'm awaiting the next draft, or section of draft.

 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Mon May  1 15:05:56 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA10421; Mon, 1 May 2000 15:05:56 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA21973; Mon, 1 May 2000 15:05:56 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA13318; Mon, 1 May 2000 15:05:56 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200005012205.PAA13318@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: DCLUSTER-ORDERING (issue) 
In-Reply-To: Your message of "Sat, 29 Apr 2000 22:16:14 +0200."
             <200004292016.WAA08791@wsooti09.win.tue.nl> 
Date: Mon, 01 May 2000 15:05:56 -0700
X-Mts: smtp

Koen Holtman writes:
    However I am a bit concerned about the direction that is
    taken here in expanding the draft.  I don't think that it is
    necessary that the draft spells out, at a MUST/MAY/SHOULD
    level, an exact algorithm for selecting the etags to send.  I
    believe that it is safe to leave the invention of the
    algorithm up to the implementers.

    The draft _must_ be very exact in the algorithm for deciding
    when a base instance (with an etag X) and a delta response
    can be merged.  From this merging decision algorithm it will
    follow that it will never make sense to send certain etags in
    the request, because it is known in advance that they can
    never be a valid X in the merging step.

I think there is a subtle error in your reasoning here.

There are basically three decision points where a choice
is made about entity tags if "clustering" is used:

	(1) The client, when forming a request, has to
	pick a set of entity tags to send in the If-None-Match
	header.
	(2) The server, when computing a delta, has to
	pick one member of this set as the base instance
	(or it can decide to pick nothing == no delta encoding)
	(3) The client, when it receives a delta response,
	needs to decide if the response is valid for use
	in reconstructing the current base instance.

As far as I can tell, Koen and Daniel are using the term
"merging rule" to describe the third decision point (although
I admit that I'm not entirely sure if that's what they
mean).  And Koen is arguing that the "merging rule" is
where the specification must be exact.

But, in fact, the first decision point is also critical
(i.e., must be formally correct), or else the protocol will
break.  The problem is that it is (potentially) possible
for two different base instances, in two different uniqueness
scopes, to have identical entity tags.  In fact, vanilla
HTTP/1.1 allows every resource served by a server to have
exactly the same entity tag!  (For example, a valid HTTP/1.1
server that provided a million different pages from a CD-ROM
could use the CD-ROM's creation timestamp as the entity
tag for each and every one of those pages.)  And if the
client and server don't agree on which uniqueness scope
an entity tag is drawn from, they also would not realize
that they could disagree on what the associated instance
is.  So the client MUST NOT, under any circumstances,
tell the server that it wants a delta using an entity tag
that isn't in the right uniqueness scope - there is no
way for the checks at decision points 2 or 3 to fix a
mistake at point 1.
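
A minimal sketch of this failure mode, assuming a toy server that
reuses one entity tag for every resource.  All names and data here
(ETAG, server_instances, client_cache, server_base_for) are
hypothetical and only illustrate the argument; nothing is taken
from the draft.

```python
# Hypothetical model; names and data are illustrative, not from the draft.
ETAG = '"cdrom-1999"'   # one etag reused for every resource (legal in HTTP/1.1)

# Server side: a base instance is identified by (URI, etag), because an
# etag is only meaningful within its uniqueness scope.
server_instances = {
    ("/a.html", ETAG): "old contents of A",
    ("/b.html", ETAG): "old contents of B",
}

# Client side: the cache maps URI -> (etag, body).
client_cache = {"/a.html": (ETAG, "old contents of A")}


def server_base_for(request_uri, if_none_match):
    """Decision point 2: pick a base instance within request_uri's scope."""
    for etag in if_none_match:
        body = server_instances.get((request_uri, etag))
        if body is not None:
            return etag, body
    return None


# Decision point 1 done wrong: the client requests /b.html but offers
# the etag of its /a.html entry, which is outside /b.html's scope.
sent_etag, client_base = client_cache["/a.html"]
chosen_etag, server_base = server_base_for("/b.html", [sent_etag])

etags_agree = (chosen_etag == sent_etag)    # the etag check passes
bases_agree = (server_base == client_base)  # but the instances differ
print(etags_agree, bases_agree)
```

In this toy model the etag comparison succeeds while the two
sides hold different base instances, so a later check that only
compares entity tags has nothing to detect.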

I would agree that the third step also needs to be precisely
specified, at least to the extent that at least one defense against
spoofing involves looking at the DCluster header on that
response.  But (as far as I can tell) this is ONLY an issue
for anti-spoofing, and not for the more general problem of
ensuring correct behavior even when all parties are honest.

-Jeff

From mogul  Mon May  1 16:55:39 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA28696; Mon, 1 May 2000 16:55:39 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200005012355.QAA28696@wera.pa.dec.com>
To: http-delta
Subject: SPOOFING
In-reply-to: Your message of "Sat, 29 Apr 2000 22:16:14 +0200."
             <200004292016.WAA08791@wsooti09.win.tue.nl> 
Date: Mon, 01 May 2000 16:55:39 -0700
X-Mts: smtp

Koen Holtman writes:
    >    (3) The cache entry for URL2 includes a DCluster header
    >    field, and at least one of the uri-prefix values in
    >    that field is a prefix of URL1, and (to protect against
    >    the spoofing attack described in section 14.1)
    >    at least one of these conditions holds:
    >	(a) The host part (and port, if specified) of URL1
    >	and URL2 are identical.
    >	(b) Condition (2) above also holds.
    >	(c) The client intends to reject any delta response
    >	without a secure means to detect spoofing, such
    >	as an instance digest.
    >	(d) The client implementation has been explicitly
    >	configured to disable protection against spoofing.
    
    I am really uncomfortable with the (a)-(d) list above because it
    greatly expands the number of cases one needs to consider to
    determine if the spoofing protection is watertight.  As I said
    above, I would like the protection to be centralised in the merging
    rule part of the spec; this reduces the cases above to only case
    (c), so that everything from 'and (to protect against...' can be
    deleted.

I understand your desire to centralize the anti-spoofing rules,
as a matter of making the spec simpler to verify.  But I think that
this leads to excessively restrictive anti-spoofing rules,
because I think that a client that follows either rule 3(a) or
rule 3(b) [which means "the client already has a cache entry
for the Request-URI that includes a DCluster header covering
URL2"] is safe against spoofing.

We could debate that assertion (e.g., you could find a counter
example).  But if it is a true assertion, then I'm not sure
why we should limit the implementors' options more than necessary.

Frankly, it's not a big deal to me either way.  I suspect that
whatever we put into the spec, implementors might ignore
the "official" security requirements and do whatever they
think is "secure enough", as is too often the case with Web
security.  I'd rather analyze the options up front, rather
than trying to limit the analysis to just a few of the choices,
since then (if one of 3(a) or 3(b) proves faulty) we could at
least warn implementors that this has been analyzed and shown
not to work.
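
For reference, condition (3) as quoted above can be sketched
roughly as follows.  This is only an illustration under stated
assumptions: the cache layout (a dict from URI to an entry with
a "dcluster" list of uri-prefix strings) and the function names
are hypothetical, and condition (b) is modeled using the reading
given above (the entry for the Request-URI has a DCluster header
covering URL2).

```python
from urllib.parse import urlsplit


def covers(prefixes, url):
    """True if at least one uri-prefix is a prefix of url."""
    return any(url.startswith(p) for p in prefixes)


def same_host_port(url1, url2):
    """Condition (a): host (and port, if specified) are identical."""
    a, b = urlsplit(url1), urlsplit(url2)
    return (a.hostname, a.port) == (b.hostname, b.port)


def rule3_allows(cache, url1, url2,
                 requires_digest=False, protection_disabled=False):
    """May the etag cached for url2 be sent in a request for url1?

    cache is a hypothetical dict: URI -> {"dcluster": [uri-prefix, ...]}.
    """
    entry2 = cache.get(url2)
    if entry2 is None or not covers(entry2.get("dcluster", []), url1):
        return False
    entry1 = cache.get(url1)
    cond_a = same_host_port(url1, url2)
    cond_b = entry1 is not None and covers(entry1.get("dcluster", []), url2)
    cond_c = requires_digest        # client rejects unverifiable deltas
    cond_d = protection_disabled    # explicit configuration
    return cond_a or cond_b or cond_c or cond_d


# A cross-host entry that claims to cover the victim's URI space.
cache = {"http://malicious.org/foo": {"dcluster": ["http://victim.org/"]}}
spoof_blocked = not rule3_allows(
    cache, "http://victim.org/foo", "http://malicious.org/foo")
```

Dropping everything after "and (to protect against ..." as
proposed would correspond to keeping only cond_c in this sketch.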

-Jeff

From mogul  Tue May  2 17:41:56 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id RAA30436; Tue, 2 May 2000 17:41:56 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200005030041.RAA30436@wera.pa.dec.com>
To: http-delta
Subject: DTemplate & DCluster
Date: Tue, 02 May 2000 17:41:56 -0700
X-Mts: smtp

danielh@crosslink.net wrote (in a message to me, not to the list):
    >  Assuming that a client is about to make a request for a
    >  delta-encoded response for a given Request-URI URL1, the
    >  request MAY include the entity tag from a cache entry for
    >  URL2 if the cache entry for URL1 does not contain a
    >  DTemplate header (section YYY) specifying a resource other
    >  than URL2, and if at least one of the following conditions holds:
    
    Are you saying that inclusion of a DTemplate FORCES the client to
    use the DTemplate's "base instance" instead of other instances in
    the uniqueness scope, such as URL.2 may be?  Or am I reading the
    above sentence incorrectly?

You're reading the sentence right; the spec for DTemplate (in
effect) changes the way that a client uses DCluster.  Or rather,
the meaning of DCluster stays the same (it defines the uniqueness
scope), but if the client implements DTemplate, then it doesn't
use the uniqueness scope as a source of a list of entity tags; it
uses it as a source of a list of DTemplate values, and then
"indirects" through this list to get a list of entity tags.

However, after re-reading that, I realized that I had been too
lazy about being precise.

So I replaced that paragraph with a simpler one:
    
    Assuming that a client is about to make a request for a
    delta-encoded response for a given Request-URI URL1, the
    request MAY include the entity tag from a cache entry for
    URL2 if at least one of the following conditions holds:
    
And then added this new paragraph, later in the section:

    If the client supports the OPTIONAL DTemplate header
    (section YYY), a modified rule applies.  As the client
    chooses the set of cache entries from which entity tags
    are acceptable according to the matching rules listed
    above in this section, it constructs a set of the
    DTemplate header field values found in those acceptable
    entries.  If the set is non-empty, then the client
    SHOULD ignore the entity tags chosen according to the
    rules above, and instead it lists the entity tags
    for any cache entries for the URIs specified by the set
    of DTemplate header field values.  If no such cache
    entries are found, the client MAY request the resource
    specified by one of the DTemplate header field values,
    then use the entity tag for the response in its
    delta-eligible request for URL1.

Is that clear (and does it seem right?)  It's still a little
dense.  I'm trying hard to specify the necessary behavior, not
the implementation behind it, but that leads to some abstraction.
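
One possible reading of the new paragraph, as a sketch only.
The cache layout (URI -> entry with "etag" and an optional
"dtemplate" URI), the function name etags_to_send, and the
returned pair are assumptions made for illustration; the list
acceptable_uris stands for the cache entries already accepted
by the matching rules earlier in the section.

```python
def etags_to_send(cache, acceptable_uris, supports_dtemplate=True):
    """Return (etags, templates_to_fetch) for a delta-eligible request.

    cache: hypothetical dict, URI -> {"etag": str, "dtemplate": URI or None}
    acceptable_uris: URIs whose entries passed the matching rules.
    """
    entries = [cache[u] for u in acceptable_uris if u in cache]
    direct = [e["etag"] for e in entries]
    if not supports_dtemplate:
        return direct, set()

    # Collect the DTemplate values found in the acceptable entries.
    templates = {e["dtemplate"] for e in entries if e.get("dtemplate")}
    if not templates:
        return direct, set()

    # Non-empty set: ignore the direct etags (SHOULD) and list the etags
    # of cache entries for the template URIs instead.
    via_templates = [cache[t]["etag"] for t in sorted(templates) if t in cache]
    if via_templates:
        return via_templates, set()

    # No template is cached: the client MAY fetch one of them first.
    return [], templates


cache = {
    "/u2": {"etag": '"e2"', "dtemplate": "/tmpl"},
    "/tmpl": {"etag": '"t1"'},
}
etags, to_fetch = etags_to_send(cache, ["/u2"])
```

In this example the entry for /u2 names /tmpl as its template,
so the request would carry the entity tag cached for /tmpl
rather than the one cached for /u2.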

-Jeff

From danielh@crosslink.net  Tue May  2 21:35:30 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id VAA30296; Tue, 2 May 2000 21:35:30 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA20266; Tue, 2 May 2000 21:35:29 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id VAA26333
	for <http-delta@pa.dec.com>; Tue, 2 May 2000 21:35:29 -0700 (PDT)
Received: from smtp.crosslink.net (dyn45.c5200-2.springfield.236.crosslink.net [207.199.142.174]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id AAA03580 for <http-delta@pa.dec.com>; Wed, 3 May 2000 00:35:26 -0400
Message-Id: <200005030435.AAA03580@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Wed, 03 May 2000 00:34:18 -0300
To: http-delta@pa.dec.com
In-Reply-To: <200005030041.RAA30436@wera.pa.dec.com>
Subject: Re: DTemplate & DCluster
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

In <200005030041.RAA30436@wera.pa.dec.com>, on 05/02/00 
   at 05:41 PM, Jeffrey Mogul <mogul@pa.dec.com> said:

>danielh@crosslink.net wrote (in a message to me, not to the list):
>    >  Assuming that a client is about to make a request for a
>    >  delta-encoded response for a given Request-URI URL1, the
>    >  request MAY include the entity tag from a cache entry for
>    >  URL2 if the cache entry for URL1 does not contain a
>    >  DTemplate header (section YYY) specifying a resource other
>    >  than URL2, and if at least one of the following conditions holds:
>    
>    Are you saying that inclusion of a DTemplate FORCES the client to
>    use the DTemplate's "base instance" instead of other instances in
>    the uniqueness scope, such as URL.2 may be?  Or am I reading the
>    above sentence incorrectly?

>You're reading the sentence right; the spec for DTemplate (in effect)
>changes the way that a client uses DCluster.  Or rather, the meaning of
>DCluster stays the same (it defines the uniqueness scope), but if the
>client implements DTemplate, then it doesn't use the uniqueness scope as
>a source of a list of entity tags; it uses it as a source of a list of
>DTemplate values, and then "indirects" through this list to get a list of
>entity tags.

That's kind of a shock -- I had no sense of this from my prior readings of
the delta spec!  

>However, after re-reading that, I realized that I had been too lazy about
>being precise.

>So I replaced that paragraph with a simpler one:
>    
>    Assuming that a client is about to make a request for a
>    delta-encoded response for a given Request-URI URL1, the
>    request MAY include the entity tag from a cache entry for
>    URL2 if at least one of the following conditions holds:
>    
>And then added this new paragraph, later in the section:

I'm still a bit unclear...

>    If the client supports the OPTIONAL DTemplate header
>    (section YYY), a modified rule applies.  As the client
>    chooses the set of cache entries from which entity tags
>    are acceptable according to the matching rules listed

which means: the client looks at each of its cache entries, and
determines which entries are part of the uniqueness scope of URL1 -- for
example, which ones have DCluster information that matches URL1. 

>    above in this section, it constructs a set of the
>    DTemplate header field values found in those acceptable
>    entries.

Allow me to think out loud ...

  In a sense, DTemplates are the opposite of DCluster.
    A DCluster says
         "in subsequent requests, this instance can be used as a
         base instance for these URIs".
    A DTemplate says
          "this URI is a good candidate for use as a base-instance on
           future requests to the request-URI you just asked for"

   It's a little bit odd -- why bother telling the client to go
    somewhere else, when you just sent her what is probably a perfectly
    good base instance?
    In most cases, the answer has to do with efficiency (possibly the
    DTemplate's base instance is easier to compute deltas against), or
    more likely permanence (the server will probably retain the
    DTemplate's base instance, but probably not the instance that was
    just sent to the client).

So -- for a cached entry to contain a DTemplate does not mean "you can use
me for some other request-URIs", it means "go here for another
base-instance for me".

Perhaps what should be said is that if any of the cached instances of URL1
contains a DTemplate entry, then only the instances pointed to by
DTemplates, contained in URL1 cached instances, should be used.  Other
cached entries, that  are not for URL1, should not be used --  EVEN if
they contain DCluster  information that puts them in URL1's uniqueness
scope.

This actually simplifies implementation -- the set of cached entries to
be checked is much shorter (just those "starting at" the same
request-URI).

But maybe I still don't have it quite right??


>  If the set is non-empty, then the client
>    SHOULD ignore the entity tags chosen according to the
>    rules above, and instead it lists the entity tags
>    for any cache entries for the URIs specified by the set
>    of DTemplate header field values.  If no such cache
>    entries are found, the client MAY request the resource
>    specified by one of the DTemplate header field values,
>    then use the entity tag for the response in its
>    delta-eligible request for URL1.




>Is that clear (and does it seem right?)  It's still a little dense.  I'm
>trying hard to specify the necessary behavior, not the implementation
>behind it, but that leads to some abstraction.

>-Jeff
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul  Wed May  3 18:52:33 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA05792; Wed, 3 May 2000 18:52:33 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200005040152.SAA05792@wera.pa.dec.com>
To: http-delta
Subject: Proposal: splitting the Delta document into two
Date: Wed, 03 May 2000 18:52:33 -0700
X-Mts: smtp

After spending the last week or so trying to figure out the
details related to DCluster and DTemplate, it's dawned on
me that it might make more sense to split this into two
separate documents.

One would specify the basic HTTP Delta mechanism, without
any mention of clusters, templates, or uniqueness scopes.

The other would extend that specification to add clusters,
templates, and uniqueness scopes.

This would give the following advantages:
	(1) simplify the presentation of both parts
	(2) decouple the basic delta mechanism (which is
	relatively well understood) from the more esoteric
	mechanisms (which are justified by research results,
	but which have not been (widely?) implemented).
	(3) isolate the debate about security issues, all
	of which seem to be associated with DCluster.

We still have a few issues related to the basic delta spec,
but most are connected to the cluster/templates parts.

I've made an initial stab at the separation; it seems easy
enough.

Would anyone who is listed as an author of the basic Delta
specification:

Network Working Group                         Jeffrey Mogul, Compaq WRL,
Internet-Draft                          Balachander Krishnamurthy, AT&T,
Expires: 25 September 2000                           Fred Douglis, AT&T,
                                   Anja Feldmann, Univ. of Saarbruecken,
                                                           Yaron Goland,
                                               Arthur van Hoff, Marimba,
                                            Daniel Hellerstein, ERS/USDA

like to be REMOVED as an author of the cluster/template mechanism?
Would anyone else (Koen?) like to be added to the latter?

Thanks,
-Jeff

From danielh@crosslink.net  Thu May  4 07:42:40 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id HAA18668; Thu, 4 May 2000 07:42:40 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA05676; Thu, 4 May 2000 07:42:40 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id HAA28025
	for <http-delta@pa.dec.com>; Thu, 4 May 2000 07:42:39 -0700 (PDT)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id KAA14124 for <http-delta@pa.dec.com>; Thu, 4 May 2000 10:42:38 -0400
Message-Id: <200005041442.KAA14124@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Thu, 04 May 2000 10:40:40 -0300
To: http-delta@pa.dec.com
Subject: 2 parts
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

>After spending the last week or so trying to figure out the
>details related to DCluster and DTemplate, it's dawned on
>me that it might make more sense to split this into two
>separate documents.

So the idea is that a base implementation of delta would only
use a request-URI's "own" instance. An advanced implementation
would allow for the various ways of extending the uniqueness
scope (Dtemplate, Dcluster, Base-Instances).

My feeling is why not -- after all, it doubles the number of "pubs" :]

>This would give the following advantages:
>	(1) simplify the presentation of both parts
>	(2) decouple the basic delta mechanism (which is
>	relatively well understood) from the more esoteric
>	mechanisms (which are justified by research results,
>	but which have not been (widely?) implemented).
>	(3) isolate the debate about security issues, all
>	of which seem to be associated with DCluster.

>We still have a few issues related to the basic delta spec,
>but most are connected to the cluster/templates parts.

But I don't think this solves the Dcluster spoofing problem --
an "advanced" client (that understands Dcluster) can still be
fooled into using malicious.org's "foo" instance for
victim.org's "foo" instance, even if victim.org only implements
"basic delta". 

That is, if extended uniqueness is ever allowed, then some way of identifying
the provenance of a base-instance is still required.
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


 


From mogul@pa.dec.com  Thu May  4 15:51:51 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA04906; Thu, 4 May 2000 15:51:51 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA14386; Thu, 4 May 2000 15:51:51 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA30715; Thu, 4 May 2000 15:51:51 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200005042251.PAA30715@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: 2 parts 
In-Reply-To: Your message of "Thu, 04 May 2000 10:40:40 -0300."
             <200005041442.KAA14124@lycanthrope.crosslink.net> 
Date: Thu, 04 May 2000 15:51:51 -0700
X-Mts: smtp

danielh@crosslink.net writes:
    >This would give the following advantages:
    >	(1) simplify the presentation of both parts
    >	(2) decouple the basic delta mechanism (which is
    >	relatively well understood) from the more esoteric
    >	mechanisms (which are justified by research results,
    >	but which have not been (widely?) implemented).
    >	(3) isolate the debate about security issues, all
    >	of which seem to be associated with DCluster.
    
    >We still have a few issues related to the basic delta spec,
    >but most are connected to the cluster/templates parts.
    
    But I don't think this solves the Dcluster spoofing problem -- an
    "advanced" client (that understands Dcluster) can still be fooled
    into using malicious.org's "foo" instance for victim.org's "foo"
    instance, even if victim.org only implements "basic delta".

    That is, if extended uniqueness is ever allowed, then some way of
    identifying the provenance of a base-instance is still required.

Separating the draft into two documents isn't intended to SOLVE
the spoofing problem.  It's only intended to remove that problem,
and the complexities of solving it, from the document that
specifies the basic delta mechanism.

It does have the effect of limiting the possible solutions
of the spoofing problem to those that only involve implementations
that support DCluster and/or DTemplate.  I.e., we should
not add extra work for implementations that do not support
either of those mechanisms.  I believe we have always assumed
that to be the case; this just makes it explicit.

-Jeff

From danielh@crosslink.net  Thu May  4 20:04:32 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id UAA00408; Thu, 4 May 2000 20:04:32 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA28024; Thu, 4 May 2000 20:04:31 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id UAA26478
	for <http-delta@pa.dec.com>; Thu, 4 May 2000 20:04:31 -0700 (PDT)
Received: from smtp.crosslink.net (dyn59.c5200-1.springfield.236.crosslink.net [207.199.142.60]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id XAA29312 for <http-delta@pa.dec.com>; Thu, 4 May 2000 23:04:29 -0400
Message-Id: <200005050304.XAA29312@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Thu, 04 May 2000 22:59:56 -0400
To: http-delta@pa.dec.com
In-Reply-To: <200005042251.PAA30715@wera.pa.dec.com>
Subject: Re: 2 parts
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

>    But I don't think this solves the Dcluster spoofing problem -- an
>    "advanced" client (that understands Dcluster) can still be fooled
>    into using malicious.org's "foo" instance for victim.org's "foo"
>    instance, even if victim.org only implements "basic delta".

>    That is, if extended uniqueness is ever allowed, then some way of
>    identifying the provenance of a base-instance is still required.

>Separating the draft into two documents isn't intended to SOLVE the
>spoofing problem.  It's only intended to remove that problem, and the
>complexities of solving it, from the document that specifies the basic
>delta mechanism.
>It does have the effect of limiting the possible solutions
>of the spoofing problem to those that only involve implementations that
>support DCluster and/or DTemplate.  I.e., we should
>not add extra work for implementations that do not support
>either of those mechanisms.  I believe we have always assumed that to be
>the case; this just makes it explicit.

If both parties are simple (no dcluster, no dtemplate) that's true; but if
the client is "advanced" and the server is "simple", then spoofing can
still occur.

All I'm saying is that two docs may be a good idea (I've no objection),
but the "basic" document will have to deal with spoofing in some way
(since basic implementations will coexist with advanced implementations).

Or, "advanced" implementations will have to be able to tell a server that
this is an "advanced request", which a basic server can ignore (or can
respond with a "i'm simple, so send me a simple request")




-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From douglis@research.att.com  Mon May  8 12:27:48 2000
Return-Path: <douglis@research.att.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id MAA27340; Mon, 8 May 2000 12:27:48 -0700 (PDT)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA09823; Mon, 8 May 2000 12:27:46 -0700
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id MAA12691;
	Mon, 8 May 2000 12:27:45 -0700 (PDT)
Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26])
	by mail-blue.research.att.com (Postfix) with ESMTP
	id DE55E4CE06; Mon,  8 May 2000 15:26:54 -0400 (EDT)
Received: from windsor.research.att.com (windsor.research.att.com [135.207.26.46])
	by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id PAA18736;
	Mon, 8 May 2000 15:26:53 -0400 (EDT)
Received: from windsor.research.att.com (localhost [127.0.0.1])
	by windsor.research.att.com (8.8.8+Sun/8.8.5) with ESMTP id PAA15595;
	Mon, 8 May 2000 15:25:34 -0400 (EDT)
Message-Id: <200005081925.PAA15595@windsor.research.att.com>
X-Mailer: exmh version 2.1.1 10/15/1999
X-Exmh-Isig-Comptype: repl
X-Exmh-Isig-Folder: delta
From: Fred Douglis <douglis@research.att.com>
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-delta@pa.dec.com
Subject: Re: Proposal: splitting the Delta document into two 
In-Reply-To: Your message of "Wed, 03 May 2000 18:52:33 PDT."
             <200005040152.SAA05792@wera.pa.dec.com> 
X-Uri: http://www.research.att.com/~douglis/
Mime-Version: 1.0
Content-Type: text/plain
Comments: Hyperbole mail buttons accepted, v3.13.
Date: Mon, 08 May 2000 15:25:33 -0400
Sender: douglis@research.att.com

As you know, I was involved with the early work and have pretty much
sat on the sidelines ever since.  So, I think that if you want to
split it, that's great, but I suspect that I and anyone else who was
involved earlier on but had nothing to do with the more recent stuff
shouldn't be named as an author -- or at least should read the new
doc, which I haven't yet :-).

I suspect if I had time to read it, I'd be very interested in the
clustering work, but unfortunately I can't look at it for at least
another 5-6 weeks due to a very high OSDI reviewing load.

Fred




From mogul@pa.dec.com  Mon May  8 18:44:16 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA25071; Mon, 8 May 2000 18:44:16 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA26954; Mon, 8 May 2000 18:44:16 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA19441; Mon, 8 May 2000 18:44:15 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200005090144.SAA19441@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: 2 parts 
In-Reply-To: Your message of "Thu, 04 May 2000 22:59:56 EDT."
             <200005050304.XAA29312@lycanthrope.crosslink.net> 
Date: Mon, 08 May 2000 18:44:15 -0700
X-Mts: smtp

<danielh@crosslink.net> wrote:
    >Separating the draft into two documents isn't intended to SOLVE the
    >spoofing problem.  It's only intended to remove that problem, and the
    >complexities of solving it, from the document that specifies the basic
    >delta mechanism.
    >It does have the effect of limiting the possible solutions
    >of the spoofing problem to those that only involve implementations that
    >support DCluster and/or DTemplate.  I.e., we should
    >not add extra work for implementations that do not support
    >either of those mechanisms.  I believe we have always assumed that to be
    >the case, this just makes it explicit.
    
    If both parties are simple (no dcluster, no dtemplate) that's true;
    but if the client is "advanced" and the server is "simple", then
    spoofing can still occur.

    All I'm saying is that two docs may be a good idea (I've no
    objection), but the "basic" document will have to deal with
    spoofing in some way (since basic implementations will coexist
    with advanced implementations).

This doesn't work.  The basic Delta spec should not require
implementations to do something specific that is meant to
prevent Dcluster-spoofing, if these implementations are otherwise
ignorant of Dcluster.

I mean, it clearly would not work to require a server that
does not support DCluster to send a Dcluster header in its
responses!  At least, not without a major redefinition of
what a Dcluster header means, and I think this would become
very confusing.

So we need to find a solution to the Dcluster-spoofing problem
that does not depend on any Dcluster-specific behavior from
non-Dcluster-supporting implementations.  This should have been
obvious from the start, but by separating the documents, we
can now make this explicit.

    Or, "advanced" implementations will have to be able to tell a
    server that this is an "advanced request", which a basic server can
    ignore (or can respond with a "i'm simple, so send me a simple
    request")

I'm against this approach for three reasons:
(1) increased specification complexity for the basic document
(2) increased protocol overhead for the Dcluster mechanism
(3) not at all clear to me how this would be implemented.

My suggestion: let's give the Dcluster/Dtemplate stuff a
rest until we've finished a complete draft for the basic
Delta stuff (and because a lot of people are telling me
privately that they are bored with this debate!)  If it
really does turn out that we need to do something to the
basic delta spec to prevent spoofing (and that includes
a decision that Dcluster is worth doing in the first place),
the IETF process gives us plenty of opportunities to add
some requirements to the basic Delta spec.

-Jeff

From mogul  Fri May 12 17:49:19 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id RAA16730; Fri, 12 May 2000 17:49:19 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200005130049.RAA16730@wera.pa.dec.com>
To: http-delta
Subject: new draft of Delta encoding spec available (finally)
Date: Fri, 12 May 2000 17:49:19 -0700
X-Mts: smtp

I've finished another revised draft of the Delta encoding spec.
It's available as:
ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-delta-04.12may2000.txt

Changes in this version:

(1) I removed all the stuff about uniqueness scopes, DCluster,
and DTemplate support.  This is now in a separate document.

(2) I added sections 5.5 (Guaranteeing cache safety) and
10.8.2 (IM directive), to resolve the CACHE-SAFETY issue
(I hope).

(3) The DELTA+IF-RANGE issue turned out to be already covered
in the spec, more or less - I just forgot that I had covered
it.  I did add one related tweak to the specification:

   If a request includes an A-IM header field that lists the "range"
   instance-manipulation prior to any delta-coding(s), and the request
   also includes an If-Range header that lists the entity tag of the
   current instance, the server SHOULD ignore the delta-coding(s).

Otherwise, the meaning of that A-IM header is very hard to define.
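
A minimal sketch of the tweak in (3), for illustration only.  It assumes
the A-IM value is a plain comma-separated list without parameters, and
it uses a hypothetical set of delta-coding names; the function name and
argument names are not from the draft.

```python
# Hypothetical set of delta-coding names, used only for this illustration.
DELTA_CODINGS = {"diffe", "gdiff", "vcdiff"}


def effective_a_im(a_im, if_range_etag, current_etag):
    """Return the A-IM items a server would honor under the tweak in (3).

    a_im          -- raw A-IM header value, e.g. "range, diffe"
    if_range_etag -- entity tag from the If-Range header, or None if absent
    current_etag  -- entity tag of the current instance
    """
    items = [t.strip().lower() for t in a_im.split(",") if t.strip()]
    delta_positions = [i for i, t in enumerate(items) if t in DELTA_CODINGS]

    # "range" must be listed prior to any delta-coding(s).
    range_before_delta = (
        "range" in items
        and bool(delta_positions)
        and items.index("range") < min(delta_positions)
    )
    # If-Range must list the entity tag of the current instance.
    if_range_is_current = if_range_etag is not None and if_range_etag == current_etag

    if range_before_delta and if_range_is_current:
        # The server SHOULD ignore the delta-coding(s).
        return [t for t in items if t not in DELTA_CODINGS]
    return items
```

With these assumptions, a request carrying "A-IM: range, diffe" and an
If-Range that names the current instance is treated as if it had asked
for "range" alone; any other combination is left unchanged.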

(4) I added section 10.3 (Basic requirements for delta-encoded
responses).

I think I've resolved all of the editorial issues, too.

There are still a few places that could use some review.
For example, are there any other "basic requirements" for
section 10.3?  Are there any known "security considerations"
for the basic delta document (all of the known security
issues were related to DCluster/DTemplate)?

Otherwise, I *think* this version is ready to go to the IETF
for publication as an Internet-Draft.  I'd like to do this
on or before Thursday, May 18, since I'll be travelling
May 19-30, unless we find any significant new issues.

-Jeff

From mogul  Tue May 16 17:08:19 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id RAA09674; Tue, 16 May 2000 17:08:18 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200005170008.RAA09674@wera.pa.dec.com>
To: http-delta
Subject: Bug & fix: Deltas & content-codings
Date: Tue, 16 May 2000 17:08:18 -0700
X-Mts: smtp

danielh@crosslink.net found this bug in the Delta spec:

    10.7, pg 34
    
    I'm not sure if this makes sense..
    
      2. If the new (delta) response and the cached response have a
         different set of content-codings, the client decodes the
         content-codings from both the delta response and the
         cached response, before applying the delta.

    How do you content-decode a "delta response" -- you first have to
    generate its instance (which requires differencing against the
    base instance).

In retrospect, I can't imagine what I was thinking when I wrote
that.

I went through a case analysis, and Daniel and I decided that
we need to adopt the principle that a client should never be
required to *apply* (as opposed to decode) a content-coding
simply to extract a delta-coding.

The spec needs to change in two ways: (1) specify some restrictions
on what the server can send (to avoid requiring a client to
content-encode), and (2) fix the requirements on clients, which
now can be a lot simpler.

Result:
=========
10.7 Rules for deltas in the presence of content-codings
   The use of delta encoding with content-encoded instances adds some
   slight complexity.  When a client (perhaps a proxy) has received a
   delta encoded response, either or both of that new response and a
   cached previous response may have non-identity content-codings.  We
   specify rules for the server and client, to prevent situations where
   the client is unable to make sense of the server's response.

10.7.1 Rules for generating deltas in the presence of content-codings
   When a server generates a delta-encoded response, the list of
   content-codings the server uses (i.e., the value of the response's
   Content-Encoding header field) SHOULD be a prefix of the list of
   content-codings the server would have used had it not generated a
   delta encoding.

   This requirement allows a client receiving a delta-encoded response
   to apply the delta to a cached base instance without having to apply
   any content-codings during the process (although the client might, of
   course, be required to decode some content-codings).

10.7.2 Rules for applying deltas in the presence of content-codings
   When a client receives a delta response with one or more non-identity
   content codings:

      1. If both the new (delta) response and the cached response
         (instance) have exactly the same set of content-codings,
         the client applies the delta response to the cached
         response without removing the content-codings from either
         response.

      2. If the new (delta) response and the cached response have a
         different set of content-codings, before applying the
         delta the client decodes one or more content-codings from
         the cached response, until the result has the same set of
         content-codings as the delta response.

      3. If a proxy or cache is forwarding the result of applying
         the delta response to a cached base instance response, or
         later forwards this result from a cache entry, the
         forwarded response MUST carry the same Content-Encoding
         header field as the new (delta) response (and so it must
         be content-encoded as indicated by that header field).

   The intent of these rules (and in particular, rule #3) is that the
   results are always consistent with the rule that the entity tag is
   associated with the result of the content-coding, and that any
   recipient after the application of the delta-coding receives exactly
   the same response it would have received as a status-200 response
   from the origin server (without any delta-coding).

=========
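
The prefix rule in 10.7.1 and client rule #2 in 10.7.2 can be sketched
as follows.  This is only an illustration: it assumes that each
Content-Encoding value has already been split into a list in the order
the codings were applied, and that the cached response carries the full
list the server would have used without a delta.  The function names
are hypothetical.

```python
def is_prefix(delta_codings, full_codings):
    """True if delta_codings is a leading sub-list of full_codings."""
    return full_codings[:len(delta_codings)] == delta_codings


def codings_to_strip(cached_codings, delta_codings):
    """Content-codings the client decodes from the cached response
    before applying the delta, listed outermost (last applied) first.
    """
    if not is_prefix(delta_codings, cached_codings):
        # The server did not follow the SHOULD in 10.7.1; the client
        # cannot get to a matching set of codings by decoding alone.
        raise ValueError("delta codings are not a prefix of cached codings")
    extra = cached_codings[len(delta_codings):]
    return list(reversed(extra))
```

For example, if the cached response has "Content-Encoding: gzip" and the
delta response has no content-coding, the client decodes gzip from the
cached response and then applies the delta; it never has to apply
(encode) a content-coding itself.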

-Jeff

From danielh@crosslink.net  Tue May 16 21:09:10 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id VAA03608; Tue, 16 May 2000 21:09:10 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA23174; Tue, 16 May 2000 21:09:09 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id VAA04753
	for <http-delta@pa.dec.com>; Tue, 16 May 2000 21:09:09 -0700 (PDT)
Received: from smtp.crosslink.net (dyn24.c5200-1.springfield.236.crosslink.net [207.199.142.25]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id AAA11210 for <http-delta@pa.dec.com>; Wed, 17 May 2000 00:09:06 -0400
Message-Id: <200005170409.AAA11210@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Wed, 17 May 2000 00:03:14 -0300
To: http-delta@pa.dec.com
In-Reply-To: <200005170008.RAA09674@wera.pa.dec.com>
Subject: Re: Bug & fix: Deltas & content-codings
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 

Jeff proposed:
>The spec needs to change in two ways: (1) specify some restrictions on
>what the server can send (to avoid requiring a client to content-encode),
>and (2) fix the requirements on clients, which now can be a lot simpler.

.......
>10.7.1 Rules for generating deltas in the presence of content-codings
>   When a server generates a delta-encoded response, the list of
>   content-codings the server uses (i.e., the value of the response's
>   Content-Encoding header field) SHOULD be a prefix of the list of
>   content-codings the server would have used had it not generated a
>   delta encoding.
And since any content-encoding will be used only if an appropriate
accept-encoding was received from the client, the server will know that
the client can decode the instance.

>....   
>      2. If the new (delta) response and the cached response have a
>         different set of content-codings, before applying the
>         delta the client decodes one or more content-codings from
>         the cached response, until the result has the same set of
>	 content-codings as the delta response.

This is where the prefix rule comes into play

>      3. If a proxy or cache is forwarding the result of applying
>         the delta response to a cached base instance response, or
>         later forwards this result from a cache entry, the
>         forwarded response MUST carry the same Content-Encoding
>         header field as the new (delta) response (and so it must
>         be content-encoded as indicated by that header field).


>   The intent of these rules (and in particular, rule #3) is that the
>   results are always consistent with the rule that the entity tag is
>   associated with the result of the content-coding, and that any
>   recipient after the application of the delta-coding receives exactly
>   the same response it would have received as a status-200 response
>   from the origin server (without any delta-coding).

Conclusion: looks good!

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul  Thu May 18 17:03:33 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id RAA22279; Thu, 18 May 2000 17:03:33 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200005190003.RAA22279@wera.pa.dec.com>
To: http-delta
Subject: Bug & Fix(??): server's consistent use of IMs
Date: Thu, 18 May 2000 17:03:32 -0700
X-Mts: smtp

    10.5.3
       If a response uses more than one instance-manipulation, the
       instance-manipulations MUST be applied in the order in which they
       appear in the A-IM request-header field.
    
    I was going to say that we could add a sentence:
       However the server may choose to use only a subset of the listed A-IM
       manipulations, so long as they are applied in the order listed in
       the A-IM request header.
    
    But is this true -- suppose we have
      A-IM: diff,gzip,range 
    say, because the client wants just the range of a prior "diff,gzip'ed"
    response. If the server chose to use
     IM: diff,range
    the result probably is NOT helpful to the client.
    
    I'm not sure what this implies; that a trailing range means "don't use 
    range unless you use all the preceding manipulations"????
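
The ordering constraint being discussed (the IM list is a subset of the
A-IM list, applied in the same order) can be sketched as a simple check.
This is only an illustration; it assumes both header values are plain
comma-separated lists without parameters, and the function names are
hypothetical.

```python
def parse_im_list(value):
    """Split a comma-separated A-IM or IM value into lowercase tokens."""
    return [t.strip().lower() for t in value.split(",") if t.strip()]


def is_ordered_subset(im, a_im):
    """True if every IM token appears in A-IM, in the same relative order."""
    remaining = iter(parse_im_list(a_im))
    # "token in remaining" advances the iterator past the match, so a
    # later token can only match further along the A-IM list.
    return all(token in remaining for token in parse_im_list(im))
```

Note that "IM: diff,range" passes this check against
"A-IM: diff,gzip,range", which is exactly the case questioned above:
satisfying the ordering rule does not by itself mean the result is
useful to the client.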

Upon analysis, I think we've decided that this particular case
isn't a disaster.  However, during this analysis, we realized
that there is a problem if the server isn't consistent about
what instance-manipulations it applies prior to computing a delta.

Here's a proposed solution (inserted in section 10.5.3 just
before the Examples):

=====
   The server's choice about whether to apply an instance-manipulation
   SHOULD be independent of its choice to apply any subsequently-applied
   two-input instance-manipulations, to the response.  (Two-input
   instance-manipulations include delta-codings, because they take two
   different values as input.  Compression and "range"
   instance-manipulations take only one input.  Other
   instance-manipulations may be defined in the future.)

      Note: the intent of this requirement is to prevent the server
      from generating a delta-encoded response that the client can
      only decode by first applying an instance-manipulation encoding
      to its cached base instance.  A server implementor might wish
      to consider what the client would logically have in its cache,
      when deciding which instance-manipulations to apply prior to a
      delta-coding.
=====

Daniel isn't entirely happy with the phrasing, but I needed to
put something down in writing and we basically agree on the
intent.

-Jeff


From mogul  Thu May 18 17:06:39 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id RAA21050; Thu, 18 May 2000 17:06:39 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200005190006.RAA21050@wera.pa.dec.com>
To: http-delta
Subject: draft-mogul-http-delta-04.txt submitted to the IETF
Date: Thu, 18 May 2000 17:06:39 -0700
X-Mts: smtp

Since I'm about to go out of town for a few weeks, and we
seem to have reached a relatively stable draft (of course,
the last time we had apparent stability, it was an illusion),
I've sent the latest draft to the IETF.

You can see *approximately* what I sent to the IETF at
ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-delta-04.18may2000.txt

Comments are welcome, but I won't be able to read them (let
alone reply) until about June 1.

-Jeff

From mogul  Wed May 31 13:07:24 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA04111; Wed, 31 May 2000 13:07:24 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200005312007.NAA04111@wera.pa.dec.com>
To: http-delta
Subject: Latest HTTP Delta draft now available from the IETF
Date: Wed, 31 May 2000 13:07:24 -0700
X-Mts: smtp

From: Internet-Drafts@ietf.org
Message-ID: <200005221036.GAA28644@ietf.org>
Subject: I-D ACTION:draft-mogul-http-delta-04.txt
Date: Mon, 22 May 2000 06:36:48 -0400
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce@isi.edu

--NextPart

A New Internet-Draft is available from the on-line Internet-Drafts directories.


	Title		: Delta encoding in HTTP
	Author(s)	: J. Mogul, B. Krishnamurthy, F. Douglis,
                          A. Feldmann, Y. Goland, A. van Hoff,  
                          D. Hellerstein 
	Filename	: draft-mogul-http-delta-04.txt
	Pages		: 45
	Date		: 19-May-00
	
Many HTTP requests cause the retrieval of slightly modified
instances of resources for which the client already has a
cache entry.  Research has shown that such modifying
updates are frequent, and that the modifications are
typically much smaller than the actual entity.  In such
cases, HTTP would make more efficient use of network
bandwidth if it could transfer a minimal description of the
changes, rather than the entire new instance of the
resource.  This is called 'delta encoding.'  This
document describes how delta encoding can be supported as a
compatible extension to HTTP/1.1.

A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-mogul-http-delta-04.txt

Internet-Drafts are also available by anonymous FTP. Login with the username
"anonymous" and a password of your e-mail address. After logging in,
type "cd internet-drafts" and then
	"get draft-mogul-http-delta-04.txt".

A list of Internet-Drafts directories can be found in
http://www.ietf.org/shadow.html 
or ftp://ftp.ietf.org/ietf/1shadow-sites.txt


Internet-Drafts can also be obtained by e-mail.

Send a message to:
	mailserv@ietf.org.
In the body type:
	"FILE /internet-drafts/draft-mogul-http-delta-04.txt".
	
NOTE:	The mail server at ietf.org can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type: Message/External-body;
	access-type="mail-server";
	server="mailserv@ietf.org"

Content-Type: text/plain
Content-ID:	<20000519111728.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-mogul-http-delta-04.txt

--OtherAccess
Content-Type: Message/External-body;
	name="draft-mogul-http-delta-04.txt";
	site="ftp.ietf.org";
	access-type="anon-ftp";
	directory="internet-drafts"

Content-Type: text/plain
Content-ID:	<20000519111728.I-D@ietf.org>

--OtherAccess--

--NextPart--





From mogul  Wed Jun  7 13:23:17 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA01729; Wed, 7 Jun 2000 13:23:17 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200006072023.NAA01729@wera.pa.dec.com>
To: http-delta
Subject: Finally: http-delta mailing list log accessible on the Web
Date: Wed, 07 Jun 2000 13:23:17 -0700
X-Mts: smtp

I finally got my act together and set up a crude tunnel through
our firewall, so you can now read the log of the http-delta@pa.dec.com
mailing list at:
	ftp://ftp.digital.com/pub/DEC/WRL/mogul/http-delta-log.txt

This is suboptimal in at least two ways:
	(1) it's one long flat text file, it's not hypertext
	or broken down by messages or threads or authors.

	(2) it's only updated about once per day, so it might
	be as much as a day behind the actual mailing list.

I hope this is sufficient!

-Jeff

P.S.: Fred Douglis pointed out that the message I sent to the
HTTP-WG list about the new draft:
	http://www.ics.uci.edu/pub/ietf/http/hypermail/2000/0130.html
didn't go to the http-delta list.  I didn't think that was necessary,
since the HTTP-WG message summarizes stuff the rest of you already
know, but you might want to look at it.

From douglis@research.att.com  Wed Jun  7 13:49:58 2000
Return-Path: <douglis@research.att.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA03984; Wed, 7 Jun 2000 13:49:58 -0700 (PDT)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA31061; Wed, 7 Jun 2000 13:49:57 -0700
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA03378;
	Wed, 7 Jun 2000 13:49:57 -0700 (PDT)
Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26])
	by mail-blue.research.att.com (Postfix) with ESMTP
	id B8D6B4CE1C; Wed,  7 Jun 2000 16:49:11 -0400 (EDT)
Received: from windsor.research.att.com (windsor.research.att.com [135.207.26.46])
	by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id QAA14792;
	Wed, 7 Jun 2000 16:49:11 -0400 (EDT)
Received: from windsor.research.att.com (localhost [127.0.0.1])
	by windsor.research.att.com (8.8.8+Sun/8.8.5) with ESMTP id QAA20660;
	Wed, 7 Jun 2000 16:47:30 -0400 (EDT)
Message-Id: <200006072047.QAA20660@windsor.research.att.com>
X-Mailer: exmh version 2.1.1 10/15/1999
From: Fred Douglis <douglis@research.att.com>
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-delta@pa.dec.com
Subject: Re: Finally: http-delta mailing list log accessible on the Web 
In-Reply-To: Your message of "Wed, 07 Jun 2000 13:23:17 PDT."
             <200006072023.NAA01729@wera.pa.dec.com> 
X-Uri: http://www.research.att.com/~douglis/
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 07 Jun 2000 16:47:30 -0400
Sender: douglis@research.att.com

> P.S.: Fred Douglis pointed out that the message I sent to the
> HTTP-WG list about the new draft:
> 	http://www.ics.uci.edu/pub/ietf/http/hypermail/2000/0130.html
> didn't go to the http-delta list.  I didn't think that was necessary,
> since the HTTP-WG message summarizes stuff the rest of you already
> know, but you might want to look at it.

Just to be clear for the others: what I found interesting was not the content 
of the message, which as Jeff says was already known, but rather that the 
draft had been reannounced to that mailing list.  I note that so far, neither 
Bala's original last call nor Jeff's recent reannouncement has actually 
generated any discussion in the http-wg list -- I suppose because those who 
care have moved to the http-delta list.

(I subscribe to that list, but I was about a year behind in the messages when 
I came across the message in question.)

Fred


From mogul  Tue Jun 20 10:56:01 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA23660; Tue, 20 Jun 2000 10:56:01 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200006201756.KAA23660@wera.pa.dec.com>
To: http-delta
Subject: Issac Goldstand's comments on Delta encoding spec
Date: Tue, 20 Jun 2000 10:56:01 -0700
X-Mts: smtp

With Issac's permission, I'm resending this message to the
entire mailing list [my apologies to those of you who are now
seeing it for the third time :-)].  Issac is now on the mailing
list, as well.

I'll send my reply either today, or (more likely) next week,
when I get back from USENIX.

A suggestion to others on the list: there are a lot of comments
in this message; it might be a good idea to discuss them
individually, under topic-specific Subject lines, rather than
trying to deal with them all in a single message.

-Jeff
------- Forwarded Message

Date: Tue, 20 Jun 2000 00:15:02 +0300 (IDT)
Message-Id: <200006192115.e5JLF2n08056@megila.jct.ac.il>
From: Issac Goldstand <issac@avoda.jct.ac.il>
Reply-To: neoi@writeme.com
Organization: Jerusalem College of Technology

Firstly, an apology to all those who are either getting the same thing 
twice, or who got it BASE64 encoded the first time.  I don't know WHY 
it was BASE64 encoded, but I'm sending it with 8bit encoding now, just 
to be safe.

I only recently found the I-D for delta encoding on the IETF site and
wanted to raise a few issues.  You'll have to excuse me for being a
"newbie" to this area (i.e., involvement in IETF and related
activities).  I'll try not to broach "internal protocol" too much :-)  I
will try to keep my comments here brief, but will be more than happy to
elaborate should it be requested of me.

All comments are regarding document "draft-mogul-http-delta-04.txt"

Firstly: regarding section 10.4.1 of the document, you write:

   "...The response MUST include an Etag header field giving the entity
tag of the current instance, and MUST include an IM header field listing
the instance manipulations that were applied to the current instance..."

Why is this being defined as MUST?  Can the server not be allowed to
send it back without an entity tag?  RFC 2616 section 13.3.4 clearly
states "HTTP/1.1 origin servers...SHOULD send an entity tag validator
unless it is not feasible to generate one."

Therefore, I believe you should be more specific to your reasoning as to
why delta encodings suddenly MUST carry them.

Secondly:  You mention the seeming necessity to add the Delta-Base
response tag.  This seems to me to be a waste.  The IM response header
tag, defined in section 10.5.2 seems to be specific to delta encoded
responses.  Therefore, it would seem to me that rather than adding a
Delta-Base response header as defined in section 10.5.1, we could save a
few bytes AND registration of a new header by simply adding the
Delta-Base as a parameter to the IM response header, as such:

   IM = "IM" ":" #(instance-manipulation  [ ";" "base" "=" entity-tag ])

Since this is specific to the instance, we can possibly define multiple
entity-tags for the same resource in different formats, should it ever
be needed (I still have to think a bit about this, so I'm being
intentionally vague)  in a similar format - particularly by
proxies/caches
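
A rough sketch of how a recipient might read the parameterized IM
syntax suggested above.  This is an illustration of the proposal only,
not of the draft as written; it assumes entity tags contain no commas
or semicolons, and the function name is hypothetical.

```python
def parse_im_with_base(value):
    """Parse an IM value of the proposed form, e.g. 'diffe;base="abc", gzip'.

    Returns a list of (instance_manipulation, base_entity_tag) pairs,
    where base_entity_tag is None when no "base" parameter is present.
    """
    result = []
    for element in value.split(","):
        element = element.strip()
        if not element:
            continue
        name, _, params = element.partition(";")
        key, _, val = params.strip().partition("=")
        base = val.strip() if key.strip().lower() == "base" else None
        result.append((name.strip().lower(), base))
    return result
```

Under these assumptions, 'diffe;base="abc", gzip' yields the pairs
("diffe", '"abc"') and ("gzip", None), so the base entity tag travels
with the delta-coding it belongs to.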

Thirdly: Also regarding the Delta-base header (I will continue to refer
to it as such for the sake of simplicity, but still strongly reemphasize
what I stated above about adding it as a parameter), you mention in
section 10.5.1

   A cache or proxy that receives a delta-encoded response that lacks a
   Delta-base header MAY add a Delta-Base header whose value is the
   entity tag given in the If-None-Match field of the request (but only
   if that field lists exactly one entity tag).

This seems to me to be lacking forward compatibility.  It is the place
of the server and _not_ an intermediate proxy to assign entity-tags, and
therefore, if the same server which originally generated an entity-tag
for the instance (otherwise the instance would not exist within the
cache) does NOT supply the entity-tag for the given response, we
MUST assume that the server had a specific reason for withholding the
entity-tag and therefore, SHOULD NOT (perhaps even MUST NOT) supply the
client with one.

Fourth: You state the following in section 10.6:

 "A status-226 cache entry MUST NOT be used in response to a subsequent
request ...  If any of the instance-manipulation values in the cached IM
header field is a delta-coding, the cache entry does not include a
Delta-Base header field, and the If-None-Match header field of the
request that led to that cache entry does not match the If-None-Match
header field of the subsequent request."

It seems to me that we should be stopping the cache from responding as
soon as EITHER the If-None-Match header fields do not match OR if the
Delta-Base response header is missing.  The document seems to be
requiring the lack of BOTH of these conditions.

Additionally, you later state (also section 10.6) "...we know of no
formal specification for deciding if a cached status-206 response is
consistent with a subsequent request..."  yet in section 9 you clearly
seem to be making an issue of having digests for both the full data
response (i.e., what would come with a 200 reply) and for
delta-responses (226).  Could this kind of digesting not be used to
formally match 206 entries?  Furthermore, I stated above that it may be
beneficial to replace the Delta-base header with base parameters within
the IM tag.  By implementing numerous tags in a similar fashion we could
define an entity-tag that represents the delta between the delta base
and the instance itself (i.e. what would be transmitted with a 200
response), and thus be able to cache it properly, and, more importantly,
define a _unique_ identifying tag to each delta.

Next: The last example in section 10.7.3 (which I will repeat here) says
that a request by a client:

       GET /foo.html HTTP/1.1
       Host: example.com
       If-none-match: "abc"
       Accept-encoding: gzip
       A-IM: diffe, gzip

might return:

       HTTP/1.1 226 IM Used
       Date: Wed, 24 Dec 1997 14:01:00 GMT
       Delta-base: "abc"
       Etag: "ghi"
       IM: diffe, gzip

with a body containing  GZIP(DIFFE_DELTA(GUNZIP(GZIP(foo.html;"abc")),
foo.html;"ghi"))

I simply wanted to add that such a request could, and would very likely,
return the following response:

       HTTP/1.1 226 IM Used
       Date: Wed, 24 Dec 1997 14:01:00 GMT
       Delta-base: "abc"
       Etag: "jkl"
       IM: diffe, gzip
       Content-Encoding: gzip

with a body containing GZIP(DIFFE_DELTA(GZIP(foo.html;"abc"),
GZIP(foo.html;"jkl")))

This would be because the server is interpreting the Accept-Encoding and
A-IM headers as two distinct requests, so if both can be done, then it
probably will.  I assume that either I misunderstood something here or
that you had some obscure reason for not mentioning this scenario with
the examples, because if I _did_ correctly assess this, then it should
be in the list of examples, as not to confuse readers into thinking that
this would not happen - a situation that would certainly be applicable
should the I-D become an RFC at some future point.

In section 10.8.1 it is stated that "...By implication, if a client has
retrieved and cached several instances of a resource, some of which are
marked with ``retain'' and some not, then there is no point in caching
the instances not marked with ``retain''."

This seems silly, as if the server knows how to use the retain at all,
it should know to send a "retain=0" to eliminate caching.  It seems to
me that a lack of "retain" should just indicate that normal caching
rules should apply, rather than imply that the server is hinting that it
should NOT be cached.

While I'm discussing section 10.8.1, there are another two minor issues
that I just wanted to point out:
Firstly, you state  "A client ought not use the corresponding entity tag
in a future request for a delta-encoded response after that interval
ends."  If you're going to write this in section 10, where you are
playing by RFC 2119's rules, you "ought" to define "ought" :-)
Secondly, you state a bit later on that "A server SHOULD NOT send
``retain=0'' except in reply to a request that attempts to obtain a
delta-encoded response."  Here, it seems to me that the server should
either NEVER send "retain"s unless an A-IM is present, or send
"retain=0" any time it likes - INCLUDING when no A-IM is specified.  I
personally prefer the second line of thought.

My final "formal" comment is on section 10.8.2, and is, I'm afraid, a
bit vague at this point.

Section 10.8.2 states "A cache that complies with the specification for
the IM header, the A-IM header, and the 226 response-status code SHOULD
ignore a no-store cache-directive if an im directive is present in the
same response.  All other implementations MUST ignore the im directive
(i.e., MUST observe a no-store directive, if present)."

I have not yet worked out case examples, but it seems to me that this
may present problems with forward compatibility with HTTP/1.1 protocol
extensions.  What if another extension is suggested that runs parallel
with the IM field and also has special cache control requirements?
Might it not be possible that our implementation here will handicap
later HTTP extension proposals?

I look forward to hearing from some or all of you, and would be most
interested in being kept up-to-date on this (And other relevant)
issues.  Also - one final question:  It is painfully obvious that there
are some other "supporting" internet-drafts connected with this, and I'd
appreciate it if you could send me the names of related drafts so I can
more completely familiarize myself with this project.

Sincerely,
   Issac Goldstand



--
Internet is a wonderful mechanism for making a fool of
yourself in front of a very large audience.
  --Anonymous

Moving the mouse won't get you into trouble...  Clicking it might.
  --Anonymous


------- End of Forwarded Message


From jmacd@helen.CS.Berkeley.EDU  Tue Jun 20 17:16:05 2000
Return-Path: <jmacd@helen.CS.Berkeley.EDU>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id RAA01263; Tue, 20 Jun 2000 17:16:05 -0700 (PDT)
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA25361; Tue, 20 Jun 2000 17:16:04 -0700
Received: from helen.CS.Berkeley.EDU (helen.CS.Berkeley.EDU [128.32.131.251])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id RAA05708;
	Tue, 20 Jun 2000 17:16:04 -0700 (PDT)
Received: (from jmacd@localhost)
	by helen.CS.Berkeley.EDU (8.9.1a/8.9.1) id RAA21899;
	Tue, 20 Jun 2000 17:12:19 -0700 (PDT)
Message-Id: <20000620171219.23658@helen.CS.Berkeley.EDU>
Date: Tue, 20 Jun 2000 17:12:19 -0700
From: Josh MacDonald <jmacd@CS.Berkeley.EDU>
To: Jeffrey Mogul <mogul@pa.dec.com>, http-delta@pa.dec.com
Subject: Delta-compression
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.89.1

This is to announce the release of a new piece of software that the
http-delta readers may find interesting.  I have finished my Master's
thesis on the topic of delta-compressed storage and transport and 
recently released my software under a BSD-style license.  The technical
report and software are available at:

	http://www.cs.berkeley.edu/~jmacd/xdelta.html

My software is primarily intended as a replacement for back-end
delta-compressed storage using RCS.  I show that transactions can improve
the performance, reliability, and extensibility of this kind of 
application.  My software is geared towards embedding in server 
applications that want delta-compression, but it does not implement
a network protocol of its own.  While it is not immediately applicable
to the delta encoding draft (I have file format issues to sort out--
contact me for details), a prototype HTTP proxy has already been 
implemented using my system.  A paper by Mihut Ionescu and Matthew 
Delco describing their HTTP proxy is available at:

	http://www.cs.pdx.edu/~delco/xproxy.ps.gz

I would appreciate any feedback you might have regarding this work.

-josh

From mogul@pa.dec.com  Mon Jul  3 16:41:11 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA31776; Mon, 3 Jul 2000 16:41:11 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA25351; Mon, 3 Jul 2000 16:41:10 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA30450; Mon, 3 Jul 2000 16:41:10 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200007032341.QAA30450@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: Issac Goldstand's comments on Delta encoding spec 
In-Reply-To: Your message of "Tue, 20 Jun 2000 10:56:01 PDT."
             <200006201756.KAA23660@wera.pa.dec.com> 
Date: Mon, 03 Jul 2000 16:41:10 -0700
X-Mts: smtp

Here are my replies to Issac Goldstand's comments of 20 June 2000.
(Sorry for the long delay, but I do have to keep up with my day job.)

    All comments are regarding document "draft-mogul-http-delta-04.txt"

And all are greatly appreciated, even if I think most don't
require changes to the draft.

    Firstly: regarding section 10.4.1 of the document, you write:

       "...The response MUST include an Etag header field giving the
       entity tag of the current instance, and MUST include an IM
    header field listing the instance manipulations that were applied
    to the current instance..."

    Why is this being defined as MUST?  Can the server not be allowed
    to send it back without an entity tag?  RFC 2616 section 13.3.4
    clearly states "HTTP/1.1 origin servers...SHOULD send an entity tag
    validator unless it is not feasible to generate one."

    Therefore, I believe you should be more specific to your reasoning
    as to why delta encodings suddenly MUST carry them.
    
That section of RFC2616 isn't actually the most relevant one.
Instead, look at RFC2616 section 13.3.3 (Weak and Strong Validators).
Basically, when you're using delta-coding or ranges, at least,
you need a strong validator, and there is not much chance of this
when using Last-Modified.  Otherwise, you can't be sure that you
are matching up the response to the proper instance.

Having said that, I suppose that there is one exception: if the
only instance-manipulation for the message is a compression, and
it will never be used with a delta-coding or range, then
you don't actually need a strong validator.  But I can't think
of why you would want to use a compression i-m in this case,
instead of a compression content-coding, so I think this is not
worth making an exception for.

    Secondly:  You mention the seeming necessity to add the Delta-Base
    response tag.  This seems to me to be a waste.  The IM response
    header tag, defined in section 10.5.2 seems to be specific to delta
    encoded responses.  Therefore, it would seem to me that rather than
    adding a Delta-Base response header as defined in section 10.5.1,
    we could save a few bytes AND registration of a new header by
    simply adding the Delta-Base as a parameter to the IM response
    header, as such:
    
       IM = "IM" ":" #(instance-manipulation  [ ";" "base" "=" entity-tag ])
    
It's true that the Delta-Base header is somewhat of a carryover
from an earlier draft of the delta encoding spec (before we had
the IM header at all).  I don't think there is any real overhead
in "registration of a new header" (since we have to register at
least a few new headers), and in some ways it's probably easier
to parse this than as a parameter of the instance-manipulation.

As far as wasting bytes: yes, this would save a few bytes.

On the other hand, I'm reluctant to change the spec yet again,
especially if there is even a chance that this would trigger
some subtle bug that we haven't thought about.

I'll take comments on this one; let's see which way the consensus
goes.

    Since this is specific to the instance, we can possibly define
    multiple entity-tags for the same resource in different formats,
    should it ever be needed (I still have to think a bit about this,
    so I'm being intentionally vague)  in a similar format -
    particularly by proxies/caches

I'm not sure I understand what you are proposing here.

    Thirdly: Also regarding the Delta-base header (I will continue to
    refer to it as such for the sake of simplicity, but still strongly
    reemphasize what I stated above about adding it as a parameter),
    you mention in section 10.5.1

       A cache or proxy that receives a delta-encoded response that
       lacks a Delta-base header MAY add a Delta-Base header whose
       value is the entity tag given in the If-None-Match field of the
       request (but only if that field lists exactly one entity tag).

    This seems to me to be lacking forward compatibility.  It is the
    place of the server and _not_ an intermediate proxy to assign
    entity-tags, and therefore, if the same server which originally
    generated an entity-tag for the instance (otherwise the instance
    would not exist within the cache) does NOT supply the entity-tag
    for the given response, we MUST assume that the server had a
    specific reason for withholding the entity-tag and therefore,
    SHOULD NOT (perhaps even MUST NOT) supply the client with one.

The reason that this is allowed is to resolve some issues that
were discussed on the mailing list; search for "implict delta-base"
in <ftp://ftp.digital.com/pub/DEC/WRL/mogul/http-delta-log.txt>
to see what the details were.

I can't think of any plausible reason why the server should ever
be able to withhold this information, because otherwise it would
be impossible to guarantee application of the delta to the right
base instance.  So there is always a delta-base for a delta-coded
response; the only question is whether it is explicit or implicit.
The implicit approach saves more than a few bytes (which seems
to concern you; see above!) in a fairly common case.

But there are scenarios where a cached delta-coded response cannot
be properly returned to a subsequent request without an explicit
delta-base - hence we decided to allow a cache to convert the
implicit form to an explicit form.
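
The conversion described above can be sketched as follows.  This is
only an illustration, not text from the draft: the function names,
the use of plain dicts with canonically-cased header names, and the
simplified entity-tag pattern are all assumptions, and the caller is
assumed to have already established that the response is
delta-encoded.

```python
import re

# Simplified entity-tag pattern (assumption): optional weak prefix, then a
# quoted opaque string with no embedded quotes.
ENTITY_TAG = re.compile(r'(?:W/)?"[^"]*"')


def parse_entity_tags(field_value):
    """Return the entity tags listed in an If-None-Match field value."""
    return ENTITY_TAG.findall(field_value)


def make_delta_base_explicit(request_headers, response_headers):
    """Return a copy of the response headers with Delta-Base made explicit.

    Hypothetical helper: a Delta-Base header is added only when the response
    lacks one and the request's If-None-Match lists exactly one entity tag.
    """
    headers = dict(response_headers)
    if "Delta-Base" in headers:
        return headers  # already explicit, nothing to do
    tags = parse_entity_tags(request_headers.get("If-None-Match", ""))
    if len(tags) == 1:
        headers["Delta-Base"] = tags[0]
    return headers
```

With more than one entity tag in If-None-Match the sketch leaves the
response untouched, since the cache cannot tell which tag the server
used as the base.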

    Fourth: You state the following in section 10.6:

     "A status-226 cache entry MUST NOT be used in response to a
     subsequent request ...  If any of the instance-manipulation values
    in the cached IM header field is a delta-coding, the cache entry
    does not include a Delta-Base header field, and the If-None-Match
    header field of the request that led to that cache entry does not
    match the If-None-Match header field of the subsequent request."

    It seems to me that we should be stopping the cache from responding
    as soon as EITHER the If-None-Match header fields do not match OR
    if the Delta-Base response header is missing.  The document seems
    to be requiring the lack of BOTH of these conditions.

This is another piece of the support for implicit delta-base values.
If the cache entry does include a delta-base value E, then the
entry is suitable for use in reply to a request whose If-None-Match
field lists E.  It's not necessary for the If-None-Match fields
to match exactly, in this case.  More concretely, if the original
request had
	If-None-Match: "a", "b", "c"
and the cached response had
	Delta-Base: "c"
and the new request had
	If-None-Match: "c", "d", "e"
then the cached response is usable with the request (other conditions
being satisfied).  We didn't want to require an exact match on
the If-None-Match fields in this case, because it greatly reduces
the probability of the cache entry being useful.

On the other hand, if the entry does NOT include a delta-base
value, then it *is* necessary for the If-None-Match fields to
match exactly (and, by virtue of other rules, they have to list
exactly one entity-tag).  Otherwise, you can't tell whether the
cache entry is suitable for the new response.

The rule is written in terms of "the cache entry" rather than, say,
"the response [as sent by the origin server]" in order to allow
the implicit delta-base to be made explicit.  For example,
if the original request had
	If-None-Match: "c"
and the cached response originally had no explicit "Delta-Base" header,
and the new request had
	If-None-Match: "c", "d", "e"
then the cache could (by converting the implicit Delta-base
to a real one) properly make use of the cache entry.
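
The two cases above (explicit versus implicit delta-base) can be
sketched as a single check.  This is an illustration under stated
assumptions, not the draft's wording: the function and parameter
names are hypothetical, the entity-tag pattern is simplified, and
the check covers only this one condition ("other conditions being
satisfied" is left to the caller).

```python
import re

# Simplified entity-tag pattern (assumption): optional weak prefix, then a
# quoted opaque string with no embedded quotes.
ENTITY_TAG = re.compile(r'(?:W/)?"[^"]*"')


def parse_entity_tags(field_value):
    """Return the entity tags listed in an If-None-Match field value."""
    return ENTITY_TAG.findall(field_value)


def delta_base_permits_reuse(entry_delta_base, entry_request_inm, new_request_inm):
    """Check whether a cached delta-coded 226 entry may answer a new request.

    entry_delta_base  -- Delta-Base value stored with the entry, or None
    entry_request_inm -- If-None-Match value of the request that led to the entry
    new_request_inm   -- If-None-Match value of the subsequent request
    """
    new_tags = parse_entity_tags(new_request_inm)
    if entry_delta_base is not None:
        # Explicit delta-base E: usable when the new request lists E.
        return entry_delta_base in new_tags
    # Implicit delta-base: the fields must match exactly, and must list
    # exactly one entity tag.
    old_tags = parse_entity_tags(entry_request_inm)
    return len(old_tags) == 1 and old_tags == new_tags
```

For the first example, delta_base_permits_reuse('"c"', '"a", "b", "c"',
'"c", "d", "e"') is true.  For the last example the entry is usable
only once the cache has stored "c" as an explicit delta-base; with
None as the first argument the same call is false.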

    Additionally, you later state (also section 10.6) "...we know of no
    formal specification for deciding if a cached status-206 response
    is consistent with a subsequent request..." 

This should probably read "no existing, published formal specification".
I'll make the change.  Does that make it clearer?

    yet in section 9 you
    clearly seem to be making an issue of having digests for both the
    full data response (i.e., what would come with a 200 reply) and for
    delta-responses (226).  Could this kind of digesting not be used to
    formally match 206 entries? 

Probably, but 206, "Range", and "Content-Range" are already in
(somewhat) widespread use, and I don't think it's feasible to
retroactively tighten up their specification.  (I'll take some
blame for not having done this during the HTTP/1.1 design
process, but the whole "entity" vs. "instance" debacle was
not yet clear enough then.)

    Furthermore, I stated above that it
    may be beneficial to replace the Delta-base header with base
    parameters within the IM tag.  By implementing numerous tags in a
    similar fashion we could define an entity-tag that represents the
    delta between the delta base and the instance itself (i.e. what
    would be transmitted with a 200 response), and thus be able to
    cache it properly, and, more importantly, define a _unique_
    identifying tag to each delta.

I think you're stumbling back into the confusion over what
an "entity tag" really is.  It's not surprising, given that
it actually has nothing to do with the "entity" as the
term is defined in RFC2616.  This also relates to the confusing
debate we had, in the HTTP/1.1 design process, about what
a cache actually stores.  I think it really only makes sense
to talk about the instances (or partial instances) that a cache
stores; talking about the entities (or "responses") that a cache
stores seems to inevitably lead to ambiguity and confusion.

Some day I plan to write a paper about this, since I think it
is a fairly difficult but important (and badly misunderstood)
aspect of HTTP.

    Next: The last example in section 10.7.3 (which I will repeat here)
    says that a request by a client:
    
	   GET /foo.html HTTP/1.1
	   Host: example.com
	   If-none-match: "abc"
	   Accept-encoding: gzip
	   A-IM: diffe, gzip
    
    might return:
    
	   HTTP/1.1 226 IM Used
	   Date: Wed, 24 Dec 1997 14:01:00 GMT
	   Delta-base: "abc"
	   Etag: "ghi"
	   IM: diffe, gzip
    
    with a body containing  GZIP(DIFFE_DELTA(GUNZIP(GZIP(foo.html;"abc")),
    foo.html;"ghi"))
    
    I simply wanted to add that such a request could, and would very likely,
    return the following response:
    
	   HTTP/1.1 226 IM Used
	   Date: Wed, 24 Dec 1997 14:01:00 GMT
	   Delta-base: "abc"
	   Etag: "jkl"
	   IM: diffe, gzip
	   Content-Encoding: gzip
    
    with a body containing GZIP(DIFFE_DELTA(GZIP(foo.html;"abc"),
    GZIP(foo.html;"jkl")))

I don't think this makes sense, in practice.  To quote Daniel
Hellerstein, "[the] computation of differences between compressed
instances is probably useless, hence the [delta spec] goes to some
length to allow efficient use of compression [after] delta coding."

While it is perhaps remotely possible that some future delta
algorithm could make use of compressed inputs (or maybe not;
perhaps there's a good information-theory reason?), I think
it would be misleading to provide an example of that.

    In section 10.8.1 it is stated that "...By implication, if a client
    has retrieved and cached several instances of a resource, some of
    which are marked with ``retain'' and some not, then there is no
    point in caching the instances not marked with ``retain''."

    This seems silly, as if the server knows how to use the retain at
    all, it should know to send a "retain=0" to eliminate caching.  It
    seems to me that a lack of "retain" should just indicate that
    normal caching rules should apply, rather than imply that the
    server is hinting that it should NOT be cached.

Again, this is mostly in the service of avoiding the transmission
of excessive bytes.  If you read the statement carefully, it's
not saying that 
	"lack of retain implies do not cache this response"
it is saying that a response lacking "retain" should not be
cached *in addition to* (or instead of) a response for the
same resource that does have a "retain" directive.  If this
is the only stored cache entry for the resource, then the normal
caching rules *do* apply.

Sending "retain=0" in this case seems to be redundant.
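
One way to read the rule above, as a sketch only: the function name
and the (entity tag, has-retain flag) representation are assumptions
made for illustration, and the draft does not prescribe any
particular cache data structure.

```python
def instances_worth_keeping(instances):
    """Pick which cached instances of ONE resource are worth keeping.

    instances -- list of (entity_tag, has_retain) pairs; hypothetical layout.

    If any instance carries a "retain" directive, only those are kept as
    delta bases.  If none does, nothing is dropped here and the normal
    caching rules decide.
    """
    retained = [etag for etag, has_retain in instances if has_retain]
    if retained:
        return retained
    return [etag for etag, _ in instances]
```

So a lone entry without "retain" is kept, which is why an explicit
"retain=0" adds nothing in that case.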

    While I'm discussing section 10.8.1, there are another two minor
    issues that I just wanted to point out:  Firstly, you state  "A
    client ought not use the corresponding entity tag in a future
    request for a delta-encoded response after that interval ends."  If
    you're going to write this in section 10, where you are playing by
    RFC 2119's rules, you "ought" to define "ought" :-)

Actually, that was intentional - I was trying to avoid adding a
formal requirement (and the word "ought" is fairly well-defined
in any English dictionary).  Our bias in writing the HTTP/1.1
spec was to avoid imposing "normative requirements"  (MUST/SHOULD)
when they were not required for interoperability or reasonable
performance requirements, and I think this is one of those places.

    Secondly, you
    state a bit later on that "A server SHOULD NOT send ``retain=0''
    except in reply to a request that attempts to obtain a
    delta-encoded response."  Here, it seems to me that the server
    should either NEVER send "retain"s unless an A-IM is present, or
    send "retain=0" any time it likes - INCLUDING when no A-IM is
    specified.  I personally prefer the second line of thought.

I'd be curious as to your reasoning for the "second line of thought."
This "SHOULD NOT" was originally motivated by a desire to avoid
sending extra bytes in a context where they would not be useful
(because many clients would not ever care about deltas).

As to the first, if by "NEVER" you mean "MUST NOT", again I think
that's probably not justifiable in this case.

    My final "formal" comment is on section 10.8.2, and is, I'm afraid,
    a bit vague at this point.

    Section 10.8.2 states "A cache that complies with the specification
    for the IM header, the A-IM header, and the 226 response-status
    code SHOULD ignore a no-store cache-directive if an im directive is
    present in the same response.  All other implementations MUST
    ignore the im directive (i.e., MUST observe a no-store directive,
    if present)."

    I have not yet worked out case examples, but it seems to me that
    this may present problems with forward compatibility with HTTP/1.1
    protocol extensions.  What if another extension is suggested that
    runs parallel with the IM field and also has special cache control
    requirements?  Might it not be possible that our implementation
    here will handicap later HTTP extension proposals?

Yes, I suppose this is, at least remotely, a potential problem.

However, it's probably plausible to expect that if a server
generates
	HTTP 226 IM Used
	Cache-Control: no-store, im, some-other-extension
then that "some-other-extension" had better be compatible
with the IM field - otherwise it probably shouldn't be included
in a 226 response, right?

So, in practice, I think this is not a problem except perhaps
for extensions to the IM mechanism itself, and (hopefully)
the current IM specifications are general enough that this
shouldn't be necessary.

    Also - one final question:  It is painfully
    obvious that there are some other "supporting" internet-drafts
    connected with this, and I'd appreciate it if you could send me the
    names of related drafts so I can more completely familiarize myself
    with this project.

Actually, there are no "supporting internet-drafts" except
	(1) prior versions of draft-mogul-http-delta-*.txt
	(2) draft-mogul-http-digest-02, as cited.

I did strip out, from draft-mogul-http-delta-03.txt, some
optional parts of the delta spec ("Clusters" and "Templates")
because they were causing a lot of debate on some potential
security holes.  It's my intention to generate a new I-D
that (separately) specifies those extensions, but I don't
want to spend any time on that until the basic (and long-overdue)
delta spec is mostly done.

-Jeff

From mogul  Thu Jul  6 16:59:59 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA02096; Thu, 6 Jul 2000 16:59:59 -0700 (PDT)
Message-Id: <200007062359.QAA02096@wera.pa.dec.com>
To: http-delta
From: Issac Goldstand <issac@avoda.jct.ac.il>
Reply-To: neoi@writeme.com
Subject: Re: Issac Goldstand's comments on Delta encoding spec
Date: Thu, 06 Jul 2000 16:59:59 -0700
Sender: mogul
X-Mts: smtp

[I'm resending Issac's reply to the list, with some reformatting,
on his request.  He has had problems sending mail without
base64-encodings. -Jeff]

OK - here are my replies to Jeff's replies to my original
comments (geez this has the potential to get very long :-))

Jeffrey Mogul wrote:

> Here are my replies to Issac Goldstand's comments of 20 June 2000.
> (Sorry for the long delay, but I do have to keep up with my day job.)
>     Firstly: regarding section 10.4.1 of the document, you write:
>
>        "...The response MUST include an Etag header field giving the
>        entity tag of the current instance, and MUST include an IM
>     header field listing the instance manipulations that were applied
>     to the current instance..."
>
>     Why is this being defined as MUST?  Can the server not be allowed
>     to send it back without an entity tag?  RFC 2616 section 13.3.4
>     clearly states "HTTP/1.1 origin servers...SHOULD send an entity tag
>     validator unless it is not feasible to generate one."
>
>     Therefore, I believe you should be more specific to your reasoning
>     as to why delta encodings suddenly MUST carry them.
>
> That section of RFC2616 isn't actually the most relevant one.
> Instead, look at RFC2616 section 13.3.3 (Weak and Strong Validators).
> Basically, when you're using delta-coding or ranges, at least,
> you need a strong validator, and there is not much chance of this
> when using Last-Modified.  Otherwise, you can't be sure that you
> are matching up the response to the proper instance.

Just want to make sure I'm understanding correctly:  The
MUST here is really going on the strong validator.  We are
saying that since delta-encoding is so utterly dependent on
the ETags, then we want the server to constantly provide us
with one.  But that doesn't appear to be what we're saying
here.  The I-D states "MUST include an ETag header field".
That does not in any way imply whether it must be a strong
or weak validator, and as is, leaves one wondering why it
MUST be sent.  I still think that this should be changed to
SHOULD supply an ETag header.  Basically, we have three
scenarios here:

    1) Server replies with strong validator ETag - This is
    what we want to happen

    2) Server replies with weak validator ETag - This is
    unacceptable, as in this case the ETag is garbage as
    far as we're concerned, since it doesn't point to a
    specific instance of the requested resource (that
    sounds very BAD - mixing instance with entity.  I'll
    address this later)

    3) Server replies with no ETag - This, surprisingly
    enough, is acceptable - even if it's part of a 226
    response.  The only problem is that we will not be able
    to use it as a base for future IMs.
    
What the I-D implies, however, is that we are just
requiring an ETag.  This is actually bad because it implies
that acceptable responses are (1) or (2), while the only
acceptable responses are (1) and possibly (3).  Having
explained this, I urge you to reread my original comment in
a new light.  Basically, I was trying to push you into
accepting scenario (3).  Of course, now that I've gone into
detail, it seems very obvious that we MUST reword this, as
scenario (2) MUST NOT be allowed to occur.
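
The three scenarios can be told apart mechanically, since RFC 2616
marks a weak entity tag with a "W/" prefix.  A minimal sketch, with
a made-up function name and made-up return labels:

```python
def classify_etag(etag):
    """Classify an ETag header value into one of the three scenarios above.

    etag -- the raw header value, or None when the response has no ETag.
    Returns "strong" (scenario 1), "weak" (scenario 2) or "absent"
    (scenario 3).  The labels are illustrative only.
    """
    if etag is None:
        return "absent"
    if etag.strip().startswith("W/"):
        return "weak"
    return "strong"
```

Under the argument above, only a "strong" result identifies an
instance that can serve as a base for later deltas.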

>     Secondly:  You mention the seeming necessity to add the Delta-Base
>     response tag.  This seems to me to be a waste.  The IM response
>     header tag, defined in section 10.5.2 seems to be specific to delta
>     encoded responses.  Therefore, it would seem to me that rather than
>     adding a Delta-Base response header as defined in section 10.5.1,
>     we could save a few bytes AND registration of a new header by
>     simply adding the Delta-Base as a parameter to the IM response
>     header, as such:
>
>        IM = "IM" ":" #(instance-manipulation  [ ";" "base" "=" entity-tag ])
>
>     Since this is specific to the instance, we can possibly define
>     multiple entity-tags for the same resource in different formats,
>     should it ever be needed (I still have to think a bit about this,
>     so I'm being intentionally vague)  in a similar format -
>     particularly by proxies/caches
>
> I'm not sure I understand what you are proposing here.

OK.  What I was getting at is an "enhanced" version of the
IM tag.  It would go something like this (Pls forgive me if
I make mistakes in this - I'm still pretty new at this)

     IM = "IM" ":" #(instance-manipulation [";" "base" "=" entity-tag]
		 [";" "ITag" "=" entity-tag])

Before I explain this, I'm gonna go into that
instance/entity issue I mentioned earlier:  As far as I
understand, we have two things:  An instance is a
description of a "state" of a given resource.  Basically,
it's a version of it at a given time.  Entity I understand
as a reference to the _current_ instance of the resource
and exists solely for the duration of the transfer (even
though later transfers might still be using the same
instance).  So basically, we have the ETags, which appear
to me to be pointing to instances, rather than "entities".
Having stated this, we can look at what I described
above.   Although I formally specified the header (or tried
to, at least :)), I want it to be clear that I'm leaving
the parameters loose.  What I called an "ITag" there would
basically be a pointer to the instance after its
corresponding IM was applied but BEFORE all of the rest.
Supplying it would enable browsers and proxies to cache the
resource multiple times with different IM tags.  Then, if a
client later requests a new instance with an
"If-None-Match" containing one of these "ITag"s, it (web
cache/proxy/server/whatever) can calculate the remaining
IMs and return it (if it knows that it is still valid -
possibly due to the presence of a "retain" directive).  In
addition to this, the DECODED version of the resource after
all IMs have been applied (ie, what would be returned in a
200 response) can be identified via the normal ETag
header.  Like I said before, though, this is all
tentative.  Comments are welcome though.

>     Secondly, you
>     state a bit later on that "A server SHOULD NOT send ``retain=0''
>     except in reply to a request that attempts to obtain a
>     delta-encoded response."  Here, it seems to me that the server
>     should either NEVER send "retain"s unless an A-IM is present, or
>     send "retain=0" any time it likes - INCLUDING when no A-IM is
>     specified.  I personally prefer the second line of thought.
>
> I'd be curious as to your reasoning for the "second line of thought."

It allows for a client (and even better - an intermediate
proxy) who DOES understand deltas and "retain" to get this
additional information even if a delta was not requested.

> This "SHOULD NOT" was originally motivated by a desire to avoid
> sending extra bytes in a context where they would not be useful
> (because many clients would not ever care about deltas).

But, like I said above, many proxies might.

> As to the first, if by "NEVER" you mean "MUST NOT", again I think
> that's probably not justifiable in this case.

Assuming that you still don't agree with my above reason,
which I think is a worthwhile gamble (although bytes might
be wasted on individual transfers that don't use the
retain, the time and bandwidth saved by good intermediate
delta-understanding proxies should justify it) I'd like to
know why the MUST NOT would not be justifiable.

That's pretty much it for now.

  Issac

--
Internet is a wonderful mechanism for making a fool of
yourself in front of a very large audience.
  --Anonymous

Moving the mouse won't get you into trouble...  Clicking it might.
  --Anonymous

From mogul@pa.dec.com  Mon Jul 10 16:01:45 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA29558; Mon, 10 Jul 2000 16:01:45 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA00746; Mon, 10 Jul 2000 16:01:45 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA04232; Mon, 10 Jul 2000 16:01:44 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200007102301.QAA04232@wera.pa.dec.com>
To: neoi@writeme.com
Cc: http-delta@pa.dec.com
Subject: "Entity" is a dirty word [was Re: Issac Goldstand's comments on Delta encoding spec}
In-Reply-To: Your message of "Thu, 06 Jul 2000 16:59:59 PDT."
             <200007062359.QAA02096@wera.pa.dec.com> 
Date: Mon, 10 Jul 2000 16:01:44 -0700
X-Mts: smtp

Issac, I think you (like many others) have been confused by
the use of the word "entity" in the HTTP spec.  I've written
about this before, but I think I need to say something about
this problem in general, before addressing your specific
comments.

Simply put, all uses of the word "entity" in the HTTP
specification are bogus.  The word should never have
been used.  It was introduced by well-meaning people
who were very wrong about what the definition should
be.  I fought against this, but lost.

The term "entity tag" is also a mistake, because it
has ABSOLUTELY NOTHING AT ALL to do with an "entity"
(following the HTTP/1.1 definition of the term).  I
probably should have fought harder against this myself,
but at the time I didn't realize how wrong this was.

In retrospect, the HTTP/1.1 specification should have
used the term "instance tag" and named the header
field "ITag".  This would have avoided a lot of confusion.

So, when you wrote:
    As far as I understand, we have two things:  An instance is a
    description of a "state" of a given resource.  Basically, it's a
    version of it at a given time.  Entity I understand as a reference
    to the _current_ instance of the resource and exists solely for the
    duration of the transfer (even though later transfers might still
    be using the same instance).  So basically, we have the ETags,
    which appear to me to be pointing to instances, rather than
    "entities".  Having stated this, we can look at what I described
    above.

you were on the right track, but still not quite right.

The problem is the specification never clearly defines
what a "resource" is - at least, not clearly enough to
decide what kind of thing has a "state" at a given time.
The word "version" doesn't quite solve the problem, because
"resources" can have "variants" which might or might not
be time-based.

If you are looking for something to describe as "a version
of [a resource] at a given time", however, the "variant"
is probably the closest we can get.  Part of the confusion
is whether a content-coding (such as gzip) is intrinsic
to the resource variant, or whether it is applied on the fly.

Another problem is that it's impossible to define what
the "current state" of a resource (or variant) is, because
the result of applying a method to a resource isn't just
based on time, it could be based on who is making the
request and what request headers are provided.

In any case, we've chosen the word "instance" to apply to
a possibly content-encoded, possibly variant form of a resource,
as returned in response to a given request at a particular time.

And the protocol makes no sense in certain contexts unless
"entity tags" are associated with "instances".

So what is an "entity"?  As defined in the spec, it is
really just a message-in-transit.  It can't be defined
as *the* "_current_ instance of the resource", because the
same instance could be sent in many different entities,
as you've noted, and because "current" isn't really meaningful
by itself.

So, clearly, an "entity tag" has no useful connection with an
"entity" in HTTP/1.1.

-Jeff

From mogul@pa.dec.com  Mon Jul 10 18:17:52 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA32615; Mon, 10 Jul 2000 18:17:52 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA01590; Mon, 10 Jul 2000 18:17:52 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA32135; Mon, 10 Jul 2000 18:17:52 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200007110117.SAA32135@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: Issac Goldstand's comments on Delta encoding spec 
In-Reply-To: Your message of "Thu, 06 Jul 2000 16:59:59 PDT."
             <200007062359.QAA02096@wera.pa.dec.com> 
Date: Mon, 10 Jul 2000 18:17:52 -0700
X-Mts: smtp

Issac Goldstand <neoi@writeme.com> wrote:

    > That section of RFC2616 isn't actually the most relevant one.
    > Instead, look at RFC2616 section 13.3.3 (Weak and Strong Validators).
    > Basically, when you're using delta-coding or ranges, at least,
    > you need a strong validator, and there is not much chance of this
    > when using Last-Modified.  Otherwise, you can't be sure that you
    > are matching up the response to the proper instance.
    
    Just want to make sure I'm understanding correctly:  The
    MUST here is really going on the strong validator. 

OK, I think I see your point.

    Basically, we have three scenarios here:
    
	1) Server replies with strong validator ETag - This is
	what we want to happen
    
	2) Server replies with weak validator ETag - This is
	unacceptable, as in this case the ETag is garbage as
	far as we're concerned, since it doesn't point to a
	specific instance of the requested resource (that
	sounds very BAD - mixing instance with entity.  I'll
	address this later)
    
	3) Server replies with no ETag - This, surprisingly
	enough, is acceptable - even if it's part of a 226
	response.  The only problem is that we will not be able
	to use it as a base for future IMs.
	
We both agree on #1.

Re: #3 - I guess you're right that it's theoretically OK if a
delta arrives without an Etag, because that response
can be used even if it cannot be (usefully) cached.

Re: #2 - if we don't require an entity tag at all, then I
don't think we can insist on a strong one.  To put this
another way - the "Etag" field in a 226 response is not
relevant to the immediate use of this response (e.g., delta
application to an existing cache entry).  It's only relevant
to caching of the re-assembled instance.  And that's basically
independent of whether the instance was completed as the
result of a 200 response or a 226 response.  Since the
spec doesn't require 200 responses to use strong entity
tags, I think we can't, either, in this situation.
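
The three scenarios can be summarized in a short Python sketch.
This is only an illustration of the reasoning above: the
function names are hypothetical, and the headers are assumed to
arrive as a plain dict keyed by "ETag".  The response itself is
usable in all three cases; what differs is whether the
reassembled instance can later serve as a delta base.

```python
def classify_etag(headers):
    """Return 'strong', 'weak', or 'none' for a response's ETag field."""
    etag = headers.get("ETag")
    if etag is None:
        return "none"
    # Weak entity tags carry a W/ prefix (RFC 2616 section 3.11).
    return "weak" if etag.strip().startswith("W/") else "strong"


def usable_as_delta_base(headers):
    """Only a strong validator pins down one specific instance."""
    return classify_etag(headers) == "strong"


if __name__ == "__main__":
    for hdrs in ({"ETag": '"def"'}, {"ETag": 'W/"def"'}, {}):
        print(classify_etag(hdrs), usable_as_delta_base(hdrs))
```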

So I accept your correction; the second paragraph of 10.4.1
"226 IM Used" changes from:

   The request MUST have included an A-IM header field listing at least
   one instance-manipulation.  The response MUST include an Etag header
   field giving the entity tag of the current instance, and MUST include
   an IM header field listing the instance-manipulations that were
   applied to the current instance.

To

   The request MUST have included an A-IM header field listing at least
   one instance-manipulation.  The response MUST include
   an IM header field listing the instance-manipulations that were
   applied to the current instance.

and just leave the question of including an Etag header field
to other parts of the HTTP/1.1 spec (and other parts of the
delta spec).

    >     Secondly, you
    >     state a bit later on that "A server SHOULD NOT send ``retain=0''
    >     except in reply to a request that attempts to obtain a
    >     delta-encoded response."  Here, it seems to me that the server
    >     should either NEVER send "retain"s unless an A-IM is present, or
    >     send "retain=0" any time it likes - INCLUDING when no A-IM is
    >     specified.  I personally prefer the second line of thought.
    >
    > I'd be curious as to your reasoning for the "second line of thought."
    
    It allows for a client (and even better - an intermediate
    proxy) who DOES understand deltas and "retain" to get this
    additional information even if a delta was not requested.
    
    > This "SHOULD NOT" was originally motivated by a desire to avoid
    > sending extra bytes in a context where they would not be useful
    > (because many clients would not ever care about deltas).
    
    But, like I said above, many proxies might.

OK, I don't think it's a big deal, and (as I said) the bias
for spec-writers ought to be to leave out requirements if they
don't have clear justifications (this is the reason why I
am very reluctant to use the keyword "MUST" unless we can
point to a specific problem that it solves).

So, the last paragraph of section 10.8.1 "Retain directive"
changes from:

   If the retain directive includes a delta-seconds value of zero, a
   client SHOULD NOT use the corresponding entity tag in a future
   request for a delta-encoded response.  A server SHOULD NOT send
   ``retain=0'' except in reply to a request that attempts to obtain a
   delta-encoded response.

To

   If the retain directive includes a delta-seconds value of zero, a
   client SHOULD NOT use the corresponding entity tag in a future
   request for a delta-encoded response.

	Note: We recommend that server implementors consider
	the bandwidth implications of sending "retain=0"
	directives to clients or proxies that might not have
	the ability to make use of it.
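
A client-side reading of the remaining rule might look like
the following Python sketch.  It assumes the retain directive
is carried in a Cache-Control header value; the function names
are hypothetical, and the comma split is deliberately naive
(it would mis-handle a quoted directive value that contains a
comma).

```python
def retain_seconds(cache_control):
    """Return the delta-seconds of a retain directive, or None if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "retain" and value.strip().isdigit():
            return int(value)
    return None


def should_use_tag_for_delta(cache_control):
    """False only when the server explicitly sent retain=0."""
    return retain_seconds(cache_control) != 0


if __name__ == "__main__":
    for cc in ("max-age=60, retain=0", "retain=3600", "max-age=60"):
        print(cc, "->", should_use_tag_for_delta(cc))
```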

Any comments?

-Jeff

From mogul@pa.dec.com  Mon Jul 10 18:35:28 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA29782; Mon, 10 Jul 2000 18:35:28 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA12078; Mon, 10 Jul 2000 18:35:28 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA02063; Mon, 10 Jul 2000 18:35:28 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200007110135.SAA02063@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Issac's "enhanced" version of the IM header
In-Reply-To: Your message of "Thu, 06 Jul 2000 16:59:59 PDT."
             <200007062359.QAA02096@wera.pa.dec.com> 
Date: Mon, 10 Jul 2000 18:35:28 -0700
X-Mts: smtp

Issac Goldstand <neoi@writeme.com> wrote:
    OK.  What I was getting at is an "enhanced" version of the
    IM tag.  It would go something like this (Pls forgive me if
    I make mistakes in this - I'm still pretty new at this)
    
	 IM = "IM" ":" #(instance-manipulation [";" "base" "=" entity-tag]
		     [";" "ITag" "=" entity-tag])
    
    [... stuff elided by Jeff ...]
    What I called an "ITag" there would
    basically be a pointer to the instance after its
    corresponding IM was applied but BEFORE all of the rest.
    Supplying it would enable browsers and proxies to cache the
    resource multiple times with different IM tags.  Then, if a
    client later requests a new instance with an
    "If-None-Match" containing one of these "ITag"s, it (web
    cache/proxy/server/whatever) can calculate the remaining
    IMs and return it (if it knows that it is still valid -
    possibly due to the presence of a "retain" directive).  In
    addition to this, the DECODED version of the resource after
    all IMs have been applied (ie, what would be returned in a
    200 response) can be identified via the normal ETag
    header.  Like I said before, though, this is all
    tentative.  Comments are welcome though.

I can see what you are getting at.  However, I think this
is something that (1) at the very least can be postponed to
be treated as a future extension of the current spec,
and (2) perhaps isn't actually necessary.

Reasoning behind (1):
    (a) We're already way too late on the basic delta spec.

    (b) I would rather not add something that changes the
    definition of "instance" and hence the interpretation of
    entity tags without a lot of thought; we have gotten
    into trouble over this subject area before!
    
    (c) I think this is something that, if it proves to
    be useful, could probably be specified as an extension
    (and an implementation-optional feature) and so
    does not need to be in the basic delta spec.

Reasoning behind (2):

Your goal here, if I understand it correctly, is to allow a
cache to take a cache entry and use it in a reply with a
different set of instance-manipulations than it had when
"it" arrived.  I put "it" in quotes because I think this is
one of those places where we need to be more rigorous about
saying what we mean.

There has been some debate over the years about exactly
what an HTTP cache stores.  Does it store "resources" or
"responses" or "objects" or "messages" or what?  I think
it's probably most accurate to think of a cache as storing
"instances" or "partial instances".  If you agree with that
conceptualization, then what is stored in a cache (in
abstract terms) never actually has an instance
manipulation.  I.e., the cache doesn't store a "delta
response", it stores the result of applying the delta to a
previous cache entry.

So instead of thinking of how to allow a cache to identify
a cache entry with some level of instance manipulations
already included, we should simply think of a cache as
FIRST trying to find the right *instance* in its cache to
use in a response, and THEN it might optionally apply some
instance manipulations, if it has enough information to do
that locally.

As a performance optimization, the cache could keep a
"hidden cache" of instances with pre-applied instance
manipulations.  But this kind of optimization does not need
to be (nor should it be) visible to any client or server (and
hence is not one that needs to be covered in the spec),
except that we might possibly want to warn implementors
that any such hidden caches must not become inconsistent
with the true (spec-visible) cache entries.
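
A minimal Python sketch of this view of a cache follows.  It
is an illustration under stated assumptions, not a proposed
implementation: the class and method names are hypothetical,
entries are keyed by (URL, entity tag), and gzip is used only
as an easy-to-run stand-in for an instance manipulation the
cache happens to be able to apply locally.

```python
import gzip

# Manipulations this hypothetical cache can apply on its own.
LOCAL_MANIPULATIONS = {"gzip": gzip.compress}


class InstanceCache:
    """Stores plain instances; manipulations are applied at reply time."""

    def __init__(self):
        self._instances = {}  # (url, entity_tag) -> instance bytes

    def store(self, url, entity_tag, instance):
        self._instances[(url, entity_tag)] = instance

    def respond(self, url, entity_tag, accepted_ims):
        # FIRST find the right instance ...
        instance = self._instances.get((url, entity_tag))
        if instance is None:
            return None
        # ... THEN optionally apply the first manipulation we support.
        for im in accepted_ims:
            apply_im = LOCAL_MANIPULATIONS.get(im)
            if apply_im is not None:
                return [im], apply_im(instance)
        return [], instance


if __name__ == "__main__":
    cache = InstanceCache()
    cache.store("/foo.html", '"def"', b"hello, world")
    ims, body = cache.respond("/foo.html", '"def"', ["vcdiff", "gzip"])
    print(ims, gzip.decompress(body))
```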

If you can come up with a specific, concrete example
(that is, showing all of the headers in the sequence
of requests and responses) that cannot be solved this
way, then it is worth thinking harder about the extension
you are proposing.  But I don't think we actually need
it.

-Jeff



From mogul  Tue Jul 11 16:59:49 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA09964; Tue, 11 Jul 2000 16:59:48 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200007112359.QAA09964@wera.pa.dec.com>
To: http-delta
Subject: Remaining issue - replace Delta-Base header with IM param?
Date: Tue, 11 Jul 2000 16:59:48 -0700
X-Mts: smtp

I would like to submit a revised Internet-Draft (draft -05)
before the Friday deadline (the IETF stops accepting new
I-Ds prior to an IETF meeting).

I think we are almost at closure on the basic delta-encoding
specification.  There is one remaining issue, which is
Issac Goldstand's suggestion that we replace the Delta-Base
header with a parameter carried in the IM header.

For example, this:

       HTTP/1.1 226 IM Used
       Date: Wed, 24 Dec 1997 14:01:00 GMT
       Etag: "def"
       Delta-base: "abc"
       Content-encoding: gzip
       IM: vcdiff

would become

       HTTP/1.1 226 IM Used
       Date: Wed, 24 Dec 1997 14:01:00 GMT
       Etag: "def"
       Content-encoding: gzip
       IM: vcdiff;base="abc"
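
From a recipient's point of view the two forms carry the same
information, as this minimal Python sketch suggests.  The
function name delta_base is hypothetical, and the headers are
assumed to arrive as a dict whose keys may differ in case.

```python
import re

_BASE_PARAM = re.compile(r';\s*base\s*=\s*((?:W/)?"[^"]*")')


def delta_base(headers):
    """Return the base entity tag from Delta-Base, else from IM's base=."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "delta-base" in lowered:
        return lowered["delta-base"].strip()
    match = _BASE_PARAM.search(lowered.get("im", ""))
    return match.group(1) if match else None


if __name__ == "__main__":
    print(delta_base({"Etag": '"def"', "Delta-base": '"abc"', "IM": "vcdiff"}))
    print(delta_base({"Etag": '"def"', "IM": 'vcdiff;base="abc"'}))
```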

I haven't received much feedback about this (I think that
Daniel Hellerstein is in favor of the change).

On the one hand, it would regularize the protocol specification
somewhat.

On the other hand, I'm not sure this buys us very much.
For example, if we do go ahead with the DCluster and/or
DTemplate headers, these seem to actually require the
use of separate headers - they might be sent on responses
that do not have IM headers.

So I think there is no clear benefit to handling
the delta-base value either as a header or as a parameter,
aside from saving perhaps a few bytes in the 226 response.

With that in mind, my inclination is to leave the protocol
spec as it is, since we have already reviewed it for some
time.  I'm nervous about making a change at this point
whose consequences we might not entirely understand.

If anyone has strong arguments in favor of making a change,
please speak up ASAP.

Thanks
-Jeff

From mogul  Thu Jul 13 11:49:32 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA28560; Thu, 13 Jul 2000 11:49:32 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200007131849.LAA28560@wera.pa.dec.com>
To: http-delta
Subject: Last chance to review draft-mogul-http-delta-05.txt pre-publication 
Date: Thu, 13 Jul 2000 11:49:32 -0700
X-Mts: smtp

I've posted the latest version of the Delta Encoding spec at:
ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-delta-05.11july2000.txt

I plan to submit this to the IETF before noon tomorrow (they
stop accepting I-Ds several weeks before their meeting), so
if anyone has any last-minute comments, please tell me today!

This version has some minor changes (mostly the result of
Issac Goldstand's comments; some typos were found by others).
It preserves the Delta-Base: header (at least, for now).

My hope is that we can proceed to a "Last Call" on the HTTP-WG
mailing list, and then ask the IESG to bless this as a Proposed
Standard.  I will then get to work on the Cluster/Template
extensions, I promise :-)

Thanks,
-Jeff

From mogul  Thu Jul 20 08:52:45 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id IAA18836; Thu, 20 Jul 2000 08:52:45 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200007201552.IAA18836@wera.pa.dec.com>
To: http-delta
Subject: draft-mogul-http-delta-05.txt now available from IETF
Date: Thu, 20 Jul 2000 08:52:45 -0700
X-Mts: smtp

It took the I-D editor a few days to push this through their backlog:

    From: Internet-Drafts@ietf.org
    Message-ID: <200007201044.GAA10458@ietf.org>
    Subject: I-D ACTION:draft-mogul-http-delta-05.txt
    Date: Thu, 20 Jul 2000 06:44:08 -0400
    
    A New Internet-Draft is available from the on-line Internet-Drafts
    directories.
    
    
	    Title	: Delta encoding in HTTP
	    Author(s)	: J. Mogul, B. Krishnamurthy, F. Douglis, A. Feldmann, 
		      Y. Goland, A. van Hoff, D. Hellerstein
	    Filename	: draft-mogul-http-delta-05.txt
	    Pages	: 45
	    Date	: 19-Jul-00
	    
    Many HTTP requests cause the retrieval of slightly modified
    instances of resources for which the client already has a
    cache entry.  Research has shown that such modifying
    updates are frequent, and that the modifications are
    typically much smaller than the actual entity.  In such
    cases, HTTP would make more efficient use of network
    bandwidth if it could transfer a minimal description of the
    changes, rather than the entire new instance of the
    resource.  This is called 'delta encoding.'  This
    document describes how delta encoding can be supported as a
    compatible extension to HTTP/1.1.
    
    A URL for this Internet-Draft is:
    http://www.ietf.org/internet-drafts/draft-mogul-http-delta-05.txt

-Jeff

P.S.: Not to be confused with the far more interesting "Delta 5"
	http://www.hiljaiset.sci.fi/punknet/delta5_e.htm

From mogul  Thu Jul 20 08:59:10 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id IAA23804; Thu, 20 Jul 2000 08:59:10 -0700 (PDT)
Message-Id: <200007201559.IAA23804@wera.pa.dec.com>
To: http-delta
Orig-Date: Sun, 16 Jul 2000 17:10:43 -0500
From: Mike Dahlin <dahlin@cs.utexas.edu>
Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt]
Date: Thu, 20 Jul 2000 08:59:10 -0700
Sender: mogul
X-Mts: smtp

[Re-sent by Jeff Mogul with Mike's permission, with slight editing.]

As for the draft, it looks good. There were a couple points where I got a bit
bogged down/confused. Details below.

-mike


Comments on draft-mogul-http-delta-04.txt

10.5.2 (p 29)

The case when multiple ranges are returned (Content-type:
multipart/byteranges) might be worth describing in a bit more
detail.

In particular, it may not be immediately obvious how the case of

   IM: range, vcdiff

would be decoded if there are multiple ranges (how does the decoder
know the length of the pieces to know when one range stops and the
next begins after a vcdiff encoding?)

The answer (I presume) is that the contents are encoded as
multipart/byteranges as per RFC2046, so the decoder doesn't need to
know the encoded lengths of the pieces to know when one stops and the
next begins -- the decoder uses the delimiters specified in the header
instead.


It might be helpful to see examples of how the Range:,
Content-length:, and RFC2046 delimiters fit together for IM:
range,vcdiff (and maybe IM: vcdiff,range) cases with multiple ranges
returned.


10.5.3 (p30)
"If a request includes an A-IM header field that lists the 'range'
instance-manipulation prior to any delta-coding(s), and the request
also includes an If-Range: header that lists the entity tag of the
current instance, the server SHOULD ignore the delta-codings."

At first glance, the need for the rule is not obvious. It seems like a
server could interpret
       GET /foo.html HTTP/1.1
       host: bar.example.net
       If-None-Match: "A"
       If-Range: "B"
       A-IM: range, vcdiff
       Range: bytes=900-

As meaning "if B is still the current version, send me bytes 900-... of
B and you may vcdiff (with bytes 900-... of A) it if you like." This
seems sensible.

e.g., following the example in section 5.7

if Tcur = "A" -> server replies with 304 (not modified)

if Tcur = "B" --> server replies with 226 (IM Used) + an "IM:
range,vcdiff" response header, and a message body
including the vcdiff(A[900-], B[900-]);
{If the server doesn't understand IM, the right thing still happens, I
think} {The server, of course, is still welcome to ignore the vcdiff
specification and just send the raw range}

if Tcur = "C" --> send VCDIFF(A, C)
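
The three cases can be written as a small decision function.
This Python sketch only restates the interpretation above; the
function name is hypothetical, the byte offset 900 is taken
from the example request, and the use of status 226 with
"IM: vcdiff" in the last case is an assumption, since the text
says only that VCDIFF(A, C) is sent.

```python
def interpret_request(tcur, if_none_match, if_range):
    """Return (status, IM list, body description) for the example request."""
    if tcur == if_none_match:
        # The client's cached instance is still current.
        return 304, [], None
    if tcur == if_range:
        # The partial instance is current: delta over the requested range.
        body = "vcdiff(%s[900-], %s[900-])" % (if_none_match, tcur)
        return 226, ["range", "vcdiff"], body
    # Neither tag is current: delta of the whole instance against the base.
    return 226, ["vcdiff"], "vcdiff(%s, %s)" % (if_none_match, tcur)


if __name__ == "__main__":
    for current in ("A", "B", "C"):
        print(current, interpret_request(current, "A", "B"))
```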


Is the intention to avoid the need to have delta encoding algorithms
be defined for encoding ranges of files against each other?
I don't understand why it is particularly more demanding to expect
clients to be able to run
 VUNDIF(SELECT(A, 900, END),
        VDIFF(SELECT(A, 900, END), SELECT(B, 900, END)))
than to run
        VUNDIF(A, VDIFF(A, B))
But I could easily not be noticing a corner case.

If there are particular codes for which this is a problem, it seems
like this restriction should be part of the specification of the code
not part of the specification for all codes.

So it seems like this rule should be relaxed. Whether or not the rule
is relaxed, it might be good to discuss this case in section 5.7
before this rule appears in 10.5.2


10.5.2 (p31-32)
I don't understand the last full paragraph on p31 ("The server's
choice about whether...") and the Note: spanning p31-32.

My best guess is that this is an explanation of the rule from the
previous paragraph (that I discussed above). Even if so, I still don't
understand the argument.


10.5.2 (p 31)
Typo "subsequentLY-APPLIED"


10.6 (p33)

I don't understand the need for MUST-NOT rule 3 ("If the cache
implementation is not aware of, or is not at least conditionally...")


My understanding of the specification is that

   baseInstance * IM-Pipeline = currentInstance

(Where "* IM-Pipeline" means apply the functions listed in IM: header
in the order listed). E.g., the actual functions implemented by the
IM-Pipeline can be opaque to the cache implementation. The
specification says that the element in the cache must be tagged by
baseInstance and by IM-Pipeline and that it will not return the
element unless both match what the client is able to accept.

I don't understand why the cache would need to understand a new
encoding as long as it has the ability to only supply the new encoding
to clients that understand it. It seems like the matching rules give
it that ability.

It seems like it would be desirable to allow caches to treat new
algorithms as opaque functions but to still allow caching of the
output of those functions.
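
As a minimal sketch of that reading, in Python: the cache
below never interprets the manipulation names, it only
compares them.  The class, its method names, and the
"newdiff" manipulation are hypothetical, and the matching
logic is this message's interpretation of rules 1 and 2, not
text from the draft.

```python
class OpaqueDeltaCache:
    """Caches delta bodies tagged by base instance and IM pipeline."""

    def __init__(self):
        self._entries = {}  # (url, base_tag, pipeline tuple) -> body

    def store(self, url, base_tag, pipeline, body):
        self._entries[(url, base_tag, tuple(pipeline))] = body

    def lookup(self, url, client_tags, client_accepts):
        """Return (base_tag, pipeline, body) only if both tags match."""
        for (entry_url, base_tag, pipeline), body in self._entries.items():
            if entry_url != url:
                continue
            if base_tag not in client_tags:
                continue  # client does not hold this base instance
            if not all(im in client_accepts for im in pipeline):
                continue  # client cannot undo every manipulation
            return base_tag, list(pipeline), body
        return None


if __name__ == "__main__":
    cache = OpaqueDeltaCache()
    cache.store("/foo.html", '"abc"', ["newdiff"], b"...delta bytes...")
    print(cache.lookup("/foo.html", {'"abc"'}, {"newdiff", "gzip"}))
    print(cache.lookup("/foo.html", {'"abc"'}, {"vcdiff"}))
```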



At the bottom of the subsection there is a note "Rule 3 allows for
extending the set of instance-manipulations without causing deployed
cache implementations to commit errors", but I should think that rules
1 and 2 suffice to prevent errors.


(Is the reason for rule 3 to help support "new instance
manipulations [that] may include additional caching rules to improve
cache-hit rates in cognizant implementations" as per the Note? I don't
see the need, but perhaps you have a particular optimization in mind?)


10.6 (p. 34)
Separating the Note about MUST-NOT rule 3 from the MUST-NOT list with
another numbered list (range conditions list) is confusing.

From mogul@pa.dec.com  Tue Aug  1 18:41:16 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA04716; Tue, 1 Aug 2000 18:41:15 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA18359; Tue, 1 Aug 2000 18:41:15 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA06662; Tue, 1 Aug 2000 18:41:15 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200008020141.SAA06662@wera.pa.dec.com>
To: Mike Dahlin <dahlin@cs.utexas.edu>
Cc: http-delta@pa.dec.com
Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] 
In-Reply-To: Your message of "Thu, 20 Jul 2000 08:59:10 PDT."
             <200007201559.IAA23804@wera.pa.dec.com> 
Date: Tue, 01 Aug 2000 18:41:15 -0700
X-Mts: smtp

    Comments on draft-mogul-http-delta-04.txt
    
    10.5.2 (p 29)
    
    The case when multiple ranges are returned (Content-type:
    multipart/byteranges) might be worth describing in a bit more
    detail.
    
    In particular, it may not be immediately obvious how the case of
    
       IM: range, vcdiff
    
    would be decoded if there are multiple ranges (how does the decoder
    know the length of the pieces to know when one range stops and the
    next begins after a vcdiff encoding?)
    
    The answer (I presume) is that the contents are encoded as
    multipart/byteranges as per RFC2046, so the decoder doesn't need to
    know the encoded lengths of the pieces to know when one stops and the
    next begins -- the decoder uses the delimiters specified in the header
    instead.
    
Doesn't section 10.10 (Delta encoding and multipart/byteranges)
already cover this issue?  I.e., it has this paragraph:

   When a multipart/byteranges response uses a delta-coding after a
   range selection, the A-IM and IM header fields list the delta-coding
   after the "range" literal.  (Recall that this is the approach taken
   to obtain an updated version just of selected sections of an
   instance.)  The server first selects the specified ranges from the
   current instance, and also selects the same specified ranges from the
   base instance.  (Some of these selected ranges might be the empty
   sequence, if the instance is not long enough.)  The server then
   generates the individual differences (deltas) between the pairs of
   ranges, and transmits each such difference in a part of the
   multipart/byteranges media type.
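
The procedure in that paragraph can be sketched in Python as
follows.  This is an illustration only: toy_diff and
toy_undiff are hypothetical stand-ins built on difflib, not
vcdiff, and the multipart/byteranges framing of the parts is
left out.  Ranges are (first, last) byte positions, inclusive,
with last set to None for an open-ended range.

```python
from difflib import SequenceMatcher


def select(data, first, last=None):
    """Select bytes first..last inclusive; empty if data is too short."""
    end = len(data) if last is None else last + 1
    return data[first:end]


def toy_diff(base, target):
    """Encode target as copy-from-base and add-literal operations."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": ship the new bytes
            ops.append(("add", target[j1:j2]))
        # "delete" needs no operation: those base bytes are just skipped
    return ops


def toy_undiff(base, ops):
    """Rebuild the target from the base and the operations."""
    out = bytearray()
    for op in ops:
        out += base[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


def range_deltas(base, current, ranges):
    """One delta per range, each computed between the paired selections."""
    return [toy_diff(select(base, first, last), select(current, first, last))
            for first, last in ranges]


if __name__ == "__main__":
    base = b"0123456789abcdef"
    current = b"0123456789abXdefgh"
    ranges = [(0, 3), (10, None)]
    for (first, last), delta in zip(ranges, range_deltas(base, current, ranges)):
        print(toy_undiff(select(base, first, last), delta))
```

The client side mirrors the server: it selects the same ranges
from its copy of the base instance and applies each part's
delta to the corresponding selection.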

Perhaps it would be sufficient to add a cross-reference from
section 10.5.2 to 10.10.
    
    It might be helpful to see examples of how the Range:,
    Content-length:, and RFC2046 delimiters fit together for IM:
    range,vcdiff (and maybe IM: vcdiff,range) cases with multiple ranges
    returned.

If the text in 10.10 doesn't seem clear enough, I could add
that, but it would be a fairly lengthy example.

I'll try to get to your other comments tomorrow ...

-Jeff
    

From mogul@pa.dec.com  Wed Aug  2 13:17:36 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA31621; Wed, 2 Aug 2000 13:17:36 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA27740; Wed, 2 Aug 2000 13:17:36 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA02332; Wed, 2 Aug 2000 13:17:35 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200008022017.NAA02332@wera.pa.dec.com>
To: Mike Dahlin <dahlin@cs.utexas.edu>
Cc: http-delta@pa.dec.com
Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] 
In-Reply-To: Your message of "Thu, 20 Jul 2000 08:59:10 PDT."
             <200007201559.IAA23804@wera.pa.dec.com> 
Date: Wed, 02 Aug 2000 13:17:35 -0700
X-Mts: smtp

Going through your comments one by one ...

    10.5.3 (p30)
    "If a request includes an A-IM header field that lists the 'range'
    instance-manipulation prior to any delta-coding(s), and the request
    also includes an If-Range: header that lists the entity tag of the
    current instance, the server SHOULD ignore the delta-codings."
    
    At first glance, the need for the rule is not obvious. It seems like a
    server could interpret
	   GET /foo.html HTTP/1.1
	   host: bar.example.net
	   If-None-Match: "A"
	   If-Range: "B"
	   A-IM: range, vcdiff
	   Range: bytes=900-
    
    As meaning "if B is still the current version, send me bytes 900-... of
    B and you may vcdiff (with bytes 900-... of A) it if you like." This
    seems sensible.
    
    e.g., following the example in section 5.7
    
    if Tcur = "A" -> server replies with 304 (not modified)
    
    if Tcur = "B" --> server replies with 226 (IM Used) + an "IM:
    range,vcdiff" response header, and a message body
    including the vcdiff(A[900-], B[900-]);
    {If the server doesn't understand IM, the right thing still happens, I
    think} {The server, of course, is still welcome to ignore the vcdiff
    specification and just send the raw range}
    
    if Tcur = "C" --> send VCDIFF(A, C)
    
    Is the intention to avoid the need to have delta encoding algorithms
    be defined for encoding ranges of files against each other?

    I don't understand why it is particularly more demanding to expect
    clients to be able to run
     VUNDIF(SELECT(A, 900, END),
	    VDIFF(SELECT(A, 900, END), SELECT(B, 900, END)))
    than to run
	    VUNDIF(A, VDIFF(A, B))
    But I could easily not be noticing a corner case.
    
    If there are particular codes for which this is a problem, it seems
    like this restriction should be part of the specification of the code
    not part of the specification for all codes.
    
    So it seems like this rule should be relaxed. Whether or not the rule
    is relaxed, it might be good to discuss this case in section 5.7
    before this rule appears in 10.5.2
    
In reviewing the mailing list log:
	ftp://ftp.digital.com/pub/DEC/WRL/mogul/http-delta-log.txt
I found some discussion on this point, but I think we may have been
using faulty logic.   Your example seems to show a valid interpretation
of the A-IM header.

I'm inclined to remove this restriction from 10.5.2.  Perhaps
we need a paragraph in 5.7 explaining how to interpret your
example.
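
The client-side decoding in the quoted VUNDIF(SELECT(A, 900, END), ...)
expression can be checked with a small Python sketch.  It uses a toy
copy/add delta built on difflib as a stand-in for vcdiff (it is NOT
vcdiff), and the names toy_diff, toy_undiff and select are made up for
illustration.  The only point is that decoding against a selected range
of A works the same way as decoding against all of A.

```python
from difflib import SequenceMatcher

def toy_diff(base: bytes, target: bytes) -> list:
    """Encode target as copy/add operations against base (toy delta, not vcdiff)."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse base[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("add", target[j1:j2]))  # literal bytes from the target
        # "delete" contributes nothing to the target
    return ops

def toy_undiff(base: bytes, ops: list) -> bytes:
    """Rebuild the target from the base and the toy delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

def select(instance: bytes, start: int) -> bytes:
    """SELECT(instance, start, END) from the quoted example."""
    return instance[start:]

# Hypothetical instances: the client holds all of A and bytes 0-899 of B.
A = b"header v1\n" + b"unchanged line\n" * 80 + b"footer v1\n"
B = b"header v2\n" + b"unchanged line\n" * 80 + b"extra line\nfooter v2\n"

partial_b = B[:900]
delta = toy_diff(select(A, 900), select(B, 900))   # server side
rebuilt = partial_b + toy_undiff(select(A, 900), delta)  # client side
```

The client needs nothing beyond what it already holds: the cached A,
the 900 bytes of B it received earlier, and the delta.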

-Jeff

From mogul@pa.dec.com  Wed Aug  2 15:18:47 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA04547; Wed, 2 Aug 2000 15:18:47 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA24625; Wed, 2 Aug 2000 15:18:47 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA12036; Wed, 2 Aug 2000 15:18:46 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200008022218.PAA12036@wera.pa.dec.com>
To: Mike Dahlin <dahlin@cs.utexas.edu>
Cc: http-delta@pa.dec.com
Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] 
In-Reply-To: Your message of "Thu, 20 Jul 2000 08:59:10 PDT."
             <200007201559.IAA23804@wera.pa.dec.com> 
Date: Wed, 02 Aug 2000 15:18:46 -0700
X-Mts: smtp


    10.5.2 (p31-32)
    I don't understand the last full paragraph on p31 ("The server's
    choice about whether...") and the Note: spanning p31-32.
    
    My best guess is that this is an explanation of the rule from the
    previous paragraph (that I discussed above). Even if so, I still don't
    understand the argument.
    
This was also discussed on the mailing list; the Note was
supposed to have captured the intention.  Perhaps a clearer
phrasing of the Note would be:

    Note: the intent of this requirement is to prevent the server from
    generating a delta-encoded response that the client can only decode
    by first applying an instance-manipulation encoding to its cached
    base instance.  One cannot assume that a client willing to decode a
    given instance-manipulation format (as indicated by its A-IM
    request header) is also able to encode into that format.  A server
    implementor might wish to consider what the client would logically
    have in its cache, when deciding which instance-manipulations to
    apply prior to a delta-coding.

Consider this case: suppose that the client sends an initial
request:
	GET /foo.html HTTP/1.1
	Host: example.com
and gets back
	HTTP/1.1 200 OK
	Etag: "A"
Then the client sends another request
	GET /foo.html HTTP/1.1
	Host: example.com
	If-None-Match: "A"
	A-IM: diffe,gzip,vcdiff
One plausible interpretation of this A-IM is that the client is
willing to accept a response with either:
	Etag: "B"
	IM: diffe, gzip
or
	Etag: "B"
	IM: vcdiff
but this one:
	Etag: "B"
	IM: gzip,vcdiff
would require the client to compute GZIP(A) before it could
decode the delta.  This violates the rule you're referring to:
   The server's choice about whether to apply an instance-manipulation
   SHOULD be independent of its choice to apply any subsequent
   two-input instance-manipulations, to the response.
because it didn't apply gzip to /foo.html except in the case
where it subsequently applied vcdiff.
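
To make the distinction concrete, here is a minimal Python sketch (not
taken from the draft).  For a given IM response header it lists the
one-input encodings the client would have to apply to its cached base
instance before it could decode the delta.  The ONE_INPUT and TWO_INPUT
sets and the function name are illustrative assumptions.

```python
# Illustrative classification only; these sets are assumptions, not spec text.
ONE_INPUT = {"gzip", "deflate"}
TWO_INPUT = {"vcdiff", "diffe", "gdiff"}

def encodings_client_must_apply(im_header: str) -> list:
    """Return the one-input encodings listed before the first delta-coding.

    "range" is deliberately left out of ONE_INPUT: selecting bytes from a
    cached base needs no encoder, which is the concern of the Note.
    """
    applied = [tok.strip().lower() for tok in im_header.split(",") if tok.strip()]
    needed = []
    for name in applied:
        if name in TWO_INPUT:
            return needed          # everything before the delta hits the base
        if name in ONE_INPUT:
            needed.append(name)
    return []                      # no delta-coding, so no base is involved
```

With this classification, "diffe, gzip" and "vcdiff" require nothing of
the client beyond decoding, while "gzip,vcdiff" requires it to produce
GZIP(A) first.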

Going through my logs of private email with Daniel Hellerstein,
we worked through a number of corner cases before coming up
with this phrasing of the spec.  I can probably reconstruct
the entire argument, but it would take more time than I have
before my next meeting :-)

-Jeff

From douglis@research.att.com  Wed Aug  2 17:31:51 2000
Return-Path: <douglis@research.att.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id RAA09139; Wed, 2 Aug 2000 17:31:51 -0700 (PDT)
Received: from zmamail01.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA19970; Wed, 2 Aug 2000 17:31:50 -0700
Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345)
	id 78E94275A; Wed,  2 Aug 2000 20:31:50 -0400 (EDT)
Received: from mail-green.research.att.com (H-135-207-30-103.research.att.com [135.207.30.103])
	by zmamail01.zma.compaq.com (Postfix) with ESMTP id 67BA32556
	for <http-delta@pa.dec.com>; Wed,  2 Aug 2000 20:31:50 -0400 (EDT)
Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26])
	by mail-green.research.att.com (Postfix) with ESMTP
	id 1C3E01E037; Wed,  2 Aug 2000 20:31:50 -0400 (EDT)
Received: from windsor.research.att.com (windsor.research.att.com [135.207.26.46])
	by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id UAA28455;
	Wed, 2 Aug 2000 20:31:48 -0400 (EDT)
Received: from windsor.research.att.com (localhost [127.0.0.1])
	by windsor.research.att.com (8.8.8+Sun/8.8.5) with ESMTP id UAA15961;
	Wed, 2 Aug 2000 20:31:47 -0400 (EDT)
Message-Id: <200008030031.UAA15961@windsor.research.att.com>
From: Fred Douglis <douglis@research.att.com>
To: Jim Whitehead <ejw@ics.uci.edu>
Cc: ietf-dav-versioning@w3.org, http-delta@pa.dec.com
Subject: Re: Delta Encoding in HTTP 
In-Reply-To: Your message of "Fri, 21 Jul 2000 10:28:35 PDT."
             <NDBBIKLAGLCOPGKGADOJIEHEDHAA.ejw@ics.uci.edu> 
X-Uri: http://www.research.att.com/~douglis/
Date: Wed, 02 Aug 2000 20:31:47 -0400
Sender: douglis@research.att.com

>This is a feature that would need to interact well with DeltaV versioning
>services.

I fully agree, per my comments at the meeting today.

I think the ability to use stored versions as known base
versions for deltas would be a big plus.  

I'm copying the delta-encoding mailing list; hopefully there can be
some cross-fertilization here.

Fred

From mogul@pa.dec.com  Wed Aug  2 19:04:22 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id TAA18570; Wed, 2 Aug 2000 19:04:22 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA20943; Wed, 2 Aug 2000 19:04:22 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id TAA18089; Wed, 2 Aug 2000 19:04:21 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200008030204.TAA18089@wera.pa.dec.com>
To: Mike Dahlin <dahlin@cs.utexas.edu>
Cc: http-delta@pa.dec.com
Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] 
In-Reply-To: Your message of "Thu, 20 Jul 2000 08:59:10 PDT."
             <200007201559.IAA23804@wera.pa.dec.com> 
Date: Wed, 02 Aug 2000 19:04:21 -0700
X-Mts: smtp

    10.6 (p33)
    
    I don't understand the need for MUST-NOT rule 3 ("If the cache
    implementation is not aware of, or is not at least conditionally...")
    
    My understanding of the specification is that
    
       baseInstance * IM-Pipeline = currentInstance
    
    (Where "* IM-Pipeline" means apply the functions listed in IM: header
    in the order listed). E.g., the actual functions implemented by the
    IM-Pipeline can be opaque to the cache implementation. The
    specification says that the element in the cache must be tagged by
    baseInstance and by IM-Pipeline and that it will not return the
    element unless both match what the client is able to accept.
    
    I don't understand why the cache would need to understand a new
    encoding as long as it has the ability to only supply the new encoding
    to clients that understand it. It seems like the matching rules give
    it that ability.
    
    It seems like it would be desirable to allow caches to treat new
    algorithms as opaque functions but to still allow caching of the
    output of those functions.
    
    At the bottom of the subsection there is a note "Rule 3 allows for
    extending the set of instance-manipulations without causing deployed
    cache implementations to commit errors", but I should think that rules
    1 and 2 suffice to prevent errors.
    
    (Is the reason for rule 3 to help support "new instance
    manipulations [that] may include additional caching rules to improve
    cache-hit rates in cognizant implementations" as per the Note? I don't
    see the need, but perhaps you have a particular optimization in mind?)

Consider what would have happened without rule #3 if we had
hypothetically standardized on the IM/A-IM mechanism before
delta encodings had been invented.  I.e., it was introduced
to deal with compression and ranges, and caches had been deployed
that followed rules #1 and #2 (matching simply the IM and A-IM
headers), but not rule #3.  Then we introduce new instance
manipulations for delta-encoding.  That would have made it
impossible to later add rules #4 and #5 (which are specific
to deltas and the "Delta-Base" header).

If you can think of a way to replace rules #4 and #5 with
more generic rules that accomplish the same thing (allowing
caching of deltas without allowing incorrect caching), and
if you could argue that these generic rules would work for
all yet-to-be-conceived instance manipulations, then we
could drop rule #3.  Otherwise, I think it's a conservative
approach that does allow future extensions to do aggressive
caching.
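
A rough Python sketch of the conservative behaviour, as paraphrased in
this thread rather than quoted from the draft: a cache reuses a stored
response only if every manipulation in its IM header is both accepted
by the request's A-IM and understood by the cache itself.  The function
name and arguments are hypothetical, and rules #4 and #5 (the
delta-specific ones involving Delta-Base) are not modelled.

```python
def cache_may_reuse(response_im, request_a_im, cache_understands):
    """Return True if a cached response with this IM list may be returned.

    response_im       -- manipulations listed in the stored response's IM header
    request_a_im      -- manipulations listed in the incoming request's A-IM header
    cache_understands -- manipulations this cache implementation knows about
    """
    accepted = set(request_a_im)
    for name in response_im:
        if name not in accepted:
            return False   # roughly rules #1/#2: IM must match the A-IM
        if name not in cache_understands:
            return False   # roughly rule #3: unknown manipulation, stay out
    return True
```

A cache deployed before delta-codings existed would then decline to
reuse a "vcdiff" response even when the client accepts vcdiff, which
leaves room for stricter delta-specific rules to be added later.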

You also comment:
    10.6 (p. 34)
    Separating the Note about MUST-NOT rule 3 from the MUST-NOT list with
    another numbered list (range conditions list) is confusing.
    
Hmm.  I'll try to figure out a way to clarify this.  (Maybe
this rule, in particular, needs a name.)

-Jeff

From dahlin@cs.utexas.edu  Tue Aug  8 09:42:40 2000
Return-Path: <dahlin@cs.utexas.edu>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id JAA23774; Tue, 8 Aug 2000 09:42:40 -0700 (PDT)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA18221; Tue, 8 Aug 2000 09:42:39 -0700
Received: from mail.cs.utexas.edu (mail.cs.utexas.edu [128.83.139.10])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id JAA17352;
	Tue, 8 Aug 2000 09:42:39 -0700 (PDT)
Received: from cs.utexas.edu (vaio.csres.utexas.edu [128.83.141.4])
	by mail.cs.utexas.edu (8.9.3/8.9.3) with ESMTP id LAA23478;
	Tue, 8 Aug 2000 11:38:53 -0500 (CDT)
Message-Id: <39903866.70383CAC@cs.utexas.edu>
Date: Tue, 08 Aug 2000 11:42:14 -0500
From: Mike Dahlin <dahlin@cs.utexas.edu>
X-Mailer: Mozilla 4.74 [en] (Win98; U)
X-Accept-Language: en
Mime-Version: 1.0
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-delta@pa.dec.com
Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] 10.5.3 
 p30
References: <200008022017.NAA02332@wera.pa.dec.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


 >     10.5.3 (p30)
 >     "If a request includes an A-IM header field that lists the 'range'
 >     instance-manipulation prior to any delta-coding(s), and the request
 >     also includes an If-Range: header that lists the entity tag of the
 >     current instance, the server SHOULD ignore the delta-codings."
 >
 >     At first glance, the need for the rule is not obvious.
 >
 > ... long example and discussion omitted ...
 >
 > I'm inclined to remove this restriction from 10.5.2.  Perhaps
 > we need a paragraph in 5.7 explaining how to interpret your
 > example.
 >

Yes. Given that the other case is discussed in so much detail, an
example of this case using the same structure would make a lot of
sense.

-mike





From dahlin@cs.utexas.edu  Tue Aug  8 09:42:44 2000
Return-Path: <dahlin@cs.utexas.edu>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id JAA01897; Tue, 8 Aug 2000 09:42:44 -0700 (PDT)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA28114; Tue, 8 Aug 2000 09:42:44 -0700
Received: from mail.cs.utexas.edu (mail.cs.utexas.edu [128.83.139.10])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id JAA09400;
	Tue, 8 Aug 2000 09:42:43 -0700 (PDT)
Received: from cs.utexas.edu (vaio.csres.utexas.edu [128.83.141.4])
	by mail.cs.utexas.edu (8.9.3/8.9.3) with ESMTP id LAA23490;
	Tue, 8 Aug 2000 11:38:58 -0500 (CDT)
Message-Id: <3990394C.5B5DA2E9@cs.utexas.edu>
Date: Tue, 08 Aug 2000 11:46:04 -0500
From: Mike Dahlin <dahlin@cs.utexas.edu>
X-Mailer: Mozilla 4.74 [en] (Win98; U)
X-Accept-Language: en
Mime-Version: 1.0
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-delta@pa.dec.com
Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] 10.6 
 p33-34
References: <200008030204.TAA18089@wera.pa.dec.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

 >     10.6 (p33)
 >
 >     I don't understand the need for MUST-NOT rule 3 ("If the cache
 >     implementation is not aware of, or is not at least
 >     conditionally...")
 > ...
 > ....  Otherwise, I think it's a conservative
 > approach that does allow future extensions to do aggressive
 > caching.
 >

Right. Conservative engineering good.

The only alternative I can see is to create a type system (which you
are on the verge of with "one-input" and "two-input"
manipulations) and to tag each manipulation with its type (then the
rule becomes "if I see a type I don't understand,...".) I doubt there
is much enthusiasm for this. Giving up a bit of performance as new
extensions are deployed for simplicity seems reasonable (unless people
have a sense that there are dozens of people itching to add new
extensions for current types...).


 > You also comment:
 >     10.6 (p. 34)
 >     Separating the Note about MUST-NOT rule 3 from the MUST-NOT list with
 >     another numbered list (range conditions list) is confusing.
 >
 > Hmm.  I'll try to figure out a way to clarify this.  (Maybe
 > this rule, in particular, needs a name.)
 >

I think you could just move the note directly below rule 3 (and indent
it as a continuation of rule 3), e.g.,

   1. lsjfdlsj

   2. sldjfljdf

   3. If the cache implementation is not aware of, or is not at least
      conditionally compliant with...

      Note: Rule #3 allows for..

   4. slkjdflsjf


-mike





From dahlin@cs.utexas.edu  Tue Aug  8 09:42:41 2000
Return-Path: <dahlin@cs.utexas.edu>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id JAA12586; Tue, 8 Aug 2000 09:42:41 -0700 (PDT)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA28192; Tue, 8 Aug 2000 09:42:41 -0700
Received: from mail.cs.utexas.edu (mail.cs.utexas.edu [128.83.139.10])
	by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id JAA08604;
	Tue, 8 Aug 2000 09:42:40 -0700 (PDT)
Received: from cs.utexas.edu (vaio.csres.utexas.edu [128.83.141.4])
	by mail.cs.utexas.edu (8.9.3/8.9.3) with ESMTP id LAA23482;
	Tue, 8 Aug 2000 11:38:54 -0500 (CDT)
Message-Id: <39903898.B1F7CB65@cs.utexas.edu>
Date: Tue, 08 Aug 2000 11:43:04 -0500
From: Mike Dahlin <dahlin@cs.utexas.edu>
X-Mailer: Mozilla 4.74 [en] (Win98; U)
X-Accept-Language: en
Mime-Version: 1.0
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-delta@pa.dec.com
Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] 10.5.2 
 p29
References: <200008020141.SAA06662@wera.pa.dec.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

 >
 >
 >     Comments on draft-mogul-http-delta-04.txt
 >
 >     10.5.2 (p 29)
 >     ...
 >
 >     In particular, it may not be immediately obvious how the case of
 >
 >        IM: range, vcdiff
 >
 >     would be decoded if there are multiple ranges (how does the decoder
 >     know the length of the pieces to know when one range stops and the
 >     next begins after a vcdiff encoding?)
 >
 >     ...
 >
 > Perhaps it would be sufficient to add a cross-reference from
 > section 10.5.2 to 10.10.

After reading it and getting confused and unconfused a few times, I
would suggest the following small addition that would have, I think,
clarified things for me a lot. (Hopefully, I ended in the unconfused
state and this actually makes sense.)

   Current (10.5.2, p 30):
   As a special case, if the instance-manipulations include both range
   selection and at least one other non-identity instance-manipulation,
   the IM header field MUST be used to indicate the order in which all
   of these instance-manipulations, including range selection, were
   applied.  If the IM header lists the "range" instance-manipulation,
   the response MUST include either a Content-Range header or a
   multipart/byteranges Content-Type.


   Proposed:
   As a special case,
   ...
   the response MUST include either a Content-Range header or a
   multipart/byteranges Content-Type in which each part contains a
   Content-Range header.

This requirement is implied by appendix 19.2 of RFC2616 and by the
statement from 2616 "When a client requests multiple byte-ranges in
one request, the server SHOULD return them in the order that they
appeared in the request." But making it explicit here I think would
have helped me see what was going on.

I think the original wording may have confused me because it seemed to
imply that putting Content-Range for the pieces was not a requirement
for a multipart/byteranges reply.

 >     
 >     It might be helpful to see examples of how the Range:,
 >     Content-length:, and RFC2046 delimiters fit together for IM:
 >     range,vcdiff (and maybe IM: vcdiff,range) cases with multiple ranges
 >     returned.
 > 
 > If the text in 10.10 doesn't seem clear enough, I could add
 > that, but it would be a fairly lengthy example.
 > 

This was a wording/clarity issue only, and I suspect that a long
example would be as likely to confuse as illuminate...

-mike



From dahlin@cs.utexas.edu  Tue Aug  8 09:42:43 2000
Return-Path: <dahlin@cs.utexas.edu>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id JAA02871; Tue, 8 Aug 2000 09:42:43 -0700 (PDT)
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA30200; Tue, 8 Aug 2000 09:42:42 -0700
Received: from mail.cs.utexas.edu (mail.cs.utexas.edu [128.83.139.10])
	by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id JAA12144;
	Tue, 8 Aug 2000 09:42:42 -0700 (PDT)
Received: from cs.utexas.edu (vaio.csres.utexas.edu [128.83.141.4])
	by mail.cs.utexas.edu (8.9.3/8.9.3) with ESMTP id LAA23486;
	Tue, 8 Aug 2000 11:38:56 -0500 (CDT)
Message-Id: <399038FF.69D6E08A@cs.utexas.edu>
Date: Tue, 08 Aug 2000 11:44:47 -0500
From: Mike Dahlin <dahlin@cs.utexas.edu>
X-Mailer: Mozilla 4.74 [en] (Win98; U)
X-Accept-Language: en
Mime-Version: 1.0
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-delta@pa.dec.com
Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] 10.5.2 
 p31-32
References: <200008022218.PAA12036@wera.pa.dec.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


 >
 >     10.5.2 (p31-32)
 >     I don't understand the last full paragraph on p31 ("The server's
 >     choice about whether...") and the Note: spanning p31-32.
 >
 >     ...
 >
 > This was also discussed on the mailing list; the Note was
 > supposed to have captured the intention.  Perhaps a clearer
 > phrasing of the Note would be:
 >
 >     Note: the intent of this requirement is to prevent the server from
 >     generating a delta-encoded response that the client can only decode
 >     by first applying an instance-manipulation encoding to its cached
 >     base instance.  One cannot assume that a client willing to decode a
 >     given instance-manipulation format (as indicated by its A-IM
 >     request header) is also able to encode into that format.  A server
 >     implementor might wish to consider what the client would logically
 >     have in its cache, when deciding which instance-manipulations to
 >     apply prior to a delta-coding.
 >

That helps me understand the Note.

 >
 > Going through my logs of private email with Daniel Hellerstein,
 > we worked through a number of corner cases before coming up
 > with this phrasing of the spec.  I can probably reconstruct
 > the entire argument, but it would take more time than I have
 > before my next meeting :-)
 >

Understood.

But the wording ("Server's choice...SHOULD be independent...") still
confuses me, I'm afraid. (Since I jumped in late, I don't want to make
you reconstruct a settled argument. But if I were implementing the
system, I might still be confused about what requirement I'm supposed
to meet here.)

Is this rule trying to restrict what a server can/should send a
client? It seems like that is the intent, but I'm not sure I would
interpret the words that way.  The notion "choose independently"
doesn't seem to restrict what choices are legal. This wording seems to
imply that a server MAY choose to do a 1-instance manipulation before
a 2-instance manipulation or MAY choose not to (since it makes its
choices independently, either possibility could arise, right?). So
this requirement doesn't seem to change what could go over the wire
and what the client has to be ready to accept. (So, we're basically
back to asking the server to "do something sensible"?)


In Jeff's example,
 >
 > ...client sends...
 >    GET /foo.html HTTP/1.1
 >    Host: example.com
 >    If-None-Match: "A"
 >    A-IM: diffe,gzip,vcdiff
 > ...
 > One Plausible interpretation...client is willing to accept
 > either:
 >     Etag: "B"
 >     IM: diffe,gzip
 > or
 >     Etag: "B"
 >     IM: vcdiff
 > but this one:
 >     Etag: "B"
 >     IM: gzip,vcdiff
 > would require the client to compute GZIP(A) before it could decode
 > the delta. This violates the rule you're referring to:
 >     The server's choice about whether to apply an instance
 >     manipulation SHOULD be independent of its choice to apply any
 >     subsequent two-input instance-manipulations, to the response.
 > because it didn't apply gzip to /foo.html except in the case where it
 > subsequently applied vcdiff

I don't see how this violates the rule as stated. The server can and
does choose to apply gzip in cases when it doesn't apply vcdiff
("IM: gzip", "IM: diffe, gzip"). The "problem" example above seems
exactly to be a case of the server making its choice to apply gzip
*independently* from its later choice to apply vcdiff.


I tried to work through what I now understand this rule to be trying
to say (based on the Note more than the rule) and ended up with a
fairly long example of (maybe) a way to say it with a different
rule. Since I have missed the earlier discussions that created "this
phrasing of the spec", this could well end up being more useful as an
illustration of what exactly I missed in understanding the current
spec (so that you guys can figure out the sentence that needs to be
added to fix it) than as a replacement. Again: I'm not even sure I
understand the intent of the original rule, so I could be WAY off
base. I would suggest first reading this at a high-level with the hope
of getting an "Ah, I see where he got confused and I see what we can
add to the current paragraph to fix it" rather than reading
this at a detailed level and generating a critique of whether it
covers all of the corner cases or not. (But on the other hand, if it
does make sense on a first pass, I'm happy to help talk through the
details.)



OK. Really venturing to where I shouldn't be going without having
been in on the earlier discussion ...
I don't suppose you could get the same effect by pushing the onus back
on the client? E.g., if a client says
   If-None-Match: "A"
   A-IM: OI1,TI,OI2

(Where "OIx" represents a "one-input" manipulation and TI represents a
"two-input" manipulation.)
The client implies that it has cached or can generate
    A
    OI1(A)
Since the server can send back any of
    B
    OI2(B)
    TI(A, B) --> client needs A
    OI1(B)
    OI2(TI(A,B)) --> client needs A
    TI(OI1(A), OI1(B)) --> client needs OI1(A)
    OI2(TI(OI1(A), OI1(B))) --> client needs OI1(A)

Suppose the client doesn't want to be required to generate OI1(A). It
could have sent
    A-IM: TI,OI1,OI2
Which removes
    XX TI(OI1(A), OI1(B))
    XX OI2(TI(OI1(A), OI1(B)))
as legal replies but adds
    OI1(TI(A, B))
    OI2(OI1(TI(A, B)))

The rule would be something like "A server MAY choose to apply any
subset of instance manipulations specified by the client {choose to
apply them independently?} but MUST apply them in the order listed. A
client SHOULD NOT {MUST NOT?} list a 'one-input' manipulation
before a 'two-input' manipulation unless the client is
prepared to provide as input to that two input manipulation the result
of the one-input manipulation operating on any base instance listed
in the If-None-Match header. E.g., if a client lists instance 'A' in
an If-None-Match header and lists one-input manipulation OI before
two-input manipulation TI in the A-IM header, the client implies that
it can provide either A or OI(A) as input to TI from its cache or by
applying OI to a cached instance of A."
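
The proposed rule can be explored mechanically.  The Python sketch
below enumerates every order-preserving subset of an A-IM list and, for
each possible reply, works out which form of the base instance the
client would have to supply.  The function name, the string notation,
and the assumption that a reply uses at most one two-input manipulation
are illustrative choices, not part of the draft.

```python
from itertools import combinations

def legal_replies(a_im, two_input):
    """Yield (reply_expression, base_needed) for each subset of a_im.

    a_im      -- manipulations in the order the client listed them
    two_input -- the names in a_im that are two-input (delta) manipulations
    base_needed is None when the reply involves no delta.
    """
    for k in range(len(a_im) + 1):
        for subset in combinations(a_im, k):      # keeps the listed order
            if sum(name in two_input for name in subset) > 1:
                continue                          # assumed: one delta per reply
            expr, base, need = "B", "A", None
            for name in subset:
                if name in two_input:
                    need = base                   # base form at the delta step
                    expr = f"{name}({base}, {expr})"
                else:
                    expr = f"{name}({expr})"
                    if need is None:              # applied before any delta
                        base = f"{name}({base})"
            yield expr, need

replies_a = dict(legal_replies(["OI1", "TI", "OI2"], {"TI"}))
replies_b = dict(legal_replies(["TI", "OI1", "OI2"], {"TI"}))
```

For "A-IM: OI1,TI,OI2" the enumeration also produces OI2(OI1(B)), which
involves no delta at all.  For "A-IM: TI,OI1,OI2" every delta reply
needs only the plain base instance A.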

Following this rule, in Jeff's example, the client should have said:
    GET /foo.html HTTP/1.1
    Host: example.com
    If-None-Match: "A"
    A-IM: diffe,vcdiff,gzip
or said
    A-IM: vcdiff,diffe,gzip
if it doesn't want to be on the hook for generating gzip(A)


Looking at the mailing list discussion (Thu May 18 17:03:33 2000),
Jeff writes:
 %>   ...
 %>     I was going to say that we could add a sentence:
 %>       However the server may choose to use only a subset of the listed A-IM
 %>       manipulations, so long as they are applied in the order listed in
 %>       the A-IM request header.
 %>
 %>     But is this true -- suppose we have
 %>       A-IM: diff,gzip,range
 %>     say, because the client wants just the range of a prior "diff,gzip'ed"
 %>     response. If the server chose to use
 %>      IM: diff,range
 %>     the result probably is NOT helpful to the client.
 %>
 %>     I'm not sure what this implies; that a trailing range means "don't use
 %>     range unless you use all the preceding manipulations"????
 %>
 %> Upon analysis, I think we've decided that this particular case
 %> isn't a disaster.  However, during this analysis, we realized
 %> that there is a problem if the server isn't consistent about
 %> what instance-manipulations it applies prior to computing a delta.
 %>
 %> Here's a proposed solution (inserted in section 10.5.3 just
 %> before the Examples): ... {The current wording of the rule}

Hm. This case is a problem. My proposed solution will break (but I
don't see that the original rule made it clear how to fix this
either.)

It seems like the straw man idea in Jeff's May e-mail was right. A
range is a peculiar beast since it specifies start and end coordinates
that only make sense for a particular encoding of the data. I don't
see that anything weaker than that works. I would be inclined to use a
slightly more general rule: a range IM (trailing or not) means "don't
use range unless you use all the preceding manipulations".

"If a server applies a range manipulation, it MUST also apply all
manipulations listed before the range manipulation in the client's
A-IM header."

This allows:
    A-IM: diff,range,gzip --> "IM: diff,range", "IM: diff,range,gzip",
         "IM: diff,gzip", "IM: gzip", "IM: diff"
    A-IM: diff,gzip,range --> "IM: diff,gzip,range", "IM: diff,gzip",
         "IM: diff", "IM: gzip"
All of which are reasonably useful to the client
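
As a check on the two lists above, a short Python sketch that
enumerates the IM headers a server could send under the proposed range
rule.  The function name is made up, and the sketch models only this
one rule (it says nothing about combining several delta-codings).

```python
from itertools import combinations

def allowed_im_headers(a_im):
    """List the non-empty IM headers permitted by the proposed range rule.

    A subset of a_im (in listed order) is allowed unless it includes
    "range" while omitting something listed before "range" in a_im.
    """
    allowed = []
    for k in range(1, len(a_im) + 1):
        for idx in combinations(range(len(a_im)), k):
            chosen = [a_im[i] for i in idx]
            if "range" in chosen:
                before_range = a_im[:a_im.index("range")]
                if any(name not in chosen for name in before_range):
                    continue      # range without all of its predecessors
            allowed.append(",".join(chosen))
    return allowed

first = allowed_im_headers(["diff", "range", "gzip"])
second = allowed_im_headers(["diff", "gzip", "range"])
```

Both results agree with the hand-written lists: five headers for
"A-IM: diff,range,gzip" and four for "A-IM: diff,gzip,range".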

The downside is that a client cannot say
    A-IM: vcdiff,diff,gzip,range

Since the server would have to apply both vcdiff and diff which makes
no sense. So, if a client says "range", it gives up some flexibility on
what else it can say. Reasonable compromise?

"A client SHOULD NOT list two two-input manipulations before a range
manipulation in an A-IM header. A server receiving such a header
SHOULD ignore the range manipulation."

(Corner case: a client could still say A-IM: diff,gzip,range,vcdiff as
long as it can handle either gzip(A) or A as an input to vcdiff).


-mike





From mogul  Tue Aug 15 16:20:04 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA15885; Tue, 15 Aug 2000 16:20:04 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200008152320.QAA15885@wera.pa.dec.com>
To: http-delta
Subject: ordering of range and delta-codings when If-Range is used
In-reply-to: Your message of "Wed, 02 Aug 2000 13:17:35 PDT."
             <200008022017.NAA02332@wera.pa.dec.com> 
Date: Tue, 15 Aug 2000 16:20:04 -0700
X-Mts: smtp

Mike Dahlin wrote:

    10.5.3 (p30)
    "If a request includes an A-IM header field that lists the 'range'
    instance-manipulation prior to any delta-coding(s), and the request
    also includes an If-Range: header that lists the entity tag of the
    current instance, the server SHOULD ignore the delta-codings."
    
    At first glance, the need for the rule is not obvious. It seems like a
    server could interpret
	   GET /foo.html HTTP/1.1
	   host: bar.example.net
	   If-None-Match: "A"
	   If-Range: "B"
	   A-IM: range, vcdiff
	   Range: bytes=900-
    
    As meaning "if B is still the current version, send me bytes 900-... of
    B and you may vcdiff (with bytes 900-... of A) it if you like." This
    seems sensible.
    
    e.g., following the example in section 5.7
    
    if Tcur = "A" -> server replies with 304 (not modified)
    
    if Tcur = "B" --> server replies with 226 (IM Used) + an "IM:
    range,vcdiff" response header, and a message body
    including the vcdiff(A[900-], B[900-]);
    {If the server doesn't understand IM, the right thing still happens, I
    think} {The server, of course, is still welcome to ignore the vcdiff
    specification and just send the raw range}
    
    if Tcur = "C" --> send VCDIFF(A, C)
    
I wrote:
    I'm inclined to remove this restriction from 10.5.2.

That should have been "10.5.3" - it's now removed.

    Perhaps we need a paragraph in 5.7 explaining how to interpret your
    example.

I came up with the following:

   On the other hand, suppose that the client has a cache entry for the
   "A" instance of http://bar.example.net/foo.html, and it has already
   received the first 900 bytes of a new instance "B" (perhaps as the
   result of an aborted transfer).  Now the client wants to receive the
   entire current instance, so it could send this request:

      GET /foo.html HTTP/1.1
      host: bar.example.net
      If-None-Match: "A"
      If-Range: "B"
      A-IM: range,vcdiff
      Range: bytes=900-

   In this example, as in the previous example, if Tcur = "A" then the
   server should send 304 (Not Modified), and if Tcur = "C", then the
   server should send the entire new instance, either as a 200 response
   or as a delta-encoding against instance "A".

   However, if Tcur = "B", in this case the server should first select
   the specified range (bytes 900 through the end) from both instances
   "A" and "B", then compute the delta encoding between these ranges
   (using vcdiff), and then transmit the result using a 226 (IM Used)
   response with an "IM:range,vcdiff" response header.

I think this might be a bit contrived, but I suppose it's now a
valid interpretation of the spec for a client to send this
request, so it's probably worth explaining how the server
responds to it.
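
The three cases in this example can be written down as a small decision
function.  This is only a sketch of the example, assuming the request
lists both "range" and "vcdiff" and that the server chooses to apply
every instance manipulation it is offered; the function name
choose_response and its return shape are invented for illustration and
are not part of the draft.

```python
def choose_response(tcur, if_none_match, if_range, a_im):
    """Pick a status code and IM list for the example request.

    tcur           entity tag of the server's current instance
    if_none_match  entity tags of instances the client holds in full
    if_range       entity tag of the partially received instance
    a_im           instance manipulations, in the client's order
                   (assumed to contain both "range" and "vcdiff")
    """
    if tcur in if_none_match:
        # The client's cached instance is still current.
        return 304, []
    if tcur == if_range:
        # If-Range matched, so the Range header is honored and the
        # manipulations are applied in the order the client listed.
        return 226, list(a_im)
    # If-Range did not match: ignore Range and send the whole new
    # instance, here as a delta against the If-None-Match base.
    if "vcdiff" in a_im:
        return 226, ["vcdiff"]
    return 200, []


request = dict(if_none_match=["A"], if_range="B", a_im=["range", "vcdiff"])
case_a = choose_response("A", **request)
case_b = choose_response("B", **request)
case_c = choose_response("C", **request)
```

A server remains free to apply fewer manipulations than this sketch
does, for example by answering the Tcur = "C" case with a plain 200.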

This leaves us in the situation where a client that has received
a truncated response can try to fill in the gap using either
	A-IM: range,vcdiff
or
	A-IM: vcdiff,range
and it's not clear how to advise a client implementor which
approach is "better" (if one is a better approach in general).
But we probably don't want to get into that rat-hole, especially
lacking any experience that people would in fact do either 
in practice :-)

-Jeff

From mogul  Tue Aug 15 16:24:10 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA14739; Tue, 15 Aug 2000 16:24:10 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200008152324.QAA14739@wera.pa.dec.com>
To: http-delta
Subject: Responses to Mike Dahlin's comments
Date: Tue, 15 Aug 2000 16:24:10 -0700
X-Mts: smtp

As a result of Mike Dahlin's comments, I have made the following
changes (mostly in the presentation, not the actual spec.):

   Moved a Note in section 10.6 to make it clear what it applies to.

   Added another example in 5.7 for a combination of range and delta.

   Added some clarification in section 10.5.2.

   Removed (section 10.5.3) a restriction on the ordering of "range" and
   delta-codings in the A-IM header.

I'm still wrestling with Mike's lengthy comments regarding
two-input instance manipulations.

I know of no other pending issues, and so if I can figure
out how to deal with the "two-input" issue (or if I decide
to give up on that for now), I'm about ready to issue
a draft-06 version of the delta spec - hopefully, this one
can be used for a Last Call and then submitted to the
IESG as a Proposed Standard.

-Jeff

From mogul  Thu Aug 17 16:56:00 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA11565; Thu, 17 Aug 2000 16:56:00 -0700 (PDT)
Message-Id: <200008172356.QAA11565@wera.pa.dec.com>
To: http-delta
From: Issac Goldstand <issac@mail.jct.ac.il>
Reply-To: neoi@writeme.com
Organization: Jerusalem College of Technology
X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12-20 i686)
X-Accept-Language: en
Mime-Version: 1.0
Subject: Re: ordering of range and delta-codings when If-Range is used
References: <200008152320.QAA15885@wera.pa.dec.com>
Content-Type: text/plain; charset=iso-8859-7
Content-Transfer-Encoding: 7bit
Date: Thu, 17 Aug 2000 16:56:00 -0700
Sender: mogul
X-Mts: smtp

[Retransmitted by Jeff on Issac's request]

Jeffrey Mogul wrote:

> Mike Dahlin wrote:
>
>     I came up with the following:
>
>    On the other hand, suppose that the client has a cache entry for the
>    "A" instance of http://bar.example.net/foo.html, and it has already
>    received the first 900 bytes of a new instance "B" (perhaps as the
>    result of an aborted transfer).  Now the client wants to receive the
>    entire current instance, so it could send this request:
>
>       GET /foo.html HTTP/1.1
>       host: bar.example.net
>       If-None-Match: "A"
>       If-Range: "B"
>       A-IM: range,vcdiff
>       Range: bytes=900-
>
>    In this example, as in the previous example, if Tcur = "A" then the
>    server should send 304 (Not Modified), and if Tcur = "C", then the
>    server should send the entire new instance, either as a 200 response
>    or as a delta-encoding against instance "A".
>
>    However, if Tcur = "B", in this case the server should first select
>    the specified range (bytes 900 through the end) from both instances
>    "A" and "B", then compute the delta encoding between these ranges
>    (using vcdiff), and then transmit the result using a 226 (IM Used)
>    response with an "IM:range,vcdiff" response header.

That sounds like you're saying that the server should get the
range of the original data ("A" or "B"), select range 900-
from both and calculate the delta and return it.  However, with
deltas it would be extremely difficult for the client to decode
the delta and know exactly how much of the decoded data it
was missing.  In the case above, if TCur="B", the way you
wrote it above should actually not do what you wrote in that
explanatory paragraph, but rather should run vcdiff("A","B"),
take range 900- of the output of THAT and return a 226.

   Issac

--
Internet is a wonderful mechanism for making a fool of
yourself in front of a very large audience.
  --Anonymous

Moving the mouse won't get you into trouble...  Clicking it might.
  --Anonymous

From mogul@pa.dec.com  Thu Aug 17 18:26:45 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA11641; Thu, 17 Aug 2000 18:26:45 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA03258; Thu, 17 Aug 2000 18:26:45 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA12964; Thu, 17 Aug 2000 18:26:45 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200008180126.SAA12964@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: ordering of range and delta-codings when If-Range is used 
In-Reply-To: Your message of "Thu, 17 Aug 2000 16:56:00 PDT."
             <200008172356.QAA11565@wera.pa.dec.com> 
Date: Thu, 17 Aug 2000 18:26:45 -0700
X-Mts: smtp

Issac Goldstand <issac@mail.jct.ac.il> writes:

>    On the other hand, suppose that the client has a cache entry for the
>    "A" instance of http://bar.example.net/foo.html, and it has already
>    received the first 900 bytes of a new instance "B" (perhaps as the
>    result of an aborted transfer).  Now the client wants to receive the
>    entire current instance, so it could send this request:
>
>       GET /foo.html HTTP/1.1
>       host: bar.example.net
>       If-None-Match: "A"
>       If-Range: "B"
>       A-IM: range,vcdiff
>       Range: bytes=900-
>
>    In this example, as in the previous example, if Tcur = "A" then the
>    server should send 304 (Not Modified), and if Tcur = "C", then the
>    server should send the entire new instance, either as a 200 response
>    or as a delta-encoding against instance "A".
>
>    However, if Tcur = "B", in this case the server should first select
>    the specified range (bytes 900 through the end) from both instances
>    "A" and "B", then compute the delta encoding between these ranges
>    (using vcdiff), and then transmit the result using a 226 (IM Used)
>    response with an "IM:range,vcdiff" response header.

    That sounds like you're saying that the server should get the
    range of the original data ("A" or "B"), select range 900-
    from both and calculate the delta and return it.  However, with
    deltas it would be extremely difficult for the client to decode
    the delta and know exactly how much of the decoded data it
    was missing.  In the case above, if TCur="B", the way you
    wrote it above should actually not do what you wrote in that
    explanatory paragraph, but rather should run vcdiff("A","B"),
    take range 900- of the output of THAT and return a 226.

No, your description of how it should actually work describes
the previous example, where the A-IM header is
	A-IM: vcdiff, range
Remember, the server isn't allowed to do things except in the
order specified by the A-IM header!

I would agree that the example that Mike Dahlin proposed (with
"A-IM: range,vcdiff") might be somewhat unlikely, but I think
you're missing the point.  The client already knows how much
of instance "B" it has (presumably, it has at least the first
900 bytes of "B"), and it presumably has all of "A", so it shouldn't
be any harder to apply a delta to a suffix of "A" than to the whole
"A" instance.

In summary, a client that needs to recover from a truncated
response and wants to use deltas and ranges does have a choice
between 
	A-IM: vcdiff, range
and
	A-IM: range, vcdiff
Once the *client* has made that choice, the *server* cannot
violate it.  The server *can* choose to apply none of those
ims, either one of them, or both, but if it does apply both,
it must do so in the client's chosen order.
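
The claim that a delta applies to a suffix of "A" as easily as to the
whole instance can be illustrated with a toy delta coder.  This is a
sketch only: Python's difflib stands in for vcdiff (it does not produce
the VCDIFF format), and the names make_delta and apply_delta and the
sample data are made up for the illustration.

```python
from difflib import SequenceMatcher


def make_delta(base, target):
    """Toy stand-in for vcdiff: copy/add instructions that rebuild target."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse bytes of the base
        elif tag in ("replace", "insert"):
            ops.append(("add", target[j1:j2]))  # literal new bytes
        # "delete" contributes nothing to the target
    return ops


def apply_delta(base, ops):
    """Rebuild the target from the base and the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


# Instance "A" is fully cached; only the first 900 bytes of "B" arrived.
A = b"".join(b"line %d\n" % i for i in range(150))
B = A.replace(b"line 120\n", b"line 120 (revised)\n") + b"trailer\n"
OFFSET = 900
have = B[:OFFSET]

# Server, "A-IM: range,vcdiff": select the range from both, then delta.
delta = make_delta(A[OFFSET:], B[OFFSET:])

# Client: apply the delta to the same suffix of its cached "A".
rebuilt = have + apply_delta(A[OFFSET:], delta)
```

The client needs nothing beyond what it already holds: the prefix of
"B" it received and the bytes of "A" from offset 900 onward.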

-Jeff

From mogul  Thu Aug 17 19:16:49 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id TAA10669; Thu, 17 Aug 2000 19:16:48 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200008180216.TAA10669@wera.pa.dec.com>
To: http-delta
Subject: two-input instance-manipulations
In-reply-to: Your message of "Tue, 08 Aug 2000 11:44:47 CDT."
             <399038FF.69D6E08A@cs.utexas.edu> 
Date: Thu, 17 Aug 2000 19:16:48 -0700
X-Mts: smtp

Mike Dahlin <dahlin@cs.utexas.edu> writes, regarding this part
of the spec (in section 10.5.3):

   The server's choice about whether to apply an instance-manipulation
   SHOULD be independent of its choice to apply any subsequent two-input
   instance-manipulations to the response.  (Two-input
   instance-manipulations include delta-codings, because they take two
   different values as input.  Compression and "range"
   instance-manipulations take only one input.  Other
   instance-manipulations may be defined in the future.)

      Note: the intent of this requirement is to prevent the server
      from generating a delta-encoded response that the client can
      only decode by first applying an instance-manipulation encoding
      to its cached base instance.  A server implementor might wish
      to consider what the client would logically have in its cache,
      when deciding which instance-manipulations to apply prior to a
      delta-coding.

Mike writes:

    Is this rule trying to restrict what a server can/should send a
    client? It seems like that is the intent, but I'm not sure I would
    interpret the words that way.  The notion "choose independently"
    doesn't seem to restrict what choices are legal. This wording seems to
    imply that a server MAY choose to do a 1-instance manipulation before
    a 2-instance manipulation or MAY choose not to (since it makes its
    choices independently, either possibility could arise, right?). So
    this requirement doesn't seem to change what could go over the wire
    and what the client has to be ready to accept. (So, we're basically
    back to asking the server to "do something sensible"?)
    
Perhaps the word "independently" is too fuzzy.  How about if
I reword it as:

   The server SHOULD NOT apply a sequence of instance manipulations
   IM(1), IM(2), ..., IM(n) in a response if this sequence would
   require the client to encode its cache copy of a base instance using
   IM(j) before it could decode the server's subsequent application of
   IM(k).  In particular, if the server would not have applied IM(j)
   without applying IM(k), and if IM(k) is a two-input instance
   manipulation, then the server SHOULD NOT apply IM(j) followed
   (whether immediately or not) by IM(k).  (Two-input instance
   manipulations include delta-codings, because they take two different
   values as input.  Compression and "range" instance manipulations
   take only one input.  Other instance manipulations may be defined in
   the future.)
    
Does that make sense?  The sentences are a little complex, but
I think they are parsable :-).

After a lot of examples, Mike also writes:

    Since the server would have to apply both vcdiff and diff which
    makes no sense. So, if a client says "range", it gives up some
    flexibility on what else it can say. Reasonable compromise?

    "A client SHOULD NOT list two two-input manipulations before a
    range manipulation in an A-IM header. A server receiving such a
    header SHOULD ignore the range manipulation."

    (Corner case: a client could still say A-IM: diff,gzip,range,vcdiff
    as long as it can handle either gzip(A) or A as an input to
    vcdiff).
    
I don't think this is sufficient to solve the problem we were
concerned with.  Consider this series of requests and responses
(which might not be the most uncontrived example, but I'm in
a hurry to get home for dinner):

    Client sends
	GET /foo.html HTTP/1.1
	host: example.com
	A-IM: gzip

    Server replies
	HTTP/1.1 200 OK
	ETag: "A"
	Date: Tue, 25 Nov 1997 18:30:05 GMT

I.e., the server decided for some reason not to use compression
as an instance manipulation.

Then the client wants to update its cache entry:
	GET /foo.html HTTP/1.1
	If-None-Match: "A"
	host: example.com
	A-IM: diffe,gzip,vcdiff

The server could reply
	HTTP/1.1 226 IM Used
	ETag: "B"
	Delta-base: "A"
	Date: Tue, 25 Nov 1997 18:30:05 GMT
	IM: diffe, gzip

which would be OK.

but suppose the server instead sends:
	HTTP/1.1 226 IM Used
	ETag: "B"
	Delta-base: "A"
	Date: Tue, 25 Nov 1997 18:30:05 GMT
	IM: gzip,vcdiff

Now the client needs to generate GZIP(A) before it can
decode the body of the response.

I suppose we might be able to work out some language that
forces the client in this example to send
	A-IM: vcdiff,diffe,gzip
since, although this allows the nonsensical (vcdiff+gzip)
combination, this is one that the client really ought to
be able to decode.  Let me think about that.

-Jeff

From mogul@pa.dec.com  Fri Aug 18 11:44:23 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA06874; Fri, 18 Aug 2000 11:44:22 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA27658; Fri, 18 Aug 2000 11:44:22 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA05270; Fri, 18 Aug 2000 11:44:22 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200008181844.LAA05270@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: two-input instance-manipulations 
In-Reply-To: Your message of "Thu, 17 Aug 2000 19:16:48 PDT."
             <200008180216.TAA10669@wera.pa.dec.com> 
Date: Fri, 18 Aug 2000 11:44:22 -0700
X-Mts: smtp

Last night, I wrote:
    I suppose we might be able to work out some language that
    forces the client in this example to send
	    A-IM: vcdiff,diffe,gzip
    since, although this allows the nonsensical (vcdiff+gzip)
    combination, this is one that the client really ought to
    be able to decode.  Let me think about that.
    
As soon as I sent that and left work, I realized that a
generalization of this is probably the right approach - that
is, remove the requirement that the server figure out what
is useful to the client, and add a requirement on the client
(implementor) to send an A-IM header that cannot lead to
a bad result.

That means
(1) REMOVE this from 10.5.3 (A-IM):
   The server's choice about whether to apply an instance-manipulation
   SHOULD be independent of its choice to apply any subsequent two-input
   instance-manipulations to the response.  (Two-input
   instance-manipulations include delta-codings, because they take two
   different values as input.  Compression and "range"
   instance-manipulations take only one input.  Other
   instance-manipulations may be defined in the future.)

along with the note that follows it.

(2) ADD something like this

   A client SHOULD NOT, in its A-IM header field, list a sequence
   of instance manipulations such that it would be unable to
   decode the result of any order-preserving sub-sequence of that
   sequence.

	Note: the intent of this requirement is to allow the
	server to apply any sequence of instance manipulations
	consistent with the A-IM header, without thereby sending
	a message that the client would be unable to decode.
	For example, if the client sends "A-IM: gzip, vcdiff"
	but does not currently have a compressed copy of the
	base instance in its cache, and is not able to apply
	the gzip algorithm to its cached base instance, then
	the server's choice to compress the inputs to vcdiff
	would result in a response the client could not decode.
	
	Warning: not all implementations of algorithms such
	as gzip will produce identical output for a given
	input, so even a client implementation equipped with
	a gzip encoder (for example) might not be able to
	exactly duplicate the server's gzip-encoding of an
	instance.

Thanks to Clifford Heath for pointing out that unfortunate
property of algorithms such as gzip.
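
The warning can be reproduced even within a single library.  A small
sketch, using Python's zlib module as a stand-in for gzip (the same
deflate algorithm in a different wrapper): two compression levels give
different byte streams for the same input, although both decompress to
the same instance.  The sample data and variable names are illustrative.

```python
import zlib

instance = b"<p>delta encoding in HTTP</p>\n" * 200

server_side = zlib.compress(instance, 9)  # server's encoder settings
client_side = zlib.compress(instance, 1)  # client's encoder settings

# Both streams decode to the same instance ...
same_content = zlib.decompress(server_side) == zlib.decompress(client_side)
# ... but the encoded bytes themselves are not identical.
same_bytes = server_side == client_side
```

A delta computed against the server's compressed stream is therefore of
no use to a client that can only produce the other stream.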

I think this formulation of the requirement (putting the
onus on the client to ask only for things that it can
understand) simplifies the specification considerably.
(We no longer have to explicitly discuss "two-input"
instance manipulations, or treat "range" specially.)

On the other hand, it does lead to some potentially interesting
implementation issues for the client - if it sends a long
list of IMs in its A-IM header, there are lots of possible
sub-sequences to worry about.  Presumably, the implementation
does not have to enumerate and test the entire set of
possible sub-sequences, but I haven't come up with a simple
decision algorithm.
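
For short A-IM lists, brute force is at least easy to write down.  The
sketch below enumerates every order-preserving sub-sequence (2^n - 1 of
them for n entries) and tests each one against a client-supplied
predicate.  The predicate can_decode is a made-up example for a client
that cannot reproduce the server's compressed base instance, and the
two name sets are assumptions; none of this is a rule from the draft.

```python
from itertools import combinations

# Assumed classification of instance-manipulation names for this sketch.
DELTA_CODINGS = {"vcdiff", "diffe", "gdiff"}
COMPRESSIONS = {"gzip", "deflate"}


def order_preserving_subsequences(a_im):
    """Yield every non-empty sub-sequence of a_im in its original order."""
    for size in range(1, len(a_im) + 1):
        for picked in combinations(a_im, size):
            yield list(picked)


def can_decode(sequence):
    """Hypothetical client: it cannot compress its cached base instance,
    so it cannot decode a delta-coding applied after a compression."""
    compressed = False
    for im in sequence:
        if im in COMPRESSIONS:
            compressed = True
        elif im in DELTA_CODINGS and compressed:
            return False
    return True


def undecodable(a_im):
    """Return the sub-sequences the server might choose that would fail."""
    return [s for s in order_preserving_subsequences(a_im)
            if not can_decode(s)]


bad = undecodable(["diffe", "gzip", "vcdiff"])
good = undecodable(["vcdiff", "diffe", "gzip"])
```

With this predicate the list "diffe, gzip, vcdiff" has two failing
sub-sequences, while "vcdiff, diffe, gzip" has none.  The exponential
enumeration is only practical for the short lists a client is likely
to send.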

Thanks to Mike Dahlin for prodding me in this direction.

And if anyone can find a counter-example (i.e., some problem
that this new rule doesn't solve), please speak up ASAP.

-Jeff

From mogul  Sat Aug 19 10:28:56 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA13966; Sat, 19 Aug 2000 10:28:56 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200008191728.KAA13966@wera.pa.dec.com>
To: http-delta
Subject: First draft of I-D on Clusters/Templates
Date: Sat, 19 Aug 2000 10:28:56 -0700
X-Mts: smtp

I've finished a first draft of an Internet-Draft on
"HTTP Delta Clusters and Templates", available for now as:

ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-dcluster-00.18aug2000.txt

Please review it ASAP if you have an interest; I'd like
to submit it to the IETF by the 25th, before I leave for
SIGCOMM.

Places where I'm sure we need attention are marked with "XXX".

Thanks
-Jeff

From mogul  Mon Aug 21 14:28:28 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA31928; Mon, 21 Aug 2000 14:28:28 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200008212128.OAA31928@wera.pa.dec.com>
To: http-delta
Subject: Another URL for preview draft of Clusters/Templates document
Date: Mon, 21 Aug 2000 14:28:28 -0700
X-Mts: smtp

Some people have had trouble with the ftp: URL I gave out the
other day.  It works for me, but instead please try:

http://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-dcluster-00.18aug2000.txt

-Jeff

From koen@win.tue.nl  Wed Aug 23 23:52:53 2000
Return-Path: <koen@win.tue.nl>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id XAA11858; Wed, 23 Aug 2000 23:52:53 -0700 (PDT)
Received: from ztxmail01.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA13246; Wed, 23 Aug 2000 23:52:52 -0700
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345)
	id 70346245F; Thu, 24 Aug 2000 01:52:52 -0500 (CDT)
Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157])
	by ztxmail01.ztx.compaq.com (Postfix) with ESMTP
	id 5CA772523; Thu, 24 Aug 2000 01:52:51 -0500 (CDT)
Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3)
	  id IAA04763. Thu, 24 Aug 2000 08:52:49 +0200 (MET DST)
From: koen@win.tue.nl (Koen Holtman)
Message-Id: <200008240652.IAA04763@wsooti09.win.tue.nl>
Subject: Re: First draft of I-D on Clusters/Templates
In-Reply-To: <200008191728.KAA13966@wera.pa.dec.com> from Jeffrey Mogul at "Aug 19, 2000 10:28:56 am"
To: mogul@pa.dec.com (Jeffrey Mogul)
Date: Thu, 24 Aug 2000 08:52:49 +0200 (MET DST)
Cc: http-delta@pa.dec.com
X-Mailer: ELM [version 2.4ME+ PL43 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

>I've finished a first draft of an Internet-Draft on
>"HTTP Delta Clusters and Templates", available for now as:
>
>ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-dcluster-00.18aug2000.txt
>
>Please review it ASAP if you have an interest; I'd like
>to submit it to the IETF by the 25th, before I leave for
>SIGCOMM.
>
>Places where I'm sure we need attention are marked with "XXX".

Hi Jeff,

Because of time constraints on both sides I only did a very quick scan
of this draft.  Preliminary conclusion: I did not find any internal
inconsistencies but (and this should not surprise you, given the
discussion we had in April) the anti-spoofing requirements/mechanisms
described in this new draft are again much too weak for my taste.

The new (?) "hostport" spoofing detection mechanisms you describe rely
on having a trust relation that spans all content under a single
"hostport".  Such a relation won't always exist, as you mention too in
the 'security considerations', so these "hostport" mechanisms are not
strong enough for me.

Concerning this part of the draft:

# Therefore, a client MUST NOT use condition
#   #3 above (DCluster of a prior response for X includes prefix of
# Request-URI) unless it can securely verify that a resulting delta is
#   not spoofed.

I can't see right now if excluding condition 3 alone above gives a
watertight guarantee that further spoofing verification is not needed
-- it is possible that some interpretation of rule 4 also creates a
leak.  I'm not sure, I need to stare at this more.

That is all I have for now.  I suggest you go ahead and submit the
current draft as a formal Internet-Draft.  I hope we'll see some
feedback on this list that indicates whether or not I am alone in
wanting stronger anti-spoofing measures.

>
>Thanks
>-Jeff

Koen.

From danielh@crosslink.net  Thu Aug 24 09:45:16 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id JAA13696; Thu, 24 Aug 2000 09:45:15 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from zmamail01.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA10711; Thu, 24 Aug 2000 09:45:15 -0700
Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345)
	id A182D228F; Thu, 24 Aug 2000 12:45:14 -0400 (EDT)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by zmamail01.zma.compaq.com (Postfix) with ESMTP id 4FC9420C7
	for <http-delta@pa.dec.com>; Thu, 24 Aug 2000 12:45:14 -0400 (EDT)
Received: from danielh (z_a082.ers.usda.gov [151.121.64.82] (may be forged)) by lycanthrope.crosslink.net (8.9.3/) with SMTP id MAA20241 for <http-delta@pa.dec.com>; Thu, 24 Aug 2000 12:45:13 -0400
Message-Id: <200008241645.MAA20241@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Thu, 24 Aug 2000 12:14:02 -0300
To: http-delta@pa.dec.com
Subject: On assigning cached responses to a dcluster.
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.10a c10 

This somewhat pedantic outline is meant to clarify 
how one might implement Dcluster and DTemplate.  It is meant as an
experiment that might highlight possible misunderstandings.  Personally, I 
advocate including something like this outline in the rfc, but I won't insist 
on it.


------------------------------------------------------------
On assigning cached responses to a dcluster.

In my reading of draft-mogul-http-dcluster-00.txt, the means by which
items are associated with a Dcluster remained vague.  I think an outline
of a possible client-side algorithm might be useful.
Hence, consider the following:


1) The client maintains a "DCluster-table".
   Each entry in this table is initialized by a DCluster 
   response header, and uses a "Dcluster-prefix" 
   as an identifier.

2) Each entry in this table contains a list of cached
   responses that may be used as a delta base. Each
   cached response consists of the body of a
   response (or a pointer to a cache containing the
   body of a response), and its etag.
   
For example:
  A request:
      GET /foo?p=1 HTTP/1.1
      Host: bar.example.net
   yielding:
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "abc"
      DCluster: "//bar.example.net/foo?"

Upon receipt of this response, the client could create a new Dcluster-table
entry that would be:
   a) identified using a Dcluster-prefix ("//bar.example.net/foo?")
and would contain:
   b) the response's content (or a pointer to a cached version of this content)
   c) the  etag ("abc")

3) For all future requests, the request-uri is compared
   against the Dcluster-prefixes in the Dcluster-table.
   If a Dcluster-prefix matches (that is, is a prefix match of) the
   request-uri, then the client may include the entry's etag(s)
   in an If-None-Match: request header (implying that the client 
   can use the appropriate content to delta-decode a response from the 
   server).

The next point complicates the story.

4) Upon receipt of all responses, including responses that do NOT contain a
   Dcluster response header, the client may:
     a) Compare the request-uri to each entry in the Dcluster-table. 
     b) If a Dcluster-prefix matches the request-uri, then
        the content, and etag, of the current response are added to 
        the list of cached responses. 

Note that it is possible (though perhaps inefficient) to have multiple
matches -- which may occur when a Dcluster-prefix is a prefix
of another Dcluster-prefix.  
For example:
   "//bar.example.net/foo?"
is a prefix of
   "//bar.example.net/foo?action=quote"

Consider the following example:

  A request:
      GET /foo?p=1 HTTP/1.1
      Host: bar.example.net
   yields a response:
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "abc"
      DCluster: "//bar.example.net/foo?"

      Response to p=1

   followed by:
      GET /foo?r=1 HTTP/1.1
      Host: bar.example.net
   yielding:
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "def"

      Second-response, r=1

   and lastly:
      GET /foo?s=1 HTTP/1.1
      Host: bar.example.net
   yielding:
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "ghi"

      s=1 response

Upon receiving the first response, the client would create
a Dcluster-table entry with:
    Dcluster-prefix: "//bar.example.net/foo?"
    Etag(1)="abc"
    Content(1)=Response to p=1
Upon receipt of the second and third responses, the client would modify the
above, yielding:
    Dcluster-prefix: "//bar.example.net/foo?"
    Etag(1)="abc"
    Etag(2)="def"
    Etag(3)="ghi"
    Content(1)=Response to p=1
    Content(2)=Second-response, r=1
    Content(3)=s=1 response

The client could then make a request:
      GET /foo?t=1 HTTP/1.1
      Host: bar.example.net
      If-None-Match:"abc","def","ghi"
      A-IM: diff-e

Note that the server, by specifying a Dcluster header in its response
to /foo?p=1, is declaring that all subsequent responses to
request-uri's that match "foo?" will have unique etags.
That is, the server is establishing a uniqueness scope defined
by the "foo?" prefix.
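
Steps 1 through 4 can be sketched as a small client-side table.  This
is one possible reading of the outline, not code from the draft; the
class and method names are invented here, and header values are passed
without their surrounding quotes.

```python
class DClusterTable:
    """Maps a Dcluster-prefix to a list of (etag, body) delta bases."""

    def __init__(self):
        self.entries = {}

    @staticmethod
    def _key(host, request_uri):
        # Prefixes in the examples look like "//host/path?", so build
        # the comparison key in the same shape.
        return "//" + host + request_uri

    def record_response(self, host, request_uri, etag, body, dcluster=None):
        # Steps 1-2: a DCluster header creates an entry if none exists.
        if dcluster is not None:
            self.entries.setdefault(dcluster, [])
        # Step 4: any response whose URI matches a prefix joins that entry.
        key = self._key(host, request_uri)
        for prefix, cached in self.entries.items():
            if key.startswith(prefix) and etag not in [e for e, _ in cached]:
                cached.append((etag, body))

    def candidate_etags(self, host, request_uri):
        # Step 3: collect etags from every entry whose prefix matches.
        key = self._key(host, request_uri)
        etags = []
        for prefix, cached in self.entries.items():
            if key.startswith(prefix):
                for etag, _ in cached:
                    if etag not in etags:
                        etags.append(etag)
        return etags


table = DClusterTable()
table.record_response("bar.example.net", "/foo?p=1", "abc",
                      b"Response to p=1", dcluster="//bar.example.net/foo?")
table.record_response("bar.example.net", "/foo?r=1", "def",
                      b"Second-response, r=1")
table.record_response("bar.example.net", "/foo?s=1", "ghi",
                      b"s=1 response")

etags = table.candidate_etags("bar.example.net", "/foo?t=1")
header = "If-None-Match: " + ",".join('"%s"' % e for e in etags)
```

Replaying the three responses from the example yields the same
If-None-Match header as the /foo?t=1 request shown above.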

DTemplate is supported in a similar fashion, using a DTemplate-table.

5) If a DTemplate is received, the client will (as soon as convenient)
   request the DTemplate, and store its content (and its etag) in the
   DTemplate-table.  Each entry in the Dtemplate-table is
   identified by a "DTemplate-URI" equal to the request-URI
   (that is, the identifier is NOT a prefix).

6) For all future requests, the request-uri is compared
   against the DTemplate-URI's in the DTemplate-table.
   If a DTemplate-URI matches (that is, is an exact match of) the 
   request-uri, then the client may include the entry's etag
   in an If-None-Match: request header.

Of course, the client may check both the Dcluster-table and the
Dtemplate-table; the presumption being that the Dtemplate-table
will yield better matches.

When both a DCluster and a DTemplate are received, then the client
should do steps 1, 2, 5, and 6. In addition, the etag and the
content of the response from the request for the DTemplate should
be added to the DCluster-table.

For example:
  A request:
      GET /foo?p=1 HTTP/1.1
      Host: bar.example.net
   yielding:
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "abc"
      DCluster: "//bar.example.net/foo?"
      DTemplate: "http://bar.example.net/foo.tplt"

      This is my p=1 response
  
The client would create a DCluster-table entry of:
    Dcluster-prefix: "//bar.example.net/foo?"
    Etag(1)="abc"
    Content(1)=This is my p=1 response

 The client would then request:
      GET /foo.tplt HTTP/1.1
      Host: bar.example.net
   yielding:
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "tabc"

     This is a template

The client would create a DTemplate-table entry of:
    DTemplate-URI: "//bar.example.net/foo.tplt"
    Etag(1)="tabc"
    Content(1)=This is a template
and would modify the DCluster-table, yielding:
    Dcluster-prefix: "//bar.example.net/foo?"
    Etag(1)="abc"
    Content(1)=This is my p=1 response
    Etag(2)="tabc"
    Content(2)=This is a template

The client might also wish to mark the (2) entry as having priority -- so that
should the list for "//bar.example.net/foo?" become long and require trimming,
items associated with DTemplates will be preferentially retained.
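
The priority idea could look something like the sketch below.  The
policy shown (keep template entries first, then the most recently
added responses) is an assumption made for the illustration, as are
the function name trim_cluster and the (etag, is_template) tuples.

```python
def trim_cluster(entries, max_entries):
    """Trim a Dcluster-table entry list to at most max_entries items.

    entries is a list of (etag, is_template) tuples in arrival order.
    Template entries are retained preferentially; among the rest, the
    newest are kept.  Arrival order is preserved in the result.
    """
    if len(entries) <= max_entries:
        return list(entries)
    # Rank positions: templates first, then newer (higher index) entries.
    ranked = sorted(range(len(entries)),
                    key=lambda i: (not entries[i][1], -i))
    keep = set(ranked[:max_entries])
    return [entry for i, entry in enumerate(entries) if i in keep]


cluster = [("abc", False), ("tabc", True), ("def", False), ("ghi", False)]
trimmed = trim_cluster(cluster, 2)
```

Here the template entry "tabc" survives trimming even though two
newer ordinary responses were added after it.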




 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From danielh@crosslink.net  Thu Aug 24 09:53:03 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id JAA25508; Thu, 24 Aug 2000 09:53:03 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from ztxmail01.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA32620; Thu, 24 Aug 2000 09:53:03 -0700
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345)
	id B637E100D; Thu, 24 Aug 2000 11:53:02 -0500 (CDT)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 5FAA911F7
	for <http-delta@pa.dec.com>; Thu, 24 Aug 2000 11:53:02 -0500 (CDT)
Received: from danielh (z_a082.ers.usda.gov [151.121.64.82] (may be forged)) by lycanthrope.crosslink.net (8.9.3/) with SMTP id MAA22462 for <http-delta@pa.dec.com>; Thu, 24 Aug 2000 12:53:01 -0400
Message-Id: <200008241653.MAA22462@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Thu, 24 Aug 2000 12:52:08 -0300
To: http-delta@pa.dec.com
Subject: spoofing
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.10a c10 



Regarding dcluster and spoofing...

It seems that Jeff is placing great hope on instance-digests,
whereas Koen is reluctant to depend on this extra info.

I'm wondering if a compromise would be to define a Delta-Uri
header, which the server can use to indicate the URI associated
with a Delta-base; the server would only use this when this URI
is NOT the same as the request-URI.

For example:
  A request:
      GET /hello.html HTTP/1.1
      Host: bar.org
  yields:
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "bcd"
      DCluster: "//foo.net/hello"
  And then,
      GET /hello.html HTTP/1.1
      Host: foo.net
  yields:
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "fgh"
  
  Later...
      GET /hello.html HTTP/1.1
      Host: foo.net
      A-IM: vcdiff
      If-None-Match: "bcd", "fgh"
 
   could yield:
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "fgh2"
      Delta-base: "fgh"
    or
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "bcd2"
      Delta-base: "bcd" 
      Delta-uri: "//bar.org/hello.html"
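
One way a client might act on such a response is sketched below. It assumes
the client keys its cached instances by (URI, entity tag) and accepts a base
instance only if it was fetched from the Delta-Uri, or from the request-URI
when no Delta-Uri is present. The function name select_base and the cache
layout are hypothetical, chosen only for this illustration.

```python
def select_base(cache, request_uri, delta_base, delta_uri=None):
    """Return the cached base instance the delta may be applied to, or None.

    cache maps (uri, etag) -> instance body (an assumed layout).
    The base must have been obtained from delta_uri if the server sent
    one, otherwise from the request URI itself.
    """
    source_uri = delta_uri if delta_uri is not None else request_uri
    return cache.get((source_uri, delta_base))


# Cache state after the first two requests in the example above.
cache = {
    ("//bar.org/hello.html", "bcd"): b"instance from bar.org",
    ("//foo.net/hello.html", "fgh"): b"instance from foo.net",
}

# Second possible response: base "bcd" named together with its URI.
base = select_base(cache, "//foo.net/hello.html", "bcd",
                   delta_uri="//bar.org/hello.html")
```

Under this sketch, a response naming Delta-base "bcd" without a Delta-Uri
finds no base at all, because no "bcd" instance was ever fetched from the
request-URI; the client would then have to refetch the full instance.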

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From koen@win.tue.nl  Fri Aug 25 11:33:14 2000
Return-Path: <koen@win.tue.nl>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA05133; Fri, 25 Aug 2000 11:33:13 -0700 (PDT)
Received: from zmamail01.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA06679; Fri, 25 Aug 2000 11:33:13 -0700
Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345)
	id 7D7E92196; Fri, 25 Aug 2000 14:33:12 -0400 (EDT)
Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157])
	by zmamail01.zma.compaq.com (Postfix) with ESMTP id 14EBA211A
	for <http-delta@pa.dec.com>; Fri, 25 Aug 2000 14:33:12 -0400 (EDT)
Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3)
	  id UAA06635. Fri, 25 Aug 2000 20:33:03 +0200 (MET DST)
From: koen@win.tue.nl (Koen Holtman)
Message-Id: <200008251833.UAA06635@wsooti09.win.tue.nl>
Subject: Re: spoofing
In-Reply-To: <200008241653.MAA22462@lycanthrope.crosslink.net> from "danielh@crosslink.net" at "Aug 24, 2000 12:52: 8 pm"
To: danielh@crosslink.net
Date: Fri, 25 Aug 2000 20:33:02 +0200 (MET DST)
Cc: http-delta@pa.dec.com
X-Mailer: ELM [version 2.4ME+ PL43 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

>
>
>Regarding dcluster and spoofing...
>
>It seems that Jeff is placing great hope on instance-digests,  
>whereas Koen is reluctant to depend on this extra info.
[...]

Just a quick clarification here -- I don't have the time now to study
the rest of your message.

On instance digests: I believe that as a method of spoofing prevention
they are strong enough.  But I don't know if implementers would find
them too heavy to use -- feedback would be appreciated on this.

My security problem with the draft is that it does not currently
_require_ the use of a strong-enough spoofing prevention mechanism like
instance-digests.

Koen.

From danielh@crosslink.net  Fri Aug 25 13:07:42 2000
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA00558; Fri, 25 Aug 2000 13:07:42 -0700 (PDT)
From: <danielh@crosslink.net>
Received: from zmamail01.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA01010; Fri, 25 Aug 2000 13:07:42 -0700
Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345)
	id 3E571215B; Fri, 25 Aug 2000 16:07:40 -0400 (EDT)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by zmamail01.zma.compaq.com (Postfix) with ESMTP id C5E3A2048
	for <http-delta@pa.dec.com>; Fri, 25 Aug 2000 16:07:39 -0400 (EDT)
Received: from danielh (z_a082.ers.usda.gov [151.121.64.82] (may be forged)) by lycanthrope.crosslink.net (8.9.3/) with SMTP id QAA05491 for <http-delta@pa.dec.com>; Fri, 25 Aug 2000 16:07:39 -0400
Message-Id: <200008252007.QAA05491@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Date: Fri, 25 Aug 2000 15:59:38 -0300
To: http-delta@pa.dec.com
In-Reply-To: <200008251833.UAA06635@wsooti09.win.tue.nl>
Subject: Re: spoofing
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.10a c10 

>>Regarding dcluster and spoofing...
>>
>>It seems that Jeff is placing great hope on instance-digests,  
>>whereas Koen is reluctant to depend on this extra info.
>[...]

>Just a quick clarification here -- I don't have the time now to study the
>rest of your message.
>On instance digests: I believe that as a method of spoofing prevention
>they are strong enough.  But I don't know if implementers would find them
>too heavy to use -- feedback would be appreciated on this.

Basically that is what I meant. Considering that Content-MD5 response headers
are rare, it is likely that instance digests will be affected by
the same factors (such as the desire to avoid computation of a digest),
and hence will also tend not to be computed.
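
For a sense of the computation involved, here is a minimal sketch of an
MD5-based digest check, encoded the way Content-MD5 values are (base64 of
the 16-byte MD5 digest). The function names are hypothetical, and how the
advertised value is carried and parsed out of a response header is left out
as an assumption.

```python
import base64
import hashlib


def md5_digest_b64(instance: bytes) -> str:
    # base64 of the raw MD5 digest, as used for Content-MD5 values.
    return base64.b64encode(hashlib.md5(instance).digest()).decode("ascii")


def instance_matches(instance: bytes, advertised: str) -> bool:
    # Compare a reassembled instance against the digest the server advertised.
    return md5_digest_b64(instance) == advertised.strip()
```

The cost is one pass over the full instance body on each side, which is the
computation a server may want to avoid.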

>My security problem with the draft is that it does not currently
>_require_ the use of a strong-enough spoofing prevention mechanism like
>instance-digests.

My 'umble proposal is meant to be a simple mechanism that SHOULD be used when
a base instance is not a prior instance of the request-URI.
Perhaps it's not as strong or as elegant as instance-digests,
but it's probably good enough (and cheap).

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul  Fri Aug 25 15:33:12 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA26412; Fri, 25 Aug 2000 15:33:12 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200008252233.PAA26412@wera.pa.dec.com>
To: http-delta
Subject: Internet-Drafts have been submitted
Date: Fri, 25 Aug 2000 15:33:12 -0700
X-Mts: smtp

I submitted these two drafts to the Internet-Drafts editor yesterday:

	draft-mogul-http-delta-06.txt
	draft-mogul-http-dcluster-00.txt

I know they have been received (because the IETF folks pointed
out that I initially gave the first one the wrong number), but
they haven't yet been posted to the IETF's server.  Presumably,
this will happen sometime early next week.

I know that Koen and others have already commented on
the security mechanisms in draft-mogul-http-dcluster-00.txt.
I'm too busy getting ready for my trip to SIGCOMM to even
read those, but I gather that we have some more discussions
ahead of us.

However, I would like to reach closure on draft-mogul-http-delta-06.txt
as soon as possible (and before putting a lot of effort into
the clusters/templates stuff), so I encourage people to focus
on that document first.  As far as I know, it currently has
no unresolved issues; if this is still true in two weeks,
I will issue a "Last Call" for comments on the HTTP-WG mailing
list, and (assuming no problems) two weeks after that, I will
ask the IESG to approve this as a Proposed Standard.

After that, I will try to devote the necessary energy to
the clusters/templates document.  But I figured it should
be out there for everyone to read, even in its current
not-quite-final form.

Thanks
-Jeff

From koen@win.tue.nl  Thu Aug 31 00:06:25 2000
Return-Path: <koen@win.tue.nl>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id AAA23204; Thu, 31 Aug 2000 00:06:24 -0700 (PDT)
Received: from ztxmail01.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA29340; Thu, 31 Aug 2000 00:06:24 -0700
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345)
	id D6BF7273F; Thu, 31 Aug 2000 02:06:23 -0500 (CDT)
Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157])
	by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 07CCE273E
	for <http-delta@pa.dec.com>; Thu, 31 Aug 2000 02:06:23 -0500 (CDT)
Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3)
	  id JAA01303. Thu, 31 Aug 2000 09:06:19 +0200 (MET DST)
From: koen@win.tue.nl (Koen Holtman)
Message-Id: <200008310706.JAA01303@wsooti09.win.tue.nl>
Subject: Re: spoofing
In-Reply-To: <200008241653.MAA22462@lycanthrope.crosslink.net> from "danielh@crosslink.net" at "Aug 24, 2000 12:52: 8 pm"
To: danielh@crosslink.net
Date: Thu, 31 Aug 2000 09:06:19 +0200 (MET DST)
Cc: http-delta@pa.dec.com
X-Mailer: ELM [version 2.4ME+ PL43 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Daniel Hellerstein:
>
>
>Regarding dcluster and spoofing...
>
>It seems that Jeff is placing great hope on instance-digests,
>whereas Koen is reluctant to depend on this extra info.
>
>I'm wondering if a compromise would be to define a Delta-Uri
>header, which the server can use to indicate the URI associated
>with a Delta-base; the server would only use this when this URI
>is NOT the same as the request-URI.
>
>For example:
[example deleted]

If I understand your example correctly, the idea is that the recipient
of a response with a delta-uri MUST ONLY apply this response to a base
instance obtained from the Delta-Uri.  Yes, I think that this proposal
provides the watertight anti-spoofing protection that I want.

In fact I believe this Delta-Uri proposal is similar to, but less
complex than, the proposal I made back in April.  In my proposal the
response would have a Dcluster or Dtemplate with the same function as
the delta-uri here.  In both cases, the basic idea is that the resource
A sending the delta response includes a means for the client to check
that the resource B that sent the base instance is in the trust domain
of A.

>Daniel Hellerstein

Koen.


From mogul  Tue Oct  3 17:03:05 2000
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id RAA07205; Tue, 3 Oct 2000 17:03:05 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200010040003.RAA07205@wera.pa.dec.com>
To: http-delta
Subject: I've requested Proposed Standard status for HTTP delta encoding
Date: Tue, 03 Oct 2000 17:03:05 -0700
X-Mts: smtp

I sent a Last Call to the HTTP WG mailing list two weeks ago, asking
for any comments on draft-mogul-http-delta-06.txt.  Dave Kristol
sent some grammatical corrections, but otherwise nobody responded.

Therefore, I just sent a message to the IESG asking for Proposed
Standard status for
	draft-mogul-http-delta-07.txt
	draft-mogul-http-digest-02.txt

draft-mogul-http-delta-07.txt is draft-mogul-http-delta-06.txt
with grammatical corrections; draft-mogul-http-digest-02.txt
is draft-mogul-http-digest-01.txt resubmitted because the
previous version has expired.

We probably also need to resubmit draft-korn-vcdiff-01.txt
(as it has also expired), but Phong Vo has asked for a little
time to make some changes.

I'm not sure I've followed all of the IETF's required procedures
properly; it might be necessary to wait two more weeks after
the resubmission date of the digest draft, and it might possibly
be necessary to wait for the new VCDIFF draft.

As a reminder, while these documents might soon appear as
RFCs, that does NOT mean that people should rush to widely
deploy this protocol.  As it says in RFC2026,

   Implementors should treat Proposed Standards as immature
   specifications.  It is desirable to implement them in order to gain
   experience and to validate, test, and clarify the specification.
   However, since the content of Proposed Standards may be changed if
   problems are found or better solutions are identified, deploying
   implementations of such standards into a disruption-sensitive
   environment is not recommended.

We would like to see implementations (which are necessary before
we can get Draft Standard status), but please don't deploy anything
except in experimental settings.

Thanks
-Jeff

From aking@internet.com  Fri Oct 20 07:31:15 2000
Return-Path: <aking@internet.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id HAA16036; Fri, 20 Oct 2000 07:31:15 -0700 (PDT)
Received: from ztxmail01.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA25978; Fri, 20 Oct 2000 07:31:15 -0700
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345)
	id E2F0D16E8; Fri, 20 Oct 2000 09:31:14 -0500 (CDT)
Received: from hermes.kos.net (hermes.kos.net [216.13.25.100])
	by ztxmail01.ztx.compaq.com (Postfix) with SMTP id 613DB413E
	for <http-delta@pa.dec.com>; Fri, 20 Oct 2000 09:31:14 -0500 (CDT)
Received: (qmail 24896 invoked from network); 20 Oct 2000 14:34:10 -0000
Received: from mki5-pl-ri4.kos.net (HELO ?216.13.27.196?) (216.13.27.196)
  by hermes.kos.net with SMTP; 20 Oct 2000 14:34:10 -0000
X-Sender: aking@mailhost.iworld.com (Unverified)
Message-Id: <l03102800b6160c704f5c@[216.13.27.196]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Fri, 20 Oct 2000 10:54:00 -0400
To: http-delta@pa.dec.com
From: Andy King <aking@internet.com>
Subject: delta compression

i'm doing a story on delta compression for internet.com
pls give me a summary of the status of it (fred douglis
referred me to you).

what live implementations do you know of?

i know of two

xosoft.com
and
http://linuxcare.com.au/rproxy/

know any others?
what kind of improvement can i expect over mod_gzip?

(remotecommunications.com)

thanks

Andrew B. King                 internet.com Corp.
andrew@internet.com            http://www.internet.com
Managing Editor                2020 Hogback Rd. STE #4
http://www.webreference.com    Ann Arbor, MI 48105
http://www.javascript.com      734.971.7906 v 734.975.9184 x



From aking@internet.com  Wed Nov 15 07:24:50 2000
Return-Path: <aking@internet.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id HAA23121; Wed, 15 Nov 2000 07:24:49 -0800 (PST)
Received: from ztxmail01.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA03666; Wed, 15 Nov 2000 07:24:49 -0800
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345)
	id 0CBD22125; Wed, 15 Nov 2000 09:24:49 -0600 (CST)
Received: from mailhost.iworld.com (unknown [63.95.15.3])
	by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 18CA51FC1
	for <http-delta@pa.dec.com>; Wed, 15 Nov 2000 09:24:44 -0600 (CST)
Received: by mailhost.iworld.com; id KAA27918; Wed, 15 Nov 2000 10:24:43 -0500 (EST)
Received: from nodnsquery(10.1.4.47) by darienfw1.iworld.com via smap (V5.5)
	id xma027748; Wed, 15 Nov 00 10:23:50 -0500
Received: from [10.1.26.57] by schubert.iworld.com
          (Netscape Messaging Server 3.6)  with ESMTP id AAA1C63
          for <http-delta@pa.dec.com>; Wed, 15 Nov 2000 10:23:47 -0500
X-Sender: aking@mailhost.iworld.com (Unverified)
Message-Id: <l03102802b6385ffa6c29@[10.1.26.57]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 15 Nov 2000 11:47:29 -0400
To: http-delta@pa.dec.com
From: "King, Andy" <aking@internet.com>
Subject: delta encoding article

all,

jeffrey mogul has graciously written for us an intro to
delta encoding at:

http://webref.com/internet/software/servers/http/deltaencoding/intro/

appreciate any feedback you have, tx. short desc follows:

What is HTTP Delta Encoding?

By sending just the differences between old and new pages,
Web caching and load times can be dramatically improved.
By Jeffrey Mogul.

Andrew B. King                 internet.com Corp.
andrew@internet.com            http://www.internet.com
Managing Editor                2020 Hogback Rd. STE #4
http://www.webreference.com    Ann Arbor, MI 48105
http://www.javascript.com      734.971.7906 v 734.975.9184 x



From issac@p-roman.jct.ac.il  Thu Dec  7 12:06:21 2000
Return-Path: <issac@p-roman.jct.ac.il>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id MAA15746; Thu, 7 Dec 2000 12:06:20 -0800 (PST)
Received: from zmamail02.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA31760; Thu, 7 Dec 2000 12:06:19 -0800
Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345)
	id B262556F0; Thu,  7 Dec 2000 15:06:18 -0500 (EST)
Received: from mail.jct.ac.il (mail.jct.ac.il [147.161.1.14])
	by zmamail02.zma.compaq.com (Postfix) with ESMTP
	id 10569561F; Thu,  7 Dec 2000 15:06:17 -0500 (EST)
Received: from p-roman.jct.ac.il (p-roman.jct.ac.il [147.161.5.104])
	by mail.jct.ac.il (8.10.1/8.10.1) with ESMTP id eB7K7DF10807;
	Thu, 7 Dec 2000 22:07:13 +0200 (IST)
Received: from localhost (issac@localhost)
	by p-roman.jct.ac.il (8.9.3/8.8.7) with ESMTP id WAA26427;
	Thu, 7 Dec 2000 22:07:00 +0200
Date: Thu, 7 Dec 2000 22:07:00 +0200 (IST)
From: Issac Goldstand <issac@mail.jct.ac.il>
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-delta@pa.dec.com
Subject: Re: I've requested Proposed Standard status for HTTP delta encoding
In-Reply-To: <200010040003.RAA07205@wera.pa.dec.com>
Message-Id: <Pine.LNX.4.21.0012072206280.26415-100000@p-roman.jct.ac.il>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: issac@p-roman.jct.ac.il

Isn't this taking a slightly long time to make RFC status???  Or have I
been missing some recent activity?

On Tue, 3 Oct 2000, Jeffrey Mogul wrote:

> I sent a Last Call to the HTTP WG mailing list two weeks ago, asking
> for any comments on draft-mogul-http-delta-06.txt.  Dave Kristol
> sent some grammatical corrections, but otherwise nobody responded.
> 
> Therefore, I just sent a message to the IESG asking for Proposed
> Standard status for
> 	draft-mogul-http-delta-07.txt
> 	draft-mogul-http-digest-02.txt
> 
> draft-mogul-http-delta-07.txt is draft-mogul-http-delta-06.txt
> with grammatical corrections; draft-mogul-http-digest-02.txt
> is draft-mogul-http-digest-01.txt resubmitted because the
> previous version has expired.
> 
> We probably also need to resubmit draft-korn-vcdiff-01.txt
> (as it has also expired), but Phong Vo has asked for a little
> time to make some changes.
> 
> I'm not sure I've followed all of the IETF's required procedures
> properly; it might be necessary to wait two more weeks after
> the resubmission date of the digest draft, and it might possibly
> be necessary to wait for the new VCDIFF draft.
> 
> As a reminder, while these documents might soon appear as
> RFCs, that does NOT mean that people should rush to widely
> deploy this protocol.  As it says in RFC2026,
> 
>    Implementors should treat Proposed Standards as immature
>    specifications.  It is desirable to implement them in order to gain
>    experience and to validate, test, and clarify the specification.
>    However, since the content of Proposed Standards may be changed if
>    problems are found or better solutions are identified, deploying
>    implementations of such standards into a disruption-sensitive
>    environment is not recommended.
> 
> We would like to see implementations (which are necessary before
> we can get Draft Standard status), but please don't deploy anything
> except in experimental settings.
> 
> Thanks
> -Jeff
> 

-- 
Internet is a wonderful mechanism for making a fool of
yourself in front of a very large audience.
  --Anonymous

Moving the mouse won't get you into trouble...  Clicking it might.
  --Anonymous

PGP Key 0xE0FA561B - Fingerprint:
7E18 C018 D623 A57B 7F37 D902 8C84 7675 E0FA 561B



From mogul@pa.dec.com  Thu Dec  7 14:24:05 2000
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA06599; Thu, 7 Dec 2000 14:24:04 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA25790; Thu, 7 Dec 2000 14:24:04 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA23587; Thu, 7 Dec 2000 14:24:02 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200012072224.OAA23587@wera.pa.dec.com>
To: Issac Goldstand <issac@mail.jct.ac.il>
Cc: http-delta@pa.dec.com
Subject: Re: I've requested Proposed Standard status for HTTP delta encoding 
In-Reply-To: Your message of "Thu, 07 Dec 2000 22:07:00 +0200."
             <Pine.LNX.4.21.0012072206280.26415-100000@p-roman.jct.ac.il> 
Date: Thu, 07 Dec 2000 14:24:02 -0800
X-Mts: smtp

    Isn't this taking a slightly long time to make RFC status???
    Or have I been missing some recent activity?

It's a mystery to me.

On October 3 2000, I sent the IETF application area directors
a request to consider draft-mogul-http-delta-07.txt as a Proposed
Standard.

I got no response, so I resubmitted the request on November 22 2000.

I was actually going to send them another message yesterday,
after waiting two weeks, but there was a small flood of IESG
actions on the IETF-Announce mailing list, so I decided to
wait.  But today brought no new IETF-Announce messages, so
I have once again sent email to the area directors.  If I
don't get a response soon, I will try to find someone else
in the IESG who might know something.

Sorry about the delay, but I'm not sure what else to do.

-Jeff

From mogul  Mon Jan 15 14:14:56 2001
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id OAA15998; Mon, 15 Jan 2001 14:14:56 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200101152214.OAA15998@wera.pa.dec.com>
To: http-delta
Subject: Progress! IESG "Last Calls" for Delta, Vcdiff, Instance Digests
Date: Mon, 15 Jan 2001 14:14:56 -0800
X-Mts: smtp

Sorry folks, this took way too long.  Three months ago, I asked the
IESG to consider the HTTP Delta Encoding document as a Proposed
Standard.  Then, nothing happened.  I prodded every few weeks by
email, but it took a phone call to the IESG Secretary to unjam
the process.  People are apologetic.

Note that this does not have anything to do with the "Cluster"
and "Template" documents, since I had put these off until we have
closure on the basic design.  In retrospect, that might have been
a mistake (given how long this step has taken), but they will
have to wait for a while longer.

-Jeff

FYI, here are the announcements (From: iesg-secretary@ietf.org (The IESG),
To: IETF-Announce)

===

The IESG has received a request to consider Delta encoding in HTTP
<draft-mogul-http-delta-07.txt> as a Proposed Standard.  This has been
reviewed in the IETF but is not the product of an IETF Working Group.

The IESG plans to make a decision in the next few weeks, and solicits
final comments on this action.  Please send any comments to the
iesg@ietf.org or ietf@ietf.org mailing lists by February 12, 2001.

Files can be obtained via
http://www.ietf.org/internet-drafts/draft-mogul-http-delta-07.txt

===

The IESG has received a request to consider Instance Digests in HTTP
<draft-mogul-http-digest-03.txt> as a Proposed Standard.  This has been
reviewed in the IETF but is not the product of an IETF Working Group.

The IESG plans to make a decision in the next few weeks, and solicits
final comments on this action.  Please send any comments to the
iesg@ietf.org or ietf@ietf.org mailing lists by February 12, 2001.

Files can be obtained via
http://www.ietf.org/internet-drafts/draft-mogul-http-digest-03.txt

===

The IESG has received a request to consider The VCDIFF Generic
Differencing and Compression Data Format <draft-korn-vcdiff-02.txt> as
a Proposed Standard.  This has been reviewed in the IETF but is not the
product of an IETF Working Group.

The IESG plans to make a decision in the next few weeks, and solicits
final comments on this action.  Please send any comments to the
iesg@ietf.org or ietf@ietf.org mailing lists by February 12, 2001.

Files can be obtained via
http://www.ietf.org/internet-drafts/draft-korn-vcdiff-02.txt

From philip@alexanderworks.org  Sun Jan 21 18:14:13 2001
Return-Path: <philip@alexanderworks.org>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA17290; Sun, 21 Jan 2001 18:14:13 -0800 (PST)
Received: from zmamail01.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA12522; Sun, 21 Jan 2001 18:14:03 -0800
Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345)
	id 6C7971162B; Sun, 21 Jan 2001 21:13:57 -0500 (EST)
Received: from tungsten.btinternet.com (tungsten.btinternet.com [194.73.73.81])
	by zmamail01.zma.compaq.com (Postfix) with ESMTP id EC25D113CB
	for <http-delta@pa.dec.com>; Sun, 21 Jan 2001 21:13:56 -0500 (EST)
Received: from [213.1.170.142] (helo=dobbin.btinternet.com)
	by tungsten.btinternet.com with esmtp (Exim 3.03 #83)
	id 14KWUU-0002sb-00
	for http-delta@pa.dec.com; Mon, 22 Jan 2001 02:13:55 +0000
Message-Id: <4.3.2.7.2.20010122020602.00acf9d0@mail.btinternet.com>
X-Sender: philippawley@mail.btinternet.com
X-Mailer: QUALCOMM Windows Eudora Version 4.3.2
Date: Mon, 22 Jan 2001 02:07:27 +0000
To: http-delta@pa.dec.com
From: Philip Pawley <philip@alexanderworks.org>
Subject: Re: etag or itag
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed


--
Philip Pawley
Liverpool, UK
http://www.alexanderworks.org/
--





From douglis@research.att.com  Fri Feb  9 15:58:54 2001
Return-Path: <douglis@research.att.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA08032; Fri, 9 Feb 2001 15:58:54 -0800 (PST)
Received: from zmamail02.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA10911; Fri, 9 Feb 2001 15:58:53 -0800
Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345)
	id 55CE158CD; Fri,  9 Feb 2001 18:58:53 -0500 (EST)
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by zmamail02.zma.compaq.com (Postfix) with ESMTP id 2A5E55877
	for <http-delta@pa.dec.com>; Fri,  9 Feb 2001 18:58:53 -0500 (EST)
Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26])
	by mail-blue.research.att.com (Postfix) with ESMTP
	id AB5F94CE02; Fri,  9 Feb 2001 18:58:52 -0500 (EST)
Received: from windsor.research.att.com (windsor.research.att.com [135.207.26.46])
	by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id SAA27428;
	Fri, 9 Feb 2001 18:58:47 -0500 (EST)
Received: from windsor.research.att.com (localhost [127.0.0.1])
	by windsor.research.att.com (8.8.8+Sun/8.8.5) with ESMTP id SAA11325;
	Fri, 9 Feb 2001 18:58:47 -0500 (EST)
Message-Id: <200102092358.SAA11325@windsor.research.att.com>
From: Fred Douglis <douglis@research.att.com>
To: iesg@ietf.org
Cc: smonetti@att.com, tfrost@att.com, misha@research.att.com,
        http-delta@pa.dec.com
Subject: Re: delta-encoding in HTTP to proposed standard
Date: Fri, 09 Feb 2001 18:58:46 -0500
Sender: douglis@research.att.com

I've been asked to pass along the following advisory.
======

This is to advise the IETF that AT&T has intellectual property that may be
applicable to I-D draft-mogul-http-delta-07.txt.  The intellectual property
includes U.S. patent 5,931,904, Method for reducing the delay between the
time a data page is requested and the time the data page is displayed. 

AT&T is currently reviewing its licensing intent relative to this
Intellectual Property and will notify the IETF accordingly within the next
few weeks.

Tom Frost
AT&T Intellectual Property Management
Room 2E37, Bldg. 104
180 Park Avenue
Florham Park, NJ 07932
tfrost@att.com



From douglis@research.att.com  Fri Feb  9 16:05:59 2001
Return-Path: <douglis@research.att.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA24437; Fri, 9 Feb 2001 16:05:59 -0800 (PST)
Received: from zmamail02.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA15153; Fri, 9 Feb 2001 16:05:58 -0800
Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345)
	id 541FF5839; Fri,  9 Feb 2001 19:05:58 -0500 (EST)
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by zmamail02.zma.compaq.com (Postfix) with ESMTP
	id F191E591B; Fri,  9 Feb 2001 19:05:57 -0500 (EST)
Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26])
	by mail-blue.research.att.com (Postfix) with ESMTP
	id B95264CE0B; Fri,  9 Feb 2001 19:05:57 -0500 (EST)
Received: from douglux.research.att.com (root@douglux.research.att.com [135.207.26.106])
	by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id TAA27601;
	Fri, 9 Feb 2001 19:05:57 -0500 (EST)
Received: from douglux.research.att.com (IDENT:douglis@localhost.localdomain [127.0.0.1])
	by douglux.research.att.com (8.9.3/8.9.3) with ESMTP id TAA17105;
	Fri, 9 Feb 2001 19:05:57 -0500
Message-Id: <200102100005.TAA17105@douglux.research.att.com>
X-Mailer: exmh version 2.1.1 10/15/1999
From: Fred Douglis <douglis@research.att.com>
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-delta@pa.dec.com, ned.freed@innosoft.com, paf@cisco.com
Subject: Re: Last Call: Delta encoding in HTTP to Proposed Standard 
In-Reply-To: Your message of "Mon, 15 Jan 2001 14:14:56 PST."
             <200101152214.OAA15998@wera.pa.dec.com> 
X-Uri: http://www.research.att.com/~douglis/
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 09 Feb 2001 19:05:56 -0500
Sender: douglis@research.att.com

I just copied http-delta on mail to IESG about an AT&T patent that may pertain 
to the delta-encoding I-D.  That was the formal statement; I wanted to make an 
informal comment as well (and to copy the applications area directors).

I apologize if this is perceived to be coming later in the process than it 
should have.  I'm not a long-term/active participant in the IETF and have 
heard various conflicting statements about when it is appropriate to disclose 
such information.  I endeavored to make a statement before the last-call 
deadline for the draft.  I hope this is sufficient. 

In regard to the inclusion of rsync/rproxy in the I-D, I support modifying it 
accordingly.

Fred


From cjh@osa.com.au  Sun Feb 11 21:51:25 2001
Return-Path: <cjh@osa.com.au>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id VAA02126; Sun, 11 Feb 2001 21:51:25 -0800 (PST)
Received: from ztxmail02.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA04819; Sun, 11 Feb 2001 21:51:24 -0800
Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345)
	id 644351E72; Sun, 11 Feb 2001 23:51:24 -0600 (CST)
Received: from fw01.osa.com.au (fw01.osa.com.au [203.6.130.130])
	by ztxmail02.ztx.compaq.com (Postfix) with SMTP id A5FD31F11
	for <http-delta@pa.dec.com>; Sun, 11 Feb 2001 23:51:22 -0600 (CST)
Received: (qmail 11695 invoked by uid 0); 12 Feb 2001 05:51:20 -0000
Received: (ofmipd 172.16.33.89); 12 Feb 2001 05:50:57 -0000
Received: (qmail 26712 invoked by uid 4005); 12 Feb 2001 05:51:19 -0000
Received: from cjh@magpie.osa.com.au by excalibur.osa.com.au with qmail-scanner-0.90 (. Clean. Processed in 0.293723 secs); 12/02/2001 16:51:19
Received: from magpie.osa.com.au (172.16.36.3)
  by excalibur.osa.com.au with SMTP; 12 Feb 2001 05:51:17 -0000
Received: (qmail 1717 invoked from network); 12 Feb 2001 05:51:16 -0000
Received: from localhost.osa.com.au (HELO magpie.osa.com.au) (127.0.0.1)
  by localhost.osa.com.au with SMTP; 12 Feb 2001 05:51:16 -0000
Date: 12 Feb 2001 16:51:16 +1100
Message-Id: <20010212165116.1.11694.qmail@osa.com.au>
From: "Clifford Heath" <cjh@osa.com.au>
To: "Fred Douglis" <douglis@research.att.com>
Cc: http-delta@pa.dec.com
Subject: Re: delta-encoding in HTTP to proposed standard 
In-Reply-To: Your message of "Fri, 09 Feb 2001 18:58:46 CDT."
             <200102092358.SAA11325@windsor.research.att.com> 

> This is to advise the IETF that AT&T has intellectual property that may be
> applicable to I-D draft-mogul-http-delta-07.txt.

I can't see how this is applicable. The patent clearly delineates the
operation of sending an available old version of a document while fetching,
computing and sending differences against the new one, with the goal of an
overall reduction in latency.

I can't see how HTTP deltas would be used to serve this purpose, other than
by abusing other tags, like expiry. Only the current version or deltas to
reach the current version are expected to be sent, no?

--
Clifford Heath, Open Software Associates, mailto:cjh@osa.com.au,
Ph +613 9895 2194, Fax 9895 2020, <http://www.osa.com.au/~cjh>,
56-60 Rutland Rd, Box Hill 3128, Melbourne, Victoria, Australia.

From chair@ietf.org  Mon Feb 12 18:05:51 2001
Return-Path: <chair@ietf.org>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA21331; Mon, 12 Feb 2001 18:05:50 -0800 (PST)
Received: from zmamail02.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA24767; Mon, 12 Feb 2001 18:05:50 -0800
Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345)
	id EF1A55C70; Mon, 12 Feb 2001 21:05:49 -0500 (EST)
Received: from sj-msg-core-2.cisco.com (sj-msg-core-2.cisco.com [171.69.43.88])
	by zmamail02.zma.compaq.com (Postfix) with ESMTP id 6C3915DF7
	for <http-delta@pa.dec.com>; Mon, 12 Feb 2001 21:05:49 -0500 (EST)
Received: from FRED-W2K.ietf.org (fred-hm-dhcp1.cisco.com [171.69.128.116])
	by sj-msg-core-2.cisco.com (8.9.3/8.9.1) with ESMTP id SAA14273;
	Mon, 12 Feb 2001 18:05:30 -0800 (PST)
Message-Id: <4.3.2.7.2.20010212180206.0244b920@mira-sjcm-2.cisco.com>
X-Sender: fred@flipper.cisco.com (Unverified)
X-Mailer: QUALCOMM Windows Eudora Version 4.3.2
Date: Mon, 12 Feb 2001 18:05:05 -0800
To: Fred Douglis <douglis@research.att.com>
From: Fred Baker <chair@ietf.org>
Subject: Re: delta-encoding in HTTP to proposed standard
Cc: iesg@ietf.org, smonetti@att.com, tfrost@att.com, misha@research.att.com,
        http-delta@pa.dec.com
In-Reply-To: <200102092358.SAA11325@windsor.research.att.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed

Maybe you could advise us of your intentions with regard to this. The IETF, 
as a rule, seeks to avoid standardizing people's business plans, especially 
in individual submissions which have not had working group review. Is there 
a strong reason to not take this to Informational status - treat it as a 
corporate white paper?

At 06:58 PM 2/9/2001 -0500, Fred Douglis wrote:
>I've been asked to pass along the following advisory.
>======
>
>This is to advise the IETF that AT&T has intellectual property that may be
>applicable to I-D draft-mogul-http-delta-07.txt.  The intellectual property
>includes U.S. patent 5,931,904, Method for reducing the delay between the
>time a data page is requested and the time the data page is displayed.
>
>AT&T is currently reviewing its licensing intent relative to this
>Intellectual Property and will notify the IETF accordingly within the next
>few weeks.
>
>Tom Frost
>AT&T Intellectual Property Management
>Room 2E37, Bldg. 104
>180 Park Avenue
>Florham Park, NJ 07932
>tfrost@att.com


From mogul  Mon Feb 12 23:09:34 2001
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id XAA01572; Mon, 12 Feb 2001 23:09:34 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200102130709.XAA01572@wera.pa.dec.com>
To: Fred Baker <chair@ietf.org>
cc: Fred Douglis    <douglis@research.att.com>, iesg@ietf.org,
        smonetti@att.com, tfrost@att.com, misha@research.att.com, mogul,
        http-delta
Subject: Re: delta-encoding in HTTP to proposed standard 
In-reply-to: Your message of "Mon, 12 Feb 2001 18:05:05 PST."
             <4.3.2.7.2.20010212180206.0244b920@mira-sjcm-2.cisco.com> 
Date: Mon, 12 Feb 2001 23:09:34 -0800
X-Mts: smtp

    Maybe you could advise us of your intentions with regard to this.
    The IETF, as a rule, seeks to avoid standardizing people's business
    plans, especially in individual submissions which have not had
    working group review. Is there a strong reason to not take this to
    Informational status - treat it as a corporate white paper?

For the record: the HTTP Delta spec has been developed by a group
of people from several dozen companies and universities.  While
several of the authors are (or were) from AT&T, that should not
obscure the fact that this is definitely a multi-vendor (and
multi-non-vendor) standards proposal.  Although it was not the
product of a formal working group, it was publicized numerous
times within the HTTP working group (but was not within that
group's charter), and received some discussion on the HTTP-WG
mailing list.  Our intention has always been to seek standards-track
status.

I will let the AT&T people explain how and why they believe
that they have intellectual property that is related to this
spec; that was news to me and to most of the other people who
worked on this spec over a period of several years.

However, I want the IESG to be very clear on this one point:
this is NOT an AT&T "corporate white paper" in any way.  If
AT&T's recent claim has confused the IESG about this issue,
this is unfortunate.

I trust that AT&T will clarify the intellectual property issues
promptly.

-Jeff

From fred@cisco.com  Tue Feb 13 00:17:10 2001
Return-Path: <fred@cisco.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id AAA04019; Tue, 13 Feb 2001 00:17:09 -0800 (PST)
Received: from ztxmail02.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA32112; Tue, 13 Feb 2001 00:17:09 -0800
Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345)
	id 258E31C9B; Tue, 13 Feb 2001 02:17:09 -0600 (CST)
Received: from sj-msg-core-2.cisco.com (sj-msg-core-2.cisco.com [171.69.43.88])
	by ztxmail02.ztx.compaq.com (Postfix) with ESMTP
	id 9A08E1E1C; Tue, 13 Feb 2001 02:17:08 -0600 (CST)
Received: from FRED-W2K.cisco.com (fred-hm-dhcp1.cisco.com [171.69.128.116])
	by sj-msg-core-2.cisco.com (8.9.3/8.9.1) with ESMTP id AAA06567;
	Tue, 13 Feb 2001 00:17:22 -0800 (PST)
Message-Id: <4.3.2.7.2.20010213000603.023d1db0@mira-sjcm-2.cisco.com>
X-Sender: fred@mira-sjcm-2.cisco.com
X-Mailer: QUALCOMM Windows Eudora Version 4.3.2
Date: Tue, 13 Feb 2001 00:16:27 -0800
To: Jeffrey Mogul <mogul@pa.dec.com>
From: Fred Baker <fred@cisco.com>
Subject: Re: delta-encoding in HTTP to proposed standard 
Cc: Fred Douglis <douglis@research.att.com>, iesg@ietf.org, smonetti@att.com,
        tfrost@att.com, misha@research.att.com, mogul@pa.dec.com,
        http-delta@pa.dec.com
In-Reply-To: <200102130709.XAA01572@wera.pa.dec.com>
References: <Your message of "Mon, 12 Feb 2001 18:05:05 PST." <4.3.2.7.2.20010212180206.0244b920@mira-sjcm-2.cisco.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed

At 11:09 PM 2/12/2001 -0800, Jeffrey Mogul wrote:
>I trust that AT&T will clarify the intellectual property issues
>promptly.

Thanks. I certainly hope that they will.


From douglis@research.att.com  Tue Feb 13 06:40:05 2001
Return-Path: <douglis@research.att.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id GAA17402; Tue, 13 Feb 2001 06:40:04 -0800 (PST)
Received: from ztxmail01.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA18899; Tue, 13 Feb 2001 06:40:00 -0800
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345)
	id A0D1929D2; Tue, 13 Feb 2001 08:39:59 -0600 (CST)
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by ztxmail01.ztx.compaq.com (Postfix) with ESMTP
	id 4837C2880; Tue, 13 Feb 2001 08:39:59 -0600 (CST)
Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26])
	by mail-blue.research.att.com (Postfix) with ESMTP
	id 8E4F24CE2B; Tue, 13 Feb 2001 09:39:58 -0500 (EST)
Received: from douglux.research.att.com (root@douglux.research.att.com [135.207.26.106])
	by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id JAA16846;
	Tue, 13 Feb 2001 09:39:57 -0500 (EST)
Received: from douglux.research.att.com (IDENT:douglis@localhost.localdomain [127.0.0.1])
	by douglux.research.att.com (8.9.3/8.9.3) with ESMTP id JAA01231;
	Tue, 13 Feb 2001 09:39:56 -0500
Message-Id: <200102131439.JAA01231@douglux.research.att.com>
X-Mailer: exmh version 2.1.1 10/15/1999
From: Fred Douglis <douglis@research.att.com>
To: Fred Baker <fred@cisco.com>
Cc: Jeffrey Mogul <mogul@pa.dec.com>, iesg@ietf.org, smonetti@att.com,
        tfrost@att.com, misha@research.att.com, http-delta@pa.dec.com
Subject: Re: delta-encoding in HTTP to proposed standard 
In-Reply-To: Your message of "Tue, 13 Feb 2001 00:16:27 PST."
             <4.3.2.7.2.20010213000603.023d1db0@mira-sjcm-2.cisco.com> 
X-Uri: http://www.research.att.com/~douglis/
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 13 Feb 2001 09:39:56 -0500
Sender: douglis@research.att.com

You can expect a statement from AT&T soon, most likely by the end of the week.
I won't go into any further comments now, other than to apologize for the
confusion and the timing.  (Note also that I sent a separate note on the timing to
the applications area directors right after the formal notification to the
IESG.)

Fred



From mogul  Tue Feb 13 16:31:34 2001
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA28760; Tue, 13 Feb 2001 16:31:33 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200102140031.QAA28760@wera.pa.dec.com>
To: http-delta
Subject: Last-Call change request by Larry Masinter
Date: Tue, 13 Feb 2001 16:31:33 -0800
X-Mts: smtp

My response follows.

-Jeff

------- Forwarded Message

Return-Path: lmnet@attglobal.net
From: "Larry Masinter" <lmnet@attglobal.net>
To: <iesg@ietf.org>
Subject: RE: Last Call: Delta encoding in HTTP to Proposed Standard
Date: Sun, 11 Feb 2001 00:06:18 -0800
Message-Id: <NDBBKEBDLFENBJCGFOIJCEGLEGAA.lmnet@attglobal.net>
Mime-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-Msmail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
Importance: Normal
X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400
In-Reply-To: <200101151228.HAA02160@ietf.org>

# The IESG has received a request to consider Delta encoding in HTTP
# <draft-mogul-http-delta-07.txt> as a Proposed Standard.  This has been
# reviewed in the IETF but is not the product of an IETF Working Group.

I would like to respectfully request that the author(s) tone down
his(their) condemnation of the terminology in RFC 2616 around the
word "entity", since it isn't necessary to the understanding of
the protocol they're proposing.

In almost all system designs with new concepts, it is necessary to take
ordinary words and give them technical meanings that don't exactly match
their dictionary definitions. I can remember that at times the
discussions in HTTP-WG over terminology were heated, but in the end, it
was necessary to make some choices.

I think all that's necessary is to reword, in a minor way, the 
paragraphs in section 3 (Terminology) used to introduce the
term "instance".

OLD:
   The dictionary definition for ``entity'' is ``something that has
   separate and distinct existence and objective or conceptual
   reality'' [21].  Unfortunately, the definition for ``entity'' in
   HTTP/1.1 is similar to that used in MIME [12], based on an entirely
   false analogy between MIME and HTTP.
NEW:
   The dictionary definition for ``entity'' is ``something that has
   separate and distinct existence and objective or conceptual
   reality'' [21].  The definition for ``entity'' in HTTP/1.1 is
   similar to that used in MIME [12], based on an analogy between MIME
   and HTTP.

OLD:
   In MIME, electronic mail messages do have distinct and separate
   existences, so the MIME definition of ``entity'' as something that
   ``refers specifically to the MIME-defined header fields and contents
   of either a message or one of the parts in the body of a multipart
   entity'' makes sense.
NEW:
   In MIME, electronic mail messages have distinct and separate
   existences. MIME defines ``entity'' as something that ``refers
   specifically to the MIME-defined header fields and contents of
   either a message or one of the parts in the body of a multipart
   entity''.

OLD:
   In HTTP, however, a response message to a GET does not have a
   distinct and separate existence.  Rather, it is describing the
   current state of a resource (or a variant, subject to a set of
   constraints).  The HTTP/1.1 specification provides no term to
   describe ``the value that would be returned in response to a GET
   request at the current time for the selected variant of the specified
   resource.''  This leads to awkward wordings in the HTTP/1.1
   specification in places where this concept is necessary.
NEW:
   In HTTP, however, an entity in a response message to a GET is more
   transient. It reflects the current state of a resource (or a
   variant, subject to a set of constraints).  The HTTP/1.1
   specification has no term for ``the value that would be returned in
   response to a GET request at the current time for the selected
   variant of the specified resource.''  This leads to awkward
   wordings in the HTTP/1.1 specification in places where this concept
   is necessary.

OLD:
   It is too late to fix the terminological failure in the HTTP/1.1
   specification, so we instead define a new term, for use in this
   document:
NEW:
   To express this concept, we define a new term, for use in this
   document:


------- End of Forwarded Message


From mogul  Tue Feb 13 16:31:58 2001
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA28796; Tue, 13 Feb 2001 16:31:57 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200102140031.QAA28796@wera.pa.dec.com>
To: "Larry Masinter" <lmnet@attglobal.net>
cc: <iesg@ietf.org>, http-delta
Subject: Re: Last Call: Delta encoding in HTTP to Proposed Standard 
In-reply-to: Your message of "Sun, 11 Feb 2001 00:06:18 PST."
             <NDBBKEBDLFENBJCGFOIJCEGLEGAA.lmnet@attglobal.net> 
Date: Tue, 13 Feb 2001 16:31:57 -0800
X-Mts: smtp

Larry writes:

  I think all that's necessary is to reword, in a minor way, the 
  paragraphs in section 3 (Terminology) used to introduce the
  term "instance".
  
  OLD:
     The dictionary definition for ``entity'' is ``something that has
     separate and distinct existence and objective or conceptual
     reality'' [21].  Unfortunately, the definition for ``entity'' in
     HTTP/1.1 is similar to that used in MIME [12], based on an entirely
     false analogy between MIME and HTTP.
  NEW:
     The dictionary definition for ``entity'' is ``something that has
     separate and distinct existence and objective or conceptual
     reality'' [21].  The definition for ``entity'' in HTTP/1.1 is
     similar to that used in MIME [12], based on an analogy between MIME
     and HTTP.
  
  OLD:
     In MIME, electronic mail messages do have distinct and separate
     existences, so the MIME definition of ``entity'' as something that
     ``refers specifically to the MIME-defined header fields and contents
     of either a message or one of the parts in the body of a multipart
     entity'' makes sense.
  NEW:
     In MIME, electronic mail messages have distinct and separate
     existences. MIME defines ``entity'' as something that ``refers
     specifically to the MIME-defined header fields and contents of
     either a message or one of the parts in the body of a multipart
     entity''.
  
  OLD:
     In HTTP, however, a response message to a GET does not have a
     distinct and separate existence.  Rather, it is describing the
     current state of a resource (or a variant, subject to a set of
     constraints).  The HTTP/1.1 specification provides no term to
     describe ``the value that would be returned in response to a GET
     request at the current time for the selected variant of the specified
     resource.''  This leads to awkward wordings in the HTTP/1.1
     specification in places where this concept is necessary.
  NEW:
     In HTTP, however, an entity in a response message to a GET is more
     transient. It reflects the current state of a resource (or a
     variant, subject to a set of constraints).  The HTTP/1.1
     specification has no term for ``the value that would be returned in
     response to a GET request at the current time for the selected
     variant of the specified resource.''  This leads to awkward
     wordings in the HTTP/1.1 specification in places where this concept
     is necessary.
  
  OLD:
     It is too late to fix the terminological failure in the HTTP/1.1
     specification, so we instead define a new term, for use in this
     document:
  NEW:
     To express this concept, we define a new term, for use in this
     document:
  
I would accept all of these changes, except that in the first
change Larry suggested, I am going to insist on a phrase such as
	based on a false analogy between MIME and HTTP.
Or, if Larry would prefer,
	based on a naive analogy between MIME and HTTP.

-Jeff

From mogul  Tue Feb 13 16:48:46 2001
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA29443; Tue, 13 Feb 2001 16:48:45 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200102140048.QAA29443@wera.pa.dec.com>
To: Neale Banks <neale@lowendale.com.au>
cc: iesg@ietf.org, http-delta
Subject: Re: Delta encoding in HTTP to Proposed Standard 
In-reply-to: Your message of "Fri, 09 Feb 2001 23:54:34 +1100."
             <Pine.LNX.4.05.10102092334580.21521-100000@marina.lowendale.com.au> 
Date: Tue, 13 Feb 2001 16:48:45 -0800
X-Mts: smtp

Neale Banks writes:

    Submission to the IETF and IESG regarding "Delta encoding in HTTP
    <draft-mogul-http-delta-07.txt>" as a Proposed Standard.
    
    In relation to this Internet-Draft I have a concern regarding its
    acceptance as a Proposed Standard in its current form, due to a
    significant omission.
    
    This Internet-Draft includes section 1.1 titled "Related research
    and proposals". However this section completely fails to acknowledge
    the existence of the rproxy project[1].  Nor is rproxy referred to
    anywhere else in the current draft.  It is my humble opinion that
    this omission renders this Internet-Draft critically incomplete.
    This section could also benefit from a reference to rsync[2].
    
    I in no way submit that the technical proposals of Mogul et al are
    inferior to rproxy, but rather that these two approach similar (if not
    the same) challenges with contrasting solutions.  It is from this
    point of view that I submit that the current draft is critically
    incomplete insomuch as it includes a section "Related research and
    proposals" which makes no apparent qualification of incompleteness.
    
    Whilst there may be grounds to allege that rproxy is still a
    work-in-progress, it is a project which has a sound foundation - 
    "The rproxy algorithm is based on the well-known and trustworthy
    rsync software by Andrew Tridgell." [1],[2]
    
    Having discussed this matter with one of the rproxy developers[3],
    I am sure that the contributors to rproxy would be agreeable to
    providing some assistance with including an appropriate reference in
    this Internet-Draft.

    [1] rproxy: http://www.linuxcare.com.au/rproxy
    [2] rsync: http://rsync.samba.org/
    [3] Conversation with Martin Pool at linux.conf.au, January 2001

I am not aware that an IETF Standards-Track document is required
to include any "related work" section.  This document includes
one because we believe that it clarifies the background behind
the protocol specification.  The title of this section is
"Related research and proposals."

Although I do not believe that it would be necessary for understanding
the HTTP Delta specification, I would be happy to cite the rsync
technical report, as

	Andrew Tridgell and Paul Mackerras.
	The rsync algorithm. Technical Report,
	Department of Computer Science, Australian National University.
	November, 1998.
	http://rsync.samba.org/rsync/tech_report/

especially because there has been some discussion of trying to
fit rsync into the framework that has been developed for HTTP
Deltas.  (I should point out that Andrew Tridgell has been
on the http-delta@pa.dec.com mailing list for some time, and
has occasionally participated in our discussions.  Martin Pool
is also on the mailing list, but no messages from him appear
in our log.)

Andrew, if there is something else I should cite instead, please
let me know ASAP.

As Mark Nottingham writes, regarding the rproxy pages:

    Is there an rsync protocol specification to refer to (in a
    normative or non-normative manner)? I see a presentation about the
    protocol, and a one-page description with some BNF defining
    'delta', but nothing else.

I too do not see anything in the rproxy-related pages that constitutes
a "related proposal."  Therefore, it is hard to see how our failure
to cite rproxy renders the HTTP Delta specification "critically
incomplete."

-Jeff

From tridge@au2.samba.org  Tue Feb 13 17:09:58 2001
Return-Path: <tridge@au2.samba.org>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id RAA20911; Tue, 13 Feb 2001 17:09:58 -0800 (PST)
Received: from ztxmail02.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA22193; Tue, 13 Feb 2001 17:09:58 -0800
Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345)
	id 9577C1CFD; Tue, 13 Feb 2001 19:09:57 -0600 (CST)
Received: from au2.samba.org (ns1.samba.org [203.17.0.92])
	by ztxmail02.ztx.compaq.com (Postfix) with ESMTP
	id BBD651DD8; Tue, 13 Feb 2001 19:09:56 -0600 (CST)
Received: by au2.samba.org (Postfix, from userid 148)
	id D1AA3659838; Wed, 14 Feb 2001 11:57:18 +1100 (EST)
From: Andrew Tridgell <tridge@samba.org>
To: mogul@pa.dec.com
Cc: neale@lowendale.com.au, iesg@ietf.org, http-delta@pa.dec.com
In-Reply-To: <200102140048.QAA29443@wera.pa.dec.com> (message from Jeffrey
	Mogul on Tue, 13 Feb 2001 16:48:45 -0800)
Subject: Re: Delta encoding in HTTP to Proposed Standard
Reply-To: tridge@samba.org
References:  <200102140048.QAA29443@wera.pa.dec.com>
Message-Id: <20010214005718.D1AA3659838@au2.samba.org>
Date: Wed, 14 Feb 2001 11:57:18 +1100 (EST)
Sender: tridge@au2.samba.org

Jeff,

> Andrew, if there is something else I should cite instead, please
> let me know ASAP.

Probably the most useful cite is http://rproxy.samba.org/ as that
gives the most directly relevant information on the interaction of
rsync with http. If you would prefer a more academic cite then my PhD
thesis is probably the best (instead of that technical report) as it
contains much more up to date and complete information, plus it talks
directly about the use of rsync in http.

> I too do not see anything in the rproxy-related pages that constitutes
> a "related proposal."  Therefore, it is hard to see how our failure
> to cite rproxy renders the HTTP Delta specification "critically
> incomplete."

you are correct. It would be nice to have a reference in there for
completeness but we are approaching the problem very much from an
"exploring implementations" standpoint rather than a standards view. I
certainly do not think that the lack of information about rproxy makes
the HTTP delta proposal "critically incomplete".

It's quite likely that we will be pursuing a standards approach at
some time in the future, but that isn't our emphasis at the moment.

Cheers, Tridge

From neoi@writeme.com  Tue Feb 13 23:54:50 2001
Return-Path: <neoi@writeme.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id XAA16161; Tue, 13 Feb 2001 23:54:49 -0800 (PST)
Received: from ztxmail02.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA14212; Tue, 13 Feb 2001 23:54:49 -0800
Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345)
	id D16E21C61; Wed, 14 Feb 2001 01:54:48 -0600 (CST)
Received: from mail.jct.ac.il (mail.jct.ac.il [147.161.1.14])
	by ztxmail02.ztx.compaq.com (Postfix) with ESMTP
	id 5C2091EF4; Wed, 14 Feb 2001 01:54:47 -0600 (CST)
Received: from beamartyr (goldrush.jct.ac.il [147.161.5.215])
	by mail.jct.ac.il (8.10.1/8.10.1) with SMTP id f1E7ub210814;
	Wed, 14 Feb 2001 09:56:39 +0200 (IST)
Message-Id: <001c01c0965b$5e520dc0$d705a193@jct.ac.il>
From: "Issac Goldstand" <neoi@writeme.com>
To: "Jeffrey Mogul" <mogul@pa.dec.com>
Cc: <http-delta@pa.dec.com>
References: <200102140048.QAA29443@wera.pa.dec.com>
Subject: Re: Delta encoding in HTTP to Proposed Standard 
Date: Wed, 14 Feb 2001 09:54:26 +0200
Mime-Version: 1.0
Content-Type: text/plain;
	charset="windows-1255"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-Msmail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4133.2400
X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400

Jeff:

While we're on the subject of related works, Akamai, along with many other
companies, is working on a similar service called ICAP (www.i-cap.org).  In
short, ICAP is supposed to be a protocol in which special servers can modify
HTTP requests, response headers and payloads.  Now, the projects DO differ
as delta is about the content server sending "updated" versions of the
payload, while ICAP is more targeted at changing or adding to the payload
(much as server-based preprocessors like SSI and PHP).  I still think,
however, that it's worth taking a closer look at.  They have an
Internet-Draft, although to the best of my knowledge, they have not yet
submitted it.  A copy of the current draft is available at
http://www.i-cap.org/icap/media/draft-opes-icap-00.txt

  Issac

Internet is a wonderful mechanism for making a fool of
yourself in front of a very large audience.
  --Anonymous

Moving the mouse won't get you into trouble...  Clicking it might.
  --Anonymous

PGP Key 0xE0FA561B - Fingerprint:
7E18 C018 D623 A57B 7F37 D902 8C84 7675 E0FA 561B


From mogul@pa.dec.com  Wed Feb 14 11:54:27 2001
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA17141; Wed, 14 Feb 2001 11:54:27 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA07584; Wed, 14 Feb 2001 11:54:26 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA17409; Wed, 14 Feb 2001 11:54:25 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200102141954.LAA17409@wera.pa.dec.com>
To: "Issac Goldstand" <neoi@writeme.com>
Cc: <http-delta@pa.dec.com>
Subject: Re: Delta encoding in HTTP to Proposed Standard 
In-Reply-To: Your message of "Wed, 14 Feb 2001 09:54:26 +0200."
             <001c01c0965b$5e520dc0$d705a193@jct.ac.il> 
Date: Wed, 14 Feb 2001 11:54:25 -0800
X-Mts: smtp

With all due respect to all of the other related work
out there, this document is not a "survey of things related
in some vague way to delta encoding".  The section on
related research and proposals is meant to be a background,
not the bibliography from a doctoral dissertation :-).

I do think it makes sense to mention rsync, since we have
made some brief stabs at trying to fit rsync (or something
with a similar function) into the instance-manipulation
layer.

But going beyond that seems like a slippery slope, especially
given the number of research and commercial projects out
there that are "related".  For a list that someone
else put together, see
	http://webreference.com/internet/software/servers/http/deltaencoding/
and this doesn't include some other things that I know about.

-Jeff

From lmm@acm.org  Wed Feb 14 20:31:07 2001
Return-Path: <lmm@acm.org>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id UAA26149; Wed, 14 Feb 2001 20:31:07 -0800 (PST)
Received: from zmamail02.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA31365; Wed, 14 Feb 2001 20:31:06 -0800
Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345)
	id 506255ED4; Wed, 14 Feb 2001 23:31:06 -0500 (EST)
Received: from smtp-relay-1.Adobe.COM (smtp-relay-1.adobe.com [192.150.11.1])
	by zmamail02.zma.compaq.com (Postfix) with ESMTP
	id ADC2A5C92; Wed, 14 Feb 2001 23:31:05 -0500 (EST)
Received: from inner-relay-1.Adobe.COM (inner-relay-1.corp.adobe.com [153.32.1.51])
	by smtp-relay-1.Adobe.COM (8.8.6) with ESMTP id UAA19977;
	Wed, 14 Feb 2001 20:34:45 -0800 (PST)
Received: from mailsj-v1.corp.adobe.com  by inner-relay-1.Adobe.COM (8.8.5) with ESMTP id UAA03559; Wed, 14 Feb 2001 20:30:12 -0800 (PST)
Received: from larrypad ([153.32.67.80]) by
          mailsj-v1.corp.adobe.com (Netscape Messaging Server 4.15) with
          SMTP id G8S77R00.JDL; Wed, 14 Feb 2001 20:31:03 -0800 
From: "Larry Masinter" <LMM@acm.org>
To: "Jeffrey Mogul" <mogul@pa.dec.com>
Cc: <iesg@ietf.org>, <http-delta@pa.dec.com>
Subject: RE: Last Call: Delta encoding in HTTP to Proposed Standard 
Date: Wed, 14 Feb 2001 20:30:47 -0800
Message-Id: <NDBBKEBDLFENBJCGFOIJEEMMEGAA.LMM@acm.org>
Mime-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-Msmail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
In-Reply-To: <200102140031.QAA28796@wera.pa.dec.com>
Importance: Normal
X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400

> I would accept all of these changes, except that in the first
> change Larry suggested, I am going to insist on a phrase such as
> 	based on a false analogy between MIME and HTTP.
> Or, if Larry would prefer,
> 	based on a naive analogy between MIME and HTTP.

It's hard for an analogy to be false ("the moon is like a piece
of green cheese"), and I don't think the analogy between MIME
and HTTP was particularly naive. 

I might suggest abandoning "analogy" altogether:

    based on the (somewhat problematic) relationship between
    MIME and HTTP.

The relationship between MIME and HTTP is problematic,
(IMHO) due as much to the narrowness of the MIME document's focus
on email as it is to the reuse of MIME constructs in HTTP.

But I think I've made my point, and I'll go along with whatever
wording the editor(s) choose.

Larry





From neale@lowendale.com.au  Fri Feb 16 06:08:05 2001
Return-Path: <neale@lowendale.com.au>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id GAA11227; Fri, 16 Feb 2001 06:08:05 -0800 (PST)
Received: from zmamail01.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA00482; Fri, 16 Feb 2001 06:07:54 -0800
Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345)
	id 8F1A8BC4D; Fri, 16 Feb 2001 09:07:53 -0500 (EST)
Received: from marina.lowendale.com.au (gw.lowendale.com.au [203.26.242.120])
	by zmamail01.zma.compaq.com (Postfix) with ESMTP
	id 73FFACB53; Fri, 16 Feb 2001 09:07:48 -0500 (EST)
Received: from localhost (neale@localhost)
	by marina.lowendale.com.au (8.9.3/8.9.3/Debian/GNU) with ESMTP id BAA06937;
	Sat, 17 Feb 2001 01:10:04 +1100
Date: Sat, 17 Feb 2001 01:10:01 +1100 (EST)
From: Neale Banks <neale@lowendale.com.au>
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: iesg@ietf.org, http-delta@pa.dec.com
Subject: Re: Delta encoding in HTTP to Proposed Standard 
In-Reply-To: <200102140048.QAA29443@wera.pa.dec.com>
Message-Id: <Pine.LNX.4.05.10102170007270.6823-100000@marina.lowendale.com.au>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Jeff,

On Tue, 13 Feb 2001, Jeffrey Mogul wrote:

[...]
> Andrew, if there is something else I should cite instead, please
> let me know ASAP.

I'll defer to Andrew's and Martin's comments regarding appropriate
references/citations.

[...]
> I too do not see anything in the rproxy-related pages that constitutes
> a "related proposal."  Therefore, it is hard to see how our failure
> to cite rproxy renders the HTTP Delta specification "critically
> incomplete."

My concern (which I apologise for not bringing up earlier) was limited to
my perception of a lack of completeness in referencing "related research".

I in no way meant to imply that I considered the HTTP Delta specification
itself to be incomplete.

Regards,
Neale.


From douglis@research.att.com  Wed Feb 28 07:28:51 2001
Return-Path: <douglis@research.att.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id HAA01155; Wed, 28 Feb 2001 07:28:51 -0800 (PST)
Received: from ztxmail02.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA18893; Wed, 28 Feb 2001 07:28:49 -0800
Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345)
	id 807AC1D73; Wed, 28 Feb 2001 09:28:49 -0600 (CST)
Received: from mail-green.research.att.com (H-135-207-30-103.research.att.com [135.207.30.103])
	by ztxmail02.ztx.compaq.com (Postfix) with ESMTP id EE63F1E2E
	for <http-delta@pa.dec.com>; Wed, 28 Feb 2001 09:28:48 -0600 (CST)
Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26])
	by mail-green.research.att.com (Postfix) with ESMTP
	id 8BB731E010; Wed, 28 Feb 2001 10:28:48 -0500 (EST)
Received: from douglux.research.att.com (root@douglux.research.att.com [135.207.26.106])
	by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id KAA14970;
	Wed, 28 Feb 2001 10:28:45 -0500 (EST)
Received: from douglux.research.att.com (IDENT:douglis@localhost.localdomain [127.0.0.1])
	by douglux.research.att.com (8.9.3/8.9.3) with ESMTP id KAA29562;
	Wed, 28 Feb 2001 10:28:46 -0500
Message-Id: <200102281528.KAA29562@douglux.research.att.com>
X-Mailer: exmh version 2.1.1 10/15/1999
From: Fred Douglis <douglis@research.att.com>
To: "Clifford Heath" <cjh@osa.com.au>
Cc: http-delta@pa.dec.com, iesg@ietf.org
Subject: Re: delta-encoding in HTTP to proposed standard 
In-Reply-To: Your message of "12 Feb 2001 16:51:16 +1100."
             <20010212165116.1.11696.qmail@osa.com.au> 
X-Uri: http://www.research.att.com/~douglis/
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 28 Feb 2001 10:28:46 -0500
Sender: douglis@research.att.com

Clifford,

I am belatedly replying to your note to the http-delta list on the subject of
the applicability of a patent, which I mentioned earlier this month, to
draft-mogul-http-delta-07.txt.  I held off on replying while AT&T sorted out
the situation and made a formal pronouncement to the IESG on our licensing
stance.  I include it here since it didn't go to the deltas list initially.
And, I am copying the IESG because my explanation below applies to them as
well. (However, it is not a formal declaration for the IESG in the way the
forwarded message was.)


------- Forwarded Message

Date:    Fri, 23 Feb 2001 17:18:34 -0500
From:    tfrost@att.com
To:      iesg@ietf.org
cc:      douglis@att.com
Subject: Re: delta-encoding in HTTP to proposed standard

This declaration is being made pursuant to the provisions of IETF IPR
Policy, Sections 10.3.1 and 10.3.2.

This is to advise the IETF that AT&T believes it owns at least one patent
that may relate to Internet Draft document "draft-mogul-http-delta-07.txt",
including United States Patent No. 5,931,904.  To the extent that the
technology discussed in that Internet Draft becomes an IETF Standard and to
the extent claims of AT&T's patents are required to implement the IETF
Standard,  AT&T agrees that, upon written request, AT&T will offer, on a
nondiscriminatory basis, non-exclusive, royalty-free licenses under such
patent claims to implement that IETF Standard.  AT&T's willingness to grant
such licenses is conditioned upon the prospective licensee granting a
reciprocal license to AT&T under any patents that the prospective licensee
has to any technology required to implement that IETF Standard.  

Written requests for licenses may be sent to:

AT&T Intellectual Property Licensing
Room 2E37, Bldg. 104
180 Park Avenue
Florham Park, NJ 07932

------- End of Forwarded Message

The reason for the belated announcement of this intellectual property
claim was that we only recently realized that the patent mentioned
above was very broad and could encompass the proposed standard.  Having
realized this, we strove to make this information public prior to
the IESG last call deadline, and then make a public pronouncement of
our licensing stance as quickly as we could.  We hope that the grant
of a royalty-free license to those following the standard will put to
rest a question of either the timeliness of the announcement or the
specific impact of this intellectual property on the proposed
standard.

Finally, I want to again express my regrets to the IETF and IESG, the
other co-authors of the delta-encoding specification, and anyone who
may already be building systems based on the proposed standard, for
the tardiness of the announcement and the concern the original
announcement may have caused before the royalty-free terms were
publicly announced.

-- 
Fred Douglis <http://www.research.att.com/~douglis/>
-----------------------------------------------------------------
PGP Fingerprint: 83 B9 D6 7E 7F 78 8E BB  16 95 DE 69 1A 52 BC 82



From avh@marimba.com  Mon Mar  5 13:27:07 2001
Return-Path: <avh@marimba.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id NAA13410; Mon, 5 Mar 2001 13:27:07 -0800 (PST)
Received: from zmamail02.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA00722; Mon, 5 Mar 2001 13:27:06 -0800
Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345)
	id 3951E3600; Mon,  5 Mar 2001 16:27:06 -0500 (EST)
Received: from cobra.marimba.com (acheron.marimba.com [207.126.123.64])
	by zmamail02.zma.compaq.com (Postfix) with ESMTP id 304674C4F
	for <http-delta@pa.dec.com>; Mon,  5 Mar 2001 16:27:05 -0500 (EST)
Received: by cobra.marimba.com with Internet Mail Service (5.5.2653.19)
	id <13CRFXRZ>; Mon, 5 Mar 2001 13:27:04 -0800
Message-Id: <02414951E0406C47BF01477DDF8D443E197A37@cobra.marimba.com>
From: Arthur van Hoff <avh@marimba.com>
To: "'http-delta@pa.dec.com'" <http-delta@pa.dec.com>
Subject: FW: DRP and IETF WEBI Working Group
Date: Mon, 5 Mar 2001 13:25:32 -0800 
Mime-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
	charset="iso-8859-1"


Hi,

For those who are interested in distribution protocols, here is
an interesting working group that is doing work related to caching,
Delta-encoding, and DRP.

Have fun,

        Arthur van Hoff

--

From: Mark Nottingham [mnot@akamai.com]
Subject: DRP and IETF WEBI Working Group

Arthur,

Although it's been some time since the publication of the DRP Note, I
thought you might be interested in the work of the IETF WEBI (Web
Intermediaries) Working Group.

One of our work items is to define a "Resource Update Protocol" which
is similar in many ways to DRP. Although the primary interest is
currently in allowing invalidations to be sent to caches, there are
other potential uses, which may be underrepresented in the work at
this stage.

If you (or anyone you know of either at Marimba or who worked on DRP)
are interested, we'd very much welcome input into the work.
Currently, we're gathering requirements, a first draft of which can
be found at:
  http://www.ietf.org/internet-drafts/draft-ietf-webi-rup-reqs-00.txt

If you have additional (or contrary) requirements based on your
experience with DRP, we'd love to have them. 

Our charter is at:
  http://www.ietf.org/html.charters/webi-charter.html

We'll be discussing the requirements during our meeting in the
Minneapolis IETF. After the requirements are finalized, we'll be
soliciting proposals to compare against the requirements; you might
want to consider putting DRP into the ring. 

Cheers,

-- 
Mark Nottingham, Research Scientist
Akamai Technologies (San Mateo, CA USA)

From mogul  Thu Oct 11 16:30:26 2001
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA29237; Thu, 11 Oct 2001 16:30:26 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200110112330.QAA29237@wera.pa.dec.com>
To: http-delta
Subject: FWD: Protocol Action: Delta encoding in HTTP to Proposed Standard
Date: Thu, 11 Oct 2001 16:30:26 -0700
X-Mts: smtp

On October 3 of last year, I asked the IESG to approve the "Delta
encoding in HTTP" specification as a "Proposed Standard".

On October 11, they approved it.

Unfortunately, it took them until October 11 of *this year* - the whole
process took more than a year.  I can't take responsibility for all of
the delay, although in hindsight I should have been a lot more
aggressive about bugging other people to get their jobs done.  (I do
have at least 124 archived email messages, mostly of the form "why is
this taking so long and do you want me to do anything more?" or "why
haven't you answered my email for the past two months?")

We finally broke the log-jam by removing the requirement that an
implementation of the Delta Encoding specification SHOULD support, as a
default, the "vcdiff" format.  I decided that I had to do this because
the vcdiff specification seemed not to be making progress, and because
the Delta spec made this SHOULD-level reference to the vcdiff spec, the
IESG refused to act on the Delta spec until vcdiff was ready.

Our hope is that we can get vcdiff back on track very soon, and so by
the time that the Delta spec is ready to go to Draft Standard status,
vcdiff will also be ready for that.  Then we should be able to restore
the SHOULD-level reference that was deleted.

The other changes that have been made since last October are basically
cosmetic and/or procedural.

The Delta specification is now in the RFC Editor's queue.  I hope this
part won't take too long, although there are standards-track documents
that have been in this queue since February 2001.

The next IETF stage is a "Draft Standard".  From RFC2026:

   A specification from which at least two independent and interoperable
   implementations from different code bases have been developed, and
   for which sufficient successful operational experience has been
   obtained, may be elevated to the "Draft Standard" level.  For the
   purposes of this section, "interoperable" means to be functionally
   equivalent or interchangeable components of the system or process in
   which they are used.  If patented or otherwise controlled technology
   is required for implementation, the separate implementations must
   also have resulted from separate exercise of the licensing process.
   Elevation to Draft Standard is a major advance in status, indicating
   a strong belief that the specification is mature and will be useful.

So the next step for our group is to find people to do two independent
implementations of (at least) the major features of the Delta spec.
Or, if any of you have already done implementations that match
our current design, please let me know; we will need to document
that they interoperate.

Thanks for your patience.
-Jeff

------- Forwarded Message

Date:    Thu, 11 Oct 2001 17:36:12 -0400
From:    The IESG <iesg-secretary@ietf.org>
To:      IETF-Announce: ;
cc:      RFC Editor <rfc-editor@isi.edu>, IANA <iana@iana.org>,
	 Internet Architecture Board <iab@isi.edu>
Subject: Protocol Action: Delta encoding in HTTP to Proposed Standard

The IESG has approved the Internet-Draft 'Delta encoding in HTTP'
<draft-mogul-http-delta-10.txt> as a Proposed Standard.  This has been
reviewed in the IETF but is not the product of an IETF Working Group.
The IESG contact persons are Patrik Faltstrom and Ned Freed.

Technical Summary
 
The document specifies a way for an HTTP server and client to
negotiate sending only the changes to a requested
instance of a resource over an HTTP connection. This is
especially interesting when a cache already has a version
of the instance, and finds that the instance has changed.

Research has shown that changes to only small parts of
instances of resources are frequent, so the ability to send
only the changes would speed up the transactions.


Working Group Summary

The document is an individual submission to the IETF, but
the specification has been discussed on the mailing list
<http-delta@pa.dec.com> during the development of the
document.

It is noted that AT&T has filed an IPR note about this
document. See http://www.ietf.org/ietf/IPR/AT&T-MOGUL-HTTP-DELTA.
The IPR note was filed before the Last Call ended, and no
concerns were raised by the community regarding this
IPR notice.


Protocol Quality

The protocol was reviewed for the IESG by Patrik Faltstrom.


IANA Considerations:

Section 10.2 specifies the creation of a new registry
for instance-manipulation values.


RFC-Editor note:

Please delete section 13 at the time of publication.

Author has asked for review of References Section (15)
at time of publication. Suggestion is that this be handled
via 48 hour notice.

------- End of Forwarded Message


From jmacd@helen.cs.berkeley.edu  Fri Oct 12 01:33:12 2001
Return-Path: <jmacd@helen.cs.berkeley.edu>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id BAA10910; Fri, 12 Oct 2001 01:33:12 -0700 (PDT)
Received: from mailrelay01.cac.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA01240; Fri, 12 Oct 2001 01:33:08 -0700
Received: by mailrelay01.cac.cpqcorp.net (Postfix, from userid 12345)
	id 8282DB26; Fri, 12 Oct 2001 01:33:08 -0700 (PDT)
Received: from ztxmail02.ztx.compaq.com (ztxmail02.nz-cce.cpqcorp.net [161.114.8.206])
	by mailrelay01.cac.cpqcorp.net (Postfix) with ESMTP
	id 60E01BAC; Fri, 12 Oct 2001 01:33:08 -0700 (PDT)
Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345)
	id DB66B4359; Fri, 12 Oct 2001 03:33:07 -0500 (CDT)
Received: from helen.CS.Berkeley.EDU (helen.CS.Berkeley.EDU [128.32.131.251])
	by ztxmail02.ztx.compaq.com (Postfix) with ESMTP
	id 674A941A8; Fri, 12 Oct 2001 03:33:07 -0500 (CDT)
Received: (from jmacd@localhost)
	by helen.CS.Berkeley.EDU (8.9.1a/8.9.1) id BAA03930;
	Fri, 12 Oct 2001 01:33:06 -0700 (PDT)
Date: Fri, 12 Oct 2001 01:33:06 -0700
From: Josh MacDonald <jmacd@CS.Berkeley.EDU>
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-delta@pa.dec.com, mihut@cs.berkeley.edu
Subject: Re: FWD: Protocol Action: Delta encoding in HTTP to Proposed Standard
Message-Id: <20011012013306.A3901@helen.CS.Berkeley.EDU>
References: <200110112330.QAA29237@wera.pa.dec.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200110112330.QAA29237@wera.pa.dec.com>; from mogul@pa.dec.com on Thu, Oct 11, 2001 at 04:30:26PM -0700

Quoting Jeffrey Mogul (mogul@pa.dec.com):
> 
> So the next step for our group is to find people to do two independent
> implementations of (at least) the major features of the Delta spec.
> Or, if any of you have already done implementations that match
> our current design, please let me know; we will need to document
> that they interoperate.

Our Xdelta/Xproxy prototype is a fairly close match to the current
design, and it works.  Mihut can provide more specific comments on 
the state of our protocol as compared to the draft.  Our main issue 
has been with vcdiff.  I have personally read the vcdiff draft several 
times and still I am not comfortable with the level of complexity.  

The current Xdelta encoding is quite simple to explain, although it 
is not well suited as a standard either--the encoder/decoder are 
automatically generated code right now.  Delta encoding is still a 
sore point for us, and I would like to replace the existing code.
Have you considered the old W3C/Marimba GDIFF encoding?  At least
it is easy to implement.

Do we know of any independent vcdiff implementations?

I think if the vcdiff proposal is to succeed it needs another author
who has been successful to go through the draft and really improve
the description.  Currently I find it difficult to make sense of.

For anyone interested in Xdelta/Xproxy:

	http://prdownloads.sourceforge.net/xdelta/xdelta-2.0-beta9.tar.gz

-josh

From mihut@eecs.berkeley.edu  Fri Oct 12 11:14:47 2001
Return-Path: <mihut@eecs.berkeley.edu>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA31691; Fri, 12 Oct 2001 11:14:47 -0700 (PDT)
Received: from mailrelay01.cac.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA11712; Fri, 12 Oct 2001 11:14:46 -0700
Received: by mailrelay01.cac.cpqcorp.net (Postfix, from userid 12345)
	id D81C2987; Fri, 12 Oct 2001 11:14:46 -0700 (PDT)
Received: from ztxmail02.ztx.compaq.com (ztxmail02.nz-cce.cpqcorp.net [161.114.8.206])
	by mailrelay01.cac.cpqcorp.net (Postfix) with ESMTP id 1EE85AF9
	for <http-delta@pa.dec.com>; Fri, 12 Oct 2001 11:14:46 -0700 (PDT)
Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345)
	id 8200E42C7; Fri, 12 Oct 2001 13:14:45 -0500 (CDT)
Received: from relay.EECS.Berkeley.EDU (relay.EECS.Berkeley.EDU [169.229.34.228])
	by ztxmail02.ztx.compaq.com (Postfix) with ESMTP id 3A3D54115
	for <http-delta@pa.dec.com>; Fri, 12 Oct 2001 13:14:45 -0500 (CDT)
Received: from EECS.Berkeley.EDU (mihut@argus.EECS.Berkeley.EDU [169.229.60.79])
	by relay.EECS.Berkeley.EDU (8.9.3/8.9.3) with ESMTP id LAA24437
	for <http-delta@pa.dec.com>; Fri, 12 Oct 2001 11:14:44 -0700 (PDT)
Received: from localhost (mihut@localhost)
	by EECS.Berkeley.EDU (8.9.3/8.9.3) with ESMTP id LAA09206
	for <http-delta@pa.dec.com>; Fri, 12 Oct 2001 11:14:40 -0700 (PDT)
X-Authentication-Warning: argus.EECS.Berkeley.EDU: mihut owned process doing -bs
Date: Fri, 12 Oct 2001 11:14:40 -0700 (PDT)
From: Mihut Ionescu <mihut@EECS.Berkeley.EDU>
To: <http-delta@pa.dec.com>
Subject: xProxy, an implementation of delta encoding in HTTP
In-Reply-To: <F66u3C4G2aTMxSpB28s000030df@hotmail.com>
Message-Id: <Pine.SOL.4.30.0110121048580.8977-100000@argus.EECS.Berkeley.EDU>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

xProxy is an HTTP dual-proxy system which implements delta encoding and
compression to increase the performance of web traffic.  A paper
describing the architecture, implementation and performance evaluation of
xProxy can be found at:

http://www.cs.berkeley.edu/~mihut/xproxy-ms.pdf

xProxy has been deployed at UC Berkeley.  The paper evaluates the
total bandwidth savings for all web resource types realized by using
xProxy in "real life", as well as reduction in modem retrieval times for
HTML pages.  Moreover, the paper provides detailed (comparative)
information on the benefits of compression and delta encoding, and gives
insight into how the size of deltas changes over time.  The analysis is
done in the context of both static and dynamic web content, and gives
insight into the set of features that should be supported by a
delta/compression enabled HTTP system that is to provide the maximum
bandwidth savings with the least amount of computational overhead.

xProxy is compatible with the protocol proposed by IETF, with the
exception that it identifies versions based on their MD5's (using the
"Delta-Base" header to indicate the client proxy version).  However, this
can be (easily) changed so that versions are identified based on entity
tags.  The current implementation supports the major features of the delta
encoding specification, although others can be added if needed.
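
As a rough illustration of the difference between the two ways of
identifying a version, here is a minimal sketch.  It is not xProxy code;
the function names are hypothetical, and it assumes only that an MD5
identifier is computed from the body while an entity tag is an opaque
string supplied by the origin server.

```python
import hashlib


def md5_version_id(body: bytes) -> str:
    """Identify a version by the hex MD5 digest of its body.

    Any party holding the bytes can compute this without server help.
    """
    return hashlib.md5(body).hexdigest()


def same_version_by_md5(body_a: bytes, body_b: bytes) -> bool:
    """Two copies are the same version if their digests match."""
    return md5_version_id(body_a) == md5_version_id(body_b)


def same_version_by_etag(etag_a: str, etag_b: str) -> bool:
    """Entity tags are opaque, server-assigned strings.

    They can only be compared for equality; the client cannot derive
    one from the body.
    """
    return etag_a == etag_b


if __name__ == "__main__":
    cached = b"hello"
    print(md5_version_id(cached))
    print(same_version_by_md5(cached, b"hello"))
    print(same_version_by_etag('"v1"', '"v2"'))
```

The practical difference is who assigns the identifier: a digest can be
computed by either proxy from the bytes it holds, whereas an entity tag
must be remembered from the response that delivered the instance.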

I will later provide a detailed list of the protocol features supported by
xProxy.  Let me know if you have any suggestions.

Mihut





From bala@research.att.com  Fri Oct 12 11:28:15 2001
Return-Path: <bala@research.att.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA11204; Fri, 12 Oct 2001 11:28:15 -0700 (PDT)
Received: from taynzmail03.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA06024; Fri, 12 Oct 2001 11:28:14 -0700
Received: by taynzmail03.nz-tay.cpqcorp.net (Postfix, from userid 12345)
	id 346C2523; Fri, 12 Oct 2001 14:28:14 -0400 (EDT)
Received: from zmamail01.zma.compaq.com (zmamail01.nz-tay.cpqcorp.net [161.114.72.101])
	by taynzmail03.nz-tay.cpqcorp.net (Postfix) with ESMTP id 2EEEE57F
	for <http-delta@pa.dec.com>; Fri, 12 Oct 2001 14:28:14 -0400 (EDT)
Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345)
	id F1C981578; Fri, 12 Oct 2001 14:28:13 -0400 (EDT)
Received: from mail-green.research.att.com (H-135-207-30-103.research.att.com [135.207.30.103])
	by zmamail01.zma.compaq.com (Postfix) with ESMTP id D5AA31531
	for <http-delta@pa.dec.com>; Fri, 12 Oct 2001 14:28:13 -0400 (EDT)
Received: from raptor.research.att.com (raptor.research.att.com [135.207.23.32])
	by mail-green.research.att.com (Postfix) with ESMTP id 847251E0A7
	for <http-delta@pa.dec.com>; Fri, 12 Oct 2001 14:28:10 -0400 (EDT)
Received: from localhost (bala@localhost)
	by raptor.research.att.com (SGI-8.9.3/8.8.7) with SMTP id OAA70183
	for <http-delta@pa.dec.com>; Fri, 12 Oct 2001 14:28:10 -0400 (EDT)
Message-Id: <200110121828.OAA70183@raptor.research.att.com>
X-Authentication-Warning: raptor.research.att.com: bala@localhost didn't use HELO protocol
To: http-delta@pa.dec.com
Subject: xproxy
Date: Fri, 12 Oct 2001 14:28:10 -0400
From: Balachander Krishnamurthy <bala@research.att.com>

the url should be http://www.cs.berkeley.edu/~mihut/xproxy/xproxy-ms.pdf 

and not http://www.cs.berkeley.edu/~mihut/xproxy-ms.pdf as stated in the mail

From mogul@pa.dec.com  Fri Oct 12 16:38:53 2001
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA06127; Fri, 12 Oct 2001 16:38:53 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA12538; Fri, 12 Oct 2001 16:38:53 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA29028; Fri, 12 Oct 2001 16:38:52 -0700 (PDT)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200110122338.QAA29028@wera.pa.dec.com>
To: Josh MacDonald <jmacd@CS.Berkeley.EDU>
Cc: http-delta@pa.dec.com
Subject: Re: FWD: Protocol Action: Delta encoding in HTTP to Proposed Standard 
In-Reply-To: Your message of "Fri, 12 Oct 2001 01:33:06 PDT."
             <20011012013306.A3901@helen.CS.Berkeley.EDU> 
Date: Fri, 12 Oct 2001 16:38:52 -0700
X-Mts: smtp

Josh MacDonald wrote:

    Our Xdelta/Xproxy prototype is a fairly close match to the current
    design, and it works.  Mihut can provide more specific comments on 
    the state of our protocol as compared to the draft.  Our main issue 
    has been with vcdiff.  I have personally read the vcdiff draft several 
    times and still I am not comfortable with the level of complexity.  
    
Several comments:
(1) Vcdiff is no longer a SHOULD-level requirement of the Delta
spec, so that shouldn't currently be an issue.  (We hope to restore
this requirement later on, though).

(2) Can you distinguish your Xdelta/Xproxy implementation
from the protocol that it implements?  The IETF will require us
to show implementations of the actual Delta protocol (as specified
in the soon-to-appear RFC), not something "fairly close".  I would
hope that your code could be modified fairly easily, though.

    The current Xdelta encoding is quite simple to explain, although it 
    is not well suited as a standard either--the encoder/decoder are 
    automatically generated code right now.  Delta encoding is still a 
    sore point for us, and I would like to replace the existing code.
    Have you considered the old W3C/Marimba GDIFF encoding?  At least
    it is easy to implement.
    
"gdiff" is indeed already included in the soon-to-be-created
IANA registry.  In the current version of the Delta spec, it has
equal status with vcdiff.  So it might be a better format for
initial implementations.  (Not necessarily "better" in the
sense of efficiency, just in the sense of making it easier to
start hacking on the Delta protocol itself!)

    I think if the vcdiff proposal is to succeed it needs another author
    who has been successful to go through the draft and really improve
    the description.  Currently I find it difficult to make sense of.

Phong Vo just submitted a revised version,

	http://www.ietf.org/internet-drafts/draft-korn-vcdiff-05.txt

which was announced today on the IETF-Announce list.  This draft
specifies exactly the same format as previous drafts, but we have
edited it a lot, both in the hope of making it clearer, and also
to remove the need to understand the C code.  There is still some
C code in the draft, but it is only there for clarification purposes.

Phong and I have already started working on a few minor clarifications
for an -06 version of this, but nothing major.

    Do we know of any independent vcdiff implementations?
    
This is something that I am about to start working on.  Since I
helped Phong edit the latest draft, neither one of us is a good
candidate for doing an "independent" implementation.  If anyone
on this list would like to try to do an implementation, that
would be wonderful, otherwise I will try to con one of my
colleagues into it.

Note that at this point, we basically need to find someone to
write a vcdiff *decoder*, which should be fairly simple.  And it
can be as dumb as possible; there is no need at this point to do
something very fast.  This would be sufficient (I hope) to
demonstrate that the spec is written clearly enough.

Later on, we should also find someone to write an independent
implementation of an encoder.  However, this is a trickier
problem, because the encoder has a lot more freedom of action
than the decoder.  One could write an "encoder" that was very
simple, but it might not generate a particularly compact
encoding.
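
To make the asymmetry concrete, here is a minimal sketch using a toy
COPY/ADD instruction list.  This is an invented format for illustration
only, not the vcdiff wire format, and all names in it are hypothetical.
The decoder has exactly one correct behavior, while two different
encoders both produce valid deltas of very different sizes.

```python
from typing import List, Tuple

# Toy instructions (an assumption of this sketch, not vcdiff):
#   ("COPY", offset, length)  copy bytes from the base
#   ("ADD", data)             append literal bytes
Instruction = Tuple


def decode(base: bytes, delta: List[Instruction]) -> bytes:
    """Apply a delta to a base; the result is fully determined."""
    out = bytearray()
    for ins in delta:
        if ins[0] == "COPY":
            _, offset, length = ins
            if offset < 0 or length < 0 or offset + length > len(base):
                raise ValueError("COPY outside of base")
            out += base[offset:offset + length]
        elif ins[0] == "ADD":
            out += ins[1]
        else:
            raise ValueError("unknown instruction")
    return bytes(out)


def encode_trivial(base: bytes, target: bytes) -> List[Instruction]:
    """A very simple 'encoder': valid, but never smaller than target."""
    return [("ADD", target)]


def encode_prefix_suffix(base: bytes, target: bytes) -> List[Instruction]:
    """Slightly smarter: reuse the common prefix and suffix of base."""
    limit = min(len(base), len(target))
    p = 0
    while p < limit and base[p] == target[p]:
        p += 1
    s = 0
    while s < limit - p and base[len(base) - 1 - s] == target[len(target) - 1 - s]:
        s += 1
    delta: List[Instruction] = []
    if p:
        delta.append(("COPY", 0, p))
    middle = target[p:len(target) - s]
    if middle:
        delta.append(("ADD", middle))
    if s:
        delta.append(("COPY", len(base) - s, s))
    return delta


def literal_bytes(delta: List[Instruction]) -> int:
    """Count literal bytes carried in the delta (a crude size measure)."""
    return sum(len(ins[1]) for ins in delta if ins[0] == "ADD")


if __name__ == "__main__":
    base = b"The quick brown fox"
    target = b"The quick red fox"
    for enc in (encode_trivial, encode_prefix_suffix):
        d = enc(base, target)
        assert decode(base, d) == target
        print(enc.__name__, literal_bytes(d))
```

Both encoders interoperate with the same decoder, which is why a decoder
alone is enough to test whether a format description is clear, while
encoder quality is a separate question.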

AT&T has agreed to make the existing encoder source code
available "for anyone to use to transmit data via HTTP/1.1
Delta Encoding," so as a practical matter, a high-quality
encoder is already available.  But an independent implementation
is going to be required before we can get Draft Standard status.

-Jeff

From avh@marimba.com  Fri Oct 12 16:42:55 2001
Return-Path: <avh@marimba.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA13758; Fri, 12 Oct 2001 16:42:55 -0700 (PDT)
Received: from mailrelay01.cac.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA18833; Fri, 12 Oct 2001 16:42:55 -0700
Received: by mailrelay01.cac.cpqcorp.net (Postfix, from userid 12345)
	id 62EE1BAD; Fri, 12 Oct 2001 16:42:55 -0700 (PDT)
Received: from zmamail02.zma.compaq.com (zmamail02.nz-tay.cpqcorp.net [161.114.72.102])
	by mailrelay01.cac.cpqcorp.net (Postfix) with ESMTP
	id 2856C83A; Fri, 12 Oct 2001 16:42:55 -0700 (PDT)
Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345)
	id B16284E42; Fri, 12 Oct 2001 19:42:54 -0400 (EDT)
Received: from cobra.marimba.com (unknown [207.126.123.66])
	by zmamail02.zma.compaq.com (Postfix) with ESMTP
	id 194EE4DE7; Fri, 12 Oct 2001 19:42:54 -0400 (EDT)
Received: by cobra.marimba.com with Internet Mail Service (5.5.2653.19)
	id <SG3MV5GS>; Fri, 12 Oct 2001 16:42:53 -0700
Message-Id: <02414951E0406C47BF01477DDF8D443E19800F@cobra.marimba.com>
From: Arthur van Hoff <avh@marimba.com>
To: "'Jeffrey Mogul'" <mogul@pa.dec.com>,
        Josh MacDonald <jmacd@CS.Berkeley.EDU>
Cc: http-delta@pa.dec.com, Jonathan Payne <jpayne@marimba.com>
Subject: RE: FWD: Protocol Action: Delta encoding in HTTP to Proposed Stan
	dard 
Date: Fri, 12 Oct 2001 16:42:03 -0700
Mime-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
	charset="iso-8859-1"


Hi Jeff,

We have a vcdiff prototype implementation, but we don't use it in any of
our products. The format is easy enough to implement but constructing
a good enough algorithm that works for various file sizes has proven
much harder. We still use compressed gdiff in our products.

Have fun,

    Arthur van Hoff
  

> -----Original Message-----
> From: Jeffrey Mogul [mailto:mogul@pa.dec.com]
> Sent: Friday, October 12, 2001 4:39 PM
> To: Josh MacDonald
> Cc: http-delta@pa.dec.com
> Subject: Re: FWD: Protocol Action: Delta encoding in HTTP to Proposed
> Standard 
> 
> 
> Josh MacDonald wrote:
> 
>     Our Xdelta/Xproxy prototype is a fairly close match to the current
>     design, and it works.  Mihut can provide more specific comments on
>     the state of our protocol as compared to the draft.  Our main issue
>     has been with vcdiff.  I have personally read the vcdiff draft several
>     times and still I am not comfortable with the level of complexity.
>     
> Several comments:
> (1) Vcdiff is no longer a SHOULD-level requirement of the Delta
> spec, so that shouldn't currently be an issue.  (We hope to restore
> this requirement later on, though).
> 
> (2) Can you distinguish your Xdelta/Xproxy implementation
> from the protocol that it implements?  The IETF will require us
> to show implementations of the actual Delta protocol (as specified
> in the soon-to-appear RFC), not something "fairly close".  I would
> hope that your code could be modified fairly easily, though.
> 
>     The current Xdelta encoding is quite simple to explain, although it
>     is not well suited as a standard either--the encoder/decoder are
>     automatically generated code right now.  Delta encoding is still a
>     sore point for us, and I would like to replace the existing code.
>     Have you considered the old W3C/Marimba GDIFF encoding?  At least
>     it is easy to implement.
>     
> "gdiff" is indeed already included in the soon-to-be-created
> IANA registry.  In the current version of the Delta spec, it has
> equal status with vcdiff.  So it might be a better format for
> initial implementations.  (Not necessarily "better" in the
> sense of efficiency, just in the sense of making it easier to
> start hacking on the Delta protocol itself!)
> 
>     I think if the vcdiff proposal is to succeed it needs another author
>     who has been successful to go through the draft and really improve
>     the description.  Currently I find it difficult to make sense of.
> 
> Phong Vo just submitted a revised version,
> 
> 	http://www.ietf.org/internet-drafts/draft-korn-vcdiff-05.txt
> 
> which was announced today on the IETF-Announce list.  This draft
> specifies exactly the same format as previous drafts, but we have
> edited it a lot, both in the hope of making it clearer, and also
> to remove the need to understand the C code.  There is still some
> C code in the draft, but it is only there for clarification purposes.
> 
> Phong and I have already started working on a few minor clarifications
> for an -06 version of this, but nothing major.
> 
>     Do we know of any independent vcdiff implementations?
>     
> This is something that I am about to start working on.  Since I
> helped Phong edit the latest draft, neither one of us is a good
> candidate for doing an "independent" implementation.  If anyone
> on this list would like to try to do an implementation, that
> would be wonderful, otherwise I will try to con one of my
> colleagues into it.
> 
> Note that at this point, we basically need to find someone to
> write a vcdiff *decoder*, which should be fairly simple.  And it
> can be as dumb as possible, there is no need at this point to do
> something very fast.  This would be sufficient (I hope) to
> demonstrate that the spec is written clearly enough.
> 
> Later on, we should also find someone to write an independent
> implementation of an encoder.  However, this is a trickier
> problem, because the encoder has a lot more freedom of action
> than the decoder.  One could write an "encoder" that was very
> simple, but it might not generate a particularly compact
> encoding.
> 
> AT&T has agreed to make the existing encoder source code
> available "for anyone to use to transmit data via HTTP/1.1
> Delta Encoding," so as a practical matter, a high-quality
> encoder is already available.  But an independent implementation
> is going to be required before we can get Draft Standard status.
> 
> -Jeff
> 

From mogul  Fri Oct 12 16:51:05 2001
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA00382; Fri, 12 Oct 2001 16:51:04 -0700 (PDT)
From: Jeffrey Mogul <mogul>
Message-Id: <200110122351.QAA00382@wera.pa.dec.com>
To: <danielh@crosslink.net>
cc: http-delta
Subject: Re: FWD: Protocol Action: Delta encoding in HTTP to Proposed Standard 
In-reply-to: Your message of "Fri, 12 Oct 2001 15:42:23 EDT."
             <20011012194438.7B575C8B5@ztxmail01.ztx.compaq.com> 
Date: Fri, 12 Oct 2001 16:51:04 -0700
X-Mts: smtp

danielh@crosslink.net writes:

    One favor -- could you post the url to the latest version of the delta
    encoding document.
    
Sure (for some odd reason, the IETF announcements of "Protocol Actions"
don't include URLs!):

	http://www.ietf.org/internet-drafts/draft-mogul-http-delta-10.txt

-Jeff

	

From mihut@eecs.berkeley.edu  Mon Oct 15 18:53:58 2001
Return-Path: <mihut@eecs.berkeley.edu>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA13274; Mon, 15 Oct 2001 18:53:58 -0700 (PDT)
Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA01155; Mon, 15 Oct 2001 18:53:58 -0700
Received: by mailrelay01.cce.cpqcorp.net (Postfix, from userid 12345)
	id 1786C58A; Mon, 15 Oct 2001 20:53:58 -0500 (CDT)
Received: from ztxmail01.ztx.compaq.com (ztxmail01.nz-cce.cpqcorp.net [161.114.8.205])
	by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id 069DB41E
	for <http-delta@pa.dec.com>; Mon, 15 Oct 2001 20:53:58 -0500 (CDT)
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345)
	id D6170CB5A; Mon, 15 Oct 2001 20:53:55 -0500 (CDT)
Received: from relay.EECS.Berkeley.EDU (relay.EECS.Berkeley.EDU [169.229.34.228])
	by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 768FDC8B4
	for <http-delta@pa.dec.com>; Mon, 15 Oct 2001 20:53:55 -0500 (CDT)
Received: from EECS.Berkeley.EDU (mihut@argus.EECS.Berkeley.EDU [169.229.60.79])
	by relay.EECS.Berkeley.EDU (8.9.3/8.9.3) with ESMTP id SAA13293
	for <http-delta@pa.dec.com>; Mon, 15 Oct 2001 18:53:54 -0700 (PDT)
Received: from localhost (mihut@localhost)
	by EECS.Berkeley.EDU (8.9.3/8.9.3) with ESMTP id SAA01845
	for <http-delta@pa.dec.com>; Mon, 15 Oct 2001 18:53:52 -0700 (PDT)
X-Authentication-Warning: argus.EECS.Berkeley.EDU: mihut owned process doing -bs
Date: Mon, 15 Oct 2001 18:53:52 -0700 (PDT)
From: Mihut Ionescu <mihut@EECS.Berkeley.EDU>
To: <http-delta@pa.dec.com>
Subject: xProxy features
Message-Id: <Pine.SOL.4.30.0110151838010.1493-100000@argus.EECS.Berkeley.EDU>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

xProxy supports the GET and POST methods, most of the relevant HTTP/1.1
headers and cache directives, as well as entity tags and cookies.  The
current implementation "clusters" URLs that execute the same CGI program
but have a different CGI query string (versions are indexed by the prefix
string up to the '?' character).
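
As an illustration of the clustering rule described above, here is a
minimal Python sketch.  It is not xProxy code: the function names are
hypothetical, and it assumes the cluster key is everything before the
first '?' (the message does not say whether the '?' itself is part of
the key).

```python
def cluster_key(url: str) -> str:
    """Return the part of `url` before the first '?' (assumed cluster key)."""
    prefix, _sep, _query = url.partition("?")
    return prefix


def group_by_cluster(urls):
    """Group URLs that share a cluster key, preserving input order."""
    clusters = {}
    for url in urls:
        clusters.setdefault(cluster_key(url), []).append(url)
    return clusters
```

Under this assumption, /foo?p=1 and /foo?r=1 on the same host land in
one cluster, so a cached version of either could serve as the base for
a delta against the other.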

xProxy supports (only) xdelta for delta encoding and gzip for compression.
It is compatible with the IETF proposed protocol, with the exception that
it identifies versions based on their MD5's (using the "Delta-Base"
header to indicate the client proxy version).  This can be (easily)
changed to identify versions based on entity tags.  xProxy supports the
major features of the IETF specification, with the following exceptions:

* No support for (deltas on) byte ranges.
* No support for the cache directive "retain" ...  XDFS (xDelta File
System), the versioned cache used, did not support deletions when xProxy
was implemented.
* Client proxy specifies in the request only the most recent version it
holds.  It should be fairly easy to modify the client proxy to
indicate multiple versions in the request and add the necessary logic in
the server proxy.

Therefore, version identification based on etag and indication of multiple
versions in a request should complete the xProxy implementation.  Let me
know if you have any questions or suggestions.

Mihut



From mogul  Mon Nov 19 18:13:45 2001
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id SAA01819; Mon, 19 Nov 2001 18:13:45 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200111200213.SAA01819@wera.pa.dec.com>
To: http-delta
Subject: FWD: Protocol Action: Instance Digests in HTTP to Proposed Standard 
Date: Mon, 19 Nov 2001 18:13:45 -0800
X-Mts: smtp

On October 3 of last year, I asked the IESG to approve the "Instance
Digests in HTTP" specification as a "Proposed Standard".

Today, they finally approved it.

This one should have been approved the same day as the HTTP
Delta Encoding proposal (October 11, 2001) but the Area Director
forgot to put it on the IESG agenda, so it languished for a while,
until I thought to ask why nothing had happened.

-Jeff

    From: iesg-secretary@ietf.org (The IESG)
    Subject: Protocol Action: Instance Digests in HTTP to Proposed Standard
    Date: Mon, 19 Nov 2001 23:23:35 +0000 (UTC)
    Message-ID: <200111192255.RAA01891@ietf.org>
    Cc: RFC Editor <rfc-editor@isi.edu>, IANA <iana@iana.org>,
	   Internet Architecture Board <iab@isi.edu>
    To: IETF-Announce
    
    The IESG has approved the Internet-Draft 'Instance Digests in HTTP'
    <draft-mogul-http-digest-05.txt> as a Proposed Standard.  This has been
    reviewed in the IETF but is not the product of an IETF Working Group.
    The IESG contact persons are Patrik Faltstrom and Ned Freed.
     
    Technical Summary
     
    HTTP/1.1 defines a Content-MD5 header that allows a server
    to include a digest of the response body.  However, this is
    specifically defined to cover the body of the actual
    message, not the contents of the full file (which might be
    quite different, if the response is a Content-Range, or
    uses a delta encoding).  Also, the Content-MD5 is limited
    to one specific digest algorithm; other algorithms, such as
    SHA-1, may be more appropriate in some circumstances.
    Finally, HTTP/1.1 provides no explicit mechanism by which a
    client may request a digest.  This document proposes HTTP
    extensions that solve these problems.
    
    Working Group Summary
    
    This is an individual submission to the IETF, but, the
    document has been discussed on various mailing lists
    which have to do with the HTTP protocol.
    
    Protocol Quality
    
    The spec was reviewed by Patrik Faltstrom.
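
The distinction drawn in the Technical Summary, between a digest of the
message body and a digest of the full instance, can be sketched in
Python as follows.  This is only an illustration using the standard
hashlib and base64 modules; the helper names are hypothetical, and no
header syntax from the specification is reproduced here.

```python
import base64
import hashlib


def digest_b64(data: bytes, algorithm: str = "md5") -> str:
    """Base64-encode the digest of `data` under the named algorithm."""
    raw = hashlib.new(algorithm, data).digest()
    return base64.b64encode(raw).decode("ascii")


def range_body(instance: bytes, first: int, last: int) -> bytes:
    """Bytes first..last (inclusive) of the instance, as in a range response."""
    return instance[first:last + 1]
```

For a 2000-byte instance and a response carrying bytes 100-1000,
digest_b64(range_body(instance, 100, 1000)) covers only the 901 bytes
actually sent, while digest_b64(instance) or digest_b64(instance, "sha1")
covers the whole instance, which is what a client needs in order to
check a reassembled or delta-reconstructed result.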

    

From mogul  Tue Jan  8 11:34:09 2002
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA21860; Tue, 8 Jan 2002 11:34:08 -0800 (PST)
Message-Id: <200201081934.LAA21860@wera.pa.dec.com>
From: <danielh@crosslink.net>
To: http-delta
X-Original-Date: Sat, 29 Dec 2001 23:12:18 -0500
Reply-To: danielh@crosslink.net
Subject: Dcluster comment
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25 
Date: Tue, 08 Jan 2002 11:34:08 -0800
Sender: mogul
X-Mts: smtp


Per usual, I've been sidetracked and have not yet started implementing
Delta (much less dcluster). But it is still near the front burner (part of
what I got sidetracked onto is writing a CRON-like task manager that I
will use in my eventual implementation of delta).


BTW: happy new year!

===============================================================
29 Dec 2001

Having (re)read the dcluster-00 draft, and having serendipitously reviewed
some comments I made on an earlier version of this draft (25 Sept 2000
expiration), there is one important issue that needs to be addressed.
Other points, including editorial comments, are partially related to
clarifying this issue.

The issue is whether an instance must be "explicitly" incorporated into
a Dcluster, or whether this can be "implicit".  That is, if a Dcluster is
not provided in a response, can the instance only be used as a delta-base
in future requests for the same request-uri?

BTW: It would be nice if there was a term that meant "all the request-uris
     that match a Dcluster" and "all the instances, both their contents and
     their Etags, that are members of a Dcluster".
     Uniqueness scope isn't quite it.

Consider the case where a first request generates a response
with a Dcluster instance header. For example:
      GET /foo?p=1 HTTP/1.1
      Host: bar.example.net
      A-IM: vcdiff
   yielding:
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 2001 08:49:37 GMT
      Etag: "abc"
      DCluster: "//bar.example.net/foo?"

Suppose a later request of:
      GET /foo?r=1 HTTP/1.1
      Host: bar.example.net
      If-None-Match: "abc"
      A-IM: vcdiff
yields:
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 2001 08:49:37 GMT
      Etag: "def"
      IM: vcdiff

Note that this response does NOT have a Dcluster instance header.

The question is what instances can be used as delta bases
in an even later request for "foo?s=1".

There are two possibilities:
 i) Explicit assignations only:
      GET /foo?s=1 HTTP/1.1
      Host: bar.example.net
      If-None-Match: "abc"
      A-IM: vcdiff

   Here, "abc" is used because the first response
   (which has an "abc" etag) explicitly contains a Dcluster whose
   value is an abbreviation for the request-uri (for /foo?s=1).

 ii) Implicit assignations also:
      GET /foo?s=1 HTTP/1.1
      Host: bar.example.net
      If-None-Match: "abc","def"
      A-IM: vcdiff
      
 Here, "def" is also used because:
    a) the "def" instance was received after receiving the
       "//bar.example.net/foo?"  Dcluster definition, 

    and

    b) "//bar.example.net/foo?" matches /foo?s=1.
 

The first interpretation is much more restrictive. It means that an
instance can only be used as a delta-base for future request-uris that:   
   i) are the same as the request-uri that generated the instance
  ii) match the Dcluster defined (in a Dcluster instance header)
      with this instance

The second interpretation implies both of the above. In addition,  any
future instance, that is returned as a response to a request-uri that
matches this DCluster, can be used as a delta-base to any other
request-uri that also matches this Dcluster (I abstract from some timing
conditions).

In a sense, the first interpretation is a one to many relationship --
one instance can be associated with many request-uris that match one (or
perhaps several) Dclusters. The fact that many instances may have been
returned with the same Dcluster definition(s), loosens  but does not
fundamentally change this one-to-many relationship.

The second interpretation defines a many-to-many relationship -- 
all instances whose request-uris match a given Dcluster can be used
as delta-bases for any other request-uri that matches this
given Dcluster.  
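
To make the two interpretations concrete, here is a rough Python
sketch.  Everything in it is an assumption made for illustration: the
CachedInstance record, the function names, and the idea that a Dcluster
value matches a request-uri by simple prefix comparison on
"//host/path?query".  It also ignores the timing conditions mentioned
above.

```python
from dataclasses import dataclass


@dataclass
class CachedInstance:
    uri: str               # e.g. "//bar.example.net/foo?p=1"
    etag: str
    dclusters: tuple = ()  # DCluster values carried by this response


def matches(dcluster: str, uri: str) -> bool:
    """Assumed matching rule: the Dcluster value is a prefix of the URI."""
    return uri.startswith(dcluster)


def explicit_bases(cache, target):
    """Interpretation (i): only instances that themselves carried a match."""
    return [e.etag for e in cache
            if e.uri == target or any(matches(d, target) for d in e.dclusters)]


def implicit_bases(cache, target):
    """Interpretation (ii): any instance whose URI falls in a known cluster."""
    known = {d for e in cache for d in e.dclusters if matches(d, target)}
    return [e.etag for e in cache
            if e.uri == target or any(matches(d, e.uri) for d in known)]
```

With the /foo?p=1 response (etag "abc", carrying the Dcluster header)
and the /foo?r=1 response (etag "def", no Dcluster header) in the
cache, a request for /foo?s=1 gets only "abc" from explicit_bases and
both "abc" and "def" from implicit_bases.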

These two interpretations have different strengths and weaknesses:

 1a) Explicit (one-to-many) advantages
   i) The server has fine grained control of what the client ought to
      consider using as delta-bases for future request-uris.  
  ii) Dclusters can be terminated, simply by expiring all instances
      that include this Dcluster (say, by using the Retain token
      of Cache-Control)

 1b) Implicit (many-to-many) advantages:

    i) The server can easily define broad Dclusters, with just one
       Dcluster header
   ii) With broad Dclusters, the client has great range of
       delta-bases to choose from.
      
 2a) Explicit (one-to-many) disadvantages

    i) By only allowing instances to be used in a Dcluster when
       explicitly declared, the opportunities for using a good
       match are diminished
   ii) There is a small cost of sending a Dcluster header whenever
       needed (as opposed to sending it just once).

 2b) Implicit (many-to-many) disadvantages:
    i) By allowing many instances to be used as delta-bases, the client
       may end up using a poor set.
   ii) Or, the client may send very large If-None-Match request headers
  iii) Once defined, there is no method of terminating a Dcluster.
       In particular, a Dcluster may persist long after the instance that
       originated it has expired.
        
Overall, I lean toward the first interpretation. It's not quite as powerful,
but I like the fine grain control it offers. I'm also uncomfortable
with the permanence of Dclusters (point 2.b.iii).  

In any case, whatever interpretation is adopted it needs to
be clearly described.


-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From danielh@crosslink.net  Wed Jan  9 15:20:55 2002
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA09121; Wed, 9 Jan 2002 15:20:55 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mailrelay01.cac.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA21352; Wed, 9 Jan 2002 15:20:54 -0800
Received: by mailrelay01.cac.cpqcorp.net (Postfix, from userid 12345)
	id 28E13147A; Wed,  9 Jan 2002 15:20:54 -0800 (PST)
Received: from ztxmail01.ztx.compaq.com (ztxmail01.nz-cce.cpqcorp.net [161.114.8.205])
	by mailrelay01.cac.cpqcorp.net (Postfix) with ESMTP id D2E081642
	for <http-delta@pa.dec.com>; Wed,  9 Jan 2002 15:20:53 -0800 (PST)
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345)
	id 578B484C8; Wed,  9 Jan 2002 17:20:53 -0600 (CST)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 130C38680
	for <http-delta@pa.dec.com>; Wed,  9 Jan 2002 17:20:53 -0600 (CST)
Received: from danielh (mail.dannyh.org [209.147.90.202]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id SAA29679 for <http-delta@pa.dec.com>; Wed, 9 Jan 2002 18:20:52 -0500
Message-Id: <200201092320.SAA29679@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Wed, 09 Jan 2002 18:20:16 -0500
To: http-delta@pa.dec.com
Subject: A note on delta and range
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25 

8 Jan 2001

A comment on delta and ranges (arising from my ongoing
programming of a delta-encoding module).

1) It would be useful to include a short note on how
the usual rule of client decoding, that one should do the last first,
doesn't quite apply in some cases, in particular
when a delta (say, DIFFE) follows a range.

Example:

 A request:
      GET /foo.html HTTP/1.1
      Host: bar.example.net
      If-None-Match: "abc"
      A-IM: range,diffe,gzip
      Range: bytes=100-1000
        
   yielding:
      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "def"
      Delta-Base: "abc"
      IM: range,diffe,gzip
      Content-Range: 100-1000/2000
      Content-Length: 901
      Content-Type: text/html

   When the client decodes this entity-body, it should:
        1) UNGZIP it
        2) extract bytes 100-1000 from the "abc" base-instance
        3) Use this UnGzipped entity body as a difference file,
           and apply it to bytes 100-1000 of "abc"
        4) This yields bytes 100-1000 of "def"

    Note that the client has to "look ahead", to the range token
    of IM, and the Content-Range header (so as to know that the delta
    is meant to be used against only a portion of "abc"). 
    This bends the rule of "apply encodings from last to first",
    hence warrants a warning note.
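
The four decoding steps above can be sketched in Python.  This is an
illustration only: instead of a real diffe, vcdiff, or gdiff decoder it
uses a made-up "toy delta" (a JSON list of copy/insert operations), and
all function names are hypothetical.  The point is the order of
operations, not the delta format.

```python
import gzip
import json


def apply_toy_delta(base: bytes, ops) -> bytes:
    """Apply a toy delta: ["copy", offset, length] or ["insert", text]."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1].encode("latin-1")
    return bytes(out)


def decode_range_delta(body: bytes, base_instance: bytes,
                       first: int, last: int) -> bytes:
    """Decode a body sent with 'IM: range,<delta>,gzip'."""
    ops = json.loads(gzip.decompress(body))      # 1) un-gzip the body
    base_range = base_instance[first:last + 1]   # 2) same range of the base
    return apply_toy_delta(base_range, ops)      # 3)+4) range of new instance
```

Note that first and last have to come from the Content-Range header
before the delta can be applied, which is the "look ahead" described
above.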

2) In multi-part responses with delta-encoding, it's left up in 
   the air what (if any) "part headers" should be used, especially
   the content-type. I assume that content-type "part headers"
   should be that of the current instance; regardless of where
   RANGE may appear in the A-IM header.



-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Wed Jan  9 16:53:25 2002
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA28785; Wed, 9 Jan 2002 16:53:25 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA23527; Wed, 9 Jan 2002 16:53:25 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA16244; Wed, 9 Jan 2002 16:53:24 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200201100053.QAA16244@wera.pa.dec.com>
To: danielh@crosslink.net
Cc: http-delta@pa.dec.com
Subject: Re: A note on delta and range 
In-Reply-To: Your message of "Wed, 09 Jan 2002 18:20:16 EST."
             <200201092320.SAA29679@lycanthrope.crosslink.net> 
Date: Wed, 09 Jan 2002 16:53:24 -0800
X-Mts: smtp

Thanks for your note.

    1) It would be useful to include a short note on how
    the usual rule of client decoding, that one should do the last first,
    don't quite apply in some cases. In particular,
    when a delta (say, DIFFE) follows a range.
    
    Example:
    
     A request:
	  GET /foo.html HTTP/1.1
	  Host: bar.example.net
	  If-None-Match: "abc"
	  A-IM: range,diffe,gzip
	  Range: bytes=100-1000
	    
       yielding:
	  HTTP/1.1 200 OK
	  Date: Sun, 06 Nov 1994 08:49:37 GMT
	  Etag: "def"
	  Delta-Base: "abc"
	  IM: range,diffe,gzip
	  Content-Range: 100-1000/2000
	  Content-Length: 901
	  Content-Type: text/html
    
       When the client decodes this entity-body, it should:
	    1) UNGZIP it
	    2) extract bytes 100-1000 from the "abc" base-instance
	    3) Use this UnGzipped entity body as a difference file,
	       and apply it to bytes 100-1000 of "abc"
	    4) This yields bytes 100-1000 of "def"
    
	Note that the client has to "look ahead", to the range token
	of IM, and the Content-Range header (so as to know that the delta
	is meant to be used against only a portion of "abc"). 
	This bends the rule of "apply encodings from last to first",
	hence warrants a warning note.
    
Are you sure about this?  I don't have time right now to do a
careful analysis of your example, but I believe we tried to cover
this quite carefully in section

    2) In multi-part responses with delta-encoding, it's left up in 
       the air what (if any) "part headers" should be used, especially
       the content-type. I assume that content-type "part headers"
       should be that of the current instance; regardless of where
       RANGE may appear in the A-IM header.

Doesn't Section 10.10 (Delta encoding and multipart/byteranges)
cover this?  Remember that (RFC 2616, section 3.7.2, Multipart
Types):

   In general, HTTP treats a multipart message-body no differently than
   any other media type: strictly as payload. The one exception is the
   "multipart/byteranges" type (appendix 19.2) when it appears in a 206
   (Partial Content) response, which will be interpreted by some HTTP
   caching mechanisms as described in sections 13.5.4 and 14.16. In all
   other cases, an HTTP user agent SHOULD follow the same or similar
   behavior as a MIME user agent would upon receipt of a multipart type.
   The MIME header fields within each body-part of a multipart message-
   body do not have any significance to HTTP beyond that defined by
   their MIME semantics.

so I think the *only* case of "multipart/*" that the Delta encoding
specification needs to cover is "multipart/byteranges".

-Jeff

From danielh@crosslink.net  Wed Jan  9 19:05:48 2002
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id TAA22231; Wed, 9 Jan 2002 19:05:48 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA22746; Wed, 9 Jan 2002 19:05:47 -0800
Received: by mailrelay01.cce.cpqcorp.net (Postfix, from userid 12345)
	id 48D511EB0; Wed,  9 Jan 2002 21:05:47 -0600 (CST)
Received: from zmamail01.zma.compaq.com (zmamail01.nz-tay.cpqcorp.net [161.114.72.101])
	by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id EAA261F48
	for <http-delta@pa.dec.com>; Wed,  9 Jan 2002 21:05:46 -0600 (CST)
Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345)
	id 6645D36B8; Wed,  9 Jan 2002 22:05:46 -0500 (EST)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by zmamail01.zma.compaq.com (Postfix) with ESMTP id 02F443625
	for <http-delta@pa.dec.com>; Wed,  9 Jan 2002 22:05:45 -0500 (EST)
Received: from danielh (mail.dannyh.org [209.147.90.202]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id WAA17001 for <http-delta@pa.dec.com>; Wed, 9 Jan 2002 22:05:45 -0500
Message-Id: <200201100305.WAA17001@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Wed, 09 Jan 2002 22:06:09 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200201100054.QAA32396@wera.pa.dec.com>
Subject: Last minute nits
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25 

>By the way, as you have seen today, the "Proposed Standard"
>RFC for Delta Encoding is about to come out.  We can't make any technical
>changes at this point.  (And PLEASE don't try to
>delay that RFC any longer!)  If you can convince me that you've found
>real bugs, those can be addressed before we go to
>Draft Standard.

What, you mean a two (or 2.3) year delay is a lot? You must not work for
the government :>

Here are my last points. The first one is longish and refers to my most
recent comment. The latter three are shorter.
Do what you want, none of them are important enough to stop the clock.

            -------------------

1) Regarding range,diffe (or range,vcdiff or range,gdiff):

>>    1) It would be useful to include a short note on how
>>    the usual rule of client decoding, that one should do the last first,
>>    don't quite apply in some cases. In particular,
>>    when a delta (say, DIFFE) follows a range.
>>    
>>    Example:
>>    
>>     A request:
>>	  GET /foo.html HTTP/1.1
>>	  Host: bar.example.net
>>	  If-None-Match: "abc"
>>	  A-IM: range,diffe,gzip
>>	  Range: bytes=100-1000
>>	    
>>       yielding:
>>	  HTTP/1.1 200 OK
>>	  Date: Sun, 06 Nov 1994 08:49:37 GMT
>>	  Etag: "def"
>>	  Delta-Base: "abc"
>>	  IM: range,diffe,gzip
>>	  Content-Range: 100-1000/2000
>>	  Content-Length: 901
>>	  Content-Type: text/html
>>    
>>       When the client decodes this entity-body, it should:
>>	    1) UNGZIP it
>>	    2) extract bytes 100-1000 from the "abc" base-instance
>>	    3) Use this UnGzipped entity body as a difference file,
>>	       and apply it to bytes 100-1000 of "abc"
>>	    4) This yields bytes 100-1000 of "def"
>>    
>>	Note that the client has to "look ahead", to the range token
>>	of IM, and the Content-Range header (so as to know that the delta
>>	is meant to be used against only a portion of "abc"). 
>>	This bends the rule of "apply encodings from last to first",
>> 	hence warrants a warning note.
    
>Are you sure about this?  I don't have time right now to do a
>careful analysis of your example, but I believe we tried to cover
>this quite carefully in section

I looked through rfc3229 and here's what I found that is relevant:

i) 
Section 5.7 (Examples of requests combining Range and delta encoding)
discusses range and delta, but does not contain any examples of a server
response that contains Range in the IM header.  


ii) 
Section 10.5.2 mentions Range and IM, as follows:

   As a special case, if the instance-manipulations include both range
   selection and at least one other non-identity instance-manipulation,
   the IM header field MUST be used to indicate the order in which all
   of these instance-manipulations, including range selection, were
   applied.  If the IM header lists the "range" instance-manipulation,
   the response MUST include either a Content-Range header or a
   multipart/byteranges Content-Type in which each part contains a
   Content-Range header.  (See section 10.10 for specific discussion of
   combining delta encoding and multipart/byteranges.)

   Responses that include an IM header MUST carry a response status code
   of 226 (IM Used), as specified in section 10.4.1.

   The server SHOULD omit the IM header if it would list only the
   "range" instance-manipulation.  Such responses would normally be sent
   with response status code 206 (Partial Content), as specified by
   HTTP/1.1 [10].

iii) also in 10.5.2

      IM: range, vcdiff

   This example indicates that one or more ranges of the instance have
   been selected, and the result has then been delta encoded against
   identical ranges of a previous base instance.

None of these address my point:
A client MUST check whether a differencing (say, a diffe) is to be done
against a range; this check consists of looking for a RANGE token
preceding one of the delta tokens (vcdiff, gdiff, and diffe).  If this is
the case, the client should first extract the range (as provided in a
Content-Range header) of the base-instance, and apply the difference to
this extracted range, thereby creating the range of the current instance.

I also note that a similar language problem occurs when taking a "range
of a difference".
When considering the range of a difference: the range is NOT of the
instance, it is of the entity body that will be used to create an
instance.

So... we can either punt, and hope that the client implementors are smart
enough to realize the "last to first" decoding rule should NOT be
slavishly followed (and that the "range as range of instance" rule does
not always hold), or add a note. Personally, I think a few
"implementation notes" would suffice; perhaps something like the
preceding two paragraphs added to 10.5.2.
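
The check described in point 1 above (does a range token come before or
after a delta token?) can be sketched as follows. This is only an
illustration, not text from the RFC: the function name and the three
result labels are made up here, and it assumes the IM value lists
instance-manipulations in the order the server applied them.

```python
# Delta tokens named in the message above.
DELTA_TOKENS = {"vcdiff", "gdiff", "diffe"}


def classify_im(im_header: str) -> str:
    """Classify how range selection and delta encoding interact in an IM value.

    Hypothetical helper: the labels returned are illustrative only.
    """
    # Keep only the token names; drop any ";param" suffix and whitespace.
    tokens = [t.split(";")[0].strip().lower() for t in im_header.split(",")]
    tokens = [t for t in tokens if t]

    delta_idx = next((i for i, t in enumerate(tokens) if t in DELTA_TOKENS), None)
    range_idx = tokens.index("range") if "range" in tokens else None

    if delta_idx is None or range_idx is None:
        return "no-interaction"
    if range_idx < delta_idx:
        # Range was selected first: the delta applies to the same byte
        # range of the base instance, so the client extracts that range
        # from its base instance before applying the delta.
        return "delta-of-range"
    # Delta was computed first: the range is a range of the delta's
    # entity body, not a range of the instance.
    return "range-of-delta"
```

For example, classify_im("range, diffe, gzip") gives "delta-of-range",
while classify_im("diffe, range") gives "range-of-delta".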

            -------------------

2)Section 10.7.3

The phrase:
       (i.e., without any content coding), after recovering that entity by
       applying the delta to it's previous cache entry.

Since the previous cache entry, "abc", has a GZIP content coding, this
should say:

      (i.e., without any content coding), after recovering that entity by
      applying the delta to an unGZIP'ed version of its previous cache
      entry.

Or, to be pedantic:

      (i.e., without any content coding), after recovering that entity by
      applying the delta to an unGZIP'ed version of the "abc" cache
      entry (which has a GZIP content-coding).

            -------------------

3) In 10.5.3

   The server's choice about whether to apply an instance-manipulation
   SHOULD be independent of its choice to apply any subsequent two-input
   instance-manipulations to the response.  (Two-input instance-
   manipulations include delta-codings, because they take two different
   values as input.  Compression and "range" instance-manipulations take
   only one input.  Other instance-manipulations may be defined in the
   future.)

      Note: the intent of this requirement is to prevent the server from
      generating a delta-encoded response that the client can only
      decode by first applying an instance-manipulation encoding to its
      cached base instance.  A server implementor might wish to consider
      what the client would logically have in its cache, when deciding
      which instance-manipulations to apply prior to a delta-coding.

A forward reference to 10.7.1 would help.

BTW: I still find this obtuse. For example, suppose the client says (and
yes, I realize this is a peculiar example)
  A-IM: gzip,gdiff
and the server acquiesces, returning
  IM: gzip,gdiff
Well... this requires that the client "applies an instance-manipulation
encoding to its cached base instance". In fact, that's what the client
explicitly requested!  So the word "prevent" is kind of contradictory.

                     -------------------

4) In 10.7.2

   When a client receives a delta response with one or more non-identity
   content codings:

This should be:

   When a client receives a delta response with one or more non-identity
   content codings, or the base-instance has one or more non-identity
   content codings:












-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From danielh@crosslink.net  Sat Jan 12 10:12:21 2002
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA01427; Sat, 12 Jan 2002 10:12:21 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mailrelay01.cac.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA25204; Sat, 12 Jan 2002 10:12:21 -0800
Received: by mailrelay01.cac.cpqcorp.net (Postfix, from userid 12345)
	id 28E1C15ED; Sat, 12 Jan 2002 10:12:21 -0800 (PST)
Received: from zmamail01.zma.compaq.com (zmamail01.nz-tay.cpqcorp.net [161.114.72.101])
	by mailrelay01.cac.cpqcorp.net (Postfix) with ESMTP id 018101723
	for <http-delta@pa.dec.com>; Sat, 12 Jan 2002 10:12:20 -0800 (PST)
Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345)
	id 70893355F; Sat, 12 Jan 2002 13:12:20 -0500 (EST)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by zmamail01.zma.compaq.com (Postfix) with ESMTP id 39C183625
	for <http-delta@pa.dec.com>; Sat, 12 Jan 2002 13:12:20 -0500 (EST)
Received: from danielh (mail.dannyh.org [209.147.90.202]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id NAA14739 for <http-delta@pa.dec.com>; Sat, 12 Jan 2002 13:12:19 -0500
Message-Id: <200201121812.NAA14739@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Sat, 12 Jan 2002 12:53:36 -0500
To: http-delta@pa.dec.com
Subject: On identical instances, different etags
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25 

12 Jan 2002
From: Daniel Hellerstein
Re: What to do with exact matches

Now that the 48 hr comment period on the draft standard has passed, this
comment relates to issues that came up while implementing rfc3229.

When there is an exact match between the current instance and a
base-instance, it's not always easy to produce an appropriate difference
file. In particular, if two identical binary instances are compared, and
DIFFE is used, then there is no obvious DIFFE difference file.
BTW:
  * for identical text instances, a simple
      0a
      .
    will usually work (though a terminating ^Z may be dropped)

  * I understand that diff should not be used against binary files,
    but what if that's all that is supported, and the instances
    match exactly? Having some means of telling the client to
    use his "exactly matching" base instance would be useful.

One solution is to ignore the problem -- it is likely to be rare.

Another is to add a new "instance-manipulation value": MATCH.
MATCH would mean "the base-instance (say, as specified in a Delta-Base
response header) exactly matches the current instance".

In this case, there would be no need to send a response body. 
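
A rough sketch of what the server side of this might look like. Note
that "match" is only the value proposed in this message; it is not
defined in rfc3229, and the function name and the dict layout of the
response are made up for illustration.

```python
from typing import Optional


def match_response(base_body: bytes, current_body: bytes,
                   base_etag: str) -> Optional[dict]:
    """Return a body-less 226 response if the two instances are identical.

    Returns None when the bodies differ, so the caller can fall back to
    a real delta (or to a plain 200 response).
    """
    if base_body != current_body:
        return None
    return {
        "status": 226,
        "headers": {
            "IM": "match",            # hypothetical token, per the proposal above
            "Delta-Base": base_etag,  # tells the client which base instance to reuse
        },
        "body": b"",                  # nothing to send: the client already has it
    }
```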



----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org


From danielh@crosslink.net  Sat Jan 12 11:30:38 2002
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id LAA19840; Sat, 12 Jan 2002 11:30:38 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA08351; Sat, 12 Jan 2002 11:30:37 -0800
Received: by mailrelay01.cce.cpqcorp.net (Postfix, from userid 12345)
	id E6A1A1D2B; Sat, 12 Jan 2002 13:30:36 -0600 (CST)
Received: from zmamail02.zma.compaq.com (zmamail02.nz-tay.cpqcorp.net [161.114.72.102])
	by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id AABA21F27
	for <http-delta@pa.dec.com>; Sat, 12 Jan 2002 13:30:36 -0600 (CST)
Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345)
	id 35F8D3B92; Sat, 12 Jan 2002 14:30:36 -0500 (EST)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36])
	by zmamail02.zma.compaq.com (Postfix) with ESMTP id 06D043A3C
	for <http-delta@pa.dec.com>; Sat, 12 Jan 2002 14:30:36 -0500 (EST)
Received: from danielh (mail.dannyh.org [209.147.90.202]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id OAA15366 for <http-delta@pa.dec.com>; Sat, 12 Jan 2002 14:30:35 -0500
Message-Id: <200201121930.OAA15366@lycanthrope.crosslink.net>
X-Really-To: <http-delta@pa.dec.com>
Reply-To: danielh@crosslink.net
Date: Sat, 12 Jan 2002 14:30:18 -0500
To: http-delta@pa.dec.com
Subject: Entity-headers, instance-headers, and differences
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25 

12 Jan 2002
From: Daniel Hellerstein
Re: Entity headers, instance headers, and what to difference

Now that the 48 hr comment period on the draft standard has passed, this
comment relates to issues that came up while implementing rfc3229.

It is not clear just how entity headers should be considered when
determining differences between two instances.

Let's start with the definition of "entity" from rfc2616 (section 7)

a) 7 Entity

   Request and Response messages MAY transfer an entity if not otherwise
   restricted by the request method or response status code. An entity
   consists of entity-header fields and an entity-body, although some
   responses will only include the entity-headers.


RFC3229 extends this by defining an instance (from section 3):

   instance        The entity that would be returned in a status-200
                   response to a GET request, at the current time, for
                   the selected variant of the specified resource, with
                   the application of zero or more content-codings, but
                   without the application of any instance manipulations
                   (see below) or transfer-codings.

   It is convenient to think of an entity tag, in HTTP/1.1, as being
   associated with an instance, rather than an entity.  That is, for a
   given resource, two different response messages might include the
   same entity tag, but two different instances of the resource should
   never be associated with the same (strong) entity tag.

The above implies that an instance consists of BOTH entity headers (and
NOT general and response headers), as well as the body.

However, from section 4 of RFC3229 (from the "sequence of
transformations")
      4. The result of the first three steps, at the time when the
         request is processed, is an instance.  The instance includes a
         body (possibly empty) and possibly some instance headers.  The
         entity tag, if any, is assigned at this point.  That is, an
         entity tag is associated with an instance, NOT an entity.

      5.   ...

      6. The result of the fifth step becomes the entity, consisting of
         entity headers and an entity body.

This introduces "instance headers" (the only mention of "instance headers"
in the document).

The question is what, if any, headers should be considered to be part of
an instance. There are several possibilities:

 1) an instance includes all entity headers. If so, 
    "instance headers" are just entity-headers, though perhaps
    they are specific to this instance.
 2) an instance only includes a special class of "instance headers";
    most entity headers are not considered to be "instance headers"
 3) an instance does not include any headers

This is more than a semantic concern; it affects just what should be
included when computing differences. Consider, for the preceding
possibilities:

 1) When computing a difference between two instances of a request-uri
    (say "abc" and "def"), then one MUST include both the body
    and the entity-headers.  This includes the headers (from RFC2616):
       entity-header  = Allow                    
                      | Content-Encoding         
                      | Content-Language         
                      | Content-Length           
                      | Content-Location         
                      | Content-MD5              
                      | Content-Range            
                      | Content-Type             
                      | Expires                  
                      | Last-Modified       
    Also, from RFC3229, Delta-Base and IM.

 2 and 3) If there are no "instance headers", 2 and 3 are identical.
    Otherwise, under 2 the instance headers would have to be included
    when computing a difference.

Note that under interpretation 1, the ordering of entity-headers can
affect the length of the delta -- "abc" and "def" may have the same set
of entity-headers, but if they appear in a different order the resulting
difference may be lengthy.
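
To illustrate the ordering effect, here is a small sketch using Python's
difflib as a stand-in for a line-oriented differ (it is not DIFFE or
GDIFF). The header values, the body, and the way headers and body are
joined into one list of lines are all assumptions for the example.

```python
import difflib


def as_lines(headers, body):
    """Serialize (name, value) header pairs plus a body into a list of lines."""
    return [f"{name}: {value}" for name, value in headers] + [""] + body.splitlines()


def changed_line_count(old, new):
    """Count the lines a line-based diff reports as removed or added."""
    return sum(1 for line in difflib.ndiff(old, new)
               if line.startswith(("- ", "+ ")))


body = "hello\nworld\n"

# "abc" and two candidate serializations of "def": same headers, same body.
inst_abc = as_lines([("Content-Type", "text/plain"),
                     ("Content-Language", "en")], body)
inst_def_same = as_lines([("Content-Type", "text/plain"),
                          ("Content-Language", "en")], body)
inst_def_swapped = as_lines([("Content-Language", "en"),
                             ("Content-Type", "text/plain")], body)

same_order_changes = changed_line_count(inst_abc, inst_def_same)
swapped_order_changes = changed_line_count(inst_abc, inst_def_swapped)
```

With the headers in the same order the diff is empty; with the two
headers swapped the diff is not, even though nothing of substance
changed.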

However, if entity-headers are contained in the difference, there is no
need to include entity headers in the actual response. This could result
in substantial savings (especially for small files where the
entity-headers may be a substantial portion of the response).

This interpretation would also require defining Delta-Base and IM as
response-headers. This is somewhat problematic given (from RFC2616,
section 6.2):

     Response-header field names can be extended reliably only in
     combination with a change in the protocol version. However, new or
     experimental header fields MAY be given the semantics of response-
     header fields if all parties in the communication recognize them to
     be response-header fields. Unrecognized header fields are treated as
     entity-header fields.

Considering these difficulties, and considering implementation hassles of
#1 and #2,  I advocate #3 -- that headers should not be 
included when computing differences between instances. 
Furthermore, the entity-headers included in a 226 response should be used,
 but not any of the entity-headers associated with the base instance.

  This does imply that for delta, two instances that differ in their
  entity-headers but not entity-body (or, more precisely, their 
  instance-body), will yield equivalent responses (that only
  differ in their headers).

 

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From danielh@crosslink.net  Wed Jan 16 09:52:19 2002
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id JAA05291; Wed, 16 Jan 2002 09:52:18 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA28258; Wed, 16 Jan 2002 09:52:18 -0800
Received: by mailrelay01.cce.cpqcorp.net (Postfix, from userid 12345)
	id 2E5C51C6B; Wed, 16 Jan 2002 11:52:14 -0600 (CST)
Received: from zmamail02.zma.compaq.com (zmamail02.nz-tay.cpqcorp.net [161.114.72.102])
	by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id B88901E8A
	for <http-delta@pa.dec.com>; Wed, 16 Jan 2002 11:52:13 -0600 (CST)
Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345)
	id 5092B398A; Wed, 16 Jan 2002 12:52:13 -0500 (EST)
Received: from relayout.ers.usda.gov (relayout.ers.usda.gov [151.121.68.20])
	by zmamail02.zma.compaq.com (Postfix) with ESMTP id 1814C3850
	for <http-delta@pa.dec.com>; Wed, 16 Jan 2002 12:52:13 -0500 (EST)
Received: from router-3.ers.usda.gov (ers-68-17.ers.usda.gov) by relayout.ers.usda.gov (LSMTP for Windows NT v1.1b) with SMTP id <0.000B018A@relayout.ers.usda.gov>; Wed, 16 Jan 2002 12:52:10 -0500
Received: from danielh (z_a082.ers.usda.gov) by email.ers.usda.gov (LSMTP for Windows NT v1.1b) with SMTP id <0.000619F0@email.ers.usda.gov>; Wed, 16 Jan 2002 12:52:09 -0500
Date: Wed, 16 Jan 2002 12:52:06 -0500
To: http-delta@pa.dec.com
Subject: Etag, entity headers, and delta
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25 
Message-Id: <20020116175213.5092B398A@zmamail02.zma.compaq.com>

16 Jan 2002
From: Daniel Hellerstein
Re: Etag, entity-headers, and delta

In an earlier message I raised a few questions about choosing an etag
value when entity-headers, but not the entity body, change.

Having perused RFC2616, shared thoughts with Koen Holtman, and gone
ahead with my implementation, my conclusion is that:

    RFC2616's treatment of the use of entity headers when choosing an etag 
    value is, at best, muddled. Hence, the proper approach to choosing an 
    etag should be based on the premise that where there is no clear 
    rule, the origin server should consider  what behavior is best for 
    its needs, with careful consideration of what downstream caches may do.  
    For delta-aware servers (and for other instance manipulations)
    this may imply a loosening of the rule (as stated in 13.3.3 of RFC2616)
    that "if an entity-header changes, so should the etag".

        For example, in order to reduce cache storage requirements, 
        a delta aware origin server may use the same etag for two 
        instances; even though an entity-header (say, the 
        last-modified, or expires entity-headers), but not the 
        instance body, has changed.

I note that a possible solution to the problem (of using the same etag
value even though entity-headers have changed) is to use a "weak" etag.
However, this is not a fully satisfactory solution, since RFC2616 
(again, 13.3.3) places strong restrictions on the use of weak validators 
in sub-range retrieval.
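
To make the trade-off in the conclusion above concrete, here is a sketch
of two ways an origin server might derive a (strong) etag. Neither
RFC2616 nor RFC3229 says how etags are computed; hashing with SHA-1, the
function names, and the header serialization are all assumptions made
for this example.

```python
import hashlib


def etag_from_body(body: bytes) -> str:
    """Etag that changes only when the instance body changes."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def etag_from_entity(body: bytes, headers: dict) -> str:
    """Etag that changes when the body OR any entity-header changes."""
    h = hashlib.sha1()
    # Sort by lower-cased name so header ordering does not affect the tag.
    for name in sorted(headers, key=str.lower):
        h.update(f"{name.lower()}: {headers[name]}\r\n".encode("latin-1"))
    h.update(b"\r\n")
    h.update(body)
    return '"' + h.hexdigest() + '"'
```

With the first function, two instances that differ only in (say)
Last-Modified share an etag, so a cache need hold only one base
instance; with the second, they get different etags, as the "if an
entity-header changes, so should the etag" reading would require.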

This raises a few questions regarding modifications to RFC3229, the most important
being #3:

  1) Should the above (or something like it) be noted? I advocate doing so,
     but can live with a tactful silence.
         
  2) The phrase "weak etag" appears nowhere in RFC3229. 
     Does that imply agreement with 13.3.3 (that weak etags should not be used
     in 226 instance manipulation responses, just as they should not be used
     in 206 range responses)?
         
  3) My interpretation is that the entity-headers from a client's
     cached base instance should NOT be used as the entity headers
     for a newly received, delta-encoded instance. That is, the entity
     headers should be only those contained in the current response.
    
    This does reduce delta efficiency, since it requires resending
    entity headers that have not changed. Perhaps the rule should be 
    "you can use cached entity headers if they are not overridden by
    an entity-header in the current response"

   One or the other must be stated somewhere. I need the guidance!
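
The two candidate rules in #3 can be sketched from the client's side as
below. The function names are made up, and headers are modelled as plain
dicts with case-insensitive names; RFC3229 does not say which rule (if
either) is intended.

```python
def strict_headers(cached: dict, response: dict) -> dict:
    """Rule A: use only the entity-headers carried in the current response.

    The cached base instance's headers are deliberately ignored.
    """
    return dict(response)


def merged_headers(cached: dict, response: dict) -> dict:
    """Rule B: reuse cached entity-headers unless the response overrides them.

    Header names are compared case-insensitively; the response's spelling
    and value win on a clash.
    """
    merged = {name.lower(): (name, value) for name, value in cached.items()}
    merged.update({name.lower(): (name, value) for name, value in response.items()})
    return dict(merged.values())
```

Under rule A the server has to resend every entity-header in each 226
response; under rule B it needs to send only those that changed.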


BTW: Here's the beginning part of RFC2616 13.3.3

         13.3.3 Weak and Strong Validators

       Since both origin servers and caches will compare two validators to
       decide if they represent the same or different entities, one normally
       would expect that if the entity (the entity-body or any entity-
       headers) changes in any way, then the associated validator would
       change as well. If this is true, then we call this validator a
       "strong validator."

       However, there might be cases when a server prefers to change the
       validator only on semantically significant changes, and not when
       insignificant aspects of the entity change. A validator that does not
       always change when the resource changes is a "weak validator."

       Entity tags are normally "strong validators," but the protocol
       provides a mechanism to tag an entity tag as "weak." One can think of
       a strong validator as one that changes whenever the bits of an entity
       changes, while a weak value changes whenever the meaning of an entity
       changes. Alternatively, one can think of a strong validator as part
       of an identifier for a specific entity, while a weak validator is
       part of an identifier for a set of semantically equivalent entities.




 


 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul  Wed Jan 16 16:38:49 2002
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA16130; Wed, 16 Jan 2002 16:38:49 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200201170038.QAA16130@wera.pa.dec.com>
To: http-delta
Subject: Success, at last: RFCs 3229 and 3230
Date: Wed, 16 Jan 2002 16:38:49 -0800
X-Mts: smtp

The "Proposed Standard" RFCs have (finally!) been released:

        RFC 3229

        Title:	    Delta encoding in HTTP
        Author(s):  J. Mogul, B. Krishnamurthy, F. Douglis,
                    A. Feldmann, Y. Goland, A. van Hoff,
                    D. Hellerstein 
        Status:     Standards Track
	Date:       January 2002
        Mailbox:    JeffMogul@acm.org, bala@research.att.com,
                    douglis@research.att.com, anja@cs.uni-sb.de,
                    yaron@goland.org, avh@marimba.com,
                    danielh@crosslink.net 
        Pages:      49
        Characters: 111953
        Updates/Obsoletes/SeeAlso:  None

        I-D Tag:    draft-mogul-http-delta-10.txt

        URL:        ftp://ftp.rfc-editor.org/in-notes/rfc3229.txt

and

        RFC 3230

        Title:	    Instance Digests in HTTP
        Author(s):  J. Mogul, A. Van Hoff
        Status:     Standards Track
	Date:       January 2002
        Mailbox:    JeffMogul@acm.org, avh@marimba.com
        Pages:      13
        Characters: 26846
        Updates/Obsoletes/SeeAlso:  None

        I-D Tag:    draft-mogul-http-digest-05.txt

        URL:        ftp://ftp.rfc-editor.org/in-notes/rfc3230.txt

Thanks to *all* of you who helped with this long process.

I know there are already suggestions for improvements.  However,
I have to spend all of my available time working on something
else for the next few weeks, so please excuse my apparent lack of 
interest in delta-related issues!  I'll respond when I can.

-Jeff

From danielh@crosslink.net  Tue Jan 22 15:27:00 2002
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id PAA05172; Tue, 22 Jan 2002 15:27:00 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA14110; Tue, 22 Jan 2002 15:26:59 -0800
Received: by mailrelay01.cce.cpqcorp.net (Postfix, from userid 12345)
	id F005912EE; Tue, 22 Jan 2002 17:26:58 -0600 (CST)
Received: from ztxmail01.ztx.compaq.com (ztxmail01.nz-cce.cpqcorp.net [161.114.8.205])
	by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id EAC5711FA
	for <http-delta@pa.dec.com>; Tue, 22 Jan 2002 17:26:58 -0600 (CST)
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345)
	id A7818264D; Tue, 22 Jan 2002 17:26:58 -0600 (CST)
Received: from relayout.ers.usda.gov (relayout.ers.usda.gov [151.121.68.20])
	by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 52B7424B8
	for <http-delta@pa.dec.com>; Tue, 22 Jan 2002 17:26:58 -0600 (CST)
Received: from router-3.ers.usda.gov (ers-68-17.ers.usda.gov) by relayout.ers.usda.gov (LSMTP for Windows NT v1.1b) with SMTP id <0.000B2CC9@relayout.ers.usda.gov>; Tue, 22 Jan 2002 18:26:57 -0500
Received: from danielh (z_a082.ers.usda.gov) by email.ers.usda.gov (LSMTP for Windows NT v1.1b) with SMTP id <0.000640D8@email.ers.usda.gov>; Tue, 22 Jan 2002 18:26:52 -0500
Date: Tue, 22 Jan 2002 18:14:27 -0500
To: http-delta@pa.dec.com
Subject: An implementation of rfc3229
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25 
Message-Id: <20020122232658.A7818264D@ztxmail01.ztx.compaq.com>

I've pretty much got IM and delta encoding, as spec'ed in rfc3229,
implemented under my server.  At the moment, I've only tested it
behind my firewall.  If there is any interest, I can open
up a port to a machine that is running this delta-aware
server.  Since this is a bit of a hassle, I won't bother
unless someone asks (when I get the rest of the pieces of this
server finished, in a month or two, my public server will be 
delta aware).

I also wrote a simple (command line) client that supports delta
encoding.  So I can use this to test someone else's delta
aware server.

Notes:

 * both client and server are written in REXX, and run under OS/2.
   They use a set of DLLs, otherwise it would be very easy to
   port the client to other platforms. If interest
   is expressed, I can try porting the client to win98.

 *  DIFF -e and GDIFF are supported (not VCDIFF)
 
 *  support for multiple-ranges is limited to "multiple ranges
    AFTER a delta". For now, it was just too much of a pain to have to
    deal with separate deltas for each of several separate ranges.

 *  Lacking a definitive answer, I assume that entity-headers
    are NOT included in the delta comparison. Which also means
    that all entity headers are sent to the client (even if they
    have not changed).  On small files (say, where just a date or
    textual hit counter changes), these entity headers are a significant fraction
    of the entire 226 response.


 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From kpv@research.att.com  Tue Jan 22 16:09:08 2002
Return-Path: <kpv@research.att.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id QAA09398; Tue, 22 Jan 2002 16:09:07 -0800 (PST)
Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA18548; Tue, 22 Jan 2002 16:09:07 -0800
Received: by mailrelay01.cce.cpqcorp.net (Postfix, from userid 12345)
	id 836B41807; Tue, 22 Jan 2002 18:09:01 -0600 (CST)
Received: from ztxmail01.ztx.compaq.com (ztxmail01.nz-cce.cpqcorp.net [161.114.8.205])
	by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id 4D8331850
	for <http-delta@pa.dec.com>; Tue, 22 Jan 2002 18:09:01 -0600 (CST)
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345)
	id 092CF24CD; Tue, 22 Jan 2002 18:09:01 -0600 (CST)
Received: from mail-green.research.att.com (H-135-207-30-103.research.att.com [135.207.30.103])
	by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id D4C5F24DF
	for <http-delta@pa.dec.com>; Tue, 22 Jan 2002 18:08:22 -0600 (CST)
Received: from raptor.research.att.com (raptor.research.att.com [135.207.23.32])
	by mail-green.research.att.com (Postfix) with ESMTP
	id D3EFB1E176; Tue, 22 Jan 2002 19:08:21 -0500 (EST)
Received: (from kpv@localhost)
	by raptor.research.att.com (SGI-8.9.3/8.8.7) id TAA57151;
	Tue, 22 Jan 2002 19:08:21 -0500 (EST)
Date: Tue, 22 Jan 2002 19:08:21 -0500 (EST)
From: Phong Vo <kpv@research.att.com>
Message-Id: <200201230008.TAA57151@raptor.research.att.com>
Organization: AT&T Research
X-Mailer: mailx (AT&T/BSD) 9.9 2002-01-16
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: danielh@crosslink.net, http-delta@pa.dec.com
Subject: Re: An implementation of rfc3229



Daniel,
The source code for Vcdiff is available at www.research.att.com/sw/tools
in case you'd like to add it.
Phong

> From danielh@crosslink.net Tue Jan 22 18:23 EST 2002
> To: http-delta@pa.dec.com
> Subject: An implementation of rfc3229

> I've pretty much got IM and delta encoding, as spec'ed in rfc3229,
> implemented under my server.  At the moment, I've only tested it
> behind my firewall.  If there is any interest, I can open
> up a port to a machine that is running this delta-aware
> server.  Since this is a bit of a hassle, I won't bother
> unless someone asks (when I get the rest of the pieces of this
> server finished, in a month or two, my public server will be 
> delta aware).

> I also wrote a simple (command line) client that supports delta
> encoding.  So I can use this to test someone else's delta
> aware server.

> Notes:

>  * both client and server are written in REXX, and run under OS/2.
>    They use a set of DLLs, otherwise it would be very easy to
>    port the client to other platforms. If interest
>    is expressed, I can try porting the client to win98.

>  *  DIFF -e and GDIFF are supported (not VCDIFF)
>  
>  *  support for multiple-ranges is limited to "multiple ranges
>     AFTER a delta". For now, it was just too much of a pain to have to
>     deal with separate deltas for each of several separate ranges.

>  *  Lacking a definitive answer, I assume that entity-headers
>     are NOT included in the delta comparison. Which also means
>     that all entity headers are sent to the client (even if they
>     have not changed).  On small files (say, where just a date or
>     textual hit counter changes), these entity headers are a significant fraction
>     of the entire 226 response.

>  
> -----------------------------------------------------------
> Daniel Hellerstein
> danielh@crosslink.net
> http://www.srehttp.org
> -----------------------------------------------------------


From mogul  Mon Feb 18 17:45:20 2002
Return-Path: <mogul>
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id RAA04669; Mon, 18 Feb 2002 17:45:20 -0800 (PST)
From: Jeffrey Mogul <mogul>
Message-Id: <200202190145.RAA04669@wera.pa.dec.com>
To: http-delta
Subject: FYI: IESG approves vcdiff spec. as Proposed Standard
Date: Mon, 18 Feb 2002 17:45:20 -0800
X-Mts: smtp

    From: iesg-secretary@ietf.org (The IESG)
    Subject: Protocol Action: The VCDIFF Generic Differencing and
    Date: Fri, 15 Feb 2002 23:00:02 +0000 (UTC)
    
    The IESG has approved the Internet-Draft 'The VCDIFF Generic
    Differencing and Compression Data Format' <draft-korn-vcdiff-06.txt> as
    a Proposed Standard.  This has been reviewed in the IETF but is not the
    product of an IETF Working Group.  The IESG contact persons are Patrik
    Faltstrom and Ned Freed.
    
    Technical Summary
     
    The memo describes a general and efficient data format suitable
    for encoding compressed and/or differencing data so that they can
    be easily transported among computers. It is used as one of the
    proposed format for transfer of differencing data over HTTP.
    
    Working Group Summary
    
    This is an individual submission to the IETF. The document was
    discussed on various mailing lists, including <http-delta@pa.dec.com>,
    about the HTTP protocol.
    
    Protocol Quality
    
    The protocol was reviewed for the IESG by Patrik Faltstrom.
    
Congratulations to David Korn and Phong Vo for their success on this!

What this means is that we now have a reasonably well-documented
and space-efficient encoding format for deltas.  It would be nice
to see multiple implementations of both the encoder and decoder,
since we will need that for "Draft Standard" status.  There is
some source code available from AT&T (see draft-korn-vcdiff-06.txt
for details).

If we want to make vcdiff the recommended format for delta
encoding, we will need vcdiff to be advanced to Draft Standard
status *before* we can advance the delta spec to Draft Standard
(by IETF rules).

-Jeff

From danielh@crosslink.net  Tue Feb 19 09:18:35 2002
Return-Path: <danielh@crosslink.net>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id JAA17358; Tue, 19 Feb 2002 09:18:35 -0800 (PST)
From: <danielh@crosslink.net>
Received: from mailrelay01.cac.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA30746; Tue, 19 Feb 2002 09:18:34 -0800
Received: from ztxmail02.ztx.compaq.com (ztxmail02.nz-cce.cpqcorp.net [161.114.8.206])
	by mailrelay01.cac.cpqcorp.net (Postfix) with ESMTP id 6D3B21BAB
	for <http-delta@pa.dec.com>; Tue, 19 Feb 2002 09:18:34 -0800 (PST)
Received: from relayout.ers.usda.gov (relayout.ers.usda.gov [151.121.68.20])
	by ztxmail02.ztx.compaq.com (Postfix) with ESMTP id 60BC23371
	for <http-delta@pa.dec.com>; Tue, 19 Feb 2002 11:18:33 -0600 (CST)
Received: from router-3.ers.usda.gov (ers-68-17.ers.usda.gov) by relayout.ers.usda.gov (LSMTP for Windows NT v1.1b) with SMTP id <0.000BFEB8@relayout.ers.usda.gov>; Tue, 19 Feb 2002 12:18:28 -0500
Received: from danielh (z_a082.ers.usda.gov) by email.ers.usda.gov (LSMTP for Windows NT v1.1b) with SMTP id <0.0007007C@email.ers.usda.gov>; Tue, 19 Feb 2002 12:18:26 -0500
Date: Tue, 19 Feb 2002 12:18:02 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200202190145.RAA04669@wera.pa.dec.com>
Subject: Re: FYI: IESG approves vcdiff spec. as Proposed Standard
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.30a/30 
Message-Id: <20020219171833.60BC23371@ztxmail02.ztx.compaq.com>


>If we want to make vcdiff the recommended format for delta
>encoding, we will need vcdiff to be advanced to Draft Standard status
>*before* we can advance the delta spec to Draft Standard (by IETF rules).

>From my very narrow point of view, having a broad (in terms of platforms)
availability of VCDIFF is crucial if we are to "make vcdiff the recommended
format".  I trust this will happen, but let's not put the cart before the horse.

(BTW: Phong... any progress on the os/2 version?)

 
-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------


From mogul@pa.dec.com  Tue Feb 19 10:20:07 2002
Return-Path: <mogul@pa.dec.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA21568; Tue, 19 Feb 2002 10:20:07 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA02180; Tue, 19 Feb 2002 10:20:07 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA21921; Tue, 19 Feb 2002 10:20:06 -0800 (PST)
From: Jeffrey Mogul <mogul@pa.dec.com>
Message-Id: <200202191820.KAA21921@wera.pa.dec.com>
To: <danielh@crosslink.net>
Cc: http-delta@pa.dec.com
Subject: Re: FYI: IESG approves vcdiff spec. as Proposed Standard 
In-Reply-To: Your message of "Tue, 19 Feb 2002 12:18:02 EST."
             <20020219171833.60BC23371@ztxmail02.ztx.compaq.com> 
Date: Tue, 19 Feb 2002 10:20:06 -0800
X-Mts: smtp

    >If we want to make vcdiff the recommended format for delta
    >encoding, we will need vcdiff to be advanced to Draft Standard status
    >*before* we can advance the delta spec to Draft Standard (by IETF rules).
    
    From my very narrow point of view, having a broad (in terms of
    platforms) availability of VCDIFF is crucial if we are to "make
    vcdiff the recommended format".  I trust this will happen, but
    let's not put the cart before the horse.
    
Of course we want a wide set of platforms covered, but the IETF
standards process lives and dies by the "rough consensus" model,
not the "no platform left behind" model.  I certainly don't want
to leave out popular but minority platforms such as OS/2 and
MacOS, but let's please not wait until we have vcdiff ported to
AmigaOS and TRS-80, OK?

And vcdiff would be RECOMMENDED, not MANDATORY, in any case.

-Jeff

From kpv@research.att.com  Tue Feb 19 10:52:43 2002
Return-Path: <kpv@research.att.com>
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
	id KAA25519; Tue, 19 Feb 2002 10:52:43 -0800 (PST)
Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
	id AA11686; Tue, 19 Feb 2002 10:52:42 -0800
Received: from zmamail02.zma.compaq.com (zmamail02.nz-tay.cpqcorp.net [161.114.72.102])
	by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id 56F401402
	for <http-delta@pa.dec.com>; Tue, 19 Feb 2002 12:52:42 -0600 (CST)
Received: from mail-green.research.att.com (H-135-207-30-103.research.att.com [135.207.30.103])
	by zmamail02.zma.compaq.com (Postfix) with ESMTP id 35F691EC1
	for <http-delta@pa.dec.com>; Tue, 19 Feb 2002 13:52:41 -0500 (EST)
Received: from raptor.research.att.com (raptor.research.att.com [135.207.23.32])
	by mail-green.research.att.com (Postfix) with ESMTP
	id CED9E1E0C7; Tue, 19 Feb 2002 13:52:39 -0500 (EST)
Received: (from kpv@localhost)
	by raptor.research.att.com (SGI-8.9.3/8.8.7) id NAA80024;
	Tue, 19 Feb 2002 13:52:39 -0500 (EST)
Date: Tue, 19 Feb 2002 13:52:39 -0500 (EST)
From: Phong Vo <kpv@research.att.com>
Message-Id: <200202191852.NAA80024@raptor.research.att.com>
Organization: AT&T Research
X-Mailer: mailx (AT&T/BSD) 9.9 2002-01-31
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: danielh@crosslink.net, http-delta@pa.dec.com
Subject: Re: FYI: IESG approves vcdiff spec. as Proposed Standard


> From danielh@crosslink.net Tue Feb 19 12:15 EST 2002
> To: http-delta@pa.dec.com
> Subject: Re: FYI: IESG approves vcdiff spec. as Proposed Standard

> >If we want to make vcdiff the recommended format for delta
> >encoding, we will need vcdiff to be advanced to Draft Standard status
> >*before* we can advance the delta spec to Draft Standard (by IETF rules).

> >From my very narrow point of view, having a broad (in terms of platforms)
> availability of VCDIFF is crucial if we are to "make vcdiff the recommended
> format".  I trust this will happen, but let's not put the cart before the horse.

My code is carefully crafted to be portable with respect to all standard flavors
of C and C++. So in this sense, I believe that the code is available on
most platforms that we care about.

On the other hand, portable C code is not the same as a buildable package.
The build procedure for the distributed package on the AT&T site
(www.research.att.com/sw/tools) requires the make & shell tools.
As far as I know, this can be built and run transparently on all flavors
of Unix including Linux, BSD, Irix, Solaris, etc.  On Windows varieties,
using something like Dave Korn's Uwin system as a base would work.

For people who like to read code, the core encoding algorithm is in the
file src/lib/vcodex/Vcdiff/vcddiff.c. The fast string matcher is in
the routine vcdfold() in the same file. The core decoding algorithm is in
the file src/lib/vcodex/Vcdiff/vcdundiff.c. By "core", I mean the code
that deals with each window of data as described in the Proposed Standard.
For the file-handling level, read src/lib/sfvcodex/sfwindow.c to see the
different strategies whereby windows are selected.  src/lib/sfvcodex/sfvcdiff.c
and src/lib/sfvcodex/sfvcundiff.c do file level encoding and decoding.

> (BTW: Phong... any progress on the os/2 version?)
Not yet. First we need to bring up an os/2 machine and that's taking time.

Phong

