From mogul Fri Aug 20 15:26:54 1999 Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA10061; Fri, 20 Aug 1999 15:26:54 -0700 Message-Id: <9908202226.AA10061@youra.pa.dec.com> To: http-delta Subject: Announcing/testing the http-delta mailing list Date: Fri, 20 Aug 99 15:26:54 -0700 From: mogul X-Mts: smtp As I wrote yesterday, I've created a new mailing list, http-delta@pa.dec.com for further discussion of delta encoding in HTTP. (My hope is that this list will have a very short active lifetime, but who knows?) Usual administrivia: DON'T SEND (UN)SUBSCRIPTION REQUESTS TO THE ENTIRE LIST! Send them to http-delta-request@pa.dec.com FYI, it's possible that (due to the highly distributed implementation of mailing lists here), you won't be able to post messages yourselves for several hours. Please don't all rush to send mail! My next step will be to forward several people's mail to the entire mailing list, so that we all see the messages and so that they end up in the archive. (Sorry, for the time being I don't have an easy way to mirror the archive onto a public Web site - you can ask me to email it to you.) -Jeff From mogul Fri Aug 20 15:30:05 1999 Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA10103; Fri, 20 Aug 1999 15:30:05 -0700 Message-Id: <9908202230.AA10103@youra.pa.dec.com> To: http-delta Subject: Delta encoding in HTTP From: Clifford Heath X-Original-Date: Wed, 28 Jul 1999 11:10:53 +1000 Date: Fri, 20 Aug 99 15:30:05 -0700 Sender: mogul X-Mts: smtp Folk, I am one of the designers of OSA's netDeploy product. OSA has experimented with various forms of delta encoding for several years now, including various forms of rsync-like protocols over HTTP. We wish to contribute some of this experience to strengthening your work towards an RFC for delta encodings in HTTP. 
Rsync in HTTP (standardised perhaps differently to how Andrew Tridgell recently suggested) offers the possibility of removing the need for the server to store either old versions of a resource, or (precomputed or cached) delta files for updating between specific versions. It does this however at the cost of requiring on-the-fly difference computation on the web server - this cost may be problematic in some situations. There is also an additional cost in increasing the size of the HTTP request. The response from an rsync-enabled web server could be encoded using a format similar to vcdiff, with only minor encoding changes.

However we have also invented and filed a patent application for a technology which has additional advantages over rsync in an HTTP context, and which we believe avoids conflict with Pyne's patent. We expect to be able to get our board of directors to approve disclosure of this invention and to allow its use within the terms required by the IETF for an RFC.

Specifically, the invention allows:
- distributed byte-level differencing (server only requires current instances, no precomputed deltas or previous instance versions, as for rsync),
- difference computation is performed by the client, relieving the server of this additional load,
- difference files are cachable by unmodified existing web caches.
- no modification is required to either existing web servers or to web caches, although there is some management advantage if changes can be made. A web cache can be independently fitted with enhanced differencing capability without needing servers to also be enhanced, and vice versa.

We also have a means whereby this differencing can be performed on a compressed file, generating compressed differences, again without requiring server or cache modifications including addition of transfer encodings.

The cost for these benefits is an additional HTTP request per transfer, meaning an extra network round-trip.
In some situations this additional cost is unacceptable (your draft rules out the use of additional requests), but in many situations it has no real impact.

I understand that there is no formal working group for your proposals. Please reply indicating your interest in discussing our work and the processes by which we can contribute to formulating an RFC that includes some of the advantages I have mentioned.

------------------------------------------------------------
Clifford Heath                    http://www.osa.com.au/~cjh
Open Software Associates Limited       mailto:cjh@osa.com.au
29 Ringwood Street / PO Box 4414       Phone  +613 9871 1694
Ringwood VIC 3134      AUSTRALIA       Fax    +613 9871 1711
------------------------------------------------------------
Proven Solution Deployment for the Global Enterprise

From mogul Fri Aug 20 15:31:07 1999
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA10055; Fri, 20 Aug 1999 15:31:07 -0700
Message-Id: <9908202231.AA10055@youra.pa.dec.com>
To: http-delta
Subject: Re: Delta encoding in HTTP
From: Andrew Tridgell
X-Original-Date: Wed, 28 Jul 1999 13:06:10 +1000
Date: Fri, 20 Aug 99 15:31:07 -0700
Sender: mogul
X-Mts: smtp

Clifford,

Thanks for raising this. For those of you who haven't seen the rproxy paper I gave at CALU, you can grab a copy at ftp://samba.org/pub/tridge/rproxy/ or look at a "very alpha" prototype implementation at ftp://samba.org/pub/unpacked/rproxy/

> It does this however at the cost of requiring on-the-fly difference
> computation on the web server - this cost may be problematic in some
> situations.

Luckily it turns out to be not all that expensive. A simple implementation easily saturates my 10MBit LAN at home on a low-end PC and I'm sure it could be made a lot faster. If you include compression then it gets a lot slower, but it is still much faster than is needed for most people's internet links.
It won't win a specweb benchmark but it doesn't aim to :)

> There is also an additional cost in increasing the size of the HTTP
> request.

I think that isn't too much of a problem. In the current rproxy it adds about 500 bytes to the request, which leaves the request a long way short of the common 1500 MTU, and thus within one IP segment. As long as the request stays as a single packet I don't think the overhead is excessive, especially when server-side signatures are used, as that ensures that signatures are only sent when both ends of the link understand the protocol extension.

Still, it would be interesting to explore ways of avoiding this overhead. Paul and I have come up with a couple of ways of doing this but they involve server-side storage (not much storage, but some) which we have been trying to avoid. Our basic rules have been that we want no server storage, no extra round-trips, and to use existing HTTP infrastructure whenever possible.

> However we have also invented and filed a patent application for a
> technology which has additional advantages over rsync in an HTTP context,
> and which we believe avoids conflict with Pyne's patent. We expect to be
> able to get our board of directors to approve disclosure of this invention
> and to allow its use within the terms required by the IETF for an RFC.

would that allow 3rd party implementation without permission from your company?

> Specifically, the invention allows:
> - distributed byte-level differencing (server only requires current
>   instances, no precomputed deltas or previous instance versions, as for
>   rsync),

ummm, rsync doesn't require precomputed deltas or previous instance versions. Maybe I didn't make that clear enough in the paper?

> - difference computation is performed by the client, relieving the server
>   of this additional load,

that's certainly interesting

> - difference files are cachable by unmodified existing web caches.
we have a way of doing that in rproxy (using a content-encoding trick) although I'm the first to admit it's a bit of a hack. I'll be interested to see what your solution is.

> - no modification is required to either existing web servers or to web
>   caches, although there is some management advantage if changes can
>   be made. A web cache can be independently fitted with enhanced
>   differencing capability without needing servers to also be enhanced,
>   and vice versa.

that is also the case with rproxy. I currently run:

    netscape -> rproxy -> modem link -> rproxy -> squid -> world

and I get the delta benefit over the modem link.

One interesting choice is minimal path versus maximal path delta encoding. In a minimal path system you do delta encoding between each nearest pair of delta-enabled elements in the path. This makes intermediate caching easier (less hackish) but means everyone pays the computational cost. In maximal path delta encoding the two furthest-apart elements in the chain do the encode/decode. Personally I prefer maximal path encoding (and that is what rproxy implements) but Peter Barker prefers minimal path encoding (Peter is the co-author of rproxy).

> We also have a means whereby this differencing can be performed on
> a compressed file, generating compressed differences, again without
> requiring server or cache modifications including addition of transfer
> encodings.

Do you mean using existing compressed files (e.g. gzip, zip, bzip2) or do you mean using a modified compression algorithm?

> The cost for these benefits is an additional HTTP request per transfer,
> meaning an extra network round-trip.

ugggh, that is a really big downside. It would break the normal flow of HTTP, which is a serious price to pay.

> I understand that there is no formal working group for your proposals.
> Please reply indicating your interest in discussing our work and the
> processes by which we can contribute to formulating an RFC that includes
> some of the advantages I have mentioned.

I'd certainly be interested in further discussions.

My plan at this stage is to play some more with the design of rproxy then to implement it as a patch to squid, apache and mozilla. I'll then get it deployed on some really large sites and see how it stands up to a real battering. I haven't yet started to look into a standardisation process because I wanted the protocol extensions to be well and truly tested before going down that path.

Cheers, Tridge

From mogul Fri Aug 20 15:33:15 1999
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA10154; Fri, 20 Aug 1999 15:33:15 -0700
Message-Id: <9908202233.AA10154@youra.pa.dec.com>
To: http-delta
Subject: Comments on Delta Encoding in HTTP
From: danielh@crosslink.net
X-Original-Date: Thu, 19 Aug 1999 00:52:03 -0400
Date: Fri, 20 Aug 99 15:33:15 -0700
Sender: mogul
X-Mts: smtp

18 Aug 1999
From: Daniel Hellerstein (danielh@crosslink.net)
To: Jeffrey Mogul
Re: Comments on "Delta Encoding in HTTP"

I've read (a few times) the "Delta Encoding in HTTP" Internet-Draft, and would like to make a few comments. Since I'm not sure what the appropriate forum for such comments may be, I figure it's reasonable to send them to you for a preliminary vetting.

Let me start by commending the quality of the writing; it's generally quite good. Aside from my major comment, most of my comments reflect problems I had comprehending the complete picture.

Lastly, I'm considering an experimental implementation of this proposal in my "SRE-http" web server. Do you anticipate any major changes to this proposal (abstracting from the major change I mention below!)?

Major Comment:

The proposed use of templates is problematic.
In particular, an additional GET is required for each DTemplate, with no guarantee that the results of this GET will ever be used. Even if a well-designed user-agent makes these requests in a way that does not affect the client's perceived response time, these extra requests will reduce available bandwidth for everyone else.

Was any thought given to a scheme where the template is returned first, after which the client requests a delta against this template? Alternatively, the template and the delta could be returned as a multi-part document. Perhaps this would use a new status code, say 228 Delta Template. Also, the client could signal its unwillingness to accept this "delta template response" via a new request header, or with a "no-template" Accept-encoding?

Minor comments:

1) Page 7, definition of instance.
It would be useful to further clarify the relationship between resources, instances, and entities. For example:
    "One can think of an instance as a snapshot in the life of a resource. Diagrammatically:
        a resource -- yields --> an instance,
        an instance -- yields --> an entity,
    where the entity incorporates the effects of content-encodings and range extractions."

2) Page 9, point 5.
The phrase "... and an appropriately encoded body" is a bit terse. Perhaps "... and the appropriate range(s) from the possibly encoded body."

3) Page 10, before section 5.
It might be useful to remind the reader that the client should decode using the reverse order of methods listed in the Content-encoding header. Thus, given a response header of
    Content-encoding: vcdiff,gzip
the client should first decompress, and then apply the reverse delta encoding.

4) Page 14, para 3.
The sentence
    (Presumably, the interrupted response used the same delta encoding, if any.)
seems too weak; it is hard to imagine any other circumstance for which a client would want a range of a delta content-encoding (since the entity body returned by the server is a piece of the delta).
Hence, instead of "Presumably", how about "It is expected that".

Basically, it took me more than one read to understand that a range of a delta content-encoding means something like "return a piece of the output of vcdiff", implying that the client-supplied byte range has little to do with the byte range of the newly created instance. Reiterating this may be overkill, but not letting people think otherwise is important.

5) 226 and 227 status codes
Why use two new status codes instead of one? That is, use 226 for both delta and range of delta, and add a new response header (or modify an existing response header) to indicate that "this is a range response". I would guess that using 227 provides a useful hint to clients (that they should look for a Content-Range header)?

6) Page 18, second paragraph
For clarity's sake, how about this change:
    Suppose that the server's current instance has entity tag "B", and that the server also has retained a copy of the instance with entity tag "A". Then, the server could compute the difference between "B" and "A" and respond with:

7) Top of page 19.
Further stressing the concept of range of delta content encoding, how about:
    selection, and returns a 227 (Range of Delta) response with an entity body containing bytes 900 to 999 of the vcdiff computed difference.

8) Page 22
LRU is never defined (least recently used?)

9) Page 24.
It might be useful to add this example:
Suppose:
a) the client requests /help/foo.bar, and the server responds with:
    HTTP/1.1 200 OK
    Etag: "abc"
b) the client then requests /help/fun.bar, and the server responds with:
    HTTP/1.1 200 OK
    Etag: "efg"
    DCluster: "//bar.example.net/help/"
c) Then, if the client re-requests /help/foo.bar, it should add a request header of:
    If-none-match: "abc","efg"

10) Page 25.
It might be useful to add (after the "It would not make sense" paragraph) a reminder that use of a broad uniqueness scope may also increase the work the server must do to ensure that no two URIs yield the same entity tag.

11) Page 28.
In the example containing
    DTemplate: "http://bar.example.net/foo.tplt"/etag="pqr"
what if an etag of "zyx" is returned by the server in response to a request for foo.tplt? Should the next request for foo.html use
    If-none-match: "pqr"
or
    If-none-match: "zyx"
I'd think the latter, but the example does not make that clear. Also, the following modification may be useful:
    This means that for any Request-URI matching the prefix specified in the DCluster header field, the URI specified in the DTemplate field is an appropriate template; and If-none-match should use "pqr" (assuming that "pqr" is the etag for foo.tplt).

12) Page 32, 12.3.1
The phrase
    "... or if the uniqueness scope for an entity tag of any instance of the requested resource has ever included another resource"
seems unnecessary. That is, if the client only included one etag in the If-none-match, and the server didn't make any uniqueness scope errors, why would there ever be any ambiguity (since the client has the entity associated with this single etag, as does the server)?

13) Page 32, 12.3.2
The description of DCluster never mentions its prime purpose -- to identify an "etag" to use for a set of Request-URIs. Instead, 12.3.2 is framed in terms of uniqueness scopes. While these are important constraints affecting DCluster, they are not the reason one would use DCluster.
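
Points 3 and 9 above can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions and is not text from the draft: the names CachedInstance, if_none_match, and decode_order are hypothetical, and the matching rule (send the entity tag held for the Request-URI itself, plus the tag of any cached instance whose DCluster prefix covers that Request-URI) is simply read off the example in point 9.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CachedInstance:
    """Hypothetical client-cache entry; field names are illustrative only."""
    uri: str                        # e.g. "//bar.example.net/help/foo.bar"
    etag: str                       # entity tag without the surrounding quotes
    dcluster: Optional[str] = None  # DCluster prefix from the response, if any


def if_none_match(cache: List[CachedInstance], request_uri: str) -> str:
    """Build an If-None-Match value as in point 9 (assumed matching rule)."""
    tags: List[str] = []
    for inst in cache:
        same_uri = inst.uri == request_uri
        in_cluster = inst.dcluster is not None and request_uri.startswith(inst.dcluster)
        if (same_uri or in_cluster) and inst.etag not in tags:
            tags.append(inst.etag)
    return ",".join(f'"{t}"' for t in tags)


def decode_order(content_encoding: str) -> List[str]:
    """Point 3: codings are undone in the reverse of the order listed."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    return list(reversed(codings))


if __name__ == "__main__":
    cache = [
        CachedInstance("//bar.example.net/help/foo.bar", "abc"),
        CachedInstance("//bar.example.net/help/fun.bar", "efg",
                       dcluster="//bar.example.net/help/"),
    ]
    print(if_none_match(cache, "//bar.example.net/help/foo.bar"))  # "abc","efg"
    print(decode_order("vcdiff,gzip"))  # ['gzip', 'vcdiff']
```

A real client would also have to respect whatever uniqueness-scope rules the draft finally adopts; the prefix test here is the simplest reading of the example.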
-----------------------------------------------------------
danielh@crosslink.net                http://www.srehttp.org
-----------------------------------------------------------

From mogul Fri Aug 20 15:43:43 1999
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA10428; Fri, 20 Aug 1999 15:43:43 -0700
Message-Id: <9908202243.AA10428@youra.pa.dec.com>
To: http-delta
Subject: Re: Delta encoding in HTTP
From: Fred Douglis
Date: Fri, 20 Aug 99 15:43:43 -0700
Sender: mogul
X-Mts: smtp

And speaking of the I-D, you should probably know about a patent that we (myself, Misha, Gaurav, Phong, and Jagadish) filed way back when we first did the optimistic delta work. It issued earlier this month: ``Method for reducing the delay between the time a data page is requested and the time the data page is displayed,'' U.S. patent 5,931,904, August 3, 1999.

Its impact on the I-D is left as an exercise for the reader -- after all this time and multiple iterations, I'm not even sure I know myself what this thing claims! But I think it's fair to say that AT&T will be reasonable about it and wants to see this technology through the IETF. I can't speak for AT&T in an official capacity as far as licensing goes, however.

Fred

From mogul Fri Aug 20 15:51:10 1999
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA10490; Fri, 20 Aug 1999 15:51:10 -0700
Message-Id: <9908202251.AA10490@youra.pa.dec.com>
To: http-delta
Subject: FYI: IETF rules on patents
Date: Fri, 20 Aug 99 15:51:10 -0700
From: mogul
X-Mts: smtp

Since the issue of patents relevant to delta encoding has come up:

    From RFC2026, "The Internet Standards Process -- Revision 3"

    10.3.2.
    Standards Track Documents

    (A) Where any patents, patent applications, or other proprietary rights are known, or claimed, with respect to any specification on the standards track, and brought to the attention of the IESG, the IESG shall not advance the specification without including in the document a note indicating the existence of such rights, or claimed rights. Where implementations are required before advancement of a specification, only implementations that have, by statement of the implementors, taken adequate steps to comply with any such rights, or claimed rights, shall be considered for the purpose of showing the adequacy of the specification.

(I'm not sure if this is the only IETF rule regarding patents relevant to proposals for standards!)

At any rate, I'd encourage Fred or one of the other AT&T people to take a look at the claims in their patent, at some point not too far in the future, to see if there is anything there that would affect the proposal we've been working on. It may be that the difference between "optimistic deltas" claimed in the AT&T patent, and the "non-optimistic" approach in the current proposal, avoids problems.

Likewise, I'm going to assume that the Open Software Associates Limited patent doesn't conflict with this proposal, unless someone from OSA thinks we should assume otherwise.
-Jeff From yarong@exchange.microsoft.com Sat Aug 21 15:48:54 1999 Received: from pobox1.pa.dec.com by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA19608; Sat, 21 Aug 1999 15:48:54 -0700 Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA01399; Sat, 21 Aug 1999 15:48:54 -0700 Received: from doggate.exchange.microsoft.com (doggate.exchange.microsoft.com [131.107.88.55]) by mail2.digital.com (8.9.2/8.9.2/WV2.0g) with ESMTP id PAA27202; Sat, 21 Aug 1999 15:48:53 -0700 (PDT) Received: by doggate.exchange.microsoft.com with Internet Mail Service (5.5.2232.9) id ; Sat, 21 Aug 1999 15:48:10 -0700 Message-Id: <078292D50C98D2118D090008C7E9C6A603C964EF@STAY.platinum.corp.microsoft.com> From: "Yaron Goland (Exchange)" To: "'mogul@pa.dec.com'" , http-delta@pa.dec.com Subject: Patent Issues Date: Sat, 21 Aug 1999 15:47:58 -0700 Mime-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2232.9) Content-Type: text/plain; charset="iso-8859-1" Folks I really don't want to know about these patents. So far I have managed to delete every piece of mail involving any reference to a patent, potential or issued. I don't know what the patents cover and I don't want to know. I leave that nonsense to the lawyers. So if you could please do me the kindness of putting the word "patent" into the subject line so I can delete your e-mail unread I and my attorneys would appreciate it. 
Thanks, Yaron From cjh@osa.com.au Sun Aug 22 18:13:21 1999 Received: from pobox1.pa.dec.com by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA20382; Sun, 22 Aug 1999 18:13:21 -0700 Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA14159; Sun, 22 Aug 1999 18:13:20 -0700 Received: from osa.osa.com.au (osa.osa.com.au [203.6.130.129]) by mail1.digital.com (8.9.2/8.9.2/WV2.0g) with ESMTP id SAA01439 for ; Sun, 22 Aug 1999 18:13:14 -0700 (PDT) Received: (from uucp@localhost) by osa.osa.com.au (8.8.5/8.6.9) id LAA21644 for ; Mon, 23 Aug 1999 11:13:09 +1000 Received: from UNKNOWN(15.16.33.1), claiming to be "redgum.osa.com.au" via SMTP by osa.osa.com.au, id smtpda21582; Mon Aug 23 01:13:07 1999 Received: from magpie.osa.com.au (magpie.osa.com.au [15.16.36.3]) by redgum.osa.com.au (8.6.9/8.6.9) with ESMTP id LAA13148 for ; Mon, 23 Aug 1999 11:10:25 +1000 Received: from magpie.osa.com.au ([127.0.0.1]) by magpie.osa.com.au with esmtp (ident cjh using rfc1413) id m11Iicz-0001frC (Debian Smail-3.2.0.102 1998-Aug-2 #2); Mon, 23 Aug 1999 11:10:25 +1000 (EST) Message-Id: To: http-delta@pa.dec.com Subject: Re: FYI: IETF rules on patents In-Reply-To: Your message of "Fri, 20 Aug 1999 15:51:10 MST." <9908202251.AA10490@youra.pa.dec.com> Date: Mon, 23 Aug 1999 11:10:25 +1000 From: Clifford Heath > Likewise, I'm going to assume that the Open Software Associates > Limited patent doesn't conflict with this proposal, unless > someone from OSA thinks we should assume otherwise. Nothing in the current draft conflicts with our patent application. If I propose material that would conflict, it will be under open licencing terms (to be decided). But at present, I just want some tweaks that generalise the proposed standard to make it more effective for rsync-like delta computation. 
------------------------------------------------------------ Clifford Heath http://www.osa.com.au/~cjh Open Software Associates Limited mailto:cjh@osa.com.au 29 Ringwood Street / PO Box 4414 Phone +613 9871 1694 Ringwood VIC 3134 AUSTRALIA Fax +613 9871 1711 ------------------------------------------------------------ Proven Solution Deployment for the Global Enterprise From mogul@pa.dec.com Mon Aug 23 14:55:13 1999 Received: from pobox1.pa.dec.com by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA23517; Mon, 23 Aug 1999 14:55:13 -0700 Received: from youra.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA27124; Mon, 23 Aug 1999 14:55:13 -0700 Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA23515; Mon, 23 Aug 1999 14:55:11 -0700 Message-Id: <9908232155.AA23515@youra.pa.dec.com> To: Clifford Heath Cc: http-delta@pa.dec.com Subject: Relationship between current Delta Encoding draft and OSA's design In-Reply-To: Your message of "Fri, 20 Aug 99 15:30:05 PDT." <9908202230.AA10103@youra.pa.dec.com> Date: Mon, 23 Aug 99 14:55:11 -0700 From: mogul@pa.dec.com X-Mts: smtp Clifford Heath wrote: I understand that there is no formal working group for your proposals. Please reply indicating your interest in discussing our work and the processes by which we can contribute to formulating an RFC that includes some of the advantages I have mentioned. and also But at present, I just want some tweaks that generalise the proposed standard to make it more effective for rsync-like delta computation. If you have specific changes that you would like to propose to make to the current draft (draft-mogul-http-delta-01.txt), please suggest them on this mailing list. 
My inclination would be to suggest that any major changes be proposed in the context of another document, rather than as a revision to draft-mogul-http-delta-01.txt -- it sounds from your relatively vague description that you are proposing a distinctly different mechanism.

Our main concern at this point is to avoid putting anything in the Delta Encoding spec that would create significant problems for other extensions to HTTP, such as the OSA design or the "Rsync in HTTP" design. At the moment, my assumption is that there is no such conflict.

-Jeff

From mogul@pa.dec.com Mon Aug 23 15:04:16 1999
Received: from pobox1.pa.dec.com by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA23718; Mon, 23 Aug 1999 15:04:16 -0700
Received: from youra.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA11173; Mon, 23 Aug 1999 15:04:16 -0700
Received: from localhost by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA23662; Mon, 23 Aug 1999 15:04:14 -0700
Message-Id: <9908232204.AA23662@youra.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Relationship between Rsync and Delta Encoding in HTTP
In-Reply-To: Your message of "Fri, 20 Aug 99 15:31:07 PDT." <9908202231.AA10055@youra.pa.dec.com>
Date: Mon, 23 Aug 99 15:04:14 -0700
From: mogul@pa.dec.com
X-Mts: smtp

Andrew Tridgell writes:

    for those of you who haven't seen the rproxy paper I gave at CALU you can grab a copy at ftp://samba.org/pub/tridge/rproxy/

Thanks for the pointer; can you define "CALU" for those of us not aware of this?

I gathered from this paper that your proposal involves adding a new content-coding ("rsync") and a new HTTP header ("Rsync-Signature", although perhaps you should think of a shorter name?) The paper doesn't give a careful specification, and you also wrote:

    > - difference files are cachable by unmodified existing web caches.

    we have a way of doing that in rproxy (using a content-encoding trick) although I'm the first to admit it's a bit of a hack.
    I'll be interested to see what your solution is.

so it sounds like there are protocol details that aren't obvious from the paper.

Again, I would ask whether you see any potential conflict between the draft-mogul-http-delta-01.txt specification, and your own work. If not, we should probably not try to couple the two proposals.

    My plan at this stage is to play some more with the design of rproxy then to implement it as a patch to squid, apache and mozilla. I'll then get it deployed on some really large sites and see how it stands up to a real battering. I haven't yet started to look into a standardisation process because I wanted the protocol extensions to be well and truly tested before going down that path.

I agree, don't try to standardize too soon. However, it might help other people to critique your design if you could provide a preliminary specification.

-Jeff

From tridge@samba.anu.edu.au Thu Aug 26 21:12:53 1999
Received: from pobox1.pa.dec.com by youra.pa.dec.com; (5.65v3.2/1.1.8.2/06Jun96-0357PM) id AA03292; Thu, 26 Aug 1999 21:12:53 -0700
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA26460; Thu, 26 Aug 1999 21:12:51 -0700
Received: from samba.anu.edu.au (samba.anu.edu.au [150.203.164.44]) by mail2.digital.com (8.9.2/8.9.2/WV2.0g) with ESMTP id VAA20215; Thu, 26 Aug 1999 21:12:39 -0700 (PDT)
Received: (from localhost user: 'tridge', uid#148) by samba.anu.edu.au id ; Fri, 27 Aug 1999 13:15:07 +1000
Sender: Andrew Tridgell
From:
To: mogul@pa.dec.com
Cc: http-delta@pa.dec.com
In-Reply-To: <9908232204.AA23662@youra.pa.dec.com> (mogul@pa.dec.com)
Subject: Re: Relationship between Rsync and Delta Encoding in HTTP
Reply-To: tridge@linuxcare.com
References: <9908232204.AA23662@youra.pa.dec.com>
Message-Id: <19990827031517Z12869326-13538+45@samba.anu.edu.au>
Date: Fri, 27 Aug 1999 13:15:07 +1000

Sorry for the slow reply on this, I've been at a couple of US conferences.
I'm on a plane on the way back now :)

> Thanks for the pointer; can you define "CALU" for those of us
> not aware of this?

CALU is "conference of australian linux users". I know it wasn't exactly the best forum for introducing this stuff, it just happened to be the first conference I was going to after Peter and I did the work.

> I gathered from this paper that your proposal involves adding
> a new content-coding ("rsync") and a new HTTP header ("Rsync-Signature",
> although perhaps you should think of a shorter name?) The
> paper doesn't give a careful specification, and you also wrote:

yes, that's right. I'm not fussed about the name of the header, and I quite deliberately don't tie down the exact spec just yet as I'm looking for comments on the general method rather than precisely what each bit on the wire should mean.

> so it sounds like there are protocol details that aren't obvious
> from the paper.

that is certainly true. Everything we've actually implemented is available in the rproxy cvs area (also available as ftp://samba.org/pub/unpacked/rproxy/) so you can see specifics there, but I should once again point out that although the implementation does work (and a few people actively use it) it is far from complete.

> Again, I would ask whether you see any potential conflict between
> the draft-mogul-http-delta-01.txt specification, and your own work.
> If not, we should probably not try to couple the two proposals.

I'll answer that in a separate email when I get home and have a copy of the draft in front of me.

> I agree, don't try to standardize too soon. However, it might
> help other people to critique your design if you could provide
> a preliminary specification.

the code provides that to a large extent (it is deliberately kept very simplistic) but I do plan on writing a better spec once I've finished with the few conferences I'm doing at the moment.
Cheers, Tridge

From mogul@pa.dec.com Fri Oct 1 15:55:59 1999
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA01209; Fri, 1 Oct 1999 15:55:59 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA19715; Fri, 1 Oct 1999 15:55:58 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA16282; Fri, 1 Oct 1999 15:55:58 -0700 (PDT)
From: Jeffrey Mogul
Message-Id: <199910012255.PAA16282@wera.pa.dec.com>
To: danielh@crosslink.net
Cc: http-delta@pa.dec.com
Subject: Re: Comments on Delta Encoding in HTTP
In-Reply-To: Your message of "Fri, 20 Aug 1999 15:33:15 PDT." <9908202233.AA10154@youra.pa.dec.com>
Date: Fri, 01 Oct 1999 15:55:58 -0700
X-Mts: smtp

I'm really sorry that it took me 6+ weeks to respond to your message. My only excuse is that this isn't exactly my "day job".

    Let me start by commending the quality of the writing; it's generally quite good. Aside from my major comment, most of my comments reflect problems I had comprehending the complete picture.

Thanks.

    Lastly, I'm considering an experimental implementation of this proposal in my "SRE-http" web server. Do you anticipate any major changes to this proposal (abstracting from the major change I mention below!)

No.

    Major Comment: The proposed use of templates is problematic. In particular, an additional GET is required for each DTemplate, with no guarantee that the results of this GET will ever be used. Even if a well-designed user-agent makes these requests in a way that does not affect the client's perceived response time, these extra requests will reduce available bandwidth for everyone else.

    Was any thought given to a scheme where the template is returned first, after which the client requests a delta against this template? Alternatively, the template and the delta could be returned as a multi-part document.
The template mechanism is intended to support approaches such as HTML Macros:

    9. Fred Douglis, Antonio Haro, and Michael Rabinovich. HPP: HTML Macro-Preprocessing to Support Dynamic Document Caching. Proc. USENIX Symposium on Internet Technologies and Systems, USENIX, Monterey, CA, December, 1997, pp. 83-94.

and you should read that paper to get a better sense of why it might or might not pay off. The basic idea is to significantly reduce the bandwidth requirements for repeated accesses to a group of similar pages, but this approach does NOT try to minimize the number of HTTP requests.

    Perhaps this would use a new status code, say 228 Delta Template. Also, the client could signal its unwillingness to accept this "delta template response" via a new request header, or with a "no-template" Accept-encoding?

I think we already addressed this issue with:

    Note that an origin server ought not necessarily send a DTemplate header field on every response; doing so could waste network bandwidth, if the recipient is not delta-capable. Instead, the server should employ heuristics to decide whether to send this header field. For example, it might be worth sending it whenever the client's request message indicates its willingness to accept a delta-encoded response, and when the If-None-Match field in the request does not already specify the entity-tag of the template resource.

I'd encourage you to work with Fred Douglis and his co-authors on the HTML Macros paper, if you think that this part of the draft is badly designed. Note that it's completely optional, though.

    Minor comments:

    1) Page 7, definition of instance. It would be useful to further clarify the relationship between resources, instances, and entities. For example: "One can think of an instance as a snapshot in the life of a resource. Diagrammatically:
        a resource -- yields --> an instance,
        an instance -- yields --> an entity.
    where the entity incorporates the effects of content-encodings and range extractions.
I'll try to add some clarification here.

    2) Page 9, point 5. the phrase "... and an appropriately encoded body" is a bit terse. Perhaps "... and the appropriate range(s) from the possibly encoded body."

Ditto.

    3) Page 10, before section 5. It might be useful to remind the reader that the client should decode using the reverse order of methods listed in the Content-encoding header. Thus, given a response header of
        Content-encoding: vcdiff,gzip
    the client should first decompress, and then apply the reverse delta encoding.

Good point - not that we think that client implementors are stupid, but it's probably a good idea to make this explicit.

    4) Page 14, para 3. The sentence
        (Presumably, the interrupted response used the same delta encoding, if any.)
    seems too weak; it is hard to imagine any other circumstance for which a client would want a range of a delta content-encoding delta (since the entity body returned by the server is a piece of the delta). Hence, instead of "Presumably", how about "It is expected that ".

That's basically the dictionary definition of "presumably".

    Basically, it took me more than one read to understand that a range of a delta content-encoding means something like "return a piece of the output of vcdiff"; implying that the client supplied byte range has little to do with the byte range of the newly created instance. Reiterating this may be overkill, but not letting people think otherwise is important.

I don't pretend that this is simple stuff, but I'm not sure what to say that hasn't already been said.

    5) 226 and 227 status codes

    Why use two new status codes, instead of one. That is, use 226 for both delta and range of delta, and add a new response header (or modify an existing response header) to indicate that "this is a range response". I would guess that using 227 provides a useful hint to clients (that they should look for a Content-Range header)?
Yes, this is by analogy with the 200/206 distinction for supporting range responses without deltas. I think it is generally better to make things as explicit as possible, especially if this can be done without actually adding more fields to the headers.

    6) Page 18, second paragraph. For clarity sake, how about this change:
        Suppose that the server's current instance has entity tag "B", and that the server also has retained a copy of the instance with entity tag "A". Then, the server could compute the difference between "B" and "A" and respond with:

Good suggestion.

    7) Top of page 19. Further stressing the concept of range of delta content encoding, how about:
        selection, and returns a 227 (Range of Delta) response with an entity body containing bytes 900 to 999 of the vcdiff computed difference.

OK.

    8) Page 22. LRU is never defined (least recently used?)

Sorry (and you're right about the definition).

    9) Page 24. It might be useful to add this example:

    Suppose:
    a) the client requests /help/foo.bar, and the server responds with:
        HTTP/1.1 200 OK
        Etag: "abc"
    b) the client then requests /help/fun.bar, and the server responds with:
        HTTP/1.1 200 OK
        Etag: "efg"
        DCluster: "//bar.example.net/help/"
    c) Then, if the client re-requests /help/foo.bar, it should add a request header of:
        If-none-match: "abc","efg"

Actually, I'm not sure this necessarily makes sense, but it points out a question that is currently only implicit in the spec: for what period of time is a DCluster header value valid?

We don't want the DCluster header applicability to expire when the response that it came with expires, because this would severely limit the utility of delta encoding.

However, a server knows at least that, at the time that it first sends a DCluster header, it has to start maintaining the constraints on entity tags implied by the header value (i.e., that entity tags issued for resources covered by the header field are unique).
But what can we say about its meaning with respect to entity tags that were put into the cache *before* the DCluster header was received?

In other words, in your example above, suppose step (a) takes place many days (or months or years) before step (b). And note that if the client is re-requesting /help/foo.bar in step (c), that's probably because the cache entry created in step (a) has expired. So we are in a situation where there is no obvious way to guarantee that the constraint on entity-tag values implied by the DCluster header received at step (b) actually applies to the entity tag received in step (a). It might, but it might not, and if we allow an arbitrarily long gap here, then we make it impossible for a server administrator to forget about any previously-issued entity tag, no matter how long ago.

So I see two solutions: come up with a mechanism that lets the server specify how far back in time to go in applying a DCluster header, or simply to say that it never applies to previously received cache entries. I'd vote for the latter, and I'll add something to the effect that

    The uniqueness scope specified by a DCluster header is valid only for entity tags received in the same response or in subsequent responses, never for entity tags received in previous responses.

and, by analogy

    The URI specified by a DTemplate header is valid only for use with entity tags received in the same response or in subsequent responses, never for use with entity tags received in previous responses.

How about that?

    10) Page 25. It might be useful to add (after the "It would not make sense paragraph") a reminder that use of broad uniqueness scope may also increase the work the server must do to ensure that no two URIs yield the same entity tag.

I think this is such a small point, in comparison to the other reason for not having an over-broad uniqueness scope, that it isn't worth saying.
Also, IETF specs typically concentrate on issues of interoperability and network efficiency, and give implementors as much freedom as they want to create work for themselves.

    11) Page 28. In the example containing
        DTemplate: "http://bar.example.net/foo.tplt"/etag="pqr"
    What if an etag of "zyx" is returned by the server in response to a request for foo.tplt? Should the next request for foo.html use
        If-none-match: "pqr"
    or
        If-none-match: "zyx"
    I'd think the latter, but the example does not make that clear.

I think it should be obvious that if you wait too long, the server might replace the base instance and delete the original one (i.e., make the instance with etag="pqr" unavailable). The issue here, then, isn't whether the client should use one or the other in its If-None-Match header (I think the protocol needs to work right in either case), but rather what the server should return if the client says If-none-match: "pqr", and that instance no longer exists. And the answer to that is obvious, you get back a status-200 response.

Again, the IETF practice is to avoid specifying stuff that isn't required for interoperability or overall performance, so I think we can leave the client implementors some freedom in this respect.

    Also, the following modification may be useful:
        This means that for any Request-URI matching the prefix specified in the Dcluster header field, the URI specified in the DTemplate field is an appropriate template; and If-none-match should use "pqr" (assuming that "pqr" is the etag for foo.tplt).

I'll add something like that.

    12) Page 32, 12.3.1. The phrase "... or if the uniqueness scope for an entity tag of any instance of the requested resource has ever included another resource" seems unnecessary. That is, if the client only included one etag in the If-none-match, and the server didn't make any uniqueness scope errors, why would there ever be any ambiguity (since the client has the entity associated with this single etag, as does the server)?
I'm not 100% sure of the reasoning behind this (even though I probably wrote it myself), but I think the concern was what would happen if the response ended up in a proxy cache. It would then be hard to know whether it could be safely used later on. I suppose we could give the origin server the option of either including the Delta-Base header or a "Vary: if-none-match" header (except that the Vary header would have to list a number of fields that could potentially carry the request etag!).

Again, I think it's safer to make this information explicit, since if there is any ambiguity it is likely to lead to cache transparency errors sooner or later.

    13) Page 32, 12.3.2. The description of Dcluster never mentions its prime purpose -- to identify an "etag" to use for a set of Request-Uris. Instead, 12.3.2 is framed in terms of uniqueness scopes. While these are important constraints affecting DCluster, it's not the reason one would use DCluster.

Huh? A DCluster header *never* specifies an etag, it *always* specifies a set of Request-URIs. Which is, by definition, a uniqueness scope.

-Jeff

From mogul Thu Oct 7 13:34:45 1999
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA28488; Thu, 7 Oct 1999 13:34:45 -0700 (PDT)
From: Jeffrey Mogul
Message-Id: <199910072034.NAA28488@wera.pa.dec.com>
To: http-delta
Subject: Slightly revised HTTP Delta Encoding draft
Date: Thu, 07 Oct 1999 13:34:45 -0700
X-Mts: smtp

Based on the comments from several of you, I've made some minor revisions to the HTTP Delta Encoding draft; the revised version is temporarily on:

    ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-delta-02.txt

If nobody objects, I'll submit this draft to the IETF within the next few days. Then maybe Bala can take care of asking the IESG to bless this as a proposed standard.

We'll probably also need to issue revised versions of the digest and vcdiff drafts, since they have both technically expired.
Thanks
-Jeff

P.S.: Yes, I know that reference 10 in the draft above may be wrong; I've fixed it but it takes a while to propagate the new version through our firewall.

From mogul Mon Oct 25 18:01:50 1999
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA03501; Mon, 25 Oct 1999 18:01:50 -0700 (PDT)
From: Jeffrey Mogul
Message-Id: <199910260101.SAA03501@wera.pa.dec.com>
To: http-delta
Subject: draft-mogul-http-delta-02.txt released by IETF
Date: Mon, 25 Oct 1999 18:01:50 -0700
X-Mts: smtp

The latest revision of "Delta encoding in HTTP" is available at

    http://www.ietf.org/internet-drafts/draft-mogul-http-delta-02.txt

I believe that Bala intends to ask the IESG to consider this version as a Proposed Standard.

-Jeff

From mogul Tue Dec 7 15:35:00 1999
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA29555; Tue, 7 Dec 1999 15:35:00 -0800 (PST)
Message-Id: <199912072335.PAA29555@wera.pa.dec.com>
To: http-delta
From:
X-original-Date: Fri, 12 Nov 1999 14:11:28 -0500
X-originally-To: mogul@pa.dec.com
Subject: a delta encoding and range conundrum...
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v1.61 b62
Date: Tue, 07 Dec 1999 15:35:00 -0800
Sender: mogul
X-Mts: smtp

While implementing delta encoding, this transfer delta encoding conundrum arose:

Consider a request:

    get /foo/bar.html http/1.1
    host: mysite.net
    If-none-match: "ver1"
    TE: diff-e

where we assume that "ver1" is a prior instance of /foo/bar.html

Let's suppose that the current instance of foo.bar is different than the "ver1" instance, and let's assume that it has an etag of "ver2". Let's assume that bytes 500 to 500 of the "ver2" instance are different than the "ver1" instance.
Then the response could be something like:

    http/1.1 226 Delta
    Delta-base: "ver1"
    Transfer-encoding: diff-e
    Content-length: 98
    Etag: "ver2"

Instead, suppose the client requests a range, using:

    get /foo/bar.html http/1.1
    host: mysite.net
    If-none-match: "ver1"
    Range: bytes=200-299
    TE: diff-e

This range hasn't changed; hence the DIFF -e is empty. That is, bytes 200-299 of "ver1" are the same as bytes 200-299 of "ver2".

So what should the server do? I can see 3 possibilities:

a) return an empty response, and assume the client will take this as meaning "no difference"
b) return a 304, with an etag of "ver1", and hope the client will assume that this means that the requested range has not changed (with no implications as to the rest of the resource).
c) Avoid the hassle, and send the whole thing (don't do any encoding)

Solution b makes the most sense, but it does depend on the client agreeing to this interpretation (in the context of a range request).

-----------------------------------------------------------
danielh@crosslink.net
-----------------------------------------------------------

From mogul Tue Dec 7 15:37:03 1999
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA06083; Tue, 7 Dec 1999 15:37:03 -0800 (PST)
From: Jeffrey Mogul
Message-Id: <199912072337.PAA06083@wera.pa.dec.com>
To:
cc: http-delta
Subject: Re: a delta encoding and range conundrum...
In-reply-to: Your message of "Fri, 12 Nov 1999 14:11:28 EST." <199911121952.OAA12695@lycanthrope.crosslink.net>
Date: Tue, 07 Dec 1999 15:37:03 -0800
X-Mts: smtp

Sorry it took me so long to reply to this. I've been postponing work on the Delta draft because I had other deadlines that had to be settled first.
You wrote:

    While implementing delta encoding, this transfer delta encoding conundrum arose:

    Consider a request:

        get /foo/bar.html http/1.1
        host: mysite.net
        If-none-match: "ver1"
        TE: diff-e

    where we assume that "ver1" is a prior instance of /foo/bar.html

    Let's suppose that the current instance of foo.bar is different than the "ver1" instance, and let's assume that it has an etag of "ver2". Let's assume that bytes 500 to 500 of the "ver2" instance are different than the "ver1" instance. Then the response could be something like:

        http/1.1 226 Delta
        Delta-base: "ver1"
        Transfer-encoding: diff-e
        Content-length: 98
        Etag: "ver2"

That example is actually illegal, because of this from section 4.4 of RFC2616:

    The transfer-length of a message is the length of the message-body as it appears in the message; that is, after any transfer-codings have been applied. When a message-body is included with a message, the transfer-length of that body is determined by one of the following (in order of precedence):

    [...]

    2. If a Transfer-Encoding header field (section 14.41) is present and has any value other than "identity", then the transfer-length is defined by use of the "chunked" transfer-coding (section 3.6), unless the message is terminated by closing the connection.

    3. If a Content-Length header field (section 14.13) is present, its decimal value in OCTETs represents both the entity-length and the transfer-length. The Content-Length header field MUST NOT be sent if these two lengths are different (i.e., if a Transfer-Encoding header field is present). If a message is received with both a Transfer-Encoding header field and a Content-Length header field, the latter MUST be ignored.

That is, you can't send both Transfer-Encoding and Content-Length! However, I think this bug in your example is unrelated to the main issue.
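[Editorial illustration, not part of the original message.] A minimal Python sketch of the RFC2616 section 4.4 rule quoted above: a message must not carry Content-Length once a non-identity Transfer-Encoding is present. The function name `framing_problems` and the dict-based header representation are assumptions made for this sketch; they do not come from the draft or from any server discussed on this list.

```python
def framing_problems(headers):
    """Return a list of RFC 2616 section 4.4 framing problems.

    `headers` maps header names to values; names are matched
    case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    transfer_encoding = lowered.get("transfer-encoding", "")
    # Any Transfer-Encoding other than "identity" means the transfer-length
    # is defined by chunking (or by closing the connection).
    non_identity = bool(transfer_encoding) and transfer_encoding.lower() != "identity"
    problems = []
    if non_identity and "content-length" in lowered:
        problems.append("Content-Length sent together with Transfer-Encoding")
    return problems


# The header set from the example under discussion.
example = {
    "Delta-base": '"ver1"',
    "Transfer-encoding": "diff-e",
    "Content-length": "98",
    "Etag": '"ver2"',
}
print(framing_problems(example))
```

A sender-side check like this only flags the conflict; a recipient that sees both fields is told by the same section to ignore Content-Length.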
    Instead, suppose the client requests a range, using:

        get /foo/bar.html http/1.1
        host: mysite.net
        If-none-match: "ver1"
        Range: bytes=200-299
        TE: diff-e

    This range hasn't changed; hence the DIFF -e is empty. That is, bytes 200-299 of "ver1" are the same as bytes 200-299 of "ver2".

    So what should the server do? I can see 3 possibilities:

    a) return an empty response, and assume the client will take this as meaning "no difference"
    b) return a 304, with an etag of "ver1", and hope the client will assume that this means that the requested range has not changed (with no implications as to the rest of the resource).
    c) Avoid the hassle, and send the whole thing (don't do any encoding)

    Solution b makes the most sense, but it does depend on the client agreeing to this interpretation (in the context of a range request).

I'll start by reminding you about section 4 of the draft (titled "Relationship between content-coding, transfer-coding, and ranges").

Remember that transfer-codings, unlike Content-codings, are hop-by-hop. This is a key distinction, because if you really are talking about using a transfer-coding here, then the decision about whether to apply the delta transfer-coding MUST be made after all of the other relevant decisions (in particular, the choice between 200, 226, and 304 status codes). So one important tool in thinking about this is that the example has to work if you remove all of the transfer coding stuff.

I think (b) makes no sense at all in this case, since the underlying resource variant has definitely changed (by the way you set up the example). If you used no transfer codings, it would be entirely wrong to send 304, and adding a transfer coding isn't allowed to change this.

(c) is always legal, but it seems like a cop out.

So what is the "right" way to use a delta transfer-coding in this example?
We start by constructing the response that the server would send without the transfer-coding:

    HTTP/1.1 206 Partial Content
    Etag: "ver2"
    Content-type: text/html
    Content-Length: 100
    Content-Range: bytes 200-299/1234
    Date: whatever

    <100 bytes of content>

This is the result of steps 1-6 in section 4 of the draft.

Now, because we want to try to apply a delta transfer-coding (step 7), we would do the following:

(7a) identify the base instance for the delta, which in this case is "ver1"

(7b) make sure that we have a copy of that base instance; this might not be possible if the transfer-coding is being originally applied at an intermediate proxy cache!

(7c) generate the required sub-range (bytes 200-299) of the base instance ("ver1").

(7d) compute the delta between the result of step 7c and the "<100 bytes of content>" resulting from step 6. I'll assume you want to use diff-e here. Let's assume that the result of this requires 17 bytes.

(7e) replace the body resulting from step 6 with the output from 7d, using a chunked encoding (because of the requirements of section 4.4 of RFC2616), add these fields to the response headers:

    Transfer-encoding: diff-e
    Delta-base: "ver1"

and (because of RFC2616, section 4.4), remove the Content-length header.

So, the result would be

    HTTP/1.1 206 Partial Content
    Etag: "ver2"
    Content-type: text/html
    Content-Range: bytes 200-299/1234
    Transfer-encoding: diff-e
    Delta-base: "ver1"
    Date: whatever

    17
    <17 bytes of diff-e result>
    0

Note that section 4 of the draft says:

    Ranges are used for two main purposes:

    1. to complete a partial response after a premature termination of a message
    2. to obtain just selected sections of an instance.

    The former use of Range is consistent with the use of delta encoding as a content-coding; the latter requires the use of delta encoding as a transfer-coding.
Implicitly, your example falls into case (2), since it doesn't make sense to use If-none-match in case (1) - you'd probably be using If-Range in that case, or perhaps If-Match (in the subcase where the client doesn't want to receive anything if the underlying resource has changed).

All of this brings out one point, which is perhaps implicit in the current delta draft, but which probably needs to be made explicit. This is the use of "Delta-base" in a response using a delta transfer-coding. In your example, since the request only identifies one possible base version, the Delta-base response header is superfluous; the requester knows what the only possible base version is. But if the requester had sent, e.g.,

    get /foo/bar.html http/1.1
    host: mysite.net
    If-none-match: "ver1", "ver0"
    Range: bytes=200-299
    TE: diff-e

then it would not be possible to use a delta encoding without sending Delta-Base.

Currently, the delta draft spec (section 12.3.1) only discusses the use of Delta-Base in conjunction with delta content-codings. But there doesn't seem to be any reason not to include it in responses using delta transfer-codings, as long as the recipient strips the Delta-Base header if it also strips the delta transfer-coding.

That is, I would change this, in section 12.3.1:

    A Delta-Base header field MUST be included in a 226 (Delta) or 227 (Range of Delta) response if the request included more than one entity tag in its If-None-Match header field, or if the uniqueness scope for an entity tag of any instance of the requested resource has ever included another resource. Any 226 or 227 response MAY include a Delta-base header.

to this:

    A Delta-Base header field MUST be included in a 226 (Delta) or 227 (Range of Delta) response, or in a response that uses a delta transfer-coding, if the request included more than one entity tag in its If-None-Match header field, or if the uniqueness scope for an entity tag of any instance of the requested resource has ever included another resource.
    Any 226 or 227 response MAY include a Delta-base header. A Delta-Base header MAY be included in a response using a delta transfer-coding, but if so, and if a forwarding agent also removes the delta transfer-coding, the Delta-Base header MUST be removed before the message is forwarded.

OK?

-Jeff

From mogul Wed Dec 8 15:08:52 1999
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA04727; Wed, 8 Dec 1999 15:08:52 -0800 (PST)
From: Jeffrey Mogul
Message-Id: <199912082308.PAA04727@wera.pa.dec.com>
To: danielh@crosslink.net
cc: http-delta
Subject: Re: a delta encoding and range conundrum...
In-reply-to: Your message of "Wed, 08 Dec 1999 00:09:11 EST." <199912080525.AAA21225@lycanthrope.crosslink.net>
Date: Wed, 08 Dec 1999 15:08:52 -0800
X-Mts: smtp

    >Remember that transfer-codings, unlike Content-codings, are hop-by-hop.
    >This is a key distinction, because if you really are talking about using
    >a transfer-coding here, then the decision about whether to apply the
    >delta transfer-coding MUST be made after all of the other relevant
    >decisions (in particular, the choice between 200, 226, and 304 status
    >codes). So one important tool in thinking about this is that the example
    >has to work if you remove all of the transfer coding stuff.

    I definitely missed that: that 226 and 227 are ONLY used when a delta content-encoding has been applied. I think that should be stated explicitly somewhere (if you'd like, I'll look for a good place to put such a reminder).

Please feel free to suggest something (the more specific, the better, although I might want to edit it).

    >So, the result would be
    > HTTP/1.1 206 Partial Content
    > Etag: "ver2"
    > Content-type: text/html
    > Content-Range: bytes 200-299/1234
    > Transfer-encoding: diff-e
    > Delta-base: "ver1"
    > Date: whatever
    > 17
    > <17 bytes of diff-e result>
    > 0

    Shouldn't that be:
        Transfer-encoding: diff-e,chunked

Yup, my mistake.
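[Editorial illustration, not part of the original message.] A rough Python sketch of how the corrected response could be framed on the wire, assuming the delta bytes have already been computed by some diff-e implementation (not shown). The helper names `chunked` and `delta_range_response` are hypothetical, and the Date header is left out because the example only says "whatever". Two details follow RFC2616 rather than the informal example: chunk sizes are hexadecimal (so a 17-byte delta is announced as "11"), and Content-Range is written "bytes 200-299/1234".

```python
def chunked(body: bytes) -> bytes:
    """Frame `body` using the HTTP/1.1 chunked transfer-coding."""
    out = b""
    if body:
        # chunk-size is hexadecimal: 17 bytes is written as "11"
        out += b"%x\r\n" % len(body) + body + b"\r\n"
    # zero-length last chunk, followed by an empty trailer
    return out + b"0\r\n\r\n"


def delta_range_response(delta: bytes) -> bytes:
    """Build the 206 response with a delta transfer-coding applied."""
    header_lines = [
        b"HTTP/1.1 206 Partial Content",
        b'Etag: "ver2"',
        b"Content-type: text/html",
        b"Content-Range: bytes 200-299/1234",
        # chunked must be the last transfer-coding applied
        b"Transfer-encoding: diff-e, chunked",
        b'Delta-base: "ver1"',
        # no Content-Length: RFC 2616 section 4.4 forbids it here
    ]
    return b"\r\n".join(header_lines) + b"\r\n\r\n" + chunked(delta)


print(delta_range_response(b"x" * 17).decode("ascii"))
```

With an empty delta, `chunked(b"")` yields only the terminating zero-length chunk, so the message still has well-defined framing even when the coding output has no bytes.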
    One small point -- on my platform, DIFF -e foo.1 copy_of_foo.1 yields an empty string (a 0 length response). So you'd end up chunking an empty string and hope the recipient figures out that a chunked "empty string" means "no change".

Presumably, all users of a delta coding (including diff -e) agree on the meaning of all legal coding outputs.

Thanks
-Jeff

From mogul Fri Mar 10 10:07:39 2000
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA16314; Fri, 10 Mar 2000 10:07:39 -0800 (PST)
From: Jeffrey Mogul
Message-Id: <200003101807.KAA16314@wera.pa.dec.com>
To: http-delta
Subject: Delta-encoding: revised drafts submitted to the IETF
Date: Fri, 10 Mar 2000 10:07:39 -0800
X-Mts: smtp

It took us way too long, but ...

Almost 4 hours before the deadline for submitting Internet-Drafts prior to the next IETF meeting, we've submitted the following revised drafts to the IETF:

    draft-mogul-http-delta-03.txt ("Delta encoding in HTTP")
    draft-mogul-http-digest-02.txt ("Instance Digests in HTTP")
    draft-korn-vcdiff-01.txt ("The VCDIFF Generic Differencing and Compression Data Format")

These three documents constitute the core of the Delta-encoding specification.

Once these have made it through the queue of pending I-D announcements, the plan is to issue a "last call" for advancing delta-encoding on the IETF standards track, and then to submit a request to the IESG. I believe that Bala had volunteered to do these steps.

-Jeff

From mogul Mon Mar 20 18:06:56 2000
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA26378; Mon, 20 Mar 2000 18:06:56 -0800 (PST)
From: Jeffrey Mogul
Message-Id: <200003210206.SAA26378@wera.pa.dec.com>
To: http-delta
Subject: URLs for the revised Delta-encoding Internet-Drafts
Date: Mon, 20 Mar 2000 18:06:56 -0800
X-Mts: smtp

In case anyone needs these URLs for reference purposes:

    Title     : Instance Digests in HTTP
    Author(s) : J. Mogul, A. van Hoff
    Filename  : draft-mogul-http-digest-02.txt
    Pages     : 12
    Date      : 14-Mar-00

    http://www.ietf.org/internet-drafts/draft-mogul-http-digest-02.txt

and

    Title     : Delta encoding in HTTP
    Author(s) : J. Mogul, B. Krishnamurthy, Y. Goland, A. van Hoff, F. Douglis, A. Feldmann
    Filename  : draft-mogul-http-delta-03.txt
    Pages     : 45
    Date      : 14-Mar-00

    http://www.ietf.org/internet-drafts/draft-mogul-http-delta-03.txt

Note that Yaron Goland is no longer at Microsoft (I discovered this after the I-D submission deadline), but may be reached as yaron@goland.org

I will update this in the next draft.

-Jeff

From danielh@crosslink.net Mon Mar 20 21:37:14 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id VAA02902; Mon, 20 Mar 2000 21:37:14 -0800 (PST)
From:
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA06803; Mon, 20 Mar 2000 21:37:14 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id VAA05549 for ; Mon, 20 Mar 2000 21:37:13 -0800 (PST)
Received: from smtp.crosslink.net (dyn56.c5200-1.springfield.236.crosslink.net [207.199.142.57]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id AAA31305 for ; Tue, 21 Mar 2000 00:37:11 -0500
Message-Id: <200003210537.AAA31305@lycanthrope.crosslink.net>
X-Really-To:
Reply-To: danielh@crosslink.net
Date: Tue, 21 Mar 2000 00:34:00 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200003210206.SAA26378@wera.pa.dec.com>
Subject: A possible problem: when are etags assigned
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05

Since I'm currently revising the delta-encoding module in my server, the following conundrum is of immediate concern...

My reading of draft-mogul-http-delta-03.txt indicates that there may be a problem concerning the proper moment to assign an etag to an instance.
a) Draft 3 suggests that an etag should be assigned BEFORE content encoding (such as GZIP compression), and also before range extraction.

b) It is not clear whether this reading agrees with the sense of rfc2616. Others (for example, Koen Holtman) have suggested that they read RFC2616 to dictate that an etag be assigned after content-encoding, but before range extraction.

So, am I just misunderstanding draft 3 -- in which case it would be useful to add a note clarifying this point. Or, is there really an ambiguity?

Personally, I like the idea of assigning an etag before content-encoding; an etag identified, content-encoded (which may mean delta encoded) "entity body" is unlikely to be of future interest to the client; whereas an etag identified unencoded instance may be of great interest (for use as in a future If-None). Roughly speaking, this does mean that the user-agent should cache the de-content-encoded entity body, but the de-content-encoding has to be done anyways.

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------

From mogul@pa.dec.com Tue Mar 21 15:56:22 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA03494; Tue, 21 Mar 2000 15:56:22 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA03399; Tue, 21 Mar 2000 15:56:21 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA03351; Tue, 21 Mar 2000 15:56:21 -0800 (PST)
From: Jeffrey Mogul
Message-Id: <200003212356.PAA03351@wera.pa.dec.com>
To: danielh@crosslink.net
Cc: http-delta@pa.dec.com
Subject: Re: A possible problem: when are etags assigned
In-Reply-To: Your message of "Tue, 21 Mar 2000 00:34:00 EST." <200003210537.AAA31305@lycanthrope.crosslink.net>
Date: Tue, 21 Mar 2000 15:56:21 -0800
X-Mts: smtp

Ouch.
I think you're right, there is a problem here. (On the one hand: why couldn't you have pointed this out a year ago! On the other hand: this is exactly why the IETF process values "working code", and your efforts to implement something have certainly proved useful to this process.)

I can't say that I'm entirely sure I've figured this out (I've spent several sessions wandering around in the fresh air and trying to think this through), but here's my current take.

The best way that I know how to resolve this kind of question is to look at all the possible choices, and then for each choice, see whether I can construct a plausible scenario that leads to a bad result (or to a contradiction).

It's quite clear that the entity tag must be assigned *before* a delta content coding. Otherwise, the entity tag would be useless in deciding how to combine a delta with a previous instance.

However, you write:

    Others (for example, Koen Holtman) have suggested that they read RFC2616 to dictate that an etag be assigned after content-encoding, but before range extraction.

I'm not always in agreement with Koen, but this time I think he may be right. Consider this scenario:

(1) Content author creates foo.html
(2) some software does "gzip -c foo.html >foo.html.gz"

Should foo.html and foo.html.gz have the same entity tag? On the one hand, one could argue that these two files represent identical content, but one of them is encoded differently.

On the other hand, we have three practical issues that suggest that these two files should not have the same entity tag:

(A) RFC2616 section 3.11 (Entity Tags) says:

    A "strong entity tag" MAY be shared by two entities of a resource only if they are equivalent by octet equality.

Unfortunately, since RFC2616 uses an ambiguous definition for "entity", it's a little hard to pin down what this means.
Strictly speaking, this might not even allow two different ranges of the same instance to share an entity tag, but that seems preposterous (and Koen seems to agree). However, it does argue against assigning the same entity tag to foo.html and foo.html.gz (B) As a practical matter, I believe that most (all?) existing servers would not recognize that foo.html and foo.html.gz are different encodings of the same content (for one thing, it might be computationally expensive to verify this), and so it would be difficult to get these servers to assign the same entity tag. (C) Although we expect the ultimate client (e.g., browser) that receives a message to be able to decode a content-coding, we can't in general have the same expectation for intermediate proxies - they might not be able to decode all content-codings. And so it would be confusing if a server first sends foo.html with entity tag "XYZZY", and then later sends a range of the same file, with the same entity tag, but with a different content-coding having been applied. I believe that this is not actually an error situation - the proxy could revert to being a non-caching tunnel in this case - but it shows how complex things get if we allow entity tags to be assigned before content-codings. So it looks like we have a contradiction: the entity tag must be assigned before a delta content-coding, but after content-codings in general. Ouch. There are three ways to resolve this contradiction: by kludging (e.g., making delta encoding a special kind of content-coding that is applied after the entity tag is assigned - yuck!), by banning the combination of delta content-coding with any other content-codings (this is probably not a useful approach), or by realizing that it was a mistake to treat delta encoding as a form of content-encoding, after all. The kludge approach might require only minor tweaks to the document, but I think it would lead to a big mess.
The last approach seems cleanest, but would require a number of changes, including but certainly not limited to these: (1) Section 4 (Relationship between content-coding, transfer-coding, and ranges) needs to be changed to make it clear that the instance is the result of a possible content coding, not an input to it. (2) Applying similar changes to the I-D on instance digests (we need to be consistent about assigning entity tags and instance digests at the same point!) (3) Creating a new message header (e.g., "DE") so that we would send: HTTP/1.1 226 Delta ETag: "1acl059" DE: vcdiff Delta-base: "337pey" Date: Tue, 25 Nov 1997 18:30:05 GMT and a new non-terminal, e.g., delta-encoding, and changing the BNF so that "vcdiff", etc. are examples of delta-encoding, not content-coding. (4) Various changes to related text, examples, etc. Does this make sense to the rest of you? I guess I have some work cut out. Fortunately (?) the IETF won't be publishing any new I-Ds for about two weeks. -Jeff From mogul Wed Mar 22 10:46:07 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA12789; Wed, 22 Mar 2000 10:46:06 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003221846.KAA12789@wera.pa.dec.com> To: http-delta Subject: PLEASE send Delta-related messages to http-delta@pa.dec.com Date: Wed, 22 Mar 2000 10:46:06 -0800 X-Mts: smtp NOT just to me. I'll be resending a bunch of on-topic messages that others have sent to me. Thanks, -Jeff P.S.: Remember, http-delta-request@pa.dec.com for mailing list additions/deletions/changes. 
From mogul Wed Mar 22 10:48:01 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA10135; Wed, 22 Mar 2000 10:48:01 -0800 (PST) Message-Id: <200003221848.KAA10135@wera.pa.dec.com> To: http-delta From: Reply-To: danielh@crosslink.net Orginal-Date: Tue, 21 Mar 2000 23:42:50 -0500 In-Reply-To: <200003212356.PAA03351@wera.pa.dec.com> Subject: Re: A possible problem: when are etags assigned X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Date: Wed, 22 Mar 2000 10:48:01 -0800 Sender: mogul X-Mts: smtp >Ouch. I think you're right, there is a problem here. >(On the one hand: why couldn't you have pointed this >out a year ago! On the other hand: this is exactly >why the IETF process values "working code", and your >efforts to implement something have certainly proved >useful to this process.) Alas, I've been laboring under some etag misconceptions. For example, I completely missed the part about etags being assigned before range extraction; which now makes sense to me (i.e.; it allows for range extraction of an index which can then be used to retrieve selected chapters; in the acrobat-selectively- reading-a-large-pdf sense). The realization about content-encoding and etags only came when I read draft 3 and pondered the significance of ...geez, I can't remember just what sub-clause got me wondering. Whatever, it's fortuitous that the latest draft happen to come around just about the time I was tinkering with the delta module (for reasons that had nothing to do with etags!) >It's quite clear that the entity tag must be assigned >*before* a delta content coding. Otherwise, the entity >tag would be useless in deciding how to combine a delta >with a previous instance. Perhaps not useless, but crippled. Allow me to belabor the point, just to make sure... 
Consider the case: a) at 1PM, the client requests foo.html, receives a response with an etag of "def" b) at 8PM, the client re-requests foo.html, with If-none: "def" and Accept-encoding: Gdiff He receives a delta-content-encoded response, with an etag of "ghi" If "ghi" refers to the "pre-encoded instance" from step b, then c) at 9PM, the client can re-re-request foo.html, with If-none: "def","ghi" The server then can use "ghi" (the instance used in step b), which is probably more similar to the current (9PM) instance. However, if "ghi" refers to the actual "entity body" (the difference file returned at 8PM), then "ghi" is almost certainly useless as a base-instance. >I'm not always in agreement with Koen, but this time I think he may be >right. >Consider this scenario: > (1) Content author creates foo.html > (2) some software does "gzip -c foo.html >foo.html.gz" >Should foo.html and foo.html.gz have the same entity tag? >On the one hand, one could argue that these two files represent identical >content, but one of them is encoded differently. I like that notion ... but it does require some tortured parsing of what an "entity" is (versus an "entity body" and "entity contents") >On the other hand, we have three practical issues that suggest that these >two files should not have the same entity tag: >(A) RFC2616 section 3.11 (Entity Tags) says: > A "strong entity tag" MAY be shared by two entities of a > resource only if they are equivalent by octet equality. >Unfortunately, since RFC2616 uses an ambiguous definition >for "entity", it's a little hard to pin down what this means. >Strictly speaking, this might not even allow two different ranges of the same >instance to share an entity tag, but that seems preposterous (and Koen >seems to agree). However, it does argue against assigning the same >entity tag to foo.html and foo.html.gz So one could justify tortured parsing... and even cite precedent.
But I agree, it's not a pleasing argument >(B) As a practical matter, I believe that most (all?) existing servers >would not recognize that foo.html and foo.html.gz are different encodings >of the same content (for one thing, it might be computationally expensive >to verify this), and so it would be difficult to get these servers to >assign the same entity tag. That may not be a disaster -- there's nothing saying that consecutive responses for the same resource must have the same etag; they strongly SHOULD, but it's not illegal if they don't. So if the default behavior of a server is to assign an etag based on file name (and date/size/whatever), then these would get different etags. Admittedly, this does limit how frequently delta encoding will succeed, but I don't see other major problems. >(C) Although we expect the ultimate client (e.g., browser) that receives >a message to be able to decode a content-coding, we can't in general have >the same expectation for intermediate >proxies - they might not be able to decode all content-codings. And so it >would be confusing if a server first sends foo.html with entity tag >"XYZZY", and then later sends a range of the same file, with the same >entity tag, but with a different content-coding having been applied. I >believe that this is not actually an error situation - the proxy could >revert to being a non-caching tunnel in this case - but it shows how >complex things get if we allow entity tags to be assigned before >content-codings. That sounds like a trump argument -- overburdening may break a possibly fragile system of proxies. >So it looks like we have a contradiction: the entity tag must be assigned >before a delta content-coding, but after content-codings in general. >Ouch. Yeah. More complications. 
>There are three ways to resolve this contradiction: by kludging (e.g., >making delta encoding a special kind of content-coding that is applied >after the entity tag is assigned - yuck!), As an implementor, let me second that. Especially considering all the emphasis put on "you must encode in the same order as the accept-encoding lists". > by banning the combination of >delta content-coding with any other content-codings (this is probably not >a useful approach, More than not useful, but potentially lethal -- I suspect that the average response would benefit more from GZIP than from GDIFF (as an example). Having both is crucial. BTW: the emphasis on vcdiff is frustrating for me -- unless the situation has changed, I can find no samples of a vcdiff encoder. At least GDIFF was easy to understand and fairly easy to implement (given that I had implemented Rsync independently) >by realizing that it was a mistake to treat delta encoding as a form of >content-encoding, after all. The conflict between squeezing more stuff into a box, versus expanding it. >The kludge approach might require only minor tweaks to the >document, but I think it would lead to a big mess. It might not be all that hard to do quick & dirty, but it's got a real spaghetti code flavor to it. >The last approach seems cleanest, but would require a number of changes, >including but certainly not limited to these: >(1) Section 4 (Relationship between content-coding, transfer-coding, and >ranges) needs to be changed to make it clear that the instance is the >result of a possible content coding, not an input to it. Isn't the instance an "input" to content coding, whereas the entity is an "output" from content coding? That is, from section 4 the sequence is: a) use request string to match a resource b) use request headers, etc.
to select a variant of the resource After step b, we have an "instance" -- and it would be useful to assign it an "etag" (though precedent suggests that this is not practical) c) Content-encoding (including delta encoding) is done We now have an instance, that can be subject to d) Range extraction. e) Transfer encoding f) etc. >(2) Applying similar changes to the I-D on instance digests >(we need to be consistent about assigning entity tags and >instance digests at the same point!) I haven't read up on instance digests yet... better do that. >(3) Creating a new message header (e.g., "DE") so that we would send: > HTTP/1.1 226 Delta > ETag: "1acl059" > DE: vcdiff > Delta-base: "337pey" > Date: Tue, 25 Nov 1997 18:30:05 GMT >and a new non-terminal, e.g., delta-encoding, and changing the BNF so >that "vcdiff", etc. are examples of delta-encoding, not content-coding. I don't think that's sufficient -- following Koen's rules, the etag from above would be assigned to the "vcdiff'ed" output, not to the current instance (i.e.; to whatever vcdiff compared 337pey to). There needs to be some way to tell the client "here's an identifier for the current instance". For example, one could also add: Itag: "447pey" Or perhaps modify the above: DE: vcdiff="447pey" This would mean "a vcdiff delta encoding was applied to an un-encoded instance which has an etag of as "447pey". Then, in a future request for the same resource (or for a resource in the same uniqueness scope) the client will know that "447pey" is a good candidate to include in an If-none-match: Of course, the client will also have to store the de-vcdiff'ed response as "447pey". Actually, perhaps one could have : content-encoding: diff-e;"447pey",gzip Which might save a few bytes. 
But I think I like the DE: idea better, it makes it clear that the client should: a) use the entity body, and the delta-base, in a "de-vcdiff" step b) cache the results of this de-vcdiff, using an entity tag of "447pey" c) upon re-re-request, include "447pey" in an If-None-Match d) 1ac1059 refers to the "difference file" (contained in the "entity body"). It's a bit funny -- announcing the availability of an entity tag for something that never left the server -- that clients/proxies/et-al have to reconstruct. But that's no worse than reconstructing an entity from various parts. One question: how could 1ac1059 be used in the future? It's possible that the client could ask for two deltas --- one delta of current instance against a commonly held base instance, and a second delta of this first-delta against 1ac1059. But that strikes me as overkill. Hence, 1ac1059 is not very useful, but it does preserve the prevailing notion of what an etag is supposed to be. >(4) Various changes to related text, examples, etc. >Does this make sense to the rest of you? Basically, though discovering all the little sub clauses where confusion may lurk should be fun. > I guess I have some work cut out. Fortunately (?) the IETF won't be publishing any new I-Ds for about >two weeks. I'll help where I can.
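A minimal sketch of client-side steps (a) through (c) above, in Python. Everything here is an assumption for illustration: the `DE: vcdiff="447pey"` syntax is only the proposal floated in this thread, the delta format is a toy copy/add op list standing in for vcdiff (it is not vcdiff), and `InstanceCache`, `apply_toy_delta` and `unquote` are hypothetical names.

```python
def apply_toy_delta(base, ops):
    """Rebuild an instance from a base and a toy op list (NOT real vcdiff).

    Each op is ("copy", offset, length) or ("add", literal_bytes).
    """
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        elif op[0] == "add":
            out += op[1]
        else:
            raise ValueError("unknown delta op: %r" % (op[0],))
    return bytes(out)


def unquote(value):
    """Strip surrounding whitespace and double quotes from a tag."""
    return value.strip().strip('"')


class InstanceCache:
    """Hypothetical client cache keyed by the tag of the decoded instance."""

    def __init__(self):
        self._instances = {}

    def store(self, tag, body):
        self._instances[tag] = body

    def handle_delta(self, headers, ops):
        # Step (a): combine the delta with the instance named by Delta-base.
        base = self._instances[unquote(headers["Delta-base"])]
        instance = apply_toy_delta(base, ops)
        # Step (b): cache the result under the tag carried in the
        # (hypothetical) DE header, not under the ETag of the delta itself.
        _coding, _, instance_tag = headers["DE"].partition("=")
        self.store(unquote(instance_tag), instance)
        return instance

    def if_none_match(self):
        # Step (c): offer every cached instance tag as a candidate base.
        return ", ".join('"%s"' % tag for tag in self._instances)


cache = InstanceCache()
cache.store("337pey", b"hello world")
headers = {"ETag": '"1acl059"', "DE": 'vcdiff="447pey"', "Delta-base": '"337pey"'}
rebuilt = cache.handle_delta(headers, [("copy", 0, 6), ("add", b"there")])
```

Note that the sketch never stores anything under the ETag of the difference file, which matches point (d): that tag identifies something the client has little reason to keep.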
- ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org - ----------------------------------------------------------- ------- End of Forwarded Message From mogul Wed Mar 22 10:48:42 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA10056; Wed, 22 Mar 2000 10:48:42 -0800 (PST) Message-Id: <200003221848.KAA10056@wera.pa.dec.com> To: http-delta From: Reply-To: danielh@crosslink.net Original-Date: Wed, 22 Mar 2000 00:52:46 -0500 In-Reply-To: <200003212356.PAA03351@wera.pa.dec.com> Subject: Re: A possible problem: when are etags assigned X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Date: Wed, 22 Mar 2000 10:48:42 -0800 Sender: mogul X-Mts: smtp This is an important paragraph to revise or emphasize (in delta-03 and digest-02) It is convenient to think of an entity tag, in HTTP/1.1, as being associated with an instance, rather than an entity. That is, for a given resource, two different response messages might include the same entity tag, but two different instances of the resource should never be associated with the same (strong) entity tag. 
- ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org - ----------------------------------------------------------- From mogul Wed Mar 22 10:49:22 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA12274; Wed, 22 Mar 2000 10:49:22 -0800 (PST) Message-Id: <200003221849.KAA12274@wera.pa.dec.com> To: http-delta From: Reply-To: danielh@crosslink.net Original-Date: Wed, 22 Mar 2000 00:56:02 -0500 In-Reply-To: <200003212356.PAA03351@wera.pa.dec.com> Subject: Re: A possible problem: when are etags assigned X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Date: Wed, 22 Mar 2000 10:49:22 -0800 Sender: mogul X-Mts: smtp Also, from digest-02 Note: the digest is computed before the application of any content-coding, because if a delta-content-coding [8] is used, the computation of the digest after the computation of the delta would not provide a digest useful for checking the integrity of the reassembled instance. You might want to add: content-coding or any range extraction, because if a delta-content-coding [8] is used, ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul Wed Mar 22 10:49:56 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA12870; Wed, 22 Mar 2000 10:49:56 -0800 (PST) Message-Id: <200003221849.KAA12870@wera.pa.dec.com> To: http-delta From: Reply-To: danielh@crosslink.net Original-Date: Wed, 22 Mar 2000 00:59:39 -0500 In-Reply-To: <200003212356.PAA03351@wera.pa.dec.com> Subject: Re: A possible problem: when are etags assigned X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Date: Wed, 22 Mar 2000 10:49:56 -0800 Sender: mogul X-Mts: smtp Content-md5 is defined against the post content encoded, but pre-transfer encoded entity. 
Is it defined before or after range extraction? ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul@pa.dec.com Wed Mar 22 10:58:10 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA11701; Wed, 22 Mar 2000 10:58:10 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA04052; Wed, 22 Mar 2000 10:58:10 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA12112; Wed, 22 Mar 2000 10:58:10 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003221858.KAA12112@wera.pa.dec.com> To: danielh@crosslink.net Cc: http-delta@pa.dec.com Subject: Re: A possible problem: when are etags assigned In-Reply-To: Your message of "Wed, 22 Mar 2000 10:49:56 PST." <200003221849.KAA12870@wera.pa.dec.com> Date: Wed, 22 Mar 2000 10:58:10 -0800 X-Mts: smtp Content-md5 is defined against the post content encoded, but pre-transfer encoded entity. Is it defined before or after range extraction? Darned if I know. I've repeatedly tried to make the point that the term "entity", as defined in RFC2616, is misleading and possibly ambiguous. [I made the argument during the drafting of the spec, but I was voted down.] I believe that because the spec says that Content-MD5: is an MD5 digest of the entity-body for the purpose of providing an end-to-end message integrity check (MIC) of the entity-body. and because "entity-body" is used in the BNF as follows: message-body = entity-body | that it is strictly after range extraction (since range extraction is end-to-end and so clearly isn't a transfer-coding). But this is a tenuous inference, and I'm sure people will implement it both ways. 
This makes Content-MD5 effectively useless for ensuring integrity in the presence of ranges and delta encodings, which is why we wrote the Digest I-D - to more carefully define a header. I have to admit that I didn't understand this failing of Content-MD5 during the drafting of RFC2616, or else I would certainly have used it as ammunition in my fight against the term "entity". But I figured it out too late. -Jeff From danielh@crosslink.net Wed Mar 22 11:44:06 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA14504; Wed, 22 Mar 2000 11:44:06 -0800 (PST) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA06206; Wed, 22 Mar 2000 11:44:06 -0800 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id LAA25756 for ; Wed, 22 Mar 2000 11:44:05 -0800 (PST) Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id OAA06558 for ; Wed, 22 Mar 2000 14:44:04 -0500 Message-Id: <200003221944.OAA06558@lycanthrope.crosslink.net> X-Really-To: Date: Wed, 22 Mar 2000 14:43:30 -0500 To: http-delta@pa.dec.com Subject: weak etags? X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 I'm wondering if weak-etags may offer a possible solution to the "etag before or after delta coding" conundrum. I suspect not, but I don't have a deep understanding of weak etags. 
----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul@pa.dec.com Wed Mar 22 11:52:18 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA25588; Wed, 22 Mar 2000 11:52:18 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA08834; Wed, 22 Mar 2000 11:52:18 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA15524; Wed, 22 Mar 2000 11:52:17 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003221952.LAA15524@wera.pa.dec.com> To: danielh@crosslink.net Cc: http-delta@pa.dec.com Subject: Vcdiff In-Reply-To: Your message of "Wed, 22 Mar 2000 10:48:01 PST." <200003221848.KAA10135@wera.pa.dec.com> Date: Wed, 22 Mar 2000 11:52:17 -0800 X-Mts: smtp Daniel writes: BTW: the emphasis on vcdiff is frustrating for me -- unless the situation has changed, I can find no samples of a vcdiff encoder. At least GDIFF was easy to understand and fairly easy to implement (given that I had implemented Rsync independently) We may have to re-evaluate that as the specification moves along the IETF standards track. A lot of us are frustrated about the lack of available code, but since the vcdiff spec is going to follow the IETF standards track, it will have to meet the usual requirement for two independently-developed interoperable implementations, and that (I hope) will include at least one open-source version. If this hasn't happened by the time that the delta spec reaches Draft Standard status, we'll almost certainly have to remove any dependency on vcdiff. For now, we can view this as a lever to "encourage" an open source implementation of vcdiff, which is about the best known coding format on purely technical grounds.
-Jeff From mogul@pa.dec.com Wed Mar 22 12:06:26 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id MAA15567; Wed, 22 Mar 2000 12:06:26 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA22621; Wed, 22 Mar 2000 12:06:25 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id MAA15788; Wed, 22 Mar 2000 12:06:25 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003222006.MAA15788@wera.pa.dec.com> To: danielh@crosslink.net Cc: http-delta@pa.dec.com Subject: Re: A possible problem: when are etags assigned In-Reply-To: Your message of "Wed, 22 Mar 2000 10:48:01 PST." <200003221848.KAA10135@wera.pa.dec.com> Date: Wed, 22 Mar 2000 12:06:25 -0800 X-Mts: smtp Alas, I've been laboring under some etag misconceptions. For example, I completely missed the part about etags being assigned before range extraction; which now makes sense to me (i.e.; it allows for range extraction of an index which can then be used to retrieve selected chapters; in the acrobat-selectively- reading-a-large-pdf sense). Even for byte-range retrievals, the ordering (entity tag stays constant for various different range retrievals) was obvious almost from the start - hence the If-Range header in HTTP/1.1. Sorry if that wasn't clear :-) The realization about content-encoding and etags only came when I read draft 3 and pondered the significance of ...geez, I can't remember just what sub-clause got me wondering. Whatever, it's fortuitous that the latest draft happen to come around just about the time I was tinkering with the delta module (for reasons that had nothing to do with etags!) I should note that this part hasn't changed since draft-00, as far as I can remember. >It's quite clear that the entity tag must be assigned >*before* a delta content coding. Otherwise, the entity >tag would be useless in deciding how to combine a delta >with a previous instance. 
Perhaps not useless, but crippled. Allow me to belabor the point, just to make sure... Consider the case: a) at 1PM, the client requests foo.html, receives a response with an etag of "def" b) at 8PM, the client re-requests foo.html, with If-none: "def" and Accept-encoding: Gdiff He receives a delta-content-encoded response, with an etag of "ghi" If "ghi" refers to the "pre-encoded instance" from step b, then c) at 9PM, the client can re-re-request foo.html, with If-none: "def","ghi" The server then can use "ghi" (the instance used in step b), which is probably more similar to the current (9PM) instance. Right - but what if another client requests, at 8pm, foo.html without delta-encoding, and gets the same instance as (b)? Does that client receive "Etag: ghi" or some other entity tag? And if it is another entity tag, then we can have the perverse situation where an intermediate cache is storing the same instance under two entity tags. However, if "ghi" refers to the actual "entity body" (the difference file returned at 8PM), then "ghi" is almost certainly useless as a base-instance Please banish any thoughts about entity tags being associated with "actual entity bodies"! The word "entity" is going to give us all headaches. >I'm not always in agreement with Koen, but this time I think he may be >right. >Consider this scenario: > (1) Content author creates foo.html > (2) some software does "gzip -c foo.html >foo.html.gz" >Should foo.html and foo.html.gz have the same entity tag? >On the one hand, one could argue that these two files represent identical >content, but one of them is encoded differently. I like that notion ... but it does require some tortured parsing of what an "entity" is (versus an "entity body" and "entity contents") Note that I strenuously avoid using the term entity, precisely because of this confusion. >(B) As a practical matter, I believe that most (all?)
existing servers >would not recognize that foo.html and foo.html.gz are different encodings >of the same content (for one thing, it might be computationally expensive >to verify this), and so it would be difficult to get these servers to >assign the same entity tag. That may not be a disaster -- there's nothing saying that consecutive responses for the same resource must have the same etag; they strongly SHOULD, but it's not illegal if they don't. So if the default behavior of a server is to assign an etag based on file name (and date/size/whatever), then these would get different etags. Admittedly, this does limit how frequently delta encoding will succeed, but I don't see other major problems. But delta encoding is only worth doing if it is likely to succeed often enough to amortize the protocol overheads (and implementation overheads). Note that our SIGCOMM paper suggests that aggregating references from lots of clients is an important way to improve the performance of delta encoding, so losing this aggregation by assigning multiple entity tags to the same instances is probably a mistake. >The last approach seems cleanest, but would require a number of changes, >including but certainly not limited to these: >(1) Section 4 (Relationship between content-coding, transfer-coding, and >ranges) needs to be changed to make it clear that the instance is the >result of a possible content coding, not an input to it. Isn't the instance an "input" to content coding, whereas the entity is an "output" from content coding? I think the problem is that the original HTTP/1.1 layering is: variant apply content-coding [unnamed thing] apply range selection entity apply transfer-coding message-body We want to insert delta encoding in two places; one is as a hop-by-hop transfer-coding, which still seems to be working. 
The other is as an end-to-end coding, which would fit in like this: variant apply content-coding [unnamed thing 1] apply delta encoding [unnamed thing 2] apply range selection entity apply transfer-coding message-body We tried to cram end-to-end delta encoding into the content-coding bucket, which meant that I got fuzzy about using the term "instance" to describe "unnamed thing 1" and/or "unnamed thing 2". One possible way to resolve this might be to add a general "instance manipulation" layer, i.e., variant apply content-coding instance apply instance manipulation: (delta encoding, range selection, etc.) entity apply transfer-coding message-body And then I think it becomes clear that the "instance" is really the output of the content-coding, contrary to what I wrote in the I-D. (But I tried to make it the input to the content-coding because it has to be the input to the delta encoding, and we thought that we could make end-to-end delta encoding a content-coding.) Then the question arises: is it useful (or even correct) to describe both end-to-end delta encoding and range selection as part of the same "instance manipulation" layer? Or is this excess generality? Also, we have to deal with the fact that the headers for range support are already defined, so it's not as if we could now glom these two things together into a common HTTP header mechanism. However, it's possible that some other instance manipulations might be proposed later (there's a paper on "cache-based compaction" which might possibly fit in here; see M. C. Chan and T. Woo. Cache-based Compaction: A New Technique for Optimizing Web Transfer. In Proc. IEEE Infocom '99, pages 117-125. New York, NY, March, 1999. for more details.) >(3) Creating a new message header (e.g., "DE") so that we would send: > HTTP/1.1 226 Delta > ETag: "1acl059" > DE: vcdiff > Delta-base: "337pey" > Date: Tue, 25 Nov 1997 18:30:05 GMT >and a new non-terminal, e.g., delta-encoding, and changing the BNF so >that "vcdiff", etc. 
are examples of delta-encoding, not content-coding. I don't think that's sufficient -- following Koen's rules, the etag from above would be assigned to the "vcdiff'ed" output, not to the current instance (i.e.; to whatever vcdiff compared 337pey to). There needs to be some way to tell the client "here's an identifier for the current instance". To repeat my point: the word "entity" causes endless confusion. An "entity tag" REALLY IS an "an identifier for the current instance." We just got the terms wrong in RFC2616. And Koen's rule doesn't apply if we declare that delta encoding is NOT a content-coding, and so uses the instance as an input. In which case I think the details are pretty straightforward. -Jeff From mogul@pa.dec.com Wed Mar 22 12:11:45 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id MAA15925; Wed, 22 Mar 2000 12:11:45 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA22073; Wed, 22 Mar 2000 12:11:45 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id MAA15691; Wed, 22 Mar 2000 12:11:44 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003222011.MAA15691@wera.pa.dec.com> To: Cc: http-delta@pa.dec.com Subject: Re: weak etags? In-Reply-To: Your message of "Wed, 22 Mar 2000 14:43:30 EST." <200003221944.OAA06558@lycanthrope.crosslink.net> Date: Wed, 22 Mar 2000 12:11:44 -0800 X-Mts: smtp I'm wondering if weak-etags may offer a possible solution to the "etag before or after delta coding" conundrum. I suspect not, but I don't have a deep understanding of weak etags. "Not" is correct. Weak etags aren't much use for delta-encoding, because you can assign the same weak etag to two different octet-strings. (E.g., if they are the same HTML file except for an advertising banner URL.) 
This makes delta DEcoding impossible, because if you can't be sure about whether you have exactly the right input strings (base instance and delta), you might generate garbage when you combine them in the decoding phase. -Jeff From danielh@crosslink.net Wed Mar 22 12:34:30 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id MAA07291; Wed, 22 Mar 2000 12:34:30 -0800 (PST) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA03036; Wed, 22 Mar 2000 12:34:29 -0800 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id MAA18579 for ; Wed, 22 Mar 2000 12:34:28 -0800 (PST) Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id PAA24947 for ; Wed, 22 Mar 2000 15:34:23 -0500 Message-Id: <200003222034.PAA24947@lycanthrope.crosslink.net> X-Really-To: Date: Wed, 22 Mar 2000 15:32:15 -0500 To: http-delta@pa.dec.com In-Reply-To: <200003222011.MAA15691@wera.pa.dec.com> Subject: Re: weak etags? X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 > I'm wondering if weak-etags may offer a possible solution to the > "etag before or after delta coding" conundrum. > I suspect not, but I don't have a deep understanding of weak etags. > >"Not" is correct. Weak etags aren't much use for delta-encoding, because >you can assign the same weak etag to two different >octet-strings. (E.g., if they are the same HTML file except for an >advertising banner URL.) Which is not the same as differences that are due to an application of content-encoding. So forget weak etags. 
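The weak/strong distinction discussed above can be sketched in Python as follows. The comparison rules follow the strong and weak comparison functions as I read them in RFC2616 section 13.3.3; the function names, and the idea of gating delta-base selection on a strong match, are illustrative assumptions rather than text from any draft.

```python
def parse_etag(raw):
    """Split an entity tag into (is_weak, opaque_tag), keeping the quotes."""
    raw = raw.strip()
    if raw.startswith("W/"):
        return True, raw[2:]
    return False, raw


def strong_match(a, b):
    """Strong comparison: identical tags, and neither may be weak."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a, b):
    """Weak comparison: identical opaque tags, weakness flag ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]


def usable_as_delta_base(cached_tag, offered_tag):
    # A delta can only be decoded against octet-identical input, so a
    # weak match (semantic equivalence only) is not good enough here.
    return strong_match(cached_tag, offered_tag)
```

Under these rules `W/"def"` weakly matches `"def"` but can never qualify as a delta base, which is the point made in the reply: two octet strings sharing a weak tag might differ in exactly the bytes the delta refers to.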
----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul@pa.dec.com Wed Mar 22 13:03:35 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA17805; Wed, 22 Mar 2000 13:03:34 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA12807; Wed, 22 Mar 2000 13:03:34 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA18108; Wed, 22 Mar 2000 13:03:34 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003222103.NAA18108@wera.pa.dec.com> To: danielh@crosslink.net Cc: http-delta@pa.dec.com, mogul@pa.dec.com Subject: Re: A possible problem: when are etags assigned In-Reply-To: Your message of "Wed, 22 Mar 2000 10:49:22 PST." <200003221849.KAA12274@wera.pa.dec.com> Date: Wed, 22 Mar 2000 13:03:34 -0800 X-Mts: smtp Daniel writes: Also, from digest-02 Note: the digest is computed before the application of any content-coding, because if a delta-content-coding [8] is used, the computation of the digest after the computation of the delta would not provide a digest useful for checking the integrity of the reassembled instance. You might want to add: content-coding or any range extraction, because if a delta-content-coding [8] is used, Actually, by clarifying things so that the "instance" is the output of the content-coding (possibly the identity coding), I can simplify the Digest spec by changing this note to be something like: Note: the digest is computed after the application of any content-coding, but before the application of any end-to-end delta coding[8], or any range extraction. The computation of the digest after the computation of the delta or range would not provide a digest useful for checking the integrity of the reassembled instance. 
Thanks -Jeff From danielh@crosslink.net Wed Mar 22 13:35:33 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA14891; Wed, 22 Mar 2000 13:35:33 -0800 (PST) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA13715; Wed, 22 Mar 2000 13:35:33 -0800 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA24458 for ; Wed, 22 Mar 2000 13:35:33 -0800 (PST) Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id QAA14083 for ; Wed, 22 Mar 2000 16:35:32 -0500 Message-Id: <200003222135.QAA14083@lycanthrope.crosslink.net> X-Really-To: Date: Wed, 22 Mar 2000 16:28:42 -0500 To: http-delta@pa.dec.com Subject: what is the instance? X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 >> Also, from digest-02 >> Note: the digest is computed before the application of any >> content-coding, because if a delta-content-coding [8] is used, >> the computation of the digest after the computation of the >> delta would not provide a digest useful for checking the >> integrity of the reassembled instance. >> You might want to add: >> content-coding or any range extraction, because if a >> delta-content-coding [8] is used, >Actually, by clarifying things so that the "instance" is >the output of the content-coding (possibly the identity >coding), I can simplify the Digest spec by changing this >note to be something like: > Note: the digest is computed after the application of any > content-coding, but before the application of any > end-to-end delta coding[8], or any range extraction. > The computation of the digest after the computation of the delta > or range would not provide a digest useful for checking the > integrity of the reassembled instance. 
But that's a major redefinition of "instance", as defined by:

   instance  The entity that would be returned in a status-200
             response to a GET request, at the current time, for
             the selected variant of the specified resource, but
             without the application of any content-coding or
             transfer-coding.

Thus, if gzip is used as a content coding, then instance is the "compressed" output. And this compressed output is basically useless when used as a base for future differences.

And doesn't it make more sense for the "instance tag" to ignore such transient concerns as what form of compression (or lack of compression) was applied to the content of interest?

So I'd argue that "instance" should retain its meaning; and some other term (say, "encoded-instance") be used for the results of content-coding.

-----------------------------------------------------------
Daniel Hellerstein   danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------

From danielh@crosslink.net Wed Mar 22 13:38:14 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA04945; Wed, 22 Mar 2000 13:38:14 -0800 (PST)
From:
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA22622; Wed, 22 Mar 2000 13:38:13 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA00790 for ; Wed, 22 Mar 2000 13:38:13 -0800 (PST)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id QAA15013 for ; Wed, 22 Mar 2000 16:38:11 -0500
Message-Id: <200003222138.QAA15013@lycanthrope.crosslink.net>
X-Really-To:
Date: Wed, 22 Mar 2000 16:37:28 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200003222006.MAA15788@wera.pa.dec.com>
Subject: Re: A possible problem: when are etags assigned
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05

> For example, I completely missed
the part about etags being assigned
> before range extraction; which now makes sense to me.....

>Even for byte-range retrievals, the ordering (entity tag stays constant
>for various different range retrievals) was obvious almost from the start
>- hence the If-Range header in HTTP/1.1. Sorry if that wasn't clear :-)

That's why I like to see RFCs err on the side of clarifying the seemingly obvious (though this does lead to bulkier documents).

>>>It's quite clear that the entity tag must be assigned
>>>*before* a delta content coding. Otherwise, the entity
>>>tag would be useless in deciding how to combine a delta
>>>with a previous instance.

>> Perhaps not useless, but crippled. Allow me to belabor the point,
>> just to make sure...
>> Consider the case:
>> a) at 1PM, the client requests foo.html, receives a response with an
>>    etag of "def"
>> b) at 8PM, the client re-requests foo.html, with If-none: "def" and
>>    Accept-encoding: Gdiff
>>    He receives a delta-content-encoded response, with an etag of "ghi"
>> If "ghi" refers to the "pre-encoded instance" from step b, then
>> c) at 9PM, the client can re-re-request foo.html, with If-none:
>>    "def","ghi"
>> The server then can use "ghi" (the instance used in step b),
>> which is probably more similar to the current (9PM) instance.

>Right - but what if another client requests, at 8pm, foo.html without
>delta-encoding, and gets the same instance as (b). Does that client
>receive "Etag: ghi" or some other entity tag? And if it is another
>entity tag, then we can have the perverse situation where an intermediate
>cache is storing the same instance under two entity tags.

As currently structured, the non-encoded response (to client 2) should get etag: "ghi". An intermediate cache will then store the response to the first client (the difference file) as "333pey", and the response to the second client (the unencoded stuff) as "ghi".
Note that a smart intermediate cache, that happened to have a copy of "abc", should be able to use "333pey" to generate a copy of "ghi".

> However, if "ghi" refers to the actual "entity body" (the difference
> file returned at 8PM), then "ghi" is almost certainly useless as a
> base-instance

>Please banish any thoughts about entity tags being associated with
>"actual entity bodies"! The word "entity" is going to give us all
>headaches.

Sounds good to me, but it's going to be a constant headache explaining it to others -- it's such an obvious conclusion to jump to!

>>>(B) As a practical matter, I believe that most (all?) existing servers
>>>would not recognize that foo.html and foo.html.gz are different ...
>
>> That may not be a disaster -- there's nothing saying that consecutive
>> responses for the same resource must have the same etag; they
>> strongly SHOULD, but it's not illegal if they don't. So if the default ...

>But delta encoding is only worth doing if it is likely to succeed often
>enough to amortize the protocol overheads (and implementation overheads).
>Note that our SIGCOMM paper suggests that aggregating references from
>lots of clients is an important way to improve the performance of delta
>encoding, so losing this aggregation by assigning multiple entity tags to
>the same instances is probably a mistake.

I agree -- I just was throwing up a possible palliative, one that I'm happy to see shot down.

>> Isn't the instance an "input" to content coding,
>> whereas the entity is an "output" from content coding?
>
>I think the problem is that the original HTTP/1.1 layering is:
>    variant
>      apply content-coding
>    [unnamed thing]               Also .. assign etag
>      apply range selection
>    entity
>      apply transfer-coding
>    message-body
>We want to insert delta encoding in two places; one is as a
>hop-by-hop transfer-coding, which still seems to be working.
The other is
>as an end-to-end coding, which would fit in like this:
>    variant
>      apply content-coding
>    [unnamed thing 1]
>      apply delta encoding
>    [unnamed thing 2]
>      apply range selection
>    entity
>      apply transfer-coding
>    message-body
>We tried to cram end-to-end delta encoding into the content-coding
>bucket, which meant that I got fuzzy about using the term "instance" to
>describe "unnamed thing 1" and/or "unnamed thing 2".

What about when the order is gdiff,gzip -- one first does a delta encoding, and then a more traditional content encoding? This order is much more likely to be useful than gzip,gdiff (gzip on the "snapshot", followed by gdiff of this compressed "file" on something held in common by server and client).

>One possible way to resolve this might be to add a general
>"instance manipulation" layer, i.e.,
>    variant
>      apply content-coding
>    instance
>      apply instance manipulation:
>        (delta encoding, range selection, etc.)
>    entity
>      apply transfer-coding
>    message-body
>And then I think it becomes clear that the "instance" is really the
>output of the content-coding, contrary to what I wrote in the I-D.

This seems to contradict:

   instance  The entity that would be returned in a status-200
             response to a GET request, at the current time, for
             the selected variant of the specified resource, but
             without the application of any content-coding or
             transfer-coding.

That is, the "instance" is what exists BEFORE content encoding of any kind. Am I confused? If so, what name should be given to a snapshot in the life of a resource?

>(I tried to make it the input to the content-coding because it has to be
>the input to the delta encoding, and we thought that we could make
>end-to-end delta encoding a content-coding.)
>>>(3) Creating a new message header (e.g., "DE") so that we would send:
>>>    HTTP/1.1 226 Delta
>>>    ETag: "1acl059"
>>>    DE: vcdiff
>>>    Delta-base: "337pey"
>>>    Date: Tue, 25 Nov 1997 18:30:05 GMT
>>>and a new non-terminal, e.g., delta-encoding, and changing the BNF so
>>>that "vcdiff", etc. are examples of delta-encoding, not content-coding.
>
>> I don't think that's sufficient -- following Koen's rules, the etag from
>> above would be assigned to the "vcdiff'ed" output, not to the current
>> instance (i.e.; to whatever vcdiff compared 337pey to). There needs to be
>> some way to tell the client "here's an identifier for the current
>> instance".

In the sense of: "I am retaining a full, un-encoded copy of the current instance, and you can refer to it using the following identifier".

>To repeat my point: the word "entity" causes endless confusion. An
>"entity tag" REALLY IS an "an identifier for the current instance." We
>just got the terms wrong in RFC2616.

Again, how are we defining "current instance" -- is it before or after old-fashioned (i.e.; GZIP) content encoding? If after, then the "entity tag" is an identifier of the current instance. If before, then "entity tag" does NOT identify the current instance.

I'm agnostic on received terminology (having been a marginal contributor to rfc2616). But I do like the notion of the "current instance" as meaning "preencoded" -- the body of a response that you would send if you had instantaneous communication and slow processing (though how headers fit into this puzzle requires some thought). Which means that terms are needed for

 a) what is produced by differencing the current instance against a commonly held instance
 b) what is produced by content-encoding a "current instance", or by content encoding the results of a differencing
 c) what is produced by range extraction

>And Koen's rule doesn't apply if we declare that delta encoding is NOT a
>content-coding, and so uses the instance as an input.
In which case I >think the details are pretty straightforward. Koen's rule: "etag is assigned after content encoding, but before range extraction". But where would delta occur -- before or after content coding. Should that be flexible (say, based on order of appearance in accept-encoding), or should it be dictated to always occur before content coding (which would simplify implementation) ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul@pa.dec.com Wed Mar 22 15:11:04 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA23436; Wed, 22 Mar 2000 15:11:04 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA14258; Wed, 22 Mar 2000 15:11:04 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA23720; Wed, 22 Mar 2000 15:11:03 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003222311.PAA23720@wera.pa.dec.com> To: Cc: http-delta@pa.dec.com Subject: Re: what is the instance? In-Reply-To: Your message of "Wed, 22 Mar 2000 16:28:42 EST." <200003222135.QAA14083@lycanthrope.crosslink.net> Date: Wed, 22 Mar 2000 15:11:03 -0800 X-Mts: smtp I wrote: >Actually, by clarifying things so that the "instance" is >the output of the content-coding (possibly the identity >coding), I can simplify the Digest spec by changing this >note to be something like: > Note: the digest is computed after the application of any > content-coding, but before the application of any > end-to-end delta coding[8], or any range extraction. > The computation of the digest after the computation of the delta > or range would not provide a digest useful for checking the > integrity of the reassembled instance. 
Daniel wrote: But that's a major redefinition of "instance", as defined by: instance The entity that would be returned in a status-200 response to a GET request, at the current time, for the selected variant of the specified resource, but without the application of any content-coding or transfer-coding. Correct. So I need to change that definition to instance The entity that would be returned in a status-200 response to a GET request, at the current time, for the selected variant of the specified resource, with the application of zero or more content-codings, but without the application of any end-to-end delta-encoding, range selection, or transfer-coding. Or perhaps replace "any end-to-end delta-encoding, range selection," with "any instance manipulation". Thus, if gzip is used as a content coding, then instance is the "compressed" output. And this compresssed output is basically useless when used as a base for future differences. I'm not sure I understand this. If you are taking the delta between two versions (instances) of foo.html.gz, it should work. What wouldn't work is if you wanted to cache only the uncompressed representation - but content-coding is end-to-end, and so caches aren't supposed to store the decoded version unless they can transparently restore the coding (or store a second, encoded copy as well). And doesn't it make more sense for the "instance tag" to ignore such transient concerns as what form of compression (or lack of compression) was applied to the content of interest? It depends what you mean by "transient". Transfer-codings are definitely transient, but as a practical matter, many servers implement Content-codings by storing the coded version, not the plaintext. Which makes these not so transient. However, one could certainly argue that an origin server should be able to generate either a compressed content-coding of a resource (for a cache's first retrieval), or an end-to-end delta encoded representation (for a second retrieval). 
And since (as far as I know) differencing algorithms such as vdelta don't do as well if the inputs are compressed, what you might really want is a form of "vcdiff" where both the encoder and the decoder decompress the base instance before doing the differencing or reconstitution. But then we need a protocol syntax to specify that the receiver needs to do this step. The problem is made trickier because *in theory* there are potentially two kinds of Content-codings: those that are effectively compressions (and so one might want to remove them before computing a delta), and those that don't interfere with computing a delta, and so don't need to be removed. In practice, no content-coding of the latter class has yet been defined. So I'ld argue that "instance" should retain it's meaning; and some other term (say, "encoded-instance") be used for the results of content-coding. Well, this begs the question of what to associate the entity tag with. Is it the instance or the content-coded-instance? I don't think it's feasible to modify the entity tag mechanisms already in RFC2616! I'll think about this some more. Any suggestions from other people. 
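The remark that differencing does poorly on compressed inputs can be checked with a rough sketch. It assumes zlib's deflate as a stand-in for a compressing content-coding and uses a deliberately crude similarity measure (shared prefix plus shared suffix); it is not vdelta or vcdiff, and the sample data and function name are hypothetical.

```python
import zlib


def shared_fraction(a: bytes, b: bytes) -> float:
    """Fraction of the shorter input covered by the common prefix plus suffix."""
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and a[len(a) - 1 - suffix] == b[len(b) - 1 - suffix]):
        suffix += 1
    return (prefix + suffix) / limit


# Two instances of the same resource that differ in a single short field.
old = b"".join(b"item %d: price %d\n" % (i, i * 7 % 100) for i in range(300))
new = old.replace(b"item 5:", b"ITEM 5!", 1)

# Similarity a differencer could exploit on the plain instances ...
plain = shared_fraction(old, new)
# ... versus on the compressed forms of the same two instances.
packed = shared_fraction(zlib.compress(old), zlib.compress(new))
```

Comparing `plain` with `packed` shows how much of the shared structure survives compression; decompressing the base before differencing, as suggested above, recovers the plain-text similarity.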
-Jeff From danielh@crosslink.net Wed Mar 22 19:57:32 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id TAA01710; Wed, 22 Mar 2000 19:57:32 -0800 (PST) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA21671; Wed, 22 Mar 2000 19:57:32 -0800 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id TAA26973 for ; Wed, 22 Mar 2000 19:57:31 -0800 (PST) Received: from smtp.crosslink.net (dyn37.c5200-1.springfield.236.crosslink.net [207.199.142.38]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id WAA28060 for ; Wed, 22 Mar 2000 22:57:28 -0500 Message-Id: <200003230357.WAA28060@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Wed, 22 Mar 2000 22:54:03 -0500 To: http-delta@pa.dec.com Subject: What is the instance? X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Jeff wrote > So I need to change that definition to > instance The entity that would be returned in a status-200 > response to a GET request, at the current time, for > the selected variant of the specified resource, > with the application of zero or more content-codings, > but without the application of any end-to-end > delta-encoding, range selection, or transfer-coding. This needs careful thought -- it's a big change from the prior meaning of instance (as the content BEFORE any content coding). More below on this ... >> Thus, if gzip is used as a content coding, then instance is the >> "compressed" output. And this compresssed output is basically >> useless when used as a base for future differences. >I'm not sure I understand this. If you are taking the delta between two >versions (instances) of foo.html.gz, it should work. 
What wouldn't work >is if you wanted to cache only the uncompressed representation >- but content-coding is end-to-end, and so caches aren't supposed to store the >decoded version unless they can transparently restore the coding (or >store a second, encoded copy as well). The "useless" refers to the difficulty (as Jeff notes below) of getting a useful delta between two gzipped (or otherwise compressed) versions of nearly the same thing. The use of a common denominator (of both client and server caching decoded versions) makes it easier to generate deltas in the future. Given this, some identifier is needed for these un-encoded versions; something similar to etag. The trick is for the server to send two tags, a standard "etag" for the content contained in the current response (such as a difference file, a gzipped file, a range of either of these, etc.), and an "itag" (or "o-etag"?) that identifies the original (un encoded & un differenced) content, a copy of which the server is presumably committed to retaining for awhile. And I don't see why this "wouldn't work". Yes, this does complicate matters for an intermediate cache -- it should retain the response as sent (that is, encoded) identified by the "etag", and also a decoded version identified by the "itag". This allows the cache to perform a delta. However, if the cache does not attempt to save a decoded version, you are no worse off (since the "itag" does not match the encoded version, it will pass through a request that contains an If-None: "itag_value" request header) >However, one could certainly argue that an origin server should be able >to generate either a compressed content-coding of a resource (for a >cache's first retrieval), or an end-to-end delta encoded representation >(for a second retrieval). 
And since (as far as I know) differencing
>algorithms such as vdelta don't do as well if the inputs are compressed,
>what you might really want is a form of "vcdiff" where both the encoder
>and the decoder decompress the base instance before doing
>the differencing or reconstitution. But then we need a protocol syntax
>to specify that the receiver needs to do this step.

>The problem is made trickier because *in theory* there are
>potentially two kinds of Content-codings: those that are
>effectively compressions (and so one might want to remove
>them before computing a delta), and those that don't
>interfere with computing a delta, and so don't need to be
>removed. In practice, no content-coding of the latter
>class has yet been defined.

Hence my point about use of un-encoded versions as the "common denominator".

> So I'd argue that "instance" should retain its meaning; and some
> other term (say, "encoded-instance") be used for the results of
> content-coding.

>Well, this begs the question of what to associate the entity tag with.
>Is it the instance or the content-coded-instance? I don't think it's
>feasible to modify the entity tag mechanisms already in RFC2616!

It seems we're stuck with the etag being assigned after content-coding. Hence my idea of an "itag", which is really an etag for something (I want to say an entity, but I'm afraid to) that was never sent, but which knowledgeable parties can readily create.

>I'll think about this some more. Any suggestions from other people.

!
----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul@pa.dec.com Thu Mar 23 11:32:36 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA29071; Thu, 23 Mar 2000 11:32:36 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA06686; Thu, 23 Mar 2000 11:32:36 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA31495; Thu, 23 Mar 2000 11:32:36 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003231932.LAA31495@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: What is the instance? In-Reply-To: Your message of "Wed, 22 Mar 2000 22:54:03 EST." <200003230357.WAA28060@lycanthrope.crosslink.net> Date: Thu, 23 Mar 2000 11:32:36 -0800 X-Mts: smtp writes: Jeff wrote > So I need to change that definition to > instance The entity that would be returned in a status-200 > response to a GET request, at the current time, for > the selected variant of the specified resource, > with the application of zero or more content-codings, > but without the application of any end-to-end > delta-encoding, range selection, or transfer-coding. This needs careful thought -- it's a big change from the prior meaning of instance (as the content BEFORE any content coding). Right - but I invented the definition of "instance" as part of the design of the delta encoding specification. I don't think anyone else has based any designs on this definition. So if the definition is wrong, then I think it's appropriate to change it. Given this, some identifier is needed for these un-encoded versions; something similar to etag.
The trick is for the server to send two tags, a standard "etag" for the content contained in the current response (such as a difference file, a gzipped file, a range of either of these, etc.), and an "itag" (or "o-etag"?) that identifies the original (un encoded & un differenced) content, a copy of which the server is presumably committed to retaining for awhile. I think adding another identifier is a major increase in complexity. It seems to me that if we could avoid this step, without having to do something ugly, it would be worth something. I'm still trying to think this through, though. -Jeff From danielh@crosslink.net Thu Mar 23 12:24:25 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id MAA01169; Thu, 23 Mar 2000 12:24:24 -0800 (PST) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA32012; Thu, 23 Mar 2000 12:24:24 -0800 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id MAA17251 for ; Thu, 23 Mar 2000 12:24:23 -0800 (PST) Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id PAA16902 for ; Thu, 23 Mar 2000 15:24:18 -0500 Message-Id: <200003232024.PAA16902@lycanthrope.crosslink.net> X-Really-To: Date: Thu, 23 Mar 2000 14:49:25 -0500 To: http-delta@pa.dec.com In-Reply-To: <200003231932.LAA31495@wera.pa.dec.com> Subject: Instances again X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Jeff wrote: >>> So I need to change that definition to >> > instance The entity that would be returned in a status-200 >> > response to a GET request, at the current time, for >> > the selected variant of the specified resource, >> > with the application of zero or more content-codings, >> > but without the application of any end-to-end >> > delta-encoding, range selection, or transfer-coding. 
Daniel wrote:
>> This needs careful thought -- it's a big change from the prior
>> meaning of instance (as the content BEFORE any content coding).

>Right - but I invented the definition of "instance" as part of the design
>of the delta encoding specification. I don't think anyone else has based
>any designs on this definition.
>So if the definition is wrong, then I think it's appropriate to change
>it.

Of course. However, I am arguing that the prior meaning should be retained.

Daniel wrote:
>> Given this, some identifier is needed for these un-encoded versions;
>> something similar to etag. The trick is for the server to send two
>> tags, a standard "etag" for the content contained in the current response
>> (such as a difference file, a gzipped file, a range of either of these, etc.),
>> and an "itag" (or "o-etag"?) that identifies the original (unencoded &
>> un differenced) content, a copy of which the server is presumably
>> committed to retaining for awhile.
>
>I think adding another identifier is a major increase in complexity. It
>seems to me that if we could avoid this step, without having to do
>something ugly, it would be worth something. I'm still trying to think
>this through, though.

I don't see it as being that big an increase in complexity. And I don't see any way around the need for some way for the client & server to identify a "snapshot of the resource" that can be used for future deltas. And there will be plenty of cases (i.e.; when a content coding has been applied) in which the etag just won't work.

Consider a scenario: when a server maintains a set of "pre encoded" versions -- e.g., foo.html and foo.html.gz are both available.
Then, a request of

   GET /foo.html HTTP/1.1
   Accept-Encoding: vcdiff

would cause the server to send foo.html using

   HTTP/1.1 200 Okay
   Etag: "abc0"
   Date: Tue, 15 Mar 2000 18:30:05 GMT
   Content-Length: 2000

In contrast, a request of

   GET /foo.html HTTP/1.1
   Accept-Encoding: diff-e, gzip

would cause the server to return foo.html.gz:

   HTTP/1.1 200 Okay
   Etag: "abcG"
   O-etag: "abc0"
   Date: Tue, 15 Mar 2000 18:30:08 GMT
   Content-Length: 1251
   Content-Encoding: gzip

The only complication is that the server must assign two tags -- one for foo.html.gz, and one for foo.html. But that's a minor hassle, given that the server has to know that foo.html.gz is the "gzip encoding" of foo.html.

Note that the client, upon receipt of this, should un-gzip the response, and cache both the response (identified with the "abcG" etag) and this un-gzipped version (identified with "abc0"). If, an hour later, the client wants a new version, he could use:

   GET /foo.html HTTP/1.1
   Accept-Encoding: vcdiff, gzip
   If-None-Match: "abc0"

Note the O-etag (or "itag", or whatever) header would only be added when there is evidence that the client is delta aware (say, because vcdiff is included in accept-encoding).

Alternatively, special etags could be sent -- say "abcG;;abc0" -- the first semi-colon used to indicate a variant-list validator, and the second to indicate the "un content encoded instance" tag. Well, this might break some cache's abilities to do content negotiation, but perhaps there is some other syntax that would work.
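The two-tag scenario above can be sketched roughly as follows. This is only an illustration under stated assumptions: "O-etag" is the hypothetical header proposed in this message, the tag scheme (a truncated MD5 hex digest) and the helper names are invented for the example, and Accept-Encoding q-values are ignored.

```python
import gzip
import hashlib

# Delta codings named in this thread; their presence in Accept-Encoding
# is taken as evidence that the client is delta aware.
DELTA_CODINGS = {"vcdiff", "diff-e"}


def tag_for(data: bytes) -> str:
    """Hypothetical tag scheme: a quoted, truncated MD5 hex digest."""
    return '"%s"' % hashlib.md5(data).hexdigest()[:8]


def respond(instance: bytes, accept_encoding: str):
    """Return (headers, body) for a status-200 response to a GET."""
    codings = {c.strip().lower() for c in accept_encoding.split(",") if c.strip()}
    headers = {}
    body = instance
    if "gzip" in codings:
        body = gzip.compress(instance)
        headers["Content-Encoding"] = "gzip"
        if codings & DELTA_CODINGS:
            # Second tag: names the un-encoded instance, usable as a delta base.
            headers["O-etag"] = tag_for(instance)
    # The ordinary etag is assigned to what is sent, after content-coding.
    headers["Etag"] = tag_for(body)
    headers["Content-Length"] = str(len(body))
    return headers, body


page = b"<html><body>" + b"hello delta " * 50 + b"</body></html>"
plain_headers, plain_body = respond(page, "vcdiff")
gz_headers, gz_body = respond(page, "diff-e, gzip")
```

A client that un-gzips `gz_body` can file the result under the O-etag value, which in this sketch equals the Etag a non-gzip client receives for the same instance.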
-----------------------------------------------------------
Daniel Hellerstein   danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------

From avh@marimba.com Thu Mar 23 12:30:51 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id MAA01518; Thu, 23 Mar 2000 12:30:50 -0800 (PST)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA14909; Thu, 23 Mar 2000 12:30:50 -0800
Received: from cobra.marimba.com (acheron.marimba.com [207.126.123.64]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id MAA22660; Thu, 23 Mar 2000 12:30:50 -0800 (PST)
Received: by cobra with Internet Mail Service (5.5.2650.21) id ; Thu, 23 Mar 2000 12:28:34 -0800
Message-Id: <68C8F96D4999D311B0550008C71AA8AFA4D74F@cobra>
From: Arthur van Hoff
To: "'Jeffrey Mogul'" , http-delta@pa.dec.com
Subject: RE: What is the instance?
Date: Thu, 23 Mar 2000 12:28:33 -0800
Mime-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain; charset="iso-8859-1"

Hi Jeff,

Here are my 2 cents. In our products we use delta encoding purely based on MD5 checksums. This could reduce the confusion somewhat. In my opinion a delta should be specified to be the delta between two clearly identified versions of an original file/resource/instance. This is easily and unambiguously done using checksums instead of etags.

I've never really liked this use of etags because they have such a confusing definition. Would it be possible to use the instance digest header from draft-mogul-http-digest-02.txt instead of inventing additional headers besides etag?

Have fun,

Arthur van Hoff

> -----Original Message-----
> From: Jeffrey Mogul [mailto:mogul@pa.dec.com]
> Sent: Thursday, March 23, 2000 11:33 AM
> To: http-delta@pa.dec.com
> Subject: Re: What is the instance?
> > > writes: > Jeff wrote > > So I need to change that definition to > > instance The entity that would be returned in > a status-200 > > response to a GET request, at the > current time, for > > the selected variant of the specified > resource, > > with the application of zero or more content-codings, > > but without the application of any end-to-end > > delta-encoding, range selection, or transfer-coding. > > This needs careful thought -- it's a big change from the prior > meaning of instance (as the content BEFORE any content coding). > > Right - but I invented the definition of "instance" as part of > the design of the delta encoding specification. I don't think > anyone else has based any designs on this definition. > > So if the definition is wrong, then I think it's appropriate to > change it. > > Given this, some identifier is needed for these > un-encoded versions; > something similar to etag. The trick is for the server > to send two tags, > a standard "etag" for the content contained in the > current response (such > as a difference file, a gzipped file, a range of either > of these, etc.), > and an "itag" (or "o-etag"?) that identifies the > original (un encoded & > un differenced) content, a copy of which the server is presumably > committed to retaining for awhile. > > I think adding another identifier is a major increase in complexity. > It seems to me that if we could avoid this step, without having to > do something ugly, it would be worth something. I'm still trying to > think this through, though. 
> > -Jeff > From danielh@crosslink.net Thu Mar 23 13:15:24 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA01930; Thu, 23 Mar 2000 13:15:24 -0800 (PST) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA00769; Thu, 23 Mar 2000 13:15:24 -0800 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA24219 for ; Thu, 23 Mar 2000 13:15:23 -0800 (PST) Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id QAA01529 for ; Thu, 23 Mar 2000 16:15:22 -0500 Message-Id: <200003232115.QAA01529@lycanthrope.crosslink.net> X-Really-To: Date: Thu, 23 Mar 2000 16:09:08 -0500 To: http-delta@pa.dec.com In-Reply-To: <68C8F96D4999D311B0550008C71AA8AFA4D74F@cobra> Subject: RE: What is the instance? X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Athur wrote: >Here are my 2 cents. In our products we use delta >encoding purely based on MD5 checksums. This could >reduce the confusion somewhat. In my opion a delta >should be specified to be the delta between two clearly >identified versions of an original file/resource/instance. >This is easily and unambigously done using checksums instead of etags. >I've never really liked this use of etags because they have such a >confusing definition. Would it be possible to use the instance digest >header from draft-mogul-http-digest-02.txt instead of inventing >additional headers besides etag? But the content-md5 header is defined on the post content-encoded (and pre transfer-encoded) contents of a response (dare I say entity body). Thus, it suffers from the same problem as the etag -- it does not identify the original file/resource/instance. So, are you recommending the value of the instance digest header be used as an "instance tag"? 
Which would mean that the instance digest header would become an inseperable part of delta encoding. ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From avh@marimba.com Thu Mar 23 13:50:19 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA05779; Thu, 23 Mar 2000 13:50:19 -0800 (PST) Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA20235; Thu, 23 Mar 2000 13:50:18 -0800 Received: from cobra.marimba.com (acheron.marimba.com [207.126.123.64]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA29118 for ; Thu, 23 Mar 2000 13:50:18 -0800 (PST) Received: by cobra with Internet Mail Service (5.5.2650.21) id ; Thu, 23 Mar 2000 13:48:03 -0800 Message-Id: <68C8F96D4999D311B0550008C71AA8AFA4D751@cobra> From: Arthur van Hoff To: "'danielh@crosslink.net'" , http-delta@pa.dec.com Subject: RE: What is the instance? Date: Thu, 23 Mar 2000 13:48:02 -0800 Mime-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: text/plain; charset="iso-8859-1" Hi Daniel, In our products there is no content-encoding as such, only delta-encoding and transfer-encoding. Delta encoding is done first, following by optional compression. We do support range requests, but not combined with delta-encoding, that got too complicated too quickly. In our case the content-encoding of the resource is implied by a mime type, and as a result we can use the MD5 which was computed over the content-encoded resource that is ultimately being transmitted. Jeff wrote a while back: > Consider this scenario: > (1) Content author creates foo.html > (2) some software does "gzip -c foo.html >foo.html.gz" > > Should foo.html and foo.html.gz have the same entity tag? 
In our case we assume that content author creates foo.html or foo.html.gz, but that there is no further automatic content encoding. In that scenario there is only one original resource, either foo.html or foo.html.gz, and thus they have different checksums. Now this takes me back to Jeff's original comment: > So it looks like we have a contradiction: the entity tag must > be assigned before a delta content-coding, but after content-codings > in general. Ouch. It appears that we need to define delta-encoding as something which happens after content-encoding and before transfer-encoding. If that is the case, then could we use the content-md5 header? Have fun, Arthur van Hoff > -----Original Message----- > From: danielh@crosslink.net [mailto:danielh@crosslink.net] > Sent: Thursday, March 23, 2000 1:09 PM > To: http-delta@pa.dec.com > Subject: RE: What is the instance? > > > Athur wrote: > >Here are my 2 cents. In our products we use delta > >encoding purely based on MD5 checksums. This could > >reduce the confusion somewhat. In my opion a delta > >should be specified to be the delta between two clearly > >identified versions of an original file/resource/instance. > >This is easily and unambigously done using checksums instead > of etags. > >I've never really liked this use of etags because they have such a > >confusing definition. Would it be possible to use the instance digest > >header from draft-mogul-http-digest-02.txt instead of inventing > >additional headers besides etag? > > But the content-md5 header is defined on the post content-encoded (and > pre transfer-encoded) contents of a response (dare I say > entity body). > Thus, it suffers from the same problem as the etag -- it does not > identify the original file/resource/instance. > > So, are you recommending the value of the instance digest > header be used as > an "instance tag"? Which would mean that the instance digest > header would > become an inseperable part of delta encoding. 
> > > > ----------------------------------------------------------- > Daniel Hellerstein > danielh@crosslink.net > http://www.srehttp.org > ----------------------------------------------------------- > From mogul@pa.dec.com Thu Mar 23 15:05:05 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA07374; Thu, 23 Mar 2000 15:05:05 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA01977; Thu, 23 Mar 2000 15:05:05 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA06634; Thu, 23 Mar 2000 15:05:05 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003232305.PAA06634@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: What is the instance? In-Reply-To: Your message of "Thu, 23 Mar 2000 12:28:33 PST." <68C8F96D4999D311B0550008C71AA8AFA4D74F@cobra> Date: Thu, 23 Mar 2000 15:05:05 -0800 X-Mts: smtp Arthur van Hoff writes: Here are my 2 cents. In our products we use delta encoding purely based on MD5 checksums. This could reduce the confusion somewhat. In my opion a delta should be specified to be the delta between two clearly identified versions of an original file/resource/instance. This is easily and unambigously done using checksums instead of etags. I've never really liked this use of etags because they have such a confusing definition. Would it be possible to use the instance digest header from draft-mogul-http-digest-02.txt instead of inventing additional headers besides etag? First of all, I don't think there's a need to invent extra identification headers - I'm working up a set of scenarios to show how things should work without that, but I'm not sure I'll finish that today. Second, if you think about it, the "entity tag" carried in the ETag: header is really an "instance tag". That is, a unique (strong) entity tag is associated with a unique instance. 
But the instance digest has to be assigned at precisely the same point in the processing pipeline, because it's also unique per instance. So if you are computing an MD5 digest to produce a Digest: header, then you can use the same string as the entity tag. However, there is no requirement that these strings be the same. In short, you can use essentially the mechanism you are already using in your product - compute an MD5 digest of the instance - and stick it in the ETag: header. If you want to add the ability to do an end-to-end integrity check, you would need to send the same string in a Digest: header, because a client is required to treat the ETag: header value as an opaque value (i.e., it cannot assume that this value is a digest of the instance!) Which means that there is a slight inefficiency, with the protocol headers carrying the same string twice. On the other hand, an existing server that uses some other scheme to construct entity tags (for example, I believe that IIS/5.0 isn't using an MD5 value, since their ETag is apparently a 20-nibble hex encoding of an 80-bit value) would not have to modify its entity-tag creation code in order to support delta encoding. I.e., it's certainly not required to use an MD5 digest as the entity tag. Daniel Hellerstein adds: But the content-md5 header is defined on the post content-encoded (and pre transfer-encoded) contents of a response (dare I say entity body). Thus, it suffers from the same problem as the etag -- it does not identify the original file/resource/instance. That's a little confused. The Content-MD5 header does provide a digest of the "entity-body", not of the instance, so you're right that it isn't sufficient for our purposes. However, I'm pretty sure that Arthur was referring to a "Digest:" header digest, not a "Content-MD5:" header digest - and we defined the "Digest:" header to be an instance digest, specifically for this purpose. 
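A minimal sketch of the point above, assuming a server that chooses to derive both headers from one MD5 pass over the instance. The function names are hypothetical, and the `md5=<base64>` value syntax for the Digest: header is an assumption (it is the form later standardized in RFC 3230; this thread does not quote the draft's syntax). The client-side check reads only the Digest: header and never interprets the entity tag, since that value is opaque.

```python
import base64
import hashlib


def instance_headers(instance: bytes) -> dict:
    """Build ETag and Digest headers from one MD5 pass over the instance.

    Hypothetical helper: a server is free to build its entity tags some
    other way; reusing the digest string is only a convenience.
    """
    b64 = base64.b64encode(hashlib.md5(instance).digest()).decode("ascii")
    return {
        "ETag": '"' + b64 + '"',   # opaque to clients
        "Digest": "md5=" + b64,    # assumed syntax, see lead-in
    }


def verify_instance(instance: bytes, headers: dict) -> bool:
    """End-to-end integrity check that uses only the Digest header."""
    value = headers.get("Digest", "")
    if not value.startswith("md5="):
        return False  # no usable digest; the ETag must not be parsed instead
    expected = base64.b64decode(value[len("md5="):])
    return hashlib.md5(instance).digest() == expected
```

The duplicated string in the two headers is the "slight inefficiency" mentioned above; a server with a different entity-tag scheme would change only the "ETag" line.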
Remember, just because it is called an "entity tag" does not mean that it has any actual connection with the "entity-body"! (I must be getting rather tiresome on this point by now.) So the "entity tag", which is really an "instance tag", is indeed the right thing for identifying an instance. But so is an instance digest - however, I think we want to continue to define the delta encoding protocol in terms of entity tags, so that servers aren't required to send both the entity tag and the instance digest. -Jeff From mogul@pa.dec.com Thu Mar 23 15:10:53 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA28283; Thu, 23 Mar 2000 15:10:53 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA23062; Thu, 23 Mar 2000 15:10:53 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA06595; Thu, 23 Mar 2000 15:10:53 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003232310.PAA06595@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: What is the instance? In-Reply-To: Your message of "Thu, 23 Mar 2000 13:48:02 PST." <68C8F96D4999D311B0550008C71AA8AFA4D751@cobra> Date: Thu, 23 Mar 2000 15:10:53 -0800 X-Mts: smtp Arthur van Hoff writes: In our products there is no content-encoding as such, only delta-encoding and transfer-encoding. Delta encoding is done first, following by optional compression. Am I correct in assuming that this optional compression is conceptually part of the delta encoding, rather than a separate coding step? I.e., something like the output of the pipeline diff -e | gzip or like the vcdiff format, which inherently combines compression with the delta encoding? In our case we assume that content author creates foo.html or foo.html.gz, but that there is no further automatic content encoding. In that scenario there is only one original resource, either foo.html or foo.html.gz, and thus they have different checksums. 
This is consistent with treating the output of the content-coding as the instance - i.e., the point at which the entity tag is assigned and the instance digest is computed. In some cases (e.g., foo.html) the content-coding is the identity transformation. Now this takes me back to Jeff's original comment: > So it looks like we have a contradiction: the entity tag must > be assigned before a delta content-coding, but after content-codings > in general. Ouch. It appears that we need to define delta-encoding as something which happens after content-encoding and before transfer-encoding. Right. I'm leaning towards calling this the instance-manipulation step, which seems like it might include other things besides delta encoding. Perhaps it conceptually includes range selection, although we can't actually touch this because it's already specified in HTTP/1.1. If that is the case, then could we use the content-md5 header? No, for the reasons I gave in the previous message. -Jeff From danielh@crosslink.net Thu Mar 23 15:27:51 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA09221; Thu, 23 Mar 2000 15:27:51 -0800 (PST) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA04170; Thu, 23 Mar 2000 15:27:51 -0800 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id PAA29260 for ; Thu, 23 Mar 2000 15:27:50 -0800 (PST) Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id SAA10524 for ; Thu, 23 Mar 2000 18:27:45 -0500 Message-Id: <200003232327.SAA10524@lycanthrope.crosslink.net> X-Really-To: Date: Thu, 23 Mar 2000 18:19:35 -0500 To: http-delta@pa.dec.com Subject: instance redux X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Arthur said ... 
>In our case we assume that content author creates foo.html >or foo.html.gz, but that there is no further automatic content encoding. >In that scenario there is only one original resource, either foo.html or >foo.html.gz, and thus they have different checksums. >.... >It appears that we need to define delta-encoding as something >which happens after content-encoding and before transfer-encoding. Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA09658; Thu, 23 Mar 2000 16:26:45 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA17669; Thu, 23 Mar 2000 16:26:45 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA11712; Thu, 23 Mar 2000 16:26:45 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003240026.QAA11712@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: instance redux In-Reply-To: Your message of "Thu, 23 Mar 2000 18:19:35 EST." <200003232327.SAA10524@lycanthrope.crosslink.net> Date: Thu, 23 Mar 2000 16:26:45 -0800 X-Mts: smtp writes: I've been advocating a "two tag" solution, but there is an alternative: whenever a delta is asked for an instance that was content-encoded, the server should compute the delta against it's decoded version of the instance, and the client should apply the resulting difference to it's copy of the decoded instance. I've been working on a similar approach, but with a slight twist. I've added clarifying comments in [] brackets, and the twist is in () parentheses: whenever a delta is asked for a [base] instance that was [originally received] content-encoded, (AND when the delta response does not carry a content-coding, then) the server should compute the delta against its decoded version of the instance, and the client should apply the resulting difference to its copy of the decoded instance. 
The difference is that this allows a simpler implementation of the scenario where the server always stores the encoded version (e.g., foo.html.gz) in its file system, and so it computes the delta between two different instances of foo.html.gz, rather than foo.html. Which might or might not be the most efficient in terms of coding density, but I think it's a potentially useful extension of your rule. I think it may also make sense to use a "deferred evaluation" approach, at the receiving cache, to the decoding stage (if necessary). I.e., the cache should store the responses as received, not after content decoding, and any necessary content decoding is then done at the last minute. This requires that the server and client: a) can identify which etags come from responses that have been content-encoded No problem, because the cache (client or proxy) stores the response as sent by the server, which includes the Content-encoding: header. And the server consistently assigns the entity tag after the content-coding step (which might be an identity coding). I don't think there is any need to mark the entity tag as being associated with a content-coded response - the responses themselves are marked with Content-encoding: headers. b) use decoded instances to perform deltas WHENEVER this sort of etag is used in a delta enabled request, Right, except that I would say "whenever this sort of response", not "this sort of entity tag" - and (per my twist above) not when both the original (200) response and the delta (226) response are marked with the same Content-Encoding: header. c) use the encoded instance when these etags are used in a plain vanilla conditional GET. Not an issue, if you accept my point of view that the entity tag is assigned to the output of the (possibly identity) content coding. 
-Jeff From mogul Fri Mar 24 11:30:32 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA16694; Fri, 24 Mar 2000 11:30:32 -0800 (PST) Message-Id: <200003241930.LAA16694@wera.pa.dec.com> To: http-delta From: Reply-To: danielh@crosslink.net Original-Date: Thu, 23 Mar 2000 22:47:51 -0500 In-Reply-To: <200003240026.QAA11712@wera.pa.dec.com> Subject: Re: instance redux X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Date: Fri, 24 Mar 2000 11:30:32 -0800 Sender: mogul X-Mts: smtp danielh wrote: > I've been advocating a "two tag" solution, but there is an > alternative: > whenever a delta is asked for an instance that was content-encoded, > the server should compute the delta against it's decoded version of > the instance, and the client should apply the resulting difference > to it's copy of the decoded instance. Jeff responded: >>I've been working on a similar approach, but with a slight twist. I've >>added clarifying comments in [] brackets, and the twist is in () >>parentheses: >> whenever a delta is asked for a [base] instance that was >> [originally received] content-encoded, (AND when the delta >> response does not carry a content-coding, then) >> the server should compute the delta against its decoded version of >> the instance, and the client should apply the resulting difference >> to its copy of the decoded instance. Okay, I give up. Forget the two tag solution (even though I kind of like it), this approach is reasonable. That said ... I'm having trouble understanding the meaning of the above. I assume the first part means: whenever a client requests a delta against a base instance that was original recieved (by this client), and this base instance was content encoded, the server should compute a delta of the current variant against a decoded version this base instance. but what does (AND when the delta response does not carry a content-coding, then) mean. 
My guess is: If the server then sends a delta response that also carries another content coding, then the delta response is against the base instance as it was originally sent (that is, without decoding). If that's what you mean, I can't agree -- that would break a content-encoding: diff-e,gzip response (that is, a gzip of a diff-e of the current contents against the decoded base instance). (more on this below, I think)

>The difference is that this allows a simpler implementation of the
>scenario where the server always stores the encoded version (e.g.,
>foo.html.gz) in its file system, and so it computes the delta between two
>different instances of foo.html.gz, rather than foo.html. Which might or
>might not be the most efficient in terms of coding density, but I think
>it's a potentially useful extension of your rule.

Since most encodings are going to break delta applications, it's sort of pointless to bother with supporting delta for pre-encoded files that are not identified as such. That is: If the server were to deliver foo.html.gz WITHOUT a content-encoding (and hope the client intuits the need to unGZIP it), then the server is implicitly punting on future deltas. If the server includes an explicit Content-Encoding: gzip, then the rule of "undo the list of content-codings before delta computation" is straightforward (though its speed may depend on how clever the server is about retaining both the unencoded and encoded versions).

>I think it may also make sense to use a "deferred evaluation" approach,
>at the receiving cache, to the decoding stage (if necessary). I.e., the
>cache should store the responses as
>received, not after content decoding, and any necessary content decoding
>is then done at the last minute.

I leave that to implementors to determine -- it will probably depend on how popular delta requesting becomes!
>> This requires that the server and client: >> a) can identify which etags come from responses that >> have been content-encoded >No problem, because the cache (client or proxy) stores the >response as sent by the server, which includes the Content-encoding: >header. And the server consistently assigns the entity tag after the >content-coding step (which might be an identity coding). An aside: I think the conclusion is that the etag does identify the instance, given your new definition of instance (as the stuff AFTER content coding, but before range extraction and before transfer coding). {Perhaps that should be made explicit -- clients and servers who support delta are agreeing that the etag is associated with the post-content-encoded, pre range extract, and pre transfer encoded, contents.} Therefore: In addition to the server (and client) associating resource/etag pairs to "cached" copies of base instances, the server (and client) should also retain the set of codings used to create this instance. That is, the association is between a "resource/etag" pair and a "base-instance/content-codings-used-to-create-this-base-instance" pair. >I don't think there is any need to mark the entity tag as >being associated with a content-coded response - the responses themselves >are marked with Content-encoding: headers. I agree. My suggestion was just a hack to implement the latter association (that is, the etag would contains a summary of the Content-encoding headers) > b) use decoded instances to perform deltas WHENEVER this sort of > etag is used in a delta enabled request, >Right, except that I would say "whenever this sort of response", not >"this sort of entity tag" - and (per my twist above) not when both the >original (200) response and the delta (226) response are marked with the >same Content-Encoding: header. 
I don't quite get this point -- unless you are saying that the delta encoding info should NO LONGER be included in the content-encoding (or accept-encoding) headers. I don't think that's necessary -- so long as both clients and servers adhere to the "use content-encoding header to decode base instances before computing/applying differences" rule, there is no need to change the spec. Which is the biggest advantage of this approach over the two tag approach.

> c) use the encoded instance when these etags are used in a plain
> vanilla conditional GET.
>Not an issue, if you accept my point of view that the entity tag is
>assigned to the output of the (possibly identity) content coding.

Yes -- this just clarifies that when there is no delta encoding (or if the client or server are not delta savvy), then the usual rules (of how to deal with If-None-Match) apply (regardless of the presence or absence of a content-encoding header).

BTW: in the above, I assume that there will never be "deltas against deltas", that a client will not request a delta against "ddd", where "ddd" was the etag of a Content-encoding: gdiff response.

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------

From mogul@pa.dec.com Fri Mar 24 12:02:17 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id MAA13957; Fri, 24 Mar 2000 12:02:17 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA10011; Fri, 24 Mar 2000 12:02:17 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id MAA12626; Fri, 24 Mar 2000 12:02:16 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003242002.MAA12626@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: instance redux In-Reply-To: Your message of "Fri, 24 Mar 2000 11:30:32 PST."
<200003241930.LAA16694@wera.pa.dec.com> Date: Fri, 24 Mar 2000 12:02:16 -0800 X-Mts: smtp wrote: >>I've been working on a similar approach, but with a slight twist. I've >>added clarifying comments in [] brackets, and the twist is in () >>parentheses: >> whenever a delta is asked for a [base] instance that was >> [originally received] content-encoded, (AND when the delta >> response does not carry a content-coding, then) >> the server should compute the delta against its decoded version of >> the instance, and the client should apply the resulting difference >> to its copy of the decoded instance. I'm having trouble understanding the meaning of the above. I assume the first part means: whenever a client requests a delta against a base instance that was original recieved (by this client), and this base instance was content encoded, the server should compute a delta of the current variant against a decoded version this base instance. but what does (AND when the delta response does not carry a content-coding, then) mean. My guess is: If the server then sends a delta response that also carries another content coding, then the delta response is against the base instance as it was orignally sent (that is, without decoding). Actually, I think I may have botched the wording slightly, and I certainly made it more confusing than it should be. How about this: If the base instance response and the current delta response carry DIFFERENT content-codings, then the server computes the delta based on the UN-ENCODED representations of both the base instance and the current instance. If both the base instance response and the current delta response carry the SAME set of content-coding(s), then the server computes delta based on the ENCODED representations of both the base instance and the current instance. (If the set of content-codings == {}, then there is no difference between "encoded" and "un-encoded" representations.) 
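The rule just stated can be sketched as a small selection function. This is only an illustration under stated assumptions: the names `decode` and `delta_operands` are hypothetical, only gzip and identity codings are handled, and "same set of content-codings" is read here as "same ordered list", which the message does not spell out.

```python
import gzip


def decode(body: bytes, codings: tuple) -> bytes:
    """Undo content-codings in reverse order of application (gzip only here)."""
    for coding in reversed(codings):
        if coding == "gzip":
            body = gzip.decompress(body)
        elif coding != "identity":
            raise ValueError("unsupported content-coding: " + coding)
    return body


def delta_operands(base_body: bytes, base_codings: tuple,
                   current_body: bytes, current_codings: tuple) -> tuple:
    """Pick the two byte strings the delta is computed between.

    Same codings on both responses: use the ENCODED representations.
    Different codings: use the UN-ENCODED representations of both.
    """
    if tuple(base_codings) == tuple(current_codings):
        return base_body, current_body
    return (decode(base_body, tuple(base_codings)),
            decode(current_body, tuple(current_codings)))
```

With empty coding lists on both sides the first branch returns the bodies unchanged, which matches the remark that "encoded" and "un-encoded" coincide in that case.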
So basically the server either has to remember what content-coding it used when sending a base instance response, or (easier to implement) it simply has to follow a deterministic rule that doesn't depend on extraneous parameters. The latter could be slightly complicated because if, for a given URL, the server has a choice of content-codings based on the client's Accept-* headers, then it probably does have to remember what it sent for a given entity tag. But if the content-coding is deterministically based on the Request-URI [e.g., "/foo.html" vs. "/foo.html.gz"), then this isn't a problem. The client has to remember what content-coding came with the base instance response, but that's easy because it's right there in the Content-Encoding header. Later in your message, you write: Therefore: In addition to the server (and client) associating resource/etag pairs to "cached" copies of base instances, the server (and client) should also retain the set of codings used to create this instance. That is, the association is between a "resource/etag" pair and a "base-instance/content-codings-used-to-create-this-base-instance" pair. I think we're in agreement, if you'll allow "if the server can reliably recompute the pair later on, then it doesn't have to store it." You wrote: If that's what you mean, I can't agree -- that would break a content-encoding: diff-e,gzip response (that is, a gzip of a diff-e of the current contents against the decoded base instance). Right, but the whole point of this week's discussion is that we have realized (thanks to you!) that delta encodings cannot be described as content-codings. 
Therefore, if you want the output of diff-e to be compressed, then we need to express this either as

    IM: diff-e
    Transfer-Encoding: gzip

(that is, do the compression hop-by-hop, which is probably not what you meant), or define a new delta encoding format that includes compression:

    IM: diff-e-gzip

or possibly we could define the IM header in such a way that it allows an ordered series of instance manipulation steps, including compression:

    IM: diff-e, gzip

I believe that the Marimba approach that Arthur described falls into the "encoding format includes compression step" model.

>The difference is that this allows a simpler implementation of the
>scenario where the server always stores the encoded version (e.g.,
>foo.html.gz) in its file system, and so it computes the delta between two
>different instances of foo.html.gz, rather than foo.html. Which might or
>might not be the most efficient in terms of coding density, but I think
>it's a potentially useful extension of your rule.

    Since most encodings are going to break delta applications, it's sort of pointless to bother with supporting delta for pre-encoded files that are not identified as such.

I almost agree with you - but I think it would be a good exercise to think through all of the possible corner cases, to make sure we haven't left any other bugs in the protocol design. And remember that although all existing content-codings are some form of compression, this isn't necessarily true for the future. (For example, one might plausibly imagine an encoding that takes more bytes but simplifies the parsing of some content-type.)

I think if we can make the protocol work (i.e., make the spec unambiguous) for all cases, without adding a lot of complexity, then we are better off than if we just try to make it work for the cases we think are likely.
    BTW: in the above, I assume that there will never be "deltas against deltas", that a client will not request a delta against "ddd", where "ddd" was the etag of a Content-encoding: gdiff response.

No! I'm pretty sure that the results from our SIGCOMM paper support the opposite choice. Unlike lightning, delta opportunities tend to strike multiple times in the same place. That is, one is likely to see a sequence of references to a given URL, each resulting in a new instance, and each one expressible as a small delta from the previous one.

The trick is that if the client has to decode the instances before applying a delta, when it caches the result, it should (in effect) re-encode the new instance before storing it. That then should give a consistent interpretation to the Content-Encoding header of the cached new instance - it has the same value as the Content-Encoding header of the 226 (Delta) response.

One might object that this is wasteful because it consumes CPU time at the cache (client or proxy). But we would generally like to trade CPU time for bytes-on-the-wire, because Moore's law continues to make CPU time cheaper relative to bandwidth.
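The decode, apply, re-encode cycle described above might look roughly like this at a cache. Everything here is a sketch under assumptions: the delta format is a toy list of copy/insert operations (not diff-e, vcdiff, or gdiff), only the gzip coding is handled, and the function and field names are hypothetical.

```python
import gzip


def decode(body: bytes, codings: tuple) -> bytes:
    """Undo content-codings in reverse order (gzip only in this sketch)."""
    for coding in reversed(codings):
        if coding != "gzip":
            raise ValueError("unsupported content-coding: " + coding)
        body = gzip.decompress(body)
    return body


def encode(body: bytes, codings: tuple) -> bytes:
    """Apply content-codings in order (gzip only in this sketch)."""
    for coding in codings:
        if coding != "gzip":
            raise ValueError("unsupported content-coding: " + coding)
        body = gzip.compress(body)
    return body


def apply_toy_delta(base: bytes, delta: list) -> bytes:
    """Toy delta: ("copy", start, end) copies base bytes, ("insert", data) adds new ones."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def update_cache(entry: dict, delta: list, response_codings: tuple) -> dict:
    """Replace a cached instance with the result of a 226 (Delta) response.

    The stored body always ends up encoded with the 226 response's
    content-codings, so the cached Content-Encoding stays meaningful.
    """
    if entry["codings"] == response_codings:
        new_body = apply_toy_delta(entry["body"], delta)  # encoded-to-encoded
    else:
        plain = decode(entry["body"], entry["codings"])
        new_body = encode(apply_toy_delta(plain, delta), response_codings)
    return {"body": new_body, "codings": response_codings}
```

Because each updated entry is itself a well-formed cached instance, it can serve as the base for the next delta, which is what makes a sequence of small deltas against the same URL workable.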
-Jeff From danielh@crosslink.net Fri Mar 24 15:37:54 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA28772; Fri, 24 Mar 2000 15:37:54 -0800 (PST) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA00260; Fri, 24 Mar 2000 15:37:54 -0800 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id PAA25211 for ; Fri, 24 Mar 2000 15:37:53 -0800 (PST) Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id SAA09558 for ; Fri, 24 Mar 2000 18:37:52 -0500 Message-Id: <200003242337.SAA09558@lycanthrope.crosslink.net> X-Really-To: Date: Fri, 24 Mar 2000 18:36:55 -0500 To: http-delta@pa.dec.com In-Reply-To: <200003242002.MAA12626@wera.pa.dec.com> Subject: Re: instance redux**2 X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 I comment on jeff's most recent message below, but here's what my current thoughts and questions (mostly as a result of struggling with Jeff's comments).... 1) Define instance as being a) after content-codings b) before range extraction and before transfer codings ? Is an instance defined before, or after, delta codings? This may depend on whether a "instance manipulation" step (see 3 below) is added. Base-instances are instances returned from prior requests for a given resource. They may be content (and delta?) encoded. The server and client will cache these base instances, and will associate a url/etag pair to each base instance. In addition, each base instance will be associated with the set of content-encodings (and possibly the set of delta encodings) used in it's creation (this set of content-codings may be implicit; say, as based on file extensions). The current instance is the instance of the current request, which may be content encoded ? can it be delta-encoded ? 
What do we call the "current snapshot of the resource" -- the pre-content-encoded, pre-delta-encoded thing? 2) Etags are assigned to the results AFTER content and delta codings, but before range extraction and transfer codings. 3) ISSUE: should delta codings be ** combined with content-codings, or ** treated as part of an "instance manipulation" step that occurs after content-coding. * Adding an "instance manipulation" step may allow for greater generalizations. * Keeping the current idea, of delta-encoding as a content-encoding, may be adequate for now. 4) Either approach should assume that in many cases future delta enabled-requests may refer to a base instance that is content-encoded. In these cases, the server (and client) should either: a) un-encode the base instance, and use it to form a delta against the "current snapshot of the resource" b) compute a delta against the (possibly encoded) base instance, and the (possibly encoded) current instance. ? In case b, instance is defined as "pre-delta-encoding". Case b will often be useless. For example, the delta of two gzipped files may differ substantially, even though their unGzipped contents may be nearly identical. However, automatically doing case a prevents delta computation of future content-codings that may yield useful deltas (say, a content-coding that facilitates parsing). Thus, the client needs some means of telling the server which case should be applied. It may be easier to do this if an "instance manipulation step" is added -- see 7 below for more discussion . 5) In addition to referering to base-instances that are content-encoded, clients can use "delta encoded" responses as base instance. If they do, prior to computing a delta the server (and client) will have to "undifference", as well as possibly un-encode, the base instance. This requires another piece of information -- the identity of an "earlier base instance". 
Furthermore, this process could be convoluted and compute intensive -- as when a sequence of delta-encoded base instances is generated (with later instances pointing to earlier instances). Thus, clients should always include at least one non-delta-encoded instance when forming a delta-enabled request. 6) Some provision for end-to-end encoding of difference files is necessary. Currently, this is signaled by adding content-codes (such as GZIP) after a delta-codes in the Content-encoding header. Something similar should be specifiable under the "instance manipulation" scenario. 7) In the instance manipulation scenario, compression (or other content-like-codings) could occur in the content-coding stage, or in the "instance manipulation"stage. Thus, the following rule is a possible replacement for 5: i) encoding that is done in the "instance manipulation step" should be removed prior to computation of a delta ii) encoding that is done in the "content-coding" step should NOT be removed prior to computation of a delta. ? For initial responses to delta aware clients, there needs to be some way of specifying 5i -- the content is encoded, and in a future delta request decoding should be done prior to computing/applying the delta. However, this speification needs to compatible with non-delta aware requests. Maybe two tags isn't such a bad idea after all :] ---------------------------------------------------------------------------------------- Jeff wrote >Actually, I think I may have botched the wording slightly, and I >certainly made it more confusing than it should be. How about this: > If the base instance response and the current delta response carry > DIFFERENT content-codings, then the server computes the delta based > on the UN-ENCODED representations of both the base instance and the > current instance. 
> If both the base instance response and the current delta response > carry the SAME set of content-coding(s), then the server computes > delta based on the ENCODED representations of both the base > instance and the current instance. (If the set of content-codings > == {}, then there is no difference between "encoded" and > "un-encoded" representations.) I'm having a hard time understanding the above, and I think it's because I'm not straight on the presuppositions. For example: a) are you assuming that delta-encoding is a seperate, post content-encoding, step? b) what is the current delta response --- If the "current delta response" is the "difference file computed by comparing a base instance against a current instance", then how can the base instance response and the delta response be comparable (one is original content, the other is a difference file)? >>So basically the server either has to remember what content-coding it >>used when sending a base instance response, or (easier to implement) it >>simply has to follow a deterministic rule that doesn't depend on >>extraneous parameters. The latter could be slightly complicated because >>if, for a given URL, the server has a choice of content-codings based on >>the client's Accept-* headers, then it probably does have to remember >>what it sent for a given entity tag. > >But if the content-coding is deterministically based on the >Request-URI [e.g., "/foo.html" vs. "/foo.html.gz"), then this isn't a >problem. B) How the server "remembers" is unimportant -- it can retain a physical record of a transaction, or it can use rules like the above (personally, I use the latter whenever possible in my implemetations). 
However, I would add that in this example: if the server applies a "this is gzip content encoded, so decode before differencing" rule to a base instance (that is to be used in a delta-enabled response for foo.html.gz), it MUST be the case that a "content-encoding: gzip" response header was included in all responses fo foo.html.gz. >The client has to remember what content-coding came with the base >instance response, but that's easy because it's right there in the >Content-Encoding header. Yep. >Later in your message, you write: > Therefore: In addition to the server (and client) associating > resource/etag pairs to "cached" copies of base instances, the > server (and client) should also retain the set of codings > used to create this instance. That is, the association is > between a "resource/etag" pair and a > "base-instance/content-codings-used-to-create-this-base-instance" > pair. >I think we're in agreement, if you'll allow "if the server >can reliably recompute the pair later on, then it doesn't have to store >it." C) Yes. How the server maintains these associations, or whether it stores or regenerates decoded versions of a base instance, is not a concern of the rfc. >You wrote: > If that's what you mean, I can't agree -- that would break a > content-encoding: diff-e,gzip > response (that is, a gzip of a diff-e of the current contents against >the decoded base instance). >Right, but the whole point of this week's discussion is that we have >realized (thanks to you!) that delta encodings cannot be described as >content-codings. D) I'm not sure I agree with that : \ .... see E below.. 
[upon re-reading -- I'm starting to see the value in your logic, but I'm still not entirely convinced -- see F below] Therefore, if you want the output of diff-e to be >compressed, then we need to express this either as > IM: diff-e > Transfer-coding: gzip >(that is, do the compression hop-by-hop, which is probably not what you >meant), or define a new delta encoding format that includes compression: > IM: diff-e-gzip >or possibly we could define the IM header in such a way that it allows an >ordered series of instance manipulation steps, >including compression: > IM: diff-e, gzip E) IM -- instance manipulation? Maybe this extra layer is necessary. However, I think that for delta, this new header is not necessary. My original notion was to use a second tag -- an "original" tag (itag or o-etag). But since that idea wasn't very popular, I (along with Jeff) divined that so long as the client and server agree to de-code (to reverse the content-encodings applied to) their respective copies of a base instance before computing or applying the delta, then things would work fine. Given that, one should be able to treat delta as a content-encoding. There are some implementation concerns when the delta-encoding was applied to a base instance (see H below) >I believe that the Marimba approach that Arthur described falls into the >"encoding format includes compression step" model. My reading of Arthur is that compression is an optional step, after encoding? >>>The difference is that this allows a simpler implementation of the >>>scenario where the server always stores the encoded version (e.g., >>>foo.html.gz) in its file system, and so it computes the delta between two >>>different instances of foo.html.gz, rather than foo.html. Which might or >>>might not be the most efficient in terms of coding density, but I >>think it's a potentially useful extension of your rule. 
> >> Since most encodings are going to break delta applications, >> it's sort of pointless to bother with supporting delta for >> pre-encoded files that are not identified as such. >I almost agree with you - but I think it would be a good exercise to >think through all of the possible corner cases, to make sure we haven't >left any other bugs in the protocol design. And remember that although >all existing content-codings are some form of compression, this isn't >necessarily true for the future. (For example, one might plausibly >imagine an encoding that takes more bytes but simplifies the parsing of >some content-type.) F) That's a good point. So the delta spec should allow for cases where the delta is computed against the instances (where, to reiterate, the instance is what one has after content-encoding), and not against the "decoded" instances. Oh brother, how does one do that. You'ld have to have some way for the client to tell the server at what point to stop "encoding" (and to stop decoding). In particualr, if we have a html-parse,gzip encoding; , then the client would have to tell the server to "html-parse the current "snapshot", unGzip the base instance, compute the delta, and send it to me" Perhaps this is why you are leaning to a IM header, which can become more complex. >I think if we can make the protocol work (i.e., make the spec >unambiguous) for all cases, without adding a lot of complexity, then we >are better off than if we just try to make it work for the cases we think >are likely. G) Sometimes I wish I could disagree. > BTW: in the above, I assume that there will never be "deltas > against deltas", that a client will not request a delta against > "ddd", where "ddd" was the etag of a Content-encoding: gdiff > response. >No! I'm pretty sure that the results from our SIGCOMM paper support the >opposite choice. Unlike lightning, delta opportunities tend to strike >multiple times in the same place. 
That is, one is likely to see a >sequences of references to a given URL, each resulting in a new instance, >and each one expressable as a small delta from the previous one. H) Okay, after thinking about it, I think I agree. Here's what I think is happening... Suppose the server regenerates "unencoded" versions of base instances as needed. If a given base instance was delta-encoded (that is, the base instance was a difference file from an earlier 226 response), then this "unencoding" will involve undifferencing. Thus, as long as the "base instance for this delta-encoded base instance" is available, using "delta responses" as base instances for future delta responses is okay, and does not break the "use unencoded instances when computing/applying deltas" rule. But it looks awfully messy -- if you get a sequence of these, to compute the "unencoded" base instance, you might have to do recursive undifferencing of a set of base instances (where all but the first base instance of the set was delta encoded). Lose one of them, and you are out of luck . >The trick is that if the client has to decode the instances >before applying a delta, when it caches the result, it should (in effect) >re-encode the new instance before storing it. >That then should give a consistent interpretation to the >Content-Encoding header of the cached new instance - it has the same >value as the Content-Encoding header of the 226 (Delta) response. I) Depends if you'll ever use the "encoded" base instance. It seems a lot cheaper to retain the unencoded base instance, and risk loosing some flexibility. >One might object that this is wasteful because it consumes >CPU time at the cache (client or proxy). But we would >generally like to trade CPU time for bytes-on-the-wire, >because Moore's law continues to make CPU time cheaper >relative to bandwidth. J) If the recursion gets deep enough, it could be a lot of cpu time! 
Personally, in the near future I won't be adding support for delta encoding "when the base instance is a delta encoded response" -- it's kind of scary thinking about the details! Perhaps that loses some efficiency -- This "personal evidence" suggests a proviso:

   When a client specifies (through If-None-Match) base-instances that may be used to form a delta response, it may include base-instances that are delta-encoded, and it SHOULD include at least one non-delta-encoded base instance.

-----------------------------------------------------------
Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org
-----------------------------------------------------------

From mogul@pa.dec.com Fri Mar 24 16:10:54 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA28224; Fri, 24 Mar 2000 16:10:54 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA30609; Fri, 24 Mar 2000 16:10:54 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA26440; Fri, 24 Mar 2000 16:10:54 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003250010.QAA26440@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: instance redux**2 In-Reply-To: Your message of "Fri, 24 Mar 2000 18:36:55 EST." <200003242337.SAA09558@lycanthrope.crosslink.net> Date: Fri, 24 Mar 2000 16:10:54 -0800 X-Mts: smtp

writes: I comment on jeff's most recent message below, but here's what my current thoughts and questions (mostly as a result of struggling with Jeff's comments)....

I think it's not really profitable for us to continue exchanging lengthy email messages until I can put my re-design into a self-contained, clear statement. You shouldn't have to struggle with it. So rather than trying to address your comments, I'm off working on that. It probably won't be ready until early next week.
-Jeff

From mogul Mon Mar 27 15:22:53 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA15063; Mon, 27 Mar 2000 15:22:53 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003272322.PAA15063@wera.pa.dec.com> To: http-delta Subject: Proposed redesign of delta spec to account for recent bug Date: Mon, 27 Mar 2000 15:22:53 -0800 X-Mts: smtp

This is a sketch of what I propose to do to fix the delta encoding (and instance digest) specifications, in order to fix the ambiguity about when the entity tag is assigned. Note that this is still a sketch, not an actual set of changes, so there will probably be other details that come up.

Also, at this point I don't see any need to change the treatment of delta-transfer-codings (i.e., hop-by-hop delta encodings), so that's not covered here.

Please read the whole message before replying!

-Jeff

========================================================================
New/Changed definitions:

My original definition for "instance" was wrong, and should be replaced by these two definitions:

   instance
      The entity that would be returned in a status-200 response to a GET request, at the current time, for the selected variant of the specified resource, with the application of zero or more content-codings, but without the application of any instance manipulations or transfer-codings.

   instance manipulation
      An operation on one or more instances which may result in an instance being conveyed from server to client in parts, or in more than one response message. For example, a range selection or a delta encoding. Instance manipulations are end-to-end, and often involve the use of a cache at the client.
========================================================================
HTTP response generation pipeline:

Once the definition for "instance" is cleared up, the processing pipeline for HTTP responses is:

   datatype        operation leading to next datatype
   ========        ==================================

   variant
      |            apply content-coding
      v
   instance
      |            apply instance manipulation:
      v              (delta encoding, range selection, etc.)
   entity-body
      |            apply transfer-coding
      v
   message-body

The entity tag is associated with a specific instance, not with either a variant or an entity-body. For strong entity tags, at least, a given entity tag value is uniquely associated with an instance, within a uniqueness scope.

========================================================================
New (semi-correct) BNF:

This is a rough idea of the new BNF necessary to support end-to-end delta encoding, now that delta encoding is NOT being done as a form of content-coding:

   instance-manipulation = "vcdiff" | "gdiff" | "diffe" | "gzip" | token

   IM = "IM" ":" #(instance-manipulation)

   A-IM = ("A-IM" | "Accept-IM") ":" #(instance-manipulation)

It may be possible, in theory, to describe range selection as a form of instance manipulation, although this would require quite a bit more syntax (to specify ranges). Doing so might make it possible to explicitly control whether range selection is done before or after other instance manipulations. I'm not sure this is worth the effort.

It might also be worth thinking about allowing parameters to be associated with instance-manipulation tokens, in case a particular coding function can be parameterized. For example,

   IM: mydiff;windowsize=37

Again, I'm not sure if this is worth the effort.

Note that instance-manipulations may be combined, so that

   IM: diffe, gzip

means that the server applied gzip to the output of diff -e.
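As an illustration only, the list syntax sketched in the BNF above could be parsed along the following lines. This is a minimal Python sketch, not part of the proposal: the function names parse_im and decode_order are hypothetical, and treating tokens as case-insensitive is an assumption borrowed from how HTTP handles content-coding names.

```python
def parse_im(value):
    """Parse the value of an IM or A-IM header into an ordered list of
    (token, params) pairs, e.g. "diffe, gzip" or "mydiff;windowsize=37".

    Assumption: tokens and parameter names are case-insensitive, so
    they are lower-cased here.
    """
    steps = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty list elements
        token, *raw_params = [part.strip() for part in item.split(";")]
        params = {}
        for raw in raw_params:
            name, _, val = raw.partition("=")
            params[name.strip().lower()] = val.strip()
        steps.append((token.lower(), params))
    return steps


def decode_order(steps):
    """The header lists manipulations in the order the server applied
    them, so a receiver undoes them last-to-first."""
    return [token for token, _ in reversed(steps)]


print(parse_im("diffe, gzip"))                 # [('diffe', {}), ('gzip', {})]
print(decode_order(parse_im("diffe, gzip")))   # ['gzip', 'diffe']
print(parse_im("mydiff;windowsize=37"))        # [('mydiff', {'windowsize': '37'})]
```

For "IM: diffe, gzip" the receiver would therefore gunzip the body first and only then apply the diff -e output to its base instance.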
========================================================================
Derivation rule:

Here is some pseudo-code that explains how a client, upon receiving a 226 (Delta) response, should interpret it, based on the headers in that response and in the cached base-instance response. I have reviewed this once or twice for correctness, but since it's pseudo-code I obviously haven't tested it.

   // variables and their meanings
   Rreceived : Just-received response [input]
   Inew      : new instance derived from Rreceived [output]
   Rnew      : new cachable response derived from Rreceived [output]
   Rold      : some previously received response [temporary]
   Ccoding   : content-coding [temporary]
   // end of variables.

   if status_code(Rreceived) == 226_Delta then
       // find cached base-instance response
       Rold = find_response(delta_base(Rreceived));
       if (Rold == NULL) then
           error("Missing base instance!");
       endif
       if Content_encoding(Rold) == Content_encoding(Rreceived) then
           Inew = apply_delta(body(Rold), body(Rreceived));
           Rnew = Inew;   // keeps Content-Encoding hdr from Rreceived
       else
           Ccoding = Content_encoding(Rreceived);
           Inew = apply_delta(content_decode(Rold),content_decode(Rreceived));
           if Ccoding == identity then
               Rnew = Inew;
           else
               Rnew = apply_content_coding(Inew, Ccoding);
           endif
       endif
   endif

   // Note: content_decode() applies the identity transformation if
   // "Content-Encoding" header is empty or missing.

========================================================================
Examples:

Here are several examples, starting with an initial (non-delta) request/response, and then continuing with a number of different ways of getting the same new content.

Note that example 2 yields entity-tag "def", while examples 3 & 4 yield entity-tag "ghi", because the former has a non-identity content-coding while the latter two do not. Even so, the decoded content is the same in all three of those examples.

Again, I have fixed all the bugs I can find in these examples, but that doesn't mean that I found them all.
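For readers who want something executable, here is one possible Python reading of the derivation rule in the pseudo-code above. It is a sketch under several assumptions that are not in the original message: the delta format (common-prefix length plus replacement tail) is a toy stand-in for vcdiff or diff -e, only the "gzip" and "identity" content-codings are handled, the delta body is used as received, and the derived response is given status 200. The names Response, make_delta and derive are hypothetical.

```python
import gzip
import struct
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    etag: str
    content_encoding: str  # "identity" or "gzip" in this sketch
    body: bytes
    delta_base: str = ""


def make_delta(old, new):
    """Toy delta (NOT vcdiff): length of common prefix + new tail."""
    n = 0
    while n < min(len(old), len(new)) and old[n] == new[n]:
        n += 1
    return struct.pack(">I", n) + new[n:]


def apply_delta(old, delta):
    (n,) = struct.unpack(">I", delta[:4])
    return old[:n] + delta[4:]


def content_decode(body, coding):
    return gzip.decompress(body) if coding == "gzip" else body


def content_encode(body, coding):
    return gzip.compress(body) if coding == "gzip" else body


def derive(received, cache):
    """Turn a 226 (Delta) response into a cachable full response."""
    if received.status != 226:
        return received
    old = cache.get(received.delta_base)
    if old is None:
        raise LookupError("Missing base instance!")
    if old.content_encoding == received.content_encoding:
        # Same content-codings: the delta applies to the ENCODED bodies.
        inew = apply_delta(old.body, received.body)
    else:
        # Different content-codings: the delta applies to the UN-ENCODED
        # base; the result is re-encoded as the response header says.
        plain = apply_delta(content_decode(old.body, old.content_encoding),
                            received.body)
        inew = content_encode(plain, received.content_encoding)
    # Assumption: the derived cache entry is stored as a status-200 response.
    return Response(200, received.etag, received.content_encoding, inew)


old_text = b"<html>version 1</html>"
new_text = b"<html>version 2</html>"
cache = {"abc": Response(200, "abc", "gzip", gzip.compress(old_text))}

# Base was sent gzip-coded, the delta response carries no content-coding.
delta = Response(226, "ghi", "identity",
                 make_delta(old_text, new_text), delta_base="abc")
print(derive(delta, cache).body == new_text)  # True
```

The usage at the end corresponds to the case where the base instance was gzip-coded but the delta response is not, so the client decodes its cached copy before applying the delta.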
(1) At time 14:00:00:

   GET /example.com/foo.html HTTP/1.1
   Host: example.com
   Accept-encoding: gzip

   HTTP/1.1 200 OK
   Date: Wed, 24 Dec 1997 14:00:00 GMT
   Etag: "abc"
   Content-encoding: gzip

   etag = abc for instance = gzip(foo.html/14:00:00)
   body is gzip(foo.html/14:00:00)

(2) At time 14:01:00 - alternative #1:

   GET /example.com/foo.html HTTP/1.1
   Host: example.com
   If-none-match: "abc"
   Accept-encoding: gzip
   A-IM: vcdiff

   HTTP/1.1 226 Delta
   Date: Wed, 24 Dec 1997 14:01:00 GMT
   Etag: "def"
   Delta-base: "abc"
   Content-encoding: gzip
   IM: vcdiff

   etag = def for instance = gzip(foo.html/14:01:00)
   message-body is vcdiff_delta(gzip(foo.html/14:00:00), gzip(foo.html/14:01:00))
   new cache entry is stored with Content-encoding: gzip

(3) At time 14:01:00 - alternative #2:

   GET /example.com/foo.html HTTP/1.1
   Host: example.com
   If-none-match: "abc"
   Accept-encoding: gzip
   A-IM: vcdiff

   HTTP/1.1 226 Delta
   Date: Wed, 24 Dec 1997 14:01:00 GMT
   Delta-base: "abc"
   Etag: "ghi"
   IM: vcdiff

   etag = ghi for instance = identity(foo.html/14:01:00)
   message-body is vcdiff_delta(gunzip(gzip(foo.html/14:00:00)), foo.html/14:01:00)
   new cache entry is stored with Content-encoding: identity

(4) At time 14:01:00 - alternative #3:

   GET /example.com/foo.html HTTP/1.1
   Host: example.com
   If-none-match: "abc"
   Accept-encoding: gzip
   A-IM: diffe, gzip

   HTTP/1.1 226 Delta
   Date: Wed, 24 Dec 1997 14:01:00 GMT
   Delta-base: "abc"
   Etag: "ghi"
   IM: diffe, gzip

   etag = ghi for instance = identity(foo.html/14:01:00)
   message-body is gzip(diffe_delta(gunzip(gzip(foo.html/14:00:00)), foo.html/14:01:00))
   new cache entry is stored with Content-encoding: identity

========================================================================

From danielh@crosslink.net Mon Mar 27 20:51:54 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id UAA30063; Mon, 27 Mar 2000 20:51:54 -0800 (PST) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA11557; Mon, 27 Mar
2000 20:51:54 -0800 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id UAA25709 for ; Mon, 27 Mar 2000 20:51:53 -0800 (PST) Received: from smtp.crosslink.net (dyn11.c5200-3.springfield.236.crosslink.net [207.199.145.76]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id XAA24402 for ; Mon, 27 Mar 2000 23:51:46 -0500 Message-Id: <200003280451.XAA24402@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Mon, 27 Mar 2000 23:46:27 -0500 To: http-delta@pa.dec.com Subject: redesign of delta -- some comments X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05

Some comments on Jeff Mogul's 3/27/00 proposal.

I) Reiterate the definition of instance

>The entity tag is associated with a specific instance, not with
>either a variant or an entity-body. For strong entity tags,
>at least, a given entity tag value is uniquely associated with
>an instance, within a uniqueness scope.

Since it's a crucial point that RFC2616 skimps on, I'd add...

   Thus, the entity tag is assigned AFTER content coding, but BEFORE delta encoding, range manipulation, etc.

II) Restating the rules

The rules that I read from your pseudo code and your examples are:

[... first, some definitions:
   base-instance: a copy of a prior instance of a resource; which can be retained by both client and server ]

The client side rules:

a) Instances are stored, and are identified by etags. Per definition, instances are always post content-coding, but pre delta encoding. In addition to storing the body of the instance, the content-encoding must also be stored.

b) When a delta-response is received, the undifferencing rules are:

   i) compare the content-encoding of the client's copy of the base instance and the content-encoding of the delta-response.
   ii.a) If they are the same, recreate the current instance by applying the difference (that is, the body of the delta-response) to the base instance -- do NOT decode the base instance beforehand.

   ii.b) If they are different,
      ii.b.1) Decode the base instance
      ii.b.2) Recreate an unencoded version of the current instance by applying the difference to this decoded base instance,
      ii.b.3) Content-encode (using the response's content-encoding) this recreated current instance, and save (using the current etag)

Note that step a necessitates step ii.b.3

Also note that applying the "im" decoding may be a several-step process -- for example, the response may first need to be gunzip'ed, and then undifferenced.

When the server is forming a delta-response, it has some flexibility. For example, if the original response had a Gzip content-encoding, the server can:

a) choose Gzip as a content-encoding, and compute a difference between two Gzipped instances
b) choose Gzip as an IM, and compute a difference between un-encoded instances.

III) A possible shortcoming

Content-encoding wise, this is an all-or-nothing strategy. The server either uses the instances with all its content-codings, or with none of them.

This may miss some opportunities. For example:

Consider a "qparse" content-encoding, that (in contrast to most compression algorithms) is amenable to differencing. Hence, it might be optimal to compute the delta after qparsing (but before compression). Furthermore, assume "qparse" actually increases the size of the response, hence should be combined with compression.

Example:

(1) At time 14:00:00:

   GET /example.com/foo.html HTTP/1.1
   Host: example.com
   Accept-encoding: qparse,gzip

   HTTP/1.1 200 OK
   Date: Wed, 24 Dec 1997 14:00:00 GMT
   Etag: "abc"
   Content-encoding: qparse,gzip

   etag = abc for instance = gzip(qparse(foo.html/14:00:00))
   message-body is gzip(qparse(foo.html/14:00:00))

How might the client and server agree to compute a delta on a qparsed response?
For example, to return: diffe_delta(gunzip(foo.html/14:00:00)),qparse(foo.html/14:01:00)) I don't see how it can be done with content codings. Perhaps if qparse could be an IM? Also, if qparse is an expensive operation that a powerful server is willing to undertake (but thin clients may wish to avoid), it may be burdensome to require that the client reconstruct the "instance" with all of the content-encodings. ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul@pa.dec.com Tue Mar 28 15:26:06 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA18628; Tue, 28 Mar 2000 15:26:06 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA08567; Tue, 28 Mar 2000 15:26:05 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA19123; Tue, 28 Mar 2000 15:26:05 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003282326.PAA19123@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: redesign of delta -- some comments In-Reply-To: Your message of "Mon, 27 Mar 2000 23:46:27 EST." <200003280451.XAA24402@lycanthrope.crosslink.net> Date: Tue, 28 Mar 2000 15:26:05 -0800 X-Mts: smtp Daniel and I seem to have more or less converged, but there are a few items from his most recent message that I should respond to: The client side rules: a) Instances are stored, and are identified by etags. Per definition, instances are always post content-coding, but pre delta encoding. In addition to storing the body of the instance, the content-encoding must also be stored. b) When a delta-response is recieved, the undifferencing rules are: i) compare the content-encoding of the client's copy of the base instance and the content-encoding of the delta-response. 
ii.a) If they are the same, recreate the current instance by applying the difference (that is, the body of the delta-response) to the base instance -- do NOT decode the base instance beforehand. ii.b) If they are different, ii.b.1) Decode the base instance ii.b.2) Recreate an unencoded version of the current instance by applying the difference to this decoded base instance, ii.b.3) Content-encode (using the response's content-encoding) this recreated current instance, and save (using the current etag) This seems generally right. (I might even steal some of this for the rewrite of the spec, once I get around to it.) Note that step a necessitates step ii.b.3 Well, not exactly. If the client doesn't put the result of ii.b.2 into a cache, but only uses it to render a page, then there is no need to go through step ii.b.3. Further, even a cache can avoid this step by deferring it until the next use of the cached response, and (because many cache entries are never re-used) so might avoid ever doing the re-encoding. However, a proxy cache that applies the delta decoding before forwarding the response to a client has to restore the content-coding as sent by the origin server - i.e., has to deliver exactly the instance that the client would have received had delta encoding not been used. This could be important, for example, if the client later receives a delta or range via a different proxy, to be applied to this instance. III) A possible shortcoming Content-encoding wise, this is an all-or-nothing strategy. The server either uses the instances with all it's content-codings, or with none of them. I'm not quite sure what you mean by that. This may miss some opportunities. For example: Consider a "qparse" content-encoding, that (in contrast to most compression algorithims)is amenable to differencing. Hence, it might be optimal to compute the delta after qparsing (but before compression). 
Furthermore, assume "qparse" actually increases the size of the response, hence should be combined with compression. Example: (1) At time 14:00:00: GET /example.com/foo.html HTTP/1.1 Host: example.com Accept-encoding: qparse,gzip HTTP/1.1 200 OK Date: Wed, 24 Dec 1997 14:00:00 GMT Etag: "abc" Content-encoding: qparse,gzip etag = abc for instance = gzip(qparse(foo.html/14:00:00)) message-body is gzip(qparse(foo.html/14:00:00)) How might the client and server agree to compute a delta on a qparsed response? One possibility would be to take advantage of an apparent loophole in the sketchy specification I suggested yesterday; do the initial request like so: (1) At time 14:00:00: GET /example.com/foo.html HTTP/1.1 Host: example.com Accept-encoding: qparse,gzip A-IM: gzip HTTP/1.1 200 OK Date: Wed, 24 Dec 1997 14:00:00 GMT Etag: "abc" Content-encoding: qparse IM: gzip Vary: A-IM etag = abc for instance = qparse(foo.html/14:00:00) message-body is gzip(qparse(foo.html/14:00:00)) I.e., the server has a choice (based on the client's Accept-* headers) whether to apply the gzip as a content-coding or as an instance-manipulation. I hadn't originally thought of a good reason for the server to apply gzip as an IM without first having done a compressible delta encoding, but here it seems to make sense. Note, however, that in order to prevent a cache from accidentally forwarding this status-200 response to a client that doesn't understand IM, it has to be labelled "Vary: A-IM". (This probably should be done for all of my examples, although in the other cases, the 226 response status code will prevent incorrect treatment of misdirected delta responses.) 
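To make the distinction in the preceding paragraphs concrete, the following Python sketch builds the same page once with gzip as a content-coding and once with gzip as an instance manipulation. It is an illustration under assumptions, not the proposed mechanism: build_response is a hypothetical helper, deriving the entity tag from a hash of the instance is only a convenient stand-in for whatever the server really does, and the hypothetical "qparse" coding is left out because no such coding exists.

```python
import gzip
import hashlib

# Only gzip is modelled; it may serve as a content-coding or as an IM.
CODERS = {"gzip": gzip.compress}


def build_response(variant, content_codings, ims):
    """Follow the pipeline variant -> instance -> entity-body."""
    # variant -> instance: apply content-codings
    instance = variant
    for coding in content_codings:
        instance = CODERS[coding](instance)

    # The entity tag belongs to the instance, i.e. it is assigned after
    # content-coding but before any instance manipulation.
    # Assumption: a truncated hash is used as the tag value.
    etag = hashlib.sha256(instance).hexdigest()[:8]

    # instance -> entity-body: apply instance manipulations
    body = instance
    for im in ims:
        body = CODERS[im](body)

    headers = {"Etag": '"%s"' % etag}
    if content_codings:
        headers["Content-encoding"] = ", ".join(content_codings)
    if ims:
        headers["IM"] = ", ".join(ims)
        # Keeps caches from handing this to clients that sent no A-IM.
        headers["Vary"] = "A-IM"
    return headers, instance, body


page = b"<html>hello</html>" * 20

h_cc, inst_cc, body_cc = build_response(page, ["gzip"], [])
h_im, inst_im, body_im = build_response(page, [], ["gzip"])

print(h_cc["Etag"] != h_im["Etag"])         # True: different instances
print(inst_im == page)                      # True: instance is not compressed
print(gzip.decompress(body_im) == page)     # True: body still is
```

In the second case the instance that a later delta would be computed against is the uncompressed page, even though what travelled on the wire was compressed.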
Once this initial response has been received, and the client wants to get a delta as an update, we have:

    (2) At time 14:00:01:

        GET /example.com/foo.html HTTP/1.1
        Host: example.com
        Accept-encoding: qparse,gzip
        A-IM: diffe,gzip

        HTTP/1.1 226 Delta
        Date: Wed, 24 Dec 1997 14:00:01 GMT
        Etag: "mno"
        Content-encoding: qparse
        IM: diffe,gzip
        Vary: A-IM

        etag = mno for instance = qparse(foo.html/14:00:01)
        message-body is gzip(diffe_delta(qparse(foo.html/14:00:00), qparse(foo.html/14:00:01)))

I.e., because the original message used gzip as an IM, not as a content-coding, we don't need to take that into account when computing or applying the delta.

The cache is free to store the initial response with the "IM: gzip" encoding, but it has to decode this before using the response to apply a subsequent delta.

-Jeff

From mogul Wed Mar 29 13:13:19 2000
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA31199; Wed, 29 Mar 2000 13:13:19 -0800 (PST)
From: Jeffrey Mogul
Message-Id: <200003292113.NAA31199@wera.pa.dec.com>
To: http-delta
Subject: Another simplification(?): removing delta transfer-codings
Date: Wed, 29 Mar 2000 13:13:18 -0800
X-Mts: smtp

You may recall that the current Internet-Drafts for delta encoding specify ways to do it either as a content-coding or as a transfer-coding.

There are two stated motivations for using a delta-transfer-coding instead of a delta-content-coding:

    (1) allow hop-by-hop deltas.
    (2) allow deltas to be applied after a Range selection.

My recollection is that #2 was the real reason.

Assume we are going to replace the use of deltas as content-codings with deltas as instance manipulations. How does this affect the decision to support deltas as transfer-codings?

I'm not sure there is much importance to reason #1.
After all, either way, we have no mechanism to allow a client (or proxy) to know whether the next-hop proxy/server on the path towards the origin server supports any kind of delta, so I can't see any obvious way for the client to know which form to ask for.

If a proxy does modify a request to imply support for deltas (e.g., adding "A-IM: vcdiff" to a request that doesn't already have it), then it's pretty clear that the proxy should apply any delta response it receives, and convert the forwarded response to a status-200 format. I don't think this is any harder than applying hop-by-hop deltas as transfer-codings.

Reason #2 was more important: we needed a way to distinguish between "delta encoding before Range selection" and "Range selection before delta encoding." Under the old (erroneous) definition of "instance", delta-content-coding always came before Range selection, and delta transfer-coding always comes after, so this allows the client to make its intentions clear by specifying one or the other kind of delta encoding in its request.

Now that delta encoding is NOT a content-coding, but rather a form of instance manipulation, this approach doesn't work (and I'm not sure I really ever thought through all of the implications of Ranges and delta-transfer-codings - it might not have worked anyway).

This forces us to find another way to allow a client to specify which order delta encoding and range selection should be done in. I.e., we have to face the problem head-on, and the trick of using transfer-codings won't help.

So I'm proposing the following mechanism, which I think is a simplification overall (even though it has somewhat of a kludgey flavor):

(a) Delta encoding is *always* a form of instance-manipulation, never a content-coding or transfer-coding.

(b) Range selection is explicitly defined as a form of instance manipulation.

(c) We define a "range" literal as part of the registered set of instance manipulations.
(d) If a client's request includes both a Range header and an "A-IM: " request header, then in order to specify an ordering between these two instance manipulations, the client must include the "range" literal in the A-IM header.

For example,

    GET /foo.html HTTP/1.1
    If-None-Match: "abc"
    Host: example.com
    Range: bytes=1-100
    A-IM: vcdiff,gdiff,range

specifies that if the server does use a delta-encoding, then it must be applied BEFORE the range selection. The response would then be:

    HTTP/1.1 227 Range of Delta
    Etag: "def"
    Content-Range: bytes 1-100/12345
    IM: vcdiff,range
    Vary: A-IM

to make it clear to the recipient what order the instance manipulations were applied. Similarly, a client that wants the delta encoding to be applied after the Range selection would instead send:

    A-IM: range,vcdiff,gdiff

in its request.

If the request contains something contradictory like

    A-IM: vcdiff,range,vcdiff

then I would argue that this is "undefined", and the server can do whatever it wants - i.e., we don't need to specify all of the possible bogus combinations.

Note: It might not be necessary for the server to return the "range" token in the IM: response header. I suspect that the use of "Vary: A-IM" prevents any ambiguities. But I have to think more about this issue.

Making this change greatly simplifies the Internet-Draft, although it does introduce some slight additional mechanism.
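The ordering rule in (d) can be illustrated with a small sketch. This is only one possible reading of the proposal, not text from any draft: the function names, the returned labels, and the set of delta-coding names are assumptions made for illustration, and a contradictory header is simply reported as having no defined ordering.

```python
from typing import List, Optional

# Assumed set of delta-coding names; the thread mentions vcdiff, gdiff and diffe.
DELTA_CODINGS = {"vcdiff", "gdiff", "diffe"}


def parse_im_list(value: str) -> List[str]:
    """Split an A-IM or IM header value into lowercase names, ignoring parameters."""
    names = []
    for item in value.split(","):
        name = item.split(";", 1)[0].strip().lower()
        if name:
            names.append(name)
    return names


def range_delta_order(a_im: str) -> Optional[str]:
    """Return the ordering the client asked for, or None when none is defined."""
    names = parse_im_list(a_im)
    if names.count("range") != 1:
        return None  # no "range" literal (or a repeated one): no ordering expressed
    position = names.index("range")
    delta_before = any(n in DELTA_CODINGS for n in names[:position])
    delta_after = any(n in DELTA_CODINGS for n in names[position + 1:])
    if delta_before and delta_after:
        return None  # contradictory, e.g. "vcdiff,range,vcdiff": treated as undefined
    if delta_before:
        return "delta-before-range"
    if delta_after:
        return "range-before-delta"
    return None  # "range" present but no delta-coding named
```

With the headers from the examples above, `range_delta_order("vcdiff,gdiff,range")` reports the delta first and `range_delta_order("range,vcdiff,gdiff")` reports the range first.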
-Jeff

From danielh@crosslink.net Wed Mar 29 13:51:17 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA07627; Wed, 29 Mar 2000 13:51:16 -0800 (PST)
From:
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA29565; Wed, 29 Mar 2000 13:51:16 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA11823 for ; Wed, 29 Mar 2000 13:51:15 -0800 (PST)
Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id QAA30697 for ; Wed, 29 Mar 2000 16:51:14 -0500
Message-Id: <200003292151.QAA30697@lycanthrope.crosslink.net>
X-Really-To:
Date: Wed, 29 Mar 2000 16:40:14 -0500
To: http-delta@pa.dec.com
In-Reply-To: <200003292113.NAA31199@wera.pa.dec.com>
Subject: Re: Another simplification(?): removing delta transfer-codings
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05

Jeffrey Mogul said:

> .... There are two stated motivations for using a delta-transfer-coding
>instead of a delta-content-coding:
> (1) allow hop-by-hop deltas.
> (2) allow deltas to be applied after a Range selection.
>My recollection is that #2 was the real reason.
>Assume we are going to replace the use of deltas as
>content-codings with deltas as instance manipulations. How does this
>affect the decision to support deltas as transfer-codings?
>I'm not sure there is much importance to reason #1. ....
>If a proxy does modify a request to imply support for deltas (e.g.,
>adding "A-IM: vcdiff" to a request that doesn't already have it), then
>it's pretty clear that the proxy should apply any delta response it
>receives, and convert the forwarded response to a status-200 format. I
>don't think this is any harder than applying hop-by-hop deltas as
>transfer-codings.

I concur.
I note that during my first round of implementation of delta, I found that it was easier to start with delta as a transfer encoding (mostly because the rules seemed simpler). Under the new proposal, that's no longer a concern.

>Reason #2 was more important: we needed a way to distinguish between
>"delta encoding before Range selection" and "Range selection before delta
>encoding." Under the old (erroneous) definition of "instance",
>delta-content-coding always came before Range selection, and delta
>transfer-coding always comes after, so this allows the client to make its
>intentions clear by specifying one or the other kind of delta encoding in
>its request.....
>So I'm proposing the following mechanism, which I think is a
>simplification overall (even though it has somewhat of a kludgey flavor):
>(a) Delta encoding is *always* a form of instance-manipulation, never a
>content-coding or transfer-coding.
>(b) Range selection is explicitly defined as a form of instance
>manipulation.
>(c) We define a "range" literal as part of the registered set of instance
>manipulations.
>(d) If a client's request includes both a Range header and
>an "A-IM: " request header, then in order to specify
>an ordering between these two instance manipulations, the client must
>include the "range" literal in the A-IM header. For example,

It's a lot LESS kludgey than the old version (in the "I'll be happy to rework my current implementation" sense)

>.....
>Note: It might not be necessary for the server to return
>the "range" token in the IM: response header. I suspect
>that the use of "Vary: A-IM" prevents any ambiguities. But
>I have to think more about this issue.

It seems dangerous to try and finesse the need to include Range in IM, since ordering (whether it is before or after a delta-code) has major importance.
-----------------------------------------------------------
Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org
-----------------------------------------------------------

From danielh@crosslink.net Wed Mar 29 21:19:00 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id VAA02374; Wed, 29 Mar 2000 21:19:00 -0800 (PST)
From:
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA08434; Wed, 29 Mar 2000 21:19:00 -0800
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id VAA30340 for ; Wed, 29 Mar 2000 21:18:59 -0800 (PST)
Received: from smtp.crosslink.net (dyn05.c5200-1.springfield.236.crosslink.net [207.199.142.6]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id AAA21893 for ; Thu, 30 Mar 2000 00:18:52 -0500
Message-Id: <200003300518.AAA21893@lycanthrope.crosslink.net>
X-Really-To:
Reply-To: danielh@crosslink.net
Date: Thu, 30 Mar 2000 00:08:28 -0500
To: http-delta@pa.dec.com
Subject: adding an IM content coding
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05

It may not be necessary, but I'm wondering if there are cases where http/1.1-but-not-IM-aware clients (or intermediate proxies) may receive delta-coded (or otherwise IM coded) responses. If so, the content-coding will look normal, but they won't be able to make sense of the response.

I suspect that if all actors are well behaved, such a receipt will not occur. In particular, I think the Vary: A-IM should prevent such events. But what if there is a subtle case that Vary doesn't cover, or (more likely) a not-quite-kosher proxy?

To account for such mishaps, a special "IM" coding could be added at the end of the content-coding. This would be discarded by IM-aware clients. However, as an unrecognized coding, it would signal to non-IM-aware clients that there is more here than a normal content-encoded response.
----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul Thu Mar 30 10:08:55 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA01412; Thu, 30 Mar 2000 10:08:55 -0800 (PST) Message-Id: <200003301808.KAA01412@wera.pa.dec.com> To: http-delta From: Reply-To: danielh@crosslink.net Original-Date: Tue, 28 Mar 2000 21:54:27 -0500 In-Reply-To: <200003282326.PAA19123@wera.pa.dec.com> Subject: Re: redesign of delta -- some comments X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Date: Thu, 30 Mar 2000 10:08:55 -0800 Sender: mogul X-Mts: smtp Daniel said: >> The client side rules: .... Jeffrey said: >This seems generally right. (I might even steal some of this for the >rewrite of the spec, once I get around to it.) Please do. >> Note that step a necessitates step ii.b.3 >Well, not exactly. If the client doesn't put the result of ii.b.2 into a >cache, but only uses it to render a page, then there is no need to go >through step ii.b.3. Further, even a cache can avoid this step by >deferring it until the next use of the cached response, and (because many >cache entries are never re-used) so might avoid ever doing the >re-encoding. That does loosen up a lot of possible problems -- the client doesn't have to cache, or doesn't have to "reperform" the content codings, and doesn't have to use the current instance as a future base instance. >However, a proxy cache that applies the delta decoding before forwarding >the response to a client has to restore the content-coding as sent by the >origin server - i.e., has to deliver exactly the instance that the client >would have received had delta encoding not been used. This could be >important, for example, if the client later receives a delta or range via >a different proxy, to be applied to this instance. 
When would that happen (given that content-coding and IM are end-to-end)?

> This may miss some opportunities. For example:
> Consider a "qparse" content-encoding, that (in contrast to most
> compression algorithms) is amenable to differencing.
> Hence, it might be optimal to compute the delta after qparsing
> (but before compression). ....
> How might the client and server agree to compute a delta on a
> qparsed response?
>One possibility would be to take advantage of an apparent loophole in the
>sketchy specification I suggested yesterday; do the initial request like
>so:
> (1) At time 14:00:00:
> GET /example.com/foo.html HTTP/1.1
> Host: example.com
> Accept-encoding: qparse,gzip
> A-IM: gzip
>
> HTTP/1.1 200 OK
> Date: Wed, 24 Dec 1997 14:00:00 GMT
> Etag: "abc"
> Content-encoding: qparse
> IM: gzip
> Vary: A-IM

That notion had crossed my mind, but I was concerned about interoperability with non IM aware clients & servers. Allowing gzip to be both an accept-encoding and an A-IM, use of Vary, and the 226 response seems to allay this concern.

>I.e., the server has a choice (based on the client's Accept-* headers)
>whether to apply the gzip as a content-coding or as an
>instance-manipulation. I hadn't originally thought of a good reason for
>the server to apply gzip as an IM without
>first having done a compressible delta encoding, but here it seems to
>make sense.

Actually, Jeff came up with the idea of a content-coding that is designed to facilitate parsing (thus, "quick-parse"). Note that since this sort of coding is used to speed up processing on the client end (rather than to reduce bandwidth requirements) it would be counterproductive to expect the client to completely decode, and then recode, a base instance. Hence the rationale for allowing a delta against a "partially" decoded instance.
>Note, however, that in order to prevent a cache from accidentally >forwarding this status-200 response to a client that doesn't understand >IM, it has to be labelled "Vary: A-IM". (This probably should be done >for all of my examples, although in the other cases, the 226 response >status code will prevent incorrect treatment of misdirected delta >responses.) ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul@pa.dec.com Thu Mar 30 10:22:55 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA31599; Thu, 30 Mar 2000 10:22:55 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA18813; Thu, 30 Mar 2000 10:22:54 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA31734; Thu, 30 Mar 2000 10:22:54 -0800 (PST) From: Jeffrey Mogul Message-Id: <200003301822.KAA31734@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: adding an IM content coding In-Reply-To: Your message of "Thu, 30 Mar 2000 00:08:28 EST." <200003300518.AAA21893@lycanthrope.crosslink.net> Date: Thu, 30 Mar 2000 10:22:54 -0800 X-Mts: smtp writes: It may not be necessary, but I'm wondering if there are cases where http/1.1-but-not-IM-aware clients (or intermediate proxies) may recieve delta-coded (or otherwise IM coded) responses. If so, the content-coding will look normal, but they won't be able to make sense of the response. I suspect that if all actors are well behaved, such a reciept will not occur. In particular, I think the Vary: A-IM should prevent such events. But what if there is a subtle case that Vary doesn't cover, or (more likely) a not-quite-kosher proxy. That's the main reason why we use the 226 (Delta) or 227 (Range of Delta) response codes. 
HTTP caches do not store responses if they don't understand the response code (this is explicit in HTTP/1.1, and apparently true for known implementations of HTTP/1.0). "Vary: A-IM" is not actually sufficient, since an HTTP/1.0 cache would ignore that.

    To account for such mishaps, a special "IM" coding could be added at the end of the content-coding. This would be discarded by IM-aware clients. However, as an unrecognized coding, it would signal to non-IM-aware clients that there is more here than a normal content-encoded response.

I don't think that is necessary.

By the way: I'm toying with the idea of collapsing the two response status codes from the existing draft ("Delta" and "Range of Delta") into one code; e.g., 226 (Instance Manipulation Applied). This would require that a delta-aware proxy (one that *is* willing to cache a Delta response) is also at least "aware" of Range, although it would not actually need to implement Range. So the "new" 226 would simply mean that the recipient needs to look for any of a set of instance-manipulation-related headers in the response, including both "IM" and "Content-Range".

The tricky part is that this wouldn't actually be sent if the only instance manipulation were the Range selection, which should instead yield the traditional 206 (Partial Content) response. But we're already stuck doing something a little complex for combining Ranges and Deltas.
-Jeff From danielh@crosslink.net Thu Mar 30 11:05:50 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA19582; Thu, 30 Mar 2000 11:05:50 -0800 (PST) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA07644; Thu, 30 Mar 2000 11:05:50 -0800 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id LAA20825 for ; Thu, 30 Mar 2000 11:05:49 -0800 (PST) Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id OAA11099 for ; Thu, 30 Mar 2000 14:05:48 -0500 Message-Id: <200003301905.OAA11099@lycanthrope.crosslink.net> X-Really-To: Date: Thu, 30 Mar 2000 13:56:36 -0500 To: http-delta@pa.dec.com In-Reply-To: <200003301822.KAA31734@wera.pa.dec.com> Subject: Re: adding an IM content coding X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Jeff said >I'm toying with the idea of collapsing the two response status codes from >the existing draft ("Delta" and "Range of Delta") into one code; e.g., >226 (Instance Manipulation Applied). This would require that a >delta-aware proxy (one that *is* willing to cache a Delta response) is >also at least "aware" of Range, although it would not actually need to >implement Range. So the "new" 226 would simply mean that the recipient >needs to look for any of a set of instance-manipulation-related headers >in the response, including both "IM" and "Content-Range". I have not been a strong believer in the need for two codes, so collapsing it to just a 226 doesn't bother me. 
BTW Reminder: If a Content-range: bytes=whatever appears, then

a) if IM: includes "range,vcdiff", the server is saying: "I computed a vcdiff delta, and then I extracted a range of this delta."

b) if IM: includes "vcdiff", the server is saying "I extracted a range from the base and current instance (perhaps after decoding), and then computed a delta of these ranges."

-----------------------------------------------------------
Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org
-----------------------------------------------------------

From mogul Thu Mar 30 14:57:29 2000
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA26837; Thu, 30 Mar 2000 14:57:29 -0800 (PST)
From: Jeffrey Mogul
Message-Id: <200003302257.OAA26837@wera.pa.dec.com>
To: http-delta
Subject: Another little bug in the delta spec: when to use Vary
Date: Thu, 30 Mar 2000 14:57:29 -0800
X-Mts: smtp

The other day, I wrote a message containing this example:

    HTTP/1.1 200 OK
    Date: Wed, 24 Dec 1997 14:00:00 GMT
    Etag: "abc"
    Content-encoding: qparse
    IM: gzip
    Vary: A-IM

and wrote:

    Note, however, that in order to prevent a cache from accidentally forwarding this status-200 response to a client that doesn't understand IM, it has to be labelled "Vary: A-IM". (This probably should be done for all of my examples, although in the other cases, the 226 response status code will prevent incorrect treatment of misdirected delta responses.)

That's actually sort-of-wrong in two different ways.

First of all, the response status code probably ought to be (following my suggestion earlier today)

    HTTP/1.1 226 Instance Manipulation Applied

or perhaps, for brevity (no sense in wasting bytes!)

    HTTP/1.1 226 IM Used

This is better than using Vary to prevent a proxy cache from incorrectly forwarding the response - especially, since Vary does not work for HTTP/1.0 proxies!

Second, the part about "this probably should be done for all of my examples" was half right.
In the case where the instance manipulation is something like gzip, I don't think you need a Vary header. This is probably also true if the request is a simple Range selection, since the response is self-identifying.

But if the instance manipulation is a delta encoding, then the result implicitly depends on the entity tag in the request's If-None-Match header. For example, consider this sequence of events (with some of the mandatory headers elided for simplicity):

(1) Client A sends, via a proxy

    GET /foo.html HTTP/1.1
    If-None-Match: "abc"
    A-IM: vcdiff

(2) and the origin server responds

    HTTP/1.1 226 IM Used
    IM: vcdiff
    Etag: "ghi"

which is forwarded to A and also stored by the proxy cache.

(3) Client B sends, via the same proxy

    GET /foo.html HTTP/1.1
    If-None-Match: "def"
    A-IM: vcdiff

Can the proxy use its cached response to reply to client B? No, because the delta is computed from the wrong base instance. But there is nothing in the origin server's response (in step #2) that would prevent this error.

We could make up elaborate rules on how caching proxies handle 226 responses, but they would effectively end up being equivalent in effect to requiring the origin server to send

    Vary:If-None-Match

in its response at step #2. Which, unfortunately, adds 20 bytes of header to a delta-encoding response, but seems to be the Right Thing to do.

Note that it does not appear to be sufficient to require the use of Delta-Base in the origin server's response. This does allow the recipient to check the results (i.e., it makes the responses self-identifying), but it would still require adding special-purpose rules to the proxy implementation. Which, I believe, is not the Right Thing.

Although we might add a rule allowing a proxy to ignore the "Vary: If-None-Match" if it is willing to interpret the Delta-Base header, since this could allow a high cache hit ratio when two clients send overlapping but non-identical lists of entity tags in their If-None-Match headers.
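The client A / client B sequence can be replayed in a few lines. This is a deliberately simplified sketch of the Vary check, assuming the cache compares the named request headers as plain strings after trimming whitespace; the helper names are hypothetical and real HTTP/1.1 caches apply more careful matching.

```python
from typing import Dict, List


def normalize(headers: Dict[str, str]) -> Dict[str, str]:
    """Lowercase header names and trim values so comparisons are stable."""
    return {name.lower(): value.strip() for name, value in headers.items()}


def vary_permits_reuse(vary: List[str], stored_request: Dict[str, str],
                       new_request: Dict[str, str]) -> bool:
    """True when every header named in Vary has the same value in both requests."""
    stored = normalize(stored_request)
    new = normalize(new_request)
    return all(stored.get(name.lower()) == new.get(name.lower()) for name in vary)


# Requests from steps (1) and (3), other headers left out as in the example.
client_a = {"If-None-Match": '"abc"', "A-IM": "vcdiff"}
client_b = {"If-None-Match": '"def"', "A-IM": "vcdiff"}

# Without any Vary header nothing stops the cache from reusing the delta.
without_vary = vary_permits_reuse([], client_a, client_b)
# With "Vary: If-None-Match" the differing entity tags block the reuse.
with_vary = vary_permits_reuse(["If-None-Match"], client_a, client_b)
# A later request carrying the same entity tag as client A is still served.
same_base = vary_permits_reuse(["If-None-Match"], client_a, dict(client_a))
```

Here `without_vary` is true, which is the error described in step (3), while `with_vary` is false and `same_base` is true.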
For some reason, I didn't get this right in the previous I-Ds for delta encoding. Oops. Although maybe it was obvious to anyone who tried to implement origin server support for delta encoding? -Jeff From danielh@crosslink.net Thu Mar 30 19:01:46 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id TAA21888; Thu, 30 Mar 2000 19:01:46 -0800 (PST) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA07565; Thu, 30 Mar 2000 19:01:45 -0800 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id TAA20629 for ; Thu, 30 Mar 2000 19:01:45 -0800 (PST) Received: from smtp.crosslink.net (dyn60.c5200-2.springfield.236.crosslink.net [207.199.142.189]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id WAA32336 for ; Thu, 30 Mar 2000 22:01:38 -0500 Message-Id: <200003310301.WAA32336@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Thu, 30 Mar 2000 21:52:21 -0500 To: http-delta@pa.dec.com In-Reply-To: <200003302257.OAA26837@wera.pa.dec.com> Subject: when to use Vary X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 >We could make up elaborate rules on how caching proxies handle 226 >responses, but they would effectively end up being equivalent in effect >to requiring the origin server to send > Vary:If-None-Match >in its response at step #2. Which, unfortunately, adds 20 >bytes of header to a delta-encoding response, but seems to >be the Right Thing to do. Could an abbreviation be used? Say ... Vary: IFN Non delta-aware proxies would find no match, and properly not use a cached response. Delta-aware proxies would know that IFN means "if-none-match". >For some reason, I didn't get this right in the previous >I-Ds for delta encoding. Oops. Although maybe it was >obvious to anyone who tried to implement origin server >support for delta encoding? 
Didn't think of it either. Which suggests the need for a paragraph or two listing "what origin servers should do to ensure that IM containing responses are properly handled by proxies of various vintages"

-----------------------------------------------------------
Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org
-----------------------------------------------------------

From mogul@pa.dec.com Fri Mar 31 09:56:07 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id JAA09375; Fri, 31 Mar 2000 09:56:07 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA18191; Fri, 31 Mar 2000 09:56:07 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id JAA23331; Fri, 31 Mar 2000 09:56:07 -0800 (PST)
From: Jeffrey Mogul
Message-Id: <200003311756.JAA23331@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: when to use Vary
In-Reply-To: Your message of "Thu, 30 Mar 2000 21:52:21 EST." <200003310301.WAA32336@lycanthrope.crosslink.net>
Date: Fri, 31 Mar 2000 09:56:07 -0800
X-Mts: smtp

writes:

    >We could make up elaborate rules on how caching proxies handle 226
    >responses, but they would effectively end up being equivalent in effect
    >to requiring the origin server to send
    > Vary:If-None-Match
    >in its response at step #2. Which, unfortunately, adds 20
    >bytes of header to a delta-encoding response, but seems to
    >be the Right Thing to do.

    Could an abbreviation be used? Say ...

        Vary: IFN

    Non delta-aware proxies would find no match, and properly not use a cached response. Delta-aware proxies would know that IFN means "if-none-match".

No, this is not how Vary works. Vary says "If the request contained the named header, then if you cache this response, you cannot use it to answer a subsequent request unless the named header and its value matches exactly the header & value for the current request."
(Sorry, that's probably not the most elegant way to put it).

And we can't use a different request header than If-None-Match, since we want a non-delta-capable server to do a traditional conditional request.

So it has to be "Vary: If-None-Match", period. Or we need to insist that a proxy that caches a 226 response needs to be fully aware of the matching rules, which might be a possible option, but it's more complex to specify. And we have to make the decision now, during the protocol design phase, not after anything has been deployed.

-Jeff

From mogul@pa.dec.com Wed Apr 5 16:45:02 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA21895; Wed, 5 Apr 2000 16:45:02 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA29356; Wed, 5 Apr 2000 16:45:02 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA23630; Wed, 5 Apr 2000 16:45:02 -0700 (PDT)
From: Jeffrey Mogul
Message-Id: <200004052345.QAA23630@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Still wrestling with: when/whether to use Vary
Date: Wed, 05 Apr 2000 16:45:02 -0700
X-Mts: smtp

I am almost done with a re-write of the Delta specification. The one significant problem that I am still trying to resolve is how to handle the caching of 226 (IM Used) responses.

Last week, I wrote:

    We could make up elaborate rules on how caching proxies handle 226 responses, but they would effectively end up being equivalent in effect to requiring the origin server to send

        Vary:If-None-Match

    in its response at step #2. Which, unfortunately, adds 20 bytes of header to a delta-encoding response, but seems to be the Right Thing to do.

    Note that it does not appear to be sufficient to require the use of Delta-Base in the origin server's response.
    This does allow the recipient to check the results (i.e., it makes the responses self-identifying), but it would still require adding special-purpose rules to the proxy implementation. Which, I believe, is not the Right Thing.

But I gave this some more thought, and I decided that the right approach was to either let the cache implement the "elaborate rules" (which aren't TOO elaborate), or simply not cache 226 responses. (A cache could store the result of decoding a 226 response, as a 200 or 206 cache entry.)

So, here are the rules that I came up with:

    A status-226 cache entry MUST NOT be used in response to a subsequent request under any of these conditions (a cache that never stores status-226 responses may ignore these tests):

    1. If any of the instance-manipulation values from the IM header field in the cached response do not appear in the subsequent request's A-IM header field. The comparison between the headers is done using an exact match on each instance-manipulation value including any associated inparams values (see section 12.1).

    2. If the order of instance-manipulation values appearing in the cached IM header field differs from the order of that set of instance-manipulations in the A-IM header field of the subsequent request.

    3. If the cache implementation is not aware of the specification of any of the instance-manipulation values in the cached IM header field.

    4. If any of the instance-manipulation values in the cached IM header field is a delta-coding, and the cache entry includes a Base-Instance header field, and that Base-Instance entity tag is not one of the entity tags listed in an If-None-Match header field of the subsequent request.

    5. If any of the instance-manipulation values in the cached IM header field is a delta-coding, the cache entry does not include a Base-Instance header field, and the If-None-Match header field of the request that led to that cache entry does not match the If-None-Match header field of the subsequent request.
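The five tests can be written down as one predicate. What follows is a sketch under stated assumptions rather than the specification's algorithm: the names `Cached226` and `may_reuse` are hypothetical, header values are assumed to be already split into lists of exact strings, "match" in rule 5 is read as equality of the two entity-tag lists, and the sets of known instance-manipulations and delta-codings are supplied by the caller.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Cached226:
    im: List[str]                        # IM values of the cached response, in order
    request_if_none_match: List[str]     # entity tags of the request that created the entry
    base_instance: Optional[str] = None  # Base-Instance entity tag, if the response had one


def im_name(value: str) -> str:
    """Strip any parameters, e.g. "foodiff;mode=1" -> "foodiff"."""
    return value.split(";", 1)[0].strip()


def may_reuse(entry: Cached226, a_im: List[str], if_none_match: List[str],
              known: Set[str], delta_codings: Set[str]) -> bool:
    # Rule 1: every cached IM value must appear, parameters included, in A-IM.
    if any(value not in a_im for value in entry.im):
        return False
    # Rule 2: those values must appear in A-IM in the same order as in the cached IM.
    if [value for value in a_im if value in entry.im] != entry.im:
        return False
    names = [im_name(value) for value in entry.im]
    # Rule 3: the cache must understand every instance-manipulation that was applied.
    if any(name not in known for name in names):
        return False
    if any(name in delta_codings for name in names):
        if entry.base_instance is not None:
            # Rule 4: the base instance must be one the new client says it holds.
            if entry.base_instance not in if_none_match:
                return False
        elif entry.request_if_none_match != if_none_match:
            # Rule 5: without Base-Instance, fall back to comparing If-None-Match.
            return False
    return True


KNOWN = {"vcdiff", "gdiff", "diffe", "gzip", "range"}  # assumed cache capabilities
DELTAS = {"vcdiff", "gdiff", "diffe"}                  # assumed delta-coding names

entry = Cached226(im=["vcdiff"], request_if_none_match=['"abc"', '"def"'],
                  base_instance='"def"')
hit = may_reuse(entry, ["diffe", "vcdiff"], ['"def"', '"ghi"'], KNOWN, DELTAS)
miss = may_reuse(entry, ["diffe", "vcdiff"], ['"xyz"'], KNOWN, DELTAS)
```

In this usage `hit` is true because the second client lists the cached base instance "def", and `miss` is false because it does not.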
Rule #3 is the key rule - it allows us to add new instance-manipulations, including delta-codings, rsync, and yet-to-be-determined technologies, without worrying about whether caches would inappropriately store the results.

Rules #1 and #2 allow a cache more flexibility (and hence potentially a higher hit rate) than simply requiring "Vary: A-IM".

Rules #4 and #5 allow more flexibility than simply requiring "Vary: If-None-Match".

So, for example, if Client A sends

    GET /foo.html HTTP/1.1
    If-None-Match: "abc", "def"
    A-IM: vcdiff, gdiff

and gets the response

    HTTP/1.1 226 IM Used
    Base-Instance: "def"
    ETag: "pqr"
    IM: vcdiff

which is then cached, and then Client B sends

    GET /foo.html HTTP/1.1
    If-None-Match: "abc", "ghi"
    A-IM: diffe, vcdiff

the cache entry is usable as a reply. If we were to require the use of Vary, e.g.:

    HTTP/1.1 226 IM Used
    Base-Instance: "def"
    ETag: "pqr"
    IM: vcdiff
    Vary:A-IM,If-None-Match

then the cached response to A's request would not be useful for B's request, and it would be 25 bytes longer, too. Also, I would expect that in many cases the 226 response would not be cached in any case, which makes the Vary headers superfluous overhead.

However, this part is tricky enough that I would really appreciate it if other people could review it carefully (either now or in a day or so, when I have a full draft ready to go).
-Jeff From danielh@crosslink.net Wed Apr 5 19:03:11 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id TAA07130; Wed, 5 Apr 2000 19:03:11 -0700 (PDT) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA15174; Wed, 5 Apr 2000 19:03:10 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id TAA11480 for ; Wed, 5 Apr 2000 19:03:09 -0700 (PDT) Received: from smtp.crosslink.net (dyn48.c5200-1.springfield.236.crosslink.net [207.199.142.49]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id WAA05290 for ; Wed, 5 Apr 2000 22:03:05 -0400 Message-Id: <200004060203.WAA05290@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Wed, 05 Apr 2000 21:38:21 -0300 To: http-delta@pa.dec.com In-Reply-To: <200004052345.QAA23630@wera.pa.dec.com> Subject: Re: Still wrestling with: when/whether to use Vary X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Jeffery wrote: >Last week, I wrote: > We could make up elaborate rules on how caching proxies handle > 226 responses, but they would effectively end up being equivalent >...... >was to either let the cache implement the "elaborate rules" (which aren't >TOO elaborate), or simply not cache 226 responses. (A cache could store >the result of decoding a 226 response, as a 200 or 206 cache entry.) >So, here are the rules that I came up with: > A status-226 cache entry MUST NOT be used in response to a subsequent > request under any of these conditions (a cache that never stores > status-226 responses may ignore these tests): > 1. If any of the instance-manipulation values from the IM > header field in the cached response do not appear in the > subsequent request's A-IM header field. 
>         The comparison between the headers is done using an exact
>         match on each instance-manipulation value including any
>         associated inparams values (see section 12.1).

Assuming that "inparams values" are things like "mode=1" in
"foodiff;mode=1".

>      2. If the order of instance-manipulation values appearing in
>         the cached IM header field differs from the order of that
>         set of instance-manipulations in the A-IM header field of
>         the subsequent request.

Interesting -- you are carrying forward the position of the earlier
draft that "the server must adhere to the ordering of encodings
supplied by the client".

That is, if the client provides (let's assume that there are no
content-encodings)

    A-IM: gzip,gdiff

then if the server wants to use both encodings, it MUST first gzip,
and then gdiff (even though that may yield a much larger response than
gdiff first, followed by gzip).

I never much liked that requirement, so I want to be sure that you
really think it should be maintained (I'm willing to cede to your
judgement, so long as it is a "considered judgement").

>      3. If the cache implementation is not aware of the
>         specification of any of the instance-manipulation values
>         in the cached IM header field.

Good point.  For example: if "rsync" were included as an A-IM, it might
be used along with a new header (say, Rsync-Signature) to pass crucial
information (that is, instead of in If-None-Match).  By requiring the
proxy to know such intricacies of rsync as an IM coding, the IM-aware
but rsync-unaware proxy (that knows nothing about checking
"Rsync-Signature") will never inappropriately return the cached
response.

>      4. If any of the instance-manipulation values in the cached
>         IM header field is a delta-coding, and the cache entry
>         includes a Base-Instance header field, and that
>         Base-Instance entity tag is not one of the entity tags
>         listed in an If-None-Match header field of the subsequent
>         request.
What if there is only one etag in the original If-None-Match, and
no Base-Instance was returned (say, since the server figures it
wasn't necessary)?  The lack of a Base-Instance etag should not
disallow caching.  Your rule 5 covers some of these cases, but not
all of them.

Hmm, actually, you could amend the above and say
  "If the response contained no Base-Instance header, but the
   request contained only one etag in its If-None-Match header,
   then the server may implicitly add (for internal use only) a
   Base-Instance header containing this single etag's value."

Which would take care of this special case.

>      5. If any of the instance-manipulation values in the cached
>         IM header field is a delta-coding, the cache entry does
>         not include a Base-Instance header field, and the
>         If-None-Match header field of the request that led to that
>         cache entry does not match the If-None-Match header field
>         of the subsequent request.

Is this strictly necessary, given my above "addendum"?  I mean, will
there EVER be a case where the server does NOT include a Base-Instance
header when the request contained more than 1 etag in If-None-Match?

>Rule #3 is the key rule - it allows us to add new instance-manipulations,
>including delta-codings, rsync, and yet-to-be-determined technologies,
>without worrying about whether caches would inappropriately store the
>results.

Yes.  But it might be necessary to add:

   These caching rules apply to the use of delta encoding, and simpler
   compressions (such as gzip and deflate) as an Instance Manipulation.
   Future Instance Manipulations (such as rsync) may require their own
   additional rules (such as the need to check the Rsync-Signature
   header).

>Rules #1 and #2 allow a cache more flexibility (and hence potentially a
>higher hit rate) than simply requiring "Vary: A-IM".  Rules #4 and #5
>allow more flexibility than simply requiring "Vary: If-None-Match".

Good point.
>However, this part is tricky enough that I would really >appreciate it if other people could review it carefully >(either now or in a day or so, when I have a full draft >ready to go). One last issue, and it's more of a philosophical/political point. I've come to see that the notion of "instance manipuation" is useful and proper. But, I'm just one lonely (and rather untested) voice. I wonder what longer term players in the http spec endeavour would say. After all, it is a significant addition to how a request should be handled, and there might get folks nervous (or worse, treat it as not worth the risk of muddying the waters). ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul@pa.dec.com Thu Apr 6 14:53:31 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA30617; Thu, 6 Apr 2000 14:53:31 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA27491; Thu, 6 Apr 2000 14:53:31 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA29119; Thu, 6 Apr 2000 14:53:31 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200004062153.OAA29119@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: Still wrestling with: when/whether to use Vary In-Reply-To: Your message of "Wed, 05 Apr 2000 21:38:21 -0300." <200004060203.WAA05290@lycanthrope.crosslink.net> Date: Thu, 06 Apr 2000 14:53:31 -0700 X-Mts: smtp writes: > 2. If the order of instance-manipulation values appearing in > the cached IM header field differs from the order of that > set of instance-manipulations in the A-IM header field of > the subsequent request. Interesting -- you are carrying forward the position of the earlier draft that "the server must adhere to the ordering of encodings supplied by the client". 
That is, if the client provides (let's assume that there are no content-encodings). A-IM: gzip,gdiff then if the server wants to use both encodings, it MUST first gzip, and then gdiff (even though that may yield a much large response then gdiff first, followed by gzip). Well, I would say that in this case, even if the server "wants" to use both encodings, it needs to make a reasonable choice between using just one (which avoids the ordering issue) or using both, but in a non-optimal order. My guess is that using just one is the right thing to do here. The problem is that if we don't give the client a means to insist on an ordering, then the client cannot control the semantics of the resulting delta, and so what the server "wants" to do here could be useless to the client. I never much liked that requirement, so I want to be sure that you really think it should be maintained (I'm willing to cede to your judgement, so long at it is a "considered judgement") I suppose one option would be to provide a way for the client to express "ordering doesn't matter to me" - except that I think in most cases, it actually does matter (and I haven't come up with a non-kludgey way of expressing this). So I'm inclined to treat this as a "considered judgement". > 4. If any of the instance-manipulation values in the cached > IM header field is a delta-coding, and the cache entry > includes a Base-Instance header field, and that > Base-Instance entity tag is not one of the entity tags > listed in an If-None-Match header field of the subsequent > request. > 5. If any of the instance-manipulation values in the cached > IM header field is a delta-coding, the cache entry does > not include a Base-Instance header field, and the > If-None-Match header field of the request that led to that > cache entry does not match the If-None-Match header field > of the subsequent request. 
    What if there is only one etag in the original If-None-Match, and
    no Base-Instance was returned (say, since the server figures it
    wasn't necessary)?  The lack of a Base-Instance etag should not
    disallow caching.  Your rule 5 covers some of these cases, but not
    all of them.

Which cases aren't covered by rules #4 and #5?  Your "what if" example
seems to be:

    Original client sends:
        GET /foo.html HTTP/1.1
        If-None-Match: "abc"
        A-IM:vcdiff

    Server responds:
        HTTP/1.1 226 IM Used
        Etag: "pqr"
        IM:vcdiff

and then the new request (the one that *might* be a cache hit) is

    New client sends:
        GET /foo.html HTTP/1.1
        If-None-Match: "abc"
        A-IM:vcdiff

This is allowed as a cache hit by rule 4 (since there is no Delta-Base
[which is the correct header name, I screwed up when I wrote these rules]
in the cache entry), and also by rule 5 (since the If-None-Match headers
match).

Remember, rules #1-5 are the conditions where cache hits are NOT allowed,
so any situation that does NOT match any of these rules is OK as far as
caching goes.

    Hmm, actually, you could amend the above [rule #4] and say
      "If the response contained no Base-Instance header, but the
       request contained only one etag in its If-None-Match header,
       then the server may implicitly add (for internal use only) a
       Base-Instance header containing this single etag's value."

    Which would take care of this special case.

"Server"? or do you mean "cache"?  Anyway, that's not necessary,
since rule #5 already allows a cache hit in this case.

    Is this [rule #5] strictly necessary, given my above "addendum"?
    I mean, will there EVER be a case where the server does NOT include
    a Base-Instance header when the request contained more than 1 etag
    in If-None-Match?

No (well, not if I had remembered to call it "Delta-Base" instead of
Base-Instance!).  Delta-Base MUST be included if there is more than
one entity-tag in the If-None-Match (this has always been in the
Delta spec).
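As a rough illustration of the Delta-Base requirement just described, the sketch below shows one way a server might decide whether to emit the header, and how a recipient could recover the base instance when it is omitted. The function names are hypothetical, entity tags are treated as opaque strings, and omitting the header in the single-tag case is one permitted choice rather than a requirement.

```python
from typing import List, Optional


def delta_base_header(if_none_match: List[str], base_etag: str) -> Optional[str]:
    """Header line a server sends with a delta response, or None if omitted."""
    if base_etag not in if_none_match:
        raise ValueError("base instance must be one the client listed")
    if len(if_none_match) > 1:
        # More than one candidate: the base instance must be named explicitly.
        return f"Delta-Base: {base_etag}"
    # Exactly one candidate: the base is implicit, so save the bytes.
    return None


def effective_base(if_none_match: List[str], delta_base: Optional[str]) -> str:
    """Base instance a recipient should apply the delta to."""
    if delta_base is not None:
        return delta_base
    if len(if_none_match) == 1:
        return if_none_match[0]
    raise ValueError("ambiguous base instance: Delta-Base is missing")
```

For the single-tag request in the example, `delta_base_header(['"abc"'], '"abc"')` returns None and `effective_base(['"abc"'], None)` recovers `"abc"`.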
>Rule #3 is the key rule - it allows us to add new instance-manipulations, >including delta-codings, rsync, and yet-to-be-determined technologies, >without worrying about whether caches would inappropriately store the >results. Yes. But it might be necessary to add: These caching rules apply to the use of delta encoding, and simpler compressions (such as Gzip and deflat) as an Instance Manipulation. Future Instance Manipulations (such as rsync) may require their own additonal rules (such as the need to check the rsync-signature header). I'll add something like that. -Jeff From mogul Thu Apr 6 16:04:00 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA11687; Thu, 6 Apr 2000 16:04:00 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200004062304.QAA11687@wera.pa.dec.com> To: http-delta Subject: New Delta-encoding draft for your review Date: Thu, 06 Apr 2000 16:04:00 -0700 X-Mts: smtp This is NOT yet ready for submission to the IETF, but I wanted to give the members of the http-delta list a chance to review this: ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-delta-04.6april2000.txt It's actually longer than the previous (-03) draft, partly because I added some of the explanations, clarifications, and examples that seemed to be helpful on the mailing list discussion. And partly because there are some newish concepts. Some stuff got removed, however. 
-Jeff From danielh@crosslink.net Thu Apr 6 19:54:51 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id TAA11570; Thu, 6 Apr 2000 19:54:50 -0700 (PDT) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA32054; Thu, 6 Apr 2000 19:54:49 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id TAA07746 for ; Thu, 6 Apr 2000 19:54:49 -0700 (PDT) Received: from smtp.crosslink.net (dyn36.c5200-3.springfield.236.crosslink.net [207.199.145.101]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id WAA21199 for ; Thu, 6 Apr 2000 22:54:47 -0400 Message-Id: <200004070254.WAA21199@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Thu, 06 Apr 2000 22:52:48 -0400 To: http-delta@pa.dec.com Subject: comments on draft 4 X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 I'm impressed -- it's quite coherent for a first revision! I do have the following comments. Comments: On page 6 there is One can think of an instance as a snapshot in the life of a resource. I think might be confusing -- it confused me in the earlier draft. In particular, when I read "snapshot", I think of "pre content-encoding" -- it's the acutal contents that the client will see (or hear, or execute), irrespective of content-encoding that may have been applied for transmission/parsing/whatever efficiencies. Somehow, we need wording that emphasizes that this "life moment" is after content encoding (after all, content encoding can be applied on the fly). If this is too cumbersome to say (and I can't think of a good way to say it), we'ld be better of dropping this otherwise pithy-in-a-good-way statement. Page 14: - It already has a cached response for that resource, whose entity tag is ``123xyz''. 
To remind the reader that "instance" is the relevant concept, how about

   - It already has a cached response for that resource (that is, a
     cached instance), whose entity tag is ``123xyz''.

Page 16:

   transmission of unnecessary bytes, and this Reason-phase should not

Should that be "reason-phrase"??

Page 21

   This response tells the client to apply the delta to the cached
   response with entity tag ``337pey'', and to associate the entity tag
   ``1acl059'' with the result.

It might be a bit redundant, but I'd add:

   This response tells the client to apply the delta to the cached
   response with entity tag ``337pey'', and to associate the entity tag
   ``1acl059'' with the result.  Note that ``1acl059'' refers to the
   result of applying the delta to the cached response; it does NOT
   refer to the delta itself.  That is, ``1acl059'' refers to the actual
   instance the server would have sent if delta-encoding was not
   attempted.

Page 24

   Once a cluster-eligible response is cached, when the client is about
   to make a subsequent request, it would match the request-URI against
   all of the URL-prefixes in its cache.  The ``If-None-Match'' field in
   its request could then list the entity tags for all of the matching
   entries.  In some cases, it might be more efficient to list only a
   subset (such as the most recently received cache entries), to avoid
   excessive request header lengths.

If I correctly recollect earlier discussion of this point, then the
above doesn't read broadly enough.  How about...

   Once cluster-eligible responses are cached, when the client is about
   to make a subsequent request, it would:
     a) match the request-URI against all of the URL-prefixes in its
        cache.
     b) the client would then find all cache entries that started with
        one of these URL-prefixes; and only use cached entries received
        AFTER the URL-prefix was identified (that is, after the response
        containing the DCluster that identifies the URL-prefix).
   The ``If-None-Match'' field in ......

Page 36:

      4. If any of the instance-manipulation values in the cached
         IM header field is a delta-coding, and the cache entry
         includes a Delta-Base header field, and that Delta-Base
         entity tag is not one of the entity tags listed in an
         If-None-Match header field of the subsequent request.

I would advocate modifying 4, to say

      4. If any of the instance-manipulation values in the cached
         IM header field is a delta-coding, and the cache entry
         includes a Delta-Base header field, and that Delta-Base
         entity tag is not one of the entity tags listed in an
         If-None-Match header field of the subsequent request.
         In some cases, a cache may implicitly define a Delta-Base
         header when the server neglects to add one.  In particular,
         this applies when the server sends a delta response to a
         request that specified only one etag in an If-None-Match
         request header.

Page 38

      1. If both the new (delta) response and the cached response
         have exactly the same set of content-codings, the client
         applies the delta response to the cached response without
         removing the content-codings from either response.

Might it be better to say "instance" instead of "response"?

      2. If the new (delta) response and the cached response have a
         different set of content-codings, the client decodes the
         content-codings from both the delta response and the cached
         response, before applying the delta.

I'd add:

      2. If the new (delta) response and the cached response have a
         different set of content-codings, the client decodes the
         content-codings from both the delta response and the cached
         response, before applying the delta.  This implies that the
         server created the delta by first content-decoding the
         current and base instance, and then applying the delta.

Page 39

   The body of this response would be the result of
   VCDIFF_DELTA(GUNZIP(GZIP(foo.html;"abc")), foo.html;"ghi"), or more
   simply VCDIFF_DELTA(foo.html;"abc", foo.html;"ghi").
The client would store as a new cache entry the entity foo.html;"ghi" (i.e., without any content-coding), after recovering that entity by applying the delta to its previous cache entry. I'ld add The body of this response would be the result of VCDIFF_DELTA(GUNZIP(GZIP(foo.html;"abc")), foo.html;"ghi"), or more simply VCDIFF_DELTA(foo.html;"abc", foo.html;"ghi"). The client would store as a new cache entry the entity foo.html;"ghi" (i.e., without any content-coding), after recovering that entity by applying the delta to the an GZIP'ed version of its previous cache entry. pg 41 Note that a client might accept compression either as a content-coding or as an instance-manipulation. For example: Accept-Encoding: gzip A-IM: gzip, diffe Since diffe doesn't work on binary files, I'ld change diffe to gdiff. ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul Fri Apr 14 13:06:28 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA07652; Fri, 14 Apr 2000 13:06:28 -0700 (PDT) Message-Id: <200004142006.NAA07652@wera.pa.dec.com> To: http-delta From: X-Originally-To: Reply-To: danielh@crosslink.net X-Original-Date: Thu, 06 Apr 2000 20:48:24 -0400 In-Reply-To: <200004062153.OAA29119@wera.pa.dec.com> Subject: Re: Still wrestling with: when/whether to use Vary X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Date: Fri, 14 Apr 2000 13:06:28 -0700 Sender: mogul X-Mts: smtp [Note from Jeff: I'm remailing this to the list, because it appears that Daniel meant to send it to everyone. Sorry for the delay, I've been swamped this week.] Daniel said >> ....then if the server wants to use both encodings, it MUST first gzip, >> and then gdiff (even though that may yield a much large response >> then gdiff first, followed by gzip). 
Jeffery responded: >Well, I would say that in this case, even if the server "wants" to use >both encodings, it needs to make a reasonable choice between using just >one (which avoids the ordering issue) or using both, but in a non-optimal >order. My guess is that using just one is the right thing to do here. That's what I did before (just use one). >....I suppose one option would be to provide a way for the client to express >"ordering doesn't matter to me" - except that I think in most cases, it >actually does matter (and I haven't come up with a non-kludgey way of >expressing this). So I'm inclined to treat this as a "considered >judgement". Good enough. Sensible clients should be aware of these issues anyways. > > > 4. If any of the instance-manipulation values in the cached > > IM header field is a delta-coding, and the cache entry > > includes a Base-Instance header field, and that > > Base-Instance entity tag is not one of the entity tags > > listed in an If-None-Match header field of the subsequent > > request. > > > 5. If any of the instance-manipulation values in the cached > > IM header field is a delta-coding, the cache entry does > > not include a Base-Instance header field, and the > > If-None-Match header field of the request that led to that > > cache entry does not match the If-None-Match header field > > of the subsequent request. > >> What if there is only one etag in the original If-None-Match, and >> no Base-Instance was returned (say, since the server figures it >> wasn't necessary)? The lack of Base-Instance etag should not >> disallow caching. Your rule 5 covers some of these cases, but not >> all of them. >Which cases aren't covered by rules #4 and #5? 
Your "what if" example >seems to be: > Original client sends: > GET /foo.html HTTP/1.1 > If-None-Match: "abc" > A-IM:vcdiff > Server responds: > HTTP/1.1 226 IM Used > Etag: "pqr" > IM:vcdiff >and then the new request (the one that *might* be a cache hit) is > New client sends: > GET /foo.html HTTP/1.1 > If-None-Match: "abc" > A-IM:vcdiff >is allowed as a cache hit by rule 4 (since there is no Delta-Base [which >is the correct header name, I screwed up when I wrote these rules] in the >cache entry), and also by rule 5 (since the If-None-Match headers match). >Remember, rules #1-5 are the conditions where cache hits are NOT allowed, >so any situation that does NOT match any of these rules is OK as far as >caching goes. At a later date, a client sends: GET /foo.html HTTP/1.1 If-None-Match: "abc","cba" A-IM:vcdiff Here, the If-None-Headers do NOT match, but the cache should response (since the implicit Delta-Base is "abc"). Which would be solved by ... >> Hmm, actually, you could amend the above [rule #4] and say >> "If the response contained no Base-Instance header, but the >> request contained only one etag in it's If-None-Match header, >> then the server may implicitily add (for internal use only) a >> Base-Instance header containing this single etag's value. >> >> Which would take care of this special case. >"Server"? or do you mean "cache"? Oops, you are right. 
I meant "cache" (or "proxy" server) - ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org - ----------------------------------------------------------- ------- End of Forwarded Message From mogul Fri Apr 14 13:14:07 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA16910; Fri, 14 Apr 2000 13:14:06 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200004142014.NAA16910@wera.pa.dec.com> To: http-delta Subject: Forwarded from Koen Holtman: comments on new delta draft Date: Fri, 14 Apr 2000 13:14:06 -0700 X-Mts: smtp ------- Forwarded Message Return-Path: koen@win.tue.nl From: koen@win.tue.nl (Koen Holtman) Message-Id: <200004121825.UAA09007@wsooti09.win.tue.nl> Subject: comments on new delta draft To: mogul@pa.dec.com (Jeffrey Mogul) Date: Wed, 12 Apr 2000 20:25:41 +0200 (MET DST) Hi Jeff, I have read about half of the new delta encoding draft (intermediate version dated 6 april) now, and want to give some advance comments. Please forward these to the appropriate discussion list. Overall, what I have seen of the design looks sound. To my taste it is quite heavy for an optimisation mechanism, I would tend towards simplifying the thing by not handling all range scenarios, but that is my personal feeling so you don't need to pay any attention to that. Section 3 looks like a sound formalisation of response production in HTTP/1.1 to me, I don't expect that these will be any interoperability problems if you go with this model. Your model implies that a 'variant' is a thing that does not have a content encoding but may have one applied to it later on -- I think it is closer to the original intention of HTTP/1.1 if you describe a `variant' as something that already has a content-encoding or not -- i.e. the content-encoding choice gets made during variant selection. 
But in any case the exact meaning of the term 'variant' has no impact on what your specification requires on the wire, as far as I can see, so this is not a critical thing to get right. The delta mechanism allows a new response to be generated by merging two cached responses (or one cached and one current). In that case the question arises what age and which cache-control header the new response should have. You should formalise this somewhere, e.g. specify that the age should be the max() of the two ages, and the cache control headers are those of the oldest(?) response. Finding the best rules may require some thought. All this may of course be in the part of the draft I have not read yet. I am unhappy with the security considerations section. I think it is essential that the draft _requires_ all parties to implement a watertight protection against the spoofing attacks you describe. The current language makes security optional. I have been thinking about how to prevent spoofing. The cryptographic checksum method you describe protects against some things, though I have not yet closely studied it for holes. It seems to me that it does not protect against the following attack: 1) victim.org has copyrighted content at victim.org/x.html 2) an end user does a request on attacker.org/y.html. 3) attacker.org wants the user to see some of the copyrighted content at victim.org/x.html as part of its own web page attacker.org/y.html, without the attacker.org site ever literally sending this copyrighted content from victim.org in a HTTP response it generates. This might make attacker.org immune to prosecution for copyright violation by victim.org, while attacker.org still has the benefit of putting its brand and advertising on material created by victim.org. 
4) it looks to me that 3) can be achieved by attacker.org sending a delta response with a Dcluster header that includes victim.org/x.html, and the etag of the 'current' victim.org/x.html response, together maybe with the right cryptographic checksum. The browser (or intermediate proxy), if it has cached victim.org/x.html earlier of course, will than take that copyrighted material, apply the delta, and display it to the user. In any case, I think it is easy to have spoofing protection against many attacks, including the one above, using the following rule: - - if a cache has a delta response from URL X and wants to apply this to a response from URL Y, then this MUST only be done if a) X and Y are octet-wise identical or b) X has a Dcluster or Dtemplate header that points to or includes Y _AND_ Y has a Dcluster or Dtemplate header that points to or includes X So in case b), material can only be merged if both sources explicitely recognise the existence of the other as a 'partner'. That is all I have for now. Koen. ------- End of Forwarded Message From mogul@pa.dec.com Fri Apr 14 14:00:27 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA20036; Fri, 14 Apr 2000 14:00:27 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA25494; Fri, 14 Apr 2000 14:00:27 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA18249; Fri, 14 Apr 2000 14:00:26 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200004142100.OAA18249@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: Still wrestling with: when/whether to use Vary In-Reply-To: Your message of "Fri, 14 Apr 2000 13:06:28 PDT." <200004142006.NAA07652@wera.pa.dec.com> Date: Fri, 14 Apr 2000 14:00:26 -0700 X-Mts: smtp Regarding this: > > > 4. 
If any of the instance-manipulation values in the cached > > IM header field is a delta-coding, and the cache entry > > includes a Base-Instance header field, and that > > Base-Instance entity tag is not one of the entity tags > > listed in an If-None-Match header field of the subsequent > > request. > > > 5. If any of the instance-manipulation values in the cached > > IM header field is a delta-coding, the cache entry does > > not include a Base-Instance header field, and the > > If-None-Match header field of the request that led to that > > cache entry does not match the If-None-Match header field > > of the subsequent request. > >> What if there is only one etag in the original If-None-Match, and >> no Base-Instance was returned (say, since the server figures it >> wasn't necessary)? The lack of Base-Instance etag should not >> disallow caching. Your rule 5 covers some of these cases, but not >> all of them. >Which cases aren't covered by rules #4 and #5? Your "what if" example >seems to be: > Original client sends: > GET /foo.html HTTP/1.1 > If-None-Match: "abc" > A-IM:vcdiff > Server responds: > HTTP/1.1 226 IM Used > Etag: "pqr" > IM:vcdiff >and then the new request (the one that *might* be a cache hit) is > New client sends: > GET /foo.html HTTP/1.1 > If-None-Match: "abc" > A-IM:vcdiff >is allowed as a cache hit by rule 4 (since there is no Delta-Base [which >is the correct header name, I screwed up when I wrote these rules] in the >cache entry), and also by rule 5 (since the If-None-Match headers match). >Remember, rules #1-5 are the conditions where cache hits are NOT allowed, >so any situation that does NOT match any of these rules is OK as far as >caching goes. writes: At a later date, a client sends: GET /foo.html HTTP/1.1 If-None-Match: "abc","cba" A-IM:vcdiff Here, the If-None-Headers do NOT match, but the cache should response (since the implicit Delta-Base is "abc"). Which would be solved by ... Ah, good point. 
This is an optimization that might be useful in some cases. I see several fairly straightforward ways to solve this: (1) say that servers "SHOULD" send Delta-Base (instead of "MAY") for responses where the base instance is unambiguously implicit in the request headers. (2) say that proxies "MAY" add a Delta-Base header, using the implied base-instance, to responses that they store or forward. I've been avoiding #1 since the first draft of the document, since this adds on-the-wire overhead. I'm leaning towards #2, since I can't think of any reason why a proxy that complies with the spec shouldn't do this. In either case, the example you offered becomes covered by my rule #4, without modifications. -Jeff From mogul@pa.dec.com Fri Apr 14 14:31:15 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA22327; Fri, 14 Apr 2000 14:31:15 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA21450; Fri, 14 Apr 2000 14:31:14 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA23628; Fri, 14 Apr 2000 14:31:14 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200004142131.OAA23628@wera.pa.dec.com> To: http-delta@pa.dec.com Cc: koen@win.tue.nl (Koen Holtman) Subject: Re: Forwarded from Koen Holtman: comments on new delta draft In-Reply-To: Your message of "Fri, 14 Apr 2000 13:14:06 PDT." <200004142014.NAA16910@wera.pa.dec.com> Date: Fri, 14 Apr 2000 14:31:14 -0700 X-Mts: smtp [I'm going to address Koen's security-related comments in a separate message.] Overall, what I have seen of the design looks sound. To my taste it is quite heavy for an optimisation mechanism, I would tend towards simplifying the thing by not handling all range scenarios, but that is my personal feeling so you don't need to pay any attention to that. We've already run into a few design bugs that resulted from not considering a wide enough range of scenarios. 
I'd rather take the time now to cover all of the possibilities, rather than have to fix it again (or to discover, after systems are deployed, that we did something wrong that can't be fixed.) Section 3 looks like a sound formalisation of response production in HTTP/1.1 to me, I don't expect that these will be any interoperability problems if you go with this model. Your model implies that a 'variant' is a thing that does not have a content encoding but may have one applied to it later on -- I think it is closer to the original intention of HTTP/1.1 if you describe a `variant' as something that already has a content-encoding or not -- i.e. the content-encoding choice gets made during variant selection. But in any case the exact meaning of the term 'variant' has no impact on what your specification requires on the wire, as far as I can see, so this is not a critical thing to get right. I'm not too concerned about sticking to the original intention of HTTP/1.1, because I think a lot of that was (and still is) somewhat muddled. The "content-coding" dimension of variant representations (as discussed in section 12 of RFC2616) never seemed to me to belong with the other dimensions (language, character set, format), since all of the currently-defined content-codings are loss-free and automatically invertible, whereas there is (as of the foreseeable future) no automatic way for a client to convert from English to Danish or vice versa. It might not be necessary to specify that content-coding happens after variant selection (although compression codings surely are applied after the *generation* of a text in a particular natural language or character set, a selection between several pre-compressed variant files could conceivably be done without ordering these two steps). So maybe my sequence is slightly overdetermined here. But I think it would just complicate things to get all of these nuances into the document. 
The delta mechanism allows a new response to be generated by merging two cached responses (or one cached and one current). In that case the question arises what age and which cache-control header the new response should have. You should formalise this somewhere, e.g. specify that the age should be the max() of the two ages, and the cache control headers are those of the oldest(?) response. Finding the best rules may require some thought. All this may of course be in the part of the draft I have not read yet. Good point. Actually, I think section 13.5.3 (Combining Headers) of RFC2616 already covers this, and (after a quick read) I don't think the Delta I-D needs to do anything except to refer the reader to that section of RFC2616. But I'll give it a closer look. -Jeff From danielh@crosslink.net Fri Apr 14 14:46:23 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA20104; Fri, 14 Apr 2000 14:46:23 -0700 (PDT) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA23226; Fri, 14 Apr 2000 14:46:23 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id OAA23809 for ; Fri, 14 Apr 2000 14:46:22 -0700 (PDT) Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id RAA09797 for ; Fri, 14 Apr 2000 17:46:21 -0400 Message-Id: <200004142146.RAA09797@lycanthrope.crosslink.net> X-Really-To: Date: Fri, 14 Apr 2000 17:45:25 -0300 To: http-delta@pa.dec.com In-Reply-To: <200004142014.NAA16910@wera.pa.dec.com> Subject: Re: Forwarded from Koen Holtman: comments on new delta draft X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Koen said: >The delta mechanism allows a new response to be generated by merging two >cached responses (or one cached and one current). 
In that case the
>question arises what age and which cache-control header the new response
>should have. You should formalise this somewhere, e.g. specify that the
>age should be the max() of the two ages, and the cache control headers
>are those of the oldest(?) response. Finding the best rules may require
>some thought. All this may of course be in the part of the draft I have
>not read yet.

Consider that a non-delta aware client would receive the current response.
This suggests that the cache-control should be that of the current response.
Similarly, the age should also be that of the current response.

Basically, there is no reason that a stale (or otherwise unfresh)
cached-instance cannot be used as a delta-base -- given that etags are
unique across time (within a cluster). So why worry about disposition (so
long as the proxy or cache knows that use of stale responses in delta
requests does NOT change any of the usual, conditional GET, restrictions)?

>I am unhappy with the security considerations section. I think it is
>essential that the draft _requires_ all parties to implement a watertight
>protection against the spoofing attacks you describe. The current
>language makes security optional.

>I have been thinking about how to prevent spoofing. The cryptographic
>checksum method you describe protects against some things, though I have
>not yet closely studied it for holes. It seems to me that it does not
>protect against the following attack:

>1) victim.org has copyrighted content at victim.org/x.html
>2) an end user does a request on attacker.org/y.html.
>3) attacker.org wants the user to see some of the copyrighted content at
>victim.org/x.html as part of its own web page attacker.org/y.html,
>without the attacker.org site ever literally sending this copyrighted
>content from victim.org in a HTTP response it generates.
This might make
>attacker.org immune to prosecution for copyright violation by victim.org,
>while attacker.org still has the benefit of putting its brand and
>advertising on material created by victim.org.
>4) it looks to me that 3) can be achieved by attacker.org sending a delta
>response with a Dcluster header that includes victim.org/x.html, and the
>etag of the 'current' victim.org/x.html response, together maybe with the
>right cryptographic checksum. The browser (or intermediate proxy), if it
>has cached victim.org/x.html earlier of course, will then take that
>copyrighted material, apply the delta, and display it to the user.

I'm not sure that would be a hole. For the "attacker.org/y.html" response
to be used as a delta, the client (or proxy) would have had to have asked
for a delta (a Dcluster only defines future options).

What would be a problem is:
 a) client gets victim.org/x.html, with etag "abc"
 b) client gets attacker.org/y.html, with etag "bad1";
    attacker.org returns a dcluster that includes victim.org/x.html
 c) client asks for victim.org/x.html again, and sends an if-none-match
    that includes "bad1" (since these two urls are now defined to be in
    the same cluster)
 d) if "bad1" happens to be an earlier etag of victim.org/x.html, then
    victim.org would compute a delta against its own version of "bad1".
    The client would then un-delta, using attacker.org's version of
    "bad1", which could cause victim.org's contents to be intermingled
    with attacker.org's contents.

A question: I can't recollect what the draft says, but it's my belief that:
 a) if a response to URL1 defines a cluster that includes URL2,
 b) and a later request to URL2 occurs (say, with etag of "pyx"),
 c) then a subsequent re-request for URL1 will NOT include "pyx"

That is, a dcluster returned with the latest instance of resourceX defines
what other responses this instance can be used as delta base for; but does
NOT define what other instances can be used as future bases for resourceX.
>In any case, I think it is easy to have spoofing protection against many >attacks, including the one above, using the following rule: >- - if a cache has a delta response from URL X and wants to apply this > to a response from URL Y, then this MUST only be done if > a) X and Y are octet-wise identical > or > b) X has a Dcluster or Dtemplate header that points to or includes Y > _AND_ > Y has a Dcluster or Dtemplate header that points to or includes X Would you insist on rule b when X and Y are to the same server? >So in case b), material can only be merged if both sources explicitely >recognise the existence of the other as a 'partner'. Is this a MUST, or a strong SHOULD? ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From danielh@crosslink.net Fri Apr 14 14:47:14 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA09014; Fri, 14 Apr 2000 14:47:14 -0700 (PDT) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA11775; Fri, 14 Apr 2000 14:47:14 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id OAA20488 for ; Fri, 14 Apr 2000 14:47:13 -0700 (PDT) Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id RAA09993 for ; Fri, 14 Apr 2000 17:47:12 -0400 Message-Id: <200004142147.RAA09993@lycanthrope.crosslink.net> X-Really-To: Date: Fri, 14 Apr 2000 17:45:42 -0300 To: http-delta@pa.dec.com In-Reply-To: <200004142100.OAA18249@wera.pa.dec.com> Subject: Re: Still wrestling with: when/whether to use Vary X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 > At a later date, a client sends: > GET /foo.html HTTP/1.1 > If-None-Match: "abc","cba" > A-IM:vcdiff > > Here, the If-None-Headers do NOT match, 
but the cache should response > (since the implicit Delta-Base is "abc"). Which would be solved by >... >Ah, good point. This is an optimization that might be useful in some >cases. I see several fairly straightforward ways to solve this: > (1) say that servers "SHOULD" send Delta-Base (instead > of "MAY") for responses where the base instance is > unambiguously implicit in the request headers. > (2) say that proxies "MAY" add a Delta-Base header, > using the implied base-instance, to responses that > they store or forward. >I've been avoiding #1 since the first draft of the document, since this >adds on-the-wire overhead. I'm leaning towards #2, since I can't think >of any reason why a proxy that complies with the spec shouldn't do this. I agree -- #2 should be trivial to implement. ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul@pa.dec.com Fri Apr 14 18:04:36 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA30215; Fri, 14 Apr 2000 18:04:35 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA04728; Fri, 14 Apr 2000 18:04:35 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA24689; Fri, 14 Apr 2000 18:04:35 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200004150104.SAA24689@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: Forwarded from Koen Holtman: SECURITY comments on new delta draft In-Reply-To: Your message of "Fri, 14 Apr 2000 13:14:06 PDT." <200004142014.NAA16910@wera.pa.dec.com> Date: Fri, 14 Apr 2000 18:04:35 -0700 X-Mts: smtp Koen Holtman writes: I am unhappy with the security considerations section. I think it is essential that the draft _requires_ all parties to implement a watertight protection against the spoofing attacks you describe. 
The current language makes security optional. I think we need to differ on this. I think there are probably plenty of circumstances where the risk of successful spoofing is less than the cost of sending the digests. I'm open to making this a SHOULD-level requirement, but I think making it a MUST-level requirement does not meet the IETF's criteria in RFC2119 (which, I admit, are somewhat ambiguous). At any rate, I think we should probably submit for Proposed Standard without stricter requirements, then make sure that we get an expert security review before progressing. I'd rather not assume that this spoofing stuff is the only security issue! I have been thinking about how to prevent spoofing. The cryptographic checksum method you describe protects against some things, though I have not yet closely studied it for holes. It seems to me that it does not protect against the following attack: 1) victim.org has copyrighted content at victim.org/x.html 2) an end user does a request on attacker.org/y.html. 3) attacker.org wants the user to see some of the copyrighted content at victim.org/x.html as part of its own web page attacker.org/y.html, without the attacker.org site ever literally sending this copyrighted content from victim.org in a HTTP response it generates. This might make attacker.org immune to prosecution for copyright violation by victim.org, while attacker.org still has the benefit of putting its brand and advertising on material created by victim.org. I'm not sure about copyright law outside the US, although I believe that it's now generally the same in most industrialized countries. 
In the US, I'm pretty sure there's a legal concept called "contributory infringement" that would certainly make attacker.org liable (at one point many years ago, I was warned against publishing a document with the FTP address of an RFC repository, since at the time the copyright status of RFCs was hazy, and by helping people download them we could have been guilty of contributory infringement). It's also not clear that victim.org would be liable for copyright infringement; it depends on whether the law requires "specific intent" (meaning: victim.org intended to violate copyright) or "general intent" (meaning: victim.org did some action that led to a violation, without any intention of breaking the rules). If you really care, I can check with our lab's lawyer, who specializes in intellectual property law. In any case, I think it is easy to have spoofing protection against many attacks, including the one above, using the following rule: - - if a cache has a delta response from URL X and wants to apply this to a response from URL Y, then this MUST only be done if a) X and Y are octet-wise identical or b) X has a Dcluster or Dtemplate header that points to or includes Y _AND_ Y has a Dcluster or Dtemplate header that points to or includes X So in case b), material can only be merged if both sources explicitely recognise the existence of the other as a 'partner'. I'm not sure if I understand how, in general, case (b) could arise. Can you give a *plausible* scenario with specific message headers? 
-Jeff From mogul@pa.dec.com Fri Apr 14 18:06:03 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA24663; Fri, 14 Apr 2000 18:06:03 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA01073; Fri, 14 Apr 2000 18:06:03 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA29602; Fri, 14 Apr 2000 18:06:02 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200004150106.SAA29602@wera.pa.dec.com> To: koen@win.tue.nl (Koen Holtman) Cc: http-delta@pa.dec.com Subject: Re: Forwarded from Koen Holtman: SECURITY comments on new delta draft Date: Fri, 14 Apr 2000 18:06:02 -0700 X-Mts: smtp [Oops, I forgot to send this to Koen; sorry for the duplication!] Koen Holtman writes: I am unhappy with the security considerations section. I think it is essential that the draft _requires_ all parties to implement a watertight protection against the spoofing attacks you describe. The current language makes security optional. I think we need to differ on this. I think there are probably plenty of circumstances where the risk of successful spoofing is less than the cost of sending the digests. I'm open to making this a SHOULD-level requirement, but I think making it a MUST-level requirement does not meet the IETF's criteria in RFC2119 (which, I admit, are somewhat ambiguous). At any rate, I think we should probably submit for Proposed Standard without stricter requirements, then make sure that we get an expert security review before progressing. I'd rather not assume that this spoofing stuff is the only security issue! I have been thinking about how to prevent spoofing. The cryptographic checksum method you describe protects against some things, though I have not yet closely studied it for holes. 
It seems to me that it does not protect against the following attack: 1) victim.org has copyrighted content at victim.org/x.html 2) an end user does a request on attacker.org/y.html. 3) attacker.org wants the user to see some of the copyrighted content at victim.org/x.html as part of its own web page attacker.org/y.html, without the attacker.org site ever literally sending this copyrighted content from victim.org in a HTTP response it generates. This might make attacker.org immune to prosecution for copyright violation by victim.org, while attacker.org still has the benefit of putting its brand and advertising on material created by victim.org. I'm not sure about copyright law outside the US, although I believe that it's now generally the same in most industrialized countries. In the US, I'm pretty sure there's a legal concept called "contributory infringement" that would certainly make attacker.org liable (at one point many years ago, I was warned against publishing a document with the FTP address of an RFC repository, since at the time the copyright status of RFCs was hazy, and by helping people download them we could have been guilty of contributory infringement). It's also not clear that victim.org would be liable for copyright infringement; it depends on whether the law requires "specific intent" (meaning: victim.org intended to violate copyright) or "general intent" (meaning: victim.org did some action that led to a violation, without any intention of breaking the rules). If you really care, I can check with our lab's lawyer, who specializes in intellectual property law. 
In any case, I think it is easy to have spoofing protection against many attacks, including the one above, using the following rule: - - if a cache has a delta response from URL X and wants to apply this to a response from URL Y, then this MUST only be done if a) X and Y are octet-wise identical or b) X has a Dcluster or Dtemplate header that points to or includes Y _AND_ Y has a Dcluster or Dtemplate header that points to or includes X So in case b), material can only be merged if both sources explicitely recognise the existence of the other as a 'partner'. I'm not sure if I understand how, in general, case (b) could arise. Can you give a *plausible* scenario with specific message headers? -Jeff From koen@win.tue.nl Sat Apr 15 15:46:10 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA24047; Sat, 15 Apr 2000 15:46:10 -0700 (PDT) Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA08183; Sat, 15 Apr 2000 15:46:09 -0700 Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id PAA00320; Sat, 15 Apr 2000 15:46:07 -0700 (PDT) Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3) id AAA03499. Sun, 16 Apr 2000 00:42:21 +0200 (MET DST) From: koen@win.tue.nl (Koen Holtman) Message-Id: <200004152242.AAA03499@wsooti09.win.tue.nl> Subject: Re: Forwarded from Koen Holtman: SECURITY comments on new delta draft In-Reply-To: <200004150106.SAA29602@wera.pa.dec.com> from Jeffrey Mogul at "Apr 14, 2000 6: 6: 2 pm" To: mogul@pa.dec.com (Jeffrey Mogul) Date: Sun, 16 Apr 2000 00:42:20 +0200 (MET DST) Cc: koen@win.tue.nl, http-delta@pa.dec.com X-Mailer: ELM [version 2.4ME+ PL43 (25)] Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit >[Oops, I forgot to send this to Koen; sorry for the duplication!] 
> >Koen Holtman writes: > > I am unhappy with the security considerations section. I think it > is essential that the draft _requires_ all parties to implement a > watertight protection against the spoofing attacks you describe. > The current language makes security optional. > >I think we need to differ on this. I think there are probably >plenty of circumstances where the risk of successful spoofing >is less than the cost of sending the digests. I'm open to >making this a SHOULD-level requirement, but I think making >it a MUST-level requirement does not meet the IETF's criteria >in RFC2119 (which, I admit, are somewhat ambiguous). I think a MUST on spoofing prevention might be essential if the protocol is to pass an IETF security review. In any case I am unhappy with the security/efficiency tradeoff you are currently making. [...] > > In any case, I think it is easy to have spoofing protection against > many attacks, including the one above, using the following rule: > > - - if a cache has a delta response from URL X and wants to apply this > to a response from URL Y, then this MUST only be done if > > a) X and Y are octet-wise identical > > or > > b) X has a Dcluster or Dtemplate header that points to or includes Y > _AND_ > Y has a Dcluster or Dtemplate header that points to or includes X > > So in case b), material can only be merged if both sources explicitely > recognise the existence of the other as a 'partner'. > >I'm not sure if I understand how, in general, case (b) could arise. >Can you give a *plausible* scenario with specific message headers? Oops, I failed to mention that the above rule would come paired with a new requirement that a delta response from an url X, which refers to a base entity that the browser could have gotten from another URL Y, SHOULD include a Dcluster or Dtemplate header that points to Y. To adapt an example from the draft: 1. 
first request/response:

   GET /foo?p=1 HTTP/1.1

   HTTP/1.1 200 OK
   Date: Sun, 06 Nov 1994 08:49:37 GMT
   Etag: "abc"
   DCluster: "//bar.example.net/foo?"

2. second request/response:

   GET /foo?p=2 HTTP/1.1
   Host: bar.example.net
   If-None-Match: "abc"
   A-IM: vcdiff

   HTTP/1.1 226 IM used
   Etag: "def"
   Base-etag: "abc"
   IM: vcdiff
   DCluster: "//bar.example.net/foo?"      <---new

Note the last header in the second response; this one is new. By sending
this header, the resource /foo?p=2 acknowledges that it is willing to use
responses from /foo?p=1 as base instances, and that it considers /foo?p=1
to be 'friendly' and in the same uniqueness domain.

With this in place the security check b) I wrote above uses the header to
make sure that /foo?p=2 is not spoofed by unfriendly servers like
xx.attacker.org. To spoof p=2 with this in place, an attacker would have
to alter a message from p=1 or p=2 in transit, or compromise the origin
server part that is responsible for the resource p=1.

My claim is that the above rule and requirement would reduce the spoofing
risks to the usual ones of transport integrity and man-in-the-middle
attacks. So the delta mechanism would add no new spoofing holes to the
existing web, and this is good (and in my opinion essential).

>
>-Jeff

Koen.
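The mutual-recognition check in rule b) can be sketched in a few lines of
Python. This is only an illustration, not text from the draft: the function
names are hypothetical, and it assumes that a DCluster value such as
"//bar.example.net/foo?" matches any URL whose scheme-less form starts with
that string. Dtemplate matching is not modelled.

```python
def cluster_key(url):
    """Return the scheme-less form of a URL, e.g. //host/path?query.

    Assumption: DCluster values are compared against this form.
    """
    _scheme, sep, rest = url.partition("://")
    return "//" + rest if sep else url


def includes(dcluster_values, url):
    """True if any DCluster value is a prefix of the URL (assumed semantics)."""
    key = cluster_key(url)
    return any(key.startswith(value) for value in dcluster_values)


def may_apply_delta(delta_url, delta_dcluster, base_url, base_dcluster):
    """Koen's proposed rule: apply a delta from delta_url to a cached
    response from base_url only if
      a) the two URLs are octet-wise identical, or
      b) each response carries a DCluster value that includes the other URL.
    """
    if delta_url == base_url:
        return True
    return (includes(delta_dcluster, base_url)
            and includes(base_dcluster, delta_url))
```

With the two responses from the example, may_apply_delta() accepts the pair
/foo?p=1 and /foo?p=2 because each response names a prefix covering the
other, and it rejects a delta from xx.attacker.org because the cached base
response never names that host.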
From bala@research.att.com Sun Apr 16 11:26:26 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA29859; Sun, 16 Apr 2000 11:26:26 -0700 (PDT) Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA20012; Sun, 16 Apr 2000 11:26:26 -0700 Received: from mail-green.research.att.com (H-135-207-30-103.research.att.com [135.207.30.103]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id LAA19019 for ; Sun, 16 Apr 2000 11:26:25 -0700 (PDT) Received: from raptor.research.att.com (raptor.research.att.com [135.207.23.32]) by mail-green.research.att.com (Postfix) with ESMTP id 65D761E016 for ; Sun, 16 Apr 2000 14:25:39 -0400 (EDT) Received: from research.att.com (raptor.research.att.com [135.207.23.32]) by raptor.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id OAA58194 for ; Sun, 16 Apr 2000 14:25:38 -0400 (EDT) Message-Id: <200004161825.OAA58194@raptor.research.att.com> To: http-delta@pa.dec.com Subject: new version Date: Sun, 16 Apr 2000 14:25:38 -0400 From: Balachander Krishnamurthy i found a few minutes to read through it. here are some concerns. some of these could have been expressed earlier but i just did not have the time and am sorry about that. . am not happy with the definition of 'instance' - for one it is circular. instance The entity that would be returned in a status-200 response to a GET request, at the current time, for the selected variant of the specified resource, with the application of zero or more content-codings, but without the application of any instance manipulations or transfer-codings. instance manipulation is only defined later. at this point of defining instance we should leave instance manipulations out. . "One can think of an instance as a snapshot in the life of a resource." if no one heard the tree fall in the forest, then is it an instance? a resource may have multiple existences but zero instance if no one requested it? 
i.e., a snapshot in the life of a resource only if the snapshot was taken. i would leave this sentence out or correct it since the definition requires an entity to be requested. maybe picky, but i thought we were trying to avoid 2616 problems. . page 10 "This formalization of the HTTP message" well it is not really formalization of THE HTTP message since instances are not discussed in 2616. is it clear that we are talking about our notion of HTTP message? we say just prior to instance definition It is too late to fix the terminological failure in the HTTP/1.1 specification, so we instead define a new term, for use in this document: should the interpretation be that everything is relative to this document and not, say, 2616? my specific concern with "the HTTP message" is in relation to the following sentence in Section 4 This formalization of the HTTP message generation sequence has not previously been described. this would lead readers (it led me) to believe that we are talking about generic HTTP (2616) message generation sequence. of course it could not have been described this way since 'instance' didn't exist. please note that am not complaining about introduction of 'instance'. . section 5.2 However, based on the new ordering constraint proposed in section 12.4.5, new compared to what? and as 12.4.5 comes later, so maybe say However, based on the ordering constraint discussed in section 12.4.5 . section 5.2 We note that if a client indicates it is willing to accept deltas, but the server does not support this form of instance-manipulation, the server will simply ignore this aspect of the request. (HTTP always allows an implementation to ignore a header it does not understand, and the specification of ``A-IM'' allows the server to ignore an instance-manipulation it does not understand.) since the server can ignore the entire header, it does not seem important that it is permitted to ignore an instance-manipulation it does not understand. 
are we saying that the server can send back a 200 if it doesn't understand the header it doesn't understand an instance-manipulation either of the above two . section 5.4 line 880 A response using delta encoding must be identified as such. This is done using the ``IM'' header, specified in section 12.4.4. change to A response using delta encoding must be identified as such. This is done using the ``IM'' response-header, specified in section 12.4.4. ^^^^^^^^^ . section 5.4 line 886 just a nit Because the Internet is full of HTTP/1.0 caches, which might never be entirely replaced, and because the HTTP specifications change to Because the Internet has a signficant number of HTTP/1.0 caches, which... or 'overwhelmingly large' - not necessarily "full"? . section 5.4 line 902 - this human user noticed the typo transmission of unnecessary bytes, and this Reason-phase should not normally be seen by human users.) change to transmission of unnecessary bytes, and this Reason-phRase should not normally be seen by human users.) . section 5.4 line 906 Existing proxies apparently forward responses with unknown status codes, and do not attempt to cache them. is this a "known thing" kind of statement? do we know in practice if this is true? the only 1.1 proxy contact i had left that company. can someone check with 1.0 proxy folks? i don't mean 'experimental' proxies but products. . section 5.6 line 957 nit We used this example in section 5.2: the client sends: to We used this example in section 5.2: The client sends: . section 6 line 1118 A recent study suggests that ``vdelta'' is the best overall delta algorithm [16]. what is the statute of limitations for 'recency' - study was done in '96 maybe last known study? . section 8, line 1354 http://quote.yahoo.com/q?s=DEC&d=f yields No such ticker symbol. DEC -> CPQ. if we wait long enough maybe T will change to T + AWE . section 8, line 1406 In order to use this approach to clustering, we need to impose one important constraint. 
HTTP/1.1 requires so-called ``strong'' entity tags to be unique for a given URI, but does not impose any broader uniqueness requirements. what is a 'uniqueness requirement'. sounds colloquial. . section 10, line 1644 - When the proxy receives a request from a non-delta-capable client, it might convert this into a delta request before forwarding it to the server, and then (after applying a resulting delta response to one of its own cache entries) it would return a full-body response to the client. (0) is assumption that non-delta-capable means client can't handle ranges? (1) request may not be forwarded (2) full-body response may not be returned (could be a 304/206 and no/partial body) . section 12.5, line 2081 inparams --> imparams ^ that's all for now cheers, bala From danielh@crosslink.net Sun Apr 16 13:57:05 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA14426; Sun, 16 Apr 2000 13:57:05 -0700 (PDT) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA27794; Sun, 16 Apr 2000 13:57:05 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA25637 for ; Sun, 16 Apr 2000 13:57:04 -0700 (PDT) Received: from smtp.crosslink.net (dyn41.c5200-1.springfield.236.crosslink.net [207.199.142.42]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id QAA24106 for ; Sun, 16 Apr 2000 16:56:58 -0400 Message-Id: <200004162056.QAA24106@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Sun, 16 Apr 2000 16:50:38 -0400 To: http-delta@pa.dec.com In-Reply-To: <200004161825.OAA58194@raptor.research.att.com> Subject: Re: new version X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 > instance The entity that would be returned in a status-200 > response to a GET request, at the current time, for > the selected variant of the specified resource, with > the 
application of zero or more content-codings, but
> without the application of any instance manipulations
> or transfer-codings.

>instance manipulation is only defined later. at this point of defining
>instance we should leave instance manipulations out.

I'm not sure that is a problem -- it alerts the reader to an important
feature of instances (that they may be subject to "manipulation"). And a
definition of Instance Manipulation is only a few paragraphs further down!

>>. "One can think of an instance as a snapshot in the life of a resource."
> if no one heard the tree fall in the forest, then is it an instance? ...

I really like the phrase, but it's too imprecise. Alas, it should be removed.

>. section 8, line 1406
> In order to use this approach to clustering, we need to impose one
> important constraint. HTTP/1.1 requires so-called ``strong'' entity
> tags to be unique for a given URI, but does not impose any broader
> uniqueness requirements.
> what is a 'uniqueness requirement'. sounds colloquial.

(that is, the etag does NOT have to be unique across different URIs).
----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From danielh@crosslink.net Tue Apr 18 20:37:33 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id UAA04841; Tue, 18 Apr 2000 20:37:33 -0700 (PDT) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA12896; Tue, 18 Apr 2000 20:37:33 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id UAA05210 for ; Tue, 18 Apr 2000 20:37:32 -0700 (PDT) Received: from smtp.crosslink.net (dyn46.c5200-2.springfield.236.crosslink.net [207.199.142.175]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id XAA16713 for ; Tue, 18 Apr 2000 23:37:29 -0400 Message-Id: <200004190337.XAA16713@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Tue, 18 Apr 2000 23:31:18 -0300 To: http-delta@pa.dec.com Subject: dcluster and spoofing X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 A suggestion regarding dcluster and spoofing: The following is an alternative to Koen's solution, which involves some sort of verification scheme -- a scheme that seems to me to negate much of the benefit of clustering (since at least one non-delta response is required from "victim.org" before clustering can begin). By way of review ... The problem of spoofing with dcluster (and dtemplate) occurs when a) victim.org "happens" to have a base instance identified by an etag of "pey" that lies in the uniqueness scope of victim.org/foo.bar b) "pey" is also used as an etag for a response returned from malicious.org to client x c) a Dcluster in this response (from malicious.org) identified victim.org/foo.bar as being in it's uniqueness scope. 
When client x then asks for victim.org/foo.bar, victim.org may return a
delta response against its "pey" base-instance, which is not the same
base instance as the client's "pey"-from-malicious.org base-instance.

One way of avoiding this problem is for the client to identify the
source of all etags that did not come from a prior request to
victim.org/foo.bar. Then, victim.org could choose whether these were
legitimate etags (in the sense of having been generated by servers or
intra-server content providers that are truly in the uniqueness scope
of victim.org/foo.bar).

For example, client x could provide:

   GET /foo.bar HTTP/1.1
   Host: victim.org
   A-IM: vcdiff
   If-None-Match: "def","pey","arf"
   DCluster: malicious.org="pey", malicious.org/foo.bar="arf", victim2.org="def"

Alternatively, to reduce the size a bit:

   DCluster: malicious.org="pey", /foo.bar="arf", victim2.org="def"

(or A-Dcluster: .... )

where malicious.org identified victim.org/foo.bar in a Dcluster on the
previous two requests, and victim2.org identified it on one request.
Presuming that victim2.org is legit, victim.org would ignore "arf" and
"pey", but use "def" (if it is available).

This does increase the size of delta requests. However, in legitimate
cases this extra header will only be sent to delta-aware servers, with
the strong expectation that a delta response will be generated.
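The server-side filtering described above might be sketched as follows. This is only an illustration under assumptions: the `DCluster` request header is a proposal in this thread, not an established header; the helper names and the `trusted_hosts` set are hypothetical; and the parsing is deliberately naive (no weak tags, no commas inside entity tags, and no support for the shortened `/foo.bar="arf"` form).

```python
def parse_etag_list(value):
    """Split an If-None-Match style list into bare entity tags."""
    return [tag.strip().strip('"') for tag in value.split(",") if tag.strip()]


def parse_dcluster_request(value):
    """Map each entity tag to the source the client says it came from.

    Assumes the proposed 'source="etag", source="etag"' syntax.
    """
    sources = {}
    for item in value.split(","):
        source, _, etag = item.strip().partition("=")
        sources[etag.strip('"')] = source
    return sources


def usable_etags(if_none_match, dcluster, trusted_hosts):
    """Return the entity tags the server is willing to use as delta bases.

    trusted_hosts is a hypothetical set of hosts that the server knows to be
    in the uniqueness scope of the requested resource.
    """
    sources = parse_dcluster_request(dcluster)
    usable = []
    for etag in parse_etag_list(if_none_match):
        source = sources.get(etag)
        # No attribution: the tag came from a prior request to this same URI.
        if source is None or source.split("/", 1)[0] in trusted_hosts:
            usable.append(etag)
    return usable


if __name__ == "__main__":
    print(usable_etags(
        '"def","pey","arf"',
        'malicious.org="pey", malicious.org/foo.bar="arf", victim2.org="def"',
        {"victim2.org"},
    ))
```

With the request shown in the example, and victim2.org as the only trusted source, only "def" survives the filter.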
----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From koen@win.tue.nl Wed Apr 19 10:41:11 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA10223; Wed, 19 Apr 2000 10:41:10 -0700 (PDT) Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA22290; Wed, 19 Apr 2000 10:41:09 -0700 Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id KAA07065 for ; Wed, 19 Apr 2000 10:41:08 -0700 (PDT) Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3) id TAA05872. Wed, 19 Apr 2000 19:37:18 +0200 (MET DST) From: koen@win.tue.nl (Koen Holtman) Message-Id: <200004191737.TAA05872@wsooti09.win.tue.nl> Subject: Re: dcluster and spoofing In-Reply-To: <200004190337.XAA16713@lycanthrope.crosslink.net> from "danielh@crosslink.net" at "Apr 18, 2000 11:31:18 pm" To: danielh@crosslink.net Date: Wed, 19 Apr 2000 19:37:18 +0200 (MET DST) Cc: http-delta@pa.dec.com X-Mailer: ELM [version 2.4ME+ PL43 (25)] Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit >A suggestion regarding dcluster and spoofing: > >The following is an alternative to Koen's solution, which >involves some sort of verification scheme -- a scheme that >seems to me to negate much of the benefit of clustering (since at least >one non-delta response is required from "victim.org" before clustering can >begin). No, it is not the intention of my scheme to hold back clustering-related actions by a cache until the first non-delta response from victim.org. 
If a cache gets a GET request on victim.org, and victim.org was included
in a Dcluster on a response from attack.org with an etag E, it is my
intention that the cache goes ahead and transforms the request on
victim.org into a delta request, using the etag E. However, the cache
should only *use* the delta response obtained from this request if it
has a Dcluster or Dtemplate pointing to attack.org, the source of the
etag E. If this anti-spoofing check fails, then the cache will have made
the delta request for nothing, and will have to retry without the etag E,
which is now known to be from a spoofing attack, but this is an
inefficiency one can live with.

Note that an initial non-delta response from victim.org is not needed
for this anti-spoofing mechanism to work.

>[...your discussion of an alternative system snipped to save space...]

Your alternative system would also work, I believe. I'm usually not too
worried about adding bytes to headers, but to make a comparison, my
method requires adding bytes to the delta responses with clustering
only, which looks more economical than adding bytes to every request
that could trigger a delta response with clustering.

I'm currently waiting for Jeff to publish a revised security section. I
don't care much which anti-spoofing mechanism is used in the end; my
main concern is that use of such a mechanism should be required by
default, and that it is cheap enough that people are willing to stick to
this requirement.

>
>
>-----------------------------------------------------------
>Daniel Hellerstein
>danielh@crosslink.net
>http://www.srehttp.org
>-----------------------------------------------------------
>

Koen.
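The check the cache performs before applying a delta response could look roughly like the sketch below. It rests on assumptions that are not settled in this thread: that `Dcluster` and `Dtemplate` response values are comma-separated URI prefixes, that header names are given in lower case, and that prefix matching is a plain string-prefix test. The function names are hypothetical.

```python
def listed_prefixes(response_headers):
    """Collect the URI prefixes named by Dcluster/Dtemplate in a response.

    Assumes a comma-separated list of (optionally quoted) prefixes.
    """
    prefixes = []
    for name in ("dcluster", "dtemplate"):
        for item in response_headers.get(name, "").split(","):
            item = item.strip().strip('"')
            if item:
                prefixes.append(item)
    return prefixes


def may_use_delta(request_uri, etag_source_uri, response_headers):
    """Decide whether a cache may apply a delta response.

    etag_source_uri is the URI whose response originally supplied the base
    entity tag. A base taken from the request URI itself needs no check.
    """
    if etag_source_uri == request_uri:
        return True
    prefixes = listed_prefixes(response_headers)
    return any(etag_source_uri.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    # The response vouches for attack.org, so the delta may be applied.
    print(may_use_delta("victim.org/foo.bar", "attack.org/page",
                        {"dcluster": "attack.org/"}))
    # No Dcluster/Dtemplate on the response: discard it and retry without E.
    print(may_use_delta("victim.org/foo.bar", "attack.org/page", {}))
```

When the check fails, the cache drops the response and repeats the request without the suspect entity tag, which is the one wasted round trip mentioned above.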
From mogul Fri Apr 21 19:03:22 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id TAA14142; Fri, 21 Apr 2000 19:03:22 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200004220203.TAA14142@wera.pa.dec.com> To: http-delta Subject: Status report: Delta draft Date: Fri, 21 Apr 2000 19:03:22 -0700 X-Mts: smtp It probably shouldn't be a surprise to anyone that I've fallen behind on all of the comments people have been providing re: the latest revisions to the Delta encoding draft, still: ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-delta-04.6april2000.txt Life is like that (i.e., my actual job takes priority). Anyway, I've applied a lot of obvious edits (thanks to all who commented), and I've created an "issues list" to handle the rest of them. This "issues list" mechanism worked very well while we were editing the HTTP/1.1 specification - it ensures that open issues aren't lost, and that one doesn't have to search through zillions of email messages to find them. Jim Gettys had a nice HTML-tables format for his HTTP/1.1 issues list. I'm lazy; it's in ASCII text. Sorry. The current draft of the issues list is: ftp://ftp.digital.com/pub/DEC/WRL/mogul/issues-00.txt and is broken down into "substantive issues" and "editorial issues". "Editorial issues" are purely questions of how to put what we mean into words; the "substantive issues" are the trickier ones (i.e., what should the specification actually specify). For some of these, a few of us have been having private email conversations to work out the details, but we'll certainly encourage wider discussion if necessary. For discussion of issues: please put the Issue-name in the Subject line of your message, and try to keep the discussions on-topic. That is, one issue per email message (unless two or more are related), and use a new Subject: line for a new issue. 
If you want to suggest a new issue, I'd appreciate it if you use the "Template for issues list items" at the end of the file. Try to keep them narrower than 80 columns. Send new items directly to me, or to the entire http-delta list. Minor editorial stuff (spelling errors, incorrect citations, etc.) should go directly to me, rather than burdening the mailing list. Thanks, -Jeff From danielh@crosslink.net Sun Apr 23 10:44:20 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA03327; Sun, 23 Apr 2000 10:44:20 -0700 (PDT) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA00649; Sun, 23 Apr 2000 10:44:20 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id KAA22149 for ; Sun, 23 Apr 2000 10:44:19 -0700 (PDT) Received: from smtp.crosslink.net (dyn14.c5200-1.springfield.236.crosslink.net [207.199.142.15]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id NAA01847 for ; Sun, 23 Apr 2000 13:44:14 -0400 Message-Id: <200004231744.NAA01847@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Sun, 23 Apr 2000 13:43:06 -0400 To: http-delta@pa.dec.com Subject: DCLUSTER-ORDERING (issue) X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 The following is a possible section 12.4.2.a; which defines just how a client should use Dcluster information to determine what base-instance may be useable (hence, what etags to include in a If-None-Match) --------------- 12.4.2.a: Determining the base-instances in a uniqueness scope When sending a delta-enabled request, a client should identify all base-instances that may be useable. Essentially, the problem is finding all base-instances in the same uniqueness scope as the request-URI. 
The first step is simple:

  i) any available base-instance, from a prior response from the same
     request-URI, MAY be used

The next step is to find any base-instance that is explicitly
associated with a "matching" URL-prefix.

  ii) any available base-instance, from a prior response from a URI
      that contained a DCluster that prefix-matches the request-URI,
      MAY be used

Then, the set of matching URL-prefixes should be determined.

  iii) The request-URI should be compared to all available DCluster
       information. This comparison will yield a set of matching
       URL-prefixes, and the dates of their definition.

Using the results of step iii, base-instances that are implicitly in
the request-URI's uniqueness scope can be found.

  iv) Every available base-instance is compared to each member of the
      set of matching URL-prefixes. If a match is found, and the date
      of the matching URL-prefix is before the date of the
      base-instance, then the base instance MAY be used.

If a client's cache is large, following all these steps may be overly
time-consuming. Thus, these steps are NOT required -- they are meant to
define the largest set of useable base-instances, but not necessarily
the optimal set.

Notes:
  * step ii may define base-instances that do NOT prefix-match the
    request-URI.
  * the "available" base-instances are affected by expiration concerns.
    Expiration of base-instances may be due to constraints on the size
    of the client's cache, or may be dictated by the server (say, due to
    Cache-control: retain response headers)
  * the date of definition rule is used to prevent accidents with very
    old cache entries

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------

From danielh@crosslink.net Sun Apr 23 10:51:36 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA21711; Sun, 23 Apr 2000 10:51:36 -0700 (PDT)
From:
Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA30908; Sun, 23 Apr 2000 10:51:36 -0700
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id KAA09200 for ; Sun, 23 Apr 2000 10:51:35 -0700 (PDT)
Received: from smtp.crosslink.net (dyn14.c5200-1.springfield.236.crosslink.net [207.199.142.15]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id NAA02845 for ; Sun, 23 Apr 2000 13:51:34 -0400
Message-Id: <200004231751.NAA02845@lycanthrope.crosslink.net>
X-Really-To:
Reply-To: danielh@crosslink.net
Date: Sun, 23 Apr 2000 13:46:46 -0400
To: http-delta@pa.dec.com
Subject: New issue: implicit delta-base
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05

Issue-Name: Implicit Delta Base
Document-section: page 36, point 4
Reported-By: Daniel Hellerstein
Reported-Date: 23 Apr 2000
Description: Clarification of caching rules when Delta-Base is not
specified.

Suggested resolution:

Modify point 4 on the "when to not cache rules", to include:
  "If a delta response is returned without a delta-base, as may happen
  if If-None-Match contains a single etag, the proxy MAY create a
  Delta-base header for internal use (with a value equal to the single
  Etag contained in the request's If-None-Match header)."
Resolution-Date: ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From danielh@crosslink.net Sun Apr 23 10:56:20 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA31821; Sun, 23 Apr 2000 10:56:20 -0700 (PDT) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA25698; Sun, 23 Apr 2000 10:56:20 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id KAA30986 for ; Sun, 23 Apr 2000 10:56:19 -0700 (PDT) Received: from smtp.crosslink.net (dyn14.c5200-1.springfield.236.crosslink.net [207.199.142.15]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id NAA03386 for ; Sun, 23 Apr 2000 13:56:18 -0400 Message-Id: <200004231756.NAA03386@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Sun, 23 Apr 2000 13:51:57 -0400 To: http-delta@pa.dec.com Subject: New issue: Client X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Issue-Name: Client-initiated Dcluster (NEW ISSUE) Document-section: Section 8 Reported-By: Daniel Hellerstein Reported-Date: 23 April 2000 Description: Expanding the use of Dcluster I propose that a client can add a Dcluster request header. This would be used to indicate a client's "guess" as to an enhanced uniqueness scope that may also be available to the server. The client could then include etags in an If-None-Match that are associated with instances from the Dcluster. That is, with instances from URI's that prefix-match the argument contained in the Dcluster request header, and that would otherwise NOT be in the uniqueness scope of the request-URI. Suggested resolution: In essence, this provides a way for the client and server to coordinate on the use of "augmented caches". 
For example, there may be sites that specialize in commonly used "base
instances", and these sites may be readily accessible by both client and
server. Alternatively, clients (and servers) may have out-of-band means
of adding instances (and their URIs and Etags) to their cache; for
example, on an installation CD used to install access to a full-service
ISP.

This capability might exacerbate possible spoofing problems; but nothing
that would not be solved by Koen's solution (of adding the appropriate
Dcluster response-header to any response that uses a base-instance that
is not from the request-URI; that is, that is from the "extended"
uniqueness-scope).

Example:

   GET /personal/biography.html HTTP/1.1
   Host: joeblow.umess.edu
   Dcluster: baseinstances.umess.edu/personal/
   A-IM: vcdiff
   If-None-Match: "std_biography"

Assuming that joeblow.umess.edu has quick out-of-band access to
instances generated by baseinstances.umess.edu, and that
joeblow.umess.edu will never use "std_biography" as an etag for
/personal/biography.html, it could respond with:

   HTTP/1.1 226 IM Used
   Etag: "joe_bio1c"
   Delta-base: "std_biography"
   Dcluster: baseinstances.umess.edu
   IM: vcdiff

Notes:
  * the response Delta-base and Dcluster are optional. In particular,
    Dcluster is not needed for security reasons -- the inclusion of a
    Dcluster by the client gives both sides enough information to detect
    spoofing.
  * this example makes sense when the client (say, janedoe.umess.edu)
    also has quick out-of-band access to baseinstances.umess.edu, or
    when it has made prior requests to umess.edu which resulted in
    acquisition of the "std_biography" instance (say, via a DTemplate).
Resolution-Date: ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From koen@win.tue.nl Wed Apr 26 11:48:09 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA18340; Wed, 26 Apr 2000 11:48:08 -0700 (PDT) Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA18397; Wed, 26 Apr 2000 11:48:08 -0700 Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id LAA30841 for ; Wed, 26 Apr 2000 11:48:07 -0700 (PDT) Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3) id UAA03796. Wed, 26 Apr 2000 20:44:20 +0200 (MET DST) From: koen@win.tue.nl (Koen Holtman) Message-Id: <200004261844.UAA03796@wsooti09.win.tue.nl> Subject: Re: more thoughts on dcluster (fwd) To: http-delta@pa.dec.com Date: Wed, 26 Apr 2000 20:44:20 +0200 (MET DST) Cc: koen@win.tue.nl (Koen Holtman) X-Mailer: ELM [version 2.4ME+ PL43 (25)] Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit [This should have gone to the list too, forwarded on request, sorry for the delay] ----- Forwarded message from Real Name ----- >From danielh@crosslink.net Fri Apr 21 16:04:21 2000 X-Really-To: From: danielh@crosslink.net (Real Name) Subject: Re: more thoughts on dcluster To: koen@win.tue.nl (Koen Holtman) X-Mailer: CommuniGate Pro Web Mailer v.3.1 Date: Fri, 21 Apr 2000 10:04:17 -0400 Message-ID: In-Reply-To: <200004191737.TAA05872@wsooti09.win.tue.nl> Consider a case where there exists a large & fast public repository of "base instances" (say, www.baseinstances.net) which contains a myriad of commonly used templates-like base instances -- say one for a "typical home page", one for a typical "send me your comments page", etc. 
Assuming these can be used as base instances by a variety of sites, it
is likely that a client will have used one of them recently (I'm
abstracting from how). Hence, on future requests the client can tell a
server that it has copies of a (or of several) likely base-instances
from such a repository, which could make delta much more effective.

Dcluster (or dtemplate) can handle this now, but it does require that
the server inform the client first of what URIs are in a uniqueness
scope. I'm wondering if the opposite would also work -- the client
"guessing" what uniqueness scope the uri might fall into. In particular,
that the client would include a Dcluster: that points to
www.baseinstances.net, along with several appropriate etags. The server
can then use these base-instances (if readily available).

Actually, the client may never have contacted www.baseinstances.net --
it may have "pre-loaded" its cache from an installation or update
cd-rom, or otherwise used out-of-band means to load base instances (and
their etags & uris) into its cache.

And how does that affect spoofing? I'm thinking that either Koen's or my
notion -- of a server providing a dcluster, or the client providing it
-- can work. That is, either mechanism can be used to indicate the
source of an etag; both need not be done. And the above example shows
one case where "client provided" dclusters can do double duty -- as a
way of "guessing" a uniqueness scope, and as a way of protecting against
spoofing.
----- End of forwarded message from Real Name -----

From mogul@pa.dec.com Fri Apr 28 15:27:42 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA32165; Fri, 28 Apr 2000 15:27:42 -0700 (PDT)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA08114; Fri, 28 Apr 2000 15:27:42 -0700
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA32291; Fri, 28 Apr 2000 15:27:42 -0700 (PDT)
From: Jeffrey Mogul
Message-Id: <200004282227.PAA32291@wera.pa.dec.com>
To: http-delta@pa.dec.com
Subject: Re: DCLUSTER-ORDERING (issue)
In-Reply-To: Your message of "Sun, 23 Apr 2000 13:43:06 EDT." <200004231744.NAA01847@lycanthrope.crosslink.net>
Date: Fri, 28 Apr 2000 15:27:42 -0700
X-Mts: smtp

writes:

    The following is a possible section 12.4.2.a; which defines just how
    a client should use Dcluster information to determine what
    base-instance may be useable (hence, what etags to include in a
    If-None-Match)

Thanks for suggesting this resolution. I think I will adopt the basic
outline of your suggestion (but I plan to place it as section 12.10,
more or less).

I had to spend some time working through the various cases, and so I
ended up with a somewhat different way of stating the rules, but I think
they are fairly precise now. Also, I worked in the anti-spoofing rules
that Koen wanted to see (as far as I understand things), although there
will still need to be some more language elsewhere about this.

-Jeff

+---+

12.10 Rules for matching cache entries with DCluster headers

Normally, when a client does a cache lookup to find an entry matching
the URL of a resource, it checks for an exact match. A client that
supports the DCluster header (section XXX) MAY use a more complex
matching rule when formulating a request for a delta-encoded response,
allowing the client to list entity tags from multiple resources.
Assuming that a client is about to make a request for a delta-encoded
response for a given Request-URI URL1, the request MAY include the
entity tag from a cache entry for URL2 if the cache entry for URL1 does
not contain a DTemplate header (section YYY) specifying a resource other
than URL2, and if at least one of the following conditions holds:

   (1) URL2 is URL1.

   (2) The cache entry for URL1 includes a DCluster header field, and at
       least one of the uri-prefix values in that field is a prefix of
       URL2, and the Date header field in the cache entry for URL1 is no
       newer than the Date header field in the cache entry for URL2.
       (See section 14.2 for privacy considerations.)

       Note: a cache that includes multiple entries for URL1 might have
       several with DCluster field values identical to the value in the
       most recent entry. If so, the constraint on Date header values
       may be satisfied by the oldest such cache entry for URL1. In
       practice, an implementation might choose to record, in the cache
       entry for URL1, the Date value from the last response that
       changed the DCluster value for URL1, rather than storing the
       actual prior cache entries.

       >>>QUESTION: the spoofing attack is not possible in case 2, right?<<<

   (3) The cache entry for URL2 includes a DCluster header field, and at
       least one of the uri-prefix values in that field is a prefix of
       URL1, and (to protect against the spoofing attack described in
       section 14.1) at least one of these conditions holds:

       (a) The host part (and port, if specified) of URL1 and URL2 are
           identical.

       (b) Condition (2) above also holds.

       (c) The client intends to reject any delta response without a
           secure means to detect spoofing, such as an instance digest.

       (d) The client implementation has been explicitly configured to
           disable protection against spoofing.

The matching rules in this section define the maximal set of cache
entries, and thus entity tags, that a client MAY use in a request for a
delta-encoded response.
In general, clients SHOULD further prune the set to avoid sending excessively large headers. The precise details of this pruning operation are left to the individual implementation, but pruning SHOULD be consistent with these rules: (1) If the cache entry for URL2 includes a "retain" cache-directive, this entry SHOULD NOT be used if the optional delta-seconds value is larger than the entry's age. (2) Otherwise, cache entries with "retain" cache-directives SHOULD be preferred over other entries. (3) Newer entries MAY be preferred over older entries. +---+ From mogul@pa.dec.com Fri Apr 28 15:43:30 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA02367; Fri, 28 Apr 2000 15:43:30 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA07385; Fri, 28 Apr 2000 15:43:30 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA28917; Fri, 28 Apr 2000 15:43:30 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200004282243.PAA28917@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: New issue: Client-initiated Dcluster In-Reply-To: Your message of "Sun, 23 Apr 2000 13:51:57 EDT." <200004231756.NAA03386@lycanthrope.crosslink.net> Date: Fri, 28 Apr 2000 15:43:30 -0700 X-Mts: smtp writes: I propose that a client can add a Dcluster request header. This would be used to indicate a client's "guess" as to an enhanced uniqueness scope that may also be available to the server. The client could then include etags in an If-None-Match that are associated with instances from the Dcluster. That is, with instances from URI's that prefix-match the argument contained in the Dcluster request header, and that would otherwise NOT be in the uniqueness scope of the request-URI. 
I think I'm going to reject this for a few reasons:

First, and most important, while this might be an interesting and useful
extension to the delta encoding mechanism, I don't think it represents
an identifiable problem with the existing draft. I'd suggest waiting
until we've reached some sort of closure on the basic mechanism, then
writing this up as a separate Internet-Draft. As far as I can tell,
since it should be an optional mechanism, it can be described in a
separate document.

Second, I'd strongly recommend against using the same header name in
both a request and a response unless it really means *exactly* the same
thing. We've had some confusion in HTTP/1.1 because, for example, some
of the cache-directive names are valid in both requests and responses,
but mean subtly different things.

Finally, I'd suggest thinking carefully about whether the mechanism you
propose would actually work correctly, without some additional details.
I don't think you could actually safely allow deltas using entity tags
that aren't definitely in the same uniqueness scope (this is just a
hunch).

-Jeff

From mogul Fri Apr 28 18:23:09 2000
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA17481; Fri, 28 Apr 2000 18:23:09 -0700 (PDT)
From: Jeffrey Mogul
Message-Id: <200004290123.SAA17481@wera.pa.dec.com>
To: http-delta
Subject: new issue: DELTA+IF-RANGE
Date: Fri, 28 Apr 2000 18:23:09 -0700
X-Mts: smtp

Just thinking about all of the likely header combinations ...

-Jeff

Issue-Name: DELTA+IF-RANGE
Document-section: needs new subsection of section 12?
Reported-By: Jeff Mogul
Reported-Date: Fri, 28 Apr 2000
Description: The spec needs to provide some guidance on how the server
should interpret a request that allows delta encoding and also includes
an If-Range header.
HTTP/1.1 says If-Range means:

   if the entity is unchanged, send me the part(s) that I am missing;
   otherwise, send me the entire new entity

When combined with a request for a delta, the meaning could either be:

   if the entity is unchanged, send me the part(s) OF THE DELTA that I
   am missing; otherwise, send me the entire new entity

or it could be:

   if the entity is unchanged, send me the part(s) that I am missing;
   otherwise, send me A DELTA-ENCODED RESPONSE FOR the entire new entity

or it could be:

   if the entity is unchanged, send me the part(s) OF THE DELTA that I
   am missing; otherwise, send me A DELTA-ENCODED RESPONSE FOR the
   entire new entity

Suggested resolution: The third choice seems to be the only useful
interpretation. The first choice seems odd (why would one only want to
apply delta-encoding to the previous response [the one that was
prematurely terminated and that is being filled in with an If-Range],
but not to the current one?). The second choice also seems not to work
(the prematurely terminated response could not have been delta-encoded,
because trying to fill it in using a Range of the non-delta-encoded
instance wouldn't work in that case, but then why ask for a delta now if
we didn't ask for it the last time?)

So a legal example of this combination (choice #3) would be something
like:

   GET /foo.html HTTP/1.1
   Host: example.com
   Range: 1024-              // get the rest of the response
   A-IM: vcdiff, range       // apply the delta, then the range
   If-Range: "abc"           // Etag for partial prior response
   If-None-Match: "pqr"      // Etag for prior base instance

Perhaps the spec should say that if the request carries an If-Range
header, and the A-IM header lists "range" prior to any delta-coding,
then the server SHOULD ignore the delta-coding?
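A rough sketch of how a server might act on the third interpretation, together with the tentative "ignore the delta-coding" rule, is given below. It is an illustration only: the function name and its return shape are hypothetical, "vcdiff" is assumed to be the only delta-coding the server knows, entity tags are passed without quotes, and neither the choice of base instance nor the 304 case is modelled.

```python
DELTA_CODINGS = {"vcdiff"}  # assumption: the only delta-coding supported here


def plan_response(a_im, if_range, current_etag):
    """Return (use_delta, use_range) for a request under interpretation #3.

    a_im         -- value of the A-IM request header, e.g. "vcdiff, range"
    if_range     -- entity tag from If-Range, or None if the header is absent
    current_etag -- entity tag of the server's current instance
    """
    ims = [im.strip().lower() for im in a_im.split(",") if im.strip()]
    delta_at = next((i for i, im in enumerate(ims) if im in DELTA_CODINGS), None)
    range_at = ims.index("range") if "range" in ims else None

    use_delta = delta_at is not None
    # Tentative rule: with If-Range present, "range" listed before the
    # delta-coding means the delta-coding is ignored.
    if use_delta and if_range is not None and range_at is not None and range_at < delta_at:
        use_delta = False

    # The range applies only when the partial prior response is still valid;
    # otherwise the whole (possibly delta-encoded) response is sent.
    use_range = range_at is not None and (if_range is None or if_range == current_etag)
    return use_delta, use_range


if __name__ == "__main__":
    print(plan_response("vcdiff, range", "abc", "abc"))  # a range of the delta
    print(plan_response("vcdiff, range", "abc", "xyz"))  # the entire delta
    print(plan_response("range, vcdiff", "abc", "abc"))  # delta-coding ignored
```

For the example request, an unchanged instance ("abc") yields a range of the delta, while a changed instance yields the complete delta with the range ignored.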
Resolution-Date: From danielh@crosslink.net Fri Apr 28 21:36:47 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id VAA21938; Fri, 28 Apr 2000 21:36:47 -0700 (PDT) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA29078; Fri, 28 Apr 2000 21:36:47 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id VAA28573 for ; Fri, 28 Apr 2000 21:36:46 -0700 (PDT) Received: from smtp.crosslink.net (dyn31.c5200-1.springfield.236.crosslink.net [207.199.142.32]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id AAA11505 for ; Sat, 29 Apr 2000 00:36:44 -0400 Message-Id: <200004290436.AAA11505@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Sat, 29 Apr 2000 00:34:33 -0400 To: http-delta@pa.dec.com In-Reply-To: <200004282329.QAA19758@wera.pa.dec.com> Subject: Re: New issue: implicit delta-base X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Reported-By: Daniel Hellerstein Reported-Date: 23 Apr 2000 > > Description: Clarification of caching rules when Delta-Base is not > specified. > > Suggested resolution: > > Modify point 4 on the "when to not cache rules", to include: > "If a delta response is returned without > a delta-base, as may happen if If-None-Match contains a single etag, > the proxy MAY create an Delta-base header for internal use > (with a value equal to the single Etag contained in the > request's If-None-Match header). > Jeff replied >How about a slightly different modification: at the end of >section 12.4.1 (Delta-Base), add this: > A cache that receives a delta-encoded response that lacks > a Delta-base header MAY add a Delta-Base header whose value > is the entity tag given in the If-None-Match field of the > request (but only if that field lists exactly one entity > tag). 
>This kills two birds with one stone: it solves your problem, and it also >allows a caching proxy to forward the implicit >Delta-base to another client. That's fine by me. >Alternatively, we could change 12.4.1 from > Any response with an IM header that includes a delta-coding MAY > include a Delta-Base header. >to > Any response with an IM header that includes a delta-coding SHOULD > include a Delta-Base header. >as suggested in the note in that section, which would render your issue >superfluous. Any comments? I prefer the first solution. ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From danielh@crosslink.net Fri Apr 28 22:28:08 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id WAA27640; Fri, 28 Apr 2000 22:28:08 -0700 (PDT) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA12409; Fri, 28 Apr 2000 22:28:08 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id WAA30392 for ; Fri, 28 Apr 2000 22:28:07 -0700 (PDT) Received: from smtp.crosslink.net (dyn31.c5200-1.springfield.236.crosslink.net [207.199.142.32]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id BAA18916 for ; Sat, 29 Apr 2000 01:28:05 -0400 Message-Id: <200004290528.BAA18916@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Sat, 29 Apr 2000 01:23:08 -0400 To: http-delta@pa.dec.com In-Reply-To: <200004290123.SAA17481@wera.pa.dec.com> Subject: Re: new issue: DELTA+IF-RANGE X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 > So a legal example of this combination (choice #3) would > be something like: > GET /foo.html HTTP/1.1 > Host: example.com > Range: 1024- // get the rest of the response > A-IM: vcdiff, range // apply the delta, 
then the range > If-Range: "abc" // Etag for partial prior response > If-None-Match: "pqr" // Etag for prior base instance Which means: If foo.html's current-instance has an etag of "abc", then a) compute a delta (using vcdiff) between "abc" and "pqr" b) return bytes 1024- of this delta If it's NOT "abc" then (say, it's "xyz") a) compute a delta between "xyz" and "pqr", and return this delta (ignore the range) > Perhaps the spec should say that if the request carries > and If-Range header, and the A-IM header lists "range" > prior to any delta-coding, then the server SHOULD ignore > the delta-coding? Makes sense -- what would the vcdiff (of the 1024- range) be computed against? ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From danielh@crosslink.net Fri Apr 28 22:28:44 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id WAA06345; Fri, 28 Apr 2000 22:28:44 -0700 (PDT) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA01434; Fri, 28 Apr 2000 22:28:44 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id WAA23476 for ; Fri, 28 Apr 2000 22:28:43 -0700 (PDT) Received: from smtp.crosslink.net (dyn31.c5200-1.springfield.236.crosslink.net [207.199.142.32]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id BAA19007 for ; Sat, 29 Apr 2000 01:28:41 -0400 Message-Id: <200004290528.BAA19007@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Sat, 29 Apr 2000 01:27:41 -0400 To: http-delta@pa.dec.com In-Reply-To: <200004290011.RAA13020@wera.pa.dec.com> Subject: Re: another thought re: client-initiated Dcluster X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Jeff said >I think maybe what you are getting at is 
something a little
>different from DCluster. Maybe I'm reading this the wrong
>way, but I don't think this is best thought of as a case of
>expanding the uniqueness scope for a URL.
>Rather, I think what you really want to do is to have the
>client's request express the concept that
>    I have ALL of the base instances in this [large] set,
>    so I don't intend to list them all in my If-None-Match
>    header.
>So the sequence of events might be that server umess.edu
>tells the client that its uniqueness scope includes
>"http://baseinstances.com" or perhaps "cdrom://baseinstaces".
>Then the client has a choice of a million entity tags
>(or however many things are in this repository) to specify
>in its If-None-Match header - this is clearly not feasible.

The notion was that a client would only use the instances that are somehow functionally related to the resource it's about to request -- where "functionally related" would be determined by name and location. For example, a rule of the sort "/hello.htm is used as a 'welcome to my site' page, hence we should use the base-instances for this type of page" would be used to choose a limited set of etags.

I now think that this may be overly restrictive, in the pain-in-the-neck-to-implement sense -- getting servers and clients to agree on these "rules" being the first obstacle. So your suggestion does have a certain logic.

Your suggestion may be a bit extreme, since it seems to require the client to have "all" the base instances in the named repository, not just a "useful collection". But with carefully organized "sets", say as specified using subdirectories, this may not be such an onerous problem.

>So instead, the client could send something like
>    GET /personal/daniel.html HTTP/1.1
>    Host: bios.umess.edu
>    Base-Instance-sets: http://baseinstances.com/,
>        cdrom://baseinstances
>    A-IM: vcdiff
>with no If-None-Match, because (in this example) it has never received
>your biography before.
>And the server could respond
>    HTTP/1.1 226 IM Used
>    IM: vcdiff
>    DTemplate: "http://baseinstances.com/biotemplate/version97"
>    Delta-Base: "whateveretag"
>    Etag: "adjklaskdjasd"

Interesting, the response indicates a template that the server believes the client "already has", based on the client's inclusion of a Base-Instance-sets header. The Delta-Base may not be strictly necessary, but it's probably a good idea -- since there may be more than one "base instance" associated with the named template (assuming that the Delta-Base is for this named template).

>with the body that is the delta between one of the many
>files in one of those repositories and the current version
>of your biography.

Or maybe my assumption is not what you were thinking (that the delta base points to a base instance of http://baseinstances.com/biotemplate/version97)? Which I think doesn't make sense.

>This isn't fully worked through - I'm leaving that to you :-). But I
>think the first step is to clearly define what problem you are trying to
>solve, and I think the issue with a
>repository-based approach is not how to define the uniqueness scope, but
>how to limit the number of entity tags in the
>request headers.

The problem is how to use delta-encoding on the first response to a client. The idea applies when a client knows (or can reasonably guess) that the origin server has quick access to a repository of "base instances" -- a repository that the client also has quick access to. For example, this repository may be a very fast (or widely mirrored) site, or it may be local data distributed via CD-ROM (say, in an installation package used for all clients at a university).

At this moment, I can't think of any major flaws in your proposal. The points I would make are:

a) the client needs some unspecified means to determine when to include a Base-Instance-sets request header

b) the Delta-Base that is returned refers to the "uniqueness scope" of the URI included in the DTemplate; that is, the base instance used by the client to decode the response is the one identified by the Delta-Base etag of the named DTemplate.

c) the base-instance-sets can be specified down to the "subdirectory" level; they need not refer to an "entire site".

d) I would not allow base-instance-sets of the form
       cdrom://baseinstances
   Instead, the client could use
       baseinstances.com/cdrom_OCT99/
   which both client and server would presumably have a "local" copy of. That is, express everything as a URL prefix that points to a server, and let the client and server take advantage of whatever clever "caches" they may have.

So -- I guess I've been volunteered to write a section.... assuming that the rest of the group deems this a worthy addition (or at least one person thinks so, and no one else disagrees).

-----------------------------------------------------------
Daniel Hellerstein
danielh@crosslink.net
http://www.srehttp.org
-----------------------------------------------------------

From koen@win.tue.nl Sat Apr 29 13:20:02 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM)
    id NAA05058; Sat, 29 Apr 2000 13:20:02 -0700 (PDT)
Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM)
    id AA11777; Sat, 29 Apr 2000 13:20:01 -0700
Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157])
    by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA19482;
    Sat, 29 Apr 2000 13:20:00 -0700 (PDT)
Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3) id WAA08791.
Sat, 29 Apr 2000 22:16:14 +0200 (MET DST) From: koen@win.tue.nl (Koen Holtman) Message-Id: <200004292016.WAA08791@wsooti09.win.tue.nl> Subject: Re: DCLUSTER-ORDERING (issue) In-Reply-To: <200004282227.PAA32291@wera.pa.dec.com> from Jeffrey Mogul at "Apr 28, 2000 3:27:42 pm" To: mogul@pa.dec.com (Jeffrey Mogul) Date: Sat, 29 Apr 2000 22:16:14 +0200 (MET DST) Cc: http-delta@pa.dec.com, koen@win.tue.nl (Koen Holtman) X-Mailer: ELM [version 2.4ME+ PL43 (25)] Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Jeff writes: > writes: > > The following is a possible section 12.4.2.a; which defines just > how a client should use Dcluster information to determine what > base-instance may be useable (hence, what etags to include in a > If-None-Match) > >Thanks for suggesting this resolution. I think I will adopt >the basic outline of your suggestion (but I plan to place it >as section 12.10, more or less). I had to spend some time >working through the various cases, and so I ended up with >with a somewhat different way of stating the rules, but I >think they are fairly precise now. Also, I worked in the >anti-spoofing rules that Koen wanted to see (as far as I >understand things), although there will still need to be >some more language elsewhere about this. The proposed language below looks OK at first sight, but I have not done a detailed analysis. However I am a bit concerned about the direction that is taken here in expanding the draft. I don't think that it is necessary that the draft spells out, at a MUST/MAY/SHOULD level, an exact algorithm for selecting the etags to send. I believe that it is safe to leave the invention of the algorithm up to the implementers. (Most of the text below could be helpful as an appendix though -- that helps implementers without running the risk that the real spec becomes self-contradictory or develops a hole because we forgot a case.) 
The draft _must_ be very exact in the algorithm for deciding when a base instance (with an etag X) and a delta response can be merged. From this merging decision algorithm it will follow that it will never make sense to send certain etags in the request, because it is known in advance that they can never be a valid X in the merging step. The implementer, however, can make the necessary deductions about which etags make sense here. If the implementer gets it wrong and writes an algorithm that sends too many etags, this will result in an inefficiency (sometimes the response obtained will fail the test that allows merging) but it will never result in incorrect content being delivered.

I could see some room for the draft making suggestions on certain classes of etags that should not be sent, because servers would never be expected to use them -- e.g. the etag of a base instance for which revalidation failed in the past.

>
>-Jeff
>
>+---+
>12.10 Rules for matching cache entries with DCluster headers
>
>    Normally, when a client does a cache lookup to find an
>    entry matching the URL of a resource, it checks for an
>    exact match. A client that supports the DCluster header
>    (section XXX) MAY use a more complex matching rule when
>    formulating a request for a delta-encoded response,
>    allowing the client to list entity tags from multiple
>    resources.
>
>    Assuming that a client is about to make a request for a
>    delta-encoded response for a given Request-URI URL1, the
>    request MAY include the entity tag from a cache entry for
>    URL2 if the cache entry for URL1 does not contain a
>    DTemplate header (section YYY) specifying a resource other
>    than URL2, and if at least one of the following conditions hold:
>
>    (1) URL2 is URL1.
>
>    (2) The cache entry for URL1 includes a DCluster header
>        field, and at least one of the uri-prefix values in
>        that field is a prefix of URL2, and the Date header
>        field in the cache entry for URL1 is no newer than the
>        Date header field in the cache entry for URL2.

I believe the 'no newer' above is too restrictive in the template case. If URL2 were a template it would generally be very old, right?

(I assume the 'dcluster' above is supposed to mean 'dcluster and dtemplate'.)

>        (See
>        section 14.2 for privacy considerations.)
>
>        Note: a cache that includes multiple entries for URL1
>        might have several with DCluster field values identical
>        to the value in the most recent entry. If so, the constraint
>        on Date header values may be satisfied by the oldest
>        such cache entry for URL1. In practice, an implementation
>        might choose to record, in the cache entry for URL1,
>        the Date value from the last response that changed
>        the DCluster value for URL1, rather than storing the
>        actual prior cache entries.
>
>    >>>QUESTION: the spoofing attack is not possible in case 2, right?<<<

On the spoofing attack: it looks like the attack is prevented here, yes, but I would have to see the merging rule to see if the attack is prevented everywhere. I would like to see all possible attacks being prevented by the merging rule, as I think it will be in what you will write. The etag sending rules above would then only affect efficiency, not security, if they happened to have a hole.

>
>    (3) The cache entry for URL2 includes a DCluster header
>        field, and at least one of the uri-prefix values in
>        that field is a prefix of URL1, and (to protect against
>        the spoofing attack described in section 14.1)
>        at least one of these conditions holds:
>        (a) The host part (and port, if specified) of URL1
>            and URL2 are identical.
>        (b) Condition (2) above also holds.
>        (c) The client intends to reject any delta response
>            without a secure means to detect spoofing, such
>            as an instance digest.
>        (d) The client implementation has been explicitly
>            configured to disable protection against spoofing.

I am really uncomfortable with the (a)-(d) list above because it greatly expands the number of cases one needs to consider to determine if the spoofing protection is watertight. As I said above, I would like the protection to be centralised in the merging rule part of the spec; this reduces the cases above to only case (c), so that everything from 'and (to protect against...' can be deleted.

>
>    The matching rules in this section define the maximal set
>    of cache entries, and thus entity tags, that a client MAY
>    use in a request for a delta-encoded response. In general,
>    clients SHOULD further prune the set to avoid sending
>    excessively large headers. The precise details of this
>    pruning operation are left to the individual implementation,
>    but pruning SHOULD be consistent with these rules:
>    (1) If the cache entry for URL2 includes a "retain"
>        cache-directive, this entry SHOULD NOT be used if the
>        optional delta-seconds value is larger than the entry's age.
>
>    (2) Otherwise, cache entries with "retain" cache-directives
>        SHOULD be preferred over other entries.
>
>    (3) Newer entries MAY be preferred over older entries.

There should probably be something about Dcluster in the above discussion too.

Koen.
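The quoted section 12.10 matching conditions can be read as a single predicate over two cache entries. The following is a minimal sketch under assumptions, not the draft's algorithm: `CacheEntry`, `may_offer_etag` and the two keyword flags are hypothetical names, a uri-prefix match is taken to be a plain string-prefix test, and "host part (and port)" is approximated by comparing the URL's network location. It implements the 'no newer' Date test exactly as quoted, without the relaxation suggested above for templates.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from urllib.parse import urlsplit


@dataclass
class CacheEntry:
    url: str
    etag: str
    date: datetime                                      # Date header of the response
    dcluster: List[str] = field(default_factory=list)   # uri-prefix values
    dtemplate: Optional[str] = None                      # DTemplate target, if any


def _covers(prefixes: List[str], url: str) -> bool:
    return any(url.startswith(p) for p in prefixes)


def _same_host_port(u1: str, u2: str) -> bool:
    return urlsplit(u1).netloc.lower() == urlsplit(u2).netloc.lower()


def may_offer_etag(url1: str, entry1: Optional[CacheEntry], entry2: CacheEntry,
                   *, requires_digest: bool = False,
                   spoof_protection_disabled: bool = False) -> bool:
    """May a request for url1 list the entity tag of entry2 (cached for URL2)?"""
    # Precondition: URL1's entry must not name a DTemplate other than URL2.
    if entry1 is not None and entry1.dtemplate not in (None, entry2.url):
        return False
    # Condition (1): URL2 is URL1.
    if entry2.url == url1:
        return True
    # Condition (2): URL1's DCluster covers URL2 and URL1's Date is no newer.
    if (entry1 is not None and _covers(entry1.dcluster, entry2.url)
            and entry1.date <= entry2.date):
        return True
    # Condition (3): URL2's DCluster covers URL1, plus an anti-spoofing guard.
    # Guard (b) is condition (2), which has already been handled above.
    if _covers(entry2.dcluster, url1):
        return (_same_host_port(url1, entry2.url)
                or requires_digest
                or spoof_protection_disabled)
    return False
```

The predicate only bounds the maximal set; pruning that set to keep request headers small is a separate step left to the implementation.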
From danielh@crosslink.net Sat Apr 29 16:59:11 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA12199; Sat, 29 Apr 2000 16:59:10 -0700 (PDT) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA22131; Sat, 29 Apr 2000 16:59:10 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id QAA00393 for ; Sat, 29 Apr 2000 16:59:09 -0700 (PDT) Received: from smtp.crosslink.net (dyn09.c5200-1.springfield.236.crosslink.net [207.199.142.10]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id TAA26045 for ; Sat, 29 Apr 2000 19:59:01 -0400 Message-Id: <200004292359.TAA26045@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Sat, 29 Apr 2000 19:45:54 -0400 To: http-delta@pa.dec.com In-Reply-To: <200004292016.WAA08791@wsooti09.win.tue.nl> Subject: Re: DCLUSTER-ORDERING (issue) X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Koen wrote >However I am a bit concerned about the direction that is taken here in >expanding the draft. I don't think that it is necessary that the draft >spells out, at a MUST/MAY/SHOULD level, an exact algorithm for selecting >the etags to send. I believe that it is safe to leave the invention of >the algorithm up to the implementers. (Most of the text below could be >helpful as an appendix though -- that helps implementers without running >the risk that the real spec becomes self-contradictory or develops a hole >because we forgot a case.) The current paragraph The matching rules in this section define the maximal set of cache entries, and thus entity tags, that a client MAY use in a request for a delta-encoded response. In general, clients SHOULD further prune the set to avoid sending excessively large headers. 
The precise details of this pruning operation are left to the individual implementation, but pruning SHOULD be consistent with these rules: is a fairly weak -- except for the limitation on what the "maximal" set should be. If I read correctly, you are advocating that the concept of the "maximal set" not be prominent. That a client can send any etag, even if there is no direct evidence that the base-instance associated with one of these "any etags" is from the request-URI's uniqueness scope. In other words, including such etags may be very inefficient, but it's not unacceptable practice. >prevented by the merging rule, as I think it will be in what you will >write. The etag sending rules above would then only affect efficiency, Is the "merging rule" a) IF a server uses a base-instance from a request-URI's uniqueness scope, but not from the actual request-URI b) THEN it MUST include a Dcluster pointing ot the URI for which this etag is (associated with) a base-instance ? If so, that would alleviate some concerns about clients exceeding the "maximal set". But I still prefer the language the way it is (maximal sets, defined using SHOULD and MAY) -- since it will encourage careful practice. ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From koen@win.tue.nl Sun Apr 30 13:03:24 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA10763; Sun, 30 Apr 2000 13:03:23 -0700 (PDT) Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA24355; Sun, 30 Apr 2000 13:03:23 -0700 Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA08730 for ; Sun, 30 Apr 2000 13:03:22 -0700 (PDT) Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3) id VAA10021. 
Sun, 30 Apr 2000 21:59:34 +0200 (MET DST) From: koen@win.tue.nl (Koen Holtman) Message-Id: <200004301959.VAA10021@wsooti09.win.tue.nl> Subject: Re: DCLUSTER-ORDERING (issue) In-Reply-To: <200004292359.TAA26045@lycanthrope.crosslink.net> from "danielh@crosslink.net" at "Apr 29, 2000 7:45:54 pm" To: danielh@crosslink.net Date: Sun, 30 Apr 2000 21:59:34 +0200 (MET DST) Cc: http-delta@pa.dec.com X-Mailer: ELM [version 2.4ME+ PL43 (25)] Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit >Koen wrote >>However I am a bit concerned about the direction that is taken here in >>expanding the draft. I don't think that it is necessary that the draft >>spells out, at a MUST/MAY/SHOULD level, an exact algorithm for selecting >>the etags to send. I believe that it is safe to leave the invention of >>the algorithm up to the implementers. (Most of the text below could be >>helpful as an appendix though -- that helps implementers without running >>the risk that the real spec becomes self-contradictory or develops a hole >>because we forgot a case.) > >The current paragraph > The matching rules in this section define the maximal set > of cache entries, and thus entity tags, that a client MAY > use in a request for a delta-encoded response. In general, > clients SHOULD further prune the set to avoid sending > excessively large headers. The precise details of this > pruning operation are left to the individual implementation, > but pruning SHOULD be consistent with these rules: > >is a fairly weak -- except for the limitation on what the >"maximal" set should be. Yes. > >If I read correctly, you are advocating that the concept of the "maximal >set" not be prominent. That a client can send any etag, even if there >is no direct evidence that the base-instance associated with one of these >"any etags" is from the request-URI's uniqueness scope. In other words, >including such etags may be very inefficient, but it's not unacceptable >practice. 
Yes, exactly. The position I am advocating is perhaps more editorial, or related to 'protocol complexity', than it will affect what goes on on the wire. However I do feel that editorial concerns from people other than the editor carry some weight in as far as they affect (the complexity of) the security related language. > > >>prevented by the merging rule, as I think it will be in what you will >>write. The etag sending rules above would then only affect efficiency, >Is the "merging rule" > a) IF a server uses a base-instance from a request-URI's uniqueness >scope, > but not from the actual request-URI > b) THEN it MUST include a Dcluster pointing ot the URI for which this > etag is (associated with) a base-instance >? What I have been calling the "merging rule" is the thing above, plus all other things that need to be checked on merging, e.g. also if the base instance said that the request-URI was in its uniqueness scope. > >If so, that would alleviate some concerns about clients exceeding the >"maximal set". > >But I still prefer the language the way it is (maximal sets, defined >using SHOULD and MAY) -- >since it will encourage careful practice. > >----------------------------------------------------------- >Daniel Hellerstein >danielh@crosslink.net >http://www.srehttp.org >----------------------------------------------------------- > Koen. 
From mogul@pa.dec.com Mon May 1 14:04:15 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA31473; Mon, 1 May 2000 14:04:15 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA24893; Mon, 1 May 2000 14:04:15 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA30649; Mon, 1 May 2000 14:04:14 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005012104.OAA30649@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: another thought re: client-initiated Dcluster In-Reply-To: Your message of "Sat, 29 Apr 2000 01:27:41 EDT." <200004290528.BAA19007@lycanthrope.crosslink.net> Date: Mon, 01 May 2000 14:04:14 -0700 X-Mts: smtp writes: So -- I guess I've been volunteered to write a section.... assuming that the rest of the group deems this a worth addition (or at least one person thinks so, and no one else disagrees). Actually, I would strongly suggest doing this as a separate document (Internet-Draft), NOT as another section for the Delta specification. The current document is already too long/complex, and I don't think the extension you're proposing needs to be in the same document; it should be possible to layer it as an extension. We need to make progress on getting a Delta I-D finished, and while there may be many interesting bells and whistles that we could add, I'd argue against almost anything else at this point. 
-Jeff From mogul@pa.dec.com Mon May 1 14:44:39 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA04715; Mon, 1 May 2000 14:44:39 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA05593; Mon, 1 May 2000 14:44:39 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA11072; Mon, 1 May 2000 14:44:39 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005012144.OAA11072@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: DCLUSTER-ORDERING (issue) In-Reply-To: Your message of "Sun, 30 Apr 2000 21:59:34 +0200." <200004301959.VAA10021@wsooti09.win.tue.nl> Date: Mon, 01 May 2000 14:44:39 -0700 X-Mts: smtp Koen Holtman writes: The position I am advocating is perhaps more editorial, or related to 'protocol complexity', than it will affect what goes on on the wire. However I do feel that editorial concerns from people other than the editor carry some weight in as far as they affect (the complexity of) the security related language. That's a fair criticism - if neither you nor Daniel can figure out what I meant to say in that section, then it's probably not good enough for the world at large. Let me take another stab at this DCLUSTER-ORDERING issue (although it should probably now be called DCLUSTER-MATCHING). When writing rules for the client (cache) to use in determining which entity tags to send in a delta-eligible request, we need to consider three orthogonal requirements: (1) will correct, non-malicious implementations that follow these rules always deliver the right content to the user? (2) do these rules make the most efficient use of shared resources (the Internet, servers, proxies, etc.)? (3) do these rules protect against the known spoofing attack? 
I think we all agree that we should not unduly limit the behavior or design of implementations beyond these three requirements (although anything we specify ought to be plausibly implementable!) The "maximal set" approach is primarily meant to address the first requirement. If the set of entity tags that a client generates is entirely contained in this maximal set, we can guarantee the right answer from non-malicious servers. Rules 3(a)-3(d) are designed to address the third requirement, anti-spoofing. The "pruning rules" are design to address the second requirement, efficiency. I'll try to write an introductory paragraph to explain that. We have a separate issue (SPOOFING) open, about whether the anti-spoofing rules are sound and strong enough. I'd like to continue to handle that as a separate issue, please! -Jeff From danielh@crosslink.net Mon May 1 14:55:13 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA03063; Mon, 1 May 2000 14:55:13 -0700 (PDT) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA32551; Mon, 1 May 2000 14:55:13 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id OAA17894 for ; Mon, 1 May 2000 14:55:12 -0700 (PDT) Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id RAA18256 for ; Mon, 1 May 2000 17:55:06 -0400 Message-Id: <200005012155.RAA18256@lycanthrope.crosslink.net> X-Really-To: Date: Mon, 01 May 2000 17:51:55 -0400 To: http-delta@pa.dec.com In-Reply-To: <200005012144.OAA11072@wera.pa.dec.com> Subject: Re: DCLUSTER-ORDERING (issue) X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 >When writing rules for the client (cache) to use in determining which >entity tags to send in a delta-eligible request, we need to consider >three orthogonal requirements: > (1) will correct, non-malicious 
implementations that > follow these rules always deliver the right content > to the user? > (2) do these rules make the most efficient use of > shared resources (the Internet, servers, proxies, etc.)? > (3) do these rules protect against the known spoofing > attack? >I think we all agree that we should not unduly limit the behavior or >design of implementations beyond these three requirements (although >anything we specify ought to be plausibly implementable!) Seems right to me. >The "maximal set" approach is primarily meant to address the first >requirement. If the set of entity tags that a client generates is >entirely contained in this maximal set, we can guarantee the right answer >from non-malicious servers. Rules 3(a)-3(d) are designed to address the >third requirement, >anti-spoofing. The "pruning rules" are design to address >the second requirement, efficiency. >I'll try to write an introductory paragraph to explain that. I've got no further comments right now... I'm awaiting the next draft, or section of draft. ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul@pa.dec.com Mon May 1 15:05:56 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA10421; Mon, 1 May 2000 15:05:56 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA21973; Mon, 1 May 2000 15:05:56 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA13318; Mon, 1 May 2000 15:05:56 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005012205.PAA13318@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: DCLUSTER-ORDERING (issue) In-Reply-To: Your message of "Sat, 29 Apr 2000 22:16:14 +0200." 
<200004292016.WAA08791@wsooti09.win.tue.nl> Date: Mon, 01 May 2000 15:05:56 -0700 X-Mts: smtp Koen Holtman writes: However I am a bit concerned about the direction that is taken here in expanding the draft. I don't think that it is necessary that the draft spells out, at a MUST/MAY/SHOULD level, an exact algorithm for selecting the etags to send. I believe that it is safe to leave the invention of the algorithm up to the implementers. The draft _must_ be very exact in the algorithm for deciding when a base instance (with an etag X) and a delta response can be merged. From this merging decision algorithm it will follow that it will never make sense to send certain etags in the request, because it is known in advance that they can never be a valid X in the merging step. I think there is a subtle error in your reasoning here. There are basically three decision points where a choice is made about entity tags if "clustering" is used: (1) The client, when forming a request, has to pick a set of entity tags to send in the If-None-Match header. (2) The server, when computing a delta, has to pick one member of this set as the base instance (or it can decide to pick nothing == no delta encoding) (3) The client, when it receives a delta response, needs to decide if the response is valid for use in reconstructing the current base instance. As far as I can tell, Koen and Daniel are using the term "merging rule" to describe the third decision point (although I admit that I'm not entirely sure if that's what they mean). And Koen is arguing that the "merging rule" is where the specification must be exact. But, in fact, the first decision point is also critical (i.e., must be formally correct), or else protocol will break. The problem is that it is (potentially) possible for two different base instances, in two different uniqueness scopes, to have identical entity tags. In fact, vanilla HTTP/1.1 allows every resource served by a server to have exactly the same entity tag! 
(For example, a valid HTTP/1.1 server that provided a million different pages from a CD-ROM could use the CD-ROM's creation timestamp as the entity tag for each and every one of those pages.) And if the client and server don't agree on which uniqueness scope an entity tag is drawn from, they also would not realize that they could disagree on what the associated instance is. So the client MUST NOT, under any circumstances, tell the server that it wants a delta using an entity tag that isn't in the right uniqueness scope - there is no way for the checks at decision points 2 or 3 to fix a mistake at point 1. I would agree that the third step also needs to be precisely specified, at least to the extent that at least one defense against spoofing involves looking at the DCluster header on that response. But (as far as I can tell) this is ONLY an issue for anti-spoofing, and not for the more general problem of ensuring correct behavior even when all parties are honest. -Jeff From mogul Mon May 1 16:55:39 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA28696; Mon, 1 May 2000 16:55:39 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005012355.QAA28696@wera.pa.dec.com> To: http-delta Subject: SPOOFING In-reply-to: Your message of "Sat, 29 Apr 2000 22:16:14 +0200." <200004292016.WAA08791@wsooti09.win.tue.nl> Date: Mon, 01 May 2000 16:55:39 -0700 X-Mts: smtp Koen Holtman writes: > (3) The cache entry for URL2 includes a DCluster header > field, and at least one of the uri-prefix values in > that field is a prefix of URL1, and (to protect against > the spoofing attack described in section 14.1) > at least one of these conditions holds: > (a) The host part (and port, if specified) of URL1 > and URL2 are identical. > (b) Condition (2) above also holds. > (c) The client intends to reject any delta response > without a secure means to detect spoofing, such > as an instance digest.
> (d) The client implementation has been explicitly > configured to disable protection against spoofing. I am really uncomfortable with the (a)-(d) list above because it greatly expands the number of cases one needs to consider to determine if the spoofing protection is watertight. As I said above, I would like the protection to be centralised in the merging rule part of the spec; this reduces the cases above to only case (c), so that everything from 'and (to protect against...' can be deleted. I understand your desire to centralize the anti-spoofing rules, as a matter of making the spec simpler to verify. But I think that this leads to excessively restrictive anti-spoofing rules, because I think that a client that follows either rule 3(a) or rule 3(b) [which means "the client already has a cache entry for the Request-URI that includes DCluster header covering URL2"] is safe against spoofing. We could debate that assertion (e.g., you could find a counter example). But if it is a true assertion, then I'm not sure why we should limit the implementors' options more than necessary. Frankly, it's not a big deal to me either way. I suspect that whatever we put into the spec, implementors might ignore the "official" security requirements and do whatever they think is "secure enough", as is too often the case with Web security. I'd rather analyze the options up front, rather than trying to limit the analysis to just a few of the choices, since then (if one of 3(a) or 3(b) proves faulty) we could at least warn implementors that this has been analyzed and shown not to work. 
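As a rough illustration of condition (3) and its sub-conditions 3(a)-3(d) quoted above, here is a minimal Python sketch. It is not taken from the draft: the names CacheEntry, covers, same_host_and_port and may_offer_etag are hypothetical, DCluster uri-prefix values are assumed to be absolute URI strings compared by plain string prefix, and 3(b) is modelled on the paraphrase given above (the cache entry for the Request-URI carries a DCluster header covering URL2).

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlsplit


@dataclass
class CacheEntry:
    """Hypothetical cache entry; only the fields the check needs."""
    url: str
    etag: str
    dcluster_prefixes: List[str] = field(default_factory=list)


def covers(entry: CacheEntry, url: str) -> bool:
    # Assumption: a uri-prefix "covers" a URL when it is a string prefix of it.
    return any(url.startswith(prefix) for prefix in entry.dcluster_prefixes)


def same_host_and_port(url_a: str, url_b: str) -> bool:
    # Simplification: an unspecified port only matches an unspecified port.
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.hostname, a.port) == (b.hostname, b.port)


def may_offer_etag(url1: str,
                   entry2: CacheEntry,
                   entry1: Optional[CacheEntry] = None,
                   requires_digest: bool = False,
                   protection_disabled: bool = False) -> bool:
    """True if the etag of entry2 (for URL2) may be sent in a request for url1."""
    if not covers(entry2, url1):
        return False  # URL2's DCluster header does not cover URL1 at all
    return (
        same_host_and_port(url1, entry2.url)                       # 3(a)
        or (entry1 is not None and covers(entry1, entry2.url))     # 3(b)
        or requires_digest                                         # 3(c)
        or protection_disabled                                     # 3(d)
    )
```

In this reading, an entry served by malicious.org that claims to cover victim.org URLs is rejected unless one of the four escape conditions applies, which is the case analysis being debated in this message.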
-Jeff From mogul Tue May 2 17:41:56 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id RAA30436; Tue, 2 May 2000 17:41:56 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005030041.RAA30436@wera.pa.dec.com> To: http-delta Subject: DTemplate & DCluster Date: Tue, 02 May 2000 17:41:56 -0700 X-Mts: smtp danielh@crosslink.net wrote (in a message to me, not to the list): > Assuming that a client is about to make a request for a > delta-encoded response for a given Request-URI URL1, the > request MAY include the entity tag from a cache entry for > URL2 if the cache entry for URL1 does not contain a > DTemplate header (section YYY) specifying a resource other > than URL2, and if at least one of the following conditions holds: Are you saying that inclusion of a DTemplate FORCES the client to use the DTemplate's "base instance" instead of other instances in the uniqueness scope, such as URL2 may be? Or am I reading the above sentence incorrectly? You're reading the sentence right; the spec for DTemplate (in effect) changes the way that a client uses DCluster. Or rather, the meaning of DCluster stays the same (it defines the uniqueness scope), but if the client implements DTemplate, then it doesn't use the uniqueness scope as a source of a list of entity tags; it uses it as a source of a list of DTemplate values, and then "indirects" through this list to get a list of entity tags. However, after re-reading that, I realized that I had been too lazy about being precise. So I replaced that paragraph with a simpler one: Assuming that a client is about to make a request for a delta-encoded response for a given Request-URI URL1, the request MAY include the entity tag from a cache entry for URL2 if at least one of the following conditions holds: And then added this new paragraph, later in the section: If the client supports the OPTIONAL DTemplate header (section YYY), a modified rule applies.
As the client chooses the set of cache entries from which entity tags are acceptable according to the matching rules listed above in this section, it constructs a set of the DTemplate header field values found in those acceptable entries. If the set is non-empty, then the client SHOULD ignore the entity tags chosen according to the rules above, and instead it lists the entity tags for any cache entries for the URIs specified by the set of DTemplate header field values. If no such cache entries are found, the client MAY request the resource specified by one of the DTemplate header field values, then use the entity tag for the response in its delta-eligible request for URL1. Is that clear (and does it seem right?) It's still a little dense. I'm trying hard to specify the necessary behavior, not the implementation behind it, but that leads to some abstraction. -Jeff From danielh@crosslink.net Tue May 2 21:35:30 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id VAA30296; Tue, 2 May 2000 21:35:30 -0700 (PDT) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA20266; Tue, 2 May 2000 21:35:29 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id VAA26333 for ; Tue, 2 May 2000 21:35:29 -0700 (PDT) Received: from smtp.crosslink.net (dyn45.c5200-2.springfield.236.crosslink.net [207.199.142.174]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id AAA03580 for ; Wed, 3 May 2000 00:35:26 -0400 Message-Id: <200005030435.AAA03580@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Wed, 03 May 2000 00:34:18 -0300 To: http-delta@pa.dec.com In-Reply-To: <200005030041.RAA30436@wera.pa.dec.com> Subject: Re: DTemplate & DCluster X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 In <200005030041.RAA30436@wera.pa.dec.com>, on 05/02/00 at 05:41 PM, Jeffrey 
Mogul said: >danielh@crosslink.net wrote (in a message to me, not to the list): > > Assuming that a client is about to make a request for a > > delta-encoded response for a given Request-URI URL1, the > > request MAY include the entity tag from a cache entry for > > URL2 if the cache entry for URL1 does not contain a > > DTemplate header (section YYY) specifying a resource other > > that URL2, and if at least one of the following conditions hold: > > Are you saying that inclusion of a DTemplate FORCES the client to > use the DTemplate's "base instance" instead of other instances in > the uniqueness scope, such as URL.2 may be? Or am I reading the > above sentence incorrectly? >You're reading the sentence right; the spec for DTemplate (in effect) >changes the way that a client uses DCluster. Or rather, the meaning of >DCluster stays the same (it defines the uniqueness scope), but if the >client implements DTemplate, then it doesn't use the uniqueness scope as >a source of a list of entity tags; it uses it as a source a list of of >DTemplate values, and then "indirects" through this list to get a list of >entity tags. That's kind of a shock -- I had no sense of this from my prior readings of the delta spec! >However, after re-reading that, I realized that I had been too lazy about >being precise. >So I replaced that paragraph with a simpler one: > > Assuming that a client is about to make a request for a > delta-encoded response for a given Request-URI URL1, the > request MAY include the entity tag from a cache entry for > URL2 if at least one of the following conditions hold: > >And then added this new paragraph, later in the section: I'm still a bit unclear... > If the client supports the OPTIONAL DTemplate header > (section YYY), a modified rule applies. 
As the client > chooses the set of cache entries from which entity tags > are acceptable according to the matching rules listed which means: the client looks at each of its cache entries, and determines which entries are part of the uniqueness scope of URL1 -- for example, which ones have DCluster information that matches URL1. > above in this section, it constructs a set of the > DTemplate header field values found in those acceptable > entries. Allow me to think out loud ... In a sense, DTemplate is the opposite of DCluster. A DCluster says "in subsequent requests, this instance can be used as a base instance for these URIs". A DTemplate says "this URI is a good candidate for use as a base-instance on future requests to the request-URI you just asked for". It's a little bit odd -- why bother telling the client to go somewhere else, when you just sent her what is probably a perfectly good base instance? In most cases, the answer has to do with efficiency (possibly the DTemplate's base instance is easier to compute deltas against), or more likely permanence (the server will probably retain the DTemplate's base instance, but probably not the instance that was just sent to the client). So -- for a cached entry to contain a DTemplate does not mean "you can use me for some other request-URIs", it means "go here for another base-instance for me". Perhaps what should be said is that if any of the cached instances of URL1 contains a DTemplate entry, then only the instances pointed to by DTemplates, contained in URL1 cached instances, should be used. Other cached entries, that are not for URL1, should not be used -- EVEN if they contain DCluster information that puts them in URL1's uniqueness scope. This actually simplifies implementation -- the set of cached entries to be checked is much shorter (just those "starting at" the same request-URI). But maybe I still don't have it quite right??
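For comparison with the reading above, the indirection as worded in the proposed DTemplate paragraph might be sketched roughly as follows in Python. This is only an illustration under assumptions: CacheEntry and etags_via_dtemplate are hypothetical names, each cache entry is assumed to carry at most one DTemplate URI, and the "acceptable" entries are assumed to have been selected already by the matching rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CacheEntry:
    """Hypothetical cache entry; dtemplate is the DTemplate URI, if any."""
    url: str
    etag: str
    dtemplate: Optional[str] = None


def etags_via_dtemplate(acceptable: List[CacheEntry],
                        cache: Dict[str, CacheEntry]) -> Tuple[List[str], List[str]]:
    """Return (entity tags to list, DTemplate URIs not yet cached).

    acceptable: entries already chosen by the matching rules.
    cache: all cache entries, keyed by URL.
    """
    # Collect the set of DTemplate values found in the acceptable entries.
    templates = sorted({e.dtemplate for e in acceptable if e.dtemplate})
    if not templates:
        # Empty set: fall back to the entity tags chosen by the ordinary rules.
        return [e.etag for e in acceptable], []
    # Non-empty set: ignore those tags and indirect through the templates.
    etags = [cache[u].etag for u in templates if u in cache]
    # If etags is empty, the caller MAY fetch one of these and use its etag.
    to_fetch = [u for u in templates if u not in cache]
    return etags, to_fetch
```

The narrower alternative suggested above (consult only DTemplate values found in cached instances of URL1 itself) would amount to filtering `acceptable` down to entries whose url equals URL1 before calling this function.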
> If the set is non-empty, then the client > SHOULD ignore the entity tags chosen according to the > rules above, and instead it lists the entity tags > for any cache entries for the URIs specified by the set > of DTemplate header field values. If no such cache > entries are found, the client MAY request the resource > specified by one of the DTemplate header field values, > then use the entity tag for the response in its > delta-eligible request for URL1. >Is that clear (and does it seem right?) It's still a little dense. I'm >trying hard to specify the necessary behavior, not the implementation >behind it, but that leads to some abstraction. >-Jeff ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul Wed May 3 18:52:33 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA05792; Wed, 3 May 2000 18:52:33 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005040152.SAA05792@wera.pa.dec.com> To: http-delta Subject: Proposal: splitting the Delta document into two Date: Wed, 03 May 2000 18:52:33 -0700 X-Mts: smtp After spending the last week or so trying to figure out the details related to DCluster and DTemplate, it's dawned on me that it might make more sense to split this into two separate documents. One would specify the basic HTTP Delta mechanism, without any mention of clusters, templates, or uniqueness scopes. The other would extend that specification to add clusters, templates, or uniqueness scopes. This would give the following advantages: (1) simplify the presentation of both parts (2) decouple the basic delta mechanism (which is relatively well understood) from the more esoteric mechanisms (which are justified by research results, but which have not been (widely?) implemented). (3) isolate the debate about security issues, all of which seem to be associated with DCluster. 
We still have a few issues related to the basic delta spec, but most are connected to the cluster/templates parts. I've made an initial stab at the separation; it seems easy enough. Would anyone who is listed as an author of the basic Delta specification: Network Working Group Jeffrey Mogul, Compaq WRL, Internet-Draft Balachander Krishnamurthy, AT&T, Expires: 25 September 2000 Fred Douglis, AT&T, Anja Feldmann, Univ. of Saarbruecken, Yaron Goland, Arthur van Hoff, Marimba, Daniel Hellerstein, ERS/USDA like to be REMOVED as an author of the cluster/template mechanism? Would anyone else (Koen?) like to be added to the latter? Thanks, -Jeff From danielh@crosslink.net Thu May 4 07:42:40 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id HAA18668; Thu, 4 May 2000 07:42:40 -0700 (PDT) From: Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA05676; Thu, 4 May 2000 07:42:40 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id HAA28025 for ; Thu, 4 May 2000 07:42:39 -0700 (PDT) Received: from danielh ([151.121.64.82]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id KAA14124 for ; Thu, 4 May 2000 10:42:38 -0400 Message-Id: <200005041442.KAA14124@lycanthrope.crosslink.net> X-Really-To: Date: Thu, 04 May 2000 10:40:40 -0300 To: http-delta@pa.dec.com Subject: 2 parts X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 >After spending the last week or so trying to figure out the >details related to DCluster and DTemplate, it's dawned on >me that it might make more sense to split this into two >separate documents. So the idea is that a base implementation of delta would only use a request-URI's "own" instance. An advanced implementation would allow for the various ways of extending the uniqueness scope (Dtemplate, Dcluster, Base-Instances). 
My feeling is why not --- after all, it doubles the number of "pubs" :] >This would give the following advantages: > (1) simplify the presentation of both parts > (2) decouple the basic delta mechanism (which is > relatively well understood) from the more esoteric > mechanisms (which are justified by research results, > but which have not been (widely?) implemented). > (3) isolate the debate about security issues, all > of which seem to be associated with DCluster. >We still have a few issues related to the basic delta spec, >but most are connected to the cluster/templates parts. But I don't think this solves the Dcluster spoofing problem -- an "advanced" client (that understands Dcluster) can still be fooled into using malicious.org's "foo" instance for victim.org's "foo" instance, even if victim.org only implements "basic delta". That is, if extended uniqueness is ever allowed, then some way of identifying the provenance of a base-instance is still required. ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul@pa.dec.com Thu May 4 15:51:51 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA04906; Thu, 4 May 2000 15:51:51 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA14386; Thu, 4 May 2000 15:51:51 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA30715; Thu, 4 May 2000 15:51:51 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005042251.PAA30715@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: 2 parts In-Reply-To: Your message of "Thu, 04 May 2000 10:40:40 -0300."
<200005041442.KAA14124@lycanthrope.crosslink.net> Date: Thu, 04 May 2000 15:51:51 -0700 X-Mts: smtp danielh@crosslink.net writes: >This would give the following advantages: > (1) simplify the presentation of both parts > (2) decouple the basic delta mechanism (which is > relatively well understood) from the more esoteric > mechanisms (which are justified by research results, > but which have not been (widely?) implemented). > (3) isolate the debate about security issues, all > of which seem to be associated with DCluster. >We still have a few issues related to the basic delta spec, >but most are connected to the cluster/templates parts. But I don't think this solves the Dcluster spoofing problem -- an "advanced" client (that understands Dcluster) can still be fooled into using malicious.org's "foo" instance for victim.org's "foo" instance, even if victim.org only implements "basic delta". That is, if extended uniquness is ever allowed, then some way of identifying the provenence of a base-instance is still required. Separating the draft into two documents isn't intended to SOLVE the spoofing problem. It's only intended to remove that problem, and the complexities of solving it, from the document that specifies the basic delta mechanism. It does have the effect of limiting the possible solutions of the spoofing problem to those that only involve implementations that support DCluster and/or DTemplate. I.e., we should not add extra work for implementations that do not support either of those mechanisms. I believe we have always assumed that to be the case, this just makes it explicit. 
-Jeff From danielh@crosslink.net Thu May 4 20:04:32 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id UAA00408; Thu, 4 May 2000 20:04:32 -0700 (PDT) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA28024; Thu, 4 May 2000 20:04:31 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id UAA26478 for ; Thu, 4 May 2000 20:04:31 -0700 (PDT) Received: from smtp.crosslink.net (dyn59.c5200-1.springfield.236.crosslink.net [207.199.142.60]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id XAA29312 for ; Thu, 4 May 2000 23:04:29 -0400 Message-Id: <200005050304.XAA29312@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Thu, 04 May 2000 22:59:56 -0400 To: http-delta@pa.dec.com In-Reply-To: <200005042251.PAA30715@wera.pa.dec.com> Subject: Re: 2 parts X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 > But I don't think this solves the Dcluster spoofing problem -- an > "advanced" client (that understands Dcluster) can still be fooled > into using malicious.org's "foo" instance for victim.org's "foo" > instance, even if victim.org only implements "basic delta". > That is, if extended uniquness is ever allowed, then some way of > identifying the provenence of a base-instance is still required. >Separating the draft into two documents isn't intended to SOLVE the >spoofing problem. It's only intended to remove that problem, and the >complexities of solving it, from the document that specifies the basic >delta mechanism. >It does have the effect of limiting the possible solutions >of the spoofing problem to those that only involve implementations that >support DCluster and/or DTemplate. I.e., we should >not add extra work for implementations that do not support >either of those mechanisms. 
I believe we have always assumed that to be >the case, this just makes it explicit. If both parties are simple (no dcluster, no dtemplate) that's true; but if the client is "advanced" and the server is "simple", then spoofing can still occur. All I'm saying is that two docs may be a good idea (I've no objection), but the "basic" document will have to deal with spoofing in some way (since basic implementations will coexist with advanced implementations). Or, "advanced" implementations will have to be able to tell a server that this is an "advanced request", which a basic server can ignore (or can respond with an "I'm simple, so send me a simple request") ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From douglis@research.att.com Mon May 8 12:27:48 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id MAA27340; Mon, 8 May 2000 12:27:48 -0700 (PDT) Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA09823; Mon, 8 May 2000 12:27:46 -0700 Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id MAA12691; Mon, 8 May 2000 12:27:45 -0700 (PDT) Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26]) by mail-blue.research.att.com (Postfix) with ESMTP id DE55E4CE06; Mon, 8 May 2000 15:26:54 -0400 (EDT) Received: from windsor.research.att.com (windsor.research.att.com [135.207.26.46]) by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id PAA18736; Mon, 8 May 2000 15:26:53 -0400 (EDT) Received: from windsor.research.att.com (localhost [127.0.0.1]) by windsor.research.att.com (8.8.8+Sun/8.8.5) with ESMTP id PAA15595; Mon, 8 May 2000 15:25:34 -0400 (EDT) Message-Id: <200005081925.PAA15595@windsor.research.att.com> X-Mailer: exmh version
2.1.1 10/15/1999 X-Exmh-Isig-Comptype: repl X-Exmh-Isig-Folder: delta From: Fred Douglis To: Jeffrey Mogul Cc: http-delta@pa.dec.com Subject: Re: Proposal: splitting the Delta document into two In-Reply-To: Your message of "Wed, 03 May 2000 18:52:33 PDT." <200005040152.SAA05792@wera.pa.dec.com> X-Uri: http://www.research.att.com/~douglis/ Mime-Version: 1.0 Content-Type: text/plain Comments: Hyperbole mail buttons accepted, v3.13. Date: Mon, 08 May 2000 15:25:33 -0400 Sender: douglis@research.att.com As you know, I was involved with the early work and have pretty much sat on the sidelines ever since. So, I think that if you want to split it, that's great, but I suspect that I and anyone else who was involved earlier on but had nothing to do with the more recent stuff shouldn't be named as an author -- or at least should read the new doc, which I haven't yet :-). I suspect if I had time to read it, I'd be very interested in the clustering work, but unfortunately I can't look at it for at least another 5-6 weeks due to a very high OSDI reviewing load. Fred From mogul@pa.dec.com Mon May 8 18:44:16 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA25071; Mon, 8 May 2000 18:44:16 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA26954; Mon, 8 May 2000 18:44:16 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA19441; Mon, 8 May 2000 18:44:15 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005090144.SAA19441@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: 2 parts In-Reply-To: Your message of "Thu, 04 May 2000 22:59:56 EDT." <200005050304.XAA29312@lycanthrope.crosslink.net> Date: Mon, 08 May 2000 18:44:15 -0700 X-Mts: smtp wrote: >Separating the draft into two documents isn't intended to SOLVE the >spoofing problem. 
It's only intended to remove that problem, and the >complexities of solving it, from the document that specifies the basic >delta mechanism. >It does have the effect of limiting the possible solutions >of the spoofing problem to those that only involve implementations that >support DCluster and/or DTemplate. I.e., we should >not add extra work for implementations that do not support >either of those mechanisms. I believe we have always assumed that to be >the case, this just makes it explicit. If both parties are simple (no dcluster, no dtemplate) that's true; but if the client is "advanced" and the server is "simple", then spoofing can still occur. All I'm saying is that two docs may be a good idea (I've no objection), but the "basic" document will have to deal with spoofing in some way (since basic implementations will coexists with advanced implementation). This doesn't work. The basic Delta spec should not require implementations to do something specific that is meant to prevent Dcluster-spoofing, if these implementations are otherwise ignorant of Dcluster. I mean, it clearly would not work to require a server that does not support DCluster to send a Dcluster header in its responses! At least, not without a major redefinition of what a Dcluster header means, and I think this would become very confusing. So we need to find a solution to the Dcluster-spoofing problem that does not depend on any Dcluster-specific behavior from non-Dcluster-supporting implementations. This should have been obvious from the start, but by separating the documents, we can now make this explicit. 
Or, "advanced" implementations will have to be able to tell a server that this is an "advanced request", which a basic server can ignore (or can respond with a "i'm simple, so send me a simple request") I'm against this approach for three reasons: (1) increased specification complexity for the basic document (2) increased protocol overhead for the Dcluster mechanism (3) not at all clear to me how this would be implemented. My suggestion: let's give the Dcluster/Dtemplate stuff a rest until we've finished a complete draft for the basic Delta stuff (and because a lot of people are telling me privately that they are bored with this debate!) If it really does turn out that we need to do something to the basic delta spec to prevent spoofing (and that includes a decision that Dcluster is worth doing in the first place), the IETF process gives us plenty of opportunities to add some requirements to the basic Delta spec. -Jeff From mogul Fri May 12 17:49:19 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id RAA16730; Fri, 12 May 2000 17:49:19 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005130049.RAA16730@wera.pa.dec.com> To: http-delta Subject: new draft of Delta encoding spec available (finally) Date: Fri, 12 May 2000 17:49:19 -0700 X-Mts: smtp I've finished another revised draft of the Delta encoding spec. It's available as: ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-delta-04.12may2000.txt Changes in this version: (1) I removed all the stuff about uniqueness scopes, DCluster, and DTemplate support. This is now in a separate document. (2) I added sections 5.5 (Guaranteeing cache safety) and 10.8.2 (IM directive), to resolve the CACHE-SAFETY issue (I hope). (3) The DELTA+IF-RANGE issue turned out to be already covered in the spec, more or less - I just forgot that I had covered it.
I did add one related tweak to the specification: If a request includes an A-IM header field that lists the "range" instance-manipulation prior to any delta-coding(s), and the request also includes an If-Range header that lists the entity tag of the current instance, the server SHOULD ignore the delta-coding(s). Otherwise, the meaning of that A-IM header is very hard to define. (4) I added section 10.3 (Basic requirements for delta-encoded responses). I think I've resolved all of the editorial issues, too. There are still a few places that could use some review. For example, are there any other "basic requirements" for section 10.3? Are there any known "security considerations" for the basic delta document (all of the known security issues were related to DCluster/DTemplate)? Otherwise, I *think* this version is ready to go to the IETF for publication as an Internet-Draft. I'd like to do this on or before Thursday, May 18, since I'll be travelling May 19-30, unless we find any significant new issues. -Jeff From mogul Tue May 16 17:08:19 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id RAA09674; Tue, 16 May 2000 17:08:18 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005170008.RAA09674@wera.pa.dec.com> To: http-delta Subject: Bug & fix: Deltas & content-codings Date: Tue, 16 May 2000 17:08:18 -0700 X-Mts: smtp danielh@crosslink.net found this bug in the Delta spec: 10.7, pg 34 I'm not sure if this makes sense.. 2. If the new (delta) response and the cached response have a different set of content-codings, the client decodes the content-codings from both the delta response and the cached response, before applying the delta. How do you content-decode a "delta response" -- you first have to generate its instance (which requires differencing against the base instance). In retrospect, I can't imagine what I was thinking when I wrote that.
I went through a case analysis, and Daniel and I decided that we need to adopt the principle that a client should never be required to *apply* (as opposed to decode) a content-coding simply to extract a delta-coding. The spec needs to change in two ways: (1) specify some restrictions on what the server can send (to avoid requiring a client to content-encode), and (2) fix the requirements on clients, which now can be a lot simpler. Result: ========= 10.7 Rules for deltas in the presence of content-codings The use of delta encoding with content-encoded instances adds some slight complexity. When a client (perhaps a proxy) has received a delta encoded response, either or both of that new response and a cached previous response may have non-identity content-codings. We specify rules for the server and client, to prevent situations where the client is unable to make sense of the server's response. 10.7.1 Rules for generating deltas in the presence of content-codings When a server generates a delta-encoded response, the list of content-codings the server uses (i.e., the value of the response's Content-Encoding header field) SHOULD be a prefix of the list of content-codings the server would have used had it not generated a delta encoding. This requirement allows a client receiving a delta-encoded response to apply the delta to a cached base instance without having to apply any content-codings during the process (although the client might, of course, be required to decode some content-codings). 10.7.2 Rules for applying deltas in the presence of content-codings When a client receives a delta response with one or more non-identity content codings: 1. If both the new (delta) response and the cached response (instance) have exactly the same set of content-codings, the client applies the delta response to the cached response without removing the content-codings from either response. 2. 
If the new (delta) response and the cached response have a different set of content-codings, before applying the delta the client decodes one or more content-codings from the cached response, until the result has the same set of content-codings as the delta response. 3. If a proxy or cache is forwarding the result of applying the delta response to a cached base instance response, or later forwards this result from a cache entry, the forwarded response MUST carry the same Content-Encoding header field as the new (delta) response (and so it must be content-encoded as indicated by that header field). The intent of these rules (and in particular, rule #3) is that the results are always consistent with the rule that the entity tag is associated with the result of the content-coding, and that any recipient after the application of the delta-coding receives exactly the same response it would have received as a status-200 response from the origin server (without any delta-coding). ========= -Jeff From danielh@crosslink.net Tue May 16 21:09:10 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id VAA03608; Tue, 16 May 2000 21:09:10 -0700 (PDT) From: Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA23174; Tue, 16 May 2000 21:09:09 -0700 Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id VAA04753 for ; Tue, 16 May 2000 21:09:09 -0700 (PDT) Received: from smtp.crosslink.net (dyn24.c5200-1.springfield.236.crosslink.net [207.199.142.25]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id AAA11210 for ; Wed, 17 May 2000 00:09:06 -0400 Message-Id: <200005170409.AAA11210@lycanthrope.crosslink.net> X-Really-To: Reply-To: danielh@crosslink.net Date: Wed, 17 May 2000 00:03:14 -0300 To: http-delta@pa.dec.com In-Reply-To: <200005170008.RAA09674@wera.pa.dec.com> Subject: Re: Bug & fix: Deltas & 
content-codings X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.05 c05 Jeff proposed: The spec needs to change in two ways: (1) specify some restrictions on >what the server can send (to avoid requiring a client to content-encode), >and (2) fix the requirements on clients, which now can be a lot simpler. ....... >10.7.1 Rules for generating deltas in the presence of content-codings > When a server generates a delta-encoded response, the list of > content-codings the server uses (i.e., the value of the response's > Content-Encoding header field) SHOULD be a prefix of the list of > content-codings the server would have used had it not generated a > delta encoding. And since any content-encoding will be used only if an appropriate accept-encoding was received from the client, the server will know that the client can decode the instance. >.... > 2. If the new (delta) response and the cached response have a > different set of content-codings, before applying the > delta the client decodes one or more content-codings from > the cached response, until the result has the same set of > content-codings as the delta response. This is where the prefix rule comes into play. > 3. If a proxy or cache is forwarding the result of applying > the delta response to a cached base instance response, or > later forwards this result from a cache entry, the > forwarded response MUST carry the same Content-Encoding > header field as the new (delta) response (and so it must > be content-encoded as indicated by that header field). > The intent of these rules (and in particular, rule #3) is that the > results are always consistent with the rule that the entity tag is > associated with the result of the content-coding, and that any > recipient after the application of the delta-coding receives exactly > the same response it would have received as a status-200 response > from the origin server (without any delta-coding). Conclusion: looks good!
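The prefix rule (10.7.1) and the decoding step in rule 2 of 10.7.2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the cached response is assumed to carry the full list of content-codings the server would have used without a delta, Content-Encoding values are assumed to be already parsed into lists in the order the codings were applied, and the names is_prefix, prepare_base, and DECODERS are hypothetical, not taken from the draft.

```python
import gzip
import zlib

# Hypothetical decoder table; the HTTP "deflate" coding is the zlib format.
DECODERS = {"gzip": gzip.decompress, "deflate": zlib.decompress}


def is_prefix(delta_codings, full_codings):
    """10.7.1: the delta response's codings should be a prefix of the full list."""
    return list(full_codings[:len(delta_codings)]) == list(delta_codings)


def prepare_base(cached_body, cached_codings, delta_codings):
    """10.7.2 rules 1 and 2: decode (never encode) the cached response until
    it carries the same content-codings as the delta response."""
    if not is_prefix(delta_codings, cached_codings):
        raise ValueError("delta codings are not a prefix of the cached codings")
    body, codings = cached_body, list(cached_codings)
    while len(codings) > len(delta_codings):
        # Codings are listed in the order applied, so the last is outermost.
        outermost = codings.pop()
        body = DECODERS[outermost](body)
    return body, codings
```

If the two lists are already identical (rule 1), the loop does nothing and the cached body is returned unchanged, so the delta is applied to the still-encoded base.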
----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul Thu May 18 17:03:33 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id RAA22279; Thu, 18 May 2000 17:03:33 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005190003.RAA22279@wera.pa.dec.com> To: http-delta Subject: Bug & Fix(??): server's consistent use of IMs Date: Thu, 18 May 2000 17:03:32 -0700 X-Mts: smtp 10.5.3 If a response uses more than one instance-manipulation, the instance-manipulations MUST be applied in the order in which they appear in the A-IM request-header field. I was going to say that we could add a sentence: However, the server may choose to use only a subset of the listed A-IM manipulations, so long as they are applied in the order listed in the A-IM request header. But is this true -- suppose we have A-IM: diff,gzip,range say, because the client wants just the range of a prior "diff,gzip'ed" response. If the server chose to use IM: diff,range the result probably is NOT helpful to the client. I'm not sure what this implies; that a trailing range means "don't use range unless you use all the preceding manipulations"???? Upon analysis, I think we've decided that this particular case isn't a disaster. However, during this analysis, we realized that there is a problem if the server isn't consistent about what instance-manipulations it applies prior to computing a delta. Here's a proposed solution (inserted in section 10.5.3 just before the Examples): ===== The server's choice about whether to apply an instance-manipulation SHOULD be independent of its choice to apply any subsequently-applied two-input instance-manipulations to the response. (Two-input instance-manipulations include delta-codings, because they take two different values as input. Compression and "range" instance-manipulations take only one input.
Other instance-manipulations may be defined in the future.) Note: the intent of this requirement is to prevent the server from generating a delta-encoded response that the client can only decode by first applying an instance-manipulation encoding to its cached base instance. A server implementor might wish to consider what the client would logically have in its cache, when deciding which instance-manipulations to apply prior to a delta-coding. ===== Daniel isn't entirely happy with the phrasing, but I needed to put something down in writing and we basically agree on the intent. -Jeff From mogul Thu May 18 17:06:39 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id RAA21050; Thu, 18 May 2000 17:06:39 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005190006.RAA21050@wera.pa.dec.com> To: http-delta Subject: draft-mogul-http-delta-04.txt submitted to the IETF Date: Thu, 18 May 2000 17:06:39 -0700 X-Mts: smtp Since I'm about to go out of town for a few weeks, and we seem to have reached a relatively stable draft (of course, the last time we had apparent stability, it was an illusion), I've sent the latest draft to the IETF. You can see *approximately* what I sent to the IETF at ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-delta-04.18may2000.txt Comments are welcome, but I won't be able to read them (let alone reply) until about June 1. 
-Jeff From mogul Wed May 31 13:07:24 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA04111; Wed, 31 May 2000 13:07:24 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200005312007.NAA04111@wera.pa.dec.com> To: http-delta Subject: Latest HTTP Delta draft now available from the IETF Date: Wed, 31 May 2000 13:07:24 -0700 X-Mts: smtp From: Internet-Drafts@ietf.org Message-ID: <200005221036.GAA28644@ietf.org> Subject: I-D ACTION:draft-mogul-http-delta-04.txt Date: Mon, 22 May 2000 06:36:48 -0400 Mime-Version: 1.0 Content-Type: Multipart/Mixed; Boundary="NextPart" To: IETF-Announce@isi.edu --NextPart A New Internet-Draft is available from the on-line Internet-Drafts directories. Title : Delta encoding in HTTP Author(s) : J. Mogul, B. Krishnamurthy, F. Douglis, A. Feldmann, Y. Goland, A. van Hoff, D. Hellerstein Filename : draft-mogul-http-delta-04.txt Pages : 45 Date : 19-May-00 Many HTTP requests cause the retrieval of slightly modified instances of resources for which the client already has a cache entry. Research has shown that such modifying updates are frequent, and that the modifications are typically much smaller than the actual entity. In such cases, HTTP would make more efficient use of network bandwidth if it could transfer a minimal description of the changes, rather than the entire new instance of the resource. This is called 'delta encoding.' This document describes how delta encoding can be supported as a compatible extension to HTTP/1.1. A URL for this Internet-Draft is: http://www.ietf.org/internet-drafts/draft-mogul-http-delta-04.txt Internet-Drafts are also available by anonymous FTP. Login with the username "anonymous" and a password of your e-mail address. After logging in, type "cd internet-drafts" and then "get draft-mogul-http-delta-04.txt". 
A list of Internet-Drafts directories can be found in http://www.ietf.org/shadow.html or ftp://ftp.ietf.org/ietf/1shadow-sites.txt Internet-Drafts can also be obtained by e-mail. Send a message to: mailserv@ietf.org. In the body type: "FILE /internet-drafts/draft-mogul-http-delta-04.txt". NOTE: The mail server at ietf.org can return the document in MIME-encoded form by using the "mpack" utility. To use this feature, insert the command "ENCODING mime" before the "FILE" command. To decode the response(s), you will need "munpack" or a MIME-compliant mail reader. Different MIME-compliant mail readers exhibit different behavior, especially when dealing with "multipart" MIME messages (i.e. documents which have been split up into multiple messages), so check your local documentation on how to manipulate these messages. Below is the data which will enable a MIME compliant mail reader implementation to automatically retrieve the ASCII version of the Internet-Draft. --NextPart Content-Type: Multipart/Alternative; Boundary="OtherAccess" --OtherAccess Content-Type: Message/External-body; access-type="mail-server"; server="mailserv@ietf.org" Content-Type: text/plain Content-ID: <20000519111728.I-D@ietf.org> ENCODING mime FILE /internet-drafts/draft-mogul-http-delta-04.txt --OtherAccess Content-Type: Message/External-body; name="draft-mogul-http-delta-04.txt"; site="ftp.ietf.org"; access-type="anon-ftp"; directory="internet-drafts" Content-Type: text/plain Content-ID: <20000519111728.I-D@ietf.org> --OtherAccess-- --NextPart-- From mogul Wed Jun 7 13:23:17 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA01729; Wed, 7 Jun 2000 13:23:17 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200006072023.NAA01729@wera.pa.dec.com> To: http-delta Subject: Finally: http-delta mailing list log accessible on the Web Date: Wed, 07 Jun 2000 13:23:17 -0700 X-Mts: smtp I finally got my act together and set up a crude tunnel through our firewall, so 
you can now read the log of the http-delta@pa.dec.com mailing list at: ftp://ftp.digital.com/pub/DEC/WRL/mogul/http-delta-log.txt This is suboptimal in at least two ways: (1) it's one long flat text file, it's not hypertext or broken down by messages or threads or authors. (2) it's only updated about once per day, so it might be as much as a day behind the actual mailing list. I hope this is sufficient! -Jeff P.S.: Fred Douglis pointed out that the message I sent to the HTTP-WG list about the new draft: http://www.ics.uci.edu/pub/ietf/http/hypermail/2000/0130.html didn't go to the http-delta list. I didn't think that was necessary, since the HTTP-WG message summarizes stuff the rest of you already know, but you might want to look at it. From douglis@research.att.com Wed Jun 7 13:49:58 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA03984; Wed, 7 Jun 2000 13:49:58 -0700 (PDT) Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA31061; Wed, 7 Jun 2000 13:49:57 -0700 Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id NAA03378; Wed, 7 Jun 2000 13:49:57 -0700 (PDT) Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26]) by mail-blue.research.att.com (Postfix) with ESMTP id B8D6B4CE1C; Wed, 7 Jun 2000 16:49:11 -0400 (EDT) Received: from windsor.research.att.com (windsor.research.att.com [135.207.26.46]) by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id QAA14792; Wed, 7 Jun 2000 16:49:11 -0400 (EDT) Received: from windsor.research.att.com (localhost [127.0.0.1]) by windsor.research.att.com (8.8.8+Sun/8.8.5) with ESMTP id QAA20660; Wed, 7 Jun 2000 16:47:30 -0400 (EDT) Message-Id: <200006072047.QAA20660@windsor.research.att.com> X-Mailer: exmh version 2.1.1 10/15/1999 From: Fred Douglis To: Jeffrey Mogul Cc: http-delta@pa.dec.com Subject: Re: 
Finally: http-delta mailing list log accessible on the Web In-Reply-To: Your message of "Wed, 07 Jun 2000 13:23:17 PDT." <200006072023.NAA01729@wera.pa.dec.com> X-Uri: http://www.research.att.com/~douglis/ Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 07 Jun 2000 16:47:30 -0400 Sender: douglis@research.att.com > P.S.: Fred Douglis pointed out that the message I sent to the > HTTP-WG list about the new draft: > http://www.ics.uci.edu/pub/ietf/http/hypermail/2000/0130.html > didn't go to the http-delta list. I didn't think that was necessary, > since the HTTP-WG message summarizes stuff the rest of you already > know, but you might want to look at it. Just to be clear for the others: what I found interesting was not the content of the message, which as Jeff says was already known, but rather that the draft had been reannounced to that mailing list. I note that so far, neither Bala's original last call nor Jeff's recent reannouncement has actually generated any discussion in the http-wg list -- I suppose because those who care have moved to the http-delta list. (I subscribe to that list, but I was about a year behind in the messages when I came across the message in question.) Fred From mogul Tue Jun 20 10:56:01 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA23660; Tue, 20 Jun 2000 10:56:01 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200006201756.KAA23660@wera.pa.dec.com> To: http-delta Subject: Issac Goldstand's comments on Delta encoding spec Date: Tue, 20 Jun 2000 10:56:01 -0700 X-Mts: smtp With Isaac's permission, I'm resending this message to the entire mailing list [my apologies to those of you who are now seeing it for the third time :-)]. Isaac is now on the mailing list, as well. I'll send my reply either today, or (more likely) next week, when I get back from USENIX. 
A suggestion to others on the list: there are a lot of comments in this message; it might be a good idea to discuss them individually, under topic-specific Subject lines, rather than trying to deal with them all in a single message. -Jeff ------- Forwarded Message Date: Tue, 20 Jun 2000 00:15:02 +0300 (IDT) Message-Id: <200006192115.e5JLF2n08056@megila.jct.ac.il> From: Issac Goldstand Reply-To: neoi@writeme.com Organization: Jerusalem College of Technology Firstly, an apology to all those who are either getting the same thing twice, or who got it BASE64 encoded the first time. I don't know WHY it was BASE64 encoded, but I'm sending it with 8bit encoding now, just to be safe. I only recently found the I-D for delta encoding on the IETF site and wanted to raise a few issues. You'll have to excuse me for being a "newbie" to this area (i.e., involvement in IETF and related activities). I'll try not to broach "internal protocol" too much :-) I will try to keep my comments here brief, but will be more than happy to elaborate should it be requested of me. All comments are regarding document "draft-mogul-http-delta-04.txt" Firstly: regarding section 10.4.1 of the document, you write: "...The response MUST include an Etag header field giving the entity tag of the current instance, and MUST include an IM header field listing the instance manipulations that were applied to the current instance..." Why is this being defined as MUST? Can the server not be allowed to send it back without an entity tag? RFC 2616 section 13.3.4 clearly states "HTTP/1.1 origin servers...SHOULD send an entity tag validator unless it is not feasible to generate one." Therefore, I believe you should be more specific to your reasoning as to why delta encodings suddenly MUST carry them. Secondly: You mention the seeming necessity to add the Delta-Base response tag. This seems to me to be a waste. The IM response header tag, defined in section 10.5.2 seems to be specific to delta encoded responses.
Therefore, it would seem to me that rather than adding a Delta-Base response header as defined in section 10.5.1, we could save a few bytes AND registration of a new header by simply adding the Delta-Base as a parameter to the IM response header, as such: IM = "IM" ":" #(instance-manipulation [ ";" "base" "=" entity-tag ]) Since this is specific to the instance, we can possibly define multiple entity-tags for the same resource in different formats, should it ever be needed (I still have to think a bit about this, so I'm being intentionally vague) in a similar format - particularly by proxies/caches Thirdly: Also regarding the Delta-base header (I will continue to refer to it as such for the sake of simplicity, but still strongly reemphasize what I stated above about adding it as a parameter), you mention in section 10.5.1 A cache or proxy that receives a delta-encoded response that lacks a Delta-base header MAY add a Delta-Base header whose value is the entity tag given in the If-None-Match field of the request (but only if that field lists exactly one entity tag). This seems to me to be lacking forward compatibility. It is the place of the server and _not_ an intermediate proxy to assign entity-tags, and therefore, if the same server which originally generated an entity-tag for the instance (otherwise the instance would not exist within the cache) does NOT supply the entity-tag for the given response, we MUST assume that the server had a specific reason for withholding the entity-tag and therefore, SHOULD NOT (perhaps even MUST NOT) supply the client with one. Fourth: You state the following in section 10.6: "A status-226 cache entry MUST NOT be used in response to a subsequent request ... 
If any of the instance-manipulation values in the cached IM header field is a delta-coding, the cache entry does not include a Delta-Base header field, and the If-None-Match header field of the request that led to that cache entry does not match the If-None-Match header field of the subsequent request." It seems to me that we should be stopping the cache from responding as soon as EITHER the If-None-Match header fields do not match OR if the Delta-Base response header is missing. The document seems to be requiring the lack of BOTH of these conditions. Additionally, you later state (also section 10.6) "...we know of no formal specification for deciding if a cached status-206 response is consistent with a subsequent request..." yet in section 9 you clearly seem to be making an issue of having digests for both the full data response (i.e., what would come with a 200 reply) and for delta-responses (226). Could this kind of digesting not be used to formally match 206 entries? Furthermore, I stated above that it may be beneficial to replace the Delta-base header with base parameters within the IM tag. By implementing numerous tags in a similar fashion we could define an entity-tag that represents the delta between the delta base and the instance itself (i.e. what would be transmitted with a 200 response), and thus be able to cache it properly, and, more importantly, define a _unique_ identifying tag to each delta. 
Next: The last example in section 10.7.3 (which I will repeat here) says that a request by a client: GET /foo.html HTTP/1.1 Host: example.com If-none-match: "abc" Accept-encoding: gzip A-IM: diffe, gzip might return: HTTP/1.1 226 IM Used Date: Wed, 24 Dec 1997 14:01:00 GMT Delta-base: "abc" Etag: "ghi" IM: diffe, gzip with a body containing GZIP(DIFFE_DELTA(GUNZIP(GZIP(foo.html;"abc")), foo.html;"ghi")) I simply wanted to add that such a request could, and would very likely, return the following response: HTTP/1.1 226 IM Used Date: Wed, 24 Dec 1997 14:01:00 GMT Delta-base: "abc" Etag: "jkl" IM: diffe, gzip Content-Encoding: gzip with a body containing GZIP(DIFFE_DELTA(GZIP(foo.html;"abc"), GZIP(foo.html;"jkl"))) This would be because the server is interpreting the Accept-Encoding and A-IM headers as two distinct requests, so if both can be done, then it probably will. I assume that either I misunderstood something here or that you had some obscure reason for not mentioning this scenario with the examples, because if I _did_ correctly assess this, then it should be in the list of examples, as not to confuse readers into thinking that this would not happen - a situation that would certainly be applicable should the I-D become a RFC at some future point. In section 10.8.1 it is stated that "...By implication, if a client has retrieved and cached several instances of a resource, some of which are marked with ``retain'' and some not, then there is no point in caching the instances not marked with ``retain''." This seems silly, as if the server knows how to use the retain at all, it should know to send a "retain=0" to eliminate caching. It seems to me that a lack of "retain" should just indicate that normal caching rules should apply, rather than imply that the server is hinting that it should NOT be cached. 
While I'm discussing section 10.8.1, there are another two minor issues that I just wanted to point out: Firstly, you state "A client ought not use the corresponding entity tag in a future request for a delta-encoded response after that interval ends." If you're going to write this in section 10, where you are playing by RFC 2119's rules, you "ought" to define "ought" :-) Secondly, you state a bit later on that "A server SHOULD NOT send ``retain=0'' except in reply to a request that attempts to obtain a delta-encoded response." Here, it seems to me that the server should either NEVER send "retain"s unless an A-IM is present, or send "retain=0" any time it likes - INCLUDING when no A-IM is specified. I personally prefer the second line of thought. My final "formal" comment is on section 10.8.2, and is, I'm afraid, a bit vague at this point. Section 10.8.2 states "A cache that complies with the specification for the IM header, the A-IM header, and the 226 response-status code SHOULD ignore a no-store cache-directive if an im directive is present in the same response. All other implementations MUST ignore the im directive (i.e., MUST observe a no-store directive, if present)." I have not yet worked out case examples, but it seems to me that this may present problems with forward compatibility with HTTP/1.1 protocol extensions. What if another extension is suggested that runs parallel with the IM field and also has special cache control requirements. Might it not be possible that our implementation here will handicap later HTTP extension proposals? I look forward to hearing from some or all of you, and would be most interested in being kept up-to-date on this (And other relevant) issues. Also - one final question: It is painfully obvious that there are some other "supporting" internet-drafts connected with this, and I'd appreciate it if you could send me the names of related drafts so I can more completely familiarize myself with this project. 
Sincerely, Issac Goldstand - -- Internet is a wonderful mechanism for making a fool of yourself in front of a very large audience. --Anonymous Moving the mouse won't get you into trouble... Clicking it might. --Anonymous ------- End of Forwarded Message From jmacd@helen.CS.Berkeley.EDU Tue Jun 20 17:16:05 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id RAA01263; Tue, 20 Jun 2000 17:16:05 -0700 (PDT) Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA25361; Tue, 20 Jun 2000 17:16:04 -0700 Received: from helen.CS.Berkeley.EDU (helen.CS.Berkeley.EDU [128.32.131.251]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id RAA05708; Tue, 20 Jun 2000 17:16:04 -0700 (PDT) Received: (from jmacd@localhost) by helen.CS.Berkeley.EDU (8.9.1a/8.9.1) id RAA21899; Tue, 20 Jun 2000 17:12:19 -0700 (PDT) Message-Id: <20000620171219.23658@helen.CS.Berkeley.EDU> Date: Tue, 20 Jun 2000 17:12:19 -0700 From: Josh MacDonald To: Jeffrey Mogul , http-delta@pa.dec.com Subject: Delta-compression Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.89.1 This is to announce the release of a new piece of software that the http-delta readers may find interesting. I have finished my Master's thesis on the topic of delta-compressed storage and transport and recently released my software under a BSD-style license. The technical report and software are available at: http://www.cs.berkeley.edu/~jmacd/xdelta.html My software is primarily intended as a replacement for back-end delta-compressed storage using RCS. I show that transactions can improve the performance, reliability, and extensibility of this kind of application. My software is geared towards embedding in server applications that want delta-compression, but it does not implement a network protocol of its own. 
While it is not immediately applicable to the delta encoding draft (I have file format issues to sort out-- contact me for details), a prototype HTTP proxy has already been implemented using my system. A paper by Mihut Ionescu and Matthew Delco describing their HTTP proxy is available at: http://www.cs.pdx.edu/~delco/xproxy.ps.gz I would appreciate any feedback you might have regarding this work. -josh From mogul@pa.dec.com Mon Jul 3 16:41:11 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA31776; Mon, 3 Jul 2000 16:41:11 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA25351; Mon, 3 Jul 2000 16:41:10 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA30450; Mon, 3 Jul 2000 16:41:10 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200007032341.QAA30450@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: Issac Goldstand's comments on Delta encoding spec In-Reply-To: Your message of "Tue, 20 Jun 2000 10:56:01 PDT." <200006201756.KAA23660@wera.pa.dec.com> Date: Mon, 03 Jul 2000 16:41:10 -0700 X-Mts: smtp Here are my replies to Issac Goldstand's comments of 20 June 2000. (Sorry for the long delay, but I do have to keep up with my day job.) All comments are regarding document "draft-mogul-http-delta-04.txt" And all are greatly appreciated, even if I think most don't require changes to the draft. Firstly: regarding section 10.4.1 of the document, you write: "...The response MUST include an Etag header field giving the entity tag of the current instance, and MUST include an IM header field listing the instance manipulations that were applied to the current instance..." Why is this being defined as MUST? Can the server not be allowed to send it back without an entity tag? RFC 2616 section 13.3.4 clearly states "HTTP/1.1 origin servers...SHOULD send an entity tag validator unless it is not feasible to generate one." 
Therefore, I believe you should be more specific to your reasoning as to why delta encodings suddenly MUST carry them. That section of RFC2616 isn't actually the most relevant one. Instead, look at RFC2616 section 13.3.3 (Weak and Strong Validators). Basically, when you're using delta-coding or ranges, at least, you need a strong validator, and there is not much chance of this when using Last-Modified. Otherwise, you can't be sure that you are matching up the response to the proper instance. Having said that, I suppose that there is one exception: if the only instance-manipulation for the message is a compression, and it will never be used with a delta-coding or range, then you don't actually need a strong validator. But I can't think of why you would want to use a compression i-m in this case, instead of a compression content-coding, so I think this is not worth making an exception for. Secondly: You mention the seeming necessity to add the Delta-Base response tag. This seems to me to be a waste. The IM response header tag, defined in section 10.5.2 seems to be specific to delta encoded responses. Therefore, it would seem to me that rather than adding a Delta-Base response header as defined in section 10.5.1, we could save a few bytes AND registration of a new header by simply adding the Delta-Base as a parameter to the IM response header, as such: IM = "IM" ":" #(instance-manipulation [ ";" "base" "=" entity-tag ]) It's true that the Delta-Base header is somewhat of a carryover from an earlier draft of the delta encoding spec (before we had the IM header at all). I don't think there is any real overhead in "registration of a new header" (since we have to register at least a few new headers), and in some ways it's probably easier to parse this than as a parameter of the instance-manipulation. As far as wasting bytes: yes, this would save a few bytes. 
On the other hand, I'm reluctant to change the spec yet again, especially if there is even a chance that this would trigger some subtle bug that we haven't thought about. I'll take comments on this one; let's see which way the consensus goes. Since this is specific to the instance, we can possibly define multiple entity-tags for the same resource in different formats, should it ever be needed (I still have to think a bit about this, so I'm being intentionally vague) in a similar format - particularly by proxies/caches I'm not sure I understand what you are proposing here. Thirdly: Also regarding the Delta-base header (I will continue to refer to it as such for the sake of simplicity, but still strongly reemphasize what I stated above about adding it as a parameter), you mention in section 10.5.1 A cache or proxy that receives a delta-encoded response that lacks a Delta-base header MAY add a Delta-Base header whose value is the entity tag given in the If-None-Match field of the request (but only if that field lists exactly one entity tag). This seems to me to be lacking forward compatibility. It is the place of the server and _not_ an intermediate proxy to assign entity-tags, and therefore, if the same server which originally generated an entity-tag for the instance (otherwise the instance would not exist within the cache) does NOT supply the entity-tag for the given response, we MUST assume that the server had a specific reason for withholding the entity-tag and therefore, SHOULD NOT (perhaps even MUST NOT) supply the client with one. The reason that this is allowed is to resolve some issues that were discussed on the mailing list; search for "implict delta-base" in to see what the details were. I can't think of any plausible reason why the server should ever be able to withhold this information, because otherwise it would be impossible to guarantee application of the delta to the right base instance. 
So there is always a delta-base for a delta-coded response; the only question is whether it is explicit or implicit. The implicit approach saves more than a few bytes (which seems to concern you; see above!) in a fairly common case. But there are scenarios where a cached delta-coded response cannot be properly returned to a subsequent request without an explicit delta-base - hence we decided to allow a cache to convert the implicit form to an explicit form. Fourth: You state the following in section 10.6: "A status-226 cache entry MUST NOT be used in response to a subsequent request ... If any of the instance-manipulation values in the cached IM header field is a delta-coding, the cache entry does not include a Delta-Base header field, and the If-None-Match header field of the request that led to that cache entry does not match the If-None-Match header field of the subsequent request." It seems to me that we should be stopping the cache from responding as soon as EITHER the If-None-Match header fields do not match OR if the Delta-Base response header is missing. The document seems to be requiring the lack of BOTH of these conditions. This is another piece of the support for implicit delta-base values. If the cache entry does include a delta-base value E, then the entry is suitable for use in reply to a request whose If-None-Match field lists E. It's not necessary for the If-None-Match fields to match exactly, in this case. More concretely, if the original request had If-None-Match: "a", "b", "c" and the cached response had Delta-Base: "c" and the new request had If-None-Match: "c", "d", "e" then the cached response is usable with the request (other conditions being satisfied). We didn't want to require an exact match on the If-None-Match fields in this case, because it greatly reduces the probability of the cache entry being useful. 
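The check described above for a delta-coded status-226 cache entry can be sketched as below. This is a rough illustration rather than text from the draft: the function name delta_entry_usable is hypothetical, entity tags are assumed to be already parsed into lists of strings, and only the Delta-Base / If-None-Match condition of section 10.6 is modelled (the "other conditions" mentioned above are out of scope).

```python
def delta_entry_usable(entry_delta_base, original_inm, new_inm):
    """Return True if the Delta-Base / If-None-Match condition allows a cached
    delta-coded 226 entry to be used for a new request.

    entry_delta_base: entity tag stored in the entry's Delta-Base, or None
    original_inm:     entity tags in the If-None-Match of the request that
                      created the cache entry
    new_inm:          entity tags in the If-None-Match of the new request
    """
    if entry_delta_base is not None:
        # Explicit delta-base E: the new request only has to list E.
        return entry_delta_base in new_inm
    # No Delta-Base in the entry: per the quoted rule, the entry must not be
    # used unless the two If-None-Match fields match.
    return list(original_inm) == list(new_inm)


# The example from the message: Delta-Base "c" is listed by the new request.
print(delta_entry_usable('"c"', ['"a"', '"b"', '"c"'], ['"c"', '"d"', '"e"']))
```

Comparing the stored Delta-Base against the new request's list, rather than comparing the two If-None-Match fields, is what lets an entry created for one set of candidate bases serve a request that names a different but overlapping set.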
On the other hand, if the entry does NOT include a delta-base value, then it *is* necessary for the If-None-Match fields to match exactly (and, by virtue of other rules, they have to list exactly one entity-tag). Otherwise, you can't tell whether the cache entry is suitable for the new response. The rule is written in terms of "the cache entry" rather than, say, "the response [as sent by the origin server]" in order to allow the implicit delta-base to be made explicit. For example, if the original request had If-None-Match: "c" and the cached response originally had no explicit "Delta-Base" header, and the new request had If-None-Match: "c", "d", "e" then the cache could (by converting the implicit Delta-base to a real one) properly make use of the cache entry. Additionally, you later state (also section 10.6) "...we know of no formal specification for deciding if a cached status-206 response is consistent with a subsequent request..." This should probably read "no existing, published formal specification". I'll make the change. Does that make it clearer? yet in section 9 you clearly seem to be making an issue of having digests for both the full data response (i.e., what would come with a 200 reply) and for delta-responses (226). Could this kind of digesting not be used to formally match 206 entries? Probably, but 206, "Range", and "Content-Range" are already in (somewhat) widespread use, and I don't think it's feasible to retroactively tighten up their specification. (I'll take some blame for not having done this during the HTTP/1.1 design process, but the whole "entity" vs. "instance" debacle was not yet clear enough then.) Furthermore, I stated above that it may be beneficial to replace the Delta-base header with base parameters within the IM tag. By implementing numerous tags in a similar fashion we could define an entity-tag that represents the delta between the delta base and the instance itself (i.e. 
what would be transmitted with a 200 response), and thus be able to cache it properly, and, more importantly, define a _unique_ identifying tag for each delta. I think you're stumbling back into the confusion over what an "entity tag" really is. It's not surprising, given that it actually has nothing to do with the "entity" as the term is defined in RFC2616. This also relates to the confusing debate we had, in the HTTP/1.1 design process, about what a cache actually stores. I think it really only makes sense to talk about the instances (or partial instances) that a cache stores; talking about the entities (or "responses") that a cache stores seems to inevitably lead to ambiguity and confusion. Some day I plan to write a paper about this, since I think it is a fairly difficult but important (and badly misunderstood) aspect of HTTP. Next: The last example in section 10.7.3 (which I will repeat here) says that a request by a client:

    GET /foo.html HTTP/1.1
    Host: example.com
    If-none-match: "abc"
    Accept-encoding: gzip
    A-IM: diffe, gzip

might return:

    HTTP/1.1 226 IM Used
    Date: Wed, 24 Dec 1997 14:01:00 GMT
    Delta-base: "abc"
    Etag: "ghi"
    IM: diffe, gzip

with a body containing

    GZIP(DIFFE_DELTA(GUNZIP(GZIP(foo.html;"abc")), foo.html;"ghi"))

I simply wanted to add that such a request could, and would very likely, return the following response:

    HTTP/1.1 226 IM Used
    Date: Wed, 24 Dec 1997 14:01:00 GMT
    Delta-base: "abc"
    Etag: "jkl"
    IM: diffe, gzip
    Content-Encoding: gzip

with a body containing

    GZIP(DIFFE_DELTA(GZIP(foo.html;"abc"), GZIP(foo.html;"jkl")))

I don't think this makes sense, in practice. To quote Daniel Hellerstein, "[the] computation of differences between compressed instances is probably useless, hence the [delta spec] goes to some length to allow efficient use of compression [after] delta coding."
While it is perhaps remotely possible that some future delta algorithm could make use of compressed inputs (or maybe not; perhaps there's a good information-theory reason?), I think it would be misleading to provide an example of that. In section 10.8.1 it is stated that "...By implication, if a client has retrieved and cached several instances of a resource, some of which are marked with ``retain'' and some not, then there is no point in caching the instances not marked with ``retain''." This seems silly, as if the server knows how to use the retain at all, it should know to send a "retain=0" to eliminate caching. It seems to me that a lack of "retain" should just indicate that normal caching rules should apply, rather than imply that the server is hinting that it should NOT be cached. Again, this is mostly in the service of avoiding the transmission of excessive bytes. If you read the statement carefully, it's not saying that "lack of retain implies do not cache this response" it is saying that a response lacking "retain" should not be cached *in addition to* (or instead of) a response for the same resource that does have a "retain" directive. If this is the only stored cache entry for the resource, then the normal caching rules *do* apply. Sending "retain=0" in this case seems to be redundant. While I'm discussing section 10.8.1, there are another two minor issues that I just wanted to point out: Firstly, you state "A client ought not use the corresponding entity tag in a future request for a delta-encoded response after that interval ends." If you're going to write this in section 10, where you are playing by RFC 2119's rules, you "ought" to define "ought" :-) Actually, that was intentional - I was trying to avoid adding a formal requirement (and the word "ought" is fairly well-defined in any English dictionary). 
Our bias in writing the HTTP/1.1 spec was to avoid imposing "normative requirements" (MUST/SHOULD) when they were not required for interoperability or reasonable performance requirements, and I think this is one of those places. Secondly, you state a bit later on that "A server SHOULD NOT send ``retain=0'' except in reply to a request that attempts to obtain a delta-encoded response." Here, it seems to me that the server should either NEVER send "retain"s unless an A-IM is present, or send "retain=0" any time it likes - INCLUDING when no A-IM is specified. I personally prefer the second line of thought. I'd be curious as to your reasoning for the "second line of thought." This "SHOULD NOT" was originally motivated by a desire to avoid sending extra bytes in a context where they would not be useful (because many clients would not ever care about deltas). As to the first, if by "NEVER" you mean "MUST NOT", again I think that's probably not justifiable in this case. My final "formal" comment is on section 10.8.2, and is, I'm afraid, a bit vague at this point. Section 10.8.2 states "A cache that complies with the specification for the IM header, the A-IM header, and the 226 response-status code SHOULD ignore a no-store cache-directive if an im directive is present in the same response. All other implementations MUST ignore the im directive (i.e., MUST observe a no-store directive, if present)." I have not yet worked out case examples, but it seems to me that this may present problems with forward compatibility with HTTP/1.1 protocol extensions. What if another extension is suggested that runs parallel with the IM field and also has special cache control requirements. Might it not be possible that our implementation here will handicap later HTTP extension proposals? Yes, I suppose this is, at least remotely, a potential problem. 
However, it's probably plausible to expect that if a server generates HTTP 226 IM Used Cache-Control: no-store, im, some-other-extension then that "some-other-extension" had better be compatible with IM field - otherwise it probably shouldn't be included in a 226 response, right? So, in practice, I think this is not a problem except perhaps for extensions to the IM mechanism itself, and (hopefully) the current IM specifications are general enough that this shouldn't be necessary. Also - one final question: It is painfully obvious that there are some other "supporting" internet-drafts connected with this, and I'd appreciate it if you could send me the names of related drafts so I can more completely familiarize myself with this project. Actually, there are no "supporting internet-drafts" except (1) prior versions of draft-mogul-http-delta-*.txt (2) draft-mogul-http-digest-02, as cited. I did strip out, from draft-mogul-http-delta-03.txt, some optional parts of the delta spec ("Clusters" and "Templates") because they were causing a lot of debate on some potential security holes. It's my intention to generate a new I-D that (separately) specifies those extensions, but I don't want to spend any time on that until the basic (and long-overdue) delta spec is mostly done. -Jeff From mogul Thu Jul 6 16:59:59 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA02096; Thu, 6 Jul 2000 16:59:59 -0700 (PDT) Message-Id: <200007062359.QAA02096@wera.pa.dec.com> To: http-delta From: Issac Goldstand Reply-To: neoi@writeme.com Subject: Re: Issac Goldstand's comments on Delta encoding spec Date: Thu, 06 Jul 2000 16:59:59 -0700 Sender: mogul X-Mts: smtp [I'm resending Issac's reply to the list, with some reformatting, on his request. He has had problems sending mail without base64-encodings. 
-Jeff] OK - here are my replies to Jeff's replies to my original comments (geez this has the potential to get very long :-)) Jeffrey Mogul wrote: > Here are my replies to Issac Goldstand's comments of 20 June 2000. > (Sorry for the long delay, but I do have to keep up with my day job.) > Firstly: regarding section 10.4.1 of the document, you write: > > "...The response MUST include an Etag header field giving the > entity tag of the current instance, and MUST include an IM > header field listing the instance manipulations that were applied > to the current instance..." > > Why is this being defined as MUST? Can the server not be allowed > to send it back without an entity tag? RFC 2616 section 13.3.4 > clearly states "HTTP/1.1 origin servers...SHOULD send an entity tag > validator unless it is not feasible to generate one." > > Therefore, I believe you should be more specific to your reasoning > as to why delta encodings suddenly MUST carry them. > > That section of RFC2616 isn't actually the most relevant one. > Instead, look at RFC2616 section 13.3.3 (Weak and Strong Validators). > Basically, when you're using delta-coding or ranges, at least, > you need a strong validator, and there is not much chance of this > when using Last-Modified. Otherwise, you can't be sure that you > are matching up the response to the proper instance. Just want to make sure I'm understanding correctly: The MUST here is really going on the strong validator. We are saying that since delta-encoding is so utterly dependent on the ETags, then we want the server to constantly provide us with one. But that doesn't appear to be what we're saying here. The I-D states "MUST include an ETag header field". That does not in any way imply whether it must be a strong or weak validator, and as is, leaves one wondering why it MUST be sent. I still think that this should be changed to SHOULD supply an ETag header.
Basically, we have three scenarios here: 1) Server replies with strong validator ETag - This is what we want to happen 2) Server replies with weak validator ETag - This is unacceptable, as in this case the ETag is garbage as far as we're concerned, since it doesn't point to a specific instance of the requested resource (that sounds very BAD - mixing instance with entity. I'll address this later) 3) Server replies with no ETag - This, surprisingly enough, is acceptable - even if it's part of a 226 response. The only problem is that we will not be able to use it as a base for future IMs. What the I-D implies, however, is that we are just requiring an ETag. This is actually bad because it implies that acceptable responses are (1) or (2), while the only acceptable responses are (1) and possibly (3). Having explained this, I urge you to reread my original comment in a new light. Basically, I was trying to push you into accepting scenario (3). Of course, now that I've gone into detail, it seems very obvious that we MUST reword this, as scenario (2) MUST NOT be allowed to occur. > Secondly: You mention the seeming necessity to add the Delta-Base > response tag. This seems to me to be a waste. The IM response > header tag, defined in section 10.5.2 seems to be specific to delta > encoded responses. Therefore, it would seem to me that rather than > adding a Delta-Base response header as defined in section 10.5.1, > we could save a few bytes AND registration of a new header by > simply adding the Delta-Base as a parameter to the IM response > header, as such: > > IM = "IM" ":" #(instance-manipulation [ ";" "base" "=" entity-tag ]) > > Since this is specific to the instance, we can possibly define > multiple entity-tags for the same resource in different formats, > should it ever be needed (I still have to think a bit about this, > so I'm being intentionally vague) in a similar format - > particularly by proxies/caches > > I'm not sure I understand what you are proposing here.
OK. What I was getting at is an "enhanced" version of the IM tag. It would go something like this (Pls forgive me if I make mistakes in this - I'm still pretty new at this) IM = "IM" ":" #(instance-manipulation [";" "base" "=" entity-tag] [";" "ITag" "=" entity-tag]) Before I explain this, I'm gonna go into that instance/entity issue I mentioned earlier: As far as I understand, we have two things: An instance is a description of a "state" of a given resource. Basically, it's a version of it at a given time. Entity I understand as a reference to the _current_ instance of the resource and exists solely for the duration of the transfer (even though later transfers might still be using the same instance). So basically, we have the ETags, which appear to me to be pointing to instances, rather than "entities". Having stated this, we can look at what I described above. Although I formally specified the header (or tried to, at least :)), I want it to be clear that I'm leaving the parameters loose. What I called an "ITag" there would basically be a pointer to the instance after its corresponding IM was applied but BEFORE all of the rest. Supplying it would enable browsers and proxies to cache the resource multiple times with different IM tags. Then, if a client later requests a new instance with an "If-None-Match" containing one of these "ITag"s, it (web cache/proxy/server/whatever) can calculate the remaining IMs and return it (if it knows that it is still valid - possibly due to the presence of a "retain" directive). In addition to this, the DECODED version of the resource after all IMs have been applied (ie, what would be returned in a 200 response) can be identified via the normal ETag header. Like I said before, though, this is all tentative. Comments are welcome though. > Secondly, you > state a bit later on that "A server SHOULD NOT send ``retain=0'' > except in reply to a request that attempts to obtain a > delta-encoded response."
Here, it seems to me that the server > should either NEVER send "retain"s unless an A-IM is present, or > send "retain=0" any time it likes - INCLUDING when no A-IM is > specified. I personally prefer the second line of thought. > > I'd be curious as to your reasoning for the "second line of thought." It allows for a client (and even better - an intermediate proxy) who DOES understand deltas and "retain" to get this additional information even if a delta was not requested. > This "SHOULD NOT" was originally motivated by a desire to avoid > sending extra bytes in a context where they would not be useful > (because many clients would not ever care about deltas). But, like I said above, many proxies might. > As to the first, if by "NEVER" you mean "MUST NOT", again I think > that's probably not justifiable in this case. Assuming that you still don't agree with my above reason, which I think is a worthwhile gamble (although bytes might be wasted on individual transfers that don't use the retain, the time and bandwidth saved by good intermediate delta-understanding proxies should justify it) I'd like to know why the MUST NOT would not be justifiable. That's pretty much it for now. Issac -- Internet is a wonderful mechanism for making a fool of yourself in front of a very large audience. --Anonymous Moving the mouse won't get you into trouble... Clicking it might.
--Anonymous From mogul@pa.dec.com Mon Jul 10 16:01:45 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA29558; Mon, 10 Jul 2000 16:01:45 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA00746; Mon, 10 Jul 2000 16:01:45 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA04232; Mon, 10 Jul 2000 16:01:44 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200007102301.QAA04232@wera.pa.dec.com> To: neoi@writeme.com Cc: http-delta@pa.dec.com Subject: "Entity" is a dirty word [was Re: Issac Goldstand's comments on Delta encoding spec} In-Reply-To: Your message of "Thu, 06 Jul 2000 16:59:59 PDT." <200007062359.QAA02096@wera.pa.dec.com> Date: Mon, 10 Jul 2000 16:01:44 -0700 X-Mts: smtp Issac, I think you (like many others) have been confused by the use of the word "entity" in the HTTP spec. I've written about this before, but I think I need to say something about this problem in general, before addressing your specific comments. Simply put, all uses of the word "entity" in the HTTP specification are bogus. The word should never have been used. It was introduced by well-meaning people who were very wrong about what the definition should be. I fought against this, but lost. The term "entity tag" is also a mistake, because it has ABSOLUTELY NOTHING AT ALL to do with an "entity" (following the HTTP/1.1 definition of the term). I probably should have fought harder against this myself, but at the time I didn't realize how wrong this was. In retrospect, the HTTP/1.1 specification should have used the term "instance tag" and named the header field "ITag". This would have avoided a lot of confusion. So, when you wrote: As far as I understand, we have two things: An instance is a description of a "state" of a given resource. Basically, it's a version of it at a given time.
Entity I understand as a reference to the _current_ instance of the resource and exists solely for the duration of the transfer (even though later transfers might still be using the same instance). So basically, we have the ETags, which appear to me to be pointing to instances, rather than "entities". Having stated this, we can look at what I described above. you were on the right track, but still not quite right. The problem is the specification never clearly defines what a "resource" is - at least, not clearly enough to decide what kind of thing has a "state" at a given time. The word "version" doesn't quite solve the problem, because "resources" can have "variants" which might or might not be time-based. If you are looking for something to describe as "a version of [a resource] at a given time", however, the "variant" is probably the closest we can get. Part of the confusion is whether a content-coding (such as gzip) is intrinsic to the resource variant, or whether it is applied on the fly. Another problem is that it's impossible to define what the "current state" of a resource (or variant) is, because the result of applying a method to a resource isn't just based on time, it could be based on who is making the request and what request headers are provided. In any case, we've chosen the word "instance" to apply to a possibly content-encoded possibly variant of a resource in response to a given request at a particular time. And the protocol makes no sense in certain contexts unless "entity tags" are associated with "instances". So what is an "entity"? As defined in the spec, all it really is is a message-in-transit. It can't be defined as *the* "_current_ instance of the resource", because the same instance could be sent in many different entities, as you've noted, and because "current" isn't really meaningful by itself. So, clearly, an "entity tag" has no useful connection with an "entity" in HTTP/1.1.
-Jeff From mogul@pa.dec.com Mon Jul 10 18:17:52 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA32615; Mon, 10 Jul 2000 18:17:52 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA01590; Mon, 10 Jul 2000 18:17:52 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA32135; Mon, 10 Jul 2000 18:17:52 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200007110117.SAA32135@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: Issac Goldstand's comments on Delta encoding spec In-Reply-To: Your message of "Thu, 06 Jul 2000 16:59:59 PDT." <200007062359.QAA02096@wera.pa.dec.com> Date: Mon, 10 Jul 2000 18:17:52 -0700 X-Mts: smtp Issac Goldstand wrote: > That section of RFC2616 isn't actually the most relevant one. > Instead, look at RFC2616 section 13.3.3 (Weak and Strong Validators). > Basically, when you're using delta-coding or ranges, at least, > you need a strong validator, and there is not much chance of this > when using Last-Modified. Otherwise, you can't be sure that you > are matching up the response to the proper instance. Just want to make sure I'm understanding correctly: The MUST here is really going on the strong validator. OK, I think I see your point. Basically, we have three scenarios here: 1) Server replies with strong validator ETag - This is what we want to happen 2) Server replies with weak validator ETag - This is unacceptable, as in this case the ETag is garbage as far as we're concerned, since it doesn't point to a specific instance of the requested resource (that sounds very BAD - mixing instance with entity. I'll address this later) 3) Server replies with no ETag - This, surprisingly enough, is acceptable - even if it's part of a 226 response. The only problem is that we will not be able to use it as a base for future IMs. We both agree on #1.
Re: #3 - I guess you're right that it's theoretically OK if a delta arrives without an Etag, because that response can be used even if it cannot be (usefully) cached. Re: #2 - if we don't require an entity tag at all, then I don't think we can insist on a strong one. To put this another way - the "Etag" field in a 226 response is not relevant to the immediate use of this response (e.g., delta application to an existing cache entry). It's only relevant to caching of the re-assembled instance. And that's basically independent of whether the instance was completed as the result of a 200 response or a 226 response. Since the spec doesn't require 200 responses to use strong entity tags, I think we can't, either, in this situation. So I accept your correction; the second paragraph of 10.4.1 "226 IM Used" changes from: The request MUST have included an A-IM header field listing at least one instance-manipulation. The response MUST include an Etag header field giving the entity tag of the current instance, and MUST include an IM header field listing the instance-manipulations that were applied to the current instance. To The request MUST have included an A-IM header field listing at least one instance-manipulation. The response MUST include an IM header field listing the instance-manipulations that were applied to the current instance. and just leave the question of including an Etag header field to other parts of the HTTP/1.1 spec (and other parts of the delta spec). > Secondly, you > state a bit later on that "A server SHOULD NOT send ``retain=0'' > except in reply to a request that attempts to obtain a > delta-encoded response." Here, it seems to me that the server > should either NEVER send "retain"s unless an A-IM is present, or > send "retain=0" any time it likes - INCLUDING when no A-IM is > specified. I personally prefer the second line of thought. > > I'd be curious as to your reasoning for the "second line of thought." 
It allows for a client (and even better - an intermediate proxy) who DOES understand deltas and "retain" to get this additional information even if a delta was not requested. > This "SHOULD NOT" was originally motivated by a desire to avoid > sending extra bytes in a context where they would not be useful > (because many clients would not ever care about deltas). But, like I said above, many proxies might. OK, I don't think it's a big deal, and (as I said) the bias for spec-writers ought to be to leave out requirements if they don't have clear justifications (this is the reason why I am very reluctant to use the keyword "MUST" unless we can point to a specific problem that it solves). So, the last paragraph of section 10.8.1 "Retain directive" changes from: If the retain directive includes a delta-seconds value of zero, a client SHOULD NOT use the corresponding entity tag in a future request for a delta-encoded response. A server SHOULD NOT send ``retain=0'' except in reply to a request that attempts to obtain a delta-encoded response. To If the retain directive includes a delta-seconds value of zero, a client SHOULD NOT use the corresponding entity tag in a future request for a delta-encoded response. Note: We recommend that server implementors consider the bandwidth implications of sending "retain=0" directives to clients or proxies that might not have the ability to make use of it. Any comments? 
-Jeff From mogul@pa.dec.com Mon Jul 10 18:35:28 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA29782; Mon, 10 Jul 2000 18:35:28 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA12078; Mon, 10 Jul 2000 18:35:28 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA02063; Mon, 10 Jul 2000 18:35:28 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200007110135.SAA02063@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Issac's "enhanced" version of the IM header In-Reply-To: Your message of "Thu, 06 Jul 2000 16:59:59 PDT." <200007062359.QAA02096@wera.pa.dec.com> Date: Mon, 10 Jul 2000 18:35:28 -0700 X-Mts: smtp Issac Goldstand wrote: OK. What I was getting at is an "enhanced" version of the IM tag. It would go something like this (Pls forgive me if I make mistakes in this - I'm still pretty new at this) IM = "IM" ":" #(instance-manipulation [";" "base" "=" entity-tag] [";" "ITag" "=" entity-tag]) [... stuff elided by Jeff ...] What I called an "ITag" there would basically be a pointer to the instance after its corresponding IM was applied but BEFORE all of the rest. Supplying it would enable browsers and proxies to cache the resource multiple times with different IM tags. Then, if a client later requests a new instance with an "If-None-Match" containing one of these "ITag"s, it (web cache/proxy/server/whatever) can calculate the remaining IMs and return it (if it knows that it is still valid - possibly due to the presence of a "retain" directive). In addition to this, the DECODED version of the resource after all IMs have been applied (ie, what would be returned in a 200 response) can be identified via the normal ETag header. Like I said before, though, this is all tentative. Comments are welcome though. I can see what you are getting at.
However, I think this is something that (1) at the very least can be postponed to be treated as a future extension of the current spec, and (2) perhaps isn't actually necessary. Reasoning behind (1): (a) We're already way too late on the basic delta spec. (b) I would rather not add something that changes the definition of "instance" and hence the interpretation of entity tags without a lot of thought; we have gotten into trouble over this subject area before! (c) I think this is something that, if it proves to be useful, could probably be specified as an extension (and an implementation-optional feature) and so does not need to be in the basic delta spec. Reasoning behind (2): Your goal here, if I understand it correctly, is to allow a cache to take a cache entry and use it in a reply with a different set of instance-manipulations than it had when "it" arrived. I put "it" in quotes because I think this is one of those places where we need to be more rigorous about saying what we mean. There has been some debate over the years about exactly what an HTTP cache stores. Does it store "resources" or "responses" or "objects" or "messages" or what? I think it's probably most accurate to think of a cache as storing "instances" or "partial instances". If you agree with that conceptualization, then what is stored in a cache (in abstract terms) never actually has an instance manipulation. I.e., the cache doesn't store a "delta response", it stores the result of applying the delta to a previous cache entry. So instead of thinking of how to allow a cache to identify a cache entry with some level of instance manipulations already included, we should simply think of a cache as FIRST trying to find the right *instance* in its cache to use in a response, and THEN it might optionally apply some instance manipulations, if it has enough information to do that locally.
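[Editorial sketch, not part of the original message.] The two-step view described above (find the instance first, then optionally apply an instance manipulation) might look roughly like this in Python. The class InstanceCache, the MANIPULATIONS table, and the choice of gzip as the only manipulation the cache can apply locally are all hypothetical, chosen only to keep the example short; nothing here is taken from the draft's text.

```python
import gzip
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical table of instance manipulations this cache can apply locally.
MANIPULATIONS: Dict[str, Callable[[bytes], bytes]] = {"gzip": gzip.compress}


class InstanceCache:
    """Stores whole instances keyed by (URL, entity tag), never delta responses."""

    def __init__(self) -> None:
        self._instances: Dict[Tuple[str, str], bytes] = {}

    def store(self, url: str, etag: str, instance: bytes) -> None:
        self._instances[(url, etag)] = instance

    def respond(self, url: str, etag: str,
                accepted_ims: Sequence[str]) -> Optional[Tuple[List[str], bytes]]:
        # FIRST: find the right instance.
        instance = self._instances.get((url, etag))
        if instance is None:
            return None
        # THEN: optionally apply one manipulation the client listed in A-IM.
        for name in accepted_ims:
            apply = MANIPULATIONS.get(name)
            if apply is not None:
                return [name], apply(instance)
        # Nothing applicable: return the plain instance, no IM applied.
        return [], instance
```

In this sketch the stored entry never carries an instance manipulation of its own; whatever ends up in the IM list of the reply is decided at response time.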
As a performance optimization, the cache could keep a "hidden cache" of instances with pre-applied instance manipulations. But this kind of optimization does not need to be (nor should be) visible to any client or server (and hence not one that needs to be covered in the spec), except that we might possibly want to warn implementors that any such hidden caches cannot become inconsistent with the true (spec-visible) cache entries. If you can come up with a specific, concrete example (that is, showing all of the headers in the sequence of requests and responses) that cannot be solved this way, then it is worth thinking harder about the extension you are proposing. But I don't think we actually need it. -Jeff From mogul Tue Jul 11 16:59:49 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA09964; Tue, 11 Jul 2000 16:59:48 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200007112359.QAA09964@wera.pa.dec.com> To: http-delta Subject: Remaining issue - replace Delta-Base header with IM param? Date: Tue, 11 Jul 2000 16:59:48 -0700 X-Mts: smtp I would like to submit a revised Internet-Draft (draft -05) before the Friday deadline (the IETF stops accepting new I-Ds prior to an IETF meeting). I think we are almost at closure on the basic delta-encoding specification. There is one remaining issue, which is Issac Goldstand's suggestion that we replace the Delta-Base header with a parameter carried in the IM header. For example, this:

    HTTP/1.1 226 IM Used
    Date: Wed, 24 Dec 1997 14:01:00 GMT
    Etag: "def"
    Delta-base: "abc"
    Content-encoding: gzip
    IM: vcdiff

would become

    HTTP/1.1 226 IM Used
    Date: Wed, 24 Dec 1997 14:01:00 GMT
    Etag: "def"
    Content-encoding: gzip
    IM: vcdiff;base="abc"

I haven't received much feedback about this (I think that Daniel Hellerstein is in favor of the change). On the one hand, it would regularize the protocol specification somewhat. On the other hand, I'm not sure this buys us very much.
For example, if we do go ahead with the DCluster and/or DTemplate headers, these seem to actually require the use of separate headers - they might be sent on responses that do not have IM headers. So I think there is no clear benefit to handling the delta-base value either as a header or as a parameter, aside from saving perhaps a few bytes in the 226 response. With that in mind, my inclination is to leave the protocol spec as it is, since we have already reviewed it for some time. I'm nervous about making a change at this point whose consequences we might not entirely understand. If anyone has strong arguments in favor of making a change, please speak up ASAP. Thanks -Jeff From mogul Thu Jul 13 11:49:32 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA28560; Thu, 13 Jul 2000 11:49:32 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200007131849.LAA28560@wera.pa.dec.com> To: http-delta Subject: Last chance to review draft-mogul-http-delta-05.txt pre-publication Date: Thu, 13 Jul 2000 11:49:32 -0700 X-Mts: smtp I've posted the latest version of the Delta Encoding spec at: ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-delta-05.11july2000.txt I plan to submit this to the IETF before noon tomorrow (they stop accepting I-Ds several weeks before their meeting), so if anyone has any last-minute comments, please tell me today! This version has some minor changes (mostly the result of Issac Goldstand's comments; some typos were found by others). It preserves the Delta-Base: header (at least, for now). My hope is that we can proceed to a "Last Call" on the HTTP-WG mailing list, and then ask the IESG to bless this as a Proposed Standard.
I will then get to work on the Cluster/Template extensions, I promise :-) Thanks, -Jeff From mogul Thu Jul 20 08:52:45 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id IAA18836; Thu, 20 Jul 2000 08:52:45 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200007201552.IAA18836@wera.pa.dec.com> To: http-delta Subject: draft-mogul-http-delta-05.txt now available from IETF Date: Thu, 20 Jul 2000 08:52:45 -0700 X-Mts: smtp It took the I-D editor a few days to push this through their backlog: From: Internet-Drafts@ietf.org Message-ID: <200007201044.GAA10458@ietf.org> Subject: I-D ACTION:draft-mogul-http-delta-05.txt Date: Thu, 20 Jul 2000 06:44:08 -0400 A New Internet-Draft is available from the on-line Internet-Drafts directories. Title : Delta encoding in HTTP Author(s) : J. Mogul, B. Krishnamurthy, F. Douglis, A. Feldmann, Y. Goland, A. van Hoff, D. Hellerstein Filename : draft-mogul-http-delta-05.txt Pages : 45 Date : 19-Jul-00 Many HTTP requests cause the retrieval of slightly modified instances of resources for which the client already has a cache entry. Research has shown that such modifying updates are frequent, and that the modifications are typically much smaller than the actual entity. In such cases, HTTP would make more efficient use of network bandwidth if it could transfer a minimal description of the changes, rather than the entire new instance of the resource. This is called 'delta encoding.' This document describes how delta encoding can be supported as a compatible extension to HTTP/1.1. 
A URL for this Internet-Draft is: http://www.ietf.org/internet-drafts/draft-mogul-http-delta-05.txt -Jeff P.S.: Not to be confused with the far more interesting "Delta 5" http://www.hiljaiset.sci.fi/punknet/delta5_e.htm From mogul Thu Jul 20 08:59:10 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id IAA23804; Thu, 20 Jul 2000 08:59:10 -0700 (PDT) Message-Id: <200007201559.IAA23804@wera.pa.dec.com> To: http-delta Orig-Date: Sun, 16 Jul 2000 17:10:43 -0500 From: Mike Dahlin Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] Date: Thu, 20 Jul 2000 08:59:10 -0700 Sender: mogul X-Mts: smtp [Re-sent by Jeff Mogul with Mike's permission, with slight editing.] As for the draft, it looks good. There were a couple points where I got a bit bogged down/confused. Details below. -mike Comments on draft-mogul-http-delta-04.txt 10.5.2 (p 29) The case when multiple ranges are returned (Content-type: multipart/byteranges) might be worth describing in a bit more detail. In particular, it may not be immediately obvious how the case of IM: range, vcdiff would be decoded if there are multiple ranges (how does the decoder know the length of the pieces to know when one range stops and the next begins after a vcdiff encoding?) The answer (I presume) is that the contents are encoded as multipart/byteranges as per RFC2046, so the decoder doesn't need to know the encoded lengths of the pieces to know when one stops and the next begins -- the decoder uses the delimiters specified in the header instead. It might be helpful to see examples of how the Range:, Content-length:, and RFC2046 delimiters fit together for IM: range,vcdiff (and maybe IM: vcdiff,range) cases with multiple ranges returned. 
10.5.3 (p30) "If a request includes an A-IM header field that lists the 'range' instance-manipulation prior to any delta-coding(s), and the request also includes an If-Range: header that lists the entity tag of the current instance, the server SHOULD ignore the delta-codings."

At first glance, the need for the rule is not obvious. It seems like a server could interpret

    GET /foo.html HTTP/1.1
    host: bar.example.net
    If-None-Match: "A"
    If-Range: "B"
    A-IM: range, vcdiff
    Range: bytes=900-

As meaning "if B is still the current version, send me bytes 900-... of B and you may vcdiff (with bytes 900-... of A) it if you like." This seems sensible. e.g., following the example in section 5.7

    if Tcur = "A" -> server replies with 304 (not modified)
    if Tcur = "B" --> server replies with 226 (im used) + an "IM: range,vcdiff" response header, and a message body including the vcdiff(A[900-], B[900-]); {If the server doesn't understand IM, the right thing still happens, I think} {The server, of course, is still welcome to ignore the vcdiff specification and just send the raw range}
    if Tcur = "C" --> send VCDIFF(A, C)

Is the intention to avoid the need to have delta encoding algorithms be defined for encoding ranges of files against each other? I don't understand why it is particularly more demanding to expect clients to be able to run

    VUNDIF(SELECT(A, 900, END), VDIFF(SELECT(A, 900, END), SELECT(B, 900, END)))

than to run

    VUNDIF(A, VDIFF(A, B))

But I could easily not be noticing a corner case. If there are particular codes for which this is a problem, it seems like this restriction should be part of the specification of the code not part of the specification for all codes. So it seems like this rule should be relaxed. Whether or not the rule is relaxed, it might be good to discuss this case in section 5.7 before this rule appears in 10.5.2

10.5.2 (p31-32) I don't understand the last full paragraph on p31 ("The server's choice about whether...") and the Note: spanning p31-32.
My best guess is that this is an explanation of the rule from the previous paragraph (that I discussed above). Even if so, I still don't understand the argument.

10.5.2 (p 31) Typo "subsequentLY-APPLIED"

10.6 (p33) I don't understand the need for MUST-NOT rule 3 ("If the cache implementation is not aware of, or is not at least conditionally...") My understanding of the specification is that baseInstance * IM-Pipeline = currentInstance (Where "* IM-Pipeline" means apply the functions listed in IM: header in the order listed). E.g., the actual functions implemented by the IM-Pipeline can be opaque to the cache implementation. The specification says that the element in the cache must be tagged by baseInstance and by IM-Pipeline and that it will not return the element unless both match what the client is able to accept. I don't understand why the cache would need to understand a new encoding as long as it has the ability to only supply the new encoding to clients that understand it. It seems like the matching rules give it that ability. It seems like it would be desirable to allow caches to treat new algorithms as opaque functions but to still allow caching of the output of those functions.

At the bottom of the subsection there is a note "Rule 3 allows for extending the set of instance-manipulations without causing deployed cache implementations to commit errors", but I should think that rules 1 and 2 suffice to prevent errors. (Is the reason for rule 3 to help support "new instance manipulations [that] may include additional caching rules to improve cache-hit rates in cognizant implementations" as per the Note? I don't see the need, but perhaps you have a particular optimization in mind?)

10.6 (p. 34) Separating the Note about MUST-NOT rule 3 from the MUST-NOT list with another numbered list (range conditions list) is confusing.
From mogul@pa.dec.com Tue Aug 1 18:41:16 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA04716; Tue, 1 Aug 2000 18:41:15 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA18359; Tue, 1 Aug 2000 18:41:15 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA06662; Tue, 1 Aug 2000 18:41:15 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200008020141.SAA06662@wera.pa.dec.com> To: Mike Dahlin Cc: http-delta@pa.dec.com Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] In-Reply-To: Your message of "Thu, 20 Jul 2000 08:59:10 PDT." <200007201559.IAA23804@wera.pa.dec.com> Date: Tue, 01 Aug 2000 18:41:15 -0700 X-Mts: smtp Comments on draft-mogul-http-delta-04.txt 10.5.2 (p 29) The case when multiple ranges are returned (Content-type: multipart/byteranges) might be worth describing in a bit more detail. In particular, it may not be immediately obvious how the case of IM: range, vcdiff would be decoded if there are multiple ranges (how does the decoder know the length of the pieces to know when one range stops and the next begins after a vcdiff encoding?) The answer (I presume) is that the contents are encoded as multipart/byteranges as per RFC2046, so the decoder doesn't need to know the encoded lengths of the pieces to know when one stops and the next begins -- the decoder uses the delimiters specified in the header instead. Doesn't section 10.10 (Delta encoding and multipart/byteranges) already cover this issue? I.e., it has this paragraph: When a multipart/byteranges response uses a delta-coding after a range selection, the A-IM and IM header fields list the delta-coding after the "range" literal. (Recall that this is the approach taken to obtain an updated version just of selected sections of an instance.) 
The server first selects the specified ranges from the current instance, and also selects the same specified ranges from the base instance. (Some of these selected ranges might be the empty sequence, if the instance is not long enough.) The server then generates the individual differences (deltas) between the pairs of ranges, and transmits each such difference in a part of the multipart/byteranges media type. Perhaps it would be sufficient to add a cross-reference from section 10.5.2 to 10.10. It might be helpful to see examples of how the Range:, Content-length:, and RFC2046 delimiters fit together for IM: range,vcdiff (and maybe IM: vcdiff,range) cases with multiple ranges returned. If the text in 10.10 doesn't seem clear enough, I could add that, but it would be a fairly lengthy example. I'll try to get to your other comments tomorrow ... -Jeff From mogul@pa.dec.com Wed Aug 2 13:17:36 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA31621; Wed, 2 Aug 2000 13:17:36 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA27740; Wed, 2 Aug 2000 13:17:36 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA02332; Wed, 2 Aug 2000 13:17:35 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200008022017.NAA02332@wera.pa.dec.com> To: Mike Dahlin Cc: http-delta@pa.dec.com Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] In-Reply-To: Your message of "Thu, 20 Jul 2000 08:59:10 PDT." <200007201559.IAA23804@wera.pa.dec.com> Date: Wed, 02 Aug 2000 13:17:35 -0700 X-Mts: smtp Going through your comments one by one ... 10.5.3 (p30) "If a request includes an A-IM header field that lists the 'range' instance-manipulation prior to any delta-coding(s), and the request also includes an If-Range: header that lists the entity tag of the current instance, the server SHOULD ignore the delta-codings." 
At first glance, the need for the rule is not obvious. It seems like a server could interpret

    GET /foo.html HTTP/1.1
    host: bar.example.net
    If-None-Match: "A"
    If-Range: "B"
    A-IM: range, vcdiff
    Range: bytes=900-

As meaning "if B is still the current version, send me bytes 900-... of B and you may vcdiff (with bytes 900-... of A) it if you like." This seems sensible. e.g., following the example in section 5.7

    if Tcur = "A" -> server replies with 304 (not modified)
    if Tcur = "B" --> server replies with 226 (im used) + an "IM: range,vcdiff" response header, and a message body including the vcdiff(A[900-], B[900-]); {If the server doesn't understand IM, the right thing still happens, I think} {The server, of course, is still welcome to ignore the vcdiff specification and just send the raw range}
    if Tcur = "C" --> send VCDIFF(A, C)

Is the intention to avoid the need to have delta encoding algorithms be defined for encoding ranges of files against each other? I don't understand why it is particularly more demanding to expect clients to be able to run

    VUNDIF(SELECT(A, 900, END), VDIFF(SELECT(A, 900, END), SELECT(B, 900, END)))

than to run

    VUNDIF(A, VDIFF(A, B))

But I could easily not be noticing a corner case. If there are particular codes for which this is a problem, it seems like this restriction should be part of the specification of the code not part of the specification for all codes. So it seems like this rule should be relaxed. Whether or not the rule is relaxed, it might be good to discuss this case in section 5.7 before this rule appears in 10.5.2

In reviewing the mailing list log: ftp://ftp.digital.com/pub/DEC/WRL/mogul/http-delta-log.txt I found some discussion on this point, but I think we may have been using faulty logic. Your example seems to show a valid interpretation of the A-IM header. I'm inclined to remove this restriction from 10.5.2. Perhaps we need a paragraph in 5.7 explaining how to interpret your example.
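A minimal sketch of the three outcomes in Mike's example, assuming the server still holds base instance "A" and that the status code meant is 226 (IM Used). The names toy_delta, toy_apply and respond are hypothetical, and the toy prefix-based delta is only a stand-in for vcdiff, not the real format:

```python
from typing import Dict, Optional, Tuple

Delta = Tuple[int, bytes]  # (length of shared prefix, remaining target bytes)


def toy_delta(base: bytes, target: bytes) -> Delta:
    """Stand-in for a real delta coder: keep the common prefix, send the rest."""
    n = 0
    limit = min(len(base), len(target))
    while n < limit and base[n] == target[n]:
        n += 1
    return n, target[n:]


def toy_apply(base: bytes, delta: Delta) -> bytes:
    """Inverse of toy_delta: rebuild the target from the base and the delta."""
    n, rest = delta
    return base[:n] + rest


def respond(
    instances: Dict[str, bytes],
    t_cur: str,
    if_none_match: str,
    if_range: str,
    first_byte: int,
) -> Tuple[int, str, Optional[Delta]]:
    """Return (status, IM header value, body) for 'A-IM: range, vcdiff'."""
    if t_cur == if_none_match:
        return 304, "", None  # client's copy is still current
    base = instances[if_none_match]
    cur = instances[t_cur]
    if t_cur == if_range:
        # If-Range matched: select the range from both instances, then diff.
        return 226, "range,vcdiff", toy_delta(base[first_byte:], cur[first_byte:])
    # If-Range did not match: the Range is ignored, delta over whole instances.
    return 226, "vcdiff", toy_delta(base, cur)
```

With this shape the client decodes the "range,vcdiff" case by selecting the same range from its cached "A" and applying the delta to that, which is the SELECT-then-undiff step in the example above.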
-Jeff From mogul@pa.dec.com Wed Aug 2 15:18:47 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA04547; Wed, 2 Aug 2000 15:18:47 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA24625; Wed, 2 Aug 2000 15:18:47 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA12036; Wed, 2 Aug 2000 15:18:46 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200008022218.PAA12036@wera.pa.dec.com> To: Mike Dahlin Cc: http-delta@pa.dec.com Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] In-Reply-To: Your message of "Thu, 20 Jul 2000 08:59:10 PDT." <200007201559.IAA23804@wera.pa.dec.com> Date: Wed, 02 Aug 2000 15:18:46 -0700 X-Mts: smtp 10.5.2 (p31-32) I don't understand the last full paragraph on p31 ("The server's choice about whether...") and the Note: spanning p31-32. My best guess is that this is an explanation of the rule from the previous paragraph (that I discussed above). Even if so, I still don't understand the argument. This was also discussed on the mailing list; the Note was supposed to have captured the intention. Perhaps a clearer phrasing of the Note would be: Note: the intent of this requirement is to prevent the server from generating a delta-encoded response that the client can only decode by first applying an instance-manipulation encoding to its cached base instance. One cannot assume that a client willing to decode a given instance-manipulation format (as indicated by its A-IM request header) is also able to encode into that format. A server implementor might wish to consider what the client would logically have in its cache, when deciding which instance-manipulations to apply prior to a delta-coding.
Consider this case: suppose that the client sends an initial request: GET /foo.html HTTP/1.1 Host: example.com and gets back HTTP/1.1 200 OK Etag: "A" Then the client sends another request GET /foo.html HTTP/1.1 Host: example.com If-None-Match: "A" A-IM: diffe,gzip,vcdiff One plausible interpretation of this A-IM is that the client is willing to accept a response with either: Etag: "B" IM: diffe, gzip or Etag: "B" IM: vcdiff but this one: Etag: "B" IM: gzip,vcdiff would require the client to compute GZIP(A) before it could decode the delta. This violates the rule you're referring to: The server's choice about whether to apply an instance-manipulation SHOULD be independent of its choice to apply any subsequent two-input instance-manipulations, to the response. because it didn't apply gzip to /foo.html except in the case where it subsequently applied vcdiff. Going through my logs of private email with Daniel Hellerstein, we worked through a number of corner cases before coming up with this phrasing of the spec. 
I can probably reconstruct the entire argument, but it would take more time than I have before my next meeting :-) -Jeff From douglis@research.att.com Wed Aug 2 17:31:51 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id RAA09139; Wed, 2 Aug 2000 17:31:51 -0700 (PDT) Received: from zmamail01.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA19970; Wed, 2 Aug 2000 17:31:50 -0700 Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345) id 78E94275A; Wed, 2 Aug 2000 20:31:50 -0400 (EDT) Received: from mail-green.research.att.com (H-135-207-30-103.research.att.com [135.207.30.103]) by zmamail01.zma.compaq.com (Postfix) with ESMTP id 67BA32556 for ; Wed, 2 Aug 2000 20:31:50 -0400 (EDT) Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26]) by mail-green.research.att.com (Postfix) with ESMTP id 1C3E01E037; Wed, 2 Aug 2000 20:31:50 -0400 (EDT) Received: from windsor.research.att.com (windsor.research.att.com [135.207.26.46]) by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id UAA28455; Wed, 2 Aug 2000 20:31:48 -0400 (EDT) Received: from windsor.research.att.com (localhost [127.0.0.1]) by windsor.research.att.com (8.8.8+Sun/8.8.5) with ESMTP id UAA15961; Wed, 2 Aug 2000 20:31:47 -0400 (EDT) Message-Id: <200008030031.UAA15961@windsor.research.att.com> From: Fred Douglis To: Jim Whitehead Cc: ietf-dav-versioning@w3.org, http-delta@pa.dec.com Subject: Re: Delta Encoding in HTTP In-Reply-To: Your message of "Fri, 21 Jul 2000 10:28:35 PDT." X-Uri: http://www.research.att.com/~douglis/ Date: Wed, 02 Aug 2000 20:31:47 -0400 Sender: douglis@research.att.com >This is a feature that would need to interact well with DeltaV versioning >services. I fully agree, per my comments at the meeting today. I think the ability to use stored versions as known base versions for deltas would be a big plus.
I'm copying the delta-encoding mailing list; hopefully there can be some cross-fertilization here. Fred From mogul@pa.dec.com Wed Aug 2 19:04:22 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id TAA18570; Wed, 2 Aug 2000 19:04:22 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA20943; Wed, 2 Aug 2000 19:04:22 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id TAA18089; Wed, 2 Aug 2000 19:04:21 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200008030204.TAA18089@wera.pa.dec.com> To: Mike Dahlin Cc: http-delta@pa.dec.com Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] In-Reply-To: Your message of "Thu, 20 Jul 2000 08:59:10 PDT." <200007201559.IAA23804@wera.pa.dec.com> Date: Wed, 02 Aug 2000 19:04:21 -0700 X-Mts: smtp 10.6 (p33) I don't understand the need for MUST-NOT rule 3 ("If the cache implementation is not aware of, or is not at least conditionally...") My understanding of the specification is that baseInstance * IM-Pipeline = currentInstance (Where "* IM-Pipeline" means apply the functions listed in IM: header in the order listed). E.g., the actual functions implemented by the IM-Pipeline can be opaque to the cache implementation. The specification says that the element in the cache must be tagged by baseInstance and by IM-Pipeline and that it will not return the element unless both match what the client is able to accept. I don't understand why the cache would need to understand a new encoding as long as it has the ability to only supply the new encoding to clients that understand it. It seems like the matching rules give it that ability. It seems like it would be desirable to allow caches to treat new algorithms as opaque functions but to still allow caching of the output of those functions. 
At the bottom of the subsection there is a note "Rule 3 allows for extending the set of instance-manipulations without causing deployed cache implementations to commit errors", but I should think that rules 1 and 2 suffice to prevent errors. (Is the reason for rule 3 is to help support "new instance manipulations [that] may include additional caching rules to improve cache-hit rates in cognizant implementations" as per the Note? I don't see the need, but perhaps you have a particular optimization in mind?) Consider what would have happened without rule #3 if we had hypothetically standardized on the IM/A-IM mechanism before delta encodings had been invented. I.e., it was introduced to deal with compression and ranges, and caches had been deployed that followed rules #1 and #2 (matching simply the IM and A-IM headers), but not rule #3. Then we introduce new instance manipulations for delta-encoding. That would have made it impossible to later add rules #4 and #5 (which are specific to deltas and the "Delta-Base" header). If you can think of a way to replace rules #4 and #5 with more generic rules that accomplish the same thing (allowing caching of deltas without allowing incorrect caching), and if you could argue that these generic rules would work for all yet-to-be-conceived instance manipulations, then we could drop rule #3. Otherwise, I think it's a conservative approach that does allow future extensions to do aggressive caching. You also comment: 10.6 (p. 34) Separating the Note about MUST-NOT rule 3 from the MUST-NOT list with another numbered list (range conditions list) is confusing. Hmm. I'll try to figure out a way to clarify this. (Maybe this rule, in particular, needs a name.) 
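As a rough illustration of the conservative behavior rule #3 is after, here is a sketch of a cache check that refuses to reuse a stored response whose IM header names a manipulation the cache does not recognize. The set KNOWN_IMS, the helper names, and the simplified "every applied manipulation must be accepted" test are assumptions standing in for rules #1 and #2, whose exact text is not reproduced here; rules #4 and #5 (Delta-Base matching) and A-IM parameters such as q-values are not modeled:

```python
from typing import FrozenSet, List

# Hypothetical set of instance-manipulations this cache implements.
KNOWN_IMS: FrozenSet[str] = frozenset({"gzip", "range", "vcdiff"})


def parse_im_list(value: str) -> List[str]:
    """Split a comma-separated IM or A-IM value into lowercase tokens."""
    return [tok.strip().lower() for tok in value.split(",") if tok.strip()]


def may_serve_from_cache(
    stored_im: str,
    request_a_im: str,
    known: FrozenSet[str] = KNOWN_IMS,
) -> bool:
    """Decide whether a stored IM response may answer a new request."""
    applied = parse_im_list(stored_im)
    accepted = set(parse_im_list(request_a_im))
    # Rule 3 (as discussed): an unrecognized manipulation means "do not reuse",
    # even if the client lists it, because it may carry caching rules that
    # this cache knows nothing about.
    if any(im not in known for im in applied):
        return False
    # Simplified stand-in for the IM / A-IM matching of rules 1 and 2.
    return all(im in accepted for im in applied)
```

A cache built this way loses some hits when a new manipulation is first deployed, but it cannot mishandle a manipulation that later turns out to need extra rules of its own.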
-Jeff From dahlin@cs.utexas.edu Tue Aug 8 09:42:40 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id JAA23774; Tue, 8 Aug 2000 09:42:40 -0700 (PDT) Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA18221; Tue, 8 Aug 2000 09:42:39 -0700 Received: from mail.cs.utexas.edu (mail.cs.utexas.edu [128.83.139.10]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id JAA17352; Tue, 8 Aug 2000 09:42:39 -0700 (PDT) Received: from cs.utexas.edu (vaio.csres.utexas.edu [128.83.141.4]) by mail.cs.utexas.edu (8.9.3/8.9.3) with ESMTP id LAA23478; Tue, 8 Aug 2000 11:38:53 -0500 (CDT) Message-Id: <39903866.70383CAC@cs.utexas.edu> Date: Tue, 08 Aug 2000 11:42:14 -0500 From: Mike Dahlin X-Mailer: Mozilla 4.74 [en] (Win98; U) X-Accept-Language: en Mime-Version: 1.0 To: Jeffrey Mogul Cc: http-delta@pa.dec.com Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] 10.5.3 p30 References: <200008022017.NAA02332@wera.pa.dec.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit > > > > > > > > > > > > > 10.5.3 (p30) > "If a request includes an A-IM header field that lists the 'range' > instance-manipulation prior to any delta-coding(s), and the request > also includes an If-Range: header that lists the entity tag of the > current instance, the server SHOULD ignore the delta-codings." > > At first glance, the need for the rule is not obvious. > > ... long example and discussion omitted ... > > I'm inclined to remove this restriction from 10.5.2. Perhaps > we need a paragraph in 5.7 explaining how to interpret your > example. > Yes. Given that the other case is discussed in so much detail, an example of this case using the same structure would make a lot of sense. 
-mike From dahlin@cs.utexas.edu Tue Aug 8 09:42:44 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id JAA01897; Tue, 8 Aug 2000 09:42:44 -0700 (PDT) Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA28114; Tue, 8 Aug 2000 09:42:44 -0700 Received: from mail.cs.utexas.edu (mail.cs.utexas.edu [128.83.139.10]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id JAA09400; Tue, 8 Aug 2000 09:42:43 -0700 (PDT) Received: from cs.utexas.edu (vaio.csres.utexas.edu [128.83.141.4]) by mail.cs.utexas.edu (8.9.3/8.9.3) with ESMTP id LAA23490; Tue, 8 Aug 2000 11:38:58 -0500 (CDT) Message-Id: <3990394C.5B5DA2E9@cs.utexas.edu> Date: Tue, 08 Aug 2000 11:46:04 -0500 From: Mike Dahlin X-Mailer: Mozilla 4.74 [en] (Win98; U) X-Accept-Language: en Mime-Version: 1.0 To: Jeffrey Mogul Cc: http-delta@pa.dec.com Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] 10.6 p33-34 References: <200008030204.TAA18089@wera.pa.dec.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit > 10.6 (p33) > > I don't understand the need for MUST-NOT rule 3 ("If the cache > implementation is not aware of, or is not at least > conditionally...") > ... > .... Otherwise, I think it's a conservative > approach that does allow future extensions to do aggressive > caching. > Right. Conservative engineering good. The only alternative I can see is to create a type system (which you are on the verge of with "one-input" and "two-input" manipulations) and to tag each manipulation with its type (then the rule becomes "if I see I type I don't understand,...".) I doubt there is much enthusiasm for this. Giving up a bit of performance as new extensions are deployed for simplicity seems reasonable (unless people have a sense that there are dozens of people itching to add new extensions for current types...). > You also comment: > 10.6 (p. 
34) > Separating the Note about MUST-NOT rule 3 from the MUST-NOT list with > another numbered list (range conditions list) is confusing. > > Hmm. I'll try to figure out a way to clarify this. (Maybe > this rule, in particular, needs a name.) > I think you could just move the note directly below rule 3 (and indent it as a continuation of rule 3. e.g., 1. lsjfdlsj 2. sldjfljdf 3. If the cache implementation is not aware of, or is not at least conditionally compliant with... Note: Rule #3 allows for.. 4. slkjdflsjf -mike From dahlin@cs.utexas.edu Tue Aug 8 09:42:41 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id JAA12586; Tue, 8 Aug 2000 09:42:41 -0700 (PDT) Received: from mail1.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA28192; Tue, 8 Aug 2000 09:42:41 -0700 Received: from mail.cs.utexas.edu (mail.cs.utexas.edu [128.83.139.10]) by mail1.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id JAA08604; Tue, 8 Aug 2000 09:42:40 -0700 (PDT) Received: from cs.utexas.edu (vaio.csres.utexas.edu [128.83.141.4]) by mail.cs.utexas.edu (8.9.3/8.9.3) with ESMTP id LAA23482; Tue, 8 Aug 2000 11:38:54 -0500 (CDT) Message-Id: <39903898.B1F7CB65@cs.utexas.edu> Date: Tue, 08 Aug 2000 11:43:04 -0500 From: Mike Dahlin X-Mailer: Mozilla 4.74 [en] (Win98; U) X-Accept-Language: en Mime-Version: 1.0 To: Jeffrey Mogul Cc: http-delta@pa.dec.com Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] 10.5.2 p29 References: <200008020141.SAA06662@wera.pa.dec.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit > > > Comments on draft-mogul-http-delta-04.txt > > 10.5.2 (p 29) > ... > > In particular, it may not be immediately obvious how the case of > > IM: range, vcdiff > > would be decoded if there are multiple ranges (how does the decoder > know the length of the pieces to know when one range stops and the > next begins after a vcdiff encoding?) > > ... 
> > Perhaps it would be sufficient to add a cross-reference from > section 10.5.2 to 10.10. After reading it and getting confused and unconfused a few times, I would suggest the following small addition that would have, I think, clarified things for me a lot. (Hopefully, I ended in the unconfused state and this actually makes sense.) Current (10.5.2, p 30): As a special case, if the instance-manipulations include both range selection and at least one other non-identity instance-manipulation, the IM header field MUST be used to indicate the order in which all of these instance-manipulations, including range selection, were applied. If the IM header lists the "range" instance-manipulation, the response MUST include either a Content-Range header or a multipart/byteranges Content-Type. Proposed: As a special case, ... the response MUST include either a Content-Range header or a multipart/byteranges Content-Type in which each part contains a Content-Range header. This requirement is implied by appendix 19.2 of RFC2616 and by the statement from 2616 "When a client requests multiple byte-ranges in one request, the server SHOULD return them in the order that they appeared in the request." But making it explicit here I think would have helped me see what was going on. I think the original wording may have confused me because it seemed to imply that putting Content-Range for the pieces was not a requirement for a multipart/byteranges reply. > > It might be helpful to see examples of how the Range:, > Content-length:, and RFC2046 delimiters fit together for IM: > range,vcdiff (and maybe IM: vcdiff,range) cases with multiple ranges > returned. > > If the text in 10.10 doesn't seem clear enough, I could add > that, but it would be a fairly lengthy example. > This was a wording/clarity issue only, and I suspect that a long example would be as likely to confuse as illuminate...
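For what it's worth, the framing point can be shown in a few lines without a full protocol example. The sketch below builds and splits a multipart/byteranges body in which every part carries its own Content-Range, so the decoder finds part boundaries from the delimiter and never needs the encoded length of a per-range delta. The function names, the "SEP" boundary, and the opaque payload bytes are assumptions for illustration; the sketch also assumes the boundary string never occurs inside a payload and that Content-Range names the selected range of the current instance:

```python
from typing import List, Tuple

Part = Tuple[int, int, bytes]  # (first byte, last byte, encoded payload)


def build_byteranges(
    parts: List[Part],
    instance_length: int,
    boundary: str,
    part_type: str = "application/octet-stream",
) -> bytes:
    """Serialize parts as multipart/byteranges, one Content-Range per part."""
    sep = b"--" + boundary.encode("ascii")
    out = []
    for first, last, payload in parts:
        out.append(sep + b"\r\n")
        out.append(f"Content-Type: {part_type}\r\n".encode("ascii"))
        out.append(
            f"Content-Range: bytes {first}-{last}/{instance_length}\r\n\r\n".encode("ascii")
        )
        out.append(payload + b"\r\n")
    out.append(sep + b"--\r\n")  # closing delimiter
    return b"".join(out)


def split_byteranges(body: bytes, boundary: str) -> List[Tuple[str, bytes]]:
    """Recover (Content-Range value, payload) pairs using only the delimiters."""
    delim = b"\r\n--" + boundary.encode("ascii")
    chunks = (b"\r\n" + body).split(delim)
    parts = []
    for chunk in chunks[1:]:
        if chunk.startswith(b"--"):
            break  # closing delimiter reached
        head, _, payload = chunk[2:].partition(b"\r\n\r\n")
        content_range = ""
        for line in head.split(b"\r\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-range":
                content_range = value.strip().decode("ascii")
        parts.append((content_range, payload))
    return parts
```

Note that the payload lengths need not match the range lengths: a 100-byte range may travel as a 7-byte delta, and the Content-Range still tells the client which slice of its base instance to decode against.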
-mike From dahlin@cs.utexas.edu Tue Aug 8 09:42:43 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id JAA02871; Tue, 8 Aug 2000 09:42:43 -0700 (PDT) Received: from mail2.digital.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA30200; Tue, 8 Aug 2000 09:42:42 -0700 Received: from mail.cs.utexas.edu (mail.cs.utexas.edu [128.83.139.10]) by mail2.digital.com (8.9.2/8.9.3/WV2.0h) with ESMTP id JAA12144; Tue, 8 Aug 2000 09:42:42 -0700 (PDT) Received: from cs.utexas.edu (vaio.csres.utexas.edu [128.83.141.4]) by mail.cs.utexas.edu (8.9.3/8.9.3) with ESMTP id LAA23486; Tue, 8 Aug 2000 11:38:56 -0500 (CDT) Message-Id: <399038FF.69D6E08A@cs.utexas.edu> Date: Tue, 08 Aug 2000 11:44:47 -0500 From: Mike Dahlin X-Mailer: Mozilla 4.74 [en] (Win98; U) X-Accept-Language: en Mime-Version: 1.0 To: Jeffrey Mogul Cc: http-delta@pa.dec.com Subject: Re: delta encoding [Comments on draft-mogul-http-delta-04.txt] 10.5.2 p31-32 References: <200008022218.PAA12036@wera.pa.dec.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit > > 10.5.2 (p31-32) > I don't understand the last full paragraph on p31 ("The server's > choice about whether...") and the Note: spanning p31-32. > > ... > > This was also discussed on the mailing list; the Note was > supposed to have captured the intention. Perhaps a clearer > phrasing of the Note would be: > > Note: the intent of this requirement is to prevent the server from > generating a delta-encoded response that the client can only decode > by first applying an instance-manipulation encoding to its cached > base instance. One cannot assume that a client willing to decode a > given instance-manipulation format (as indicated by its A-IM > request header) is also able encode into that format. A server > implementor might wish to consider what the client would logically > have in its cache, when deciding which instance-manipulations to > apply prior to a delta-coding. 
> That helps me understand the Note. > > Going through my logs of private email with Daniel Hellerstein, > we worked through a number of corner cases before coming up > with this phrasing of the spec. I can probably reconstruct > the entire argument, but it would take more time than I have > before my next meeting :-) > Understood. But the wording ("Server's choice...SHOULD be independent...") still confuses me, I'm afraid. (Since I jumped in late, I don't want to make you reconstruct a settled argument. But if I were implementing the system, I might still be confused about what requirement I'm supposed to meet here.) Is this rule trying to restrict what a server can/should send a client? It seems like that is the intent, but I'm not sure I would interpret the words that way. The notion "choose independently" doesn't seem to restrict what choices are legal. This wording seems to imply that a server MAY choose to do a 1-instance manipulation before a 2-instance manipulation or MAY choose not to (since it makes its choices independently, either possibility could arise, right?). So this requirement doesn't seem to change what could go over the wire and what the client has to be ready to accept. (So, we're basically back to asking the server to "do something sensible"?) In Jeff's example, > > ...client sends... > GET /foo.html HTTP/1.1 > Host: example.com > If-None-Match: "A" > A-IM: diffe,gzip,vcdiff > ... > One Plausible interpretation...client is willing to accept > either: > Etag: "B" > IM: diffe,gzip > or > Etag: "B" > IM: vcdiff > but this one: > Etag: "B" > IM: gzip,vcdiff > would require the client to compute GZIP(A) before it could decode > the delta. This violates the rule you're referring to: > The server's choice about whether to apply an instance > manipulation SHOULD be independent of its choice to apply any > subsequent two-input instance-manipulations, to the response.
> because it didn't apply gzip to /foo.html except in the case where it > subsequently applied vcdiff I don't see how this violates the rule as stated. The server can and does choose to apply gzip in cases when it doesn't apply vcdiff ("IM: gzip", "IM: diffe, gzip"). The "problem" example above seems exactly to be a case of the server making its choice to apply gzip *independently* from its later choice to apply vcdiff. I tried to work through what I now understand this rule to be trying to say (based on the Note more than the rule) and ended up with a fairly long example of (maybe) a way to say it with a different rule. Since I have missed the earlier discussions that created "this phrasing of the spec", this could well end up being more useful as an illustration of what exactly I missed in understanding the current spec (so that you guys can figure out the sentence that needs to be added to fix it) than as a replacement. Again: I'm not even sure I understand the intent of the original rule, so I could be WAY off base. I would suggest first reading this at a high-level with the hope of getting an "Ah, I see where he got confused and I see what we can add to the current paragraph to fix it" rather than reading this at a detailed level and generating a critique of whether it covers all of the corner cases or not. (But on the other hand, if it does make sense on a first pass, I'm happy to help talk through the details.) OK. Really venturing to where I shouldn't be going without having been in on the earlier discussion ... I don't suppose you could get the same effect by pushing the onus back on the client? E.g., if a client says If-None-Match: "A" A-IM: OI1,TI,OI2 (Where "OIx" represents a "one-input" manipulation and TI represents a "two-input" manipulation.) 
The client implies that it has cached or can generate

    A
    OI1(A)

since the server can send back any of

    B
    OI2(B)
    TI(A, B)                  --> client needs A
    OI1(B)
    OI2(TI(A, B))             --> client needs A
    TI(OI1(A), OI1(B))        --> client needs OI1(A)
    OI2(TI(OI1(A), OI1(B)))   --> client needs OI1(A)

Suppose the client doesn't want to be required to generate OI1(A). It
could have sent

    A-IM: TI,OI1,OI2

which removes

    XX TI(OI1(A), OI1(B))
    XX OI2(TI(OI1(A), OI1(B)))

as legal replies but adds

    OI1(TI(A, B))
    OI2(OI1(TI(A, B)))

The rule would be something like

    "A server MAY choose to apply any subset of instance manipulations
    specified by the client {choose to apply them independently?} but
    MUST apply them in the order listed. A client SHOULD NOT {MUST NOT?}
    list a 'one-input' manipulation before a 'two-input' manipulation
    unless the client is prepared to provide as input to that two-input
    manipulation the result of the one-input manipulation operating on
    any base instance listed in the If-None-Match header. E.g., if a
    client lists instance 'A' in an If-None-Match header and lists
    one-input manipulation OI before two-input manipulation TI in the
    A-IM header, the client implies that it can provide either A or
    OI(A) as input to TI from its cache or by applying OI to a cached
    instance of A."

Following this rule, in Jeff's example, the client should have said:

    GET /foo.html HTTP/1.1
    Host: example.com
    If-None-Match: "A"
    A-IM: diffe,vcdiff,gzip

or said

    A-IM: vcdiff,diffe,gzip

if it doesn't want to be on the hook for generating gzip(A).

Looking at the mailing list discussion (Thu May 18 17:03:33 2000),
Jeff writes:

%> ...
%> I was going to say that we could add a sentence:
%>     However the server may choose to use only a subset the listed A-IM
%>     manipulations, so long as they are applied in the order listed in
%>     the A-IM request header.
%>
%> But is this true -- suppose we have
%>     A-IM: diff,gzip,range
%> say, because the client wants just the range of a prior "diff,gzip'ed"
%> response.
If the server choosed to use %> IM: diff,range %> the result probably is NOT helpful to the client. %> %> I'm not sure what this implies; that a trailing range means "don't use %> range unless you use all the preceding manipulations"???? %> %> Upon analysis, I think we've decided that this particular case %> isn't a disaster. However, during this analysis, we realized %> that there is a problem if the server isn't consistent about %> what instance-manipulations it applies prior to computing a delta. %> %> Here's a proposed solution (inserted in section 10.5.3 just %> before the Examples): ... {The current wording of the rule} Hm. This case is a problem. My proposed solution will break (but I don't see that the original rule made it clear how to fix this either.) It seems like the straw man idea in Jeff's May e-mail was right. A range is a peculiar beast since it specifies start and end coordinates that only make sense for a particular encoding of the data. I don't see that anything weaker than that works. I would be inclined to use a slightly more general rule: a range IM (trailing or not) means "don't use range unless you use all the preceding manipulations". "If a server applies a range manipulation, it MUST also apply all manipulations listed before the range manipulation in the client's A-IM header." This allows: A-IM: diff,range,gzip --> "IM: diff,range", "IM: diff,range,gzip", "IM: diff,gzip", "IM: gzip", "IM: diff" A-IM: diff,gzip,range --> "IM: diff,gzip,range", "IM: diff,gzip", "IM: diff", "IM: gzip" All of which are reasonably useful to the client The downside is that a client cannot say A-IM: vcdiff,diff,gzip,range Since the server would have to apply both vcdiff and diff which makes no sense. So, if a client says "range", it gives up some flexibilty on what else it can say. Reasonable compromise? "A client SHOULD NOT list two two-input manipulations before a range manipulation in an A-IM header. 
A server receiving such a header SHOULD ignore the range manipulation." (Corner case: a client could still say A-IM: diff,gzip,range,vcdiff as long as it can handle either gzip(A) or A as an input to vcdiff). -mike From mogul Tue Aug 15 16:20:04 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA15885; Tue, 15 Aug 2000 16:20:04 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200008152320.QAA15885@wera.pa.dec.com> To: http-delta Subject: ordering of range and delta-codings when If-Range is used In-reply-to: Your message of "Wed, 02 Aug 2000 13:17:35 PDT." <200008022017.NAA02332@wera.pa.dec.com> Date: Tue, 15 Aug 2000 16:20:04 -0700 X-Mts: smtp Mike Dahlin wrote: 10.5.3 (p30) "If a request includes an A-IM header field that lists the 'range' instance-manipulation prior to any delta-coding(s), and the request also includes an If-Range: header that lists the entity tag of the current instance, the server SHOULD ignore the delta-codings." At first glance, the need for the rule is not obvious. It seems like a server could interpret GET /foo.html HTTP/1.1 host: bar.example.net If-None-Match: "A" If-Range: "B" A-IM: range, vcdiff Range: bytes=900- As meaning "if B is still the current version, send me bytes 900-... of B and you may vcdiff (with bytes 900-... of A) it if you like." This seems sensible. e.g., following the example in section 5.7 if Tcur = "A" -> server replies with 304 (not modified) if Tcur = "B" --> server replies with 266 (im used) + an "IM: range,vcdiff" response header, and a message body including the vcdiff(A[900-], B[900-]); {If the server doesn't understand IM, the right thing still happens, I think} {The server, of course, is still welcome to ignore the vcdiff specification and just send the raw range} if Tcur = "C" --> send VCDIFF(A, C) I wrote: I'm inclined to remove this restriction from 10.5.2. That should have been "10.5.3" - it's now removed. 
Perhaps we need a paragraph in 5.7 explaining how to interpret your example. I came up with the following: On the other hand, suppose that the client has a cache entry for the "A" instance of http://bar.example.net/foo.html, and it has already received the first 900 bytes of a new instance "B" (perhaps as the result of an aborted transfer). Now the client wants to receive the entire current instance, so it could send this request: GET /foo.html HTTP/1.1 host: bar.example.net If-None-Match: "A" If-Range: "B" A-IM: range,vcdiff Range: bytes=900- In this example, as in the previous example, if Tcur = "A" then the server should send 304 (Not Modified), and if Tcur = "C", then the server should send the entire new instance, either as a 200 response or as a delta-encoding against instance "A". However, if Tcur = "B", in this case the server should first select the specified range (bytes 900 through the end) from both instances "A" and "B", then compute the delta encoding between these ranges (using vcdiff), and then transmit the result using a 226 (IM Used) response with an "IM:range,vcdiff" response header. I think this might be a bit contrived, but I suppose it's now a valid interpretation of the spec for a client to send this request, so it's probably worth explaining how the server responds to it. This leaves us in the situation where a client that has received a truncated response can try to fill in the gap using either A-IM: range,vcdiff or A-IM: vcdiff,range and it's not clear how to advise a client implementor which approach is "better" (if one is a better approach in general). 
But we probably don't want to get into that rat-hole, especially lacking any experience that people would in fact do either in practice :-) -Jeff From mogul Tue Aug 15 16:24:10 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA14739; Tue, 15 Aug 2000 16:24:10 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200008152324.QAA14739@wera.pa.dec.com> To: http-delta Subject: Responses to Mike Dahlin's comments Date: Tue, 15 Aug 2000 16:24:10 -0700 X-Mts: smtp As a result of Mike Dahlin's comments, I have made the following changes (mostly in the presentation, not the actual spec.): Moved a Note in section 10.6 to make it clear what it applies to. Added another example in 5.7 for a combination of range and delta. Added some clarification in section 10.5.2. Removed (section 10.5.3) a restriction on the ordering of "range" and delta-codings in the A-IM header. I'm still wrestling with Mike's lengthy comments regarding two-input instance manipulations. I know of no other pending issues, and so if I can figure out how to deal with the "two-input" issue (or if I decide to give up on that for now), I'm about ready to issue a draft-06 version of the delta spec - hopefully, this one can be used for a Last Call and then submitted to the IESG as a Proposed Standard. 
-Jeff From mogul Thu Aug 17 16:56:00 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA11565; Thu, 17 Aug 2000 16:56:00 -0700 (PDT) Message-Id: <200008172356.QAA11565@wera.pa.dec.com> To: http-delta From: Issac Goldstand Reply-To: neoi@writeme.com Organization: Jerusalem College of Technology X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12-20 i686) X-Accept-Language: en Mime-Version: 1.0 Subject: Re: ordering of range and delta-codings when If-Range is used References: <200008152320.QAA15885@wera.pa.dec.com> Content-Type: text/plain; charset=iso-8859-7 Content-Transfer-Encoding: 7bit Date: Thu, 17 Aug 2000 16:56:00 -0700 Sender: mogul X-Mts: smtp [Retransmitted by Jeff on Issac's request] Jeffrey Mogul wrote: > Mike Dahlin wrote: > > I came up with the following: > > On the other hand, suppose that the client has a cache entry for the > "A" instance of http://bar.example.net/foo.html, and it has already > received the first 900 bytes of a new instance "B" (perhaps as the > result of an aborted transfer). Now the client wants to receive the > entire current instance, so it could send this request: > > GET /foo.html HTTP/1.1 > host: bar.example.net > If-None-Match: "A" > If-Range: "B" > A-IM: range,vcdiff > Range: bytes=900- > > In this example, as in the previous example, if Tcur = "A" then the > server should send 304 (Not Modified), and if Tcur = "C", then the > server should send the entire new instance, either as a 200 response > or as a delta-encoding against instance "A". > > However, if Tcur = "B", in this case the server should first select > the specified range (bytes 900 through the end) from both instances > "A" and "B", then compute the delta encoding between these ranges > (using vcdiff), and then transmit the result using a 226 (IM Used) > response with an "IM:range,vcdiff" response header. 
That sounds like you're saying that the server should get the range of the original data ("A" or "B"), select range 900- from both and calculate the delta and return it. However, with deltas it would be extremely difficult for the client to decode the delta and know exactly how much of the decoded data it was missing. In the case above, if TCur="B", the way you wrote it above should actually not do what you wrote in that explanitary paragraph, but rather should run vdiff("A","B") take range 900- of the output of THAT and return a 226. Issac -- Internet is a wonderful mechanism for making a fool of yourself in front of a very large audience. --Anonymous Moving the mouse won't get you into trouble... Clicking it might. --Anonymous From mogul@pa.dec.com Thu Aug 17 18:26:45 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA11641; Thu, 17 Aug 2000 18:26:45 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA03258; Thu, 17 Aug 2000 18:26:45 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA12964; Thu, 17 Aug 2000 18:26:45 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200008180126.SAA12964@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: ordering of range and delta-codings when If-Range is used In-Reply-To: Your message of "Thu, 17 Aug 2000 16:56:00 PDT." <200008172356.QAA11565@wera.pa.dec.com> Date: Thu, 17 Aug 2000 18:26:45 -0700 X-Mts: smtp Issac Goldstand writes: > On the other hand, suppose that the client has a cache entry for the > "A" instance of http://bar.example.net/foo.html, and it has already > received the first 900 bytes of a new instance "B" (perhaps as the > result of an aborted transfer). 
Now the client wants to receive the
> entire current instance, so it could send this request:
>
>     GET /foo.html HTTP/1.1
>     host: bar.example.net
>     If-None-Match: "A"
>     If-Range: "B"
>     A-IM: range,vcdiff
>     Range: bytes=900-
>
> In this example, as in the previous example, if Tcur = "A" then the
> server should send 304 (Not Modified), and if Tcur = "C", then the
> server should send the entire new instance, either as a 200 response
> or as a delta-encoding against instance "A".
>
> However, if Tcur = "B", in this case the server should first select
> the specified range (bytes 900 through the end) from both instances
> "A" and "B", then compute the delta encoding between these ranges
> (using vcdiff), and then transmit the result using a 226 (IM Used)
> response with an "IM:range,vcdiff" response header.

    That sounds like you're saying that the server should get the range
    of the original data ("A" or "B"), select range 900- from both and
    calculate the delta and return it. However, with deltas it would be
    extremely difficult for the client to decode the delta and know
    exactly how much of the decoded data it was missing. In the case
    above, if TCur="B", the way you wrote it above should actually not
    do what you wrote in that explanitary paragraph, but rather should
    run vdiff("A","B") take range 900- of the output of THAT and return
    a 226.

No, your description of how it should actually work describes the
previous example, where the A-IM header is

    A-IM: vcdiff, range

Remember, the server isn't allowed to do things except in the order
specified by the A-IM header!

I would agree that the example that Mike Dahlin proposed (with
"A-IM: range,vcdiff") might be somewhat unlikely, but I think you're
missing the point. The client already knows how much of instance "B"
it has (presumably, it has at least the first 900 bytes of "B"), and
it presumably has all of "A", so it shouldn't be any harder to apply a
delta to a suffix of "A" than to the whole "A" instance.
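The point about applying a delta to a suffix of "A" can be illustrated
with a rough sketch. This is not vcdiff: toy_delta and toy_apply are
hypothetical stand-ins built on Python's difflib, the function names
are invented for illustration, and the 12-byte offset stands in for
the 900 bytes of the example.

```python
from difflib import SequenceMatcher


def toy_delta(base: bytes, target: bytes) -> list:
    """Encode target as copy/insert operations against base (NOT real vcdiff)."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse base[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # literal new bytes
        # "delete": those base bytes are simply never copied
    return ops


def toy_apply(base: bytes, ops: list) -> bytes:
    """Rebuild the target from the base and the copy/insert operations."""
    out = bytearray()
    for op in ops:
        out += base[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


def server_range_then_delta(a: bytes, b: bytes, offset: int) -> list:
    """'IM: range,vcdiff': select the range from both instances, then delta."""
    return toy_delta(a[offset:], b[offset:])


def client_fill_gap(cached_a: bytes, partial_b: bytes, ops: list) -> bytes:
    """Rebuild B from the prefix already received plus the delta over suffixes."""
    offset = len(partial_b)
    return partial_b + toy_apply(cached_a[offset:], ops)


instance_a = b"<html>old title</html> shared body text"
instance_b = b"<html>new title!</html> shared body text, extended"
received = instance_b[:12]  # truncated transfer of "B"
ops = server_range_then_delta(instance_a, instance_b, len(received))
assert client_fill_gap(instance_a, received, ops) == instance_b
```

The client only needs the byte count it already holds and its cached
copy of "A"; it slices "A" at the same offset the server used and
applies the delta to that suffix.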
In summary, a client that needs to recover from a truncated response and wants to use deltas and ranges does have a choice between A-IM: vcdiff, range and A-IM: range, vcdiff Once the *client* has made that choice, the *server* cannot violate it. The server *can* choose to apply none of those ims, either one of them, or both, but if it does apply both, it must do so in the client's chosen order. -Jeff From mogul Thu Aug 17 19:16:49 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id TAA10669; Thu, 17 Aug 2000 19:16:48 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200008180216.TAA10669@wera.pa.dec.com> To: http-delta Subject: two-input instance-manipulations In-reply-to: Your message of "Tue, 08 Aug 2000 11:44:47 CDT." <399038FF.69D6E08A@cs.utexas.edu> Date: Thu, 17 Aug 2000 19:16:48 -0700 X-Mts: smtp Mike Dahlin writes, regarding this part of the spec (in section 10.5.3): The server's choice about whether to apply an instance-manipulation SHOULD be independent of its choice to apply any subsequent two-input instance-manipulations to the response. (Two-input instance-manipulations include delta-codings, because they take two different values as input. Compression and "range" instance-manipulations take only one input. Other instance-manipulations may be defined in the future.) Note: the intent of this requirement is to prevent the server from generating a delta-encoded response that the client can only decode by first applying an instance-manipulation encoding to its cached base instance. A server implementor might wish to consider what the client would logically have in its cache, when deciding which instance-manipulations to apply prior to a delta-coding. Mike writes: Is this rule trying to restrict what a server can/should send a client? It seems like that is the intent, but I'm not sure I would interpret the words that way. The notion "choose independently" doesn't seem to restrict what choices are legal. 
This wording seems to imply that a server MAY choose to do a 1-instance manipulation before a 2-instance manipulation or MAY choose not to (since it makes its choices independently, either possibility could arise, right?). So this requirement doesn't seem to change what could go over the wire and what the client has to be ready to accept. (So, we're basically back to asking the server to "do something sensible"?) Perhaps the word "independently" is too fuzzy. How about if I reword it as: The server SHOULD NOT apply a sequence of instance manipulations IM(1), IM(2), ..., IM(n) in a response if this sequence would require the client to encode its cache copy of a base instance using IM(j) before it could decode the server's subsequent application of IM(k). In particular, if the server would not have applied IM(j) without applying IM(k), and if IM(k) is a two-input instance manipulation, then the server SHOULD NOT apply IM(j) followed (whether immediately or not) by IM(k). (Two-input instance manipulations include delta-codings, because they take two different values as input. Compression and "range" instance manipulations take only one input. Other instance manipulations may be defined in the future.) Does that make sense? The sentences are a little complex, but I think they are parsable :-). After a lot of examples, Mike also writes: Since the server would have to apply both vcdiff and diff which makes no sense. So, if a client says "range", it gives up some flexibilty on what else it can say. Reasonable compromise? "A client SHOULD NOT list two two-input manipulations before a range manipulation in an A-IM header. A server receiving such a header SHOULD ignore the range manipulation." (Corner case: a client could still say A-IM: diff,gzip,range,vcdiff as long as it can handle either gzip(A) or A as an input to vcdiff). I don't think this is sufficient to solve the problem we were concerned with. 
Consider this series of requests and responses (which might not be the most uncontrived example, but I'm in a hurry to get home for dinner): Client sends GET /foo.html HTTP/1.1 host: example.com A-IM: gzip Server replies HTTP/1.1 200 OK ETag: "A" Date: Tue, 25 Nov 1997 18:30:05 GMT I.e., the server decided for some reason not to use compression as an instance manipulation. Then the client wants to update its cache entry: GET /foo.html HTTP/1.1 If-None-Match: "A" host: example.com A-IM: diffe,gzip,vcdiff The server could reply HTTP/1.1 226 IM Used ETag: "B" Delta-base: "A" Date: Tue, 25 Nov 1997 18:30:05 GMT IM: diffe, gzip which would be OK. but suppose the server instead sends: HTTP/1.1 226 IM Used ETag: "B" Delta-base: "A" Date: Tue, 25 Nov 1997 18:30:05 GMT IM: gzip,vcdiff Now the client needs to generate GZIP(A) before it can decode the body of the response. I suppose we might be able to work out some language that forces the client in this example to send A-IM: vcdiff,diffe,gzip since, although this allows the nonsensical (vcdiff+gzip) combination, this is one that the client really ought to be able to decode. Let me think about that. -Jeff From mogul@pa.dec.com Fri Aug 18 11:44:23 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA06874; Fri, 18 Aug 2000 11:44:22 -0700 (PDT) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA27658; Fri, 18 Aug 2000 11:44:22 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA05270; Fri, 18 Aug 2000 11:44:22 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200008181844.LAA05270@wera.pa.dec.com> To: http-delta@pa.dec.com Subject: Re: two-input instance-manipulations In-Reply-To: Your message of "Thu, 17 Aug 2000 19:16:48 PDT." 
<200008180216.TAA10669@wera.pa.dec.com> Date: Fri, 18 Aug 2000 11:44:22 -0700 X-Mts: smtp Last night, I wrote: I suppose we might be able to work out some language that forces the client in this example to send A-IM: vcdiff,diffe,gzip since, although this allows the nonsensical (vcdiff+gzip) combination, this is one that the client really ought to be able to decode. Let me think about that. As soon as I sent that and left work, I realized that a generalization of this is probably the right approach - that is, remove the requirement that the server figure out what is useful to the client, and a requirement on the client (implementor) to send an A-IM header that cannot lead to a bad result. That means (1) REMOVE this from 10.5.3 (A-IM): The server's choice about whether to apply an instance-manipulation SHOULD be independent of its choice to apply any subsequent two-input instance-manipulations to the response. (Two-input instance-manipulations include delta-codings, because they take two different values as input. Compression and "range" instance-manipulations take only one input. Other instance-manipulations may be defined in the future.) along with the note that follows it. (2) ADD something like this A client SHOULD NOT, in its A-IM header field, list a sequence of instance manipulations such that it would be unable to decode the result of any order-preserving sub-sequence of that sequence. Note: the intent of this requirement is to allow the server to apply any sequence of instance manipulations consistent with the A-IM header, without thereby sending a message that the client would be unable to decode. For example, if the client sends "A-IM: gzip, vcdiff" but does not currently have a compressed copy of the base instance in its cache, and is not able to apply the gzip algorithm to its cached base instance, then the server's choice to compress the inputs to vcdiff would result in a response the client could not decode. 
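The phrase "order-preserving sub-sequence" can be made concrete with a
brute-force sketch. It assumes, as a simplification not stated in the
draft, that a response is undecodable only when a one-input
manipulation the client cannot re-apply to its cached base instance
precedes a delta-coding. The function names and the can_reencode_base
parameter are hypothetical, and the enumeration is exponential in the
length of the A-IM list.

```python
from itertools import combinations


def order_preserving_subsequences(a_im):
    """Yield every non-empty subsequence of the A-IM list, in listed order."""
    for n in range(1, len(a_im) + 1):
        for combo in combinations(a_im, n):  # combinations keeps input order
            yield list(combo)


def undecodable_subsequences(a_im, two_input, can_reencode_base):
    """Return the subsequences a server might apply that the client cannot decode.

    Assumption for this sketch: trouble arises only when a one-input
    manipulation that the client cannot apply to its cached base
    instance comes before a two-input (delta) manipulation.
    """
    bad = []
    for seq in order_preserving_subsequences(a_im):
        for i, im in enumerate(seq):
            if im not in two_input:
                continue
            earlier_one_input = [p for p in seq[:i] if p not in two_input]
            if any(p not in can_reencode_base for p in earlier_one_input):
                bad.append(seq)
                break
    return bad


deltas = {"diffe", "vcdiff"}
# Client with no gzip encoder and no compressed copy of the base instance:
print(undecodable_subsequences(["diffe", "gzip", "vcdiff"], deltas, set()))
# [['gzip', 'vcdiff'], ['diffe', 'gzip', 'vcdiff']]
print(undecodable_subsequences(["vcdiff", "diffe", "gzip"], deltas, set()))
# []
```

Under these assumptions a client could run such a check before sending
an A-IM header and reorder the list when the result is non-empty.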
Warning: not all implementations of algorithms such as gzip will produce identical output for a given input, so even a client implementation equipped with a gzip encoder (for example) might not be able to exactly duplicate the server's gzip-encoding of an instance. Thanks to Clifford Heath for pointing out that unfortunate property of algorithms such as gzip. I think this formulation of the requirement (putting the onus on the client to ask only for things that it can understand) simplifies the specification considerably. (We no longer have to explictly discuss "two-input" instance manipulations, or treat "range" specially.) On the other hand, it does lead to some potentially interesting implementation issues for the client - if it sends a long list of IMs in its A-IM header, there are lots of possible sub-sequences to worry about. Presumably, the implementation does not have to enumerate and test the entire set of possible sub-sequences, but I haven't come up with a simple decision algorithm. Thanks to Mike Dahlin for prodding me in this direction. And if anyone can find a counter-example (i.e., some problem that this new rule doesn't solve), please speak up ASAP. -Jeff From mogul Sat Aug 19 10:28:56 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA13966; Sat, 19 Aug 2000 10:28:56 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200008191728.KAA13966@wera.pa.dec.com> To: http-delta Subject: First draft of I-D on Clusters/Templates Date: Sat, 19 Aug 2000 10:28:56 -0700 X-Mts: smtp I've finished a first draft of an Internet-Draft on "HTTP Delta Clusters and Templates", available for now as: ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-dcluster-00.18aug2000.txt Please review it ASAP if you have an interest; I'd like to submit it to the IETF by the 25th, before I leave for SIGCOMM. Places where I'm sure we need attention are marked with "XXX". 
Thanks -Jeff From mogul Mon Aug 21 14:28:28 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA31928; Mon, 21 Aug 2000 14:28:28 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200008212128.OAA31928@wera.pa.dec.com> To: http-delta Subject: Another URL for preview draft of Clusters/Templates document Date: Mon, 21 Aug 2000 14:28:28 -0700 X-Mts: smtp Some people have had trouble with the ftp: URL I gave out the other day. It works for me, but instead please try: http://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-dcluster-00.18aug2000.txt -Jeff From koen@win.tue.nl Wed Aug 23 23:52:53 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id XAA11858; Wed, 23 Aug 2000 23:52:53 -0700 (PDT) Received: from ztxmail01.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA13246; Wed, 23 Aug 2000 23:52:52 -0700 Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345) id 70346245F; Thu, 24 Aug 2000 01:52:52 -0500 (CDT) Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157]) by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 5CA772523; Thu, 24 Aug 2000 01:52:51 -0500 (CDT) Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3) id IAA04763. 
Thu, 24 Aug 2000 08:52:49 +0200 (MET DST) From: koen@win.tue.nl (Koen Holtman) Message-Id: <200008240652.IAA04763@wsooti09.win.tue.nl> Subject: Re: First draft of I-D on Clusters/Templates In-Reply-To: <200008191728.KAA13966@wera.pa.dec.com> from Jeffrey Mogul at "Aug 19, 2000 10:28:56 am" To: mogul@pa.dec.com (Jeffrey Mogul) Date: Thu, 24 Aug 2000 08:52:49 +0200 (MET DST) Cc: http-delta@pa.dec.com X-Mailer: ELM [version 2.4ME+ PL43 (25)] Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit >I've finished a first draft of an Internet-Draft on >"HTTP Delta Clusters and Templates", available for now as: > >ftp://ftp.digital.com/pub/DEC/WRL/mogul/draft-mogul-http-dcluster-00.18aug2000.txt > >Please review it ASAP if you have an interest; I'd like >to submit it to the IETF by the 25th, before I leave for >SIGCOMM. > >Places where I'm sure we need attention are marked with "XXX". Hi Jeff, Because of time constraints on both sides I only did a very quick scan of this draft. Preliminary conclusion: I did not find any internal inconsistencies but (and this should not surprise you, given the discussion in we had April) the anti-spoofing requirements/mechanisms described in this new draft are again much too weak for my taste. The new (?) "hostport" spoofing detection mechanisms you describe rely on having a trust relation that spans all content under a single "hostport". Such a relation won't always exist, as you mention too in the 'security considerations', so these "hostport" mechanisms are not strong enough for me. Concerning this part of the draft: # Therefore, a client MUST NOT use condition # #3 above (DCluster of a prior response for X includes prefix of # Request-URI) unless it can securely verify that a resulting delta is # not spoofed. 
I can't see right now if excluding condition 3 alone above gives a watertight guarantee that further spoofing verification is not needed -- is is possible that some interpretation of rule 4 also create a leak. I'm not sure, I need to stare at this more. That is all I have for now. I suggest you go ahead and submit the current draft as a formal Internet-Draft. I hope we'll see some feedback on this list that indicates whether or not I am alone in wanting stronger anti-spoofing measures. > >Thanks >-Jeff Koen. From danielh@crosslink.net Thu Aug 24 09:45:16 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id JAA13696; Thu, 24 Aug 2000 09:45:15 -0700 (PDT) From: Received: from zmamail01.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA10711; Thu, 24 Aug 2000 09:45:15 -0700 Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345) id A182D228F; Thu, 24 Aug 2000 12:45:14 -0400 (EDT) Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by zmamail01.zma.compaq.com (Postfix) with ESMTP id 4FC9420C7 for ; Thu, 24 Aug 2000 12:45:14 -0400 (EDT) Received: from danielh (z_a082.ers.usda.gov [151.121.64.82] (may be forged)) by lycanthrope.crosslink.net (8.9.3/) with SMTP id MAA20241 for ; Thu, 24 Aug 2000 12:45:13 -0400 Message-Id: <200008241645.MAA20241@lycanthrope.crosslink.net> X-Really-To: Date: Thu, 24 Aug 2000 12:14:02 -0300 To: http-delta@pa.dec.com Subject: On assigning cached responses to a dcluster. X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.10a c10 This somewhat pedantic outline is meant to clarify how one might implement Dcluster and DTemplate. It is meant as an experiment that might highlight possible misunderstandings. Personally, I advocate including something like this outline in the rfc, but I won't insist on it. ------------------------------------------------------------ On assigning cached responses to a dcluster. 
In my reading of draft-mogul-http-dcluster-00.txt, the means by which
items are associated with a DCluster remained vague. I think an
outline of a possible client-side algorithm might be useful. Hence,
consider the following:

1) The client maintains a "DCluster-table". Each entry in this table
   is initialized by a DCluster response header, and uses a
   "Dcluster-prefix" as an identifier.

2) Each entry in this table contains a list of cached responses that
   may be used as a delta base. Each cached response consists of the
   body of a response (or a pointer to a cache containing the body of
   a response), and its etag.

   For example, a request:

       GET /foo?p=1 HTTP/1.1
       Host: bar.example.net

   yielding:

       HTTP/1.1 200 OK
       Date: Sun, 06 Nov 1994 08:49:37 GMT
       Etag: "abc"
       DCluster: "//bar.example.net/foo?"

   Upon receipt of this response, the client could create a new
   DCluster-table entry that would be:
     a) identified using a Dcluster-prefix ("//bar.example.net/foo?")
   and would contain:
     b) the response's content (or a pointer to a cached version of
        this content)
     c) the etag ("abc")

3) For all future requests, the request-uri is compared against the
   Dcluster-prefixes in the DCluster-table. If a Dcluster-prefix
   matches (that is, is a prefix match of) the request-uri, then the
   client may include the entry's etag(s) in an If-None-Match: request
   header (implying that the client can use the appropriate content to
   delta-decode a response from the server).

The next point complicates the story.

4) Upon receipt of all responses, including responses that do NOT
   contain a DCluster response header, the client may:
     a) Compare the request-uri to each entry in the DCluster-table.
     b) If a Dcluster-prefix matches the request-uri, then the
        content, and etag, of the current response are added to the
        list of cached responses.

   Note that it is possible (though perhaps inefficient) to have
   multiple matches -- which may occur when a Dcluster-prefix is a
   prefix of another Dcluster-prefix.
   For example:
       "//bar.example.net/foo?"
   is a prefix of
       "//bar.example.net/foo?action=quote"

   Consider the following example. A request:

       GET /foo?p=1 HTTP/1.1
       Host: bar.example.net

   yields a response:

       HTTP/1.1 200 OK
       Date: Sun, 06 Nov 1994 08:49:37 GMT
       Etag: "abc"
       DCluster: "//bar.example.net/foo?"

       Response to p=1

   followed by:

       GET /foo?r=1 HTTP/1.1
       Host: bar.example.net

   yielding:

       HTTP/1.1 200 OK
       Date: Sun, 06 Nov 1994 08:49:37 GMT
       Etag: "def"

       Second-response, r=1

   and lastly:

       GET /foo?s=1 HTTP/1.1
       Host: bar.example.net

   yielding:

       HTTP/1.1 200 OK
       Date: Sun, 06 Nov 1994 08:49:37 GMT
       Etag: "ghi"

       s=1 response

   Upon receiving the first response, the client would create a
   DCluster-table entry with:

       Dcluster-prefix: "//bar.example.net/foo?"
       Etag(1)="abc"    Content(1)=Response to p=1

   Upon receipt of the second and third responses, the client would
   modify the above, yielding:

       Dcluster-prefix: "//bar.example.net/foo?"
       Etag(1)="abc"    Content(1)=Response to p=1
       Etag(2)="def"    Content(2)=Second-response, r=1
       Etag(3)="ghi"    Content(3)=s=1 response

   The client could then make a request:

       GET /foo?t=1 HTTP/1.1
       Host: bar.example.net
       If-None-Match: "abc","def","ghi"
       A-IM: diffe

   Note that the server, by specifying a DCluster header in its
   response to /foo?p=1, is declaring that all subsequent responses to
   request-uris that match "foo?" will have unique etags. That is, the
   server is establishing a uniqueness scope defined by the "foo?"
   prefix.

DTemplate is supported in a similar fashion, using a DTemplate-table.

5) If a DTemplate is received, the client will (as soon as convenient)
   request the DTemplate, and store its content (and its etag) in the
   DTemplate-table. Each entry in the DTemplate-table is identified by
   a "DTemplate-URI" equal to the request-URI (that is, the identifier
   is NOT a prefix).

6) For all future requests, the request-uri is compared against the
   DTemplate-URIs in the DTemplate-table.
If a DTemplate-URI matches (that is, is an exact match of) the request-uri, then the client may use include the entry's etag in an If-None-Match: request header. Of cousre, the client may check both the Dcluster-table and the Dtemplate-table; the presumption being that the Dtemplate-table will yield better matches. When both a DCluster and a DTemplate are recieved, then the client should do steps 1, 2, 5, and 6. In addition, the etag and the content of the response from the request for the DTemplate should be added to the DCluster-table. For example: A request: GET /foo?p=1 HTTP/1.1 Host: bar.example.net yielding: HTTP/1.1 200 OK Date: Sun, 06 Nov 1994 08:49:37 GMT Etag: "abc" DCluster: "//bar.example.net/foo?" DTemplate: "http://bar.example.net/foo.tplt" This is my p=1 response The client would create a DCluster-table entry of: Dcluster-prefix: "//bar.example.next/foo?" Etag(1)="abc" Content(1)=First-response The client would then request: GET /foo.tplt HTTP/1.1 Host: bar.example.net yielding: HTTP/1.1 200 OK Date: Sun, 06 Nov 1994 08:49:37 GMT Etag: "tabc" This is a template The client would create a DTemplate-table entry of: DTemplate-URI: "//bar.example.next/foo?" Etag(1)="abc" Content(1)=First-response and would modify the DCluster-table, yielding: Dcluster-prefix: "//bar.example.next/foo?" Etag(1)="abc" Content(1)=First-response Etag(2)="tabc" Content(2)=This is a template The client might also wish to mark the (2) entry as having priority -- so that should the list for "//bar.example.next/foo?" become long and require trimming, items associated with DTemplates will be preferentially retained. 
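The bookkeeping in steps 1 through 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything taken from the draft: the names DClusterTable, CachedResponse, record_response, candidate_etags and if_none_match are hypothetical; URIs are assumed to be already normalized to the scheme-less "//host/path?query" form used in the DCluster header; and list trimming, DTemplate handling, and any spoofing checks are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CachedResponse:
    etag: str       # entity tag, without the surrounding quotes
    content: bytes  # response body (a real client might keep a cache pointer)


@dataclass
class DClusterTable:
    # Hypothetical structure: maps each Dcluster-prefix to the cached
    # responses that may later serve as delta bases.
    entries: Dict[str, List[CachedResponse]] = field(default_factory=dict)

    def record_response(self, uri: str, etag: str, content: bytes,
                        dcluster: Optional[str] = None) -> None:
        """Update the table after a response arrives (steps 1, 2 and 4)."""
        if dcluster is not None:
            # Steps 1-2: a DCluster header initializes an entry for its prefix.
            self.entries.setdefault(dcluster, [])
        for prefix, cached in self.entries.items():
            # Step 4: every entry whose prefix matches the request-uri gets
            # the response, whether or not it carried a DCluster header.
            if uri.startswith(prefix) and all(c.etag != etag for c in cached):
                cached.append(CachedResponse(etag, content))

    def candidate_etags(self, uri: str) -> List[str]:
        """Collect etags to offer for a new request (step 3)."""
        etags: List[str] = []
        for prefix, cached in self.entries.items():
            if uri.startswith(prefix):
                for c in cached:
                    if c.etag not in etags:  # nested prefixes can overlap
                        etags.append(c.etag)
        return etags


def if_none_match(etags: List[str]) -> str:
    """Format etags as the value of an If-None-Match request header."""
    return ", ".join('"%s"' % e for e in etags)


# Replaying the three-response example from the message above.
table = DClusterTable()
table.record_response("//bar.example.net/foo?p=1", "abc", b"Response to p=1",
                      dcluster="//bar.example.net/foo?")
table.record_response("//bar.example.net/foo?r=1", "def", b"Second-response, r=1")
table.record_response("//bar.example.net/foo?s=1", "ghi", b"s=1 response")
print(if_none_match(table.candidate_etags("//bar.example.net/foo?t=1")))
```

Keeping one list per prefix mirrors the table layout in the examples; a client that wanted to prefer DTemplate-derived entries when trimming could add a priority flag to CachedResponse.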
-----------------------------------------------------------
Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org
-----------------------------------------------------------

From danielh@crosslink.net Thu Aug 24 09:53:03 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id JAA25508; Thu, 24 Aug 2000 09:53:03 -0700 (PDT)
From:
Received: from ztxmail01.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA32620; Thu, 24 Aug 2000 09:53:03 -0700
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345) id B637E100D; Thu, 24 Aug 2000 11:53:02 -0500 (CDT)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 5FAA911F7 for ; Thu, 24 Aug 2000 11:53:02 -0500 (CDT)
Received: from danielh (z_a082.ers.usda.gov [151.121.64.82] (may be forged)) by lycanthrope.crosslink.net (8.9.3/) with SMTP id MAA22462 for ; Thu, 24 Aug 2000 12:53:01 -0400
Message-Id: <200008241653.MAA22462@lycanthrope.crosslink.net>
X-Really-To:
Date: Thu, 24 Aug 2000 12:52:08 -0300
To: http-delta@pa.dec.com
Subject: spoofing
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.10a c10

Regarding dcluster and spoofing...

It seems that Jeff is placing great hope on instance-digests, whereas Koen is reluctant to depend on this extra info.

I'm wondering if a compromise would be to define a Delta-Uri header, which the server can use to indicate the URI associated with a Delta-base; the server would only use this when this URI is NOT the same as the request-URI.

For example, a request:

   GET /hello.html HTTP/1.1
   Host: bar.org

yields:

   HTTP/1.1 200 OK
   Date: Sun, 06 Nov 1994 08:49:37 GMT
   Etag: "bcd"
   DCluster: "//foo.net/hello"

And then,

   GET /hello.html HTTP/1.1
   Host: foo.net

yields:

   HTTP/1.1 200 OK
   Date: Sun, 06 Nov 1994 08:49:37 GMT
   Etag: "fgh"

Later...

   GET /hello.html HTTP/1.1
   Host: foo.net
   A-IM: vcdiff
   If-None-Match: "bcd", "fgh"

could yield:

   HTTP/1.1 200 OK
   Date: Sun, 06 Nov 1994 08:49:37 GMT
   Etag: "fgh2"
   Delta-base: "fgh"

or

   HTTP/1.1 200 OK
   Date: Sun, 06 Nov 1994 08:49:37 GMT
   Etag: "bcd2"
   Delta-base: "bcd"
   Delta-Uri: "//bar.org/hello.html"

-----------------------------------------------------------
Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org
-----------------------------------------------------------

From koen@win.tue.nl Fri Aug 25 11:33:14 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA05133; Fri, 25 Aug 2000 11:33:13 -0700 (PDT)
Received: from zmamail01.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA06679; Fri, 25 Aug 2000 11:33:13 -0700
Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345) id 7D7E92196; Fri, 25 Aug 2000 14:33:12 -0400 (EDT)
Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157]) by zmamail01.zma.compaq.com (Postfix) with ESMTP id 14EBA211A for ; Fri, 25 Aug 2000 14:33:12 -0400 (EDT)
Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3) id UAA06635. Fri, 25 Aug 2000 20:33:03 +0200 (MET DST)
From: koen@win.tue.nl (Koen Holtman)
Message-Id: <200008251833.UAA06635@wsooti09.win.tue.nl>
Subject: Re: spoofing
In-Reply-To: <200008241653.MAA22462@lycanthrope.crosslink.net> from "danielh@crosslink.net" at "Aug 24, 2000 12:52: 8 pm"
To: danielh@crosslink.net
Date: Fri, 25 Aug 2000 20:33:02 +0200 (MET DST)
Cc: http-delta@pa.dec.com
X-Mailer: ELM [version 2.4ME+ PL43 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

>
>
>Regarding dcluster and spoofing...
>
>It seems that Jeff is placing great hope on instance-digests,
>whereas Koen is reluctant to depend on this extra info.
[...]

Just a quick clarification here -- I don't have the time now to study the rest of your message.
On instance digests: I believe that as a method of spoofing prevention they are strong enough. But I don't know if implementers would find them too heavy to use -- feedback would be appreciated on this.

My security problem with the draft is that it does not currently _require_ the use of a strong-enough spoofing prevention mechanism like instance-digests.

Koen.

From danielh@crosslink.net Fri Aug 25 13:07:42 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA00558; Fri, 25 Aug 2000 13:07:42 -0700 (PDT)
From:
Received: from zmamail01.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA01010; Fri, 25 Aug 2000 13:07:42 -0700
Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345) id 3E571215B; Fri, 25 Aug 2000 16:07:40 -0400 (EDT)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by zmamail01.zma.compaq.com (Postfix) with ESMTP id C5E3A2048 for ; Fri, 25 Aug 2000 16:07:39 -0400 (EDT)
Received: from danielh (z_a082.ers.usda.gov [151.121.64.82] (may be forged)) by lycanthrope.crosslink.net (8.9.3/) with SMTP id QAA05491 for ; Fri, 25 Aug 2000 16:07:39 -0400
Message-Id: <200008252007.QAA05491@lycanthrope.crosslink.net>
X-Really-To:
Date: Fri, 25 Aug 2000 15:59:38 -0300
To: http-delta@pa.dec.com
In-Reply-To: <200008251833.UAA06635@wsooti09.win.tue.nl>
Subject: Re: spoofing
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.10a c10

>>Regarding dcluster and spoofing...
>>
>>It seems that Jeff is placing great hope on instance-digests,
>>whereas Koen is reluctant to depend on this extra info.
>[...]
>Just a quick clarification here -- I don't have the time now to study the
>rest of your message.
>On instance digests: I believe that as a method of spoofing prevention
>they are strong enough. But I don't know if implementers would find them
>too heavy to use -- feedback would be appreciated on this.

Basically that is what I meant. Considering that Content-MD5 response headers are rare, it is likely that instance digests will be affected by the same factors (such as the desire to avoid computation of a digest), and hence will also tend not to be computed.

>My security problem with the draft is that it does not currently
>_require_ the use of a strong-enough spoofing prevention mechanism like
>instance-digests.

My 'umble proposal is meant to be a simple mechanism that SHOULD be used when a base instance is not a prior instance of the request-uri. Perhaps it's not as strong or as elegant as instance-digests, but it's probably good enough (and cheap).

-----------------------------------------------------------
Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org
-----------------------------------------------------------

From mogul Fri Aug 25 15:33:12 2000
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA26412; Fri, 25 Aug 2000 15:33:12 -0700 (PDT)
From: Jeffrey Mogul
Message-Id: <200008252233.PAA26412@wera.pa.dec.com>
To: http-delta
Subject: Internet-Drafts have been submitted
Date: Fri, 25 Aug 2000 15:33:12 -0700
X-Mts: smtp

I submitted these two drafts to the Internet-Drafts editor yesterday:

   draft-mogul-http-delta-06.txt
   draft-mogul-http-dcluster-00.txt

I know they have been received (because the IETF folks pointed out that I initially gave the first one the wrong number), but they haven't yet been posted to the IETF's server. Presumably, this will happen sometime early next week.

I know that Koen and others have already commented on the security mechanisms in draft-mogul-http-dcluster-00.txt. I'm too busy getting ready for my trip to SIGCOMM to even read those, but I gather that we have some more discussions ahead of us.

However, I would like to reach closure on draft-mogul-http-delta-06.txt as soon as possible (and before putting a lot of effort into the clusters/templates stuff), so I encourage people to focus on that document first.
As far as I know, it currently has no unresolved issues; if this is still true in two weeks, I will issue a "Last Call" for comments on the HTTP-WG mailing list, and (assuming no problems) two weeks after that, I will ask the IESG to approve this as a Proposed Standard. After that, I will try to devote the necessary energy to the clusters/templates document. But I figured it should be out there for everyone to read, even in its current not-quite-final form. Thanks -Jeff From koen@win.tue.nl Thu Aug 31 00:06:25 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id AAA23204; Thu, 31 Aug 2000 00:06:24 -0700 (PDT) Received: from ztxmail01.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA29340; Thu, 31 Aug 2000 00:06:24 -0700 Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345) id D6BF7273F; Thu, 31 Aug 2000 02:06:23 -0500 (CDT) Received: from wsooti09.win.tue.nl (wsooti09.win.tue.nl [131.155.70.157]) by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 07CCE273E for ; Thu, 31 Aug 2000 02:06:23 -0500 (CDT) Received: from koen@localhost by wsooti09.win.tue.nl (8.9.3) id JAA01303. Thu, 31 Aug 2000 09:06:19 +0200 (MET DST) From: koen@win.tue.nl (Koen Holtman) Message-Id: <200008310706.JAA01303@wsooti09.win.tue.nl> Subject: Re: spoofing In-Reply-To: <200008241653.MAA22462@lycanthrope.crosslink.net> from "danielh@crosslink.net" at "Aug 24, 2000 12:52: 8 pm" To: danielh@crosslink.net Date: Thu, 31 Aug 2000 09:06:19 +0200 (MET DST) Cc: http-delta@pa.dec.com X-Mailer: ELM [version 2.4ME+ PL43 (25)] Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Daniel Hellerstein: > > >Regarding dcluster and spoofing... > >It seems that Jeff is placing great hope on instance-digests, >wheras Koen is reluctant to depend on this extra info. 
> >I'm wondering if a compromise would be to define a Delta-Uri >header, which the server can use to indicate the URI associated >with a Delta-base; the server would only use this when this uri >is NOT the same as the request-URI. > >For example: [example deleted] If I understand your example correctly, the idea is that the recipient of a response with a delta-uri MUST ONLY apply this response to a base instance obtained from the Delta-Uri. Yes, I think that this proposal provides the watertight anti-spoofing protection that I want. In fact I believe this Delta-Uri proposal is similar to, but less complex than, the proposal I made back in April. In my proposal the response would have a Dcluster or Dtemplate with the same function as the delta-uri here. In both cases, the basic idea is that the resource A sending the delta response includes a means for the client to check that the resource B that sent the base instance is in the trust domain of A. >Daniel Hellerstein Koen. From mogul Tue Oct 3 17:03:05 2000 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id RAA07205; Tue, 3 Oct 2000 17:03:05 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200010040003.RAA07205@wera.pa.dec.com> To: http-delta Subject: I've requested Proposed Standard status for HTTP delta encoding Date: Tue, 03 Oct 2000 17:03:05 -0700 X-Mts: smtp I sent a Last Call to the HTTP WG mailing list two weeks ago, asking for any comments on draft-mogul-http-delta-06.txt. Dave Kristol sent some grammatical corrections, but otherwise nobody responded. Therefore, I just sent a message to the IESG asking for Proposed Standard status for draft-mogul-http-delta-07.txt draft-mogul-http-digest-02.txt draft-mogul-http-delta-07.txt is draft-mogul-http-delta-06.txt with grammatical corrections; draft-mogul-http-digest-02.txt is draft-mogul-http-digest-01.txt resubmitted because the previous version has expired. 
We probably also need to resubmit draft-korn-vcdiff-01.txt (as it has also expired), but Phong Vo has asked for a little time to make some changes. I'm not sure I've followed all of the IETF's required procedures properly; it might be necessary to wait two more weeks after the resubmission date of the digest draft, and it might possibly be necessary to wait for the new VCDIFF draft. As a reminder, while these documents might soon appear as RFCs, that does NOT mean that people should rush to widely deploy this protocol. As it says in RFC2026, Implementors should treat Proposed Standards as immature specifications. It is desirable to implement them in order to gain experience and to validate, test, and clarify the specification. However, since the content of Proposed Standards may be changed if problems are found or better solutions are identified, deploying implementations of such standards into a disruption-sensitive environment is not recommended. We would like to see implementations (which are necessary before we can get Draft Standard status), but please don't deploy anything except in experimental settings. Thanks -Jeff From aking@internet.com Fri Oct 20 07:31:15 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id HAA16036; Fri, 20 Oct 2000 07:31:15 -0700 (PDT) Received: from ztxmail01.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA25978; Fri, 20 Oct 2000 07:31:15 -0700 Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345) id E2F0D16E8; Fri, 20 Oct 2000 09:31:14 -0500 (CDT) Received: from hermes.kos.net (hermes.kos.net [216.13.25.100]) by ztxmail01.ztx.compaq.com (Postfix) with SMTP id 613DB413E for ; Fri, 20 Oct 2000 09:31:14 -0500 (CDT) Received: (qmail 24896 invoked from network); 20 Oct 2000 14:34:10 -0000 Received: from mki5-pl-ri4.kos.net (HELO ?216.13.27.196?) 
(216.13.27.196) by hermes.kos.net with SMTP; 20 Oct 2000 14:34:10 -0000 X-Sender: aking@mailhost.iworld.com (Unverified) Message-Id: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Fri, 20 Oct 2000 10:54:00 -0400 To: http-delta@pa.dec.com From: Andy King Subject: delta compression i'm doing a story on delta compression for internet.com pls give me a summary of the status of it (fred douglis referred me to you). what live implementations do you know of? i know of two xosoft.com and http://linuxcare.com.au/rproxy/ know any others? what kind of improvement can i expect over mod_gzip? (remotecommunications.com) thanks Andrew B. King internet.com Corp. andrew@internet.com http://www.internet.com Managing Editor 2020 Hogback Rd. STE #4 http://www.webreference.com Ann Arbor, MI 48105 http://www.javascript.com 734.971.7906 v 734.975.9184 x From aking@internet.com Wed Nov 15 07:24:50 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id HAA23121; Wed, 15 Nov 2000 07:24:49 -0800 (PST) Received: from ztxmail01.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA03666; Wed, 15 Nov 2000 07:24:49 -0800 Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345) id 0CBD22125; Wed, 15 Nov 2000 09:24:49 -0600 (CST) Received: from mailhost.iworld.com (unknown [63.95.15.3]) by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 18CA51FC1 for ; Wed, 15 Nov 2000 09:24:44 -0600 (CST) Received: by mailhost.iworld.com; id KAA27918; Wed, 15 Nov 2000 10:24:43 -0500 (EST) Received: from nodnsquery(10.1.4.47) by darienfw1.iworld.com via smap (V5.5) id xma027748; Wed, 15 Nov 00 10:23:50 -0500 Received: from [10.1.26.57] by schubert.iworld.com (Netscape Messaging Server 3.6) with ESMTP id AAA1C63 for ; Wed, 15 Nov 2000 10:23:47 -0500 X-Sender: aking@mailhost.iworld.com (Unverified) Message-Id: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Wed, 15 Nov 2000 11:47:29 -0400 
To: http-delta@pa.dec.com From: "King, Andy" Subject: delta encoding article all, jeffrey mogul has graciously written for us an intro to delta encoding at: http://webref.com/internet/software/servers/http/deltaencoding/intro/ appreciate any feedback you have, tx. short desc follows: What is HTTP Delta Encoding? By sending just the differences between old and new pages, Web caching and load times can be dramatically improved. By Jeffrey Mogul. Andrew B. King internet.com Corp. andrew@internet.com http://www.internet.com Managing Editor 2020 Hogback Rd. STE #4 http://www.webreference.com Ann Arbor, MI 48105 http://www.javascript.com 734.971.7906 v 734.975.9184 x From issac@p-roman.jct.ac.il Thu Dec 7 12:06:21 2000 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id MAA15746; Thu, 7 Dec 2000 12:06:20 -0800 (PST) Received: from zmamail02.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA31760; Thu, 7 Dec 2000 12:06:19 -0800 Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345) id B262556F0; Thu, 7 Dec 2000 15:06:18 -0500 (EST) Received: from mail.jct.ac.il (mail.jct.ac.il [147.161.1.14]) by zmamail02.zma.compaq.com (Postfix) with ESMTP id 10569561F; Thu, 7 Dec 2000 15:06:17 -0500 (EST) Received: from p-roman.jct.ac.il (p-roman.jct.ac.il [147.161.5.104]) by mail.jct.ac.il (8.10.1/8.10.1) with ESMTP id eB7K7DF10807; Thu, 7 Dec 2000 22:07:13 +0200 (IST) Received: from localhost (issac@localhost) by p-roman.jct.ac.il (8.9.3/8.8.7) with ESMTP id WAA26427; Thu, 7 Dec 2000 22:07:00 +0200 Date: Thu, 7 Dec 2000 22:07:00 +0200 (IST) From: Issac Goldstand To: Jeffrey Mogul Cc: http-delta@pa.dec.com Subject: Re: I've requested Proposed Standard status for HTTP delta encoding In-Reply-To: <200010040003.RAA07205@wera.pa.dec.com> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: issac@p-roman.jct.ac.il Isn't this taking a slightly long time to make RFC status??? 
Or have I been missing some recent activity? On Tue, 3 Oct 2000, Jeffrey Mogul wrote: > I sent a Last Call to the HTTP WG mailing list two weeks ago, asking > for any comments on draft-mogul-http-delta-06.txt. Dave Kristol > sent some grammatical corrections, but otherwise nobody responded. > > Therefore, I just sent a message to the IESG asking for Proposed > Standard status for > draft-mogul-http-delta-07.txt > draft-mogul-http-digest-02.txt > > draft-mogul-http-delta-07.txt is draft-mogul-http-delta-06.txt > with grammatical corrections; draft-mogul-http-digest-02.txt > is draft-mogul-http-digest-01.txt resubmitted because the > previous version has expired. > > We probably also need to resubmit draft-korn-vcdiff-01.txt > (as it has also expired), but Phong Vo has asked for a little > time to make some changes. > > I'm not sure I've followed all of the IETF's required procedures > properly; it might be necessary to wait two more weeks after > the resubmission date of the digest draft, and it might possibly > be necessary to wait for the new VCDIFF draft. > > As a reminder, while these documents might soon appear as > RFCs, that does NOT mean that people should rush to widely > deploy this protocol. As it says in RFC2026, > > Implementors should treat Proposed Standards as immature > specifications. It is desirable to implement them in order to gain > experience and to validate, test, and clarify the specification. > However, since the content of Proposed Standards may be changed if > problems are found or better solutions are identified, deploying > implementations of such standards into a disruption-sensitive > environment is not recommended. > > We would like to see implementations (which are necessary before > we can get Draft Standard status), but please don't deploy anything > except in experimental settings. > > Thanks > -Jeff > -- Internet is a wonderful mechanism for making a fool of yourself in front of a very large audience. 
   --Anonymous

Moving the mouse won't get you into trouble... Clicking it might.
   --Anonymous

PGP Key 0xE0FA561B - Fingerprint: 7E18 C018 D623 A57B 7F37 D902 8C84 7675 E0FA 561B

From mogul@pa.dec.com Thu Dec 7 14:24:05 2000
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA06599; Thu, 7 Dec 2000 14:24:04 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA25790; Thu, 7 Dec 2000 14:24:04 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA23587; Thu, 7 Dec 2000 14:24:02 -0800 (PST)
From: Jeffrey Mogul
Message-Id: <200012072224.OAA23587@wera.pa.dec.com>
To: Issac Goldstand
Cc: http-delta@pa.dec.com
Subject: Re: I've requested Proposed Standard status for HTTP delta encoding
In-Reply-To: Your message of "Thu, 07 Dec 2000 22:07:00 +0200."
Date: Thu, 07 Dec 2000 14:24:02 -0800
X-Mts: smtp

    Isn't this taking a slightly long time to make RFC status???
    Or have I been missing some recent activity?

It's a mystery to me. On October 3, 2000, I sent the IETF application area directors a request to consider draft-mogul-http-delta-07.txt as a Proposed Standard. I got no response, so I resubmitted the request on November 22, 2000.

I was actually going to send them another message yesterday, after waiting two weeks, but there was a small flood of IESG actions on the IETF-Announce mailing list, so I decided to wait. But today brought no new IETF-Announce messages, so I have once again sent email to the area directors.

If I don't get a response soon, I will try to find someone else in the IESG who might know something.

Sorry about the delay, but I'm not sure what else to do.
-Jeff From mogul Mon Jan 15 14:14:56 2001 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id OAA15998; Mon, 15 Jan 2001 14:14:56 -0800 (PST) From: Jeffrey Mogul Message-Id: <200101152214.OAA15998@wera.pa.dec.com> To: http-delta Subject: Progress! IESG "Last Calls" for Delta, Vcdiff, Instance Digests Date: Mon, 15 Jan 2001 14:14:56 -0800 X-Mts: smtp Sorry folks, this took way too long. Three months ago, I asked the IESG to consider the HTTP Delta Encoding document as a Proposed Standard. Then, nothing happened. I prodded every few weeks by email, but it took a phone call to the IESG Secretary to unjam the process. People are apologetic. Note that this does not have anything to do with the "Cluster" and "Template" documents, since I had put these off until we have closure on the basic design. In retrospect, that might have been a mistake (given how long this step has taken), but they will have to wait for a while longer. -Jeff FYI, here are the announcements (From: iesg-secretary@ietf.org (The IESG), To: IETF-Announce) === The IESG has received a request to consider Delta encoding in HTTP as a Proposed Standard. This has been reviewed in the IETF but is not the product of an IETF Working Group. The IESG plans to make a decision in the next few weeks, and solicits final comments on this action. Please send any comments to the iesg@ietf.org or ietf@ietf.org mailing lists by February 12, 2001. Files can be obtained via http://www.ietf.org/internet-drafts/draft-mogul-http-delta-07.txt === The IESG has received a request to consider Instance Digests in HTTP as a Proposed Standard. This has been reviewed in the IETF but is not the product of an IETF Working Group. The IESG plans to make a decision in the next few weeks, and solicits final comments on this action. Please send any comments to the iesg@ietf.org or ietf@ietf.org mailing lists by February 12, 2001. 
Files can be obtained via http://www.ietf.org/internet-drafts/draft-mogul-http-digest-03.txt === The IESG has received a request to consider The VCDIFF Generic Differencing and Compression Data Format as a Proposed Standard. This has been reviewed in the IETF but is not the product of an IETF Working Group. The IESG plans to make a decision in the next few weeks, and solicits final comments on this action. Please send any comments to the iesg@ietf.org or ietf@ietf.org mailing lists by February 12, 2001. Files can be obtained via http://www.ietf.org/internet-drafts/draft-korn-vcdiff-02.txt From philip@alexanderworks.org Sun Jan 21 18:14:13 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA17290; Sun, 21 Jan 2001 18:14:13 -0800 (PST) Received: from zmamail01.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA12522; Sun, 21 Jan 2001 18:14:03 -0800 Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345) id 6C7971162B; Sun, 21 Jan 2001 21:13:57 -0500 (EST) Received: from tungsten.btinternet.com (tungsten.btinternet.com [194.73.73.81]) by zmamail01.zma.compaq.com (Postfix) with ESMTP id EC25D113CB for ; Sun, 21 Jan 2001 21:13:56 -0500 (EST) Received: from [213.1.170.142] (helo=dobbin.btinternet.com) by tungsten.btinternet.com with esmtp (Exim 3.03 #83) id 14KWUU-0002sb-00 for http-delta@pa.dec.com; Mon, 22 Jan 2001 02:13:55 +0000 Message-Id: <4.3.2.7.2.20010122020602.00acf9d0@mail.btinternet.com> X-Sender: philippawley@mail.btinternet.com X-Mailer: QUALCOMM Windows Eudora Version 4.3.2 Date: Mon, 22 Jan 2001 02:07:27 +0000 To: http-delta@pa.dec.com From: Philip Pawley Subject: Re: etag or itag Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed -- Philip Pawley Liverpool, UK http://www.alexanderworks.org/ -- From douglis@research.att.com Fri Feb 9 15:58:54 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; 
(8.8.8/1.1.8.2/06Jun96-0357PM) id PAA08032; Fri, 9 Feb 2001 15:58:54 -0800 (PST) Received: from zmamail02.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA10911; Fri, 9 Feb 2001 15:58:53 -0800 Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345) id 55CE158CD; Fri, 9 Feb 2001 18:58:53 -0500 (EST) Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102]) by zmamail02.zma.compaq.com (Postfix) with ESMTP id 2A5E55877 for ; Fri, 9 Feb 2001 18:58:53 -0500 (EST) Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26]) by mail-blue.research.att.com (Postfix) with ESMTP id AB5F94CE02; Fri, 9 Feb 2001 18:58:52 -0500 (EST) Received: from windsor.research.att.com (windsor.research.att.com [135.207.26.46]) by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id SAA27428; Fri, 9 Feb 2001 18:58:47 -0500 (EST) Received: from windsor.research.att.com (localhost [127.0.0.1]) by windsor.research.att.com (8.8.8+Sun/8.8.5) with ESMTP id SAA11325; Fri, 9 Feb 2001 18:58:47 -0500 (EST) Message-Id: <200102092358.SAA11325@windsor.research.att.com> From: Fred Douglis To: iesg@ietf.org Cc: smonetti@att.com, tfrost@att.com, misha@research.att.com, http-delta@pa.dec.com Subject: Re: delta-encoding in HTTP to proposed standard Date: Fri, 09 Feb 2001 18:58:46 -0500 Sender: douglis@research.att.com I've been asked to pass along the following advisory. ====== This is to advise the IETF that AT&T has intellectual property that may be applicable to I-D draft-mogul-http-delta-07.txt. The intellectual property includes U.S. patent 5,931,904, Method for reducing the delay between the time a data page is requested and the time the data page is displayed. AT&T is currently reviewing its licensing intent relative to this Intellectual Property and will notify the IETF accordingly within the next few weeks. Tom Frost AT&T Intellectual Property Management Room 2E37, Bldg. 
104 180 Park Avenue Florham Park, NJ 07932 tfrost@att.com From douglis@research.att.com Fri Feb 9 16:05:59 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA24437; Fri, 9 Feb 2001 16:05:59 -0800 (PST) Received: from zmamail02.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA15153; Fri, 9 Feb 2001 16:05:58 -0800 Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345) id 541FF5839; Fri, 9 Feb 2001 19:05:58 -0500 (EST) Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102]) by zmamail02.zma.compaq.com (Postfix) with ESMTP id F191E591B; Fri, 9 Feb 2001 19:05:57 -0500 (EST) Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26]) by mail-blue.research.att.com (Postfix) with ESMTP id B95264CE0B; Fri, 9 Feb 2001 19:05:57 -0500 (EST) Received: from douglux.research.att.com (root@douglux.research.att.com [135.207.26.106]) by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id TAA27601; Fri, 9 Feb 2001 19:05:57 -0500 (EST) Received: from douglux.research.att.com (IDENT:douglis@localhost.localdomain [127.0.0.1]) by douglux.research.att.com (8.9.3/8.9.3) with ESMTP id TAA17105; Fri, 9 Feb 2001 19:05:57 -0500 Message-Id: <200102100005.TAA17105@douglux.research.att.com> X-Mailer: exmh version 2.1.1 10/15/1999 From: Fred Douglis To: Jeffrey Mogul Cc: http-delta@pa.dec.com, ned.freed@innosoft.com, paf@cisco.com Subject: Re: Last Call: Delta encoding in HTTP to Proposed Standard In-Reply-To: Your message of "Mon, 15 Jan 2001 14:14:56 PST." <200101152214.OAA15998@wera.pa.dec.com> X-Uri: http://www.research.att.com/~douglis/ Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Fri, 09 Feb 2001 19:05:56 -0500 Sender: douglis@research.att.com I just copied http-delta on mail to IESG about an AT&T patent that may pertain to the delta-encoding I-D. 
That was the formal statement; I wanted to make an informal comment as well (and to copy the applications area directors). I apologize if this is perceived to be coming later in the process than it should have. I'm not a long-term/active participant in the IETF and have heard various conflicting statements about when it is appropriate to disclose such information. I endeavored to make a statement before the last-call deadline for the draft. I hope this is sufficient. In regard to the inclusion of rsync/rproxy in the I-D, I support modifying it accordingly. Fred From cjh@osa.com.au Sun Feb 11 21:51:25 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id VAA02126; Sun, 11 Feb 2001 21:51:25 -0800 (PST) Received: from ztxmail02.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA04819; Sun, 11 Feb 2001 21:51:24 -0800 Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345) id 644351E72; Sun, 11 Feb 2001 23:51:24 -0600 (CST) Received: from fw01.osa.com.au (fw01.osa.com.au [203.6.130.130]) by ztxmail02.ztx.compaq.com (Postfix) with SMTP id A5FD31F11 for ; Sun, 11 Feb 2001 23:51:22 -0600 (CST) Received: (qmail 11695 invoked by uid 0); 12 Feb 2001 05:51:20 -0000 Received: (ofmipd 172.16.33.89); 12 Feb 2001 05:50:57 -0000 Received: (qmail 26712 invoked by uid 4005); 12 Feb 2001 05:51:19 -0000 Received: from cjh@magpie.osa.com.au by excalibur.osa.com.au with qmail-scanner-0.90 (. Clean. 
Processed in 0.293723 secs); 12/02/2001 16:51:19 Received: from magpie.osa.com.au (172.16.36.3) by excalibur.osa.com.au with SMTP; 12 Feb 2001 05:51:17 -0000 Received: (qmail 1717 invoked from network); 12 Feb 2001 05:51:16 -0000 Received: from localhost.osa.com.au (HELO magpie.osa.com.au) (127.0.0.1) by localhost.osa.com.au with SMTP; 12 Feb 2001 05:51:16 -0000 Date: 12 Feb 2001 16:51:16 +1100 Message-Id: <20010212165116.1.11694.qmail@osa.com.au> From: "Clifford Heath" To: "Fred Douglis" Cc: http-delta@pa.dec.com Subject: Re: delta-encoding in HTTP to proposed standard In-Reply-To: Your message of "Fri, 09 Feb 2001 18:58:46 CDT." <200102092358.SAA11325@windsor.research.att.com> > This is to advise the IETF that AT&T has intellectual property that may be > applicable to I-D draft-mogul-http-delta-07.txt. I can't see how this is applicable. The patent clearly delineates the operation of sending an available old version of a document while fetching, computing and sending differences against the new one, with the goal of an overall reduction in latency. I can't see how HTTP deltas would be used to serve this purpose, other than by abusing other tags, like expiry. Only the current version or deltas to reach the current version are expected to be sent, no? -- Clifford Heath, Open Software Associates, mailto:cjh@osa.com.au, Ph +613 9895 2194, Fax 9895 2020, , 56-60 Rutland Rd, Box Hill 3128, Melbourne, Victoria, Australia. 
From chair@ietf.org Mon Feb 12 18:05:51 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA21331; Mon, 12 Feb 2001 18:05:50 -0800 (PST) Received: from zmamail02.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA24767; Mon, 12 Feb 2001 18:05:50 -0800 Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345) id EF1A55C70; Mon, 12 Feb 2001 21:05:49 -0500 (EST) Received: from sj-msg-core-2.cisco.com (sj-msg-core-2.cisco.com [171.69.43.88]) by zmamail02.zma.compaq.com (Postfix) with ESMTP id 6C3915DF7 for ; Mon, 12 Feb 2001 21:05:49 -0500 (EST) Received: from FRED-W2K.ietf.org (fred-hm-dhcp1.cisco.com [171.69.128.116]) by sj-msg-core-2.cisco.com (8.9.3/8.9.1) with ESMTP id SAA14273; Mon, 12 Feb 2001 18:05:30 -0800 (PST) Message-Id: <4.3.2.7.2.20010212180206.0244b920@mira-sjcm-2.cisco.com> X-Sender: fred@flipper.cisco.com (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 4.3.2 Date: Mon, 12 Feb 2001 18:05:05 -0800 To: Fred Douglis From: Fred Baker Subject: Re: delta-encoding in HTTP to proposed standard Cc: iesg@ietf.org, smonetti@att.com, tfrost@att.com, misha@research.att.com, http-delta@pa.dec.com In-Reply-To: <200102092358.SAA11325@windsor.research.att.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Maybe you could advise us of your intentions with regard to this. The IETF, as a rule, seeks to avoid standardizing people's business plans, especially in individual submissions which have not had working group review. Is there a strong reason to not take this to Informational status - treat it as a corporate white paper? At 06:58 PM 2/9/2001 -0500, Fred Douglis wrote: >I've been asked to pass along the following advisory. >====== > >This is to advise the IETF that AT&T has intellectual property that may be >applicable to I-D draft-mogul-http-delta-07.txt. The intellectual property >includes U.S. 
patent 5,931,904, Method for reducing the delay between the >time a data page is requested and the time the data page is displayed. > >AT&T is currently reviewing its licensing intent relative to this >Intellectual Property and will notify the IETF accordingly within the next >few weeks. > >Tom Frost >AT&T Intellectual Property Management >Room 2E37, Bldg. 104 >180 Park Avenue >Florham Park, NJ 07932 >tfrost@att.com From mogul Mon Feb 12 23:09:34 2001 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id XAA01572; Mon, 12 Feb 2001 23:09:34 -0800 (PST) From: Jeffrey Mogul Message-Id: <200102130709.XAA01572@wera.pa.dec.com> To: Fred Baker cc: Fred Douglis , iesg@ietf.org, smonetti@att.com, tfrost@att.com, misha@research.att.com, mogul, http-delta Subject: Re: delta-encoding in HTTP to proposed standard In-reply-to: Your message of "Mon, 12 Feb 2001 18:05:05 PST." <4.3.2.7.2.20010212180206.0244b920@mira-sjcm-2.cisco.com> Date: Mon, 12 Feb 2001 23:09:34 -0800 X-Mts: smtp Maybe you could advise us of your intentions with regard to this. The IETF, as a rule, seeks to avoid standardizing people's business plans, especially in individual submissions which have not had working group review. Is there a strong reason to not take this to Informational status - treat it as a corporate white paper? For the record: the HTTP Delta spec has been developed by a group of people from several dozen companies and universities. While several of the authors are (or were) from AT&T, that should not obscure the fact that this is definitely a multi-vendor (and multi-non-vendor) standards proposal. Although it was not the product of a formal working group, it was publicized numerous times within the HTTP working group (but was not within that group's charter), and received some discussion on the HTTP-WG mailing list. Our intention has always been to seek standards-track status. 
I will let the AT&T people explain how and why they believe that they have intellectual property that is related to this patent; that was news to me and to most of the other people who worked on this spec over a period of several years. However, I want the IESG to be very clear on this one point: this is NOT an AT&T "corporate white paper" in any way. If AT&T's recent claim has confused the IESG about this issue, this is unfortunate. I trust that AT&T will clarify the intellectual property issues promptly. -Jeff From fred@cisco.com Tue Feb 13 00:17:10 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id AAA04019; Tue, 13 Feb 2001 00:17:09 -0800 (PST) Received: from ztxmail02.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA32112; Tue, 13 Feb 2001 00:17:09 -0800 Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345) id 258E31C9B; Tue, 13 Feb 2001 02:17:09 -0600 (CST) Received: from sj-msg-core-2.cisco.com (sj-msg-core-2.cisco.com [171.69.43.88]) by ztxmail02.ztx.compaq.com (Postfix) with ESMTP id 9A08E1E1C; Tue, 13 Feb 2001 02:17:08 -0600 (CST) Received: from FRED-W2K.cisco.com (fred-hm-dhcp1.cisco.com [171.69.128.116]) by sj-msg-core-2.cisco.com (8.9.3/8.9.1) with ESMTP id AAA06567; Tue, 13 Feb 2001 00:17:22 -0800 (PST) Message-Id: <4.3.2.7.2.20010213000603.023d1db0@mira-sjcm-2.cisco.com> X-Sender: fred@mira-sjcm-2.cisco.com X-Mailer: QUALCOMM Windows Eudora Version 4.3.2 Date: Tue, 13 Feb 2001 00:16:27 -0800 To: Jeffrey Mogul From: Fred Baker Subject: Re: delta-encoding in HTTP to proposed standard Cc: Fred Douglis , iesg@ietf.org, smonetti@att.com, tfrost@att.com, misha@research.att.com, mogul@pa.dec.com, http-delta@pa.dec.com In-Reply-To: <200102130709.XAA01572@wera.pa.dec.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed At 11:09 PM 2/12/2001 -0800, Jeffrey Mogul wrote: >I trust that AT&T will clarify the intellectual 
property issues >promptly. Thanks. I certainly hope that they will. From douglis@research.att.com Tue Feb 13 06:40:05 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id GAA17402; Tue, 13 Feb 2001 06:40:04 -0800 (PST) Received: from ztxmail01.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA18899; Tue, 13 Feb 2001 06:40:00 -0800 Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345) id A0D1929D2; Tue, 13 Feb 2001 08:39:59 -0600 (CST) Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102]) by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 4837C2880; Tue, 13 Feb 2001 08:39:59 -0600 (CST) Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26]) by mail-blue.research.att.com (Postfix) with ESMTP id 8E4F24CE2B; Tue, 13 Feb 2001 09:39:58 -0500 (EST) Received: from douglux.research.att.com (root@douglux.research.att.com [135.207.26.106]) by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id JAA16846; Tue, 13 Feb 2001 09:39:57 -0500 (EST) Received: from douglux.research.att.com (IDENT:douglis@localhost.localdomain [127.0.0.1]) by douglux.research.att.com (8.9.3/8.9.3) with ESMTP id JAA01231; Tue, 13 Feb 2001 09:39:56 -0500 Message-Id: <200102131439.JAA01231@douglux.research.att.com> X-Mailer: exmh version 2.1.1 10/15/1999 From: Fred Douglis To: Fred Baker Cc: Jeffrey Mogul , iesg@ietf.org, smonetti@att.com, tfrost@att.com, misha@research.att.com, http-delta@pa.dec.com Subject: Re: delta-encoding in HTTP to proposed standard In-Reply-To: Your message of "Tue, 13 Feb 2001 00:16:27 PST." <4.3.2.7.2.20010213000603.023d1db0@mira-sjcm-2.cisco.com> X-Uri: http://www.research.att.com/~douglis/ Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Tue, 13 Feb 2001 09:39:56 -0500 Sender: douglis@research.att.com You can expect a statement from AT&T soon, most likely by the end of the week. 
I won't go into any further comments now, other than to apologize for the confusion and the timing. (Note also that I sent a separate note on the timing to the applications area directors right after the formal notification to the IESG.) Fred From mogul Tue Feb 13 16:31:34 2001 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA28760; Tue, 13 Feb 2001 16:31:33 -0800 (PST) From: Jeffrey Mogul Message-Id: <200102140031.QAA28760@wera.pa.dec.com> To: http-delta Subject: Last-Call change request by Larry Masinter Date: Tue, 13 Feb 2001 16:31:33 -0800 X-Mts: smtp My response follows. -Jeff ------- Forwarded Message Return-Path: lmnet@attglobal.net From: "Larry Masinter" To: Subject: RE: Last Call: Delta encoding in HTTP to Proposed Standard Date: Sun, 11 Feb 2001 00:06:18 -0800 Message-Id: Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-Msmail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400 In-Reply-To: <200101151228.HAA02160@ietf.org> # The IESG has received a request to consider Delta encoding in HTTP # as a Proposed Standard. This has been # reviewed in the IETF but is not the product of an IETF Working Group. I would like to respectfully request that the author(s) tone down his(their) condemnation of the terminology in RFC 2616 around the word "entity", since it isn't necessary to the understanding of the protocol they're proposing. In almost all system designs with new concepts, it is necessary to take ordinary words and give them technical meanings that don't exactly match their dictionary definitions. I can remember that at times the discussions in HTTP-WG over terminology were heated, but in the end, it was necessary to make some choices. 
I think all that's necessary is to reword, in a minor way, the paragraphs in section 3 (Terminology) used to introduce the term "instance". OLD: The dictionary definition for ``entity'' is ``something that has separate and distinct existence and objective or conceptual reality'' [21]. Unfortunately, the definition for ``entity'' in HTTP/1.1 is similar to that used in MIME [12], based on an entirely false analogy between MIME and HTTP. NEW: The dictionary definition for ``entity'' is ``something that has separate and distinct existence and objective or conceptual reality'' [21]. The definition for ``entity'' in HTTP/1.1 is similar to that used in MIME [12], based on an analogy between MIME and HTTP. OLD: In MIME, electronic mail messages do have distinct and separate existences, so the MIME definition of ``entity'' as something that ``refers specifically to the MIME-defined header fields and contents of either a message or one of the parts in the body of a multipart entity'' makes sense. NEW: In MIME, electronic mail messages have distinct and separate existences. MIME defines ``entity'' as something that ``refers specifically to the MIME-defined header fields and contents of either a message or one of the parts in the body of a multipart entity''. OLD: In HTTP, however, a response message to a GET does not have a distinct and separate existence. Rather, it is describing the current state of a resource (or a variant, subject to a set of constraints). The HTTP/1.1 specification provides no term to describe ``the value that would be returned in response to a GET request at the current time for the selected variant of the specified resource.'' This leads to awkward wordings in the HTTP/1.1 specification in places where this concept is necessary. NEW: In HTTP, however, an entity in a response message to a GET is more transient. It reflects the current state of a resource (or a variant, subject to a set of constraints). 
The HTTP/1.1 specification has no term for ``the value that would be returned in response to a GET request at the current time for the selected variant of the specified resource.'' This leads to awkward wordings in the HTTP/1.1 specification in places where this concept is necessary. OLD: It is too late to fix the terminological failure in the HTTP/1.1 specification, so we instead define a new term, for use in this document: NEW: To express this concept, we define a new term, for use in this document: ------- End of Forwarded Message From mogul Tue Feb 13 16:31:58 2001 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA28796; Tue, 13 Feb 2001 16:31:57 -0800 (PST) From: Jeffrey Mogul Message-Id: <200102140031.QAA28796@wera.pa.dec.com> To: "Larry Masinter" cc: , http-delta Subject: Re: Last Call: Delta encoding in HTTP to Proposed Standard In-reply-to: Your message of "Sun, 11 Feb 2001 00:06:18 PST." Date: Tue, 13 Feb 2001 16:31:57 -0800 X-Mts: smtp Larry writes: I think all that's necessary is to reword, in a minor way, the paragraphs in section 3 (Terminology) used to introduce the term "instance". OLD: The dictionary definition for ``entity'' is ``something that has separate and distinct existence and objective or conceptual reality'' [21]. Unfortunately, the definition for ``entity'' in HTTP/1.1 is similar to that used in MIME [12], based on an entirely false analogy between MIME and HTTP. NEW: The dictionary definition for ``entity'' is ``something that has separate and distinct existence and objective or conceptual reality'' [21]. The definition for ``entity'' in HTTP/1.1 is similar to that used in MIME [12], based on an analogy between MIME and HTTP. 
OLD: In MIME, electronic mail messages do have distinct and separate existences, so the MIME definition of ``entity'' as something that ``refers specifically to the MIME-defined header fields and contents of either a message or one of the parts in the body of a multipart entity'' makes sense. NEW: In MIME, electronic mail messages have distinct and separate existences. MIME defines ``entity'' as something that ``refers specifically to the MIME-defined header fields and contents of either a message or one of the parts in the body of a multipart entity''. OLD: In HTTP, however, a response message to a GET does not have a distinct and separate existence. Rather, it is describing the current state of a resource (or a variant, subject to a set of constraints). The HTTP/1.1 specification provides no term to describe ``the value that would be returned in response to a GET request at the current time for the selected variant of the specified resource.'' This leads to awkward wordings in the HTTP/1.1 specification in places where this concept is necessary. NEW: In HTTP, however, an entity in a response message to a GET is more transient. It reflects the current state of a resource (or a variant, subject to a set of constraints). The HTTP/1.1 specification has no term for ``the value that would be returned in response to a GET request at the current time for the selected variant of the specified resource.'' This leads to awkward wordings in the HTTP/1.1 specification in places where this concept is necessary. OLD: It is too late to fix the terminological failure in the HTTP/1.1 specification, so we instead define a new term, for use in this document: NEW: To express this concept, we define a new term, for use in this document: I would accept all of these changes, except that in the first change Larry suggested, I am going to insist on a phrase such as "based on a false analogy between MIME and HTTP." Or, if Larry would prefer, "based on a naive analogy between MIME and HTTP."
-Jeff From mogul Tue Feb 13 16:48:46 2001 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA29443; Tue, 13 Feb 2001 16:48:45 -0800 (PST) From: Jeffrey Mogul Message-Id: <200102140048.QAA29443@wera.pa.dec.com> To: Neale Banks cc: iesg@ietf.org, http-delta Subject: Re: Delta encoding in HTTP to Proposed Standard In-reply-to: Your message of "Fri, 09 Feb 2001 23:54:34 +1100." Date: Tue, 13 Feb 2001 16:48:45 -0800 X-Mts: smtp Neale Banks writes: Submission to the IETF and IESG regarding "Delta encoding in HTTP " as a Proposed Standard. In relation to this Internet-Draft I have a concern regarding its acceptance as a Proposed Standard in its current form, due to a significant omission. This Internet-Draft includes section 1.1 titled "Related research and proposals". However this section completely fails to acknowledge the existence of the rproxy project[1]. Nor is rproxy referred to anywhere else in the current draft. It is my humble opinion that this omission renders this Internet-Draft critically incomplete. This section could also benefit from a reference to rsync[2]. I in no way submit that the technical proposals of Mogul et al are inferior to rproxy, but rather that these two approach similar (if not the same) challenges with contrasting solutions. It is from this point of view that I submit that the current draft is critically incomplete insomuch as it includes a section "Related research and proposals" which makes no apparent qualification of incompleteness. Whilst there may be grounds to allege that rproxy is still a work-in-progress, it is a project which has a sound foundation - "The rproxy algorithm is based on the well-known and trustworthy rsync software by Andrew Tridgell." [1],[2] Having discussed this matter with one of the rproxy developers[3], I am sure that the contributors to rproxy would be agreeable to providing some assistance with including an appropriate reference in this Internet-Draft.
[1] rproxy: http://www.linuxcare.com.au/rproxy [2] rsync: http://rsync.samba.org/ [3] Conversation with Martin Pool at linux.conf.au, January 2001 I am not aware that an IETF Standards-Track document is required to include any "related work" section. This document includes one because we believe that it clarifies the background behind the protocol specification. The title of this section is "Related research and proposals." Although I do not believe that it would be necessary for understanding the HTTP Delta specification, I would be happy to cite the rsync technical report, as Andrew Tridgell and Paul Mackerras. The rsync algorithm. Technical Report, Department of Computer Science, Australian National University. November, 1998. http://rsync.samba.org/rsync/tech_report/ especially because there has been some discussion of trying to fit rsync into the framework that has been developed for HTTP Deltas. (I should point out that Andrew Tridgell has been on the http-delta@pa.dec.com mailing list for some time, and has occasionally participated in our discussions. Martin Pool is also on the mailing list, but no messages from him appear in our log.) Andrew, if there is something else I should cite instead, please let me know ASAP. As Mark Nottingham writes, regarding the rproxy pages: Is there an rsync protocol specification to refer to (in a normative or non-normative manner)? I see a presentation about the protocol, and a one-page description with some BNF defining 'delta', but nothing else. I too do not see anything in the rproxy-related pages that constitutes a "related proposal." Therefore, it is hard to see how our failure to cite rproxy renders the HTTP Delta specification "critically incomplete." 
-Jeff From tridge@au2.samba.org Tue Feb 13 17:09:58 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id RAA20911; Tue, 13 Feb 2001 17:09:58 -0800 (PST) Received: from ztxmail02.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA22193; Tue, 13 Feb 2001 17:09:58 -0800 Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345) id 9577C1CFD; Tue, 13 Feb 2001 19:09:57 -0600 (CST) Received: from au2.samba.org (ns1.samba.org [203.17.0.92]) by ztxmail02.ztx.compaq.com (Postfix) with ESMTP id BBD651DD8; Tue, 13 Feb 2001 19:09:56 -0600 (CST) Received: by au2.samba.org (Postfix, from userid 148) id D1AA3659838; Wed, 14 Feb 2001 11:57:18 +1100 (EST) From: Andrew Tridgell To: mogul@pa.dec.com Cc: neale@lowendale.com.au, iesg@ietf.org, http-delta@pa.dec.com In-Reply-To: <200102140048.QAA29443@wera.pa.dec.com> (message from Jeffrey Mogul on Tue, 13 Feb 2001 16:48:45 -0800) Subject: Re: Delta encoding in HTTP to Proposed Standard Reply-To: tridge@samba.org References: <200102140048.QAA29443@wera.pa.dec.com> Message-Id: <20010214005718.D1AA3659838@au2.samba.org> Date: Wed, 14 Feb 2001 11:57:18 +1100 (EST) Sender: tridge@au2.samba.org Jeff, > Andrew, if there is something else I should cite instead, please > let me know ASAP. Probably the most useful cite is http://rproxy.samba.org/ as that gives the most directly relevant information on the interaction of rsync with http. If you would prefer a more academic cite then my PhD thesis is probably the best (instead of that technical report) as it contains much more up to date and complete information, plus it talks directly about the use of rsync in http. > I too do not see anything in the rproxy-related pages that constitutes > a "related proposal." Therefore, it is hard to see how our failure > to cite rproxy renders the HTTP Delta specification "critically > incomplete." you are correct. 
It would be nice to have a reference in there for completeness but we are approaching the problem very much from an "exploring implementations" standpoint rather than a standards view. I certainly do not think that the lack of information about rproxy makes the HTTP delta proposal "critically incomplete". It's quite likely that we will be pursuing a standards approach at some time in the future, but that isn't our emphasis at the moment. Cheers, Tridge From neoi@writeme.com Tue Feb 13 23:54:50 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id XAA16161; Tue, 13 Feb 2001 23:54:49 -0800 (PST) Received: from ztxmail02.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA14212; Tue, 13 Feb 2001 23:54:49 -0800 Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345) id D16E21C61; Wed, 14 Feb 2001 01:54:48 -0600 (CST) Received: from mail.jct.ac.il (mail.jct.ac.il [147.161.1.14]) by ztxmail02.ztx.compaq.com (Postfix) with ESMTP id 5C2091EF4; Wed, 14 Feb 2001 01:54:47 -0600 (CST) Received: from beamartyr (goldrush.jct.ac.il [147.161.5.215]) by mail.jct.ac.il (8.10.1/8.10.1) with SMTP id f1E7ub210814; Wed, 14 Feb 2001 09:56:39 +0200 (IST) Message-Id: <001c01c0965b$5e520dc0$d705a193@jct.ac.il> From: "Issac Goldstand" To: "Jeffrey Mogul" Cc: References: <200102140048.QAA29443@wera.pa.dec.com> Subject: Re: Delta encoding in HTTP to Proposed Standard Date: Wed, 14 Feb 2001 09:54:26 +0200 Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1255" Content-Transfer-Encoding: 7bit X-Priority: 3 X-Msmail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4133.2400 X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400 Jeff: While we're on the subject of related works, Akamai, along with many other companies, is working on a similar service called ICAP (www.i-cap.org).
In short, ICAP is supposed to be a protocol in which special servers can modify HTTP requests, response headers and payloads. Now, the projects DO differ as delta is about the content server sending "updated" versions of the payload, while ICAP is more targeted at changing or adding to the payload (much as server-based preprocessors like SSI and PHP). I still think, however, that it's worth taking a closer look at. They have an Internet-Draft, although to the best of my knowledge, they have not yet submitted it. A copy of the current draft is available at http://www.i-cap.org/icap/media/draft-opes-icap-00.txt Issac Internet is a wonderful mechanism for making a fool of yourself in front of a very large audience. --Anonymous Moving the mouse won't get you into trouble... Clicking it might. --Anonymous PGP Key 0xE0FA561B - Fingerprint: 7E18 C018 D623 A57B 7F37 D902 8C84 7675 E0FA 561B From mogul@pa.dec.com Wed Feb 14 11:54:27 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA17141; Wed, 14 Feb 2001 11:54:27 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA07584; Wed, 14 Feb 2001 11:54:26 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA17409; Wed, 14 Feb 2001 11:54:25 -0800 (PST) From: Jeffrey Mogul Message-Id: <200102141954.LAA17409@wera.pa.dec.com> To: "Issac Goldstand" Cc: Subject: Re: Delta encoding in HTTP to Proposed Standard In-Reply-To: Your message of "Wed, 14 Feb 2001 09:54:26 +0200." <001c01c0965b$5e520dc0$d705a193@jct.ac.il> Date: Wed, 14 Feb 2001 11:54:25 -0800 X-Mts: smtp With all due respect to all of the other related work out there, this document is not a "survey of things related in some vague way to delta encoding". The section on related research and proposals is meant to be a background, not the bibliography from a doctoral dissertation :-).
I do think it makes sense to mention rsync, since we have made some brief stabs at trying to fit rsync (or something with a similar function) into the instance-manipulation layer. But going beyond that seems like a slippery slope, especially given the number of research and commercial projects out there that are "related". For a list that someone else put together, see http://webreference.com/internet/software/servers/http/deltaencoding/ and this doesn't include some other things that I know about. -Jeff From lmm@acm.org Wed Feb 14 20:31:07 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id UAA26149; Wed, 14 Feb 2001 20:31:07 -0800 (PST) Received: from zmamail02.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA31365; Wed, 14 Feb 2001 20:31:06 -0800 Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345) id 506255ED4; Wed, 14 Feb 2001 23:31:06 -0500 (EST) Received: from smtp-relay-1.Adobe.COM (smtp-relay-1.adobe.com [192.150.11.1]) by zmamail02.zma.compaq.com (Postfix) with ESMTP id ADC2A5C92; Wed, 14 Feb 2001 23:31:05 -0500 (EST) Received: from inner-relay-1.Adobe.COM (inner-relay-1.corp.adobe.com [153.32.1.51]) by smtp-relay-1.Adobe.COM (8.8.6) with ESMTP id UAA19977; Wed, 14 Feb 2001 20:34:45 -0800 (PST) Received: from mailsj-v1.corp.adobe.com by inner-relay-1.Adobe.COM (8.8.5) with ESMTP id UAA03559; Wed, 14 Feb 2001 20:30:12 -0800 (PST) Received: from larrypad ([153.32.67.80]) by mailsj-v1.corp.adobe.com (Netscape Messaging Server 4.15) with SMTP id G8S77R00.JDL; Wed, 14 Feb 2001 20:31:03 -0800 From: "Larry Masinter" To: "Jeffrey Mogul" Cc: , Subject: RE: Last Call: Delta encoding in HTTP to Proposed Standard Date: Wed, 14 Feb 2001 20:30:47 -0800 Message-Id: Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-Msmail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
In-Reply-To: <200102140031.QAA28796@wera.pa.dec.com> Importance: Normal X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400 > I would accept all of these changes, except that in the first > change Larry suggested, I am going to insist on a phrase such as > based on a false analogy between MIME and HTTP. > Or, if Larry would prefer, > based on a naive analogy between MIME and HTTP. It's hard for an analogy to be false ("the moon is like a piece of green cheese"), and I don't think the analogy between MIME and HTTP was particularly naive. I might suggest abandoning "analogy" altogether: based on the (somewhat problematic) relationship between MIME and HTTP. The relationship between MIME and HTTP is problematic, (IMHO) due as much to the narrowness of the MIME document's focus on email as it is to the reuse of MIME constructs in HTTP. But I think I've made my point, and I'll go along with whatever wording the editor(s) choose. Larry From neale@lowendale.com.au Fri Feb 16 06:08:05 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id GAA11227; Fri, 16 Feb 2001 06:08:05 -0800 (PST) Received: from zmamail01.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA00482; Fri, 16 Feb 2001 06:07:54 -0800 Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345) id 8F1A8BC4D; Fri, 16 Feb 2001 09:07:53 -0500 (EST) Received: from marina.lowendale.com.au (gw.lowendale.com.au [203.26.242.120]) by zmamail01.zma.compaq.com (Postfix) with ESMTP id 73FFACB53; Fri, 16 Feb 2001 09:07:48 -0500 (EST) Received: from localhost (neale@localhost) by marina.lowendale.com.au (8.9.3/8.9.3/Debian/GNU) with ESMTP id BAA06937; Sat, 17 Feb 2001 01:10:04 +1100 Date: Sat, 17 Feb 2001 01:10:01 +1100 (EST) From: Neale Banks To: Jeffrey Mogul Cc: iesg@ietf.org, http-delta@pa.dec.com Subject: Re: Delta encoding in HTTP to Proposed Standard In-Reply-To: <200102140048.QAA29443@wera.pa.dec.com> Message-Id: Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII Jeff, On Tue, 13 Feb 2001, Jeffrey Mogul wrote: [...] > Andrew, if there is something else I should cite instead, please > let me know ASAP. I'll defer to Andrew's and Martin's comments regarding appropriate references/citations. [...] > I too do not see anything in the rproxy-related pages that constitutes > a "related proposal." Therefore, it is hard to see how our failure > to cite rproxy renders the HTTP Delta specification "critically > incomplete." My concern (which I apologise for not bringing up earlier) was limited to my perception of a lack of completeness in referencing "related research". I in no way meant to imply that I considered the HTTP Delta specification itself to be incomplete. Regards, Neale. From douglis@research.att.com Wed Feb 28 07:28:51 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id HAA01155; Wed, 28 Feb 2001 07:28:51 -0800 (PST) Received: from ztxmail02.nz-cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA18893; Wed, 28 Feb 2001 07:28:49 -0800 Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345) id 807AC1D73; Wed, 28 Feb 2001 09:28:49 -0600 (CST) Received: from mail-green.research.att.com (H-135-207-30-103.research.att.com [135.207.30.103]) by ztxmail02.ztx.compaq.com (Postfix) with ESMTP id EE63F1E2E for ; Wed, 28 Feb 2001 09:28:48 -0600 (CST) Received: from alliance.research.att.com (alliance.research.att.com [135.207.26.26]) by mail-green.research.att.com (Postfix) with ESMTP id 8BB731E010; Wed, 28 Feb 2001 10:28:48 -0500 (EST) Received: from douglux.research.att.com (root@douglux.research.att.com [135.207.26.106]) by alliance.research.att.com (8.8.7/8.8.7) with ESMTP id KAA14970; Wed, 28 Feb 2001 10:28:45 -0500 (EST) Received: from douglux.research.att.com (IDENT:douglis@localhost.localdomain [127.0.0.1]) by douglux.research.att.com (8.9.3/8.9.3) with ESMTP id KAA29562; Wed, 28 Feb 2001 10:28:46
-0500 Message-Id: <200102281528.KAA29562@douglux.research.att.com> X-Mailer: exmh version 2.1.1 10/15/1999 From: Fred Douglis To: "Clifford Heath" Cc: http-delta@pa.dec.com, iesg@ietf.org Subject: Re: delta-encoding in HTTP to proposed standard In-Reply-To: Your message of "12 Feb 2001 16:51:16 +1100." <20010212165116.1.11696.qmail@osa.com.au> X-Uri: http://www.research.att.com/~douglis/ Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 28 Feb 2001 10:28:46 -0500 Sender: douglis@research.att.com Clifford, I am belatedly replying to your note to the http-delta list on the subject of the applicability of a patent, which I mentioned earlier this month, to draft-mogul-http-delta-07.txt. I held off on replying while AT&T sorted out the situation and made a formal pronouncement to the IESG on our licensing stance. I include it here since it didn't go to the deltas list initially. And, I am copying the IESG because my explanation below applies to them as well. (However, it is not a formal declaration for the IESG in the way the forwarded message was.) ------- Forwarded Message Date: Fri, 23 Feb 2001 17:18:34 -0500 From: tfrost@att.com To: iesg@ietf.org cc: douglis@att.com Subject: Re: delta-encoding in HTTP to proposed standard This declaration is being made pursuant to the provisions of IETF IPR Policy, Sections 10.3.1 and 10.3.2. This is to advise the IETF that AT&T believes it owns at least one patent that may relate to Internet Draft document "draft-mogul-http-delta-07.txt", including United States Patent No. 5,931,904. To the extent that the technology discussed in that Internet Draft becomes an IETF Standard and to the extent claims of AT&T's patents are required to implement the IETF Standard, AT&T agrees that, upon written request, AT&T will offer, on a nondiscriminatory basis, non-exclusive, royalty-free licenses under such patent claims to implement that IETF Standard. 
AT&T's willingness to grant such licenses is conditioned upon the prospective licensee granting a reciprocal license to AT&T under any patents that the prospective licensee has to any technology required to implement that IETF Standard. Written requests for licenses may be sent to: AT&T Intellectual Property Licensing Room 2E37, Bldg. 104 180 Park Avenue Florham Park, NJ 07932 ------- End of Forwarded Message The reason for the belated announcement of this intellectual property claim was that we only recently realized that the patent mentioned above was very broad and could encompass the proposed standard. Having realized this, we strove to make this information public prior to the IESG last call deadline, and then make a public pronouncement of our licensing stance as quickly as we could. We hope that the grant of a royalty-free license to those following the standard will put to rest a question of either the timeliness of the announcement or the specific impact of this intellectual property on the proposed standard. Finally, I want to again express my regrets to the IETF and IESG, the other co-authors of the delta-encoding specification, and anyone who may already be building systems based on the proposed standard, for the tardiness of the announcement and the concern the original announcement may have caused before the royalty-free terms were publicly announced. 
-- Fred Douglis ----------------------------------------------------------------- PGP Fingerprint: 83 B9 D6 7E 7F 78 8E BB 16 95 DE 69 1A 52 BC 82 From avh@marimba.com Mon Mar 5 13:27:07 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id NAA13410; Mon, 5 Mar 2001 13:27:07 -0800 (PST) Received: from zmamail02.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA00722; Mon, 5 Mar 2001 13:27:06 -0800 Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345) id 3951E3600; Mon, 5 Mar 2001 16:27:06 -0500 (EST) Received: from cobra.marimba.com (acheron.marimba.com [207.126.123.64]) by zmamail02.zma.compaq.com (Postfix) with ESMTP id 304674C4F for ; Mon, 5 Mar 2001 16:27:05 -0500 (EST) Received: by cobra.marimba.com with Internet Mail Service (5.5.2653.19) id <13CRFXRZ>; Mon, 5 Mar 2001 13:27:04 -0800 Message-Id: <02414951E0406C47BF01477DDF8D443E197A37@cobra.marimba.com> From: Arthur van Hoff To: "'http-delta@pa.dec.com'" Subject: FW: DRP and IETF WEBI Working Group Date: Mon, 5 Mar 2001 13:25:32 -0800 Mime-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Hi, For those that are interested in distribution protocols. Here is an interesting working group that is doing work related to caching, Delta-encoding, and DRP. Have fun, Arthur van Hoff -- From: Mark Nottingham [mnot@akamai.com] Subject: DRP and IETF WEBI Working Group Arthur, Although it's been some time since the publication of the DRP Note, I thought you might be interested in the work of the IETF WEBI (Web Intermediaries) Working Group. One of our work items is to define a "Resource Update Protocol" which is similar in many ways to DRP. Although the primary interest is currently in allowing invalidations to be sent to caches, there are other potential uses, which may be underrepresented in the work at this stage. 
If you (or anyone you know of either at Marimba or who worked on DRP) are interested, we'd very much welcome input into the work. Currently, we're gathering requirements, a first draft of which can be found at: http://www.ietf.org/internet-drafts/draft-ietf-webi-rup-reqs-00.txt If you have additional (or contrary) requirements based on your experience with DRP, we'd love to have them. Our charter is at: http://www.ietf.org/html.charters/webi-charter.html We'll be discussing the requirements during our meeting in the Minneapolis IETF. After the requirements are finalized, we'll be soliciting proposals to compare against the requirements; you might want to consider putting DRP into the ring. Cheers, -- Mark Nottingham, Research Scientist Akamai Technologies (San Mateo, CA USA) From mogul Thu Oct 11 16:30:26 2001 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA29237; Thu, 11 Oct 2001 16:30:26 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200110112330.QAA29237@wera.pa.dec.com> To: http-delta Subject: FWD: Protocol Action: Delta encoding in HTTP to Proposed Standard Date: Thu, 11 Oct 2001 16:30:26 -0700 X-Mts: smtp On October 3 of last year, I asked the IESG to approve the "Delta encoding in HTTP" specification as a "Proposed Standard". On October 11, they approved it. Unfortunately, it took them until October 11 of *this year* - the whole process took more than a year. I can't take responsibility for all of the delay, although in hindsight I should have been a lot more aggressive about bugging other people to get their jobs done. (I do have at least 124 archived email messages, mostly of the form "why is this taking so long and do you want me to do anything more?" or "why haven't you answered my email for the past two months?") We finally broke the log-jam by removing the requirement that an implementation of the Delta Encoding specification SHOULD support, as a default, the "vcdiff" format. 
I decided that I had to do this because the vcdiff specification seemed not to be making progress, and because the Delta spec made this SHOULD-level reference to the vcdiff spec, the IESG refused to act on the Delta spec until vcdiff was ready. Our hope is that we can get vcdiff back on track very soon, and so by the time that the Delta spec is ready to go to Draft Standard status, vcdiff will also be ready for that. Then we should be able to restore the SHOULD-level reference that was deleted. The other changes that have been made since last October are basically cosmetic and/or procedural. The Delta specification is now in the RFC Editor's queue. I hope this part won't take too long, although there are standards-track documents that have been in this queue since February 2001. The next IETF stage is a "Draft Standard". From RFC2026: A specification from which at least two independent and interoperable implementations from different code bases have been developed, and for which sufficient successful operational experience has been obtained, may be elevated to the "Draft Standard" level. For the purposes of this section, "interoperable" means to be functionally equivalent or interchangeable components of the system or process in which they are used. If patented or otherwise controlled technology is required for implementation, the separate implementations must also have resulted from separate exercise of the licensing process. Elevation to Draft Standard is a major advance in status, indicating a strong belief that the specification is mature and will be useful. So the next step for our group is to find people to do two independent implementations of (at least) the major features of the Delta spec. Or, if any of you have already done implementations that match our current design, please let me know; we will need to document that they interoperate. Thanks for your patience. 
-Jeff ------- Forwarded Message Date: Thu, 11 Oct 2001 17:36:12 -0400 From: The IESG To: IETF-Announce: ; cc: RFC Editor , IANA , Internet Architecture Board Subject: Protocol Action: Delta encoding in HTTP to Proposed Standard The IESG has approved the Internet-Draft 'Delta encoding in HTTP' as a Proposed Standard. This has been reviewed in the IETF but is not the product of an IETF Working Group. The IESG contact persons are Patrik Faltstrom and Ned Freed. Technical Summary The document specifies a way for an HTTP server and client to negotiate sending only the changes to a requested instance of a resource over an HTTP connection. This is especially interesting when a cache already has a version of the instance, and finds that the instance has changed. Research has shown that changes to only small parts of instances of resources are frequent, so the ability to send only the changes would speed up the transactions. Working Group Summary The document is an individual submission to the IETF, but the specification has been discussed on the mailing list during the development of the document. It is noted that AT&T has filed an IPR note about this document. See http://www.ietf.org/ietf/IPR/AT&T-MOGUL-HTTP-DELTA. The IPR note was filed before the Last Call ended, and no concerns were raised by the community regarding this IPR notice. Protocol Quality The protocol was reviewed for the IESG by Patrik Faltstrom. IANA Considerations: Section 10.2 specifies the creation of a new registry for instance-manipulation values. RFC-Editor note: Please delete section 13 at the time of publication. Author has asked for review of References Section (15) at time of publication. Suggestion is that this be handled via 48 hour notice. 
------- End of Forwarded Message From jmacd@helen.cs.berkeley.edu Fri Oct 12 01:33:12 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id BAA10910; Fri, 12 Oct 2001 01:33:12 -0700 (PDT) Received: from mailrelay01.cac.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA01240; Fri, 12 Oct 2001 01:33:08 -0700 Received: by mailrelay01.cac.cpqcorp.net (Postfix, from userid 12345) id 8282DB26; Fri, 12 Oct 2001 01:33:08 -0700 (PDT) Received: from ztxmail02.ztx.compaq.com (ztxmail02.nz-cce.cpqcorp.net [161.114.8.206]) by mailrelay01.cac.cpqcorp.net (Postfix) with ESMTP id 60E01BAC; Fri, 12 Oct 2001 01:33:08 -0700 (PDT) Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345) id DB66B4359; Fri, 12 Oct 2001 03:33:07 -0500 (CDT) Received: from helen.CS.Berkeley.EDU (helen.CS.Berkeley.EDU [128.32.131.251]) by ztxmail02.ztx.compaq.com (Postfix) with ESMTP id 674A941A8; Fri, 12 Oct 2001 03:33:07 -0500 (CDT) Received: (from jmacd@localhost) by helen.CS.Berkeley.EDU (8.9.1a/8.9.1) id BAA03930; Fri, 12 Oct 2001 01:33:06 -0700 (PDT) Date: Fri, 12 Oct 2001 01:33:06 -0700 From: Josh MacDonald To: Jeffrey Mogul Cc: http-delta@pa.dec.com, mihut@cs.berkeley.edu Subject: Re: FWD: Protocol Action: Delta encoding in HTTP to Proposed Standard Message-Id: <20011012013306.A3901@helen.CS.Berkeley.EDU> References: <200110112330.QAA29237@wera.pa.dec.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200110112330.QAA29237@wera.pa.dec.com>; from mogul@pa.dec.com on Thu, Oct 11, 2001 at 04:30:26PM -0700 Quoting Jeffrey Mogul (mogul@pa.dec.com): > > So the next step for our group is to find people to do two independent > implementations of (at least) the major features of the Delta spec. 
> Or, if any of you have already done implementations that match > our current design, please let me know; we will need to document > that they interoperate. Our Xdelta/Xproxy prototype is a fairly close match to the current design, and it works. Mihut can provide more specific comments on the state of our protocol as compared to the draft. Our main issue has been with vcdiff. I have personally read the vcdiff draft several times and still I am not comfortable with the level of complexity. The current Xdelta encoding is quite simple to explain, although it is not well suited as a standard either--the encoder/decoder are automatically generated code right now. Delta encoding is still a sore point for us, and I would like to replace the existing code. Have you considered the old W3C/Marimba GDIFF encoding? At least it is easy to implement. Do we know of any independent vcdiff implementations? I think if the vcdiff proposal is to succeed it needs another author who has been successful to go through the draft and really improve the description. Currently I find it difficult to make sense of. 
For anyone interested in Xdelta/Xproxy: http://prdownloads.sourceforge.net/xdelta/xdelta-2.0-beta9.tar.gz -josh From mihut@eecs.berkeley.edu Fri Oct 12 11:14:47 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA31691; Fri, 12 Oct 2001 11:14:47 -0700 (PDT) Received: from mailrelay01.cac.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA11712; Fri, 12 Oct 2001 11:14:46 -0700 Received: by mailrelay01.cac.cpqcorp.net (Postfix, from userid 12345) id D81C2987; Fri, 12 Oct 2001 11:14:46 -0700 (PDT) Received: from ztxmail02.ztx.compaq.com (ztxmail02.nz-cce.cpqcorp.net [161.114.8.206]) by mailrelay01.cac.cpqcorp.net (Postfix) with ESMTP id 1EE85AF9 for ; Fri, 12 Oct 2001 11:14:46 -0700 (PDT) Received: by ztxmail02.ztx.compaq.com (Postfix, from userid 12345) id 8200E42C7; Fri, 12 Oct 2001 13:14:45 -0500 (CDT) Received: from relay.EECS.Berkeley.EDU (relay.EECS.Berkeley.EDU [169.229.34.228]) by ztxmail02.ztx.compaq.com (Postfix) with ESMTP id 3A3D54115 for ; Fri, 12 Oct 2001 13:14:45 -0500 (CDT) Received: from EECS.Berkeley.EDU (mihut@argus.EECS.Berkeley.EDU [169.229.60.79]) by relay.EECS.Berkeley.EDU (8.9.3/8.9.3) with ESMTP id LAA24437 for ; Fri, 12 Oct 2001 11:14:44 -0700 (PDT) Received: from localhost (mihut@localhost) by EECS.Berkeley.EDU (8.9.3/8.9.3) with ESMTP id LAA09206 for ; Fri, 12 Oct 2001 11:14:40 -0700 (PDT) X-Authentication-Warning: argus.EECS.Berkeley.EDU: mihut owned process doing -bs Date: Fri, 12 Oct 2001 11:14:40 -0700 (PDT) From: Mihut Ionescu To: Subject: xProxy, an implementation of delta encoding in HTTP In-Reply-To: Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII xProxy is an HTTP dual-proxy system which implements delta encoding and compression to increase the performance of web traffic. 
A paper describing the architecture, implementation and performance evaluation of xProxy can be found at: http://www.cs.berkeley.edu/~mihut/xproxy-ms.pdf xProxy has been deployed at UC Berkeley. The paper evaluates the total bandwidth savings for all web resource types realized by using xProxy in "real life", as well as reduction in modem retrieval times for HTML pages. Moreover, the paper provides detailed (comparative) information on the benefits of compression and delta encoding, and gives insight into how the size of deltas changes over time. The analysis is done in the context of both static and dynamic web content, and gives insight into the set of features that should be supported by a delta/compression enabled HTTP system that is to provide the maximum bandwidth savings with the least amount of computational overhead. xProxy is compatible with the protocol proposed by IETF, with the exception that it identifies versions based on their MD5's (using the "Delta-Base" header to indicate the client proxy version). However, this can be (easily) changed so that versions are identified based on entity tags. The current implementation supports the major features of the delta encoding specification, although others can be added if needed. I will provide a detailed list of the protocol features supported by xProxy later. Let me know if you have any suggestions. 
Mihut From bala@research.att.com Fri Oct 12 11:28:15 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA11204; Fri, 12 Oct 2001 11:28:15 -0700 (PDT) Received: from taynzmail03.nz-tay.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA06024; Fri, 12 Oct 2001 11:28:14 -0700 Received: by taynzmail03.nz-tay.cpqcorp.net (Postfix, from userid 12345) id 346C2523; Fri, 12 Oct 2001 14:28:14 -0400 (EDT) Received: from zmamail01.zma.compaq.com (zmamail01.nz-tay.cpqcorp.net [161.114.72.101]) by taynzmail03.nz-tay.cpqcorp.net (Postfix) with ESMTP id 2EEEE57F for ; Fri, 12 Oct 2001 14:28:14 -0400 (EDT) Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345) id F1C981578; Fri, 12 Oct 2001 14:28:13 -0400 (EDT) Received: from mail-green.research.att.com (H-135-207-30-103.research.att.com [135.207.30.103]) by zmamail01.zma.compaq.com (Postfix) with ESMTP id D5AA31531 for ; Fri, 12 Oct 2001 14:28:13 -0400 (EDT) Received: from raptor.research.att.com (raptor.research.att.com [135.207.23.32]) by mail-green.research.att.com (Postfix) with ESMTP id 847251E0A7 for ; Fri, 12 Oct 2001 14:28:10 -0400 (EDT) Received: from localhost (bala@localhost) by raptor.research.att.com (SGI-8.9.3/8.8.7) with SMTP id OAA70183 for ; Fri, 12 Oct 2001 14:28:10 -0400 (EDT) Message-Id: <200110121828.OAA70183@raptor.research.att.com> X-Authentication-Warning: raptor.research.att.com: bala@localhost didn't use HELO protocol To: http-delta@pa.dec.com Subject: xproxy Date: Fri, 12 Oct 2001 14:28:10 -0400 From: Balachander Krishnamurthy the url should be http://www.cs.berkeley.edu/~mihut/xproxy/xproxy-ms.pdf and not http://www.cs.berkeley.edu/~mihut/xproxy-ms.pdf as stated in the mail From mogul@pa.dec.com Fri Oct 12 16:38:53 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA06127; Fri, 12 Oct 2001 16:38:53 -0700 (PDT) Received: from wera.pa.dec.com by 
pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA12538; Fri, 12 Oct 2001 16:38:53 -0700 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA29028; Fri, 12 Oct 2001 16:38:52 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200110122338.QAA29028@wera.pa.dec.com> To: Josh MacDonald Cc: http-delta@pa.dec.com Subject: Re: FWD: Protocol Action: Delta encoding in HTTP to Proposed Standard In-Reply-To: Your message of "Fri, 12 Oct 2001 01:33:06 PDT." <20011012013306.A3901@helen.CS.Berkeley.EDU> Date: Fri, 12 Oct 2001 16:38:52 -0700 X-Mts: smtp Josh MacDonald wrote: Our Xdelta/Xproxy prototype is a fairly close match to the current design, and it works. Mihut can provide more specific comments on the state of our protocol as compared to the draft. Our main issue has been with vcdiff. I have personally read the vcdiff draft several times and still I am not comfortable with the level of complexity. Several comments: (1) Vcdiff is no longer a SHOULD-level requirement of the Delta spec, so that shouldn't currently be an issue. (We hope to restore this requirement later on, though). (2) Can you distinguish your Xdelta/Xproxy implementation from the protocol that it implements? The IETF will require us to show implementations of the actual Delta protocol (as specified in the soon-to-appear RFC), not something "fairly close". I would hope that your code could be modified fairly easily, though. The current Xdelta encoding is quite simple to explain, although it is not well suited as a standard either--the encoder/decorder are automatically generated code right now. Delta encoding is still a sore point for us, and I would like to replace the existing code. Have you considered the old W3C/Marimba GDIFF encoding? At least it is easy to implement. "gdiff" is indeed already included in the soon-to-be-created IANA registry. In the current version of the Delta spec, it has equal status with vcdiff. 
So it might be a better format for initial implementations. (Not necessarily "better" in the sense of efficiency, just in the sense of making it easier to start hacking on the Delta protocol itself!) I think if the vcdiff proposal is to succeed it needs another author who has been successful to go through the draft and really improve the description. Currently I find it difficult to make sense of. Phong Vo just submitted a revised version, http://www.ietf.org/internet-drafts/draft-korn-vcdiff-05.txt which was announced today on the IETF-Announce list. This draft specifies exactly the same format as previous drafts, but we have edited it a lot, both in the hope of making it clearer, and also to remove the need to understand the C code. There is still some C code in the draft, but it is only there for clarification purposes. Phong and I have already started working on a few minor clarifications for an -06 version of this, but nothing major. Do we know of any independent vcdiff implementations? This is something that I am about to start working on. Since I helped Phong edit the latest draft, neither one of us is a good candidate for doing an "independent" implementation. If anyone on this list would like to try to do an implementation, that would be wonderful; otherwise I will try to con one of my colleagues into it. Note that at this point, we basically need to find someone to write a vcdiff *decoder*, which should be fairly simple. And it can be as dumb as possible; there is no need at this point to do something very fast. This would be sufficient (I hope) to demonstrate that the spec is written clearly enough. Later on, we should also find someone to write an independent implementation of an encoder. However, this is a trickier problem, because the encoder has a lot more freedom of action than the decoder. One could write an "encoder" that was very simple, but it might not generate a particularly compact encoding. 
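The decoder/encoder asymmetry described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the COPY/ADD tuples are a made-up toy instruction set, not the vcdiff format, and the names `apply_delta`, `naive_encode`, `prefix_suffix_encode`, and `literal_bytes` are hypothetical. The decoder has a single correct behavior, while both encoders are valid yet produce deltas of very different size.

```python
from typing import List, Tuple, Union

# Hypothetical toy instruction set (NOT the vcdiff wire format):
#   ("COPY", offset, length)  copies bytes from the base version
#   ("ADD", data)             inserts literal bytes
Instruction = Union[Tuple[str, int, int], Tuple[str, bytes]]


def apply_delta(base: bytes, delta: List[Instruction]) -> bytes:
    """Decoder: a given base and delta have exactly one correct output."""
    out = bytearray()
    for inst in delta:
        if inst[0] == "COPY":
            _, offset, length = inst
            if offset < 0 or length < 0 or offset + length > len(base):
                raise ValueError("COPY reaches outside the base version")
            out += base[offset:offset + length]
        elif inst[0] == "ADD":
            out += inst[1]
        else:
            raise ValueError(f"unknown instruction {inst[0]!r}")
    return bytes(out)


def naive_encode(base: bytes, target: bytes) -> List[Instruction]:
    """Simplest valid encoder: ignore the base and send everything literally."""
    return [("ADD", target)]


def prefix_suffix_encode(base: bytes, target: bytes) -> List[Instruction]:
    """Slightly smarter encoder: reuse the common prefix and common suffix."""
    limit = min(len(base), len(target))
    p = 0
    while p < limit and base[p] == target[p]:
        p += 1
    s = 0
    # The suffix may not overlap the prefix that was already matched.
    while s < limit - p and base[len(base) - 1 - s] == target[len(target) - 1 - s]:
        s += 1
    delta: List[Instruction] = []
    if p:
        delta.append(("COPY", 0, p))
    middle = target[p:len(target) - s]
    if middle:
        delta.append(("ADD", middle))
    if s:
        delta.append(("COPY", len(base) - s, s))
    return delta


def literal_bytes(delta: List[Instruction]) -> int:
    """Count the literal bytes a delta carries, as a rough size measure."""
    return sum(len(inst[1]) for inst in delta if inst[0] == "ADD")
```

Both encoders round-trip through the same decoder; they differ only in how many literal bytes they have to ship, which is the "freedom of action" that makes an independent encoder harder to evaluate than an independent decoder.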
AT&T has agreed to make the existing encoder source code available "for anyone to use to transmit data via HTTP/1.1 Delta Encoding," so as a practical matter, a high-quality encoder is already available. But an independent implementation is going to be required before we can get Draft Standard status. -Jeff From avh@marimba.com Fri Oct 12 16:42:55 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA13758; Fri, 12 Oct 2001 16:42:55 -0700 (PDT) Received: from mailrelay01.cac.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA18833; Fri, 12 Oct 2001 16:42:55 -0700 Received: by mailrelay01.cac.cpqcorp.net (Postfix, from userid 12345) id 62EE1BAD; Fri, 12 Oct 2001 16:42:55 -0700 (PDT) Received: from zmamail02.zma.compaq.com (zmamail02.nz-tay.cpqcorp.net [161.114.72.102]) by mailrelay01.cac.cpqcorp.net (Postfix) with ESMTP id 2856C83A; Fri, 12 Oct 2001 16:42:55 -0700 (PDT) Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345) id B16284E42; Fri, 12 Oct 2001 19:42:54 -0400 (EDT) Received: from cobra.marimba.com (unknown [207.126.123.66]) by zmamail02.zma.compaq.com (Postfix) with ESMTP id 194EE4DE7; Fri, 12 Oct 2001 19:42:54 -0400 (EDT) Received: by cobra.marimba.com with Internet Mail Service (5.5.2653.19) id ; Fri, 12 Oct 2001 16:42:53 -0700 Message-Id: <02414951E0406C47BF01477DDF8D443E19800F@cobra.marimba.com> From: Arthur van Hoff To: "'Jeffrey Mogul'" , Josh MacDonald Cc: http-delta@pa.dec.com, Jonathan Payne Subject: RE: FWD: Protocol Action: Delta encoding in HTTP to Proposed Standard Date: Fri, 12 Oct 2001 16:42:03 -0700 Mime-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Hi Jeff, We have a vcdiff prototype implementation, but we don't use it in any of our products. The format is easy enough to implement but constructing a good enough algorithm that works for various file sizes has proven much harder. 
We still use compressed gdiff in our products. Have fun, Arthur van Hoff > -----Original Message----- > From: Jeffrey Mogul [mailto:mogul@pa.dec.com] > Sent: Friday, October 12, 2001 4:39 PM > To: Josh MacDonald > Cc: http-delta@pa.dec.com > Subject: Re: FWD: Protocol Action: Delta encoding in HTTP to Proposed > Standard > > > Josh MacDonald wrote: > > Our Xdelta/Xproxy prototype is a fairly close match to the current > design, and it works. Mihut can provide more specific > comments on > the state of our protocol as compared to the draft. Our > main issue > has been with vcdiff. I have personally read the vcdiff > draft several > times and still I am not comfortable with the level of > complexity. > > Several comments: > (1) Vcdiff is no longer a SHOULD-level requirement of the Delta > spec, so that shouldn't currently be an issue. (We hope to restore > this requirement later on, though). > > (2) Can you distinguish between your Xdelta/Xproxy implementation > from the protocol that it implements? The IETF will require us > to show implementations of the actual Delta protocol (as specified > in the soon-to-appear RFC), not something "fairly close". I would > hope that your code could be modified fairly easily, though. > > The current Xdelta encoding is quite simple to explain, > although it > is not well suited as a standard either--the encoder/decorder are > automatically generated code right now. Delta encoding > is still a > sore point for us, and I would like to replace the existing code. > Have you considered the old W3C/Marimba GDIFF encoding? At least > it is easy to implement. > > "gdiff" is indeed already included in the soon-to-be-created > IANA registry. In the current version of the Delta spec, it has > equal status with vcdiff. So it might be a better format for > initial implementations. (Not necessarily "better" in the > sense of efficiency, just in the sense of making it easier to > start hacking on the Delta protocol itself!) 
> > I think if the vcdiff proposal is to succeed it needs > another author > who has been successful to go through the draft and really improve > the description. Currently I find it difficult to make sense of. > > Phong Vo just submitted a revised version, > > http://www.ietf.org/internet-drafts/draft-korn-vcdiff-05.txt > > which was announced today on the IETF-Annouce list. This draft > specifies exactly the same format as previous drafts, but we have > edited it a lot, both in the hope of making it clearer, and also > to remove the need to understand the C code. There is still some > C code in the draft, but it is only there for clarification purposes. > > Phong and I have already started working on a few minor clarifications > for an -06 version of this, but nothing major. > > Do we know of any independent vcdiff implementations? > > This is something that I am about to start working on. Since I > helped Phong edit the latest draft, neither one of us is a good > candidate for doing an "independent" implementation. If anyone > on this list would like to try to do an implementation, that > would be wonderful, otherwise I will try to con one of my > colleagues into it. > > Note that at this point, we basically need to find someone to > write a vcdiff *decoder*, which should be fairly simple. And it > can be as dumb as possible, there is no need at this point to do > something very fast. This would be sufficient (I hope) to > demonstrate that the spec is written clearly enough. > > Later on, we should also find someone to write an independent > implementation of an encoder. However, this is a trickier > problem, because the encoder has a lot more freedom of action > than the decoder. One could write an "encoder" that was very > simple, but it might not generate a particularly compact > encoding. 
> > AT&T has agreed to make the existing encoder source code > available "for anyone to use to transmit data via HTTP/1.1 > Delta Encoding," so as a practical matter, a high-quality > encoder is already available. But an independent implementation > is going to be required before we can get Draft Standard status. > > -Jeff > From mogul Fri Oct 12 16:51:05 2001 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA00382; Fri, 12 Oct 2001 16:51:04 -0700 (PDT) From: Jeffrey Mogul Message-Id: <200110122351.QAA00382@wera.pa.dec.com> To: cc: http-delta Subject: Re: FWD: Protocol Action: Delta encoding in HTTP to Proposed Standard In-reply-to: Your message of "Fri, 12 Oct 2001 15:42:23 EDT." <20011012194438.7B575C8B5@ztxmail01.ztx.compaq.com> Date: Fri, 12 Oct 2001 16:51:04 -0700 X-Mts: smtp danielh@crosslink.net writes: One favor -- could you post the url to the latest version of the delta encoding document. Sure (for some odd reason, the IETF announcements of "Protocol Actions" don't include URLs!): http://www.ietf.org/internet-drafts/draft-mogul-http-delta-10.txt -Jeff From mihut@eecs.berkeley.edu Mon Oct 15 18:53:58 2001 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA13274; Mon, 15 Oct 2001 18:53:58 -0700 (PDT) Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA01155; Mon, 15 Oct 2001 18:53:58 -0700 Received: by mailrelay01.cce.cpqcorp.net (Postfix, from userid 12345) id 1786C58A; Mon, 15 Oct 2001 20:53:58 -0500 (CDT) Received: from ztxmail01.ztx.compaq.com (ztxmail01.nz-cce.cpqcorp.net [161.114.8.205]) by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id 069DB41E for ; Mon, 15 Oct 2001 20:53:58 -0500 (CDT) Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345) id D6170CB5A; Mon, 15 Oct 2001 20:53:55 -0500 (CDT) Received: from relay.EECS.Berkeley.EDU (relay.EECS.Berkeley.EDU [169.229.34.228]) by 
ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 768FDC8B4 for ; Mon, 15 Oct 2001 20:53:55 -0500 (CDT) Received: from EECS.Berkeley.EDU (mihut@argus.EECS.Berkeley.EDU [169.229.60.79]) by relay.EECS.Berkeley.EDU (8.9.3/8.9.3) with ESMTP id SAA13293 for ; Mon, 15 Oct 2001 18:53:54 -0700 (PDT) Received: from localhost (mihut@localhost) by EECS.Berkeley.EDU (8.9.3/8.9.3) with ESMTP id SAA01845 for ; Mon, 15 Oct 2001 18:53:52 -0700 (PDT) X-Authentication-Warning: argus.EECS.Berkeley.EDU: mihut owned process doing -bs Date: Mon, 15 Oct 2001 18:53:52 -0700 (PDT) From: Mihut Ionescu To: Subject: xProxy features Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII xProxy supports the GET and POST methods, most of the relevant HTTP/1.1 headers and cache directives, as well as entity tags and cookies. The current implementation "clusters" URLs that execute the same CGI program but have a different CGI query string (versions are indexed by the prefix string up to the '?' character). xProxy supports (only) xdelta for delta encoding and gzip for compression. It is compatible with the IETF proposed protocol, with the exception that it identifies versions based on their MD5's (using the "Delta-Base" header to indicate the client proxy version). This can be (easily) changed to identify versions based on entity tags. xProxy supports the major features of the IETF specification, with the following exceptions: * No support for (deltas on) byte ranges. * No support for the cache directive "retain" ... XDFS (xDelta File System), the versioned cache used, did not support deletions when xProxy was implemented. * Client proxy specifies in the request only the most recent version it holds. It should be fairly easy to modify the client proxy to indicate multiple versions in the request and add the necessary logic in the server proxy. Therefore, version identification based on etag and indication of multiple versions in a request should complete the xProxy implementation. 
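The URL clustering and MD5-based version identification described above might look roughly like the following Python sketch. It is not xProxy code: `cluster_key`, `version_id`, and `VersionIndex` are hypothetical names, the in-memory dictionary stands in for the XDFS versioned cache, and the hex form of the MD5 is an assumption, since the message does not say how the digest is written in the "Delta-Base" header.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Optional


def cluster_key(url: str) -> str:
    """Index key: the URL prefix up to (not including) the first '?'."""
    return url.split("?", 1)[0]


def version_id(body: bytes) -> str:
    """Identify a version by the MD5 of its contents (hex form is an assumption)."""
    return hashlib.md5(body).hexdigest()


class VersionIndex:
    """Toy stand-in for a versioned cache keyed by URL cluster."""

    def __init__(self) -> None:
        self._versions: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    def store(self, url: str, body: bytes) -> str:
        """Record a response body under its cluster and return its version id."""
        vid = version_id(body)
        self._versions[cluster_key(url)][vid] = body
        return vid

    def lookup(self, url: str, vid: str) -> Optional[bytes]:
        """Find a base version for any URL in the same cluster, if one is held."""
        return self._versions.get(cluster_key(url), {}).get(vid)
```

With this indexing, a response to one query string can serve as the delta base for a request that differs only in its query string, which is the point of clustering: output of the same CGI program tends to share most of its bytes.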
Let me know if you have any questions or suggestions.

Mihut

From mogul Mon Nov 19 18:13:45 2001
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id SAA01819; Mon, 19 Nov 2001 18:13:45 -0800 (PST)
From: Jeffrey Mogul
Message-Id: <200111200213.SAA01819@wera.pa.dec.com>
To: http-delta
Subject: FWD: Protocol Action: Instance Digests in HTTP to Proposed Standard
Date: Mon, 19 Nov 2001 18:13:45 -0800
X-Mts: smtp

On October 3 of last year, I asked the IESG to approve the "Instance Digests in HTTP" specification as a "Proposed Standard". Today, they finally approved it.

This one should have been approved the same day as the HTTP Delta Encoding proposal (October 11, 2001) but the Area Director forgot to put it on the IESG agenda, so it languished for a while, until I thought to ask why nothing had happened.

-Jeff

From: iesg-secretary@ietf.org (The IESG)
Subject: Protocol Action: Instance Digests in HTTP to Proposed Standard
Date: Mon, 19 Nov 2001 23:23:35 +0000 (UTC)
Message-ID: <200111192255.RAA01891@ietf.org>
Cc: RFC Editor , IANA , Internet Architecture Board
To: IETF-Announce

The IESG has approved the Internet-Draft 'Instance Digests in HTTP' as a Proposed Standard. This has been reviewed in the IETF but is not the product of an IETF Working Group. The IESG contact persons are Patrik Faltstrom and Ned Freed.

Technical Summary

HTTP/1.1 defines a Content-MD5 header that allows a server to include a digest of the response body. However, this is specifically defined to cover the body of the actual message, not the contents of the full file (which might be quite different, if the response is a Content-Range, or uses a delta encoding). Also, the Content-MD5 is limited to one specific digest algorithm; other algorithms, such as SHA-1, may be more appropriate in some circumstances. Finally, HTTP/1.1 provides no explicit mechanism by which a client may request a digest. This document proposes HTTP extensions that solve these problems.
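The distinction drawn in the summary above, a digest of the message body versus a digest of the full instance, can be made concrete with a small sketch. It assumes base64-encoded digests (as Content-MD5 uses) and a made-up 2000-byte instance; the helper name b64_digest is hypothetical, and the header syntax of the approved specification is deliberately not reproduced here.

```python
import base64
import hashlib


def b64_digest(data: bytes, algorithm: str = "md5") -> str:
    """Base64-encode the digest of data under the named hashlib algorithm."""
    raw = hashlib.new(algorithm, data).digest()
    return base64.b64encode(raw).decode("ascii")


# A made-up 2000-byte instance and a range response carrying bytes 100-1000.
instance = b"0123456789" * 200
message_body = instance[100:1001]

# Content-MD5 style: covers only what is in this particular message.
content_md5 = b64_digest(message_body)

# Instance digest: covers the full instance, whatever subset was sent,
# and need not be limited to MD5.
instance_md5 = b64_digest(instance)
instance_sha1 = b64_digest(instance, "sha1")
```

A client that reassembles the instance from ranges or deltas can only check the end result against the instance digest; the message-body digest says nothing about it.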
Working Group Summary

This is an individual submission to the IETF, but, the document has been discussed on various mailing lists which have to do with the HTTP protocol.

Protocol Quality

The spec was reviewed by Patrik Faltstrom.

From mogul Tue Jan 8 11:34:09 2002
Return-Path:
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA21860; Tue, 8 Jan 2002 11:34:08 -0800 (PST)
Message-Id: <200201081934.LAA21860@wera.pa.dec.com>
From:
To: http-delta
X-Original-Date: Sat, 29 Dec 2001 23:12:18 -0500
Reply-To: danielh@crosslink.net
Subject: Dcluster comment
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25
Date: Tue, 08 Jan 2002 11:34:08 -0800
Sender: mogul
X-Mts: smtp

Per usual, I've been sidetracked and have not yet started implementing Delta (much less dcluster). But it is still near the front burner (part of what I got sidetracked onto is writing a CRON-like task manager that I will use in my eventual implementation of delta).

BTW: happy new year!

===============================================================

29 Dec 2001

Having (re)read the dcluster-00 draft, and having serendipitously reviewed some earlier comments I had on an earlier version of this draft (25 Sept 2000 expiration), there is one important issue that needs to be addressed. Other points, including editorial comments, are partially related to clarifying this issue.

The issue is whether an instance must be "explicitly" incorporated into a Dcluster, or whether this can be "implicit". That is, if a Dcluster is not provided in a response, can the instance only be used as a delta-base in future requests for the same request-uri?

BTW: It would be nice if there was a term that meant "all the request-uris that match a Dcluster" and "all the instances, both their contents and their Etags, that are members of a Dcluster". Uniqueness scope isn't quite it.

Consider the case where a first request generates a response with a Dcluster instance header. For example:

    GET /foo?p=1 HTTP/1.1
    Host: bar.example.net
    A-IM: vcdiff

yielding:

    HTTP/1.1 200 OK
    Date: Sun, 06 Nov 2001 08:49:37 GMT
    Etag: "abc"
    DCluster: "//bar.example.net/foo?"

Suppose a later request of:

    GET /foo?r=1 HTTP/1.1
    Host: bar.example.net
    If-None-Match: "abc"
    A-IM: vcdiff

yields:

    HTTP/1.1 200 OK
    Date: Sun, 06 Nov 2001 08:49:37 GMT
    Etag: "def"
    IM: vcdiff

Note that this response does NOT have a Dcluster instance header.

The question is what instances can be used as delta bases in an even later request for "foo?s=1". There are two possibilities:

i) Explicit assignments only:

    GET /foo?s=1 HTTP/1.1
    Host: bar.example.net
    If-None-Match: "abc"
    A-IM: vcdiff

Here, "abc" is used because the first response (which has an "abc" etag) explicitly contains a Dcluster whose value is an abbreviation for the request-uri (for /foo?s=1).

ii) Implicit assignments also:

    GET /foo?s=1 HTTP/1.1
    Host: bar.example.net
    If-None-Match: "abc","def"
    A-IM: vcdiff

Here, "def" is also used because:
  a) the "def" instance was received after receiving the "//bar.example.net/foo?" Dcluster definition, and
  b) "//bar.example.net/foo?" matches /foo?s=1.

The first interpretation is much more restrictive. It means that an instance can only be used as a delta-base for future request-uris that:
  i) are the same as the request-uri that generated the instance
  ii) match the Dcluster defined (in a Dcluster instance header) with this instance

The second interpretation implies both of the above. In addition, any future instance that is returned as a response to a request-uri that matches this Dcluster can be used as a delta-base for any other request-uri that also matches this Dcluster (I abstract from some timing conditions).

In a sense, the first interpretation is a one-to-many relationship -- one instance can be associated with many request-uris that match one (or perhaps several) Dclusters. The fact that many instances may have been returned with the same Dcluster definition(s) loosens but does not fundamentally change this one-to-many relationship.

The second interpretation defines a many-to-many relationship -- all instances whose request-uris match a given Dcluster can be used as delta-bases for any other request-uri that matches this given Dcluster.

These two interpretations have different strengths and weaknesses:

1a) Explicit (one-to-many) advantages:
  i) The server has fine-grained control of what the client ought to consider using as delta-bases for future request-uris.
  ii) Dclusters can be terminated, simply by expiring all instances that include this Dcluster (say, by using the Retain token of Cache-Control).

1b) Implicit (many-to-many) advantages:
  i) The server can easily define broad Dclusters, with just one Dcluster header.
  ii) With broad Dclusters, the client has a great range of delta-bases to choose from.

2a) Explicit (one-to-many) disadvantages:
  i) By only allowing instances to be used in a Dcluster when explicitly declared, the opportunities for using a good match are diminished.
  ii) There is a small cost of sending a Dcluster header whenever needed (as opposed to sending it just once).

2b) Implicit (many-to-many) disadvantages:
  i) By allowing many instances to be used as delta-bases, the client may end up using a poor set.
  ii) Or, the client may send very large If-None-Match request headers.
  iii) Once defined, there is no method of terminating a Dcluster. In particular, a Dcluster may persist long after the instance that originated it has expired.

Overall, I lean toward the first interpretation. It's not quite as powerful, but I like the fine-grained control it offers. I'm also uncomfortable with the permanence of Dclusters (point 2.b.iii). In any case, whatever interpretation is adopted, it needs to be clearly described.
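
To make the two readings concrete, here is a minimal sketch of how a client cache might pick candidate delta-bases under each one. It is an illustration, not text from the dcluster draft: the CachedInstance type and the function names are hypothetical, Dcluster matching is assumed to be a plain prefix match on the scheme-less absolute URI, and the timing conditions mentioned above are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CachedInstance:
    etag: str
    request_uri: str         # e.g. "//bar.example.net/foo?p=1"
    dcluster: Optional[str]  # Dcluster value received with this instance, if any


def matches(dcluster: str, uri: str) -> bool:
    """Assumed matching rule: the Dcluster value is a prefix of the URI."""
    return uri.startswith(dcluster)


def is_explicit_base(inst: CachedInstance, uri: str) -> bool:
    """Same request-uri, or a Dcluster sent with this very instance matches."""
    return inst.request_uri == uri or (
        inst.dcluster is not None and matches(inst.dcluster, uri)
    )


def explicit_bases(cache: List[CachedInstance], uri: str) -> List[str]:
    return [i.etag for i in cache if is_explicit_base(i, uri)]


def implicit_bases(cache: List[CachedInstance], uri: str) -> List[str]:
    """Also accept any instance whose own URI falls in a known, matching Dcluster."""
    known = {i.dcluster for i in cache if i.dcluster is not None}
    applicable = [d for d in known if matches(d, uri)]
    return [
        i.etag
        for i in cache
        if is_explicit_base(i, uri)
        or any(matches(d, i.request_uri) for d in applicable)
    ]


cache = [
    CachedInstance("abc", "//bar.example.net/foo?p=1", "//bar.example.net/foo?"),
    CachedInstance("def", "//bar.example.net/foo?r=1", None),
]
target = "//bar.example.net/foo?s=1"
explicit = explicit_bases(cache, target)  # only "abc" qualifies
implicit = implicit_bases(cache, target)  # "abc" and "def" both qualify
```

The sketch also shows why the implicit reading is hard to terminate: the set of known Dclusters outlives any single instance unless the cache tracks where each definition came from.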

-----------------------------------------------------------
Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org
-----------------------------------------------------------

From danielh@crosslink.net Wed Jan 9 15:20:55 2002
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA09121; Wed, 9 Jan 2002 15:20:55 -0800 (PST)
From:
Received: from mailrelay01.cac.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA21352; Wed, 9 Jan 2002 15:20:54 -0800
Received: by mailrelay01.cac.cpqcorp.net (Postfix, from userid 12345) id 28E13147A; Wed, 9 Jan 2002 15:20:54 -0800 (PST)
Received: from ztxmail01.ztx.compaq.com (ztxmail01.nz-cce.cpqcorp.net [161.114.8.205]) by mailrelay01.cac.cpqcorp.net (Postfix) with ESMTP id D2E081642 for ; Wed, 9 Jan 2002 15:20:53 -0800 (PST)
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345) id 578B484C8; Wed, 9 Jan 2002 17:20:53 -0600 (CST)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 130C38680 for ; Wed, 9 Jan 2002 17:20:53 -0600 (CST)
Received: from danielh (mail.dannyh.org [209.147.90.202]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id SAA29679 for ; Wed, 9 Jan 2002 18:20:52 -0500
Message-Id: <200201092320.SAA29679@lycanthrope.crosslink.net>
X-Really-To:
Reply-To: danielh@crosslink.net
Date: Wed, 09 Jan 2002 18:20:16 -0500
To: http-delta@pa.dec.com
Subject: A note on delta and range
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25

8 Jan 2001

A comment on delta and ranges (arising from my ongoing programming of a delta-encoding module).

1) It would be useful to include a short note on how the usual rule of client decoding, that one should do the last first, doesn't quite apply in some cases. In particular, when a delta (say, DIFFE) follows a range.

Example:

A request:

    GET /foo.html HTTP/1.1
    Host: bar.example.net
    If-None-Match: "abc"
    A-IM: range,diffe,gzip
    Range: bytes=100-1000

yielding:

    HTTP/1.1 200 OK
    Date: Sun, 06 Nov 1994 08:49:37 GMT
    Etag: "def"
    Delta-Base: "abc"
    IM: range,diffe,gzip
    Content-Range: 100-1000/2000
    Content-Length: 901
    Content-Type: text/html

When the client decodes this entity-body, it should:
  1) UNGZIP it
  2) extract bytes 100-1000 from the "abc" base-instance
  3) Use this UnGzipped entity body as a difference file, and apply it to bytes 100-1000 of "abc"
  4) This yields bytes 100-1000 of "def"

Note that the client has to "look ahead", to the range token of IM, and the Content-Range header (so as to know that the delta is meant to be used against only a portion of "abc"). This bends the rule of "apply encodings from last to first", hence warrants a warning note.

2) In multi-part responses with delta-encoding, it's left up in the air what (if any) "part headers" should be used, especially the content-type. I assume that content-type "part headers" should be that of the current instance; regardless of where RANGE may appear in the A-IM header.

-----------------------------------------------------------
Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org
-----------------------------------------------------------

From mogul@pa.dec.com Wed Jan 9 16:53:25 2002
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA28785; Wed, 9 Jan 2002 16:53:25 -0800 (PST)
Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA23527; Wed, 9 Jan 2002 16:53:25 -0800
Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA16244; Wed, 9 Jan 2002 16:53:24 -0800 (PST)
From: Jeffrey Mogul
Message-Id: <200201100053.QAA16244@wera.pa.dec.com>
To: danielh@crosslink.net
Cc: http-delta@pa.dec.com
Subject: Re: A note on delta and range
In-Reply-To: Your message of "Wed, 09 Jan 2002 18:20:16 EST."
        <200201092320.SAA29679@lycanthrope.crosslink.net>
Date: Wed, 09 Jan 2002 16:53:24 -0800
X-Mts: smtp

Thanks for your note.

    1) It would be useful to include a short note on how the usual rule of client decoding, that one should do the last first, doesn't quite apply in some cases. In particular, when a delta (say, DIFFE) follows a range.

    Example:

    A request:

        GET /foo.html HTTP/1.1
        Host: bar.example.net
        If-None-Match: "abc"
        A-IM: range,diffe,gzip
        Range: bytes=100-1000

    yielding:

        HTTP/1.1 200 OK
        Date: Sun, 06 Nov 1994 08:49:37 GMT
        Etag: "def"
        Delta-Base: "abc"
        IM: range,diffe,gzip
        Content-Range: 100-1000/2000
        Content-Length: 901
        Content-Type: text/html

    When the client decodes this entity-body, it should:
      1) UNGZIP it
      2) extract bytes 100-1000 from the "abc" base-instance
      3) Use this UnGzipped entity body as a difference file, and apply it to bytes 100-1000 of "abc"
      4) This yields bytes 100-1000 of "def"

    Note that the client has to "look ahead", to the range token of IM, and the Content-Range header (so as to know that the delta is meant to be used against only a portion of "abc"). This bends the rule of "apply encodings from last to first", hence warrants a warning note.

Are you sure about this? I don't have time right now to do a careful analysis of your example, but I believe we tried to cover this quite carefully in section

    2) In multi-part responses with delta-encoding, it's left up in the air what (if any) "part headers" should be used, especially the content-type. I assume that content-type "part headers" should be that of the current instance; regardless of where RANGE may appear in the A-IM header.

Doesn't Section 10.10 (Delta encoding and multipart/byteranges) cover this?

Remember that (RFC 2616, section 3.7.2, Multipart Types):

    In general, HTTP treats a multipart message-body no differently than any other media type: strictly as payload.
    The one exception is the "multipart/byteranges" type (appendix 19.2) when it appears in a 206 (Partial Content) response, which will be interpreted by some HTTP caching mechanisms as described in sections 13.5.4 and 14.16. In all other cases, an HTTP user agent SHOULD follow the same or similar behavior as a MIME user agent would upon receipt of a multipart type. The MIME header fields within each body-part of a multipart message-body do not have any significance to HTTP beyond that defined by their MIME semantics.

so I think the *only* case of "multipart/*" that the Delta encoding specification needs to cover is "multipart/byteranges".

-Jeff

From danielh@crosslink.net Wed Jan 9 19:05:48 2002
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id TAA22231; Wed, 9 Jan 2002 19:05:48 -0800 (PST)
From:
Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA22746; Wed, 9 Jan 2002 19:05:47 -0800
Received: by mailrelay01.cce.cpqcorp.net (Postfix, from userid 12345) id 48D511EB0; Wed, 9 Jan 2002 21:05:47 -0600 (CST)
Received: from zmamail01.zma.compaq.com (zmamail01.nz-tay.cpqcorp.net [161.114.72.101]) by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id EAA261F48 for ; Wed, 9 Jan 2002 21:05:46 -0600 (CST)
Received: by zmamail01.zma.compaq.com (Postfix, from userid 12345) id 6645D36B8; Wed, 9 Jan 2002 22:05:46 -0500 (EST)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by zmamail01.zma.compaq.com (Postfix) with ESMTP id 02F443625 for ; Wed, 9 Jan 2002 22:05:45 -0500 (EST)
Received: from danielh (mail.dannyh.org [209.147.90.202]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id WAA17001 for ; Wed, 9 Jan 2002 22:05:45 -0500
Message-Id: <200201100305.WAA17001@lycanthrope.crosslink.net>
X-Really-To:
Reply-To: danielh@crosslink.net
Date: Wed, 09 Jan 2002 22:06:09 -0500
To: http-delta@pa.dec.com
In-Reply-To:
        <200201100054.QAA32396@wera.pa.dec.com>
Subject: Last minute nits
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25

>By the way, as you have seen today, the "Proposed Standard"
>RFC for Delta Encoding is about to come out. We can't make any technical
>changes at this point. (And PLEASE don't try to
>delay that RFC any longer!) If you can convince me that you've found
>real bugs, those can be addressed before we go to
>Draft Standard.

What, you mean a two (or 2.3) year delay is a lot? You must not work for the government :>

Here are my last points. The first one is longish and refers to my most recent comment. The latter three are shorter. Do what you want, none of them are important enough to stop the clock.

-------------------

1) Regarding range,diffe (or range,vcdiff or range,gdiff):

>> 1) It would be useful to include a short note on how
>> the usual rule of client decoding, that one should do the last first,
>> doesn't quite apply in some cases. In particular,
>> when a delta (say, DIFFE) follows a range.
>>
>> Example:
>>
>> A request:
>> GET /foo.html HTTP/1.1
>> Host: bar.example.net
>> If-None-Match: "abc"
>> A-IM: range,diffe,gzip
>> Range: bytes=100-1000
>>
>> yielding:
>> HTTP/1.1 200 OK
>> Date: Sun, 06 Nov 1994 08:49:37 GMT
>> Etag: "def"
>> Delta-Base: "abc"
>> IM: range,diffe,gzip
>> Content-Range: 100-1000/2000
>> Content-Length: 901
>> Content-Type: text/html
>>
>> When the client decodes this entity-body, it should:
>> 1) UNGZIP it
>> 2) extract bytes 100-1000 from the "abc" base-instance
>> 3) Use this UnGzipped entity body as a difference file,
>> and apply it to bytes 100-1000 of "abc"
>> 4) This yields bytes 100-1000 of "def"
>>
>> Note that the client has to "look ahead", to the range token
>> of IM, and the Content-Range header (so as to know that the delta
>> is meant to be used against only a portion of "abc").
>> This bends the rule of "apply encodings from last to first",
>> hence warrants a warning note.

>Are you sure about this?
>I don't have time right now to do a
>careful analysis of your example, but I believe we tried to cover
>this quite carefully in section

I looked through rfc3229 and here's what I found that is relevant:

i) Section 5.7 (Examples of requests combining Range and delta encoding) discusses range and delta, but does not contain any examples of a server response that contains Range in the IM header.

ii) Section 10.5.2 mentions Range and IM, as follows:

    As a special case, if the instance-manipulations include both range selection and at least one other non-identity instance-manipulation, the IM header field MUST be used to indicate the order in which all of these instance-manipulations, including range selection, were applied. If the IM header lists the "range" instance-manipulation, the response MUST include either a Content-Range header or a multipart/byteranges Content-Type in which each part contains a Content-Range header. (See section 10.10 for specific discussion of combining delta encoding and multipart/byteranges.)

    Responses that include an IM header MUST carry a response status code of 226 (IM Used), as specified in section 10.4.1.

    The server SHOULD omit the IM header if it would list only the "range" instance-manipulation. Such responses would normally be sent with response status code 206 (Partial Content), as specified by HTTP/1.1 [10].

iii) also in 10.5.2

    IM: range, vcdiff

    This example indicates that one or more ranges of the instance have been selected, and the result has then been delta encoded against identical ranges of a previous base instance.

None of these address my point:

A client MUST check whether a differencing (say, a diffe) is to be done against a range; this check consists of looking for a RANGE token preceding one of the delta tokens (vcdiff, gdiff, and diffe).
If this is the case, the client should first extract the range (as provided in a Content-Range header) of the base-instance, and apply the difference to this extracted range, thereby creating the range of the current instance.

I also note that a similar language problem occurs when taking a "range of a difference". When considering the range of a difference: the range is NOT of the instance, it is of the entity body that will be used to create an instance.

So... we can either punt, and hope that the client implementors are smart enough to realize the "last to first" decoding rule should NOT be slavishly followed (and that the "range as range of instance" rule is not always the case). Personally, I think a few "implementation notes" would suffice; perhaps something like the preceding two paragraphs added to 10.5.2.

-------------------

2) Section 10.7.3

The phrase:

    (i.e., without any content coding), after recovering that entity by applying the delta to its previous cache entry.

since the previous cache entry, "abc", has a GZIP content coding, this should say:

    (i.e., without any content coding), after recovering that entity by applying the delta to an unGZIP'ed version of its previous cache entry.

Or, to be pedantic:

    (i.e., without any content coding), after recovering that entity by applying the delta to an unGZIP'ed version of the "abc" cache entry (which has a GZIP content-coding).

-------------------

3) In 10.5.3

    The server's choice about whether to apply an instance-manipulation SHOULD be independent of its choice to apply any subsequent two-input instance-manipulations to the response. (Two-input instance-manipulations include delta-codings, because they take two different values as input. Compression and "range" instance-manipulations take only one input. Other instance-manipulations may be defined in the future.)

    Note: the intent of this requirement is to prevent the server from generating a delta-encoded response that the client can only decode by first applying an instance-manipulation encoding to its cached base instance. A server implementor might wish to consider what the client would logically have in its cache, when deciding which instance-manipulations to apply prior to a delta-coding.

A forward reference to 10.7.1 would help.

BTW: I still find this obtuse. For example, suppose the client says (and yes, I realize this is a peculiar example)

    A-IM: gzip,gdiff

and the server acquiesces, returning

    IM: gzip,gdiff

Well... this requires that the client "applies an instance-manipulation encoding to its cached base instance". In fact, that's what the client explicitly requested! So the word "prevent" is kind of contradictory.

-------------------

4) In 10.7.2

    When a client receives a delta response with one or more non-identity content codings:

This should be:

    When a client receives a delta response with one or more non-identity content codings, or the base-instance has one or more non-identity content codings:

-----------------------------------------------------------
Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org
-----------------------------------------------------------

From danielh@crosslink.net Sat Jan 12 10:12:21 2002
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA01427; Sat, 12 Jan 2002 10:12:21 -0800 (PST)
From:
Received: from mailrelay01.cac.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA25204; Sat, 12 Jan 2002 10:12:21 -0800
Received: by mailrelay01.cac.cpqcorp.net (Postfix, from userid 12345) id 28E1C15ED; Sat, 12 Jan 2002 10:12:21 -0800 (PST)
Received: from zmamail01.zma.compaq.com (zmamail01.nz-tay.cpqcorp.net [161.114.72.101]) by mailrelay01.cac.cpqcorp.net (Postfix) with ESMTP id 018101723 for ; Sat, 12 Jan 2002 10:12:20 -0800 (PST)
Received: by
        zmamail01.zma.compaq.com (Postfix, from userid 12345) id 70893355F; Sat, 12 Jan 2002 13:12:20 -0500 (EST)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by zmamail01.zma.compaq.com (Postfix) with ESMTP id 39C183625 for ; Sat, 12 Jan 2002 13:12:20 -0500 (EST)
Received: from danielh (mail.dannyh.org [209.147.90.202]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id NAA14739 for ; Sat, 12 Jan 2002 13:12:19 -0500
Message-Id: <200201121812.NAA14739@lycanthrope.crosslink.net>
X-Really-To:
Reply-To: danielh@crosslink.net
Date: Sat, 12 Jan 2002 12:53:36 -0500
To: http-delta@pa.dec.com
Subject: On identical instances, different etags
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25

12 Jan 2002
From: Daniel Hellerstein
Re: What to do with exact matches

Now that the 48 hr comment period on the draft standard has passed, this comment relates to issues that came up while implementing rfc3229.

When there is an exact match between the current instance and a base-instance, it's not always easy to produce an appropriate difference file. In particular, if two identical binary instances are compared, and DIFFE is used, then there is no obvious DIFFE difference file.

BTW:
  * for identical text instances, a simple

        0a
        .

    will usually work (though a terminating ^Z may be dropped)
  * I understand that diff should not be used against binary files, but what if that's all that is supported, and the instances match exactly?

Having some means of telling the client to use his "exactly matching" base instance would be useful.

One solution is to ignore the problem -- it is likely to be rare.

Another is to add a new "instance-manipulation value":

    MATCH

MATCH would mean "the base-instance (say, as specified in a Delta-Base response header) exactly matches the current instance". In this case, there would be no need to send a response body.
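
A rough sketch of how a server might apply the proposed token is given below. It is purely illustrative: "match" is only the value suggested in this message and is not defined by rfc3229, and build_delta_response, make_delta and the dictionary layout are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Union


def build_delta_response(
    base: bytes,
    current: bytes,
    make_delta: Callable[[bytes, bytes], bytes],
) -> Dict[str, Union[int, str, bytes]]:
    """Pick the instance-manipulation for a delta response.

    make_delta stands in for whatever differencing routine the server has
    (diffe, vcdiff, ...); it is only called when the instances differ.
    """
    if base == current:
        # Proposed behaviour: signal an exact match and send no body at all.
        return {"status": 226, "IM": "match", "body": b""}
    return {"status": 226, "IM": "diffe", "body": make_delta(base, current)}


# Identical binary instances: no difference file has to be invented.
same = build_delta_response(b"\x00\x01\x02", b"\x00\x01\x02", lambda b, c: b"unused")

# Differing instances: fall back to the ordinary delta path.
changed = build_delta_response(b"old", b"new", lambda b, c: b"<delta bytes>")
```

The point of the sketch is that the exact-match check happens before any differencing tool runs, so the awkward "empty diff of two binary files" case never reaches it.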

----------------------------------------------------------
Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org

From danielh@crosslink.net Sat Jan 12 11:30:38 2002
Return-Path:
Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id LAA19840; Sat, 12 Jan 2002 11:30:38 -0800 (PST)
From:
Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA08351; Sat, 12 Jan 2002 11:30:37 -0800
Received: by mailrelay01.cce.cpqcorp.net (Postfix, from userid 12345) id E6A1A1D2B; Sat, 12 Jan 2002 13:30:36 -0600 (CST)
Received: from zmamail02.zma.compaq.com (zmamail02.nz-tay.cpqcorp.net [161.114.72.102]) by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id AABA21F27 for ; Sat, 12 Jan 2002 13:30:36 -0600 (CST)
Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345) id 35F8D3B92; Sat, 12 Jan 2002 14:30:36 -0500 (EST)
Received: from lycanthrope.crosslink.net (lycanthrope.crosslink.net [206.246.124.36]) by zmamail02.zma.compaq.com (Postfix) with ESMTP id 06D043A3C for ; Sat, 12 Jan 2002 14:30:36 -0500 (EST)
Received: from danielh (mail.dannyh.org [209.147.90.202]) by lycanthrope.crosslink.net (8.9.3/) with SMTP id OAA15366 for ; Sat, 12 Jan 2002 14:30:35 -0500
Message-Id: <200201121930.OAA15366@lycanthrope.crosslink.net>
X-Really-To:
Reply-To: danielh@crosslink.net
Date: Sat, 12 Jan 2002 14:30:18 -0500
To: http-delta@pa.dec.com
Subject: Entity-headers, instance-headers, and differences
X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25

12 Jan 2002
From: Daniel Hellerstein
Re: Entity headers, instance headers, and what to difference

Now that the 48 hr comment period on the draft standard has passed, this comment relates to issues that came up while implementing rfc3229.

It is not clear just how entity headers should be considered when determining differences between two instances.

Let's start with the definition of "entity" from rfc2616 (section 7)

a)  7 Entity

    Request and Response messages MAY transfer an entity if not otherwise restricted by the request method or response status code. An entity consists of entity-header fields and an entity-body, although some responses will only include the entity-headers.

RFC3229 extends this by defining an instance (from section 3):

    instance
        The entity that would be returned in a status-200 response to a GET request, at the current time, for the selected variant of the specified resource, with the application of zero or more content-codings, but without the application of any instance manipulations (see below) or transfer-codings.

    It is convenient to think of an entity tag, in HTTP/1.1, as being associated with an instance, rather than an entity. That is, for a given resource, two different response messages might include the same entity tag, but two different instances of the resource should never be associated with the same (strong) entity tag.

The above implies that an instance consists of BOTH entity headers (and NOT general and response headers), as well as the body.

However, from section 4 of RFC3229 (from the "sequence of transformations")

    4. The result of the first three steps, at the time when the request is processed, is an instance. The instance includes a body (possibly empty) and possibly some instance headers. The entity tag, if any, is assigned at this point. That is, an entity tag is associated with an instance, NOT an entity.

    5. ...

    6. The result of the fifth step becomes the entity, consisting of entity headers and an entity body.

This introduces "instance headers" (the only mention of "instance headers" in the document).

The question is what, if any, headers should be considered to be part of an instance. There are several possibilities:

1) an instance includes all entity headers. If so, "instance headers" are just entity-headers, though perhaps they are specific to this instance.

2) an instance only includes a special class of "instance headers"; most entity headers are not considered to be "instance headers"

3) an instance does not include any headers

This is more than a semantic concern; it affects just what should be included when computing differences. Consider, for the preceding possibilities:

1) When computing a difference between two instances of a request-uri (say "abc" and "def"), then one MUST include both the body and the entity-headers. This includes the headers (from RFC2616):

    entity-header = Allow
                  | Content-Encoding
                  | Content-Language
                  | Content-Length
                  | Content-Location
                  | Content-MD5
                  | Content-Range
                  | Content-Type
                  | Expires
                  | Last-Modified

Also, from RFC3229, Delta-Base and IM.

2 and 3) If there are no "instance headers", 2 and 3 are identical. Otherwise, under 2 the instance headers would have to be included when computing a difference.

Note that under interpretation 1, the ordering of entity-headers can affect the length of the delta -- "abc" and "def" may have the same set of entity-headers, but if they appear in a different order the resulting difference may be lengthy.

However, if entity-headers are contained in the difference, there is no need to include entity headers in the actual response. This could result in substantial savings (especially for small files where the entity-headers may be a substantial portion of the response).

This interpretation would also require defining Delta-Base and IM as response-headers. This is somewhat problematic given the following (from RFC2616, section 6.2):

    Response-header field names can be extended reliably only in combination with a change in the protocol version. However, new or experimental header fields MAY be given the semantics of response-header fields if all parties in the communication recognize them to be response-header fields. Unrecognized header fields are treated as entity-header fields.
Considering these difficulties, and considering implementation hassles of #1 and #2, I advocate #3 -- that headers should not be included when computing differences between instances. Furthermore, the entity-headers included in a 226 response should be used, but not any of the entity-headers associated with the base instance. This does imply that for delta, two instances that differ in their entity-headers but not entity-body (or, more precisely, their instance-body), will yield equivalent responses (that only differ in their headers). ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From danielh@crosslink.net Wed Jan 16 09:52:19 2002 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id JAA05291; Wed, 16 Jan 2002 09:52:18 -0800 (PST) From: Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA28258; Wed, 16 Jan 2002 09:52:18 -0800 Received: by mailrelay01.cce.cpqcorp.net (Postfix, from userid 12345) id 2E5C51C6B; Wed, 16 Jan 2002 11:52:14 -0600 (CST) Received: from zmamail02.zma.compaq.com (zmamail02.nz-tay.cpqcorp.net [161.114.72.102]) by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id B88901E8A for ; Wed, 16 Jan 2002 11:52:13 -0600 (CST) Received: by zmamail02.zma.compaq.com (Postfix, from userid 12345) id 5092B398A; Wed, 16 Jan 2002 12:52:13 -0500 (EST) Received: from relayout.ers.usda.gov (relayout.ers.usda.gov [151.121.68.20]) by zmamail02.zma.compaq.com (Postfix) with ESMTP id 1814C3850 for ; Wed, 16 Jan 2002 12:52:13 -0500 (EST) Received: from router-3.ers.usda.gov (ers-68-17.ers.usda.gov) by relayout.ers.usda.gov (LSMTP for Windows NT v1.1b) with SMTP id <0.000B018A@relayout.ers.usda.gov>; Wed, 16 Jan 2002 12:52:10 -0500 Received: from danielh (z_a082.ers.usda.gov) by email.ers.usda.gov (LSMTP for Windows NT v1.1b) with SMTP 
id <0.000619F0@email.ers.usda.gov>; Wed, 16 Jan 2002 12:52:09 -0500 Date: Wed, 16 Jan 2002 12:52:06 -0500 To: http-delta@pa.dec.com Subject: Etag, entity headers, and delta X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25 Message-Id: <20020116175213.5092B398A@zmamail02.zma.compaq.com> 16 Jan 2002 From: Daniel Hellerstein Re: Etag, entity-headers, and delta In an earlier message I raised a few questions about choosing an etag value when entity-headers, but not the entity body, changes. Having perused RFC2616, shared thoughts with Koen Holtman, and having gone ahead with my implementation; my conclusion is that: RFC2616's treatment of the use of entity headers when choosing an etag value is, at best, muddled. Hence, the proper approach to choosing an etag should be based on the premise that where there is no clear rule, the origin server should consider what behavior is best for its needs, with careful consideration of what downstream caches may do. For delta-aware servers (and for other instance manipulations) this may imply a loosening of the rule (as stated in 13.3.3 of RFC2616) that "if an entity-header changes, so should the etag". For example, in order to reduce cache storage requirements, a delta aware origin server may use the same etag for two instances; even though an entity-header (say, the last-modified, or expires entity-headers), but not the instance body, has changed. I note that a possible solution to the problem (of using the same etag value even though entity-headers have changed) is to use a "weak" etag. However, this is not a fully satisfactory solution, since RFC2616 (again, 13.3.3) places strong restrictions on the use of weak validators in sub-range retrieval. This raises a few questions regarding modifications to RFC3229, the most important being #3 1) Should the above (or something like it) be noted. I advocate doing so, but can live with a tactful silence. 2) The phrase "weak etag" appears nowhere in RFC3229. 
Does that imply agreement with 13.3.3 (that weak etags should not be used in 226 instance manipulation responses, just as they should not be used in 206 range responses)?

3) My interpretation is that the entity-headers from a client's cached base instance should NOT be used as the entity headers for the newly received, delta-encoded instance. That is, the entity headers should be only those contained in the current response. This does reduce delta efficiency, since it requires resending entity headers that have not changed. Perhaps the rule should be "you can use cached entity headers if they are not overridden by an entity-header in the current response". One or the other must be stated somewhere. I need the guidance!

BTW: Here's the beginning part of RFC2616 13.3.3

   13.3.3 Weak and Strong Validators

   Since both origin servers and caches will compare two validators to decide if they represent the same or different entities, one normally would expect that if the entity (the entity-body or any entity-headers) changes in any way, then the associated validator would change as well. If this is true, then we call this validator a "strong validator."

   However, there might be cases when a server prefers to change the validator only on semantically significant changes, and not when insignificant aspects of the entity change. A validator that does not always change when the resource changes is a "weak validator."

   Entity tags are normally "strong validators," but the protocol provides a mechanism to tag an entity tag as "weak." One can think of a strong validator as one that changes whenever the bits of an entity changes, while a weak value changes whenever the meaning of an entity changes. Alternatively, one can think of a strong validator as part of an identifier for a specific entity, while a weak validator is part of an identifier for a set of semantically equivalent entities.
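The two candidate rules in question 3 can be sketched in a few lines of Python. This is only an illustration of the choice, not anything RFC3229 specifies: the function name `headers_for_new_instance`, the `inherit_cached` flag, and the sample header values are all assumptions made for the example.

```python
def headers_for_new_instance(
    cached: dict[str, str], response: dict[str, str], inherit_cached: bool
) -> dict[str, str]:
    """Pick the entity headers to keep with a delta-reconstructed instance.

    inherit_cached=False: only headers from the current (226) response count.
    inherit_cached=True:  cached headers survive unless the response
                          overrides them.
    """
    # Header field names are case-insensitive, so compare them in lower case.
    fresh = {name.lower(): value for name, value in response.items()}
    if not inherit_cached:
        return fresh
    merged = {name.lower(): value for name, value in cached.items()}
    merged.update(fresh)  # the current response wins on any conflict
    return merged


# Hypothetical cached base instance and 226 response headers.
cached = {
    "Content-Type": "text/html",
    "Content-Language": "en",
    "Expires": "Wed, 16 Jan 2002 18:00:00 GMT",
}
response = {
    "Expires": "Thu, 17 Jan 2002 18:00:00 GMT",
    "ETag": '"v2"',
}

strict = headers_for_new_instance(cached, response, inherit_cached=False)
lenient = headers_for_new_instance(cached, response, inherit_cached=True)

print(sorted(strict))
print(sorted(lenient))
```

Under the strict rule the server has to resend Content-Type and Content-Language on every 226 response; under the lenient rule it only needs to send what changed, at the price of the client having to track which cached headers are still valid.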
----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul Wed Jan 16 16:38:49 2002 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA16130; Wed, 16 Jan 2002 16:38:49 -0800 (PST) From: Jeffrey Mogul Message-Id: <200201170038.QAA16130@wera.pa.dec.com> To: http-delta Subject: Success, at last: RFCs 3229 and 3230 Date: Wed, 16 Jan 2002 16:38:49 -0800 X-Mts: smtp The "Proposed Standard" RFCs have (finally!) been released: RFC 3229 Title: Delta encoding in HTTP Author(s): J. Mogul, B. Krishnamurthy, F. Douglis, A. Feldmann, Y. Goland, A. van Hoff, D. Hellerstein Status: Standards Track Date: January 2002 Mailbox: JeffMogul@acm.org, bala@research.att.com, douglis@research.att.com, anja@cs.uni-sb.de, yaron@goland.org, avh@marimba.com, danielh@crosslink.net Pages: 49 Characters: 111953 Updates/Obsoletes/SeeAlso: None I-D Tag: draft-mogul-http-delta-10.txt URL: ftp://ftp.rfc-editor.org/in-notes/rfc3229.txt and RFC 3230 Title: Instance Digests in HTTP Author(s): J. Mogul, A. Van Hoff Status: Standards Track Date: January 2002 Mailbox: JeffMogul@acm.org, avh@marimba.com Pages: 13 Characters: 26846 Updates/Obsoletes/SeeAlso: None I-D Tag: draft-mogul-http-digest-05.txt URL: ftp://ftp.rfc-editor.org/in-notes/rfc3230.txt Thanks to *all* of you who helped with this long process. I know there are already suggestions for improvements. However, I have to spend all of my available time working on something else for the next few weeks, so please excuse my apparent lack of interest in delta-related issues! I'll respond when I can. 
-Jeff From danielh@crosslink.net Tue Jan 22 15:27:00 2002 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id PAA05172; Tue, 22 Jan 2002 15:27:00 -0800 (PST) From: Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA14110; Tue, 22 Jan 2002 15:26:59 -0800 Received: by mailrelay01.cce.cpqcorp.net (Postfix, from userid 12345) id F005912EE; Tue, 22 Jan 2002 17:26:58 -0600 (CST) Received: from ztxmail01.ztx.compaq.com (ztxmail01.nz-cce.cpqcorp.net [161.114.8.205]) by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id EAC5711FA for ; Tue, 22 Jan 2002 17:26:58 -0600 (CST) Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345) id A7818264D; Tue, 22 Jan 2002 17:26:58 -0600 (CST) Received: from relayout.ers.usda.gov (relayout.ers.usda.gov [151.121.68.20]) by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id 52B7424B8 for ; Tue, 22 Jan 2002 17:26:58 -0600 (CST) Received: from router-3.ers.usda.gov (ers-68-17.ers.usda.gov) by relayout.ers.usda.gov (LSMTP for Windows NT v1.1b) with SMTP id <0.000B2CC9@relayout.ers.usda.gov>; Tue, 22 Jan 2002 18:26:57 -0500 Received: from danielh (z_a082.ers.usda.gov) by email.ers.usda.gov (LSMTP for Windows NT v1.1b) with SMTP id <0.000640D8@email.ers.usda.gov>; Tue, 22 Jan 2002 18:26:52 -0500 Date: Tue, 22 Jan 2002 18:14:27 -0500 To: http-delta@pa.dec.com Subject: An implementation of rfc3229 X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.25/25 Message-Id: <20020122232658.A7818264D@ztxmail01.ztx.compaq.com> I've pretty much got IM and delta encoding, as spec'ed in rfc3229, implemented under my server. At the moment, I've only tested it behind my firewall. If there is any interest, I can open up a port to a machine that is running this delta-aware server. 
Since this is a bit of a hassle, I won't bother unless someone asks (when I get the rest of the pieces of this server finished, in a month or two, my public server will be delta aware).

I also wrote a simple (command line) client that supports delta encoding. So I can use this to test someone else's delta aware server.

Notes:

* both client and server are written in REXX, and run under OS/2. They use a set of DLLs; otherwise it would be very easy to port the client to other platforms. If interest is expressed, I can try porting the client to win98.

* DIFF -e and GDIFF are supported (not VCDIFF)

* support for multiple-ranges is limited to "multiple ranges AFTER a delta". For now, it was just too much of a pain to have to deal with separate deltas for each of several separate ranges.

* Lacking a definitive answer, I assume that entity-headers are NOT included in the delta comparison. This also means that all entity headers are sent to the client (even if they have not changed). On small files (say, where just a date or textual hit counter changes), these entity headers are a significant fraction of the entire 226 response.
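For anyone who wants to test against such a server, the client side of the exchange might look roughly like the Python sketch below. It is not the REXX client described above: the helper names `build_delta_request` and `classify_response`, the host, the path, and the etag values are all hypothetical. The header names (A-IM, IM, Delta-Base, If-None-Match), the "diffe" and "gdiff" instance-manipulation names, and the 226 status come from RFC3229.

```python
def build_delta_request(host, path, cached_etag, manipulations=("diffe", "gdiff")):
    """Build a GET that offers the cached instance as a delta base."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"If-None-Match: {cached_etag}",     # identifies the cached base instance
        "A-IM: " + ", ".join(manipulations),  # delta formats the client accepts
        "",
        "",
    ]
    return "\r\n".join(lines)


def classify_response(status, headers, cached_etag):
    """Decide what the client should do with the server's answer."""
    if status == 304:
        return "unchanged"      # cached instance is still current
    if status == 226 and headers.get("Delta-Base") == cached_etag:
        return "apply-delta"    # body is a delta against our cached copy
    if status == 200:
        return "full"           # server sent the whole instance
    return "unexpected"


# Hypothetical values for illustration only.
request = build_delta_request("www.example.com", "/counter.html", '"v1"')
action = classify_response(226, {"IM": "diffe", "Delta-Base": '"v1"'}, '"v1"')

print(request)
print(action)
```

A server that does not understand A-IM simply ignores it and answers 200 or 304, so the same request works against servers that are not delta aware.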
----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From kpv@research.att.com Tue Jan 22 16:09:08 2002 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id QAA09398; Tue, 22 Jan 2002 16:09:07 -0800 (PST) Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA18548; Tue, 22 Jan 2002 16:09:07 -0800 Received: by mailrelay01.cce.cpqcorp.net (Postfix, from userid 12345) id 836B41807; Tue, 22 Jan 2002 18:09:01 -0600 (CST) Received: from ztxmail01.ztx.compaq.com (ztxmail01.nz-cce.cpqcorp.net [161.114.8.205]) by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id 4D8331850 for ; Tue, 22 Jan 2002 18:09:01 -0600 (CST) Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345) id 092CF24CD; Tue, 22 Jan 2002 18:09:01 -0600 (CST) Received: from mail-green.research.att.com (H-135-207-30-103.research.att.com [135.207.30.103]) by ztxmail01.ztx.compaq.com (Postfix) with ESMTP id D4C5F24DF for ; Tue, 22 Jan 2002 18:08:22 -0600 (CST) Received: from raptor.research.att.com (raptor.research.att.com [135.207.23.32]) by mail-green.research.att.com (Postfix) with ESMTP id D3EFB1E176; Tue, 22 Jan 2002 19:08:21 -0500 (EST) Received: (from kpv@localhost) by raptor.research.att.com (SGI-8.9.3/8.8.7) id TAA57151; Tue, 22 Jan 2002 19:08:21 -0500 (EST) Date: Tue, 22 Jan 2002 19:08:21 -0500 (EST) From: Phong Vo Message-Id: <200201230008.TAA57151@raptor.research.att.com> Organization: AT&T Research X-Mailer: mailx (AT&T/BSD) 9.9 2002-01-16 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit To: danielh@crosslink.net, http-delta@pa.dec.com Subject: Re: An implementation of rfc3229 Daniel, The source code for Vcdiff is available at www.research.att.com/sw/tools in case you'd like to add it. 
Phong > From danielh@crosslink.net Tue Jan 22 18:23 EST 2002 > To: http-delta@pa.dec.com > Subject: An implementation of rfc3229 > I've pretty much got IM and delta encoding, as spec'ed in rfc3229, > implemented under my server. At the moment, I've only tested it > behind my firewall. If there is any interest, I can open > up a port to a machine that is running this delta-aware > server. Since this is a bit of a hassle, I won't bother > unless someone asks (when I get the rest of the pieces of this > server finished, in a month or two, my public server will be > delta aware). > I also wrote a simple (command line) client that supports delta > encoding. So I can use this to test someone else's delta > aware server. > Notes: > * both client and server are written in REXX, and run under OS/2. > They use a set of DLLs, otherwise it would be very easy to > port the client to other platforms. If interest > is expressed, I can try porting the client to win98. > * DIFF -e and GDIFF are supported (not VCDIFF) > > * support for multiple-ranges is limited to "multiple ranges > AFTER a delta". For now, it was just too much of a pain to have to > deal with seperate deltas for each of several seperate ranges. > * Lacking a definitive answer, I assume that entity-headers > are NOT included in the delta comparison. Which also means > that all entity headers are sent to the client (even if they > have not changed). On small files (say, where just a date or > textual hit counter change), these entity headers are a signficant fraction > of the entire 226 response. 
> > ----------------------------------------------------------- > Daniel Hellerstein > danielh@crosslink.net > http://www.srehttp.org > ----------------------------------------------------------- From mogul Mon Feb 18 17:45:20 2002 Return-Path: Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id RAA04669; Mon, 18 Feb 2002 17:45:20 -0800 (PST) From: Jeffrey Mogul Message-Id: <200202190145.RAA04669@wera.pa.dec.com> To: http-delta Subject: FYI: IESG approves vcdiff spec. as Proposed Standard Date: Mon, 18 Feb 2002 17:45:20 -0800 X-Mts: smtp From: iesg-secretary@ietf.org (The IESG) Subject: Protocol Action: The VCDIFF Generic Differencing and Date: Fri, 15 Feb 2002 23:00:02 +0000 (UTC) The IESG has approved the Internet-Draft 'The VCDIFF Generic Differencing and Compression Data Format' as a Proposed Standard. This has been reviewed in the IETF but is not the product of an IETF Working Group. The IESG contact persons are Patrik Faltstrom and Ned Freed. Technical Summary The memo describes a general and efficient data format suitable for encoding compressed and/or differencing data so that they can be easily transported among computers. It is used as one of the proposed format for transfer of differencing data over HTTP. Working Group Summary This is an individual submission to the IETF. The document was discussed on various mailing lists, including , about the HTTP protocol. Protocol Quality The protocol was reviewed for the IESG by Patrik Faltstrom. Congratulations to David Korn and Phong Vo for their success on this! What this means is that we now have a reasonably well-documented and space-efficient encoding format for deltas. It would be nice to see multiple implementations of both the encoder and decoder, since we will need that for "Draft Standard" status. There is some source code available from AT&T (see draft-korn-vcdiff-06.txt for details). 
If we want to make vcdiff the recommended format for delta encoding, we will need vcdiff to be advanced to Draft Standard status *before* we can advance the delta spec to Draft Standard (by IETF rules). -Jeff From danielh@crosslink.net Tue Feb 19 09:18:35 2002 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id JAA17358; Tue, 19 Feb 2002 09:18:35 -0800 (PST) From: Received: from mailrelay01.cac.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA30746; Tue, 19 Feb 2002 09:18:34 -0800 Received: from ztxmail02.ztx.compaq.com (ztxmail02.nz-cce.cpqcorp.net [161.114.8.206]) by mailrelay01.cac.cpqcorp.net (Postfix) with ESMTP id 6D3B21BAB for ; Tue, 19 Feb 2002 09:18:34 -0800 (PST) Received: from relayout.ers.usda.gov (relayout.ers.usda.gov [151.121.68.20]) by ztxmail02.ztx.compaq.com (Postfix) with ESMTP id 60BC23371 for ; Tue, 19 Feb 2002 11:18:33 -0600 (CST) Received: from router-3.ers.usda.gov (ers-68-17.ers.usda.gov) by relayout.ers.usda.gov (LSMTP for Windows NT v1.1b) with SMTP id <0.000BFEB8@relayout.ers.usda.gov>; Tue, 19 Feb 2002 12:18:28 -0500 Received: from danielh (z_a082.ers.usda.gov) by email.ers.usda.gov (LSMTP for Windows NT v1.1b) with SMTP id <0.0007007C@email.ers.usda.gov>; Tue, 19 Feb 2002 12:18:26 -0500 Date: Tue, 19 Feb 2002 12:18:02 -0500 To: http-delta@pa.dec.com In-Reply-To: <200202190145.RAA04669@wera.pa.dec.com> Subject: Re: FYI: IESG approves vcdiff spec. as Proposed Standard X-Mailer: MR/2 Internet Cruiser Edition for OS/2 v2.30a/30 Message-Id: <20020219171833.60BC23371@ztxmail02.ztx.compaq.com> >If we want to make vcdiff the recommended format for delta >encoding, we will need vcdiff to be advanced to Draft Standard status >*before* we can advance the delta spec to Draft Standard (by IETF rules). >From my very narrow point of view, having a broad (in terms of platforms) availability of VCDIFF is crucial if we are to "make vcdiff the recommended format". 
I trust this will happen, but let's not put the cart before the horse. (BTW: Phong... any progress on the os/2 version?) ----------------------------------------------------------- Daniel Hellerstein danielh@crosslink.net http://www.srehttp.org ----------------------------------------------------------- From mogul@pa.dec.com Tue Feb 19 10:20:07 2002 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA21568; Tue, 19 Feb 2002 10:20:07 -0800 (PST) Received: from wera.pa.dec.com by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA02180; Tue, 19 Feb 2002 10:20:07 -0800 Received: from localhost by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA21921; Tue, 19 Feb 2002 10:20:06 -0800 (PST) From: Jeffrey Mogul Message-Id: <200202191820.KAA21921@wera.pa.dec.com> To: Cc: http-delta@pa.dec.com Subject: Re: FYI: IESG approves vcdiff spec. as Proposed Standard In-Reply-To: Your message of "Tue, 19 Feb 2002 12:18:02 EST." <20020219171833.60BC23371@ztxmail02.ztx.compaq.com> Date: Tue, 19 Feb 2002 10:20:06 -0800 X-Mts: smtp >If we want to make vcdiff the recommended format for delta >encoding, we will need vcdiff to be advanced to Draft Standard status >*before* we can advance the delta spec to Draft Standard (by IETF rules). From my very narrow point of view, having a broad (in terms of platforms) availability of VCDIFF is crucial if we are to "make vcdiff the recommended format". I trust this will happen, but let's not put the cart before the horse. Of course we want a wide set of platforms covered, but the IETF standards process lives and dies by the "rough consensus" model, not the "no platform left behind model." I certainly don't want to leave out popular but minority platforms such as OS/2 and MacOS, but let's please not wait until we have vcdiff ported to AmigaOS and TRS-80, OK? And vcdiff would be RECOMMENDED, not MANDATORY, in any case. 
-Jeff From kpv@research.att.com Tue Feb 19 10:52:43 2002 Return-Path: Received: from pobox1.pa.dec.com by wera.pa.dec.com; (8.8.8/1.1.8.2/06Jun96-0357PM) id KAA25519; Tue, 19 Feb 2002 10:52:43 -0800 (PST) Received: from mailrelay01.cce.cpqcorp.net by pobox1.pa.dec.com (5.65v3.2/1.1.10.5/07Nov97-1157AM) id AA11686; Tue, 19 Feb 2002 10:52:42 -0800 Received: from zmamail02.zma.compaq.com (zmamail02.nz-tay.cpqcorp.net [161.114.72.102]) by mailrelay01.cce.cpqcorp.net (Postfix) with ESMTP id 56F401402 for ; Tue, 19 Feb 2002 12:52:42 -0600 (CST) Received: from mail-green.research.att.com (H-135-207-30-103.research.att.com [135.207.30.103]) by zmamail02.zma.compaq.com (Postfix) with ESMTP id 35F691EC1 for ; Tue, 19 Feb 2002 13:52:41 -0500 (EST) Received: from raptor.research.att.com (raptor.research.att.com [135.207.23.32]) by mail-green.research.att.com (Postfix) with ESMTP id CED9E1E0C7; Tue, 19 Feb 2002 13:52:39 -0500 (EST) Received: (from kpv@localhost) by raptor.research.att.com (SGI-8.9.3/8.8.7) id NAA80024; Tue, 19 Feb 2002 13:52:39 -0500 (EST) Date: Tue, 19 Feb 2002 13:52:39 -0500 (EST) From: Phong Vo Message-Id: <200202191852.NAA80024@raptor.research.att.com> Organization: AT&T Research X-Mailer: mailx (AT&T/BSD) 9.9 2002-01-31 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit To: danielh@crosslink.net, http-delta@pa.dec.com Subject: Re: FYI: IESG approves vcdiff spec. as Proposed Standard > From danielh@crosslink.net Tue Feb 19 12:15 EST 2002 > To: http-delta@pa.dec.com > Subject: Re: FYI: IESG approves vcdiff spec. as Proposed Standard > >If we want to make vcdiff the recommended format for delta > >encoding, we will need vcdiff to be advanced to Draft Standard status > >*before* we can advance the delta spec to Draft Standard (by IETF rules). > >From my very narrow point of view, having a broad (in terms of platforms) > availability of VCDIFF is crucial if we are to "make vcdiff the recommended > format". 
I trust this will happen, but let's not put the cart before the horse. My code is carefully crafted to be portable with respect to all standard flavors of C and C++. So in this sense, I believe that the code is available on most platforms that we care about. On the other hand, portable C code is not the same as a buildable package. The build procedure for the distributed package on the AT&T site (www.research.att.com/sw/tools) requires the make & shell tools. As far as I know, this can be built and run transparently on all flavors of Unix including Linux, BSD, Irix, Solaris, etc. On Windows varieties, using something like Dave Korn's Uwin system as a base would work. For people who like to read code, the core encoding algorithm is in the file src/lib/vcodex/Vcdiff/vcddiff.c. The fast string matcher is in the routine vcdfold() in the same file. The core decoding algorithm is in the file src/lib/vcodex/Vcdiff/vcdundiff.c. By "core", I mean the code that deals with each window of data as described in the Proposed Standard. For file handling level, read src/lib/sfvcodex/sfwindow.c to see the different strategies whereby windows are selected. src/lib/sfvcodex/sfvcdiff.c and src/lib/sfvcodex/sfvcundiff.c do file level encoding and decoding. > (BTW: Phong... any progress on the os/2 version?) Not yet. First we need to bring up an os/2 machine and that's taking time. 
Phong