6.15.2.2. Implementation and Use Note

INTERNET-DRAFT                               Charles H. Lindsey
Usenet Format Working Group                  University of Manchester
                                             July 2001

6.15.2.2.  Implementation and Use Note
[Here is the implementation technique that we discussed, based on the
use of a conventional History file. This is a sanity check for our own
use, not intended to go in the final text. There are two cases to
consider:

A. Traditional implementations (e.g. CNEWS) where each History file line
includes a full message-identifier plus an item for each group in which
the article appears. Thus History file entries are of variable length,
and it is impractical to update them in situ.

B. History files made up of fixed length records (e.g. as proposed for
INN), which enables entries to be overwritten in situ. The History line
typically contains a hash of the message identifier plus some pointer to
an object representing the article as stored.

We consider the traditional case first:

1A. Ensure that the implementation of DBZ is not upset by an attempt to
store the same key a second time, and that such a key always retrieves
the latest record indexed by it.

2A. Additions to the History file are always made at the end. Removals
or changes to existing entries are only made by the expire program. An
entry for a Replaced (or otherwise cancelled) article remains in the
file: the expire program first removes the links to the articles that
are no longer stored, and later removes the entire entry according to
its expiry date. For every entry containing a '$v=n' followed by
random-dollars-sequences there will be an immediately following entry,
identical but for the omission of that '$v=n' and of the
random-dollars-sequences.
Thus there may be several entries with identical message-ids but,
because of the change to DBZ just described, only the most recent will
ever be seen except by programs that access the History file directly,
rather than by its index.

3A. When an article is Replaced, at the same time as the successor
article is entered into the History file (with '$v=7', say), a duplicate
entry (same article list) is entered under that key with the leftmost
'$v=n' and the following random-dollars-sequences removed from it.
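
As a sanity check on steps 1A to 3A, here is a minimal Python sketch of
an append-only History file with an index that always points at the
latest line for a key. Everything in it is an assumption made for
illustration: the class and function names are hypothetical, the index
is a plain dictionary standing in for DBZ, the expire program is not
modelled, and the regular expression merely guesses that a '$v=n' tag
is followed by zero or more '$'-introduced sequences up to the '@'.

```python
import re

# ASSUMPTION: "$v=n" followed by any number of "$..." sequences, ending
# before "@". The real syntax is defined elsewhere in the draft.
VERSION_TAG = re.compile(r"\$v=\d+(?:\$[^$@>]*)*")


def root_id(msgid):
    """Remove the leftmost '$v=n' tag and the sequences that follow it."""
    return VERSION_TAG.sub("", msgid, count=1)


class AppendOnlyHistory:
    """Case A: variable-length entries, only ever added at the end."""

    def __init__(self):
        self.lines = []   # (message-id, article list), in file order
        self.index = {}   # key -> position of the latest line (DBZ stand-in)

    def append(self, msgid, groups):
        # Step 1A: storing an existing key again is harmless; the index
        # simply moves on to the newest line.
        self.index[msgid] = len(self.lines)
        self.lines.append((msgid, groups))

    def add_replacement(self, msgid, groups):
        # Step 3A: enter the versioned id, then an immediately following
        # duplicate entry keyed by the root id.
        self.append(msgid, groups)
        root = root_id(msgid)
        if root != msgid:
            self.append(root, list(groups))

    def fetch(self, msgid):
        pos = self.index.get(msgid)
        return None if pos is None else self.lines[pos][1]


history = AppendOnlyHistory()
history.add_replacement("<faq$v=6$k3@example.com>", ["news.answers/101"])
history.add_replacement("<faq$v=7$a1b2@example.com>", ["news.answers/202"])
```

After both calls the file holds four lines, two of them keyed by the
root, but a lookup of the root through the index sees only the newer
one. Only a program reading the lines directly would see both.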

For the fixed length implementations, these steps become:

1B. DBZ does not need to be changed.

2B. History file entries may be updated in situ. An entry for a Replaced
(or otherwise cancelled) article can be overwritten with that for the
new article (or with a suitable indication of cancellation). For every
entry containing a '$v=n' followed by random-dollars-sequences there
will always exist a second entry identical but for the omission of that
'$v=n' and of the random-dollars-sequences, both entries pointing to the
same article object.

3B. When an article is Replaced, at the same time as the successor
article is entered into the History file, with '$v=7' say, the existing
entry without the leftmost '$v=n' and the following random-dollars-
sequences is overwritten (with the new article and new expiry date,
after destroying the old article, of course).  If no such entry exists,
one is created.
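
Steps 1B to 3B can be sketched in the same way. This is again only an
illustration under stated assumptions: a dictionary keyed by a
truncated SHA-256 hash stands in for the fixed-length records, the
"article object" is just a handle into a second dictionary, None is
used as the "suitable indication of cancellation", and the
replaced_msgid parameter (which a real server would presumably take
from the Replaces header) is hypothetical, as is the tag syntax assumed
by the regular expression.

```python
import hashlib
import re

# ASSUMPTION: "$v=n" followed by any number of "$..." sequences.
VERSION_TAG = re.compile(r"\$v=\d+(?:\$[^$@>]*)*")


def root_id(msgid):
    """Remove the leftmost '$v=n' tag and the sequences that follow it."""
    return VERSION_TAG.sub("", msgid, count=1)


def key_hash(msgid):
    """Fixed-length key; the choice of hash is purely illustrative."""
    return hashlib.sha256(msgid.encode("utf-8")).digest()[:16]


class FixedHistory:
    """Case B: one fixed-size entry per key, overwritten in situ."""

    def __init__(self):
        self.slots = {}   # key hash -> article handle, or None if cancelled
        self.store = {}   # article handle -> stored article

    def add_replacement(self, msgid, handle, body, replaced_msgid=None):
        root = key_hash(root_id(msgid))
        old = self.slots.get(root)
        if old is not None:
            # Step 3B: destroy the old article before reusing the root entry.
            self.store.pop(old, None)
        if replaced_msgid is not None:
            # Step 2B: overwrite the replaced entry with a cancellation mark.
            self.slots[key_hash(replaced_msgid)] = None
        self.store[handle] = body
        self.slots[key_hash(msgid)] = handle   # versioned entry
        self.slots[root] = handle              # root entry: overwritten or created

    def lookup(self, msgid):
        handle = self.slots.get(key_hash(msgid))
        return None if handle is None else self.store.get(handle)


h = FixedHistory()
h.add_replacement("<faq$v=6$k3@example.com>", "obj1", "old text")
h.add_replacement("<faq$v=7$a1b2@example.com>", "obj2", "new text",
                  replaced_msgid="<faq$v=6$k3@example.com>")
```

Here the versioned entry and the root entry always point to the same
article object, and no change to the index itself is needed because no
key is ever stored twice.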

From here on, the two cases are the same:

4. Provide a call to a routine which, if asked to retrieve any message
identifier with '$v=n' and finding it missing (or rather linked to no
stored groups), immediately tries again without the '$v=n' and its
random-dollars-sequences.  NOTE. We don't want this behaviour when
checking whether we already have an article offered to us by IHAVE, only
in response to an ARTICLE command. So this needs to be an extra call in
DBZ, in addition to the 'fetch' or 'dbzfetch' calls, to be used in the
proposed extension to the NNTP ARTICLE command. Observe that if the
requested '$v=n' is present and linked to stored articles (for whatever
reason) then you will be given exactly that version, even if later ones
are stored as well.

5. NOTE that I have dropped the idea of having '$v=0', because you can
never be sure that the very first issue of the FAQ used it, so you have
to provide the versionless root as well. If someone asks for '$v=0' (or
any '$v=n') the algorithm I gave will still find it via the root. So we
don't care what people put in URLs.

6. You are supposed to cancel the replaced/superseded article. If you
REALLY want to keep the old ones around a little longer, then this
implementation will not work if you want the latest to be retrieved
automatically - you will have to invent something much more complicated.

7. Having said all that, here follows a brief account of the same thing,
but short enough to be included in our document (the convention being
that implementation issues are hinted at, rather than being described in
full detail).]

   Typically, a news database will index a Replacement article both by
   its "version-number" message identifier (containing a "$v=" tag
   followed by a random-dollars-sequence) and by its "root" version
   (without the "$v=" tag or any following random-dollars-sequence).
   Thus when a request comes in for an article that is not present under
   the version-number requested, any article that is present and indexed
   by the corresponding root version can be retrieved instead. The
   indexing mechanism needs to be such that, although the root version
   may have at times referred to many different articles, it is always
   the current one that is retrieved.

        NOTE: The presence of a version-number in the message identifier
        of an article without a Replaces or Supersedes header causes no
        extra action (it is just an ordinary article). Observe also that
        if an article with the exact message identifier (even though it
        contains a version-number) is, for whatever reason, already
        present on the serving agent, that article will always be
        retrieved in preference to the one indexed by any root version.
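
The retrieval rule described above (and in step 4 of the working note)
might be sketched as follows. The function names, the dictionary used
as an index, and the assumed '$v=n' syntax are all hypothetical; the
point is only that the fallback to the root belongs in a separate call
used for ARTICLE, while the IHAVE check looks at the exact key alone.

```python
import re

# ASSUMPTION: "$v=n" followed by any number of "$..." sequences.
VERSION_TAG = re.compile(r"\$v=\d+(?:\$[^$@>]*)*")


def root_id(msgid):
    """Remove the leftmost '$v=n' tag and the sequences that follow it."""
    return VERSION_TAG.sub("", msgid, count=1)


def have_seen(index, msgid):
    """IHAVE check: exact key only, never falling back to the root."""
    return msgid in index


def fetch_for_article(index, msgid):
    """ARTICLE lookup: the exact id if stored, otherwise the root."""
    groups = index.get(msgid)
    if groups:                      # present and linked to stored groups
        return groups
    root = root_id(msgid)
    if root != msgid:
        return index.get(root) or None
    return None


index = {
    "<faq$v=6$k3@example.com>": [],                      # replaced, links gone
    "<faq$v=7$a1b2@example.com>": ["news.answers/202"],
    "<faq@example.com>": ["news.answers/202"],           # root, current version
    "<old$v=2$q@example.com>": ["misc.test/5"],          # still stored
    "<old@example.com>": ["misc.test/9"],                # later version
}
```

With this index a request for '$v=6', or for a '$v=0' that was never
issued, is answered from the root, whereas a request for the exact
identifier of a version that is still stored returns that version even
though the root refers to a later one.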

