Grouping Entries
Willow lets authors place Entries in namespaces, and within each namespace, Entries are arranged according to three orthogonal dimensions: subspace_id, path, and timestamp. This suggests a powerful way of thinking about Willow: a namespace is a collection of points (Entries) in a three-dimensional space. Or more accurately, a namespace is a mapping from points in this three-dimensional space to hashes and sizes of Payloads.
This viewpoint enables us to meaningfully group Entries together. An application might want to access all chess games that a certain author played in the past week. This kind of query corresponds to a box (a rectangular cuboid to use precise terminology) in the three-dimensional Willow space.
In this document, we develop and define a vocabulary for grouping Entries based on their subspace_ids, paths, and timestamps. These definitions are not necessary for defining and understanding the core data model, but we make heavy use of them in our recommended capability system and our recommended synchronisation protocol.
Ranges
Ranges are simple, one-dimensional ways of grouping Entries, they can express groupings such as “last week’s Entries”. A range is either a closed range or an open range. A closed range consists of a start value and an end value, an open range consists only of a start value. A range includes all values greater than or equal to its start value and strictly less than its end value (if it is has one). A range is empty if it includes no values.
The Willow protocols use three types of ranges:
A range of SubspaceIds.struct SubspaceRangeA SubspaceId must be greater than or equal to this to be included in the range. The ordering must be given by a protocol parameter.If open, the range is an open range. Otherwise, a SubspaceId must be numerically strictly less than this to be included in the range. The ordering must be given by a protocol parameter.struct PathRangeIf open, the range is an open range. Otherwise, a Path must be lexicographically strictly less than this to be included in the range.A range of Timestamps.struct TimeRange
Let r1 and r2 be ranges (over the same types of values). The intersection of r1 and r2 is the range whose start value is the greater of the start values of r1 and r2, and whose end value is the lesser of the end values of r1 and r2 (if both are closed ranges), the one end value among r1 and r2 (if exactly one of them is a closed range), or no end value (if both r1 and r2 are open ranges).
When we combine ranges of all three dimensions, we can delimit boxes in Willow space.
struct 3dRange
A 3dRange includes every Entry whose subspace_id, path, and timestamp are all included their respective range.
We define default_3d_range(default_subspace)
to denote the 3dRange with the following members:
- subspaces is the open SubspaceRange with start default_subspace,
- paths is the open PathRange whose start is the empty Path, and
- times is the open TimeRange with start zero.
Areas
3dRanges are a natural way of grouping Entries, but they have certain drawbacks around encrypted data in Willow: when encrypting Paths, for example, the lexicographic ordering of the encrypted Paths is inconsistent with the ordering of the unencrypted Paths. Similarly, SubspaceRanges do not preserve their meaning under encryption either. Hence, user-specified 3dRanges are close to useless when dealing with encrypted data.
Fortunately, there do exist encryption techniques that preserve some weaker properties than arbitrary orderings.See here for information on encrypting Willow. Without going into the cryptographic details, we now define an alternative to 3dRanges that can be used even when encrypting Paths and SubspaceIds.
Every Area can be expressed as a 3dRange, but not the other way around. Areas always denote boxes in Willow space, but some boxes do not correspond to any Area.A grouping of Entries.struct AreaTo be included in this Area, an Entry’s subspace_id must be equal to the subspace_id, unless it is any.
An Area a includes an Entry e if
a.subspace_id == any
ora.subspace_id == e.subspace_id
,- a.path id a prefixes e.path, and
- a.times includes e.timestamp.
An Area is empty if it includes no Entries. This is the case if and only if its times is empty.
An Area includes another Area if the first Area includes all Entries that the second Area includes. In particular, every Area includes itself.
The full area is the Area whose subspace_id is any, whose path is the empty Path, and whose times is the open TimeRange with start It includes all Entries.
The subspace area of the SubspaceId sub is the Area whose subspace_id is sub, whose path is the empty Path, and whose times is the open TimeRange with start It includes exactly the Entries with subspace_id sub.
If two Areas overlap, the overlap is again an Area. Let a1 and a2 be Areas. If there exists at least one Entry included in both a1, and a2, then we define the (nonempty) intersection of a1, and a2 as the Area whose
- subspace_id is any, if a2.subspace_id is any, or a1.subspace_id, otherwise, whose
- path is the longer of a1.path and One is a prefix of the other, otherwise the intersection would be empty.a2.path, and whose
- times is the intersection of a1.times and a2.times.
Areas of Interest
Occasionally, we wish to group Entries based on the contents of some store. For example, a space-constrained peer might ask for the 100 newest Entries when synchronising data.
We serve these use cases by combining an Area with limits to restrict the contents to the Entries with the greatest timestamps.
struct AreaOfInterestTo be included in this AreaOfInterest, an Entry’s timestamp must be among the max_count greatest Timestamps, unless max_count is zero.
An AreaOfInterest aoi includes an Entry e from a store store if
- aoi.area includes e,
- aoi.max_count is zero, or e is among the aoi.max_count newest Entries of store, and
- aoi.max_size is zero, or the sum of the payload_lengths of e and all newer Entries in store is less than or equal to aoi.max_size.
Let aoi1 and aoi2 be AreaOfInterests. If there exists at least one Entry included in both aoi1.area, and aoi2.area, then we define the (nonempty) intersection of aoi1, and aoi2 as the AreaOfInterest whose
- area is the intersection of aoi1.area and aoi2.area, whose
- max_count is zero if any of aoi1.max_count or aoi2.max_count is zero, or the minimum of aoi1.max_count and aoi2.max_count otherwise, and whose
- max_size is zero if any of aoi1.max_size or aoi2.max_size is zero, or the minimum of aoi1.max_size and aoi2.max_size otherwise.