Characteristic Datatypes: An alternative functional representation of ADTs

Posted on April 11, 2018

Background: Church

There are several well-known methods of representing data types in the lambda calculus, like the Church encoding:

Given the usual ADT definition of a List, we can produce its Church encoding and vice versa:

The Church encoding identifies a list with its right-fold; this works because, as we’ve seen, all the list’s information is carried in the right-fold.

As such, it’s possible to define a Data.CList module that uses the Church encoding internally but exports the same API as Data.List.

It’s this property of the right-fold that I’ll call characteristic.

Background: Sets

Usually a characteristic function is defined for sets: Given a set X, its characteristic function χ : X -> 2 is defined as χ(a) = 1 if a ∈ X, χ(a) = 0 otherwise.

Likewise, we can construct any set from its characteristic function: X = {a | χ(a) = 1}.

Purescript: Sets

This suggests the following definition:

This has some interesting differences from Data.Set.Set: for one, it’s contravariant in its argument. In fact, it’s equivalent to Data.Predicate.Predicate. Therefore it has instances for the whole contravariant hierarchy: Divide, Decide, etc.

Moreover, some very cute looking definitions fall out as well:

As an example, union reads like it isn’t even written in code: “The union of x and y is the set of all a such that a is in x or a is in y”.

Unfortunately, with this representation it’s almost always impossible to recover the “original” set in PS-land. That’s because we can’t actually translate “X = {a | χ(a) = 1}” into a PS function recover :: Characteristic.Set ~> Traditional.Set.

Purescript: Maps

Data.Map has a characteristic function in its API: lookup :: forall k v. Ord k => k -> Map k v -> Maybe v.

We can translate it into a characteristic definition:

(Yes, I know this is the same as Star Maybe).

This has some interesting properties that aren’t shared with the traditional Map as well:

Unfortunately its representation (specifically the polymorphism of the first type variable in its constructor) means we’re left without important instances like Eq and Show.

We’d need to constrain the type of k so that we could exhaustively search values inhabiting it, but instance eqMap :: (Bounded k, Enum k, Eq k, Eq v) <= Eq (Map k v), while sufficient, still wouldn’t let us check equality between two values of type Map String a.

On the other hand, we can also define maps with infinitely many keys.

Purescript: Sequences

For a general linear, “order and multiplicity matters” structure like a sequence, we’ve already seen that a right-fold is characteristic. But so is another frequently used function: index :: forall a. Sequence a -> Int -> Maybe a.

index is characteristic because, by convention, sequences are indexed starting at 0. So we can merely iterate through the indices, pulling out elements until we hit a Nothing.

This means our definition will be a) equivalent to Map Int, and b) since we’re following convention, Show, Eq and friends suddenly become tractable.

Note that by losing the Maybe we can force a Sequence to be very large – defined for every possible Int value.

Purescript: Trees

Suppose we have the following definition:

Now, there’s no canonical Tree type in the core or contrib libraries, so we’re flying a little blind. But maybe we get some inspiration from our Sequence type.

With Sequence, our characteristic function took an index as input and returned the element at that index, if it exists. We can do something similar for trees. But what would an element’s index be in a tree? Basically, we need it to specify the “location” an element inhabits.

A path is just a sequence of (horizontal) indices that go through a tree.

Imagine we have the following representation of a tree, with root r:

    r
   / \
  a   b
 /   / \
c   d   e

Then the path to r is just the empty sequence (), a gets (0), and e gets (1, 1).

Again, like the case with sequences we’re adopting some conventions here:

  1. Indices start at 0
  2. Indices at any given level are contiguous
  3. If the path (x[0], x[1], ..., x[n], x[n+1]) is defined, so is the path (x[0], x[1], ..., x[n]).

These properties of “well-formed trees” mean we can, again, define eq between trees.