Algebraic Datatypes in Java 8

Sum types, also known as tagged unions are a feature in many modern languages. Traditional object-oriented languages from the 90's, having recently aquired lambda functions, still have to catch up in this area. But with this trick you can get pretty close.

In Haskell, defining a tree structure is straightforward:

data Term = Constant Int | Add Term Term | IfZero Term Term Term

This means that a value of type Term is either Constant, Add or IfZero. Each of these constructors take arguments, such as the Int argument for Constant. The constructor and the arguments can be inspected via pattern matching, eg:

eval term = case term of
    Constant i -> i
    Add e1 e2 -> eval e1 + eval e2
    IfZero e1 e2 e3 -> if eval e1 == 0 then eval e2 else eval e3

This means that you can have all the code required to implement eval in one place, instead of having it spread over three different classes that each implement their little piece.

It also means you can put the datatype in a library, and somebody else can define eval somewhere else without having to modify your code. This is not possible if eval has to be defined as part of the Term class, as is usual in object oriented programming.

In Java, the Visitor pattern is a common workaround for this missing feature. In this post, I'll showcase a very minimal way of implementing it. To continue with the running example, Term will be a class in Java:

public interface Term {

We have the accept method which takes a Visitor:

    public abstract <R> R accept(Visitor<R> visitor);

    public interface Visitor<R> {
        R constant(int value);
        R add(Term left, Term right);
        R ifZero(Term condition, Term then, Term otherwise);
    }

The visitor has a method for each subtype of Term. So far, so good. Not that much worse than Haskell. However, to be able to actually construct Terms, we'll need to write a bit of boilerplate:

    public static Visitor<Term> term = new Visitor<Term>() {

        public Term constant(int value) { return new Term() {
            public <R> R accept(Visitor<R> v) { return v.constant(value); }
        };}

        public Term add(Term left, Term right) { return new Term() {
            public <R> R accept(Visitor<R> v) { return v.add(left, right); }
        };}

        public Term ifZero(Term condition, Term then, Term otherwise) { return new Term() {
            public <R> R accept(Visitor<R> visitor) { return visitor.ifZero(condition, then, otherwise); }
        };}

    };

The static term field acts as a factory for Terms. It could have been a bunch of static methods, but implementing the Visitor interface gives os a guarantee that we have exactly the constructors that the Visitor interface supports - no more and no less.

The methods are all similar. For example, the constant method returns a new Term whose accept method invokes constant on the provided visitor. The add method returns a new Term whose accept method invokes add, etc. And that's it:

}

Now we can construct a Term:

import static Term.term;
...
Term myTerm = term.add(term.constant(5), term.constant(7));

And we can implement eval:

int eval(Term e) {
    return e.accept(new Visitor<Integer>() {
        public Integer constant(int value) { 
            return value; 
        }
        public Integer add(Term left, Term right) { 
            return eval(left) + eval(right); 
        }
        public Integer ifZero(Term condition, Term then, Term otherwise) { 
            return eval(condition) == 0 ? eval(then) : eval(otherwise);
        }
    });
}

eval(myTerm)  // returns 12

There you have it - safe and flexible, if a little verbose.

If you have comments, catch me on Twitter! I'm @Continuational