Saturday, September 12, 2009

Type Lists and Heterogeneously Typed Arrays

In previous blog posts (HList in Scala and HList in Scala Revisited (or Scala Metaprogramming Works!)) I described a way to encode heterogeneously typed lists, HList, in Scala. This data type can be used as a generic replacement for the TupleX types found in the Scala library. In this post I will generalize the concept of type lists and introduce a new data type, HArray.

Note: the MetaScala code linked to in this post must be compiled using a recent Scala 2.8.0 build. I've given up on making it work with 2.7.x.

Type Lists

In HList the types of the elements and the element values are intertwined in the same type. It doesn't have to be this way though and the types of the elements can be tracked separate from the actual element values. To do this we create an purely abstract type (it has no instances) which models a list of types, let's call it TList. Here's a simple implementation:

// Abstract base type
sealed trait TList

// The empty type list
final class TNil extends TList

// A pair of a type and a type list
final class TCons[H, T <: TList] extends TList {
type Head = H
type Tail = T
}

// For infix notation
type ::[H, T <: TList] = TCons[H, T]

which would allow us to build type lists using infix notation, for example:

type SomeTypes = Int :: Boolean :: String :: TNil

This is only the basic definition of TList, in practice you want to add lots of functionality to this type, like append, length, indexing, reverse etc. The full TList code is available in MetaScala. TList can be used for implementing HList, but also heterogeneously typed arrays which I describe below.

Heterogeneously Typed Arrays

With a heterogeneously typed array I mean an array where each element can have a unique type which is tracked at compile time (the length of the array is also tracked at compile time). This ensures that the element values are handled in a type safe way. The values are stored in an Array[Any] and are cast to the correct type when extracted from the array (remember that Java generics are implemented using casts as well so this is no different from using the TupleX classes, or a Java ArrayList). The element types are tracked using a TList.

Here's part of the implementation which shows the prepend, append and nth methods:

final class HArray[L <: TList](private val elems : Array[Any]) extends HSeq[L] {
...

def ::[T](v : T) = {
val a = new Array[Any](elems.length + 1)
a(0) = v
Array.copy(elems, 0, a, 1, elems.length)
new HArray[T :: L](a)
}

def :::[L2 <: TList](l : HArray[L2]) = {
val a = new Array[Any](elems.length + l.elems.length)
Array.copy(l.elems, 0, a, 0, l.elems.length)
Array.copy(elems, 0, a, l.elems.length, elems.length)
new HArray[L2#Append[L]](a)
}

def apply[N <: Nat](implicit nth : INth[N]) : L#Nth[N] = elems(nth.index).asInstanceOf[L#Nth[N]]

...
}

Notice that the method implementations use simple array operations (array copy and indexing) so the performance should be comparable to using the TupleX classes in the Scala library (no benchmarks performed yet though). The full HArray implementation can be found here.

The HArray interface is similar to HList and virtually the same operations are supported. There are some usage examples in MetaScala:

// Create a HArray of an Int, Boolean and a pair
val a1 = 10 :: true :: (10.1, "Hello") :: HArrayNil

// Extract the second element, note that the element type
// information is preserved and we can safely perform a
// boolean and operation
val b = a1(_1) && false

// Create another HArray using alternative syntax (faster)
val a2 = HArray(1.1, "string", false)

// Replace the second element in the list, it used to
// be a String, but now it's an Int
val a3 = a2.removeNth(_1).insert(_1, 14)

// Type information preserved, we can use an Int operation
// on the element
val i = a3(_1) / 2

// Append l2 to l1
val a4 = a1 ::: a2

// Statically check that the length of l4 is 6
type T = Equal[_6, a4.Size]


Other TList Applications

There are other applications for the TList type, for example it can be used for modeling type unions like this:

case class OneOf[TS <: TList](value : Any)

which would be a lot easier to use than Either when you have more than two choices. A problem with this solution is that it wouldn't play well with Scala's pattern matcher, you would to add need unnecessary default cases for example. It should be possible to create type matching functions though to alleviate this problem. More work is needed in this area and it might be material for a future blog post.

If you can think of other applications for type lists and sets I would be interested in hearing them.

8 comments:

Luc Duponcheel said...

cool stuff man!

Anonymous said...
This comment has been removed by a blog administrator.
Jorge Ortiz said...

Awesome!

I've also found something like ScalaNLP's HTMap (map of types to values of that type) very useful. See: http://github.com/dlwh/scalanlp-core/blob/master/src/main/scala/scalanlp/collection/immutable/HTMap.scala

Unknown said...

Metascala is indeed awesome.

I'm having difficulties grasping how to use containsFn and isSubSetFn "predicates" (for lack of a better word). They appear to be tools for statically checking whether a given TList contains a type T or is a superset of another given TList. But

containsFn[Int, MyTList]

just seems to build a new ContainsFn[Int, Int :: MyTList] regardless of whether MyTList contains or doesn't contain Int.

My immediate goal is have a method that takes two TLists as parameters and fails do compile if they don't represent the same list of types. It's not clear to me if the Utils.Equal can help in this case.

Jesper Nordenberg said...

@Rafael:
Yes, containsFn and isSubSetFn are implicits that can are used when creating functions that checks for membership and sub set relationship. I don't think it's possible to write these as pure type functions as there is no way to perform type equality checks (see discussion on the mailing list, http://thread.gmane.org/gmane.comp.lang.scala/17474/focus=17481).

You can find some example uses in OneOfs.scala which models simple generic type unions. To check if two TLists contains the same elements one can do:

def areEqual[T1 <: TList, T2 <: TList](implicit s1 : IsSubSetFn[T1, T2], s2 : IsSubSetFn[T2, T1]) = ...

It's quite unfortunate that this function can't be written on a pure type level.

Unknown said...

Thanks, Jesper. That seems to do the trick.

Ralf Lämmel said...

How would you model TIPs or TICs or Records?
Perhaps you want to add some comments to OO.scala?
I can see from GetByType (HLists.scala) & friends how you deal with type equality.
So do you want to propose a TIP & TIC like structure?
It is not so obvious how you would do this.
But I am counting on it to be feasible and you able to do it!

Less important:

What is the GetByType operation good for?
I think it looks up the n-th occurrence of a given type in an HList.
Is that something useful?
Why not do something like HOccurs early on instead?

Why make a value-level Int a required constructor argument of Succ?
Shouldn't this value-level value always implied (as later done with views)?

Even less important:

Why have both If and If2 in Booleans?
It looks like If2 is more well-behaved.
So is there any reason to keep If?
What is trait IfTrue for?
(It's never used in HLists.scala.)

Why is Head of HNil Nothing?
Why is Tail of HNil HNil?
Wouldn't it be better to leave these type-level operations out of HNil?

Jesper Nordenberg said...

Hello Ralf,

Lots of questions, I'll try to answer to my best ability :)

> How would you model TIPs or TICs or Records?

I'm not familiar with TIPs and TICs. Care to elaborate?

As for records they can be modelled similar to objects (see OO.scala), ie a field is a unique type.

> Perhaps you want to add some comments to OO.scala?

I thought it was self explanitory ;) Basically an object is a HList of methods where each method has a unique type. Method dispatch is simply a lookup by type and a call to the corresponding function object. Overriding methods is simply a "replace by type" operation.

> I can see from GetByType (HLists.scala) & friends how you deal with type equality.

Unfortunately type equality can't be modeled on a pure type level in Scala (this has been discussed on the mailing lists). This makes it impossible to create a generic type set for example.

> What is the GetByType operation good for?
> I think it looks up the n-th occurrence of a given type in an HList.
> Is that something useful?

Yes, it's useful for object method overriding for example.

> Why not do something like HOccurs early on instead?

I'm not sure what you mean by this.

> Why make a value-level Int a required constructor argument of Succ?
> Shouldn't this value-level value always implied (as later done with views)?

I added this constructor arg afterwards because, I think, it eliminates the need for an additional implicit parameter when passing a Nat object as parameter.

> Why have both If and If2 in Booleans?
> It looks like If2 is more well-behaved.
> So is there any reason to keep If?

As you write, If2 is needed in some cases to make code type safe as it specifies a common super type bound. In some cases If is sufficient though and there's no need to use the more complex If2.

> What is trait IfTrue for?
> (It's never used in HLists.scala.)

Good question. Can't remember why I added that.

> Why is Head of HNil Nothing?
> Why is Tail of HNil HNil?
> Wouldn't it be better to leave these type-level operations out of HNil?

Yes, optimally these type members would not be part of HList nor HNil, but I think I ran into some type problems when I tried to model it this way. Don't remember exactly where though.