Scala specialization is maturing and in Scala 2.8 RC2 some parts of the standard library has been specialized (the function and tuple classes). So, it's time to update the stream fusion tests I wrote about in a previous blog post with proper specialization.
It should be quite straight forward to fully specialize the filter and map iterators used in my previous post. This turns out to not be the case as the current specialization implementation contains a quite severe limitation:
A method is only specialized for a type parameter annotated as specialized if the method type signature contains the type parameter in a "naked" position (or an array of the "naked" type parameter).
See this ticket for more information. Here are some examples of how this limitation affects method specialization:
trait A[@specialized T] {
def m1(t : T) : Boolean // Will be specialized for T
def m2(f : T => Boolean) // Will NOT be specialized for T
def m3[@specialized U](fn : T => U) : T // Will be specialized for T, but NOT for U
def m4() // Will NOT be specialized
}
It easy to see how this will cause some problems when implementing specialized iterators and higher order functions.
The iterator trait from my previous post has to be modified slightly as the next method would not get specialized due to the limitation mentioned above:
trait SIterator[@specialized T] {
def hasNext : Boolean
def next() : T
}
This interface is similar to the Java and Scala iterator interfaces.
The implementations of array and map iterators are straight forward, but the filter iterator is quite tricky:
final class FilterIterator[@specialized T](iter : SIterator[T], pred : T => Boolean) extends SIterator[T] {
private var hasElem = false
private var elem : T = findNext()
def hasNext = hasElem
def next() = {
val r = elem
findNext()
r
}
def findNext() : T = {
while (iter.hasNext) {
elem = iter.next()
if (pred(elem)) {
hasElem = true
return elem
}
}
hasElem = false
elem
}
}
Normally findNext wouldn't return anything, but that would cause it to not be specialized, so the return type must be set to T.
The fold method is awkward to use as a dummy parameter has to be added so that all type parameters appear in "naked" positions:
def fold[@specialized T, @specialized U](iter : SIterator[T], fn : (U, T) => U, v : U, dummy : T) = {
var r = v
while (iter.hasNext) {
r = fn(r, iter.next())
}
r
}
This leaves us with the following code for the map-filter-sum calculation:
def mapFilterSum(a : Array[Int]) = {
val ai = new ArrayIterator(a, 0, a.length)
val s = new FilterIterator[Int](new MapIterator[Int, Int](ai, _ * 3 + 7), _ % 10 == 0)
fold[Int, Int](s, _ + _, 0, 0)
}
Now to the benchmarks. Remember that we want to compare the performance to this while loop:
def mapFilterSumLoop(a : Array[Int]) = {
var i = 0
var r = 0
while (i < a.length) {
val v = a(i) * 3 + 7
if ((v % 10) == 0)
r += v
i += 1
}
r
}
Compiling with -optimize and running with -server gives the following benchmarks times:
Java version: 1.6.0_20
Loop: (4936,-423600172)
Specialized iterators: (5935,-423600172)
So, 4936 microseconds for the while loop compared to 5935 microseconds for the specialized iterators, about 20% performance penalty. This relatively small difference in running times indicates that all boxing/unboxing has been eliminated. Nice indeed.
It would be nice to clean up the syntax a bit for the filter and map operations. This turns out to be harder than expected, again due to the limitation in the specialization implementation. Dummy parameters have to be added to the filter and map methods:
trait SIterator[@specialized T] {
def hasNext : Boolean
def next() : T
def filter(pred : T => Boolean, dummy : T) = new FilterIterator[T](this, pred)
def map[@specialized U](fn : T => U, dummy : T, dummy2 : U) = new MapIterator[T, U](this, fn)
}
Even with the dummy parameters the map method is not specialized correctly which leads to boxing and inferior performance.
In conclusion I would say that specialization still needs some work before it can be useful in a collection library. It's a very powerful feature and hopefully it can be improved to reach its full potential in future Scala releases.