Saturday, May 25, 2013

Sets and the Type of 1

I've been pondering the subject of types and values for some time and decided to materialize my thoughts in a blog post. Too keep things simple and easily readable I will reason about types from a set theory point of view, not using traditional type theory.

What is a Type?

Let's start by investigating what a type is. According to Wikipedia a type system can be defined as "a tractable syntactic framework for classifying phrases according to the kinds of values they compute". So, the type of a phrase (whatever that is) is simply the set of the values that can be computed from this phrase. In this post I will use the terms type and set interchangeably as I consider them to be the same thing (mathematically this might not be correct).

The Type of 1

So, what is the type of the value 1? In Scala (and Java, C/C++, C# etc.) the type of 1 is a 32-bit signed integer. This is strange as the value 1 is a member of many sets (in fact an infinite number of them), so why is this set chosen? Probably because of historical reasons and because it's a convenient type to do calculations with on most computers, but mathematically this type doesn't have any special meaning.

In Haskell the type of 1 is (Num t) => t. Indeed confusing, the type is defined in terms of a typeclass. This set includes a lot of values, including user defined ones. Really, Haskell must resort to typeclasses to define the type of 1?

I would argue that there are only two reasonable types for the value 1: the set containing all values and the set containing only the value 1.  The set containing all values is really not useful in static type checking, so that leaves the set which only element is the value 1, {1} in set notation (also called a singleton set). The same reasoning would apply to all values. I find it remarkable that pretty much all commonly used programming languages gets this simple type wrong. How can one expect a type system to be useful if it can't even infer the correct type of the most basic expressions?

The Type of a Function

A (total) function is a relation that maps each element of one set, let's call it the input set, to exactly one element of another (or the same) set, let's call it the output set. This relation can be modelled as a set of pairs a -> b where a is a member of the input set and b is a member of the output set. The additional restriction on the relation set is that there must be exactly one pair a -> * in the set (where * represents any member of the output set).

So, let's apply this model to a simple function in Scala:

val foo = (b : Boolean) => if (b) 1 else "a"

What is the only reasonable type of foo? Well, the input set of foo is Boolean which is equal to {true, false}. The output set of foo is {1, "a"}. If the input value is true, foo will output the value 1 and if the input value is false, foo will output value "a". So we can write the type of foo as {true -> 1, false -> "a"}. In Scala the type of foo is Boolean => Any, which is a pretty useless type as Any is the set of all values (i.e. the type doesn't tell us anything about the value). In Haskell the corresponding definition won't even type check. Seriously, these are state of the art programming languages, is this the best we can do?


Well, what if we have:

val addOne : Int => Int
val x : {1} = 1
addOne(x)   // Type error?

No, this should not be a type error. The first thing to notice is that the set Int contains the value 1 and thus {1} is a subset of Int. This means that it's perfectly safe to apply the function addOne on the input set {1} as we're guaranteed that the relation set addOne contains a mapping for each element in the input set. In type theory one would say that {1} is a subtype of Int.

Final Words

I've only skimmed the surface of how to apply the concept of sets and values to programming languages. The concept can be extended to ADT's, pattern matching, type classes etc. I think there is a lot to gain from thinking about types this way because there is something fundamentally broken in the way types are implemented in most programming languages. Do you agree?

Monday, March 11, 2013

A More Efficient Option

Updated: 2013-03-12

Scala's Option type is a big improvement in type safety over Java's null checking and NullPointerException's. Unfortunately when wrapping a value in Some there is a quite big overhead: creation of one additional object containing a reference to the value. It would be more efficient if we could represent Some as the unboxed pointer value and encode None as the null value, but at the same time preserving the type safety as we get with the Option type (for example Kotlin uses this approach in it's builtin support for nullable types). Well, we can do exactly this with value classes introduced in Scala 2.10:

final case class Option[+T](value: Any = null) extends AnyVal with Product with Serializable {
  private def unsafeGet = value.asInstanceOf[T]
  def isEmpty: Boolean = value == null

The reason that the class parameter is of type Any and not T is that it's not allowed to create a value of type Nothing. We still want to be able to create a None value of type Option[Nothing] though so we delay the unsafe cast to T (using the unsafeGet method) until the value is actually required and we've checked that it's actually there using the isEmpty method.

The code is on GitHub if someone wants to try it out. It's pretty much a dropin replacement for the standard Scala Option type.

Memory Usage Comparisons

Below is a comparison in memory usage between the scala.Option type and the unboxed AnyVal option type when allocating 1M objects each containing one optional value:

scala.Some: 33 MB
scala.None: 19 MB
scala.Some : Any: 34 MB
scala.None : Any: 19 MB
AnyVal Some: 19 MB
AnyVal None: 19 MB
AnyVal Some : Any: 34 MB
AnyVal None : Any: 34 MB

As one would expect, memory usage for Some is almost halved and for None the memory usage is unchanged. The downside of using the AnyVal option is that it uses more memory when a None is upcasted to a super type because then the compiler will box the reference. I would assume this is a quite rare case though.

Tuesday, May 8, 2012

My Take on Haskell vs Scala

I've used both Haskell and Scala for some time now. They are both excellent and beautifully designed functional programming languages and I thought it would be interesting to put together a little comparison of the two, and what parts I like and dislike in each one. To be honest I've spent much more time developing in Scala than Haskell, so if any Haskellers feel unfairly treated please write a comment and I will correct the text.

This is a highly subjective post, so it doesn't contain many references to sources. It helps if the reader is somewhat familiar with both languages.

With that said, let's start the comparison.


When it comes to syntax Haskell wins hands down. The combination of using white space for function and type constructor application, and currying makes the code extremely clean, terse and easy to read. In comparison Scala code is full of distracting parenthesis, curly bracers, brackets, commas, keywords etc. I find that I've started using Haskell syntax when writing code on a white board to explain something to a colleague, which must be an indication of it's simplicity.

Type Inference

The type inference in Haskell just works, and does so reliably. The type inference in Scala is clearly worse, but not too bad considering the complexity of the type system. Most of the time it works quite well, but if you write slightly complicated code you will run into cases where it will fail. Scala also requires type annotations in function/method declarations, while Haskell doesn't. In general I think it's a good idea to put type annotations in your public API anyway, but for prototyping/experimentation it's very handy to not be required to write them.


By being Java compatible, Scala obviously has subtyping, while Haskell has not. Personally I'm on the fence whether subtyping is a good idea or not. On one hand it's quite natural and handy to be able specify that an interface is a subtype of some other interface, but on the other hand subtyping complicates type system and type inference greatly. I'm not sure the benefits outweighs the added complexity, and I don't think I'm the only one in doubt.

Modules vs Objects

The Haskell module system is very primitive, it's barely enough to get by with. In contrast Scala's object/module system is very powerful allowing objects to implement interfaces, mixin traits, bind abstract type members etc. This enables new and powerful ways to structure code, for example the cake pattern. Of course objects can also be passed around as values in the code. Objects as modules just feels natural IMHO.

Typeclasses vs Implicit Parameters

Typeclasses in Haskell and implicit parameters in Scala are used to solve basically the same problems, and in many ways they are very similar. However, I prefer Scala's solution as it gives the developer local, scoped control over which instances are available to the compiler. Scala also allows explicit instance argument passing at the call site which is sometimes useful. The idea that there is only one global type class instance for a given type feels too restricted, it also fits very badly with dynamic loading which I discuss in a later section. I also don't like that type class instances aren't values/objects/records in Haskell.

Scala has more advanced selection rules to determine which instance is considered most specific. This is useful in many cases. GHC doesn't allow overlapping instances unless a compiler flag is given, and even then the rules for choosing the most specific one are very restricted.

Lazy vs Strict Evaluation

Haskell has lazy evaluation by default, Scala has strict evaluation by default. I'm not a fan of lazy evaluation by default, perhaps because I've spent much time programming languages like assembly and C/C++ which are very close to the machine. I feel that to able to write efficient software you must have a good knowledge of how the execution machine works and for me lazy evaluation doesn't map well to that execution model. I'm simply not able to easily grasp the CPU and memory utilization of code which uses lazy evaluation. I understand the advantages of lazy evaluation, but being able to get predicable, consistent performance is a too important part of software development to be overlooked. I'm leaning more towards totality checking, as seen in newer languages like Idris, combined with advanced optimizations to get many of the benefits of lazy evaluation.

Type Safety

In Haskell side effects are controlled and checked by the compiler, and all functions are pure. In Scala any function/method can have hidden side effects. In addition, for Java compatibility the dreaded null value can be used for any user defined type (although it's discouraged), often resulting in NullPointerException. Also exceptions are used quite often in Java libraries and they are not checked by the Scala compiler possibly resulting in unhandled error conditions. Haskell is definitely a safer language to program in, but assuming some developer discipline Scala can be quite safe too.

Development Environment

The development environment for Scala is, while not on par with the Java's, quite nice with good Eclipse and IntelliJ plugins supporting code completion, browsing, instant error highlighting, API help and debugging. Haskell doesn't have anything that comes close to this (and no, Leksah is not there :-) ). And I don't buy the argument that you don't need a debugger for Haskell programs, especially for imperative code a good debugger is an invaluable tool. The developer experience in any language is much improved by a good IDE, and Scala is way ahead here.
Update: The EclipseFP Eclipse plugin for Haskell looks promising.

Runtime System

I like the JVM, it's a very nice environment to run applications in. Besides the state of the art code optimizer and garbage collectors, there are lots of tools available for monitoring, profiling, tuning, load balancing for the JVM that just works out of the box with Scala. Being able to easily load and unload code dynamically is also useful, for example in distributed computing.

Haskell has a much more static runtime system. I don't have much experience with runtime tools for Haskell, but my understanding it doesn't have the same amount of tools as the JVM.


The compilers for Haskell (GHC) and Scala (scalac) are very different. GHC performs quite complex optimizations on the code during compilation, while scalac mainly just outputs Java bytecode (with some minor optimizations) which is then dynamically compiled and optimized by Hotspot at runtime. Unfortunately things like value types and proper tailcall elimination, which are supported by GHC, are currently not implemented in the JVM. Hopefully this will be rectified in the future.

One thing Haskell and Scala have in common is that they both use garbage collection. While this is convenient in many cases and sometimes even necessary, it can often be a source of unnecessary overhead and application latency. It also gives the developer very little control over the memory usage of an application. GHC supports annotation of types to control boxing, and Hotspot does a limited form of escape analysis to eliminate heap allocation in some cases. However there is much room for improvement in both environments when it comes to eliminating heap allocations. Much can be learned from languages like Rust, BitCATS and even C++. For some applications it's critical to be able to have full control over memory management, and it's very nice to have that in a high level language.

Final Words

Haskell and Scala are both very powerful and practical programming languages, but with different strengths and weaknesses. Neither language is perfect and I think there is room for improvements in both. I will definitely continue to use both of them, at least until something better shows up. Some new interesting ones on my radar are Rust, ATS and Idris.

Thursday, May 13, 2010

Scala Stream Fusion and Specialization, Part 2

Note: The full source code for this post can be found here.

Scala specialization is maturing and in Scala 2.8 RC2 some parts of the standard library has been specialized (the function and tuple classes). So, it's time to update the stream fusion tests I wrote about in a previous blog post with proper specialization.

It should be quite straight forward to fully specialize the filter and map iterators used in my previous post. This turns out to not be the case as the current specialization implementation contains a quite severe limitation:

A method is only specialized for a type parameter annotated as specialized if the method type signature contains the type parameter in a "naked" position (or an array of the "naked" type parameter).

See this ticket for more information. Here are some examples of how this limitation affects method specialization:

trait A[@specialized T] {
def m1(t : T) : Boolean // Will be specialized for T
def m2(f : T => Boolean) // Will NOT be specialized for T
def m3[@specialized U](fn : T => U) : T // Will be specialized for T, but NOT for U
def m4() // Will NOT be specialized

It easy to see how this will cause some problems when implementing specialized iterators and higher order functions.

The iterator trait from my previous post has to be modified slightly as the next method would not get specialized due to the limitation mentioned above:

trait SIterator[@specialized T] {
def hasNext : Boolean
def next() : T

This interface is similar to the Java and Scala iterator interfaces.

The implementations of array and map iterators are straight forward, but the filter iterator is quite tricky:

final class FilterIterator[@specialized T](iter : SIterator[T], pred : T => Boolean) extends SIterator[T] {
private var hasElem = false
private var elem : T = findNext()

def hasNext = hasElem

def next() = {
val r = elem

def findNext() : T = {
while (iter.hasNext) {
elem =

if (pred(elem)) {
hasElem = true
return elem

hasElem = false

Normally findNext wouldn't return anything, but that would cause it to not be specialized, so the return type must be set to T.

The fold method is awkward to use as a dummy parameter has to be added so that all type parameters appear in "naked" positions:

def fold[@specialized T, @specialized U](iter : SIterator[T], fn : (U, T) => U, v : U, dummy : T) = {
var r = v

while (iter.hasNext) {
r = fn(r,


This leaves us with the following code for the map-filter-sum calculation:

def mapFilterSum(a : Array[Int]) = {
val ai = new ArrayIterator(a, 0, a.length)
val s = new FilterIterator[Int](new MapIterator[Int, Int](ai, _ * 3 + 7), _ % 10 == 0)
fold[Int, Int](s, _ + _, 0, 0)

Now to the benchmarks. Remember that we want to compare the performance to this while loop:

def mapFilterSumLoop(a : Array[Int]) = {
var i = 0
var r = 0

while (i < a.length) {
val v = a(i) * 3 + 7

if ((v % 10) == 0)
r += v

i += 1


Compiling with -optimize and running with -server gives the following benchmarks times:

Java version: 1.6.0_20
Loop: (4936,-423600172)
Specialized iterators: (5935,-423600172)

So, 4936 microseconds for the while loop compared to 5935 microseconds for the specialized iterators, about 20% performance penalty. This relatively small difference in running times indicates that all boxing/unboxing has been eliminated. Nice indeed.

It would be nice to clean up the syntax a bit for the filter and map operations. This turns out to be harder than expected, again due to the limitation in the specialization implementation. Dummy parameters have to be added to the filter and map methods:

trait SIterator[@specialized T] {
def hasNext : Boolean
def next() : T
def filter(pred : T => Boolean, dummy : T) = new FilterIterator[T](this, pred)
def map[@specialized U](fn : T => U, dummy : T, dummy2 : U) = new MapIterator[T, U](this, fn)

Even with the dummy parameters the map method is not specialized correctly which leads to boxing and inferior performance.

In conclusion I would say that specialization still needs some work before it can be useful in a collection library. It's a very powerful feature and hopefully it can be improved to reach its full potential in future Scala releases.

Saturday, March 6, 2010

Scala Stream Fusion and Specialization

Updated: Fixed code links and added view and stream benchmarks.

Inspired by the successful results of Haskell stream fusion (see Evolving Faster Haskell Programs (now with LLVM!) for some impressive optimizations) I was thinking if a similar concept is applicable to Scala collections. It turns out that with a combination of iterators and specialization it's possible to achieve similar optimizations in Scala.

The goal of stream fusion is essentially to optimize code like this:

def scalaLibrarySum(a : Array[Int]) = => i * 3 + 7).filter(i => (i % 10) == 0).foldLeft(0)(_ + _)

into code like this:

def mapFilterSumLoop(a : Array[Int]) = {
var i = 0
var r = 0

while (i < a.length) {
val v = a(i) * 3 + 7

if ((v % 10) == 0)
r += v

i += 1


If you run the scalaLibrarySum method in Scala it will create two intermediate arrays with the results of the map and filter operations. This is totally unnecessary for this calculation as the functions passed to filter and map are side effect free and thus the result of the function applications can be performed lazily just before the result is needed in the fold operation. This is basically how the mapFilterSumLoop method works.

Besides creating intermediate arrays, boxing of primitive values must be avoided if we want to have any chance of competitive performance (the Haskell libraries contain specialized instances to avoid boxing). Fortunately Scala supports specialization of type parameters in version 2.8, which enables us to avoid boxing while still writing generic code. Unfortunately this feature seems to be quite buggy at the moment, just by playing around with a simple example I encountered two bugs (tickets #3148 and #3149). So, the code below contain some specialization done by hand. Hopefully these bugs will be fixed so that the code can be fully generalized.

The biggest difference compared to stream fusion in Haskell is that I use impure iterators in the Scala code. This is not as nice as the pure stream code used in Haskell, but it's a fact that Hotspot isn't nearly as good at optimizing pure functional code as GHC. Hotspot works best if fed imperative style loops.

Here's the definitions of the specialized functions and iterators I use in the benchmark below:

// Specialized Function1
trait Fn1[@specialized I, @specialized O] {
def apply(a : I) : O

// Specialized Function2
trait Fn2[@specialized I1, @specialized I2, @specialized O] {
def apply(a1 : I1, a2 : I2) : O

// Specialized iterator
trait SIterator[@specialized T] {
def hasMore : Boolean
def current : T
def next()

In addition to this I've defined array, filter and map iterators. Unfortunately these are not generic due to the problems with the specialize feature:

class IntArrayIterator(a : Array[Int], var index : Int, endIndex : Int) extends SIterator[Int] {
def next() = index += 1
def current = a(index)
def hasMore = index < endIndex

// Optimally this would be: class FilterIterator[@specialized T](iter : SIterator[T], pred : Fn1[T, Boolean]) extends SIterator[T]
class FilterIterator(iter : SIterator[Int], pred : Fn1[Int, Boolean]) extends SIterator[Int] {
def hasMore = iter.hasMore

def next() = {

def findNext() = {
while (iter.hasMore && !pred(iter.current))

def current = iter.current


// Optimally this would be: class MapIterator[@specialized U][@specialized T](iter : SIterator[T], fn : Fn1[T, U]) extends SIterator[U]
class MapIterator(iter : SIterator[Int], fn : Fn1[Int, Int]) extends SIterator[Int] {
def next() =
def current = fn(iter.current)
def hasMore = iter.hasMore

The fold function is straightforward and generic:

def fold[@specialized T, @specialized U] (iter : SIterator[T], fn : Fn2[U, T, U], v : U) = {
var r = v

while (iter.hasMore) {
r = fn(r, iter.current)


The map-filter-sum function can now be written using iterators:

def mapFilterSum(a : Array[Int]) = {
val filter = new Fn1[Int, Boolean] {def apply(a : Int) = (a % 10) == 0}
val map = new Fn1[Int, Int] {def apply(a : Int) = a * 3 + 7}
val s = new FilterIterator(new MapIterator(new IntArrayIterator(a, 0, a.length), map), filter)
fold(s, new Fn2[Int, Int, Int] {def apply(a1 : Int, a2 : Int) = a1 + a2}, 0)

The full iterator code can be found here. Compile the code using the latest Scala 2.8 build with the -Yspecialize flag. The optimize flag doesn't seem to have much effect on the performance.

I've benchmarked four different implementations of the map-filter-sum calculation:

  • The while loop shown above

  • The while loop split up into map, filter and fold functions with intermediate array results passed between them

  • The version using specialized iterators

  • The Scala library implementation shown above

  • Same as Scala library function but with a view instead

  • Same as Scala library function but with a stream instead

The benchmark is performed by taking the minimum execution time of 200 runs of each of the functions on an array of 1 million integers. Running the application with latest OpenJDK 7 (Hotspot version "build 17.0-b10") and the flags "-server -XX:CompileThreshold=100" I get the following results:

Loop: (4990,-423600172)
Loop with intermediate arrays: (6690,-423600172)
Specialized iterators: (5367,-423600172)
Scala array: (46444,-423600172)
Scala view: (39625,-423600172)
Scala stream: (63210,-423600172)

The first result value is the minimum execution time in microseconds, the second value is the result of the calculation. As you can see the method using specialized iterators is almost as fast as the single while loop. Hotspot has inlined all the iterator code, not bad! Using intermediate arrays is about 25% slower than specialized iterators. Using the Scala library is about 7-9 times slower! Clearly this bad result is a consequence of boxing taking place. Using a view is fastest here as it also avoids intermediate array creation.

The conclusion from this simple experiment is that it's certainly possible to write collection libraries with a nice high level interface and at the same time have excellent performance. When Scala specialization support is improved hopefully this power be available to all Scala programmers.

The full benchmark code can be found here.

Saturday, September 12, 2009

Type Lists and Heterogeneously Typed Arrays

In previous blog posts (HList in Scala and HList in Scala Revisited (or Scala Metaprogramming Works!)) I described a way to encode heterogeneously typed lists, HList, in Scala. This data type can be used as a generic replacement for the TupleX types found in the Scala library. In this post I will generalize the concept of type lists and introduce a new data type, HArray.

Note: the MetaScala code linked to in this post must be compiled using a recent Scala 2.8.0 build. I've given up on making it work with 2.7.x.

Type Lists

In HList the types of the elements and the element values are intertwined in the same type. It doesn't have to be this way though and the types of the elements can be tracked separate from the actual element values. To do this we create an purely abstract type (it has no instances) which models a list of types, let's call it TList. Here's a simple implementation:

// Abstract base type
sealed trait TList

// The empty type list
final class TNil extends TList

// A pair of a type and a type list
final class TCons[H, T <: TList] extends TList {
type Head = H
type Tail = T

// For infix notation
type ::[H, T <: TList] = TCons[H, T]

which would allow us to build type lists using infix notation, for example:

type SomeTypes = Int :: Boolean :: String :: TNil

This is only the basic definition of TList, in practice you want to add lots of functionality to this type, like append, length, indexing, reverse etc. The full TList code is available in MetaScala. TList can be used for implementing HList, but also heterogeneously typed arrays which I describe below.

Heterogeneously Typed Arrays

With a heterogeneously typed array I mean an array where each element can have a unique type which is tracked at compile time (the length of the array is also tracked at compile time). This ensures that the element values are handled in a type safe way. The values are stored in an Array[Any] and are cast to the correct type when extracted from the array (remember that Java generics are implemented using casts as well so this is no different from using the TupleX classes, or a Java ArrayList). The element types are tracked using a TList.

Here's part of the implementation which shows the prepend, append and nth methods:

final class HArray[L <: TList](private val elems : Array[Any]) extends HSeq[L] {

def ::[T](v : T) = {
val a = new Array[Any](elems.length + 1)
a(0) = v
Array.copy(elems, 0, a, 1, elems.length)
new HArray[T :: L](a)

def :::[L2 <: TList](l : HArray[L2]) = {
val a = new Array[Any](elems.length + l.elems.length)
Array.copy(l.elems, 0, a, 0, l.elems.length)
Array.copy(elems, 0, a, l.elems.length, elems.length)
new HArray[L2#Append[L]](a)

def apply[N <: Nat](implicit nth : INth[N]) : L#Nth[N] = elems(nth.index).asInstanceOf[L#Nth[N]]


Notice that the method implementations use simple array operations (array copy and indexing) so the performance should be comparable to using the TupleX classes in the Scala library (no benchmarks performed yet though). The full HArray implementation can be found here.

The HArray interface is similar to HList and virtually the same operations are supported. There are some usage examples in MetaScala:

// Create a HArray of an Int, Boolean and a pair
val a1 = 10 :: true :: (10.1, "Hello") :: HArrayNil

// Extract the second element, note that the element type
// information is preserved and we can safely perform a
// boolean and operation
val b = a1(_1) && false

// Create another HArray using alternative syntax (faster)
val a2 = HArray(1.1, "string", false)

// Replace the second element in the list, it used to
// be a String, but now it's an Int
val a3 = a2.removeNth(_1).insert(_1, 14)

// Type information preserved, we can use an Int operation
// on the element
val i = a3(_1) / 2

// Append l2 to l1
val a4 = a1 ::: a2

// Statically check that the length of l4 is 6
type T = Equal[_6, a4.Size]

Other TList Applications

There are other applications for the TList type, for example it can be used for modeling type unions like this:

case class OneOf[TS <: TList](value : Any)

which would be a lot easier to use than Either when you have more than two choices. A problem with this solution is that it wouldn't play well with Scala's pattern matcher, you would to add need unnecessary default cases for example. It should be possible to create type matching functions though to alleviate this problem. More work is needed in this area and it might be material for a future blog post.

If you can think of other applications for type lists and sets I would be interested in hearing them.

Tuesday, April 21, 2009

Equality, Mutability and Products

The thoughts in this post was sparked by a discussion on the scala-user mailing list. The discussion was about the Object.equals() method and what its semantics should be. There are many different types of equality, but IMO the most usable one is if the expression a.equals(b) returns the same value regardless of when it's evaluated (provided the references a and b are constant), i.e. it's time invariant. If this was not the case, it would be quite impossible to implement Object.hashCode() in a useful way and you wouldn't be able to use objects as keys in a hash map or as members of a set.

The Java standard library breaks this practice in a number of places, for example java.util.ArrayList implements equals() as element equality which of course is not time invariant as the list can change over time (unless it's read only). I think the rationale behind this might be that there are no immutable collection classes in the Java libraries, and thus ArrayList is used for both mutable and immutable lists.

Equality for Mutable Case Classes

When using Scala case classes you automatically get an equals() and hashCode() implementation which calls the corresponding class fields methods. The implementation is the same regardless of whether the fields are mutable or not, so if you use vars the implementations will not be time invariant. For example:

scala> case class A(var i : Int)
defined class A

scala> val a1 = A(1)
a1: A = A(1)

scala> val a2 = A(1)
a2: A = A(1)

scala> a1 == a2
res57: Boolean = true

scala> a1.hashCode
res58: Int = -1085252538

scala> a1.i = 2

scala> a1 == a2
res59: Boolean = false

scala> a1.hashCode
res60: Int = -1085252537

How can we fix this? The Scala compiler only generates equals() and hashCode() implementations if we have not already implemented them in a super type (excluding java.lang.Object). So we can simply define a super trait for all mutable case classes:

trait Mutable {
override def equals(v : Any) = super.equals(v)
override def hashCode = super.hashCode

Now we can easily define mutable case class with the default, time invariant equals() and hashCode() methods:

scala> case class A(var i : Int) extends Mutable
defined class A

scala> val a1 = A(1)
a1: A = A(1)

scala> val a2 = A(1)
a2: A = A(1)

scala> a1 == a2
res61: Boolean = false

scala> a1.hashCode
res62: Int = 22213838

scala> a1.i = 2

scala> a1 == a2
res63: Boolean = false

scala> a1.hashCode
res64: Int = 22213838

A more elegant solution is to define a Var trait and use this instead of vars in case classes, but this is not always a practical solution and it has some performance drawbacks.

Time Variant Equalily for Products

So, what if I want to compare my case classes for element equality at a given moment in time? Turns out it's quite easy to implement such a function in Scala as all case classes implements the Product trait. Here's an example implementation (not thoroughly tested):

def equalElements(v1 : Any, v2 : Any) : Boolean = (v1, v2) match {
case (p1 : Product, p2 : Product) =>
p1.getClass == p2.getClass &&
(0 until p1.productArity).forall(i => equalElements(p1.productElement(i), p2.productElement(i)))
case _ => v1 == v2

This function has some limitations and won't work correctly with nested product and non-product objects. In this case a more type specialized solution is required.

Example usage:

scala> val a1 = A(1)
a1: A = A(1)

scala> val a2 = A(1)
a2: A = A(1)

scala> equalElements(a1, a2)
res68: Boolean = true

scala> a1.i = 2

scala> equalElements(a1, a2)
res69: Boolean = false