Saturday, October 4, 2008

Functional Purity in Scala

An important concept in functional programming is the pure function. The properties of pure functions mean that they are easily verifiable and allow concurrent execution (an important advantage on today's multi-core machines). You can write pure functions in pretty much all programming languages, but AFAIK there are only two popular programming languages that verify purity at compile time: Haskell and D. In Haskell you can't use side effects or externally visible variables unless you explicitly indicate so using monads in your function signature. D takes a different approach: the default is that you are allowed to use side effects and variables in your functions, and you must explicitly mark functions as pure; see this presentation for more information. The difference between the implementations is surely because Haskell was designed from the start to be pure, whereas in D pure functions were added in version 2.0 of the language.

Pure Functions in Scala

I've been thinking about the best approach to implement pure function verification in the Scala compiler. An approach similar to the one in D would fit a lot better than the one used in Haskell (which would break all existing code and cause some problems due to strict evaluation). A solution using annotations would be quite simple to implement:

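import scala.annotation.Annotation // Annotation lives in scala.annotation in current Scala versions

class Pure extends Annotation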

In practice you want to define this as a runtime annotation in Java, but let's stick to Scala here. Now you can mark a function/method as pure:

@Pure def pureFunc(x : Int, y : Int) = x + y

There are some rules that the compiler must verify for a pure method/function:
  • Only calls to pure functions are allowed. This requires that a large number of functions in the Scala library be marked with the pure annotation; otherwise it will be impossible to write new pure functions.

  • Non-local vars can't be read or written. Local vars can be used in a pure function (as an accumulator for example), but you can't access static variables or variables reachable from the arguments.

  • A pure method can only be overridden by a pure method, and an interface method defined as pure can only be implemented as a pure method.
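
To make the rules a bit more concrete, here is a minimal sketch of what the checker would accept and reject. The Pure marker is the hypothetical annotation from above (defined here with StaticAnnotation so the snippet compiles on its own), the names sumTo, countedSumTo and callCount are made up for illustration, and today's compiler performs none of these checks. The sketch also assumes that the Int operators used are marked pure in the library.

```scala
import scala.annotation.StaticAnnotation

object PureRulesSketch {
  // Hypothetical marker; the current compiler attaches no checks to it.
  class Pure extends StaticAnnotation

  // Non-local mutable state, off limits to pure functions.
  var callCount = 0

  // Would be accepted: only the parameter and local vars are used.
  @Pure def sumTo(n: Int): Int = {
    var acc = 0 // local accumulator, allowed
    var i = 1
    while (i <= n) { acc += i; i += 1 }
    acc
  }

  // Would be rejected if marked @Pure: it writes a non-local var.
  // The annotation is deliberately left off.
  def countedSumTo(n: Int): Int = {
    callCount += 1
    sumTo(n)
  }
}
```

Calling countedSumTo leaves a visible trace in callCount, which is exactly the kind of effect the second rule is meant to rule out.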

So far things are simple, but the restrictions imposed are quite severe. For example, we can't create an array inside a pure function and use it locally, as this would result in calls to the non-pure array apply/update methods. Clearly this has to be allowed, which leads to the concept of "semi-pure" functions.

Semi-Pure Functions

Let's define the concept of a semi-pure function/method:

class SemiPure extends Annotation
class Pure extends SemiPure

As you can see, Pure is a subtype of SemiPure, so anywhere a SemiPure function is required a Pure function can be used. For example, a method marked as SemiPure can be overridden by a Pure method (but not the other way around). Here's an example of a semi-pure method:

case class Var(var value : Int) {
  @SemiPure def inc = {
    value += 1 // Ok, we can modify fields in this object
    value
  }
}

The following compilation rules apply to a semi-pure function:
  • It can call pure and semi-pure functions.

  • It can use local variables.

  • It can read and write variables reachable from its arguments. This includes the implicit this argument passed for class methods, so a class method can read/write fields of a class instance.

  • It can only be overridden by/implemented as a semi-pure or pure method.
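
The overriding rule can be sketched as follows. This is only an illustration under the proposal: the annotations are the hypothetical ones from this post (defined here with StaticAnnotation so the snippet is self-contained), and Counter, Stepping and Constant are made-up names. Nothing is enforced by the current compiler.

```scala
import scala.annotation.StaticAnnotation

object OverrideSketch {
  // Hypothetical markers; Pure is a subtype of SemiPure as in the post.
  class SemiPure extends StaticAnnotation
  class Pure extends SemiPure

  trait Counter {
    @SemiPure def next(): Int
  }

  // Semi-pure implementation: it mutates a field reachable from `this`.
  class Stepping(private var current: Int) extends Counter {
    @SemiPure def next(): Int = { current += 1; current }
  }

  // Pure implementation: acceptable wherever semi-pure is required.
  class Constant(value: Int) extends Counter {
    @Pure def next(): Int = value
  }
}
```

An implementation of next() without either annotation is what the proposed check would refuse.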

So what use do we have for semi-pure functions? Well, they allow us to loosen the restrictions placed on pure functions somewhat: a pure function may call a semi-pure function if, and only if, all the argument values passed are created locally or are "invariant". "Invariant" is a term I borrowed from D; it basically means a deeply immutable value, i.e. an immutable value that contains only invariant values :). For example, List[Int] is invariant, but List[Array[Int]] is not, even though List is an immutable type: the arrays it holds can still be modified.
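
The difference between immutable and invariant can be seen in a few lines of ordinary Scala. This is just an illustration with made-up names (InvarianceSketch, bumpFirst); it involves no purity annotations at all.

```scala
object InvarianceSketch {
  // Deeply immutable: neither the list nor its elements can change.
  val ints: List[Int] = List(1, 2, 3)

  // Immutable list, but its elements are mutable arrays.
  val arrays: List[Array[Int]] = List(Array(1, 2))

  // The list structure stays the same, yet a caller can observe the change.
  def bumpFirst(xs: List[Array[Int]]): Unit = {
    val first = xs.head
    first(0) = first(0) + 1
  }
}
```

After InvarianceSketch.bumpFirst(InvarianceSketch.arrays), the first array holds 2 in position 0, so a value of type List[Array[Int]] can't be treated as safe to share with a semi-pure function.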

So with this loosened restriction you can for example define a function that creates an array and updates it in a loop, and it can still be a pure function. A quite powerful concept that blends the imperative and purely functional programming styles.

Here's an example of a legal pure function that calls a semi-pure function (let's assume the method List[T].head is defined as pure):

case class Var(var value : Int) {
  @SemiPure def inc = {
    value += 1
    value
  }
}

@Pure def pureFunc(l : List[Var]) = {
  val x = l.head // Ok, pure method called
  val v2 = Var(10) // Object created locally
  v2.inc // Ok, call of semi-pure function with locally created object
}

However, this is not allowed:

@Pure def pureFunc2(l : List[Var]) = {
  val x = l.head // Ok, pure method called
  x.inc // Error: semi-pure method called on variant external object
}

as it would result in an externally visible modification.
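
The array case mentioned earlier follows the same pattern. Below is a rough sketch, assuming the hypothetical annotations from this post (defined with StaticAnnotation so the snippet compiles by itself) and assuming that Array's apply/update and toList would be classified as semi-pure or pure by the library. The names fillSquares and squares are invented for the example.

```scala
import scala.annotation.StaticAnnotation

object LocalArraySketch {
  // Hypothetical markers; not checked by the current compiler.
  class SemiPure extends StaticAnnotation
  class Pure extends SemiPure

  // Semi-pure: it only mutates state reachable from its argument.
  @SemiPure def fillSquares(a: Array[Int]): Unit = {
    var i = 0
    while (i < a.length) { a(i) = i * i; i += 1 }
  }

  // Pure under the loosened rule: the array is created locally,
  // mutated locally, and never escapes in mutable form.
  @Pure def squares(n: Int): List[Int] = {
    val a = new Array[Int](n)
    fillSquares(a)
    a.toList
  }
}
```

From the outside, squares behaves like any other pure function: the same n always gives the same list, and no caller can observe the intermediate mutation.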

How to verify that a type is invariant is a problem that needs to be explored further. It should be doable with an addition of an immutable/invariant annotation, but there are some complications with subtyping and type parameters.

Function Objects

One more thing needs to be solved: how to declare a pure function parameter for a higher-order function. A quite simple solution is to create traits for pure and semi-pure functions and mix them with the function types. For example if you want to define a pure map function:

trait TSemiPure
trait TPure extends TSemiPure

@Pure def map[T, U](l : List[T], fn : (T => U) with TPure) : List[U] =
  if (l.isEmpty) Nil else fn(l.head) :: map(l.tail, fn)

The TPure type would simply mean that the function object's apply method is considered pure by the compiler. There would be some restrictions on how the programmer is allowed to mix in the (semi-)pure traits. Another, maybe simpler, option is to create a new set of (Semi)PureFunction0-22 traits that extend the existing Function0-22 traits.
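
The second option might look roughly like this for the one-argument case. The trait names SemiPureFunction1 and PureFunction1 are assumptions made for the sketch, as is the twice value; the purity of apply is only a convention here, since no compiler support exists.

```scala
object PureFunctionSketch {
  // Hypothetical traits extending the existing Function1.
  trait SemiPureFunction1[-T, +R] extends (T => R)
  trait PureFunction1[-T, +R] extends SemiPureFunction1[T, R]

  // map only accepts function objects that carry the pure marker.
  def map[T, U](l: List[T], fn: PureFunction1[T, U]): List[U] =
    if (l.isEmpty) Nil else fn(l.head) :: map(l.tail, fn)

  // Written out by hand; under the proposal the compiler would
  // pick the right trait for a lambda automatically.
  val twice: PureFunction1[Int, Int] = new PureFunction1[Int, Int] {
    def apply(x: Int): Int = x * 2
  }
}
```

Because PureFunction1 is still a Function1, such a value can also be passed to existing higher-order methods like List.map without any conversion.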

For lambda expressions the compiler could automatically infer if the function is pure, semi-pure or impure, and use the correct trait. During eta expansion the correct trait can be used depending on the annotation of the method/function expanded.

Final Words

Using the constructs I've presented in this post I think it would be feasible to implement checking of functional purity in the Scala compiler without too much effort. Hopefully this will result in a SIP in the near future, so that Scala hackers can utilize the powerful tool of statically checked pure functions.

I'm sure I've made some error or missed something along the way, so I'm grateful for any comments/corrections you might have.

8 comments:

James Iry said...

It's occasionally useful to do something impure in an otherwise pure context. For instance, I might want to log trace or debug info inside a pure function. I'd like a way to be able to embed a bit of impure code and tell the compiler to treat it as if it were pure for purposes of deducing the purity of the function as a whole. Something like Haskell's unsafePerformIO.

In a related use case, there are a lot of Java libraries, e.g. Math, that are in fact pure but which Scala would have to conservatively assume are impure. So I'd like a way to write a Math wrapper that exposes the same functionality as a pure interface. The same mechanism could be used.

Jesper Nordenberg said...

Yes, such a construct would be useful. I'm not sure about the best way to implement it, maybe an annotation that can be applied to an expression to "purify" it.

James Iry said...

Another option is a pseudo-function in the Predef. I say "pseudo" because, of course, it can't be implemented as a true function - it would have to be compiler magic.

@Pure
def unsafePurify[A](exp : @Impure => A) = ...

Jesper Nordenberg said...

Yes, that would be quite clean as long as the function object call is inlined.

Daniel Spiewak said...

Fascinating idea. However, I think that I would probably want the "pure" function concept to allow local impure function calls (eg Array#update). This is tricky though because locally impure methods may have globally impure side-effects. For example, Array#update might log its input, breaking the purity of the method which called it.

What you're talking about sounds vaguely like the const modifier in C++. It sounds like a great idea when you read about it, but in practice it sounds like it would be extremely problematic. With that said, I would love to be proven wrong by a Scala implementation of your idea! :-)

Daniel Spiewak said...

My previous comment was a bit unclear: I would expect the @pure annotation to function much as you describe the @semipure. I see the problems with that approach though, so I understand why you would suggest them as separate restrictions.

Jesper Nordenberg said...

@Daniel:
Array.update and apply are examples of methods that would be semi-pure, i.e. they can't access static variables or call non-Java code (native methods). If you need to log inside a pure or semi-pure method, I guess the only solution is to call some unsafePerformIO-like function.

C++ const is not very useful in regard to pure functions as it only guarantees that the object is not changed by this method, but it can still be changed by other methods at any time. In a pure function you need a guarantee that the object is not modified at any time by any method. That's why the concept of deep (transitive) immutability is required. It's not a trivial topic and I will talk more about this in a later post. I suggest you read this presentation about the D implementation for more information.

phil said...

Is there any way to mark pure functions in Scala now? I would think this would be useful to allow for caching of the results in some cases.