Tuesday, May 8, 2012

My Take on Haskell vs Scala

I've used both Haskell and Scala for some time now. They are both excellent and beautifully designed functional programming languages and I thought it would be interesting to put together a little comparison of the two, and what parts I like and dislike in each one. To be honest I've spent much more time developing in Scala than Haskell, so if any Haskellers feel unfairly treated please write a comment and I will correct the text.

This is a highly subjective post, so it doesn't contain many references to sources. It helps if the reader is somewhat familiar with both languages.

With that said, let's start the comparison.

Syntax

When it comes to syntax Haskell wins hands down. The combination of using white space for function and type constructor application, and currying makes the code extremely clean, terse and easy to read. In comparison Scala code is full of distracting parenthesis, curly bracers, brackets, commas, keywords etc. I find that I've started using Haskell syntax when writing code on a white board to explain something to a colleague, which must be an indication of it's simplicity.

Type Inference

The type inference in Haskell just works, and does so reliably. The type inference in Scala is clearly worse, but not too bad considering the complexity of the type system. Most of the time it works quite well, but if you write slightly complicated code you will run into cases where it will fail. Scala also requires type annotations in function/method declarations, while Haskell doesn't. In general I think it's a good idea to put type annotations in your public API anyway, but for prototyping/experimentation it's very handy to not be required to write them.

Subtyping

By being Java compatible, Scala obviously has subtyping, while Haskell has not. Personally I'm on the fence whether subtyping is a good idea or not. On one hand it's quite natural and handy to be able specify that an interface is a subtype of some other interface, but on the other hand subtyping complicates type system and type inference greatly. I'm not sure the benefits outweighs the added complexity, and I don't think I'm the only one in doubt.

Modules vs Objects

The Haskell module system is very primitive, it's barely enough to get by with. In contrast Scala's object/module system is very powerful allowing objects to implement interfaces, mixin traits, bind abstract type members etc. This enables new and powerful ways to structure code, for example the cake pattern. Of course objects can also be passed around as values in the code. Objects as modules just feels natural IMHO.

Typeclasses vs Implicit Parameters

Typeclasses in Haskell and implicit parameters in Scala are used to solve basically the same problems, and in many ways they are very similar. However, I prefer Scala's solution as it gives the developer local, scoped control over which instances are available to the compiler. Scala also allows explicit instance argument passing at the call site which is sometimes useful. The idea that there is only one global type class instance for a given type feels too restricted, it also fits very badly with dynamic loading which I discuss in a later section. I also don't like that type class instances aren't values/objects/records in Haskell.

Scala has more advanced selection rules to determine which instance is considered most specific. This is useful in many cases. GHC doesn't allow overlapping instances unless a compiler flag is given, and even then the rules for choosing the most specific one are very restricted.

Lazy vs Strict Evaluation

Haskell has lazy evaluation by default, Scala has strict evaluation by default. I'm not a fan of lazy evaluation by default, perhaps because I've spent much time programming languages like assembly and C/C++ which are very close to the machine. I feel that to able to write efficient software you must have a good knowledge of how the execution machine works and for me lazy evaluation doesn't map well to that execution model. I'm simply not able to easily grasp the CPU and memory utilization of code which uses lazy evaluation. I understand the advantages of lazy evaluation, but being able to get predicable, consistent performance is a too important part of software development to be overlooked. I'm leaning more towards totality checking, as seen in newer languages like Idris, combined with advanced optimizations to get many of the benefits of lazy evaluation.

Type Safety

In Haskell side effects are controlled and checked by the compiler, and all functions are pure. In Scala any function/method can have hidden side effects. In addition, for Java compatibility the dreaded null value can be used for any user defined type (although it's discouraged), often resulting in NullPointerException. Also exceptions are used quite often in Java libraries and they are not checked by the Scala compiler possibly resulting in unhandled error conditions. Haskell is definitely a safer language to program in, but assuming some developer discipline Scala can be quite safe too.

Development Environment

The development environment for Scala is, while not on par with the Java's, quite nice with good Eclipse and IntelliJ plugins supporting code completion, browsing, instant error highlighting, API help and debugging. Haskell doesn't have anything that comes close to this (and no, Leksah is not there :-) ). And I don't buy the argument that you don't need a debugger for Haskell programs, especially for imperative code a good debugger is an invaluable tool. The developer experience in any language is much improved by a good IDE, and Scala is way ahead here.
Update: The EclipseFP Eclipse plugin for Haskell looks promising.

Runtime System

I like the JVM, it's a very nice environment to run applications in. Besides the state of the art code optimizer and garbage collectors, there are lots of tools available for monitoring, profiling, tuning, load balancing for the JVM that just works out of the box with Scala. Being able to easily load and unload code dynamically is also useful, for example in distributed computing.

Haskell has a much more static runtime system. I don't have much experience with runtime tools for Haskell, but my understanding it doesn't have the same amount of tools as the JVM.

Performance

The compilers for Haskell (GHC) and Scala (scalac) are very different. GHC performs quite complex optimizations on the code during compilation, while scalac mainly just outputs Java bytecode (with some minor optimizations) which is then dynamically compiled and optimized by Hotspot at runtime. Unfortunately things like value types and proper tailcall elimination, which are supported by GHC, are currently not implemented in the JVM. Hopefully this will be rectified in the future.

One thing Haskell and Scala have in common is that they both use garbage collection. While this is convenient in many cases and sometimes even necessary, it can often be a source of unnecessary overhead and application latency. It also gives the developer very little control over the memory usage of an application. GHC supports annotation of types to control boxing, and Hotspot does a limited form of escape analysis to eliminate heap allocation in some cases. However there is much room for improvement in both environments when it comes to eliminating heap allocations. Much can be learned from languages like Rust, BitCATS and even C++. For some applications it's critical to be able to have full control over memory management, and it's very nice to have that in a high level language.

Final Words

Haskell and Scala are both very powerful and practical programming languages, but with different strengths and weaknesses. Neither language is perfect and I think there is room for improvements in both. I will definitely continue to use both of them, at least until something better shows up. Some new interesting ones on my radar are Rust, ATS and Idris.

30 comments:

Kipton said...

Great post, thanks.

Alejandro Serrano said...

For Haskell development, there exists a Eclipse plug-in called EclipseFP [http://eclipsefp.github.com/]. Nowadays it has basic debugging, autocompletion, autofix for common errors (such as forgotten import or syntax extension), Cabal files editing... You may give it a try: it integrates very well with the Eclipse environment.
(Disclaimer: I've been one of the developers of EclipseFP)

ceii said...

The usual argument for Haskell's restriction to a single, global class instance per type is that it gives you a consistency guarantee. For example, if I have two (Set a) values (represented internally as trees, with an Ord constraint on a), it's straightforward to merge them because I know they were both created using the same ordering on a, which is also the one in scope at the point where I'm trying to merge them.

On the other hand I don't think there's be a "right thing" to do if there were several different instances in play. They might even disagree on which elements are equal and shouldn't both be part of the result!

This guarantee is really the only reason Haskell's type class system is the way it is. If it weren't for it I'm pretty sure GHC would already offer local instance declarations, reifying instances as records, and all the other wazoo. All these things are already possible in GHC's Core language which passes explicit dictionaries around, so not exposing them to the programmer is a conscious design choice.

As far as I understand (but I'm no expert), Scala deals with this by convention. Some types are treated as having unique, global instances and introducing a new one locally is only done with great care. For the rest, library writers are expected to not do toxic things like using a local, private implicit for a type and letting its effects leak through the API boundary.

What are your thoughts on this? It seems the Haskell community's consensus is that having a global consistency guarantee is worth the pain it brings, but I don't think there's ever been a real discussion on the matter.

Ciop - Volothamp said...

Jesper, since you tried Intellij Idea Scala's support, have you tried

http://code.google.com/p/ideah/

It's an haskell plugin.

James Iry said...

@ceii, actually the Scala convention is that where consistency is needed an object must carry around the "dictionary". For example, a Set that used an Equality type class would be constructed with the appropriate definition of equality and would then use that definition for all operations.

None-the-less, it is a convention and it's certainly possible to create a Set that uses a different definition of Equality every time - with disastrous results.

clementd said...
This comment has been removed by the author.
clementd said...

(sorry for the double comment)

Regarding the IDE, I found that with languages as powerful as scala and haskell, all I needed was vim, the build tool and a REPL.

I have configured vim and ctags to allow me to navigate in the code, but I don't use it that much.

sbt is a bit easier to handle for a basic project (it works out of the box) whereas with cabal it took some time to get it running.

One point where haskell wins is documentation, with tools like hoogle.

MCAndre said...

A fair, practical, and fairly complete review.

I too enjoy Haskell's syntax, and it's beginning to affect my pseudocode.

One thing Haskell does that no other language does is enable automatic optimizations such as parallel programming and GPU programming. For instance, if you change map to parMap, you're now using parallel map.

Jesper Nordenberg said...

@Alejandro:
Thanks for the pointer! I think I tried EclipseFP some time ago, but it seems much more polished now. I will definitely give it a try.

@ceii:
Given the dynamic nature of the JVM it seems impractical to be restricted to a single type class instance per type. And in many cases there is no true implementation that is more correct than others (take Show for example). As you say, having multiple incompatible instances can lead to problems, but I think this is mitigated somewhat by the fact that implicit arguments of a method or class constructors are automatically forwarded in calls to other method inside that method/class because they are in scope. The way objects and scoping interact in Scala is really nice and natural.

@Ciop:
No, I will test it. Thanks!

@clementd:
I agree to some extent. Functional programs require less debugging for sure, but there are imperative parts in all programs and there a debugger really helps. Other IDE features like code completion, navigation, find usages, help integration, refactoring etc. are just as important in a language like Scala or Haskell as they are in Java.

@MCAndre:
Thanks. Scala also has support for parallelizing operations over collections, it's even included in the standard library. And there are other projects for running Scala code on GPU's etc. See this presentation for example:

http://channel9.msdn.com/Events/Lang-NEXT/Lang-NEXT-2012/Pervasive-Parallelism-in-Scala

Jon Harrop said...

"However there is much room for improvement in both environments when it comes to eliminating heap allocations"

Don't forget .NET! Thanks to value types and reified generics I was able to write a complete working concurrent garbage collector in F# and benchmark its performance without any latency overhead from .NET's own GC. I think that is a real accomplishment and, in practice, it solves all of my problems. If only the JVM had those two features...

Jesper Nordenberg said...

@Jon:
Yes, the .NET platform has some features that would be nice to have in the JVM.

Ywen said...
This comment has been removed by the author.
Ywen said...

Could there be some input on the runtime systems comparison? I think this specific part is not as fair as it should be.

You qualify JVM's garbage collector as "state-of-the-art". Maybe, I don't know much about that but I've read that it was not very efficient with immutable object (because meant to be used with Java which relies most on big mutable objects), whereas GHC's runtime garbage collection has always been designed and has evolved with that specific objective in mind.

Still concerning the runtime: what about GHC's high-performance threading system, highly valuable in a parallel context (and of course in a concurrent one when used conjointly with STM)?
The JVM does not provide green threads. It used to in the past, but they've been replaced by system threads.
Haskell's runtime provides both, and handles them perfectly: very lightweight green threads (mere closures) automatically scheduled on available system threads.
In that matter, I've also read that Scala's a la Erlang actors system (even when replaced with Akka) doesn't compete with that (because of the design of the JVM), even if I agree it's greatly valuable.

Again, concerning tools, have you read what GHC's runtime provides wrt profiling and such? Because your post suggests it doesn't provide anything in that respect, which is wrong. (E.g. see Real World Haskell http://book.realworldhaskell.org/read/profiling-and-optimization.html for a good introduction to profiling)

Jesper Nordenberg said...

@Ywen:
You bring up some interesting points. I do think the Hotspot GCs handle small immutable objects quite well. Boxing of primitive values is common due to the way generics is (not) implemented in the JVM. The problem is that lack of support for value types, GHC has a definite advantage there.

I don't see much point in supporting green threads. What's important for parallelism on many cores is creating the correct number of system threads, distributing the work among them and applying some kind of work stealing. The JVM using the fork-join framework should be about as effective as GHC in this regard.

I didn't suggest that GHC lacks profiling and debugging support. I'm fully aware of that. What I meant is I'm not aware of Haskell equivalents of tools like VisualVM, Azul Zing, Terracotta, JRebel etc. The JVM tools works on fully optimized, unmodified, running production code. This is crucial for some types of long running applications like application servers. The debugger in GHC runs in interpreted code. I'd be happy if someone can point out similar tools for Haskell.

Ywen said...

@Jesper
I didn't know about the fork-join framework (it is new to Java 7 as I see), but as I see, it's less simple than merely calling forkIO with a closure.
But OK, if it's possible then it's less of an issue.

Concerning the runtime tools, that's maybe a very wild guess, but maybe Haskell being completely static needs less tools at runtime. And why is there a problem with using the debugger on interpreted code? It's way slower, yes, but when you're debugging you're not caring about peformances. Plus, bare in mind that the language itself makes a debugger less of a need (I did not say you could completely do without), while making it really more difficult to implement (the stronger the abstractions, the harder the use of a debugger).

phil said...

Scala has a few other benefits over Haskell.

1) It acts as glue between many different libraries on the JVM. There are so many libraries on the JVM written in Java that its worthwhile to be able to access all of those easily.
Most of the time in programming we are loading libraries and just glueing them together to get a job done.

2) Has OSGi, so you can load modules dynamically at runtime and manage modules. Has SBT for builds, allows easy adding of Scala artifacts and Java artifacts to builds.

3) It can do a lot of what other languages claim to do, that is, it can act as a super-set of a lot of other languages. Can also extend its language via DSL's, its reasonable to write a language as a DSL in Scala.

Personally, I'm always on the fence between Scala and Haskell as to which one to write in...as Haskell is so nice, its difficult to part with.

Dark Magus said...

Greetings. Your post was translated into Russian: https://docs.google.com/document/d/1cvUniy5P0fb06e_7GeYTFnyE-2c-BLA1pIdH5mA7QiY/edit

Jesper Nordenberg said...

Cool. My Russian is practically non-existent though so I can't read the document. :)

migmit said...

I think I'll add a few bits, if you don't mind.

> Subtyping

You claim that Haskell doesn't have subtyping. That's not really true. The type "forall a. a -> a" is certainly a subtype of "Int -> Int", meaning that whenever you encounter something of the second type, you can substitute anything of the first one, and it would typecheck (well, maybe you'd have to specify the type explicitly).

Another example, which is, I think, a bit closer to OOP subtyping, is this:

class B x where ...
class B x => D x where...
type Base = forall x y. B x => (x -> y) -> y
type Descendant = forall x y. D x => (x -> y) -> y

> Typeclasses vs Implicit Parameters

Actually, you can use different instances in Haskell (not Haskell 98, of course), it's just not easy. The code is somewhat lengthy, but the idea is to create a newtype, define an instance for it and then cast everything back and forth using GeneralizedNewtypeDeriving extension.

> In addition, for Java compatibility the dreaded null value can be used for any user defined type (although it's discouraged)

Well, in Haskell you can use undefined wherever you want, and you can even check for it (by forcing it and checking for exception).

> And I don't buy the argument that you don't need a debugger for Haskell programs

Well, it's true whether you believe it or not.

Jesper Nordenberg said...

@migmit:
Yes, you can emulate subtyping to some extent using the pattern you describe. However, when you add multiple super types and deep type hierarchies it becomes very cumbersome to use.

Bottom is indeed a troublesome construct in Haskell. I'm more inclined to have the compiler assume by default that all functions are total (even if it can't prove it) unless annotated as partial (>99% of all functions should be total in a normal program). One could include a more or less advanced totality checker in the compiler which can find cases where totality status is incorrect.

migmit said...
This comment has been removed by the author.
roger said...

I'm curious what problem domains you have been using Scala vs Haskell for. Ie, are you using the right tool for the right job?

roger said...
This comment has been removed by the author.
migmit said...

> I'm curious what problem domains you have been using Scala vs Haskell for. Ie, are you using the right tool for the right job?

Well, for example, one of our programs (in C++) runs a home-grown VM (also written in C++). Sometimes VM crashes, but the entire program doesn't (and we can't have it crashed as well, it's embedded in the system too deeply). So, we made it dump all the VM's memory if it detects a crash.

Now, writing an utility in Haskell to parse this dumps and present them as interlinked web pages was a joy.

Jesper Nordenberg said...

I don't really buy the "right tool for the right job" argument when it comes to programming languages. IMHO there's no reason why a well designed programming language can't be used for any programming job.

Terefang said...

there must be a reason for the grep/sed/awk/perl evolution branch ;-)

roger said...

"Well designed programming language" is a little vague. I could argue that Fortran is a well designed language for what it was meant for. ("that is especially suited to numeric computation and scientific computing.") It is however not well suited for parsing dumps and creating web pages.

Maybe I'm just not good enough of a programmer of Fortran to see the beauty and well suitedness for this problem.

I wrote a recursive program in COBOL once, that doesn't make it well suited for graph traversal.

Jesper Nordenberg said...

I mean I don't see a reason why it isn't possible to create a programming language suitable for any programming task.

geetha said...

Ecorptrainings.com provides JAVA SWING in hyderabad with best faculties on real time projects. We give the best online trainingamong the JAVA SWING in Hyderabad. Classroom Training in Hyderabad India

Don Dikaio said...

really, more india spam... Please remove the comment above.