First steps with Scala — let’s do some Euler problems!

So, I need to quickly get started in Scala, stat! Yes, I need to, although that doesn’t mean that I wouldn’t want to either, the need has just made me want it a little earlier than I had planned. Never mind the reasons why, but I thought this would make an interesting blog post.

A great way to torture oneself into understanding a language is to attempt a few Project Euler problems. So let’s get on with it!

Starting from scratch

I will not delve into explaining Scala in this blog post. All I will say is that I view it as a way to have ones cake and eat it — you get much of the expressivity of Haskell coupled with the JVM and its impressive library base. You can also cheat and loosen up the straitjackets Haskell needs you to wear, which is great but only after you have learned why and when to artificially force yourself back into them when using more permissive languages.

As it turns out, my laptop is pretty much clean of anything Scala-related (besides the JDK), so I need to start from scratch.

What do I need

I assume that Typesafe Activator is the easy choice, but since at the moment I am not worried about Play and Akka, I’m hoping I can download a little bit less that half a gigabyte by instead installing Scala and SBT manually.

Scala is pretty straight-forward to install on Ubuntu:

sudo apt-get install scala

Unsurprisingly (since I’m relying on Debian’s packaging system rather than downloading the software directly), this installs a slightly older version of Scala (2.9.2) than the current bleeding-edge version (2.11.7).

As with most other languages, in order to preserve my sanity I probably need a package manager, but the package manage does not come with the language itself, so I need to install SBT. SBT is not-so-straightforward, though, since the package does not live on the default repos which come pre-configured on Ubuntu:

echo "deb http://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-get update
sudo apt-get install sbt

Next, just for fun, we need to run sbt for the first time so that it can bootstrap itself:

sbt

This downloads a tonne of stuff from the primary maven repo (repo1.maven.org/maven2) as well as from Typesafe. So much for trying to minimise how much data I use on my expensive mobile internet…

My first Scala program!

Let’s solve a few Poject Euler problems.

I have a bit of experience with Project Euler, having solved a few dozen problems in Haskell a while back. One feature of Haskell I grew to love was its lazy-by-default semantics coupled with its lists; two features which, when used together, allow one to simply say something crazy like “Create all the prime numbers in the world, create all the pentagonal numbers, add 1 to all the pentagonal numbers, remove these from the list of primes, and then find the first number in the remaining list which is greater than 1000 000” in simple expressive code, and then wait a few seconds (or dozens of minutes) while your Haskell program figures out the answer.

I have written a few Haskell programs for Project Euler where I had to sit and think for a few minutes about how the hell this thing gave me the right answer as soon as I could get it to typecheck and compile.

I have a feeling things will be a lot different in Scala.

Euler 1

Apparently SBT does not have a subcommand for creating a new skeleton project. That’s not necessarily a bad thing though. So we have to create a quick scala source file in its own directory.

object Prob1 {
  def main (args : Array[String]) {
    println("Um, I'm not sure what the answer is...")
  }
}  

It might be prudential to also create an .sbt file, although it is strictly not necessary to run such a simple program.

lazy val root = (project in file(".")).
  settings(
      name := "Problem 1",
      version := "1.0"
  )                                                           

Nope. I have no idea what half of that means, either.

Now, let’s run our little project. Using sbt run seems to display a bunch of crud before and after running our code, so let’s see if we can run the project using Scala without the help of SBT:

$ scalac prob1.scala
error: error while loading CharSequence, class file '/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar(java/lang/CharSequence.class)' is broken
(bad constant pool tag 18 at byte 10) 
one error found

Oops. Remember what I said above about the fact that apt had installed an old version of Scala? Well, as it just so happens, the version of Scala I installed was too old for the version of SBT I installed. Okay, so I’m gonna have to burn some more moola by uninstalling SBT and Scala and installing Typesafe Activator instead.

Incidentally, this reminds me of how Haskell’s Cabal tool would helpfully suggest that you install the latest version every time it ran, and then when you did your computer would explode because the latest version of Cabal was only compatible with a not-yet-released version of GHC. It’s a completely unrelated issue, but it just reminds me of that tale from the trenches…

Euler 1, second attempt

I promise I’ll be writing some code soon!

Okay, after setting up activator lite, we need to create a new skeleton project (yay!):

$ activator new

I chose the minimal Scala option. The we change into the generated directory and run the generated project, which incidentally bootstraps Activator and downloads who-knows-how-many gigabytes of once-off dependencies:

$ activator run
[info] Loading project definition from /home/thejanet/dev/scala/euler/prob1/project
[info] Set current project to prob1 (in build file:/home/thejanet/dev/scala/euler/prob1/)
[info] Running com.example.Hello 
Hello, world!
[success] Total time: 0 s, completed 08 Jul 2015 2:55:44 PM

Let’s take a look at the shell application that was generated.

package com.example                                                    
object Hello {                                               
  def main(args: Array[String]): Unit = {                         
    println("Hello, world!") 
  }                                                               
} 

Looks okay, but it would have been nice if Activator had not assumed the package name. I have no idea what “: Unit =” means, in case you’re also wondering the same thing…

Hmm… So we run into another minor annoyance: neither Vim nor Emacs (Spacemacs, to be exact) seem to recognise Scala source and enable syntax highlighting by default. Luckily, I happen to have IntelliJ IDEA lying around, so I pulled out the Big Boy (because that’s what everyone uses) and installed the canonical Scala plugin. Then it gobbled up all my RAM and made the rest of the computer (including the command line) unresponsive. This evening is gonna be even more interesting than I thought…

Show me tha codez plz

Euler #1 is pretty basic, and does not require much in the way of fancy features, so I will first attempt to do it like a naïve Java developer getting acquainted with Scala syntax (which is exactly what I am). We simply need to loop from 1 to 1000, and keep an accumulator to which we add all multiples of 3 or 5.

After much faffing about (and IntelliJ warming up the laptop nicely while trying to help me out), I came up with this:

package com.example

object Hello {
  def main(args: Array[String]) {
    var result = 0

    for (x <- 1 to 999){
      if((x % 3 == 0) || (x % 5 == 0)) result += x
    }

    println(result)
  }
}

This produces the correct answer, but it doesn’t feel right. We learn new languages to learn new ways of thinking, not just new syntax, right? So let me try to do this The Functional Way™.

In a purely functional language like Haskell you are almost forced to solve this problem properly, since there are no mutable variables (so no accumulators) and no for-loops (at least, you need to learn a whole lot of the language before you get the tools necessary to create mutable variables and simulate for-loops). So we need to figure out a way to

create a list of all natural numbers → filter based on the condition → filter to the required range → sum

I’m not sure if we can create a list of all the natural numbers in Scala, so we will need to create a range instead, but the other steps should be doable for any language which dares call itself a functional language. In particular, the syntax “a to b” is Scala sugar for creating a range (which is ostensibly a Sequence, so I guess it is more like a lazy list in Haskell than a Java array).

Since filter is a member method of AbstractSeq, and Sequence inherits from AbstractSeq, we put our OOP hats on and start inserting full stops to chain actions on our initial list (with IntelliJ’s autocomplete and suggestions helping out):

package com.example

object Hello {
  def main(args: Array[String]) {
    println( (1 to 999).filter(x => (x % 3 == 0) || (x % 5 == 0)).sum )
  }
}

This is pretty much what one would expect, really. The only things that are new to me are the lambda syntax (=>), and the fact that I didn’t need to put round brackets on the sum method call since they are not mandatory if the method does not have any parameters.

Okay, I lied. Putting round brackets at the end of sum causes Scala to complain.

$ activator run
[info] Loading project definition from /home/thejanet/dev/scala/euler/prob1/project
[info] Set current project to prob1 (in build file:/home/thejanet/dev/scala/euler/prob1/)
[info] Compiling 1 Scala source to /home/thejanet/dev/scala/euler/prob1/target/scala-2.11/classes...
[error] /home/thejanet/dev/scala/euler/prob1/src/main/scala/com/example/Hello.scala:5: not enough arguments for method sum: (implicit num: Numeric[B])B.
[error] Unspecified value parameter num.
[error]     println((1 to 999).filter(x => (x % 3 == 0) || (x % 5 == 0)).sum())
[error]                                                                     ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 3 s, completed 08 Jul 2015 4:04:16 PM

Unspecified value parameter num“?!? Implicit parameters?!? Spooky action at a distance?!? I’d rather not think about that right now.

For completeness sake, here is my solution to the problem in Haskell:

import Data.List

mult35 = union (takeWhile (< 1000) $ iterate (+3) 0)
 (takeWhile (< 1000) $ iterate (+5) 0)

answer = sum . takeWhile (<= 1000) $ mult35 
main = do 
    print answer

Note that this is hardly a canonical answer. Instead of creating a list of natural numbers and then filtering them out, I am actually creating two lists — all multiples of 5 and all multiples of 3 — and then merging the two and truncating. I’m not sure why, but I guess that two years ago I thought it would be a good idea to write it that way (I’m sure the cost of merging two lists while eliminating duplicates — a quadratic time complexity operation — is a lot higher than linearly filtering a list once)?

Here’s a more canonical answer in Haskell:

answer = sum
   . filter (\x -> (x `mod` 5 == 0) || (x `mod` 3 == 0))
   $ [1..999]

main =
    print answer

Note how the code looks almost exactly like the Scala version, except that the operation order is reversed (Haskell uses function composition, which goes from right to left, while OOP-style method invocation goes from left to right). Scala seems like a nice compromise between nouns and verbs.

Euler 2

Euler problem 2 is just interesting enough as to not be trivially doable in a functional programming language (most of Project Euler is not trivially doable in any language — problem 1 was the bait, the rest of the site is the switch). The easy way to do it in Haskell is to generate a list of all the Fibonacci numbers, and then filter on size and parity, and finally sum.

The tricky bit is generating the list of Fibs. In Haskell lists can be coinductive; that is, the definition for the list ξ can refer to ξ itself on the right hand side of the definition, and this is due to fact that lists are lazy by default in Haskell. I still find this behaviour magical to this day.

I’ll have to assume for the moment that Scala can’t do coinductive lists, at least not trivially, based on zero evidence whatsoever.

So, how do you do the Fibonacci numbers when you can’t rely on Haskell magic? Well, the most obvious solution is to just use the standard definition of the Fibonacci numbers, define a function based on it, then just iterate through the function values (or map the function over the list of natural numbers, taking advantage of the fact that a function mapped over the naturals is isomorphic to a natural-indexed array/list). The problem here is that calculating the Fibonacci function for an arbitrary natural number is expensive.

So, let’s over-engineer this a little bit, before even trying to see whether the running time of a naive implementation would be acceptable. Let’s see if we can do memoisation in Scala and take advantage of how easy it is to mutate data (as compared to Haskell).

Well, how does one do memoisation in Scala? A quick Google gave me this: http://stackoverflow.com/questions/16257378/is-there-a-generic-way-to-memoize-in-scala Okay, so maybe lets try to naive way first, because I have no idea what Memo[A,B](f: A => B) extends (A => B) means at this moment 😕.

Using this page as a guide on how to do switch-like pattern matching in Scala, I come up with the following:

package com.example

object Hello {
  def main(args: Array[String]) {
    println(
      Stream.from(1)
        .map(fib)
        .takeWhile(_ <= 4000000)
        .filter(_ % 2 == 0)
        .sum
    )
  }

  def fib(i : Int): Int = i match {
    case 0 => 1
    case 1 => 2
    case _ => fib (i - 1) + fib (i - 2)
  }
}

Here are a few interesting things to note about this code:

  • I used Stream.from to create a lazy infinite list. So Scala does have them after all! In fact, while researching Stream I came upon the following little gem: http://www.scala-lang.org/api/2.11.5/index.html#scala.collection.immutable.Stream Not only is coinduction so possible in Scala (and easy to express), Scala Stream also comes with memoisation built-in! Damn!
  • I’m using anonymous lambda parameters (f(_) instead of x => f(x)). Now we’re cooking with fire!
  • This is not as slow as I thought it would be. It only takes 6 seconds on this laptop (with IDEA the juggernaut slowly eating what little RAM and CPU I still have left). The equivalent Haskell code (using coinduction) takes 4 seconds.

So, luckily this problem was just simple enough that we didn’t need memoisation or BigInt to do it. The rest of the Euler problems dealing with Fibonacci numbers are not so easy.

For completeness, here is the Haskell code I wrote to solve this problem 2 years ago:

fib :: Num a => a -> a -> [a]
fib n1 n2 = n1:n2:zipWith (+) (fib n1 n2) (tail (fib n1 n2))

answer = sum . filter even . takeWhile (<= 4000000) $ fib 1 2
 
main = do
    print answer

My neophyte coding choices are even more bizarre in this code than they were in my original Haskell solution to problem 1. To compare apples to apples, let’s rewrite this to match the Scala code above:

fib 1 = 1
fib 2 = 2
fib n = fib (n - 1) + fib (n - 2)

answer = sum 
    . filter even 
    . takeWhile (<= 4000000) 
    . map fib 
    $ [1..]

main =
    print answer

Apart from the fact that Haskell has a built-in even function (and the bit about the operations being reversed), the semantics and structure of this code resemble those of the Scala code exactly, modulo syntax. This code also runs 10 times faster than the first incarnation, at 0.4s; machine code ftw.

In conclusion

Scala, at least the basics of it, is not that scary. I must say, of all the popular JVM-based languages out there, so far I seem to like it more than the rest (Clojure has those LISP brackets everywhere; and you could probably write a Hebrew poem in a text file and Groovy would still compile it without complaining, only to explode at run-time).

Stuff I have learned includes:

  • Typesafe Activator is simple and straight-forward. Almost like Haskell’s Stack, although the latter has the annoying “feature” that it tends to ignore dependency files used by Haskell’s default dependency manager. And like Stack it pulls literally terrabytes of data to bootstrap itself when you have an absolutely clean computer.
  • Scala has nice syntactic sugar to create ranges (reminds me of Python), and also has a very powerful API for lazy streams. I think I’m falling in love.
  • Scala’s TraversableLike — which is the ancestor of Scala’s list-like data structures — supports the full gamut of functional higher-order functions. This by default makes great classes of algorithms infinitely more expressible. Why should “Do this to every member of the list” have to first be translated to “Create a method, create a new list, and write a for-loop”?
  • Coupled with the point above, Haskell’s elegance and expressivity seems to translate almost mechanically to Scala. This is a Good Thing®.
  • IntelliJ IDEA will make a perfectly good computer feel like a relic from the noughties.
Advertisements
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s