First steps in Scala for beginning programmers, Part 4

Topics: iteration, for expressions, yield, map, filter, count

Preface

This is part 4 of tutorials for first-time programmers getting into Scala. Other posts are on this blog, and you can get links to those and other resources on the links page of the Computational Linguistics course I’m creating these for.

This tutorial departs from the very beginner nature of the previous three, so this may be of more interest to readers who already have some programming experience in another language. (Though also, see the section on using matching in Scala in Part 3.)

Iteration, the Scala way(s)

Up to now, we have (mostly) accessed individual items on a list by using their indices. But one of the most natural things to do with a list is to repeat some action for each item on the list, for example: “For each word in the given list of words: print it”. Here is how to say this in Scala.

[sourcecode language=”scala”]
scala> val animals = List("newt", "armadillo", "cat", "guppy")
animals: List[java.lang.String] = List(newt, armadillo, cat, guppy)

scala> animals.foreach(println)
newt
armadillo
cat
guppy
[/sourcecode]

This says to take each element of the list (indicated by foreach) and apply a function (in this case, println) to it, in order. There is some underspecification going on in that we aren’t providing a variable to name the elements. This works in some cases, such as above, but won’t always be possible. Here is how it looks in full, with a variable naming the element.

[sourcecode language=”scala”]
scala> animals.foreach(animal => println(animal))
newt
armadillo
cat
guppy
[/sourcecode]

This is useful when you need to do a bit more, such as concatenating a String element with another String.

[sourcecode language=”scala”]
scala> animals.foreach(animal => println("She turned me into a " + animal))
She turned me into a newt
She turned me into a armadillo
She turned me into a cat
She turned me into a guppy
[/sourcecode]

It is also useful when you are performing a computation with the element, like outputting the length of each element in a list of strings.

[sourcecode language=”scala”]
scala> animals.foreach(animal => println(animal.length))
4
9
3
5
[/sourcecode]

We can obtain the same result as foreach using a for expression.

[sourcecode language=”scala”]
scala> for (animal <- animals) println(animal.length)
4
9
3
5
[/sourcecode]

With what we have been doing so far, these two ways of iterating over the elements of a List are equivalent. However, they differ in an important way: a for expression can return a value (when combined with yield), whereas foreach simply applies a function to every element of the list and returns nothing useful. This latter kind of use is called a side effect: by printing each element, we are not creating new values, we are just performing an action on each element. With for expressions, we can yield values to create transformed Lists. For example, contrast using println with the following.

[sourcecode language=”scala”]
scala> val lengths = for (animal <- animals) yield animal.length
lengths: List[Int] = List(4, 9, 3, 5)
[/sourcecode]

The result is a new list that contains the lengths (number of characters) of each of the elements of the animals list. (You can of course print its contents now by doing lengths.foreach(println), but typically we want to do other, usually more interesting, things with such values.)
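
To make the contrast concrete, here is a small sketch that stores the result of each form in a variable. The names printed and lengthsAgain are made up for this illustration, and the explicit type annotations are only there to show what each form gives back.

[sourcecode language=”scala”]
val animals = List("newt", "armadillo", "cat", "guppy")

// foreach only performs the action (printing); its result is Unit
val printed: Unit = animals.foreach(animal => println(animal.length))

// a for expression with yield builds a new List from the yielded values
val lengthsAgain: List[Int] = for (animal <- animals) yield animal.length
[/sourcecode]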

What we just did was map the values of animals into a new set of values in a one-to-one manner, using the function length. Lists have another function called map that does this directly.

[sourcecode language=”scala”]
scala> val lengthsMapped = animals.map(animal => animal.length)
lengthsMapped: List[Int] = List(4, 9, 3, 5)
[/sourcecode]

So, the for-yield expression and the map method achieve the same output, and in many cases they are pretty much equivalent. Using map, however, is often more convenient because you can easily chain a series of operations together. For example, let’s say you want to add 1 to a List of numbers and then get the square of that, so turning List(1,2,3) into List(2,3,4) into List(4,9,16). You can do that quite easily using map.

[sourcecode language=”scala”]
val nums = List(1,2,3)
nums.map(x=>x+1).map(x=>x*x)
[/sourcecode]

Some readers will be puzzled by what was just done. Here it is more explicitly, using an intermediate variable nums2 to store the add-one list.

[sourcecode language=”scala”]
scala> val nums2 = nums.map(x=>x+1)
nums2: List[Int] = List(2, 3, 4)

scala> nums2.map(x=>x*x)
res9: List[Int] = List(4, 9, 16)
[/sourcecode]

Since nums.map(x=>x+1) returns a List, we don’t have to assign it to a variable to use it: we can use it immediately, including calling map on it again. (Of course, one could do this computation in a single go, e.g. nums.map(x => (x+1)*(x+1)), but often one is using a series of built-in functions, or functions one has already defined.)
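
As a quick check that the chained and single-step versions agree, here is a minimal sketch. The variable names chained and singleGo are only for illustration.

[sourcecode language=”scala”]
val nums = List(1, 2, 3)

// two passes over the list: add one, then square
val chained = nums.map(x => x + 1).map(x => x * x)

// one pass: do both computations inside a single anonymous function
val singleGo = nums.map(x => (x + 1) * (x + 1))
[/sourcecode]

Both values are List(4, 9, 16). The chained form tends to read better when each step is a function that already exists.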

You can keep on mapping to your heart’s content, including mapping from Ints to Strings.

[sourcecode language=”scala”]
scala> nums.map(x=>x+1).map(x=>x*x).map(x=>x-1).map(x=>x*(-1)).map(x=>"The answer is: " + x)
res12: List[java.lang.String] = List(The answer is: -3, The answer is: -8, The answer is: -15)
[/sourcecode]

Note: the use of x in all these cases is not important. They could have been named x, y, z and turlingdromes42 — any valid variable name.

Iterating through multiple lists

Sometimes you have two lists that are paired up and you need to do something to elements from each list simultaneously. For example, let’s say you have a list of word tokens and another list with their parts-of-speech. (See the previous tutorial for discussion of parts-of-speech.)

[sourcecode language=”scala”]
scala> val tokens = List("the", "program", "halted")
tokens: List[java.lang.String] = List(the, program, halted)

scala> val tags = List("DT","NN","VB")
tags: List[java.lang.String] = List(DT, NN, VB)
[/sourcecode]

Now, let’s say we want to output these as the following string:

the/DT program/NN halted/VB

Initially, we’ll do it a step at a time, and then show how it can be done all in one line.

First, we use the zip function to bring two lists together and get a new list of pairs of elements from each list.

[sourcecode language=”scala”]
scala> val tokenTagPairs = tokens.zip(tags)
tokenTagPairs: List[(java.lang.String, java.lang.String)] = List((the,DT), (program,NN), (halted,VB))
[/sourcecode]

Zipping two lists together in this way is a common pattern used for iterating over two lists.
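
One thing to watch for when zipping: if the lists have different lengths, zip stops at the end of the shorter one without complaining. The sketch below uses a hypothetical shortTags list with one tag missing to show the effect, along with a simple length check you might do first.

[sourcecode language=”scala”]
val tokens = List("the", "program", "halted")
val shortTags = List("DT", "NN")   // hypothetical: the tag for "halted" is missing

// zip stops at the end of the shorter list, so "halted" is silently dropped
val pairs = tokens.zip(shortTags)

// a cheap sanity check to run before zipping
val sameLength = tokens.length == shortTags.length
[/sourcecode]

Here pairs is List((the,DT), (program,NN)) and sameLength is false.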

Now that we have a list of token-tag pairs, we can use a for expression to turn it into a List of strings.

[sourcecode language=”scala”]
scala> val tokenTagSlashStrings = for ((token, tag) <- tokenTagPairs) yield token + "/" + tag
tokenTagSlashStrings: List[java.lang.String] = List(the/DT, program/NN, halted/VB)
[/sourcecode]

Now we just need to turn that list of strings into a single string by concatenating all its elements with a space between each. The function mkString makes this easy.

[sourcecode language=”scala”]
scala> tokenTagSlashStrings.mkString(" ")
res19: String = the/DT program/NN halted/VB
[/sourcecode]

Finally, here it all is in one step.

[sourcecode language=”scala”]
scala> (for ((token, tag) <- tokens.zip(tags)) yield token + "/" + tag).mkString(" ")
res23: String = the/DT program/NN halted/VB
[/sourcecode]
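
For comparison, the same result can be sketched with map instead of a for expression. This version assumes you are comfortable with a case pattern inside curly braces, which unpacks each pair in much the same way that (token, tag) <- does above. The name tagged is only for illustration.

[sourcecode language=”scala”]
val tokens = List("the", "program", "halted")
val tags = List("DT", "NN", "VB")

// the case pattern names the two parts of each pair
val tagged = tokens.zip(tags).map { case (token, tag) => token + "/" + tag }.mkString(" ")
[/sourcecode]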

Ripping a string into a useful data structure

It is common in computational linguistics to need to convert string inputs into useful data structures. Consider the part-of-speech tagged sentence mentioned in the previous tutorial. Let’s begin by assigning it to the variable sentRaw.

[sourcecode language=”scala”]

val sentRaw = "The/DT index/NN of/IN the/DT 100/CD largest/JJS Nasdaq/NNP financial/JJ stocks/NNS rose/VBD modestly/RB as/IN well/RB ./."

[/sourcecode]

Now, let’s turn it into a List of Tuples, where each Tuple has the word as its first element and the postag as its second. We begin with the single line that does this so that you can see what the desired result is, and then we’ll examine each step in detail.

[sourcecode language=”scala”]
scala> val tokenTagPairs = sentRaw.split(" ").toList.map(x => x.split("/")).map(x => Tuple2(x(0), x(1)))
tokenTagPairs: List[(java.lang.String, java.lang.String)] = List((The,DT), (index,NN), (of,IN), (the,DT), (100,CD), (largest,JJS), (Nasdaq,NNP), (financial,JJ), (stocks,NNS), (rose,VBD), (modestly,RB), (as,IN), (well,RB), (.,.))
[/sourcecode]

Let’s take each of these in turn. The first split cuts sentRaw at each space character, and returns an Array of Strings, where each element is the material between the spaces.

[sourcecode language=”scala”]
scala> sentRaw.split(" ")
res0: Array[java.lang.String] = Array(The/DT, index/NN, of/IN, the/DT, 100/CD, largest/JJS, Nasdaq/NNP, financial/JJ, stocks/NNS, rose/VBD, modestly/RB, as/IN, well/RB, ./.)
[/sourcecode]

What’s an Array? It’s a kind of sequence, like List, but it has some different properties that we’ll discuss later. For now, let’s stick with Lists, which we can do by using the toList method. Additionally, let’s assign it to a variable so that the remaining operations are easier to focus on.

[sourcecode language=”scala”]
scala> val tokenTagSlashStrings = sentRaw.split(" ").toList
tokenTagSlashStrings: List[java.lang.String] = List(The/DT, index/NN, of/IN, the/DT, 100/CD, largest/JJS, Nasdaq/NNP, financial/JJ, stocks/NNS, rose/VBD, modestly/RB, as/IN, well/RB, ./.)
[/sourcecode]

Now, we need to turn each of the elements in that list into pairs of token and tag. Let’s first consider a single element, turning something like “The/DT” into the pair (“The”,”DT”). The next lines show how to do this one step at a time, using intermediate variables.

[sourcecode language=”scala”]
scala> val first = "The/DT"
first: java.lang.String = The/DT

scala> val firstSplit = first.split("/")
firstSplit: Array[java.lang.String] = Array(The, DT)

scala> val firstPair = Tuple2(firstSplit(0), firstSplit(1))
firstPair: (java.lang.String, java.lang.String) = (The,DT)
[/sourcecode]

So, firstPair is a tuple representing the information encoded in the string first. This involved two operations, splitting and then creating a tuple from the Array that resulted from the split. We can do this for all of the elements in tokenTagSlashStrings using map. Let’s first convert the Strings into Arrays.

[sourcecode language=”scala”]
scala> val tokenTagArrays = tokenTagSlashStrings.map(x => x.split("/"))
tokenTagArrays: List[Array[java.lang.String]] = List(Array(The, DT), Array(index, NN), Array(of, IN), Array(the, DT), Array(100, CD), Array(largest, JJS), Array(Nasdaq, NNP), Array(financial, JJ), Array(stocks, NNS), Array(rose, VBD), Array(modestly, RB), Array(as, IN), Array(well, RB), Array(., .))
[/sourcecode]

And finally, we turn the Arrays into Tuple2s and get the result we obtained with the one-liner earlier.

[sourcecode language=”scala”]
scala> val tokenTagPairs = tokenTagArrays.map(x => Tuple2(x(0), x(1)))
tokenTagPairs: List[(java.lang.String, java.lang.String)] = List((The,DT), (index,NN), (of,IN), (the,DT), (100,CD), (largest,JJS), (Nasdaq,NNP), (financial,JJ), (stocks,NNS), (rose,VBD), (modestly,RB), (as,IN), (well,RB), (.,.))
[/sourcecode]

Note: if you are comfortable with using one-liners that chain a bunch of operations together, then by all means use them. However, there is no shame in using several lines involving a bunch of intermediate variables if that helps you break apart the task and get the result you need.
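
The steps above assume every element has exactly one slash. Real input is often messier, so here is a hedged sketch of one way to guard against that, using matching on the Array that split returns. The input string messy and the placeholder tag "UNKNOWN" are assumptions made up for this example, not part of the original exercise.

[sourcecode language=”scala”]
val messy = "The/DT index/NN oops of/IN"   // hypothetical input; "oops" has no tag

val pairs = messy.split(" ").toList.map(x => x.split("/") match {
  case Array(token, tag) => (token, tag)     // exactly two pieces: a normal token/tag
  case _                 => (x, "UNKNOWN")   // anything else gets a placeholder tag
})
[/sourcecode]

With this input, the third pair comes out as (oops,UNKNOWN) instead of causing an error when looking up a second Array element that isn’t there.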

One very useful thing about having a List of pairs (Tuple2s) is that the unzip function gives us back two Lists, one with all of the first elements and another with all of the second elements.

[sourcecode language=”scala”]
scala> val (tokens, tags) = tokenTagPairs.unzip
tokens: List[java.lang.String] = List(The, index, of, the, 100, largest, Nasdaq, financial, stocks, rose, modestly, as, well, .)
tags: List[java.lang.String] = List(DT, NN, IN, DT, CD, JJS, NNP, JJ, NNS, VBD, RB, IN, RB, .)
[/sourcecode]

With this, we’ve come full circle. Having started with a raw string (such as we are likely to read in from a text file), we now have Lists that allow us to do useful computations, such as converting those tags into another form.

Providing a function you have defined to map

Let’s return to the postag simplification exercise we did in the previous tutorial. We’ll modify it a bit: rather than shortening the Penn Treebank parts-of-speech, let’s convert them to coarse parts-of-speech using the English words that most people are familiar with, like noun and verb. The following function turns Penn Treebank tags into these coarse tags, for more tags than we covered in the last tutorial (note: this is still incomplete, but serves to illustrate the point).

[sourcecode language=”scala”]
def coursePos (tag: String) = tag match {
case "NN" | "NNS" | "NNP" | "NNPS"                       => "Noun"
case "JJ" | "JJR" | "JJS"                                => "Adjective"
case "VB" | "VBD" | "VBG" | "VBN" | "VBP" | "VBZ" | "MD" => "Verb"
case "RB" | "RBR" | "RBS" | "WRB" | "EX"                 => "Adverb"
case "PRP" | "PRP$" | "WP" | "WP$"                       => "Pronoun"
case "DT" | "PDT" | "WDT"                                => "Article"
case "CC"                                                => "Conjunction"
case "IN" | "TO"                                         => "Preposition"
case _                                                   => "Other"
}
[/sourcecode]

We can now map this function over the parts of speech in the collection obtained previously.

[sourcecode language=”scala”]
scala> tags.map(coursePos)
res1: List[java.lang.String] = List(Article, Noun, Preposition, Article, Other, Adjective, Noun, Adjective, Noun, Verb, Adverb, Preposition, Adverb, Other)
[/sourcecode]

Voila! If we want to convert the tags in this manner and then output them as a string like what we started with, it’s just a few steps. We’ll start from the beginning and recap. Try running the following for yourself.

[sourcecode language=”scala”]
val sentRaw = "The/DT index/NN of/IN the/DT 100/CD largest/JJS Nasdaq/NNP financial/JJ stocks/NNS rose/VBD modestly/RB as/IN well/RB ./."

val (tokens, tags) = sentRaw.split(" ").toList.map(x => x.split("/")).map(x => Tuple2(x(0), x(1))).unzip

tokens.zip(tags.map(coursePos)).map(x => x._1+"/"+x._2).mkString(" ")
[/sourcecode]

A further point is that when you provide expressions like (x => x+1) to map, you are actually defining an anonymous function! Here is the same map operation with different levels of specification.

[sourcecode language=”scala”]

scala> val numbers = (1 to 5).toList
numbers: List[Int] = List(1, 2, 3, 4, 5)

scala> numbers.map(1+)
res11: List[Int] = List(2, 3, 4, 5, 6)

scala> numbers.map(_+1)
res12: List[Int] = List(2, 3, 4, 5, 6)

scala> numbers.map(x=>x+1)
res13: List[Int] = List(2, 3, 4, 5, 6)

scala> numbers.map((x: Int) => x+1)
res14: List[Int] = List(2, 3, 4, 5, 6)
[/sourcecode]

So, it’s all consistent: whether you pass in a named function or an anonymous function, map will apply it to each element in the list.

Finally, note that you can use that final form to define a function.

[sourcecode language=”scala”]

scala> def addOne = (x: Int) => x + 1
addOne: (Int) => Int

scala> addOne(1)
res15: Int = 2
[/sourcecode]

This is similar to defining functions as we had previously (e.g. def addOne (x: Int) = x+1), but it is more convenient in certain contexts, which we’ll get to later. For now, the thing to realize is that whenever you map, you are either using a function that already existed or creating one on the fly.
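
To see both kinds of definition side by side, here is a small sketch. The names addOneMethod and addOneFunction are invented for this example so that the two definitions don’t clash.

[sourcecode language=”scala”]
val numbers = (1 to 5).toList

def addOneMethod(x: Int) = x + 1          // defined the way we did in earlier tutorials
val addOneFunction = (x: Int) => x + 1    // an anonymous function that has been given a name

// all three calls hand map a function to apply to each element
val viaMethod = numbers.map(addOneMethod)
val viaFunction = numbers.map(addOneFunction)
val viaAnonymous = numbers.map(x => x + 1)
[/sourcecode]

All three results are List(2, 3, 4, 5, 6).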

Filtering and counting

The map method is a convenient way of performing computations on each element of a List, effectively transforming a List from one set of values to a new List with a set of values computed from each corresponding element. There are yet more methods that have other actions, such as removing elements from a List (filter), counting the number of elements satisfying a given predicate (count), and computing an aggregate single result from all elements in a List (reduce and fold). Let’s consider a simple task: count how many tokens are not a noun or adjective in a tagged sentence. As a starting point, let’s take the list of mapped postags from before.

[sourcecode language=”scala”]
scala> val courseTags = tags.map(coursePos)
courseTags: List[java.lang.String] = List(Article, Noun, Preposition, Article, Other, Adjective, Noun, Adjective, Noun, Verb, Adverb, Preposition, Adverb, Other)
[/sourcecode]

One way of doing this is to filter out all of the nouns and adjectives to obtain a list without them and then get its length.

[sourcecode language=”scala”]
scala> val noNouns = courseTags.filter(x => x != "Noun")
noNouns: List[java.lang.String] = List(Article, Preposition, Article, Other, Adjective, Adjective, Verb, Adverb, Preposition, Adverb, Other)

scala> val noNounsOrAdjectives = noNouns.filter(x => x != "Adjective")
noNounsOrAdjectives: List[java.lang.String] = List(Article, Preposition, Article, Other, Verb, Adverb, Preposition, Adverb, Other)

scala> noNounsOrAdjectives.length
res8: Int = 9
[/sourcecode]

However, because filter takes a function that returns a Boolean value, we can of course use Boolean conjunction and disjunction to simplify things. We also don’t need to save intermediate variables. Here’s the one-liner.

[sourcecode language=”scala”]
scala> courseTags.filter(x => x != "Noun" && x != "Adjective").length
res9: Int = 9
[/sourcecode]

If all we want is the number of elements, we can instead just use count with the same predicate.

[sourcecode language=”scala”]
scala> courseTags.count(x => x != "Noun" && x != "Adjective")
res10: Int = 9
[/sourcecode]
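
The same count can be reached from the other direction. The sketch below uses filterNot, which keeps the elements for which the predicate is false, and also counts the nouns and adjectives and subtracts. The list is copied from above so the block runs on its own; the names kept, nounsOrAdjectives and others are only for illustration.

[sourcecode language=”scala”]
val courseTags = List("Article", "Noun", "Preposition", "Article", "Other", "Adjective", "Noun", "Adjective", "Noun", "Verb", "Adverb", "Preposition", "Adverb", "Other")

// filterNot keeps the elements for which the predicate is false
val kept = courseTags.filterNot(x => x == "Noun" || x == "Adjective")

// counting the nouns and adjectives and subtracting gives the same number
val nounsOrAdjectives = courseTags.count(x => x == "Noun" || x == "Adjective")
val others = courseTags.length - nounsOrAdjectives
[/sourcecode]

Note how “not a Noun and not an Adjective” becomes “not (a Noun or an Adjective)” when the predicate is flipped.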

As an exercise, try doing a one-liner that starts with sentRaw and provides the value “resX: Int = 9” (where X is whatever you get in your Scala REPL).

In the next tutorial, we’ll see how to use reduce and fold to compute aggregate results from a List.

Copyright 2011 Jason Baldridge

The text of this tutorial is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. Attribution may be provided by linking to www.jasonbaldridge.com and to this original tutorial.

Suggestions, improvements, extensions and bug fixes welcome — please email Jason at jasonbaldridge@gmail.com or provide a comment to this post.

First steps in Scala for beginning programmers, Part 3

Topics: conditional execution with if-else blocks and matching

Preface

This is part 3 of tutorials for first-time programmers getting into Scala. Other posts are on this blog, and you can get links to those and other resources on the links page of the Computational Linguistics course I’m creating these for.

Conditionals

Variables come and variables go, and they take on different values depending on the input. We typically need to enact different behaviors conditioned on those values. For example, let’s simulate a bartender in Austin who must make sure that he doesn’t give alcohol to individuals under 21 years of age.

[sourcecode language=”scala”]
scala> def serveBeer (customerAge: Int) = if (customerAge >= 21) println("beer") else println("water")
serveBeer: (customerAge: Int)Unit

scala> serveBeer(23)
beer

scala> serveBeer(19)
water
[/sourcecode]

What we’ve done here is a standard use of conditionals to produce one action or another — in this case just printing one message or another. The expression in the if (…) is a Boolean value, either true or false. You can see this by just doing the inequality directly:

[sourcecode language=”scala”]
scala> 19 >= 21
res7: Boolean = false
[/sourcecode]

And these expressions can be combined according to the standard rules for conjunction and disjunction of Booleans. Conjunction is indicated with && and disjunction with ||.

[sourcecode language=”scala”]
scala> 19 >= 21 || 5 > 2
res8: Boolean = true

scala> 19 >= 21 && 5 > 2
res9: Boolean = false
[/sourcecode]

To check equality, use ==.

[sourcecode language=”scala”]
scala> 42 == 42
res10: Boolean = true

scala> "the" == "the"
res11: Boolean = true

scala> 3.14 == 6.28
res12: Boolean = false

scala> 2*3.14 == 6.28
res13: Boolean = true

scala> "there" == "the" + "re"
res14: Boolean = true
[/sourcecode]

The equality operator == is different from the assignment operator =, and you’ll get an error if you attempt to use = for equality tests.

[sourcecode language=”scala”]
scala> 5 = 5
<console>:1: error: ';' expected but '=' found.
5 = 5
^

scala> x = 5
<console>:10: error: not found: value x
val synthvar$0 = x
^
<console>:7: error: not found: value x
x = 5
^
[/sourcecode]

The first example is completely bad because we cannot hope to assign a value to a constant like 5. With the latter example, the error complains about not finding a value x. That’s because it is a valid construct, assuming that a var variable x has been previously defined.

[sourcecode language=”scala”]
scala> var x = 0
x: Int = 0

scala> x = 5
x: Int = 5
[/sourcecode]

Recall that with var variables, it is possible to assign them a new value. However, it is actually not necessary to use vars much of the time, and there are many advantages to sticking with vals. I’ll be helping you think in these terms as we go along. For now, try to ignore the fact that vars exist in the language!

Back to conditionals. First, here are more comparison operators:

x == y   (x is equal to y)
x != y    (x does not equal y)
x > y     (x is larger than y)
x < y     (x is less than y)
x >= y   (x is equal to y, or larger than y)
x <= y   (x is equal to y, or less than y)

These operators work on any type that has a natural ordering, including Strings.

[sourcecode language=”scala”]
scala> "armadillo" < "bear"
res25: Boolean = true

scala> "armadillo" < "Bear"
res26: Boolean = false

scala> "Armadillo" < "Bear"
res27: Boolean = true
[/sourcecode]

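Clearly, this isn’t the usual alphabetic ordering you are used to. Instead, it is based on the numeric character codes: in ASCII, every uppercase letter comes before every lowercase letter, so “Bear” sorts before “armadillo”.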
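
If you want a comparison that ignores case, two simple options are sketched below. Both rely on standard String methods (toLowerCase and compareToIgnoreCase); the variable names are only for illustration.

[sourcecode language=”scala”]
// compare lowercased copies so that case no longer matters
val lowercased = "armadillo".toLowerCase < "Bear".toLowerCase

// compareToIgnoreCase gives a negative Int when the first string sorts earlier
val ignoringCase = "armadillo".compareToIgnoreCase("Bear") < 0
[/sourcecode]

Both values are true, matching the ordering you would expect from a dictionary.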

A very beautiful and useful thing about conditionals in Scala is that they return a value. So, the following is a valid way to set the values of the variables x and y.

[sourcecode language=”scala”]
scala> val x = if (true) 1 else 0
x: Int = 1

scala> val y = if (false) 1 else 0
y: Int = 0
[/sourcecode]

Not so impressive here, but let’s return to the bartender, and rather than the serveBeer function printing a String, we can have it return a String representing a beverage, “beer” in the case of a 21+ year old and “water” otherwise.

[sourcecode language=”scala”]
scala> def serveBeer (customerAge: Int) = if (customerAge >= 21) "beer" else "water"
serveBeer: (customerAge: Int)java.lang.String

scala> serveBeer(42)
res21: java.lang.String = beer

scala> serveBeer(20)
res22: java.lang.String = water
[/sourcecode]

Notice how the first serveBeer function returned Unit but this one returns a String. Unit means that no value is returned — in general this is to be discouraged for reasons we won’t get into here. Regardless of that, the general pattern of conditional assignment shown above is something you’ll be using a lot.
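
Here is a small sketch of why returning a value is handy: the result can be stored and used in further computations, which is not possible when the function only prints. The names order and receipt are made up for this example.

[sourcecode language=”scala”]
def serveBeer(customerAge: Int) = if (customerAge >= 21) "beer" else "water"

// because serveBeer returns a String, the result can be stored and built upon
val order = serveBeer(23)
val receipt = "One " + order + ", please."
[/sourcecode]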

Conditionals can also have more than just the single if and else. For example, let’s say that the bartender simply serves age-appropriate drinks to each customer, and that 21+ get beer, teenagers get soda and little kids should get juice.

[sourcecode language=”scala”]
scala> def serveDrink (customerAge: Int) = {
|     if (customerAge >= 21) "beer"
|     else if (customerAge >= 13) "soda"
|     else "juice"
| }
serveDrink: (customerAge: Int)java.lang.String

scala> serveDrink(42)
res35: java.lang.String = beer

scala> serveDrink(16)
res36: java.lang.String = soda

scala> serveDrink(6)
res37: java.lang.String = juice
[/sourcecode]

And of course, the Boolean expressions in any of the ifs or else ifs can be complex conjunctions and disjunctions of smaller expressions. Let’s consider a computational linguistics oriented example now that can take advantage of that, and which we will continue to build on in later tutorials.

Everybody (hopefully) knows what a part-of-speech is. (If not, go check out Grammar Rock on YouTube.) In computational linguistics, we tend to use very detailed tagsets that go far beyond “noun”, “verb”, “adjective” and so on. For example, the tagset from the Penn Treebank uses NN for singular nouns (table), NNS for plural nouns (tables), NNP for singular proper noun (John), and NNPS for plural proper noun (Vikings).

Here’s an annotated sentence with postags from the first sentence of the Wall Street Journal portion of the Penn Treebank, in the format word/postag.

The/DT index/NN of/IN the/DT 100/CD largest/JJS Nasdaq/NNP financial/JJ stocks/NNS rose/VBD modestly/RB as/IN well/RB ./.

We’ll see how to process these en masse shortly, but for now, let’s build a function that turns single tags like “NNP” into “NN” and “JJS” into “JJ”, using conditionals. We’ll let all the other postags stay as they are.

We’ll start with a suboptimal solution, and then refine it. The first thing you might try is to create a case for every full form tag and output its corresponding shortened tag.

[sourcecode language=”scala”]
scala> def shortenPos (tag: String) = {
|     if (tag == "NN") "NN"
|     else if (tag == "NNS") "NN"
|     else if (tag == "NNP") "NN"
|     else if (tag == "NNPS") "NN"
|     else if (tag == "JJ") "JJ"
|     else if (tag == "JJR") "JJ"
|     else if (tag == "JJS") "JJ"
|     else tag
| }
shortenPos: (tag: String)java.lang.String

scala> shortenPos("NNP")
res47: java.lang.String = NN

scala> shortenPos("JJS")
res48: java.lang.String = JJ
[/sourcecode]

So, it’s doing the job, but there is a lot of redundancy — in particular, the return value is the same for many cases. We can use disjunctions to deal with this.

[sourcecode language=”scala”]
def shortenPos2 (tag: String) = {
if (tag == "NN" || tag == "NNS" || tag == "NNP" || tag == "NNPS") "NN"
else if (tag == "JJ" || tag == "JJR" || tag == "JJS") "JJ"
else tag
}
[/sourcecode]

These are logically equivalent.

There is an easier way of doing this, using properties of Strings. Here, the startsWith method is very useful.

[sourcecode language=”scala”]
scala> "NNP".startsWith("NN")
res51: Boolean = true

scala> "NNP".startsWith("VB")
res52: Boolean = false
[/sourcecode]

We can use this to simplify the postag shortening function.

[sourcecode language=”scala”]
def shortenPos3 (tag: String) = {
if (tag.startsWith("NN")) "NN"
else if (tag.startsWith("JJ")) "JJ"
else tag
}
[/sourcecode]

This makes it very easy to add an additional condition that collapses all of the verb tags to “VB”. (Left as an exercise.)

A final note on conditional assignments: they can return anything you like. For example, here is a (very) simple (and very imperfect) English stemmer that returns the stem and suffix as a pair.

[sourcecode language=”scala”]
scala> def splitWord (word: String) = {
|     if (word.endsWith("ing")) (word.slice(0,word.length-3), "ing")
|     else if (word.endsWith("ed")) (word.slice(0,word.length-2), "ed")
|     else if (word.endsWith("er")) (word.slice(0,word.length-2), "er")
|     else if (word.endsWith("s")) (word.slice(0,word.length-1), "s")
|     else (word,"")
| }
splitWord: (word: String)(String, java.lang.String)

scala> splitWord("walked")
res10: (String, java.lang.String) = (walk,ed)

scala> splitWord("walking")
res11: (String, java.lang.String) = (walk,ing)

scala> splitWord("booking")
res12: (String, java.lang.String) = (book,ing)

scala> splitWord("baking")
res13: (String, java.lang.String) = (bak,ing)
[/sourcecode]

If we want to work with the stem and suffix directly as variables, we can assign them straight away.

[sourcecode language=”scala”]
scala> val (stem, suffix) = splitWord("walked")
stem: String = walk
suffix: java.lang.String = ed
[/sourcecode]

Matching

Scala provides another very powerful way to encode conditional execution, called matching. Match expressions have much in common with if-else blocks, but come with some nice extra features. We’ll go back to the postag shortener, starting with a full listing of the tags and what to do in each case, like our first attempt with if-else.

[sourcecode language=”scala”]
def shortenPosMatch (tag: String) = tag match {
case "NN" => "NN"
case "NNS" => "NN"
case "NNP" => "NN"
case "NNPS" => "NN"
case "JJ" => "JJ"
case "JJR" => "JJ"
case "JJS" => "JJ"
case _ => tag
}

scala> shortenPosMatch("JJR")
res14: java.lang.String = JJ
[/sourcecode]

Note that the last case, with the underscore “_” is the default action to take, similar to the “else” at the end of an if-else block.

Compare this to the if-else function shortenPos from before, which had lots of repetition in its definition of the form “else if (tag == “. Match statements allow you to do the same thing, but much more concisely and arguably, much more clearly. Of course, we can shorten this up.

[sourcecode language=”scala”]
def shortenPosMatch2 (tag: String) = tag match {
case "NN" | "NNS" | "NNP" | "NNPS" => "NN"
case "JJ" | "JJR" | "JJS" => "JJ"
case _ => tag
}
[/sourcecode]

This is quite a bit more readable than the if-else shortenPos2 defined earlier.

In addition to readability, match statements provide some logical protection. For example, if you accidentally have two cases that overlap, you’ll get an error.

[sourcecode language=”scala”]

scala> def shortenPosMatchOops (tag: String) = tag match {
|   case "NN" | "NNS" | "NNP" | "NNPS" => "NN"
|   case "JJ" | "JJR" | "JJS" => "JJ"
|   case "NN" => "oops"
|   case _ => tag
| }
<console>:10: error: unreachable code
case "NN" => "oops"
[/sourcecode]

This is an obvious example, but with more complex match options, it can save you from bugs!

We cannot use the startsWith method the same way we did with the if-else shortenPos3. However, we can use regular expressions very nicely with match statements, which we’ll get to in a later tutorial.
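
One related option, not otherwise covered in this tutorial, is a guard: an if condition placed after the pattern in a case. The sketch below is only meant to show the idea. The function name shortenPosGuard is made up for this example, and the variable t simply names whatever value is being matched.

[sourcecode language=”scala”]
def shortenPosGuard(tag: String) = tag match {
  case t if t.startsWith("NN") => "NN"   // the case applies only when the guard is true
  case t if t.startsWith("JJ") => "JJ"
  case _ => tag
}
[/sourcecode]

With this definition, shortenPosGuard("NNPS") gives NN, and a tag like VBD is returned unchanged.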

Where match statements really shine is that they can match on much more than just the value of simple variables like Strings and Ints. One use of matches is to check the type of the input to a function that can take a supertype of many types. Recall that Any is the supertype of all types; if we have a function that takes an argument of any type, we can use matching to inspect the type of the argument and behave differently accordingly.

[sourcecode language=”scala”]
scala> def multitypeMatch (x: Any) = x match {
|    case i: Int => "an Int: " + i*i
|    case d: Double => "a Double: " + d/2
|    case b: Boolean => "a Boolean: " + !b
|    case s: String => "a String: " + s.length
|    case (p1: String, p2: Int) => "a Tuple[String, Int]: " + p2*p2 + p1.length
|    case (p1: Any, p2: Any) => "a Tuple[Any, Any]: (" + p1 + "," + p2 + ")"
|    case _ => "some other type " + x
| }
multitypeMatch: (x: Any)java.lang.String

scala> multitypeMatch(true)
res4: java.lang.String = a Boolean: false

scala> multitypeMatch(3)
res5: java.lang.String = an Int: 9

scala> multitypeMatch((1,3))
res6: java.lang.String = a Tuple[Any, Any]: (1,3)

scala> multitypeMatch(("hi",3))
res7: java.lang.String = a Tuple[String, Int]: 92
[/sourcecode]

So, for example, if it is an Int, we can do things like multiplication, if it is a Boolean we can negate it (with !), and so on. In the case statement, we provide a new variable that will have the type that is matched, and then after the arrow =>, we can use that variable in a type safe manner. Later we’ll see how to create classes (and in particular case classes), where this sort of matching based function is used regularly.

In the meantime, here’s an example of a simple addition function that allows one to enter a String or Int to specify its arguments. For example, the behavior we desire is this:

[sourcecode language=”scala”]
scala> add(1,3)
res4: Int = 4

scala> add("one",3)
res5: Int = 4

scala> add(1,"three")
res6: Int = 4

scala> add("one","three")
res7: Int = 4
[/sourcecode]

Let’s assume that we only handle the spelled-out versions of 1 through 5, and that any string we cannot handle (e.g. “six” and “aardvark”) is considered to be 0. Then the following two functions using matches handle it.

[sourcecode language=”scala”]
def convertToInt (x: String) = x match {
case "one" => 1
case "two" => 2
case "three" => 3
case "four" => 4
case "five" => 5
case _ => 0
}

def add (x: Any, y: Any) = (x,y) match {
case (x: Int, y: Int) => x + y
case (x: String, y: Int) => convertToInt(x) + y
case (x: Int, y: String) => x + convertToInt(y)
case (x: String, y: String) => convertToInt(x) + convertToInt(y)
case _ => 0
}
[/sourcecode]

Like if-else blocks, matches can return whatever type you like, including Tuples, Lists and more.
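
As a small sketch of a match that returns a Tuple, here is a function that gives back both a number and a label for it. The name numberInfo and the labels are made up for this illustration.

[sourcecode language=”scala”]
// numberInfo is a hypothetical example: each case returns an (Int, String) Tuple.
def numberInfo (word: String): (Int, String) = word match {
  case "one" => (1, "odd")
  case "two" => (2, "even")
  case "three" => (3, "odd")
  case _ => (0, "unknown")
}

// The returned Tuple can be taken apart directly in an assignment.
val (value, parity) = numberInfo("two")
// value is 2 and parity is "even"
[/sourcecode]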

Match blocks are used in many other useful contexts that we’ll come to later. In the meantime, it is also worth pointing out that matching is actually used in variable assignment. We’ve seen it already with Tuples, but it can be done with Lists and other types.

[sourcecode language=”scala”]
scala> val (x,y) = (1,2)
x: Int = 1
y: Int = 2

scala> val colors = List("blue","red","yellow")
colors: List[java.lang.String] = List(blue, red, yellow)

scala> val List(color1, color2, color3) = colors
color1: java.lang.String = blue
color2: java.lang.String = red
color3: java.lang.String = yellow
[/sourcecode]

This is especially useful in the case of the args Array that comes from the command line when creating a script with Scala. For example, consider a program that is run as follows.

[sourcecode language=”bash”]
$ scala nextYear.scala John 35
Next year John will be 36 years old.
[/sourcecode]

Here’s how we can do it. (Save the next two lines as nextYear.scala and try it out.)

[sourcecode language=”scala”]
val Array(name, age) = args
println("Next year " + name + " will be " + (age.toInt + 1) + " years old.")
[/sourcecode]

Notice that we had to do age.toInt. That is because age itself is a String, not an Int.
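
One caveat: the assignment above only works when args has exactly two elements. With any other number, the match fails and Scala reports a MatchError. A minimal sketch of a more forgiving version uses a match block with a fallback case. The function name nextYearMessage and the usage text are made up for this illustration, and the arrays below stand in for what args would contain.

[sourcecode language=”scala”]
// nextYearMessage is a hypothetical helper. It still assumes that the
// second argument can be converted to an Int.
def nextYearMessage (args: Array[String]): String = args match {
  case Array(name, age) =>
    "Next year " + name + " will be " + (age.toInt + 1) + " years old."
  case _ =>
    "Usage: scala nextYear.scala <name> <age>"
}

println(nextYearMessage(Array("John", "35")))
// Next year John will be 36 years old.

println(nextYearMessage(Array("John")))
// Usage: scala nextYear.scala <name> <age>
[/sourcecode]

In a script, you would call nextYearMessage(args) instead of passing the arrays by hand.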

Conditional execution with if-else blocks and match blocks is a powerful part of building complex behaviors into your programs that you’ll see and use frequently!

Copyright 2011 Jason Baldridge

The text of this tutorial is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. Attribution may be provided by linking to www.jasonbaldridge.com and to this original tutorial.

Suggestions, improvements, extensions and bug fixes welcome — please email Jason at jasonbaldridge@gmail.com or provide a comment to this post.

First steps in Scala for beginning programmers, Part 2

Topics: Tuples, Lists, methods on Lists and Strings

Preface

This is the second in a planned series of tutorials on programming in Scala for first-time programmers, with specific reference to my Fall 2011 course Introduction to Computational Linguistics. You can see the other tutorials here on this blog; they are also listed on the course’s links page.

This tutorial focuses on Tuples and Lists, which are two constructs for working with groups of elements. You won’t get much done without the latter, and the former are so incredibly useful that you will probably find yourself using them a lot.

Tuples

We saw in the previous tutorial how a single value can be assigned to a variable and then used in various contexts. A Tuple is a generalization of that: a collection of two, three, four, and more values. Each value can have its own type.

[sourcecode language=”scala”]
scala> val twoInts = (3,9)
twoInts: (Int, Int) = (3,9)

scala> val twoStrings = ("hello", "world")
twoStrings: (java.lang.String, java.lang.String) = (hello,world)

scala> val threeDoubles = (3.14, 11.29, 1.5)
threeDoubles: (Double, Double, Double) = (3.14,11.29,1.5)

scala> val intAndString = (7, "lucky number")
intAndString: (Int, java.lang.String) = (7,lucky number)

scala> val mixedUp = (1, "hello", 1.16)
mixedUp: (Int, java.lang.String, Double) = (1,hello,1.16)
[/sourcecode]

The elements of a Tuple can be recovered in a few different ways. One way is to use a Tuple when initializing some variables, each of which takes on the value of the corresponding position in the Tuple on the right side of the equal sign.

[sourcecode language=”scala”]
scala> val (first, second) = twoInts
first: Int = 3
second: Int = 9

scala> val (numTimes, thingToSay, price) = mixedUp
numTimes: Int = 1
thingToSay: java.lang.String = hello
price: Double = 1.16
[/sourcecode]

Scala peels off the values and assigns them to each of the single variables. This becomes very useful in the context of functions that return Tuples. For example, consider a function that provides the left and right edges of a range when you give it the midpoint of the range and the size of the interval on each side of the midpoint.

[sourcecode language=”scala”]
scala> def rangeAround(midpoint: Int, size: Int) = (midpoint - size, midpoint + size)
rangeAround: (midpoint: Int, size: Int)(Int, Int)
[/sourcecode]

Since rangeAround returns a Tuple (specifically, a Pair), we can call it and set variables for the left and right directly from the function call.

[sourcecode language=”scala”]
scala> val (left, right) = rangeAround(21, 3)
left: Int = 18
right: Int = 24
[/sourcecode]

Another way to access the values in a Tuple is via indexation, using “_n” where n is the position of the item you want, counting from 1.

[sourcecode language=”scala”]
scala> print(mixedUp._1)
1
scala> print(mixedUp._2)
hello
scala> print(mixedUp._3)
1.16
[/sourcecode]

The syntax on this is a bit odd, but you’ll get used to it.

Tuples are an amazingly useful feature in a programming language. You’ll see some examples of their utility as we progress.

Lists

Lists are collections of ordered items that will be familiar to anyone who has done any shopping. Tuples are obviously related to lists, but they are less versatile in that they must be created in a single statement, they have a bounded length (at most 22 elements), and they don’t support operations that perform computations on all of their elements.

In Scala, we can create lists of Strings, Ints, and Doubles (and more).

[sourcecode language=”scala”]
scala> val groceryList = List("apples", "milk", "butter")
groceryList: List[java.lang.String] = List(apples, milk, butter)

scala> val odds = List(1,3,5,7,9)
odds: List[Int] = List(1, 3, 5, 7, 9)

scala> val multinomial = List(.2, .4, .15, .25)
multinomial: List[Double] = List(0.2, 0.4, 0.15, 0.25)
[/sourcecode]

We see that Scala responds that a List has been created, along with brackets around the type of the elements it contains. So, List[Int] is read as “a List of Ints” and so on. This is to say that List is a parameterized data structure: it is a container that holds elements of specific types. We’ll see how knowing this allows us to do different things with Lists parameterized by different types.

We can also create Lists with mixtures of types.

[sourcecode language=”scala”]
scala> val intsAndDoubles = List(1, 1.5, 2, 2.5)
intsAndDoubles: List[Double] = List(1.0, 1.5, 2.0, 2.5)

scala> val today = List("August", 23, 2011)
today: List[Any] = List(August, 23, 2011)
[/sourcecode]

Types are sometimes autoconverted, such as converting Ints to Doubles for intsAndDoubles, but often there is no obvious generalizable type. For example, today is a List[Any], which means it is a List of Anys — and Any is the most general type in Scala, the supertype of all types. It’s sort of like saying “Yeah, I have a list of… well, you know… stuff.”

Lists can also contain Lists (and Lists of Lists, and Lists of Lists of Lists…).

[sourcecode language=”scala”]
scala> val embedded = List(List(1,2,3), List(10,30,50), List(200,400), List(1000))
embedded: List[List[Int]] = List(List(1, 2, 3), List(10, 30, 50), List(200, 400), List(1000))
[/sourcecode]

The type of embedded is List[List[Int]], which you can read as “a List of Lists of Ints.”

List methods

Okay, so now that we have some lists, what can we do with them? A lot, actually. One of the most basic properties of a list is its length, which you can get by using “.length” after the variable that refers to the list.

[sourcecode language=”scala”]
scala> groceryList.length
res19: Int = 3

scala> odds.length
res20: Int = 5

scala> embedded.length
res21: Int = 4
[/sourcecode]

Notice that the length of embedded is 4, which is the number of Lists it contains (not the number of elements in those lists).
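
If you do want those other numbers, here is a small sketch. It uses map and flatten, two List methods this tutorial has not introduced yet, so treat it as a preview rather than something you need to understand fully now. The variable names are just for illustration.

[sourcecode language=”scala”]
val embedded = List(List(1,2,3), List(10,30,50), List(200,400), List(1000))

// The number of inner Lists.
val outerLength = embedded.length              // 4

// The length of each inner List.
val innerLengths = embedded.map(_.length)      // List(3, 3, 2, 1)

// flatten joins the inner Lists into one List of Ints, which we then measure.
val totalElements = embedded.flatten.length    // 9
[/sourcecode]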

The notation variable.method indicates that you are invoking a function that is specific to the type of that variable on the value in that variable. Okay, that was a mouthful. Scala is an object-oriented language, which means that every value has a set of actions that comes with it. Which actions are available depends on its type. So, above, we called the length method that is available to Lists on each of the list values given above. You may not have realized it in the previous tutorial, but you were using methods when you added Ints or concatenated Strings; it’s just that Scala allows us to go without “.” and parentheses in certain cases. If we don’t drop them, here’s what it looks like.

[sourcecode language=”scala”]
scala> (2).+(3)
res25: Int = 5

scala> "Portis".+("head")
res26: java.lang.String = Portishead
[/sourcecode]

What is going on is that Ints have a method called “+” and Strings have a different method called “+”. They could have been called “bill” and “bob”, but that would be harder to remember, among other things. Ints have other methods, such as “-” and “/”, that Strings don’t have. (Note: I’m now returning to omitting the “.” and parentheses.)

[sourcecode language=”scala”]
scala> 5-3
res27: Int = 2

scala> "walked" - "ed"
<console>:8: error: value - is not a member of java.lang.String
"walked" - "ed"
[/sourcecode]

Scala complains that we tried to use the “-” method on a String, since Strings don’t have such a method. On the other hand, Ints don’t have a method called length, while Strings do.

[sourcecode language=”scala”]
scala> 5.length
<console>:8: error: value length is not a member of Int
5.length
^

scala> "walked".length
res31: Int = 6
[/sourcecode]

With Strings, length returns the number of characters, whereas with Lists, it is the number of elements. The String length method could have been called “numberOfCharacters”, but “length” is easier to remember and it allows us to treat Strings like other sequences and think of them similarly.

Let’s return to Lists and what we can do with them. “Addition” of two lists is their concatenation and is indicated with “++”.

[sourcecode language=”scala”]
scala> val evens = List(2,4,6,8)
evens: List[Int] = List(2, 4, 6, 8)

scala> val nums = odds ++ evens
nums: List[Int] = List(1, 3, 5, 7, 9, 2, 4, 6, 8)
[/sourcecode]

We can add a single item to the front of a List (that is, prepend it) with “::”.

[sourcecode language=”scala”]
scala> val zeroToNine = 0 :: nums
zeroToNine: List[Int] = List(0, 1, 3, 5, 7, 9, 2, 4, 6, 8)
[/sourcecode]

And sort a list with sorted, and reverse it with reverse, and do both in sequence.

[sourcecode language=”scala”]
scala> zeroToNine.sorted
res42: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> zeroToNine.reverse
res43: List[Int] = List(8, 6, 4, 2, 9, 7, 5, 3, 1, 0)

scala> zeroToNine.sorted.reverse
res44: List[Int] = List(9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
[/sourcecode]

What the last line says is “take zeroToNine, get a new sorted list from it, and then reverse that list.” Notice that calling these functions never changes zeroToNine itself! That is because List is immutable: you cannot change it, so all of these operations return new Lists. This property of Lists brings with it many benefits that we’ll return to later.
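
To convince yourself of this, you can store the sorted result in a new variable and then look at the original again. A minimal sketch (the name inOrder is just for illustration):

[sourcecode language=”scala”]
val zeroToNine = List(0, 1, 3, 5, 7, 9, 2, 4, 6, 8)

// sorted returns a new List and leaves zeroToNine alone.
val inOrder = zeroToNine.sorted
// inOrder is List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
// zeroToNine is still List(0, 1, 3, 5, 7, 9, 2, 4, 6, 8)
[/sourcecode]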

Note: immutability is different from the val/var distinction. It is common to think that a val variable is immutable, but it is not — it is fixed and cannot be reassigned. The following examples all involve immutable Lists, but the fixed variable is a val while the reassignable variable is a var.

[sourcecode language=”scala”]
scala> val fixed = List(1,2)
fixed: List[Int] = List(1, 2)

scala> fixed = List(3,4)
<console>:8: error: reassignment to val
fixed = List(3,4)
^

scala> var reassignable = List(5,6)
reassignable: List[Int] = List(5, 6)

scala> reassignable = List(7,8)
reassignable: List[Int] = List(7, 8)
[/sourcecode]

One of the things one frequently wants to do with a list is access its elements directly. This is done via indexation into the list, starting with 0 for the first element, 1 for the second element, and so on.

[sourcecode language=”scala”]
scala> odds
res48: List[Int] = List(1, 3, 5, 7, 9)

scala> odds(0)
res49: Int = 1

scala> odds(1)
res50: Int = 3
[/sourcecode]

Starting with 0 for the index of the first element is standard practice in computer science. It might seem strange at first, but you’ll get used to it fairly quickly.

We can of course use any Int expression to access an item in a list.

[sourcecode language=”scala”]
scala> zeroToNine(3)
res63: Int = 5

scala> zeroToNine(5-2)
res64: Int = 5

scala> val index = 3
index: Int = 3

scala> zeroToNine(index)
res65: Int = 5
[/sourcecode]

If we ask for an index that is equal to or greater than the number of elements in the list, we get an error.

[sourcecode language=”scala”]
scala> odds(10)
java.lang.IndexOutOfBoundsException: 10
at scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:51)
at scala.collection.immutable.List.apply(List.scala:45)
at .<init>(<console>:9)
at .<clinit>(<console>)
at .<init>(<console>:11)
at .<clinit>(<console>)
at $export(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:592)
at scala.tools.nsc.interpreter.IMain$Request$$anonfun$10.apply(IMain.scala:828)
at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
at scala.tools.nsc.io.package$$anon$2.run(package.scala:31)
at java.lang.Thread.run(Thread.java:680)
[/sourcecode]

Looking at all that, you might be thinking “WTF?” It’s called the stack trace, and it gives you a detailed breakdown of where problems happened in a bit of code. The most useful part is usually the first line, which here says that index 10 is out of bounds. For beginning programmers, the rest is likely to look overwhelming and intimidating. You can safely skim over it for now, but before long you will need to use the stack trace to identify problems in your code and address them.
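
If you are not sure whether an index is valid, you can check it against the length of the list first. Here is a minimal sketch using an if-else expression (covered in the next tutorial); the variable names are just for illustration.

[sourcecode language=”scala”]
val odds = List(1, 3, 5, 7, 9)
val index = 10

// Valid indices run from 0 up to odds.length - 1.
val message =
  if (index >= 0 && index < odds.length) "Found " + odds(index)
  else "No element at index " + index

println(message)
// No element at index 10
[/sourcecode]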

Another useful method is slice, which gives you a sublist from one index up to, but not including, another.

[sourcecode language=”scala”]
scala> zeroToNine
res55: List[Int] = List(0, 1, 3, 5, 7, 9, 2, 4, 6, 8)

scala> zeroToNine.slice(2,6)
res56: List[Int] = List(3, 5, 7, 9)
[/sourcecode]

So, the slice gave us a list with the elements from index 2 (the third element) up to index 5 (the sixth element).
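
The number of elements you get back is the second index minus the first (here, 6 - 2 = 4). As a small sketch, here is the same slice next to two related List methods, take and drop, which this tutorial does not otherwise cover. The variable names are just for illustration.

[sourcecode language=”scala”]
val zeroToNine = List(0, 1, 3, 5, 7, 9, 2, 4, 6, 8)

val middle = zeroToNine.slice(2, 6)          // List(3, 5, 7, 9)

// Dropping the first 2 elements and then taking 4 gives the same result.
val sameMiddle = zeroToNine.drop(2).take(4)  // List(3, 5, 7, 9)

// take keeps only the first n elements.
val firstThree = zeroToNine.take(3)          // List(0, 1, 3)

// Unlike direct indexing, slice does not complain if the end is too large.
val pastTheEnd = zeroToNine.slice(8, 20)     // List(6, 8)
[/sourcecode]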

Returning briefly to Strings — other List methods than length work with them too.

[sourcecode language=”scala”]
scala> val artist = "DJ Shadow"
artist: java.lang.String = DJ Shadow

scala> artist(3)
res0: Char = S

scala> artist.slice(3,6)
res1: String = Sha

scala> artist.reverse
res2: String = wodahS JD

scala> artist.sorted
res3: String = " DJSadhow"
[/sourcecode]

On lists that contain numbers, we can use the sum method.

[sourcecode language=”scala”]
scala> odds.sum
res59: Int = 25

scala> multinomial.sum
res60: Double = 1.0
[/sourcecode]

However, if the list contains non-numeric values, sum isn’t valid.

[sourcecode language=”scala”]
scala> groceryList.sum
<console>:9: error: could not find implicit value for parameter num: Numeric[java.lang.String]
groceryList.sum
^
[/sourcecode]

What is going on is some very cool and useful automagical behavior by Scala involving implicits. We’ll come back to that later, but for now you can happily use sum on Lists of Ints and Doubles.

One thing we often want to do with lists is obtain a String representation of their contents in some visually useful way. For example, we might want a grocery list to be a String with one item per line, or a list of Ints to have a comma between each element. The mkString method does just what we need.

[sourcecode language=”scala”]
scala> groceryList.mkString("\n")
res22: String =
apples
milk
butter

scala> odds.mkString(",")
res23: String = 1,3,5,7,9
[/sourcecode]
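
mkString also comes in a three-argument form that takes a start String, a separator, and an end String, and in a no-argument form that uses no separator at all. A small sketch (the variable names are just for illustration):

[sourcecode language=”scala”]
val odds = List(1, 3, 5, 7, 9)

val plain = odds.mkString(", ")               // 1, 3, 5, 7, 9
val bracketed = odds.mkString("[", ", ", "]") // [1, 3, 5, 7, 9]
val noSeparator = odds.mkString               // 13579
[/sourcecode]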

Want to know if a list contains a particular element? Use contains on the list.

[sourcecode language=”scala”]
scala> groceryList.contains("milk")
res4: Boolean = true

scala> groceryList.contains("coffee")
res5: Boolean = false
[/sourcecode]

And now we arrive at Booleans, another of the most important basic types. They play a major role in conditional execution, which we’ll cover in the next tutorial.

There are actually many more methods available for lists, which you can see by going to the entry for List in the Scala API. API stands for Application Programming Interface — in other words a collection of specifications for what you can do with various components of the Scala programming language. I’m going to do my best to give you the methods you need for now, but eventually you will need to be able to look at the API entries for Scala types to see what methods are available, what they do and how to use them.

Some of the most important methods on Lists we haven’t covered are map, filter, foldLeft, and reduce. We’ll come back to them in detail later, but for now here is a teaser that should give you an intuitive sense of what they do.

[sourcecode language=”scala”]
scala> val odds = List(1,3,5,7,9)
odds: List[Int] = List(1, 3, 5, 7, 9)

scala> odds.map(1+)
res6: List[Int] = List(2, 4, 6, 8, 10)

scala> odds.filter(4<)
res7: List[Int] = List(5, 7, 9)

scala> odds.foldLeft(10)(_ + _)
res8: Int = 35

scala> odds.filter(6>).map(_.toString).reduce(_ + "," + _)
res9: java.lang.String = 1,3,5
[/sourcecode]
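
The shorthand forms like 1+ and 4< are compact, but they hide what is being passed to each method, and they may not be accepted without extra settings in later Scala versions (treat that as an assumption to check against your own version). Here is a sketch of the same four computations with every argument named explicitly; the variable names are just for illustration.

[sourcecode language=”scala”]
val odds = List(1, 3, 5, 7, 9)

// Add 1 to each element.
val plusOne = odds.map(x => 1 + x)                  // List(2, 4, 6, 8, 10)

// Keep only the elements that 4 is less than.
val aboveFour = odds.filter(x => 4 < x)             // List(5, 7, 9)

// Start from 10 and add each element in turn.
val total = odds.foldLeft(10)((acc, x) => acc + x)  // 35

// Keep elements below 6, turn them into Strings, and join them with commas.
val joined = odds.filter(x => 6 > x).map(x => x.toString).reduce((a, b) => a + "," + b)
// joined is the String 1,3,5
[/sourcecode]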

Now we’re getting functional. 🙂

First steps in Scala for first time programmers, Part 1

Topics: the Scala REPL, expressions, variables, basic types, simple functions, saving and running programs, comments.

Preface

This is the first of several Scala tutorials I’m creating for my Fall 2011 graduate Introduction to Computational Linguistics course at UT Austin, loosely based on similar tutorials that Katrin Erk created for teaching Python in a similar course. These tutorials assume no previous programming background, an assumption which is unfortunately still quite rare in the help-folks-learn-Scala universe, and which more or less necessitates the creation of these tutorials. The one exception I’m aware of is SimplyScala.

Note: if you already know a programming language, this tutorial will probably not be very useful for you. (Though perhaps some of the later ones on functional programming and such that I intend to do will be, so check back.) In the meantime, check out existing Scala learning materials I’ve listed in the links page for the course.

This tutorial assumes you have Scala installed and that you are using some form of Unix (if you use Windows, you’ll want to look into Cygwin). If you are having problems with this, you might try the examples by evaluating them in the code box on SimplyScala.

A (partial) starter tour of Scala expressions, variables, and basic types

We’ll use the Scala REPL for entering Scala expressions and seeing what the result of evaluating them is. REPL stands for read-eval(uate)-print-loop, which means it is a program that (1) reads the expressions you type in, (2) evaluates them using the Scala compiler, (3) prints out the result of the evaluation, and then (4) waits for you to enter further expressions.

Note: it is very important that you actually type the commands given below into the REPL. If you just read them over, it will in many cases look quite obvious (and it is), but someone who is new to programming will generally find many gaps in their understanding by actually trying things out. In particular, programming languages are very exact, so they’ll do exactly what you tell them to do — and you’ll almost surely mess a few things up, and learn from that.

In a Unix shell, type:

[sourcecode lang=”bash”]

$ scala

[/sourcecode]

You should see something like the following:

[sourcecode lang=”bash”]

Welcome to Scala version 2.9.0.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_26).
Type in expressions to have them evaluated.
Type :help for more information.

scala>

[/sourcecode]

The scala> line is the prompt, where the REPL waits for you to enter expressions. Let’s enter some.

[sourcecode lang=”scala”]

scala> "Hello world"
res0: java.lang.String = Hello world

scala> 2
res1: Int = 2

[/sourcecode]

The REPL tells us that the first is a String which contains the characters Hello world, and that the second is an Int whose value is the integer 2. Strings and Ints are types; the distinction between a value and its type is easy but important, and it sometimes takes beginning programmers a while to get used to. Types let Scala know what to do when you want to use the numerical value 2 (an Int) or the character representing 2 (a String). For example, here is the latter.

[sourcecode lang=”scala”]

scala> "2"
res3: java.lang.String = 2

[/sourcecode]

Scala knows that different actions are afforded by different types. For example, Ints can be added to each other.

[sourcecode lang=”scala”]

scala> 2+3
res4: Int = 5

[/sourcecode]

Scala evaluates the result and prints the result to the screen. No surprise, the result is what you think it should be. With Strings, addition doesn’t make sense, but the + operator instead indicates concatenation of the strings.

[sourcecode lang=”scala”]

scala> "Portis" + "head"
res5: java.lang.String = Portishead

[/sourcecode]

So, if you consider the String “2” and the String “3”, and use the + operator on them, you don’t get “5” — instead you get “23”.

[sourcecode lang=”scala”]

scala> "2" + "3"
res6: java.lang.String = 23

[/sourcecode]
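
If what you actually have is a String of digits and you want to do arithmetic with it, you can convert it first. Here is a minimal sketch using the toInt and toString methods; the variable names are just for illustration.

[sourcecode lang=”scala”]

// Concatenation: both operands are Strings.
val concatenated = "2" + "3"             // the String 23

// Addition: each String is converted to an Int first.
val added = "2".toInt + "3".toInt        // the Int 5

// Going the other way, an Int can be turned back into a String.
val backToString = added.toString + "!"  // the String 5!

[/sourcecode]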

We can ask Scala to display the result of evaluating a given expression by using the print command.

[sourcecode lang=”scala”]

scala> print ("Hello world")
Hello world
scala> print (2)
2
scala> print (2 + 3)
5

[/sourcecode]

Note that the result is the action of printing, not a value with a type. For the last item, what happens is:

  1. Scala evaluates 2 + 3, which is 5.
  2. Scala passes that value to the command print.
  3. print outputs “5”

You can think of the print command as a verb, and its parameter (e.g. Hello world or 2) as its object.

We often need to store the result of evaluating an expression to a variable for later use (in fact, programming doesn’t get done without doing this). Let’s do this trivially to start with, breaking down the print statement above into two steps.

[sourcecode lang=”scala”]

scala> val x = 2 + 3
x: Int = 5

scala> print (x)
5

[/sourcecode]

Here, x is a variable, which we’ve indicated by prefacing it with val. The val keyword marks it as a fixed variable whose value cannot change.

You can choose the names for variables, but they must follow some rules.

  • Variable names may contain: letters, numbers, underscore
  • They must not start with a number
  • They must not be identical to one of the “reserved words”  that Scala has already defined, such as for, if, def, val, and var

Typical names for variables will be strings like x, y1, result, firstName, here_is_a_long_variable_name.

Returning to the variable x, we can now use it for other computations.

[sourcecode lang=”scala”]

scala> x + 3
res12: Int = 8

scala> x * x
res13: Int = 25

[/sourcecode]

And of course, we can assign the results of such computations to other variables.

[sourcecode lang=”scala”]

scala> val y = 3 * x + 2
y: Int = 17

[/sourcecode]

Notice that Scala knows that multiplication takes precedence over addition in such computations. If we wanted to override that, we’d need to use parentheses to indicate it, just like in basic algebra.

[sourcecode lang=”scala”]

scala> val z = 3 * (x + 2)
z: Int = 21

[/sourcecode]

Now, let’s introduce another type, Double, for working with real-valued numbers. The Ints considered thus far are whole numbers, which do fine with multiplication, but will lead to behavior you might not expect when used with division.

[sourcecode lang=”scala”]

scala> 5 * 2
res12: Int = 10

scala> 7/2
res13: Int = 3

[/sourcecode]

You probably expected to get 3.5 for the latter. However, because both 7 and 2 are Ints, Scala returns an Int — specifically it returns the number of times the denominator can go entirely into the numerator. To get the result you’d normally want here, you need to use Doubles such as the following.

[sourcecode lang=”scala”]

scala> 7.0/2.0
res14: Double = 3.5

[/sourcecode]

Now the result is the value you’d expect, and it is of type Double. Scala uses conventions to know the type of values, e.g. it knows that things in quotes are Strings, that numbers that have a “.” in them are Doubles, and that numbers without “.” are Ints. This is an important part of how Scala infers the types of variables, which is a very useful and fairly unusual property among languages of its kind (which are called statically typed languages). To see this in a bit more detail, note that you can tell Scala explicitly what a variable’s type is.

[sourcecode lang=”scala”]

scala> val a: Int  = 2
a: Int = 2

[/sourcecode]

The a: Int portion of the line indicates that the variable a has the type Int. Here are some examples of other variables with different types.

[sourcecode lang=”scala”]

scala> val b: Double = 3.14
b: Double = 3.14

scala> val c: String = "Hello world"
c: String = Hello world

[/sourcecode]

Because Scala already knows these types, it is redundant to specify them in these cases. However, when expressions are more complicated, it is at times necessary to specify types explicitly.

Importantly, we cannot assign the variable a type that conflicts with the result of the expression. Here, we try to assign a Double value to a variable of type Int, and Scala reports an error.

[sourcecode lang=”scala”]

scala> val d: Int = 6.28
<console>:7: error: type mismatch;
found   : Double(6.28)
required: Int
val d: Int = 6.28
^
[/sourcecode]

In many cases, especially with beginning programming, you won’t have to worry about declaring the types of your variables. We’ll see situations where it is necessary as we progress.

In addition to variables declared with val, Scala allows variables to be declared with var; these variables can have their values reassigned. A few examples are the easiest way to see the difference.

[sourcecode lang=”scala”]

scala> val a = 1
a: Int = 1

scala> a = 2
<console>:8: error: reassignment to val
a = 2
^

scala> var b = 5
b: Int = 5

scala> b = 6
b: Int = 6

[/sourcecode]

You can think of a val variable as a sealed glass container into which you can look to see its value, but into which you cannot put anything new, and a var variable as an openable container that allows you both to see the value and to swap a new value in for the old one. We’re going to focus on using vals mostly as they ultimately provide many advantages when combined with functional programming, and because I hope to get you thinking in terms of vals rather than vars while you are starting out.

Functions

Variables are more useful when used in the context of functions in which a variable like x can be injected with different values by the user of a function. Let’s consider converting degrees Fahrenheit to Celsius. To convert 87, 92, and 100 from Fahrenheit to Celsius, we could do the following.

[sourcecode lang=”scala”]

scala> (87 - 32) * 5 / 9.0
res15: Double = 30.555555555555557

scala> (92 - 32) * 5 / 9.0
res16: Double = 33.333333333333336

scala> (100 - 32) * 5 / 9.0
res17: Double = 37.77777777777778

[/sourcecode]

Obviously, there is a lot of repetition here. Functions allow us to specify the common parts of such calculations, while allowing variables to specify the parts that may be different. In the conversion case, the only thing that changes is the temperature reading in Fahrenheit. Here’s how we declare the appropriate function in Scala.

[sourcecode lang=”scala”]

scala> def f2c (x: Double) = (x - 32) * 5/9.0
f2c: (x: Double)Double

[/sourcecode]

Breaking this down we have:

  • def is a Scala keyword indicating that a function is being defined
  • f2c is the name given to the function
  • (x: Double) is the parameter to the function, which is a variable named x of type Double
  • (x - 32) * 5/9.0 is the body of the function, which will take the value given by the user of the function, subtract 32 from it and then multiply the result of that by five-ninths

Using the function is easy — give the name of the function, and then provide the value you are passing into the function in parentheses.

[sourcecode lang=”scala”]

scala> f2c(87)
res18: Double = 30.555555555555557

scala> f2c(92)
res19: Double = 33.333333333333336

scala> f2c(100)
res20: Double = 37.77777777777778

[/sourcecode]

And so on. For each call, the function evaluates the expression for x equal  to the value passed into the function. Now we don’t have to retype all the common stuff again and again.
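
As a further sketch, here is the conversion in the other direction. The function c2f is not part of the original example; I have also written the result type (the : Double after the parameter list) explicitly, which is optional here because Scala can infer it.

[sourcecode lang=”scala”]

// Fahrenheit to Celsius, as defined above, with the result type spelled out.
def f2c (x: Double): Double = (x - 32) * 5 / 9.0

// c2f is a hypothetical companion function: Celsius to Fahrenheit.
def c2f (x: Double): Double = x * 9 / 5.0 + 32

val boiling = c2f(100)   // 212.0
val freezing = f2c(32)   // 0.0

[/sourcecode]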

Functions can have multiple arguments. For example, the following is a function which takes two integers, squares each of them and then adds the squared values.

[sourcecode lang=”scala”]

scala> def squareThenAdd (x: Int, y: Int) = x*x + y*y
squareThenAdd: (x: Int, y: Int)Int

scala> squareThenAdd(3,4)
res21: Int = 25

[/sourcecode]

Which indeed is the same as doing it explicitly.

[sourcecode lang=”scala”]

scala> 3*3 + 4*4
res22: Int = 25

[/sourcecode]

An important aspect of functions is that all of the variables must be bound. If not, we get an error.

[sourcecode lang=”scala”]

scala> def badFunctionWithUnboundVariable (x: Int) = x + y
<console>:8: error: not found: value y
def badFunctionWithUnboundVariable (x: Int) = x + y

[/sourcecode]
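
There are two simple ways to fix a function like this, sketched below: make the missing variable a parameter, or define it before the function that uses it. The names addBoth, addOffset and offset are made up for this illustration.

[sourcecode lang=”scala”]

// Option 1: make y a second parameter of the function.
def addBoth (x: Int, y: Int) = x + y

// Option 2: define the variable before the function that uses it.
val offset = 10
def addOffset (x: Int) = x + offset

val sumOfBoth = addBoth(2, 3)     // 5
val withOffset = addOffset(2)     // 12

[/sourcecode]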

Functions can do much more complex and interesting things than what I’ve shown here, which we’ll get to in another tutorial.

Editing programs in a text editor and running them on the command line

The REPL is very useful for trying out Scala expressions and seeing how they are evaluated in real time, but actual program development is done by writing a text file that contains a series of expressions that perform interesting behaviors. Doing this is straightforward. Open a text editor (see the course links page for some suggestions), and put the following as the first line, with nothing else.

[sourcecode lang=”scala”]

print ("Hello world")

[/sourcecode]

Save this file as HelloWorld.scala, making sure it is saved as text only. Then in a Unix shell, go to the directory where that file is saved, and type the following.

[sourcecode lang=”bash”]

$ scala HelloWorld.scala

[/sourcecode]

You’ll see that Hello world is output, but that the Unix prompt is jammed up right after it. You may have expected it to print out and then leave the Unix prompt on the next line; however, there is nothing in the print command or in the string we asked it to print that indicates that a newline should have been used. To fix this, go back to the editor and change the line to be the following.

[sourcecode lang=”scala”]

print ("Hello world\n")

[/sourcecode]

When you run this, your Unix prompt appears on the line following Hello world. Sequences like ‘\n’ are escape sequences (sometimes called metacharacters): they stand for characters, such as the newline, that are not ordinary letters, numbers and symbols.

Now, you could also have achieved the same result by writing the following.

[sourcecode lang=”scala”]

println ("Hello world")

[/sourcecode]

The functions print and println are the same except that the latter always adds a newline at the end of its output — something that is often desired and thus simplifies the programmer’s life. However, we still often need to use the newline character and other characters when outputting strings. For example, put the following into HelloWorld.scala and run it again.

[sourcecode lang=”scala”]

println("Hello world\nHere is a list:\n\t1\n\t2\n\t3")

[/sourcecode]

From the output, it should be quite clear what ‘\t’ means: it inserts a tab. Notice that it wasn’t necessary to put ‘\n’ after the final 3 because println was used instead of print.
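
One point worth checking for yourself is that an escape sequence such as ‘\n’ is typed as two keystrokes but counts as a single character in the string. The sketch below uses made-up names (greeting, table) and the standard length method to show this.

[sourcecode lang=”scala”]

// "Hello world" has 11 characters; the newline adds exactly one more.
val greeting = "Hello world\n"
print(greeting)
println(greeting.length)   // prints 12

// Tabs and newlines together give a simple two-column layout.
val table = "animal\tlegs\ncat\t4\nguppy\t0"
println(table)

[/sourcecode]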

This is a trivial program, but real programs tend to get quite complex. This is where code comments come in handy. You can mark the rest of a line as a comment, to be ignored by Scala, by using two forward slashes. Comments can be used to record who the author of a program is and what its license is, to document what various parts of the code are doing (for others and your future self), and to comment out lines of code that you don’t want to erase but want temporarily inactive.

Here’s a slightly lengthier program with comments and function definitions and uses of those functions along with printing.

[sourcecode lang=”scala”]

// Author: Jason Baldridge (jasonbaldridge@gmail.com)

// This is a trivial program for students learning to program with Scala.

// This is a comment. The next line defines a function that squares
// its argument.
def sq (x: Int) = x * x

// The next line prints the result of calling sq with the argument 3.
println("3 squared = " + sq(3))

// The next line is commented out, so even though it is a valid Scala
// expression, it won’t be evaluated by Scala.
// println("4 squared = " + sq(4))

// Now, we define a function that uses the previously defined sq
// (rather than using x*x and y*y as before).
def squareThenAdd (x: Int, y: Int) = sq(x) + sq(y)

// Now we use it.
println("Squaring 3 and 4 and adding the results = "
+ squareThenAdd(3,4))

[/sourcecode]

Save this as ScalaFirstStepsPart1.scala and run it with the Scala executable. You should see the following results.

[sourcecode lang=”bash”]

$ scala ScalaFirstStepsPart1.scala
3 squared = 9
Squaring 3 and 4 and adding the results = 25

[/sourcecode]

Looks good, right? But what is going on with those print statements? We saw earlier that 2+3 evaluates to 5, but that “2”+“3” evaluates to “23”, and here we have used + on a String and an Int. Shouldn’t that result in an error? What Scala is doing is converting the Int into a String automatically for us, which simplifies the outputting of results considerably. That means we can do things like the following (back to using the REPL).

[sourcecode lang=”scala”]

scala> println("August " + 22 + ", " + 2011)
August 22, 2011

[/sourcecode]

That seems a bit pointless because we could have just written “August 22, 2011”, but here’s an example where it is a bit more useful: we can name tomorrow’s day by using an Int for today’s and adding one to it.

[sourcecode lang=”scala”]

scala> val dayOfTheMonthToday = 22
dayOfTheMonthToday: Int = 22

scala> println("Today is August " + dayOfTheMonthToday + " and tomorrow is August " + (dayOfTheMonthToday+1))
Today is August 22 and tomorrow is August 23

[/sourcecode]

Note that the (dayOfTheMonthToday+1) part is actually Int addition, and the result of that is converted to a String that is concatenated with the rest of the string. This example is still fairly contrived (and obviously doesn’t deal with the end of the month and all), but this autoconversion gets used a lot when you start working with more complex programs. And, perhaps even more obviously, we might reasonably want to add an Int and a Double.

[sourcecode lang=”scala”]

scala> 2 + 3.0
res27: Double = 5.0

[/sourcecode]

Here, the result is a Double, since Double is a more general type than Int. This kind of autoconversion happens a lot, and often you won’t even realize it is going on.
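
Division is where this conversion is easiest to notice. In the minimal sketch below (count, halfAsInt and halfAsDouble are names invented for the example), two Ints divide to an Int, while making either operand a Double converts the other one first.

[sourcecode lang=”scala”]

val count = 7

// Both operands are Ints, so this is Int division and the remainder is dropped.
val halfAsInt = count / 2

// One operand is a Double, so count is converted to 7.0 before dividing.
val halfAsDouble = count / 2.0

println(halfAsInt)      // prints 3
println(halfAsDouble)   // prints 3.5

[/sourcecode]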

Another thing to note is that the last print statement went over multiple lines. We’ll be seeing more about what the rules are for statements that run over multiple lines, but this example shows perhaps the easiest one to remember: when you open a parenthesis with “(“, you can keep on going multiple lines until its partner “)” is encountered. So, for example, we could have done the following very spread-out statement.

[sourcecode lang=”scala”]

println(
"Squaring 3 and 4 and adding the results = "

+

squareThenAdd(3,4)
)

[/sourcecode]

As well as being used for boxing in a bunch of code that is an argument to a function, as above, parentheses are quite useful for indicating the order of precedence of combining multiple items, such as indicating that an addition should be done before a multiplication, or that an Int addition should be done before a String concatenation, as shown earlier in the tutorial. Basically, parentheses are often optional, but if you can’t remember what the default rules for expressions being grouped together are, then you can group them explicitly with parentheses.
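
Both kinds of grouping can be seen side by side in a short sketch. The numbers and the “Sum: ” label are arbitrary choices for illustration.

[sourcecode lang=”scala”]

// Multiplication is done before addition unless parentheses say otherwise.
println(2 + 3 * 4)      // prints 14
println((2 + 3) * 4)    // prints 20

// Without parentheses, + works left to right: the String takes 2, then 3.
println("Sum: " + 2 + 3)     // prints Sum: 23

// With parentheses, the Int addition happens first.
println("Sum: " + (2 + 3))   // prints Sum: 5

[/sourcecode]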

Copyright 2011 Jason Baldridge

The text of this tutorial is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. Attribution may be provided by linking to www.jasonbaldridge.com and to this original tutorial.

Suggestions, improvements, extensions and bug fixes welcome — please email Jason at jasonbaldridge@gmail.com or provide a comment to this post.

Fun with function composition in Scala

This is my first post with content, and is what motivated me to start this blog: a simple little code bite that I thought might be useful for others. And, since it is about composing functions, it helped me come up with the name Bcomposes.

So, the goal of this post is to show how a list of functions can be composed to create a single function, in the context of mapping a set of values using those functions. It’s a cute example that shows off some of the goodness that comes with functional programming in Scala. And, while this isn’t a tutorial, it might still be useful for people who are just getting into functional programming.

We’ll start with the list of numbers 1 to 5 and some simple functions: one for adding 1, another for squaring, and a third for adding 100.

[sourcecode lang=”scala”]
scala> val foo = (1 to 5).toList
foo: List[Int] = List(1, 2, 3, 4, 5)

scala> val add1 = (x: Int) => x + 1
add1: (Int) => Int = <function1>

scala> val add100 = (x: Int) => x + 100
add100: (Int) => Int = <function1>

scala> val sq = (x: Int) => x * x
sq: (Int) => Int = <function1>
[/sourcecode]

We can then apply any of these functions to each element in the list foo by using the map function.

[sourcecode lang=”scala”]
scala> foo map add1
res0: List[Int] = List(2, 3, 4, 5, 6)

scala> foo map add100
res1: List[Int] = List(101, 102, 103, 104, 105)

scala> foo map sq
res2: List[Int] = List(1, 4, 9, 16, 25)
[/sourcecode]

We can save the results of mapping all the values through add1, and then map the resulting list through sq.

[sourcecode lang=”scala”]
scala> val bar = foo map add1
bar: List[Int] = List(2, 3, 4, 5, 6)

scala> bar map sq
res3: List[Int] = List(4, 9, 16, 25, 36)

[/sourcecode]

Or, if we don’t care about the intermediate result, we can just keep on mapping, through both functions.

[sourcecode lang=”scala”]
scala> foo map add1 map sq
res4: List[Int] = List(4, 9, 16, 25, 36)
[/sourcecode]

What we just did, above, was sq(add1(x)) for every x in the list foo. We could have instead composed the two functions, since sq(add1(x)) = (sq ∘ add1)(x). Here’s what it looks like in Scala:

[sourcecode lang=”scala”]
scala> val sqComposeAdd1 = sq compose add1
sqComposeAdd1: (Int) => Int = <function1>

scala> foo map sqComposeAdd1
res5: List[Int] = List(4, 9, 16, 25, 36)
[/sourcecode]
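
The order of the operands matters: f compose g applies g first and then f. Here is a quick check on a single value, repeating the definitions of add1 and sq so the snippet stands alone (addThenSquare and squareThenAdd1 are names chosen just for this sketch).

[sourcecode lang=”scala”]
val add1 = (x: Int) => x + 1
val sq = (x: Int) => x * x

// sq compose add1 means sq(add1(x)): add first, then square.
val addThenSquare = sq compose add1

// add1 compose sq means add1(sq(x)): square first, then add.
val squareThenAdd1 = add1 compose sq

println(addThenSquare(2))    // prints 9
println(squareThenAdd1(2))   // prints 5
[/sourcecode]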

Of course, we can do this with more than two functions.

[sourcecode lang=”scala”]
scala> foo map add1 map sq map add100
res6: List[Int] = List(104, 109, 116, 125, 136)

scala> foo map (add100 compose sq compose add1)
res7: List[Int] = List(104, 109, 116, 125, 136)
[/sourcecode]

And so on. Now, imagine that you want the user of a program you’ve written to be able to select the functions they want to apply to a list of items, perhaps from a set of predefined functions you’ve provided plus perhaps ones they are themselves defining. So, here’s the really useful part: we can compose that arbitrary bunch of functions on the fly to turn them into a single function, without having to write out “compose … compose … compose…” or “map … map … map …” We do this by building up a list of the functions (in the order we want to apply them to the values) and then reducing them using the compose function. Equivalent to what we had above:

[sourcecode lang=”scala”]
scala> val fncs = List(add1, sq, add100)
fncs: List[(Int) => Int] = List(<function1>, <function1>, <function1>)

scala> foo map ( fncs.reverse reduce (_ compose _) )
res8: List[Int] = List(104, 109, 116, 125, 136)
[/sourcecode]

Notice that it was necessary to reverse the list in order for the composition to be ordered correctly. If you don’t feel like doing that, you can use andThen in Scala.

[sourcecode lang=”scala”]
scala> foo map (add1 andThen sq andThen add100)
res9: List[Int] = List(104, 109, 116, 125, 136)
[/sourcecode]

Which we can of course use with reduce as well.

[sourcecode lang=”scala”]
scala> foo map ( fncs reduce (_ andThen _) )
res10: List[Int] = List(104, 109, 116, 125, 136)
[/sourcecode]
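
One caveat if the list of functions comes from a user: reduce throws an exception on an empty list. A possible workaround, sketched here under the assumption that an empty selection should leave the values unchanged, is to fold from the identity function instead. The name composeAll is invented for this example; Function.chain from the standard library applies a list of functions in list order as well.

[sourcecode lang=”scala”]
val add1 = (x: Int) => x + 1
val sq = (x: Int) => x * x
val add100 = (x: Int) => x + 100
val foo = (1 to 5).toList

// Start from the identity function so that an empty list is handled too.
def composeAll(fs: List[Int => Int]): Int => Int =
  fs.foldLeft((x: Int) => x)(_ andThen _)

println(foo.map(composeAll(List(add1, sq, add100))))
// prints List(104, 109, 116, 125, 136)

println(foo.map(composeAll(Nil)))
// prints List(1, 2, 3, 4, 5)

// The standard library version, given the same list of functions.
println(foo.map(Function.chain(List(add1, sq, add100))))
// prints List(104, 109, 116, 125, 136)
[/sourcecode]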

Since functions are first class citizens (something we used several times above), we can assign the composed or andThened result to a val and use it directly.

[sourcecode lang=”scala”]
scala> val superFunction = fncs reduce (_ andThen _)
superFunction: (Int) => Int = <function1>

scala> foo map superFunction
res11: List[Int] = List(104, 109, 116, 125, 136)
[/sourcecode]

This example is of course artificial, but the general pattern works nicely with much more complex/interesting functions and can provide a nice way of configuring a bunch of alternative functions for different use cases.
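
As one possible sketch of such a configuration, the predefined functions could be kept in a Map keyed by name, and a pipeline built from whichever names are requested. The names available, chosen and pipeline, and the idea of selecting functions by string, are assumptions made for this illustration rather than part of the example above.

[sourcecode lang=”scala”]
val add1 = (x: Int) => x + 1
val sq = (x: Int) => x * x
val add100 = (x: Int) => x + 100

// Predefined functions, looked up by name.
val available = Map("add1" -> add1, "sq" -> sq, "add100" -> add100)

// The selection could come from a config file or the command line.
val chosen = List("add1", "sq")

// Look up each name, then chain the functions in the order given.
val pipeline = chosen.map(available).reduce(_ andThen _)

println((1 to 5).toList.map(pipeline))
// prints List(4, 9, 16, 25, 36)
[/sourcecode]

Note that looking up a name that is not in the Map throws an exception, as does an empty selection, so a real program would want to validate the names first.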

Initial post

New blog created! I’ve called it “bcomposes” because this is Jason Baldridge composing the entries, and because B is the symbol for the composition combinator of combinatory logic (also used in Combinatory Categorial Grammar, the subject of my dissertation). I’ve created it mainly so I can post useful code snippets while I program and teach, but perhaps I’ll put other things up from time to time.

Not sure when I’ll get started with posts, but at least this is created for now! Feel free to see my tweets in the meantime…