Lecture 12a: Traversable Functors

Presenter Notes

Foldable

The Foldable type class captures the concept of data structures that we can iterate over. Lists are foldable, as are Vectors and Streams.

Foldables give us a way to process values embedded in a structure as if they existed in a sequential order, as we’ve seen with list folding.

Foldable gives us great use cases for monoids and the Eval monad.

In general, a fold function allows users to transform one algebraic data type to another.

The typical use case is to accumulate a value as we traverse. We supply an accumulator value and a binary function to combine it with an item in the sequence.

The function produces another accumulator, allowing us to recurse down the sequence. When we reach the end, the final accumulator is our result.
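
As a small illustration of the accumulator steps just described, here is a sketch that uses only the Scala standard library. The object name FoldStepsSketch and the hand-written foldLeft are illustrative, not part of Cats.

```scala
object FoldStepsSketch {
  // Left fold written out by hand so each accumulator step is visible.
  @scala.annotation.tailrec
  def foldLeft[A, B](items: List[A], acc: B)(f: (B, A) => B): B =
    items match {
      // End of the sequence: the accumulator is the result.
      case Nil => acc
      // Combine the accumulator with the head, then recurse on the tail.
      case head :: tail => foldLeft(tail, f(acc, head))(f)
    }

  // scanLeft records every intermediate accumulator of the same fold.
  val steps: List[Int] = List(1, 2, 3).scanLeft(0)(_ + _)
  val total: Int = foldLeft(List(1, 2, 3), 0)(_ + _)
}
```

Here steps holds the accumulator before and after each element, and total is the last of those values.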

Cats’ Foldable abstracts the two operations foldLeft and foldRight into a type class:

trait Foldable[F[_]] { self =>
  def foldLeft[A, B](fa: F[A], b: B)
                    (f: (B, A) => B): B
  def foldRight[A, B](fa: F[A], lb: Eval[B])
                     (f: (A, Eval[B]) => Eval[B]): Eval[B]
  //...
}

Cats provides out-of-the-box instances of Foldable for a handful of Scala data types: List, Vector, Stream, Option, and Map.

import cats.Foldable
import cats.std.list._
val ints = List(1, 2, 3)
Foldable[List].foldLeft(ints, 0)(_ + _)
//res0: Int = 6

The Foldable instance for Map allows us to fold over its values (as opposed to its keys).

Because Map has two type parameters, we have to fix one of them to create the single-parameter type constructor we need to summon the Foldable:

import cats.std.map._
type StringMap[A] = Map[String, A]
val stringMap = Map("a" -> "b", "c" -> "d")
Foldable[StringMap].foldLeft(stringMap, "nil")(_ + "," + _)
//res1: String = nil,b,d

Foldable defines foldRight differently from foldLeft, in terms of the Eval monad:

def foldRight[A, B](fa: F[A], lb: Eval[B])
  (f: (A, Eval[B]) => Eval[B]): Eval[B]



Using Eval means folding with Foldable is always stack safe, even when the collection’s default definition of foldRight is not.

For example, the default implementation for Stream is not stack safe. We can see the stack depth changing as we iterate across the stream.

import cats.Eval
import cats.Foldable
def stackDepth: Int = Thread.currentThread.getStackTrace.length

The longer the stream, the larger the stack requirements for the fold. A sufficiently large stream will trigger a StackOverflowError:

(1 to 5).toStream.foldRight(0) {
  (item, accum) => println(stackDepth)
  item + accum
}
//60
//58
//56
//54
//52
//res2: Int = 15

However, since Eval's map and flatMap are trampolined, Foldable's foldRight maintains the same stack depth throughout:

import cats.std.stream._
val s = (1 to 5).toStream
val accum: Eval[Int] = Eval.now(0)
val result: Eval[Int] =
  Foldable[Stream].foldRight(s, accum) {
    (item: Int, accum: Eval[Int]) =>
      println(stackDepth)
      accum.map(_ + item)
}

result.value
//55
//55
//55
//55
//55
//res3: Int = 15
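
The trampolining idea can also be tried with only the standard library. The sketch below is not how Cats implements foldRight: it swaps Eval for scala.util.control.TailCalls, and foldRightSafe is a hypothetical name chosen for this illustration.

```scala
import scala.util.control.TailCalls._

object SafeFoldRightSketch {
  // Right fold that builds a TailRec description instead of recursing on the JVM stack.
  def foldRightSafe[A, B](items: List[A], z: B)(f: (A, B) => B): TailRec[B] =
    items match {
      case Nil => done(z)
      case head :: tail =>
        // tailcall defers the recursive call; map runs once the tail's result is known.
        tailcall(foldRightSafe(tail, z)(f)).map(acc => f(head, acc))
    }

  // A list long enough that naive non-tail recursion would risk overflowing the stack.
  val big: List[Long] = (1L to 100000L).toList

  // result runs the trampoline in a loop.
  val total: Long = foldRightSafe(big, 0L)(_ + _).result
}
```

As with Eval, nothing is computed until the final call (result here, value in Cats), and that call runs in a loop rather than through nested stack frames.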

Folding with Monoids

Cats’ Foldable provides us with a host of useful methods defined on top of foldLeft.

Many of these are facsimiles of familiar methods from the standard library, including find, exists, forall, toList, isEmpty, and nonEmpty:

Foldable[Option].nonEmpty(Option(42))
//res4: Boolean = true
Foldable[List].find(List(1, 2, 3))(_ % 2 == 0)
//res5: Option[Int] = Some(2)

In addition to these familiar methods, Cats provides two methods that make use of monoids:

  • fold (and its alias combineAll) combines all elements in the sequence using their Monoid

  • foldMap maps a user-supplied function over the sequence and combines the results using a Monoid

For example, we can use fold (or its alias combineAll) to sum over a List[Int]:

import cats.std.int._ // import Monoid[Int]
Foldable[List].fold(List(1, 2, 3))
//res6: Int = 6

Alternatively, we can use foldMap to convert each Int to a String and concatenate them:

import cats.std.string._ // import Monoid[String]
Foldable[List].foldMap(List(1, 2, 3))(_.toString)
//???

The interesting thing about fold and foldMap is that they use a Monoid instead of a function to give us the final result.

One very important aspect to understand here is that it is the fold function that requires the elements of Foldable to have a Monoid instance, while Foldable itself does not have that restriction.

trait Foldable[F[_]] { self =>
  //foldLeft, foldRight etc
  def fold[A](fa: F[A])
             (implicit A: Monoid[A]): A =
    foldLeft(fa, A.empty) { (acc, a) =>
      A.combine(acc, a)
    }

  def foldMap[A, B](fa: F[A])
                   (f: A => B)
                   (implicit B: Monoid[B]): B =
    foldLeft(fa, B.empty) { (b, a) =>
      B.combine(b, f(a))
    }
}
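
To try the two definitions above without Cats on the classpath, here is a self-contained sketch specialised to List. The Monoid trait and its two instances are local stand-ins written for this example, not the Cats types.

```scala
object MonoidFoldSketch {
  // Local stand-in for cats.Monoid.
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  implicit val intAddition: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  implicit val stringConcat: Monoid[String] = new Monoid[String] {
    def empty: String = ""
    def combine(x: String, y: String): String = x + y
  }

  // The Monoid constraint sits on the method, not on the container.
  def fold[A](fa: List[A])(implicit A: Monoid[A]): A =
    fa.foldLeft(A.empty)((acc, a) => A.combine(acc, a))

  def foldMap[A, B](fa: List[A])(f: A => B)(implicit B: Monoid[B]): B =
    fa.foldLeft(B.empty)((acc, a) => B.combine(acc, f(a)))

  val sum: Int = fold(List(1, 2, 3))
  val joined: String = fold(List("x", "y", "z"))
  val totalLength: Int = foldMap(List("a", "bb", "ccc"))(_.length)
}
```

The same List can be folded with different monoids because the instance is chosen by the element type at each call site.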

The same goes for foldM, which implements left associative monadic folding using an implicit Monad:

trait Foldable[F[_]] { self =>
  //foldLeft, foldRight, fold, foldMap etc
  def foldM[G[_], A, B](fa: F[A], z: B)
                       (f: (B, A) => G[B])
                       (implicit G: Monad[G]): G[B] =
    foldLeft(fa, G.pure(z)) { (gb, a) =>
      G.flatMap(gb)(b => f(b, a))
    }
}

import cats.implicits._
def binSmalls(acc: Int, x: Int): Option[Int] =
  if (x > 9) none[Int] else (acc + x).some
(Foldable[List].foldM(List(2, 8, 3, 1), 0) {binSmalls})
//???
(Foldable[List].foldM(List(2, 10, 3, 1), 0) {binSmalls})
//???
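
For a version that runs without Cats, here is a sketch of monadic folding specialised to Option. The names foldMOption and safeDivide are hypothetical, and the example deliberately uses a different step function from binSmalls.

```scala
object FoldMOptionSketch {
  // foldM specialised to List and Option: once a step returns None,
  // flatMap skips every later step and the overall result is None.
  def foldMOption[A, B](items: List[A], z: B)(f: (B, A) => Option[B]): Option[B] =
    items.foldLeft[Option[B]](Some(z))((acc, a) => acc.flatMap(b => f(b, a)))

  // Hypothetical step function: integer division that fails on a zero divisor.
  def safeDivide(acc: Int, divisor: Int): Option[Int] =
    if (divisor == 0) None else Some(acc / divisor)

  // 100 / 2 / 5
  val ok: Option[Int] = foldMOption(List(2, 5), 100)(safeDivide)
  // The zero divisor makes the whole fold fail.
  val failed: Option[Int] = foldMOption(List(2, 0, 5), 100)(safeDivide)
}
```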

Finally, we can compose Foldables to support deep traversal of nested sequences:

import cats.std.vector._
val deepFold = Foldable[List].compose(Foldable[Vector])
val ints = List(Vector(1, 2, 3), Vector(4, 5, 6))
deepFold fold ints
//res7: Int = 21

Traverse

In functional programming it is very common to encode "effects" as data types - common effects include Option for possibly missing values, Xor and Validated for possible errors, and Future for asynchronous computations.

These effects tend to show up in functions working on a single piece of data - for instance parsing a single String into an Int, validating a login, or asynchronously fetching website information for a user.

def parseInt(s: String): Option[Int] = ???
import cats.data.Xor
import scala.concurrent.Future
trait SecError
trait Token
def validateLogin(cred: Token): Xor[SecError, Unit] = ???
trait Profile
trait User
def userInfo(user: User): Future[Profile] = ???

Each function asks only for the data it actually needs; in the case of userInfo, a single User.

We could write a function that takes a list of users and fetches profiles for all of them, but that would be a bit strange.

If we wanted to fetch the profile of just one user, we would either have to wrap it in a List or write a separate function that takes a single user anyway.

More fundamentally, functional programming is about building lots of small, independent pieces and composing them to make larger and larger pieces - does this hold true in this case?

Given just User => Future[Profile], what should we do if we want to fetch profiles for a List[User]? We could try familiar combinators like map:

def profilesFor(users: List[User]) = users.map(userInfo)
//profilesFor: (users: List[User])List[Future[Profile]]

Note the return type List[Future[Profile]]. This makes sense given the type signatures, but seems unwieldy.

We now have a list of asynchronous values, and to work with those values we must then use the combinators on Future for every single one.

It would be nicer instead if we could get the aggregate result in a single Future, say a Future[List[Profile]].

As it turns out, the Future companion object has a traverse method on it.

However, that method is specialized to standard library collections and Futures - there exists a much more generalized form that would allow us to do things like parse a List[String] or validate credentials for a List[User].
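
For reference, the standard-library method can be used as in the sketch below. The User and Profile case classes and the body of userInfo are made up for the example, and the global ExecutionContext is an assumption about how the futures are run.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureTraverseSketch {
  // Hypothetical stand-ins for the User and Profile traits.
  final case class User(id: Int)
  final case class Profile(name: String)

  // Pretend lookup; a real one would call a remote service.
  def userInfo(user: User): Future[Profile] =
    Future(Profile("user-" + user.id))

  // One Future holding the whole list, instead of a List of Futures.
  def profilesFor(users: List[User]): Future[List[Profile]] =
    Future.traverse(users)(userInfo)

  // Blocking is only for demonstration purposes.
  def run(): List[Profile] =
    Await.result(profilesFor(List(User(1), User(2))), 5.seconds)
}
```

The result list keeps the order of the input list, regardless of the order in which the individual futures complete.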

Enter Traverse.

Traverse extends Functor and Foldable, and its traverse and sequence methods additionally require an Applicative for the effect type G:

trait Traverse[F[_]] extends Functor[F] with Foldable[F] { self =>

  def traverse[G[_]: Applicative, A, B]
    (fa: F[A])(f: A => G[B]): G[F[B]]

  def sequence[G[_]: Applicative, A]
    (fga: F[G[A]]): G[F[A]] = traverse(fga)(ga => ga)
  //...
}

sequence threads all the G effects through the F structure to invert the structure from F[G[_]] to G[F[_]].

traverse allows you to transform elements inside the structure like a Functor, producing applicative effects along the way, and lift those instances of applicative structure outside of the Traversable structure.

Given a function which returns a G effect, traverse threads this effect through the running of this function on all the values in F, returning an F[B] in a G context.

In our above examples, F is List, and G is Option, Xor, or Future.

For the profile example, traverse says given a List[User] and a function User => Future[Profile], it can give you a Future[List[Profile]].

More generally, F[_] is some sort of context which may contain a value (or several). While List tends to be among the most general cases, there also exist Traverse instances for Option, Xor, and Validated (among others).
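
To make the roles of F and G concrete, here is a minimal sketch with F fixed to List and G instantiated to Option. The Applicative trait is a reduced local stand-in for the Cats one, and this parseInt is one possible implementation of the earlier stub.

```scala
object ListTraverseSketch {
  // Local stand-in for cats.Applicative, reduced to the two operations used here.
  trait Applicative[G[_]] {
    def pure[A](a: A): G[A]
    def map2[A, B, C](ga: G[A], gb: G[B])(f: (A, B) => C): G[C]
  }

  implicit val optionApplicative: Applicative[Option] = new Applicative[Option] {
    def pure[A](a: A): Option[A] = Some(a)
    def map2[A, B, C](ga: Option[A], gb: Option[B])(f: (A, B) => C): Option[C] =
      (ga, gb) match {
        case (Some(a), Some(b)) => Some(f(a, b))
        case _                  => None
      }
  }

  // Run f on every element and rebuild the list inside G.
  def traverse[G[_], A, B](fa: List[A])(f: A => G[B])(implicit G: Applicative[G]): G[List[B]] =
    fa.foldRight(G.pure(List.empty[B]))((a, acc) => G.map2(f(a), acc)(_ :: _))

  // sequence is traverse with the identity function.
  def sequence[G[_], A](fga: List[G[A]])(implicit G: Applicative[G]): G[List[A]] =
    traverse[G, G[A], A](fga)(ga => ga)

  def parseInt(s: String): Option[Int] = scala.util.Try(s.toInt).toOption

  val allParsed: Option[List[Int]] = traverse(List("1", "2", "3"))(parseInt)
  val oneBad: Option[List[Int]] = traverse(List("1", "x", "3"))(parseInt)
  val flipped: Option[List[Int]] = sequence(List(Option(1), Option(2)))
}
```

A single unparseable element turns the whole result into None, because that is how the Option applicative combines effects.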

One way to think of traverse is as a generalization of sequence:

List(1,2,3).traverse(_.some)
//res8 = ???
List(1,2,3).map(_.some).traverse(identity)
//res9 = ???
List(1,2,3).map(_.some).sequence
//res10 = ???

Another is as a generalization of map:

import cats.Id
import cats.syntax.traverse._
List(1,2,3).traverse[Id, Int]((x: Int) => x + 1)
//res11 = ???

Type signature of traverse in relation to map and flatMap:

def      map(f: A =>   B) : F[A] =>   F[B]
def traverse(f: A => G[B]): F[A] => G[F[B]]     
def  flatMap(f: A => F[B]): F[A] =>   F[B]

We’re still mapping a function over some embedded value(s), like map, but similar to flatMap, the function is itself generating more structure.

However, unlike flatMap, the generated structure is of a different type than the embedded structure.

Finally, note the implied relationship between Foldable and Traverse.

  • Foldable does not extend Monoid, but has methods that rely on monoidal values (e.g. def fold[A: Monoid](fa: F[A]))

  • Traverse does not extend Applicative, but has methods that rely on applicative structures (e.g. def traverse[G[_]: Applicative, A, B](fa: F[A])(f: A => G[B]))

Traverse can also express foldMap and by extension foldLeft and foldRight.

Suppose that our G is the type constructor Const[Int, ?], which takes any type A to Int, so that Const[Int, A] throws away its type argument A and just holds an Int:

import cats.data.Const
Const(1)
//res12: Const[Int,Nothing] = Const(1)
Const(1) map { (_: String) + "!" }
//res13: Const[Int,String]  = Const(1)

When A forms a Semigroup, an Apply instance for Const[A, ?] is derived, and when A forms a Monoid, an Applicative is derived automatically.

import cats.syntax.apply._
Const(2).retag[String => String] ap Const(1).retag[String]
//res14: Const[Int,String] = Const(3)



With a Const functor we can turn any Monoid into an Applicative, so with a Monoid[M] we should be able to get a foldMap from our traverse.

If we instantiate G to be Const[M, ?] and B to be Nothing, traverse begins to look a lot like foldMap from Foldable:

def traverse[A,B]
            (fa: F[A])
            (f: A => Const[M, Nothing]): Const[M, F[Nothing]]
def foldMap [A,B]
            (fa: F[A])
            (f: A => M): M

def foldMap[A, M: Monoid, F[_]:Traverse]
          (fa: F[A])
          (f: A => M): Const[M,F[Nothing]] =
          fa traverseU { (a: A) => Const(f(a)) }
foldMap(List('a', 'b', 'c')) { c: Char => c.toInt }
//res15: Const[Int,List[Nothing]] = Const(294)
foldMap(Nil) { c: Char => c.toInt }
//res16: Const[Int,List[Nothing]] = Const(0)

Note that we are using traverseU, which is the Unapply variant of traverse.

If we let our aggregator exit the Const then we get foldMap exactly:

def foldMap[A, M: Monoid, F[_]:Traverse]
          (fa: F[A])
          (f: A => M): M =
{
  val x = fa traverseU { (a: A) => Const((f(a))) }
  x.getConst
}
foldMap(List('a', 'b', 'c')) { c: Char => c.toInt }
//res17: Int = 294
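
The same construction can be written out without Cats to show where the Monoid ends up. In this sketch F is fixed to List, and Monoid, Applicative and Const are local stand-ins defined only for the example.

```scala
object ConstFoldMapSketch {
  trait Monoid[M] {
    def empty: M
    def combine(x: M, y: M): M
  }

  trait Applicative[G[_]] {
    def pure[A](a: A): G[A]
    def map2[A, B, C](ga: G[A], gb: G[B])(f: (A, B) => C): G[C]
  }

  // Holds an M and ignores its second type parameter entirely.
  final case class Const[M, A](getConst: M)

  implicit val intAddition: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  implicit val stringConcat: Monoid[String] = new Monoid[String] {
    def empty: String = ""
    def combine(x: String, y: String): String = x + y
  }

  def traverse[G[_], A, B](fa: List[A])(f: A => G[B])(G: Applicative[G]): G[List[B]] =
    fa.foldRight(G.pure(List.empty[B]))((a, acc) => G.map2(f(a), acc)(_ :: _))

  def foldMap[A, M](fa: List[A])(f: A => M)(implicit M: Monoid[M]): M = {
    type ConstM[X] = Const[M, X]
    // The Applicative for Const[M, ?]: pure is empty and map2 is combine.
    val constApplicative: Applicative[ConstM] = new Applicative[ConstM] {
      def pure[X](x: X): ConstM[X] = Const[M, X](M.empty)
      def map2[X, Y, Z](gx: ConstM[X], gy: ConstM[Y])(g: (X, Y) => Z): ConstM[Z] =
        Const[M, Z](M.combine(gx.getConst, gy.getConst))
    }
    // Traversing never builds a list; it only combines the M values.
    traverse[ConstM, A, Nothing](fa)(a => Const[M, Nothing](f(a)))(constApplicative).getConst
  }

  val charSum: Int = foldMap(List('a', 'b', 'c'))(_.toInt)
  val shouted: String = foldMap(List("a", "b"))(_.toUpperCase)
  val emptyCase: Int = foldMap(List.empty[Char])(_.toInt)
}
```

The function passed to map2 is never called, which is why B can be Nothing: the Const applicative only ever combines the accumulated M values.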

Traversing lists

Here is the Cats implementation of traverse for List:

def traverse[G[_], A, B]
            (fa: List[A])
            (f: A => G[B])
            (implicit G: Applicative[G]): G[List[B]] =
  foldRight[A, G[List[B]]](fa, Always(G.pure(List.empty))) {
    (a, glb) => G.map2Eval(f(a), glb)(_ :: _)
  }.value



Note that map2Eval is similar to map2 but uses Eval to allow for laziness in the second argument.

Here is a standalone implementation using ap:

def myTraverse[G[_], A, B]
            (fa: List[A])
            (f: A => G[B])
            (implicit G: Applicative[G]): G[List[B]] =
  fa.foldRight[G[List[B]]](G.pure(List.empty)) {
    val cons = (h: B, t: List[B]) => h :: t
    (a, glb) => G.ap(f(a).map(cons.curried))(glb)
  }



Recall the ap type signature: ap(f: F[A => B]): F[A] => F[B]. What is being applied here?

Exercise

Let's do a specific example where A and B are Int, and G is Option:

val l = List(1,2,3)
val cons = (h: Int, t: List[Int]) => h :: t
Apply[Option].ap(None.map(cons.curried))(l.some)
//???
def f(i: Int): Option[Int] = if(i % 2 == 0) Some(i) else None
val baz = (a: Int, glb: Option[List[Int]]) =>     
  Apply[Option].ap(f(a).map(cons.curried))(glb)
baz(1, Some(l))
//???
baz(4, Some(l))
//???

Exercise

Quick State monad refresher:

type IntState[A] = State[Int, A]
def pure[A](a: A): IntState[A]  = State(s => (s, a ))
def set(s: Int): IntState[Unit] = State(_ => (s, ()))
val l = List(1,2,3)
val a = myTraverse(l)(pure)
a.run(0).value
//???
val b = myTraverse(l)(set)
b.run(0).value
//???



We'll do a more involved example of traversing with State in the tutorial.

Traverse Laws

Traverse[T[_]] has two laws.

There are many ways to state the laws; we present one here, but you may see them in different forms elsewhere.

Identity

The identity law for Traverse can be stated as:

sequence[Id,A](xs) == xs

The identity law says that traversing in the identity applicative (type Id[X] = X) has no effect.
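
The identity law can be spot-checked on a single value. The sketch below is a test on one example, not a proof, and it uses a local Applicative stand-in with F fixed to List rather than the Cats definitions.

```scala
object IdentityLawSketch {
  trait Applicative[G[_]] {
    def pure[A](a: A): G[A]
    def map2[A, B, C](ga: G[A], gb: G[B])(f: (A, B) => C): G[C]
  }

  // The identity applicative adds no effect at all.
  type Id[X] = X

  val idApplicative: Applicative[Id] = new Applicative[Id] {
    def pure[A](a: A): Id[A] = a
    def map2[A, B, C](ga: Id[A], gb: Id[B])(f: (A, B) => C): Id[C] = f(ga, gb)
  }

  def traverse[G[_], A, B](fa: List[A])(f: A => G[B])(G: Applicative[G]): G[List[B]] =
    fa.foldRight(G.pure(List.empty[B]))((a, acc) => G.map2(f(a), acc)(_ :: _))

  def sequence[G[_], A](fga: List[G[A]])(G: Applicative[G]): G[List[A]] =
    traverse[G, G[A], A](fga)(ga => ga)(G)

  val xs: List[Int] = List(1, 2, 3)

  // sequence[Id, A](xs) == xs
  val identityHolds: Boolean = sequence[Id, Int](xs)(idApplicative) == xs
}
```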

Composition

The composition law for Traverse can be stated as:

sequence[Lambda[a => F[G[a]]],A](xs) ==
  map(sequence[F,G[A]](xs))(sequence[G,A])



The composition law, here also called the fusion law, says that traversal in F[_] followed by traversal in G[_] can be fused into one traversal in the composite applicative F[G[_]].

Traversable Monads

Last week we tried to implement flatMap with a monad composition:

def flatMap[A, B](fa: M1[M2[A]])(f: A => M1[M2[B]]): M1[M2[B]]



We saw that we'd end up with something of type M1[M2[M1[M2[B]]]]. This is normally irreducible.

However, if M2 is traversable, we can use its sequence method to swap the two middle layers and get something of type M1[M1[M2[M2[B]]]], then use the joins from M1 and M2 to get a result of type M1[M2[B]].
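
As a concrete sketch of these steps, take M1 to be List and M2 to be Option, which is traversable. The code uses only the standard library, and sequenceOption is a hand-written specialisation of sequence named for this example.

```scala
object ComposedFlatMapSketch {
  // sequence for the traversable M2 = Option with M1 = List as the applicative:
  // swaps Option[List[X]] into List[Option[X]].
  def sequenceOption[X](o: Option[List[X]]): List[Option[X]] =
    o match {
      case None     => List(Option.empty[X])
      case Some(xs) => xs.map(x => (Some(x): Option[X]))
    }

  def flatMap[A, B](fa: List[Option[A]])(f: A => List[Option[B]]): List[Option[B]] = {
    // Mapping f under both layers gives M1[M2[M1[M2[B]]]].
    val nested: List[Option[List[Option[B]]]] = fa.map(_.map(f))
    // Swapping the two middle layers gives M1[M1[M2[M2[B]]]].
    val swapped: List[List[Option[Option[B]]]] = nested.map(o => sequenceOption(o))
    // Join M1 (List.flatten), then join M2 (Option.flatten).
    swapped.flatten.map(_.flatten)
  }

  val out: List[Option[Int]] =
    flatMap(List(Option(1), Option.empty[Int], Option(2)))(a => List(Option(a), Option(a * 10)))
}
```

A None in the input survives as a single None in the output, while each Some is expanded by f.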

The tutorial contains an exercise on combining two monads into one composite monad when the inner monad has a Traverse instance.

While the types always allow this, the result fails to meet the monad laws unless the traversable monad is also a commutative monad.

Two examples of commutative monads are Reader and Option.

Two examples of monads that are not commutative are List and State.

See the Wikipedia article on the "distributive law between monads" and a more in-depth article on n-lab.

Example: Futures

Going back to our earlier Future example, we can write an Applicative instance for Future that runs each Future concurrently.

Then when we traverse a List[A] with an A => Future[B], we perform a scatter-gather operation.

Each A creates a concurrent computation that will produce a B (the scatter), and as the futures complete they will be gathered back into a list.

Example: Parsers

import cats.Semigroup
import cats.data.{NonEmptyList, OneAnd, Xor}
import cats.data.{Validated, ValidatedNel}
import cats.std.list._
import cats.syntax.traverse._
def parseIntXor(s: String): Xor[NumberFormatException, Int] =
  Xor.catchOnly[NumberFormatException](s.toInt)
val x1 = List("1", "2", "3").traverseU(parseIntXor)
//Right(List(1, 2, 3))
val x2 = List("1", "abc", "def").traverseU(parseIntXor)
//Left(NumberFormatException: For input string: "abc")

Traversal behavior is closely tied with the behavior of the underlying Applicative.

def parseIntValidated(s: String):
  ValidatedNel[NumberFormatException, Int] =
  Validated.catchOnly[NumberFormatException](s.toInt)
           .toValidatedNel   
List("1", "2", "3").traverseU(parseIntValidated)
//Valid(List(1, 2, 3))
val v3 = List("1", "abc", "def").traverseU(parseIntValidated)
//Invalid(OneAnd(NumberFormatException: For input string:
//"abc",List(NumberFormatException: For input string: "def")))
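
The difference between the two behaviours can be reproduced with the standard library alone. In the sketch below, Either stands in for both Xor and Validated, error messages are plain strings, and the two traversal functions are hand-written for List rather than derived from an Applicative instance.

```scala
import scala.util.Try

object ErrorModesSketch {
  def parse(s: String): Either[String, Int] =
    Try(s.toInt).toOption.toRight("not a number: " + s)

  // Xor-style traversal: keep the first error in input order, ignore later ones.
  def traverseFailFast(in: List[String]): Either[String, List[Int]] =
    in.foldRight[Either[String, List[Int]]](Right(Nil)) { (s, acc) =>
      (parse(s), acc) match {
        case (Left(e), _)          => Left(e)
        case (_, Left(e))          => Left(e)
        case (Right(n), Right(ns)) => Right(n :: ns)
      }
    }

  // Validated-style traversal: collect every error, in input order.
  def traverseAccumulating(in: List[String]): Either[List[String], List[Int]] =
    in.foldRight[Either[List[String], List[Int]]](Right(Nil)) { (s, acc) =>
      (parse(s), acc) match {
        case (Left(e), Left(es))   => Left(e :: es)
        case (Left(e), Right(_))   => Left(List(e))
        case (Right(_), Left(es))  => Left(es)
        case (Right(n), Right(ns)) => Right(n :: ns)
      }
    }
}
```

Both functions walk the same list with the same parser; only the rule for combining two failures differs.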

Example: Readers

Another interesting effect we can use is Reader. Recall that Reader[E, A] is a type alias for Kleisli[Id, E, A], which is a wrapper around E => A.

If we fix E to be some sort of environment or configuration, we can use the Reader applicative in our traverse.

Imagine we have a data pipeline that processes a bunch of data, each piece of data being categorized by a topic.

Given a specific topic, we produce a Job that processes that topic:

import cats.data.Reader
trait Context
trait Topic
trait Result
type Job[A] = Reader[Context, A]
def processTopic(topic: Topic): Job[Result] = ???

We'd like to aggregate many topics (a List[Topic]) and compose the resulting Jobs together into one Job[List[Result]].

Since Reader has an Applicative instance, we can do this by traversing over the list with processTopic:

def processTopics(topics: List[Topic]) =
  topics.traverse(processTopic)
//returns Job[List[Result]]

When our new job's run method is called, it will go through each topic and run its topic-specific job, collecting results as it goes.

Note that our aggregate job's run method has type signature Context => List[Result], so it requires a Context before producing the value we want.
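
Since a Reader is a wrapper around a function, the whole example can be sketched with plain functions. Here Job is an alias for Context => A rather than the Cats Reader, and the case classes and the body of processTopic are invented for the illustration.

```scala
object ReaderTraverseSketch {
  // Hypothetical stand-ins for the Context, Topic and Result traits.
  final case class Context(cluster: String)
  final case class Topic(name: String)
  final case class Result(summary: String)

  // A job is just a function waiting for its Context.
  type Job[A] = Context => A

  def processTopic(topic: Topic): Job[Result] =
    ctx => Result(topic.name + "@" + ctx.cluster)

  // Traverse for the function applicative: hand the same Context to every job.
  def processTopics(topics: List[Topic]): Job[List[Result]] =
    ctx => topics.map(topic => processTopic(topic)(ctx))

  // Building the aggregate job runs nothing yet.
  val job: Job[List[Result]] = processTopics(List(Topic("a"), Topic("b")))

  // The same job value can be run against different contexts.
  val onTest: List[Result] = job(Context("local"))
  val onProd: List[Result] = job(Context("prod"))
}
```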

For example, in Spark the information needed to run a Spark job (where the master node is, memory allocated, etc.) resides in a SparkContext.

Going back to the above example, we can see how one may define topic-specific Spark jobs (type Job[A] = Reader[SparkContext, A]) and then run several Spark jobs on a collection of topics via traverse.

We then get back a Job[List[Result]], which is equivalent to SparkContext => List[Result].

When finally passed a SparkContext, we can run the job and get our results back.

Moreover, the fact that our aggregate job is not tied to any specific SparkContext allows us to pass in a SparkContext pointing to a production cluster, or (using the exact same job) pass in a test SparkContext that just runs locally across threads. This makes testing large jobs easy.

Finally, this encoding ensures that all the jobs for each topic run on the exact same cluster. At no point do we manually pass in or thread a SparkContext through - that is taken care of for us by the (applicative) effect of Reader and therefore by traverse.

Links

The Essence of the Iterator Pattern, by Jeremy Gibbons and Bruno Oliveira. Published in Mathematically-Structured Functional Programming, 2006.

An Investigation of the Laws of Traversals, by Mauro Jaskelioff and Ondrej Rypacek, published in Mathematically-Structured Functional Programming, 2012.

Typeclassopedia is a great resource on Monad, Applicative, Traverse and other type classes.
