Why Functional Programming Matters

Introduction

Why Functional Programming Matters is a paper by John Hughes, written in 1984 and slightly revised in 1989. It is an introductory paper that describes the strengths of functional programming and argues that it is vitally important to the real world. These are the notes I took while reading the paper. Some examples are skipped, and the ones I do cover are hand-wavy at best, written in a pseudo-language. Hopefully I cover the main points, but refer to the paper if the examples are not clear.

What is Functional Programming?

When trying to explain what functional programming is, we typically provide the following properties:

  • There is no assignment, so there are no variables.
  • Your code has no side effects.
  • Functions are referentially transparent, meaning functions can be evaluated at any time and will always evaluate to the same value.
    • I think this might just be another way of stating that there is no assignment and no side effects.

The problem with this explanation is that these properties only describe what functional programming is not, rather than what it is. How can a programmer take that explanation and strive to write functional code? Can one avoid assignment particularly well, or avoid side effects particularly well? This explanation gives a programmer nothing to strive for in their code. Can we come up with a better explanation of what functional programming is, one that gives a programmer something to strive for and explains why it is important?

Modularity in Software

It's accepted that modularity is key to writing software, especially as it grows in complexity. Modularity brings productivity improvements such as speed, code reuse, and testability. A problem is solved by first breaking it into sub-problems, breaking those into sub-problems of their own, and then gluing the solutions together.

Functional programming provides important new ways to modularize these sub-problems and glue them together into a program, ways that are not possible in traditional languages.

A Better Description of Functional Programming

So what are these types of glue that functional programming provides? This is what makes functional programming unique, so by taking a look at how functional programs are modularized we can come up with a better description.

Higher Order Functions

Higher order functions allow simple functions to be glued together to form more complex ones. They are functions that have a function as a parameter and/or return a function.

Example: reduce

We can start with one function that we'll call reduce, which is used to process lists. reduce is a function that recursively walks through a list, applies a specializing function to each element of the list, and passes the result back up the recursion. Reduce looks like

(reduce f value)

Where f is a specializing function, and value is the base case value for the recursion. When applied to f and value, reduce returns a function awaiting a list. We can use reduce to create a function that sums up a list of numbers.

sum = reduce add 0
sum [1 2 3 4] // evaluates to 10

The evaluation expands to (add 1 (add 2 (add 3 (add 4 0)))).
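
For concreteness, here is the paper's recursive definition of reduce, transcribed into Haskell syntax (the paper writes it with cons and nil):

reduce :: (a -> b -> b) -> b -> [a] -> b
reduce f value []       = value                      -- base case: the empty list
reduce f value (a:rest) = f a (reduce f value rest)  -- apply f to the head, recurse on the rest

In other words, reduce replaces every cons in the list with f and the nil at the end with value, which is exactly the expansion shown above.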

We can use reduce to create a copy of a list.

copy = reduce cons []
copy [1 2 3 4] // evaluates to [1 2 3 4]

The function cons takes two arguments, a value and a list, and puts the value on the front of the list. For the base case of the recursion, we give it an empty list. The evaluation expands to (cons 1 (cons 2 (cons 3 (cons 4 [])))), which gives us the list [1 2 3 4].

We can use reduce to implement map. Map applies a function to each element of a list and returns a new list of the results.

fandcons f = cons . f
map f = reduce (fandcons f) []

double x = 2 * x
map double [1 2 3 4] // evaluates to [2 4 6 8]

Note: The "." operator is function composition: (f . g) x is equivalent to f (g x). So (fandcons f) a list is cons (f a) list, which puts f a on the front of list.

Reduce shows us how you can glue modular functions together to create more powerful functions. Reduce is a generic function that knows how to operate on a list, and it applies the provided specializing function to each element of the list.
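
For what it's worth, reduce as defined here is exactly Haskell's standard foldr, so the sum and map examples can be written directly (sumList and mapList are my names, chosen to avoid clashing with the Prelude):

sumList :: [Integer] -> Integer
sumList = foldr (+) 0

mapList :: (a -> b) -> [a] -> [b]
mapList f = foldr ((:) . f) []

-- sumList [1, 2, 3, 4]       evaluates to 10
-- mapList (* 2) [1, 2, 3, 4] evaluates to [2, 4, 6, 8]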

The ideas here can also be applied to data structures other than lists. The paper covers redtree, a function similar to reduce except that it operates on trees. With redtree we can implement sumtree and maptree, much as we did with lists.
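
As a sketch, here is my Haskell transcription of the paper's definitions, using (:) for cons: redtree f g z replaces each node with f, each cons in a list of subtrees with g, and each empty subtree list with z.

data Tree a = Node a [Tree a]

redtree :: (a -> b -> c) -> (c -> b -> b) -> b -> Tree a -> c
redtree f g z (Node label subtrees) = f label (go subtrees)
  where
    go []     = z
    go (t:ts) = g (redtree f g z t) (go ts)

-- Summing the labels of a tree of numbers:
sumtree :: Num a => Tree a -> a
sumtree = redtree (+) (+) 0

-- Mapping a function over every label:
maptree :: (a -> b) -> Tree a -> Tree b
maptree f = redtree (Node . f) (:) []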

Whenever a new data structure is defined, new higher order functions should be written to process it. These make the structure easy to process and encapsulate the details of its implementation.

Lazy Evaluation

Lazy evaluation is another kind of glue that functional programming provides. Say you have two functions, f and g, where f computes output that is consumed by g. They run in strict synchronization: f only computes output when g tries to read some input. This is called lazy evaluation, and it prevents large amounts of intermediate data from being stored in memory or written to a file as a temporary buffer.

A neat feature resulting from this is that f can be non-terminating, producing infinite output for g. Since f will be terminated once g is done reading output, the termination condition is separated from the loop body that generates the output. You can therefore modularize a program into a generator, which generates a large number of possible answers, and a selector, which chooses the appropriate one, as the sketch below illustrates.
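
As a tiny illustration (my own Haskell example, not one from the paper): the generator below produces infinitely many candidates, and lazy evaluation ensures only as many are computed as the selector demands.

-- Generator: the infinite list of natural numbers.
-- Selector: take the first one whose square exceeds 1000.
-- Only 32 candidates are ever generated.
answer :: Integer
answer = head (filter (\n -> n * n > 1000) [1 ..])
-- answer evaluates to 32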

Example: Square Root Algorithm

We can take advantage of this by implementing the Newton-Raphson square root algorithm. The algorithm starts with an initial approximation, which we'll call a0, for the square root of a number N. From that initial approximation, we can compute better and better ones using the rule:

a(n+1) = (a(n) + N/a(n)) / 2

If the approximations approach some limit a, then a is the square root of N. If we provide a tolerance and stop once two successive approximations differ by less than the tolerance, we can approximate the square root. In a typical programming language it would be written iteratively, in a loop like the following:

sqrt(tol, a0, N) {
  x = a0;
  y = a0 + 2*tol;  // ensure the loop body runs at least once
  while (true) {
    if (abs(x - y) < tol) {
      return x;
    }
    y = x;
    x = (x + N/x) / 2;
  }
}

Unfortunately, the sqrt function is indivisible when written like this. Using lazy evaluation we can write it in a modular fashion and reuse those parts in other code.

First we define the function next which computes an approximation when given the previous one:

next N x = (x + N/x) / 2

Next we define a function repeat, which repeatedly applies a function, feeding each result back in, to build a list of successive results:

repeat f a = cons a (repeat f (f a))

Then we can build our list of square root approximations:

repeat (next N) a0

Our list is an example of an infinite list. It can potentially create output forever. But because of lazy evaluation, approximations will only be computed as needed.
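
In Haskell, repeat as defined here is the standard function iterate, and we can watch laziness at work by taking a prefix of the infinite list (output shown approximately):

-- Only the five approximations we ask for are ever computed.
approximations :: [Double]
approximations = take 5 (iterate (\x -> (x + 2 / x) / 2) 1)
-- roughly [1.0, 1.5, 1.4167, 1.4142157, 1.4142136], converging on sqrt 2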

The last part we need is the function within, which is an example of a selector: it walks down the infinite list until two successive approximations are within the tolerance.

within tol (cons a (cons b rest)) = b if abs(a - b) <= tol
                                  = within tol (cons b rest) otherwise

Now we can glue these three functions together into a new function that approximates the square root of a number.

sqrt a0 tol N = within tol (repeat (next N) a0)

It's clear how much more modular this is than the original solution. We've broken it down into three sub-problems: a function that creates an approximation (next), a function that repeatedly calls it (repeat), and a function that tests for the termination condition (within), and glued them together to solve a specific problem. We can leverage those functions in many other solutions beyond approximating square roots.
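
Putting the pieces together, here is a runnable Haskell version of the whole pipeline (iterate plays the role of repeat; squareRoot is my name, chosen to avoid clashing with the Prelude's sqrt):

-- One step of Newton-Raphson for the square root of n.
next :: Double -> Double -> Double
next n x = (x + n / x) / 2

-- Selector: walk the list until two successive values are close enough.
within :: Double -> [Double] -> Double
within tol (a : b : rest)
  | abs (a - b) <= tol = b
  | otherwise          = within tol (b : rest)

-- Generator glued to selector.
squareRoot :: Double -> Double -> Double -> Double
squareRoot a0 tol n = within tol (iterate (next n) a0)

-- squareRoot 1 1e-9 2 evaluates to approximately 1.4142135623730951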

Example: Game Trees

In artificial intelligence there's an algorithm, the alpha-beta heuristic, that estimates how strong a position a game player is in. It looks ahead to see how the game might develop, but avoids exploring moves that are not beneficial. To implement the alpha-beta heuristic we can use a game tree to represent the state of a game.

The tree contains the current game state at the root, and its children are the states reachable in one move from the current state.

Lazy evaluation is important here because sometimes game trees are infinite in size or too big to hold in memory.

Lazy evaluation allows you to evaluate portions of the tree at a time, and when those portions have been evaluated, they are garbage collected. Only the portion being evaluated needs to be in memory at a given time which makes it possible to operate on huge, or even infinite trees.
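
Here is a minimal sketch of how such a tree can be built lazily (my Haskell rendering; reptree and prune follow the paper, while the Int positions and the moves function are a toy stand-in for a real game):

data Tree a = Node a [Tree a] deriving Show

-- reptree builds a (possibly infinite) tree from a seed,
-- just as repeat built an infinite list.
reptree :: (a -> [a]) -> a -> Tree a
reptree f a = Node a (map (reptree f) (f a))

-- prune cuts the tree off at depth n, so a bounded search
-- only ever forces a finite part of the infinite tree.
prune :: Int -> Tree a -> Tree a
prune 0 (Node a _)    = Node a []
prune n (Node a subs) = Node a (map (prune (n - 1)) subs)

-- Toy move generator: each "position" has two successors.
moves :: Int -> [Int]
moves p = [2 * p, 2 * p + 1]

gametree :: Int -> Tree Int
gametree = reptree moves
-- prune 2 (gametree 1) forces only 7 of the infinitely many nodes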

The implementation of the alpha-beta heuristic plays out similarly to the square root algorithm, so refer to the paper for the details. What's interesting, though, is that the paper goes through several iterations of optimization on the algorithm, which are great examples of the kind of modularization made possible by higher order functions and lazy evaluation.

Additional Examples

The paper also walks through a couple more examples, including numerical differentiation through approximation and numerical integration through approximation.

What is Functional Programming? (revisited)

A better way to describe functional programming, one which also provides something for the programmer to strive for, is by the unique forms of modularization that it provides. Higher order functions and lazy evaluation are both very powerful forms of glue that allow a problem to be divided into its sub-problems and the solutions glued back together. These solutions are often more reusable in other pieces of code than the solutions you would get from traditional programming languages. In these notes you've seen several of them, such as reduce, map, redtree, maptree, repeat, and within.

If you look back at the examples, you can see that our earlier definition still holds true as well. They contain no assignment, are referentially transparent, and they have no side effects.

Conclusion (my thoughts)

After reading through this paper, here's my attempt to put words in Hughes' mouth and imagine how he would define functional programming (this is not from the paper):

Functional programming is a style which allows one to solve a problem by breaking it up into functions that solve its sub-problems, and gluing those functions together to solve the original problem. Functional programming provides two important types of glue: higher order functions and lazy evaluation. These help modularize your solutions into decoupled components that can be reused in many other problems, in a way not possible in traditional languages.

But what are traditional languages? Object oriented languages today are adopting more and more features from functional languages. If higher order functions and lazy evaluation are the only requirements, then C# should be considered a functional language. But by the negative definition, C# allows assignment and the mutation of state, so it isn't functional. Even some languages considered functional, such as Lisp, also allow mutation.

Maybe the line between functional and traditional languages has blurred a little bit, but it's clear that higher order functions and lazy evaluation are very powerful, and as Hughes argues in his paper, they still matter.

Author: Caleb Gossler

Last Modified: 2016-06-03 Fri 22:13