In this chapter, we’ll revisit some ideas we introduced previously - function purity and inherent side effects. Let’s start by remembering that pure functions can, in principle, be replaced with a lookup table.

Obviously, that is not a very practical approach with functions where there is too many input parameter combinations, e.g. a lookup table for sum of two numbers would need to be infinitely large:

a b sum
0 0 0
0 1 1
1 0 1
1 1 2
1 2 3
2 1 3
2 2 4
2 3 5

On the other hand, when the number of different input combinations is fairly small, this is a perfectly reasonable option. For example, if we need a function that returns month’s ordinal number in the year, given its name, there are only 12 possible inputs. So, the table is finite and rather small:

month number
January 1
February 2
March 3
April 4
May 5
June 6
July 7
August 8
September 9
October 10
November 11
December 12

So, we could implement this function imperative-style using if:

let monthByNameImperative = function(name){
  if(name == "January"){
    return 1;
  }
  if(name == "February"){
    return 2;
  }
  //omitted for brevity... you get the point
  if(name == "December"){
    return 12;
  }
}

Or, if we don’t like so many if statements, we can use switch case, but that’s going to give us pretty much the same thing, just with a slightly nice syntax:

let monthByNameImperative = function(name){
  switch(name) {
    case "January":
      return 1;
    case "February":
      return 2;
    //again, omitting for brevity
    case "December":
      return 12;
  }
}

Either way it’s tedious to type out, even with lines omitted. And it’s just as ugly to read. The alternative is to implement the lookup table:

let monthByNameLookup = function(name){
  let monthsByName = {"January"   : 1,
                      "February"  : 2,
                      "March"     : 3,
                      "April"     : 4,
                      "May"       : 5,
                      "June"      : 6,
                      "July"      : 7,
                      "August"    : 8,
                      "September" : 9,
                      "October"   : 10,
                      "November"  : 11
                      "December"  : 12};
  return monthsByName[name];
}

monthByNameLookup("January");
//=> 1

monthByNameLookup("March");
//=> 3

monthByNameLookup("November");
//=> 11

Works fine, but our lookup table is being created every time we run the function. That’s wasteful - this is something that’s always going to be the same, we should only create it once. Let’s try doing that:


let monthsByName = {"January"   : 1,
                    "February"  : 2,
                    "March"     : 3,
                    "April"     : 4,
                    "May"       : 5,
                    "June"      : 6,
                    "July"      : 7,
                    "August"    : 8,
                    "September" : 9,
                    "October"   : 10,
                    "November"  : 11
                    "December"  : 12};

let monthByNameLookup = function(name){
  return monthsByName[name];
}

monthByNameLookup("January");
//=> 1

monthByNameLookup("March");
//=> 3

monthByNameLookup("November");
//=> 11

Now we defined the lookup table exactly once and we did that outside the function. We certainly fixed one problem - the table is not being created over and over again, every time the function is called. But now we have another problem:

monthByNameLookup("January");
//=> 1

monthsByName["January"] = 42;
monthByNameLookup("January");
//=> 42

Our lookup table is both exposed and mutable. That’s not good. In fact, this breaks the function purity - note that we can get two completely different results for the same input parameter.

There are two straightforward ways to solve this:

  • make the lookup table immutable
  • encapsulate it so that it can’t be accessed by arbitrary code.

In plain JavaScript, we can make an object immutable by using both const and Object.freeze(), i.e.

const monthsByName = {"January"   : 1,
                      "February"  : 2,
                      "March"     : 3,
                      "April"     : 4,
                      "May"       : 5,
                      "June"      : 6,
                      "July"      : 7,
                      "August"    : 8,
                      "September" : 9,
                      "October"   : 10,
                      "November"  : 11
                      "December"  : 12};
Object.freeze(monthsByName);

let monthByNameImmutable = function(name){
  return monthsByName[name];
}

monthByNameImmutable("January");
//=> 1

monthsByName["January"] = 42; //prevented by Object.freeze()
monthByNameImmutable("January");
//=> 1

monthsByName = {"January" : 42}; //prevented by const
//=> Uncaught TypeError: Assignment to constant variable.
monthByNameImmutable("January");
//=> 1

Keyword const makes sure that variable monthsByName cannot be assigned to point at another object and Object.freeze() makes sure that the object itself cannot be modified. The result is that our lookup table is now immutable which makes the function pure.

The other thing we can do is encapsulate monthsByName so that it is not accessible to the rest of the code. We’ll turn our function into a closure to make that happen:

let createMonthByNameFn = function(){
  let monthsByName = {"January"   : 1,
                      "February"  : 2,
                      "March"     : 3,
                      "April"     : 4,
                      "May"       : 5,
                      "June"      : 6,
                      "July"      : 7,
                      "August"    : 8,
                      "September" : 9,
                      "October"   : 10,
                      "November"  : 11
                      "December"  : 12};
  return function(name){
    return monthsByName[name];
  }
}

let monthByNameClosure = createMonthByNameFn();

monthByNameClosure("January");
//=> 1

monthByNameClosure("March");
//=> 3

monthByNameClosure("November");
//=> 11

Lookup table gets created exactly once, during function creation within createMonthByNameFn. The table is technically stored in a mutable data structure, but since its encapsulated within the local state of createMonthByNameFn function it is not accessible by any other code and thus effectively cannot be mutated. So, the function is pure and will always behave correctly.

Of course, these two techniques are not mutually exclusive. There is nothing preventing us to use both immutable data and closures, e.g:

let createMonthByNameFn = function(){
  const monthsByName = {"January"   : 1,
                        "February"  : 2,
                        "March"     : 3,
                        "April"     : 4,
                        "May"       : 5,
                        "June"      : 6,
                        "July"      : 7,
                        "August"    : 8,
                        "September" : 9,
                        "October"   : 10,
                        "November"  : 11
                        "December"  : 12};
  Object.freeze(monthsByName);
  return function(name){
    return monthsByName[name];
  }
}

let monthByNameClosure = createMonthByNameFn();

All right, that’s one nice lookup table based pure function. But it’s easy to make one when the number of different inputs is finite and reasonably small. What do we do when that’s not the case?

Let’s consider one simple sumFirst function:

let sumFirst = function(n){
  if(n == 0){
    return 0;
  }
  return n + sumFirst(n - 1);
}

This recursive function will take much longer to run for a large n, than for a small one. Inversely, it will be called much more often with small numbers, because of the recursion.

In order to measure this, we’ll introduce a high order function that measures the time it takes to run code included in the function supplied in the parameter. Since evaluation times can be very short, we’ll use the built-in performance object, which measures time in high resolution.

let withStopwatch = function(f){
  let t1 = performance.now();
  let result = f();
  let t2 = performance.now();
  let duration = t2 - t1;
  return {duration : duration,
          result   : result};
}

withStopwatch(function() {
               return sumFirst(5);
               });
//=> {duration: 0.06499997107312083, result: 15}

withStopwatch(function() {
              return sumFirst(1000);
              });
//=> {duration: 0.11999998241662979, result: 500500}

Note that withStopwatch expects a function with no parameters, while sumFirst takes one parameter. We’re working around that mismatch by passing the function call wrapped in an anonymous function. We’ll show a more elegant solution to this kind of problem in the next chapter when we talk about partial application.

In any case, we can see that the sum of first 1000 numbers took nearly twice the time than the sum of only first 5 numbers. If we had a lookup table, that time could be a lot shorter and it would be the same for any supplied value, no matter how large.

The only problem is that there is infinitely many possible integers we can input. Well, technically speaking, it’s not really infinite number of integers, rather a very large but finite number of them. Still, the table would need to be so big it wouldn’t be practical.

But not all inputs are equally important. In our example, inputs 2, 5 and 10 can be considered very likely to be passed to the function, both manually and in a recursive call from a larger number. On the other hand number 784125 might not be something likely to be passed to the function.

So, what we’re going to do is implement a function where we start with an empty lookup table and contribute to it every time sumFirst is called. Then, the next time sumFirst is called, we can look it up in the table. If the result is there, that’s it. Otherwise, we have to calculate it and add it to the table to be used the next time it’s called with the same value.

let createSumFirstCached = function(){
  let lookup = {};
  let sumFirstCached = function(n){
    let existingResult = lookup[n];
    if(existingResult){
      return existingResult;
    }
    if(n == 0){
      return 0;
    }
    let newResult = n + sumFirstCached(n - 1);
    lookup[n] = newResult;
    return newResult;
  }
  return sumFirstCached;
}

We used the same trick as before - protecting the lookup table from external tampering with a closure. This time, however, we want the table to be mutable, since we’re building it as we go. It’s just that we want it mutated only from within sumFirstCached. Also note that we internally named our closure so that we could call the cached version recursively. Let’s see how it performs:

//function is created with an empty table
let sumFirstCached = createSumFirstCached();

//first run with 5 will need to calculate the result
withStopwatch(function() {
               return sumFirstCached(5);
               });
//=> {duration: 0.06500002928078175, result: 15}

//but the second one will only need to look it up
withStopwatch(function() {
               return sumFirstCached(5);
               });
//=> {duration: 0.034999975468963385, result: 15}

//also, first run with 1000 needs to calculate it all
//well, not all of it, but only down to 5, as that
//result was just cached during the first call to sumFirstCached(5)
withStopwatch(function() {
              return sumFirstCached(1000);
              });
//=> {duration: 0.24500000290572643, result: 500500}

//the second call only looks up the result
withStopwatch(function() {
              return sumFirstCached(1000);
              });
//=> {duration: 0.020000035874545574, result: 500500}

Looks ok, but the code is a bit messy. We’ve got the function logic (sum of first n numbers) mixed together with the caching logic. If we separate those two things, we’ll have a much more readable code and we’ll be able to reuse the caching algorithm with other functions as well. We’ll do this by implementing a higher order function that accepts a pure function as its parameter and returns a cached version of the same function. For the sake of simplicity, we’ll assume that the passed function always takes exactly one parameter.

let memoize = function(f){
  let lookup = {};
  return function(a){
    let existingResult = lookup[a];
    if(existingResult){
      return existingResult;
    }
    let newResult = f(a);
    lookup[a] = newResult;
    return newResult;
  }
}

So far we kept saying caching, but now all the sudden our function is called memoize. What exactly is memoization?

Memoization is an optimization technique used to speed up evaluation of expensive pure functions by storing the result in a data structure to be used upon any repeated function call with the same parameters.

Practically, it’s a very specific application of caching, where the resource being cached is a function. So, let’s re-run our sumFirst example with memoize:

let sumFirstMemoized = memoize(sumFirst);

withStopwatch(function() {
               return sumFirstMemoized(5000);
               });
//=> {duration: 2.915000019595027, result: 12502500}

withStopwatch(function() {
               return sumFirstMemoized(5000);
               });
//=> {duration: 0.04000001400709152, result: 12502500}

withStopwatch(function() {
               return sumFirstMemoized(5000);
               });
//=> {duration: 0.05000003054738045, result: 12502500}

Seems to be working fine, but there is one small issue when we try to run it with a slightly larger input:

withStopwatch(function() {
               return sumFirstMemoized(5001);
               });
//=> {duration: 0.2799999751150608, result: 12507501}

We get the full calculation time even though the result for 5000 is already cached. Ideally, our memoized function would use that value and just add 5001 on top of it. We had this logic implemented in sumFirstCached, note that we’re calling sumFirstCached in the recursive call.

Unfortunately, that’s not the case in with our generic memoize since it just calls whatever function it was supplied with and our sumFirst function just calls itself as it is - un-memoized.

But there is a trick that we can use to get around it if we break the rules a bit. Let’s take another look at sumFirst:

let sumFirst = function(n){
  if(n == 0){
    return 0;
  }
  return n + sumFirst(n - 1);
}

We say that it recursively calls itself, but what it really does is call whatever function is referred by variable called sumFirst. So far we never intentionally mutated the value of this variable, i.e. made it point to a different function. This time, that’s exactly what we’re doing.

//step 1 - define recursive base function
let sumFirst = function(n){
  if(n == 0){
    return 0;
  }
  return n + sumFirst(n - 1);
}

//step 2 - create a memoized function and overwrite the original
// function to point to the memoized version
sumFirst = memoize(sumFirst);

//test step 1 - run it once with 5000 to create the results
withStopwatch(function() {
               return sumFirst(5000);
               });
//=> {duration: 1.9950000569224358, result: 12502500}

//test step 2 - run it again with 5000 to check if it was cached
withStopwatch(function() {
               return sumFirst(5000);
               });
//=> {duration: 0.04999997466802597, result: 12502500}

//test step 3 - run it with 5001 and see that it evaluated
// quickly even though we never called it with this parameter.
// that's because with the value for 5000 already known,
// this result boils down to a simple addition.
withStopwatch(function() {
               return sumFirst(5001);
               });
//=> {duration: 0.059999991208314896, result: 12507501}

This is typically only a problem with recursive functions and only if we want to help evaluation with partial solutions. Otherwise, assigning the memoized function to a new variable is sufficient.

The final question we’re going to question is whether or not memoized function is pure. The algorithm depends on a lookup table that’s being mutated, so from that point of view, we might think that the answer is a simple no, but it turns out it’s more complicated than that. It really depends on how we implement the memoization algorithm.

Let’s refresh on purity conditions. In chapter 3.1 Pure functions we said:

Pure function is a function where the return value is:

  1. only determined by its input values - check, whether the result was cached or not, we’re always going to get the same value for the same inputs, as long as the original function was pure to begin with.
  2. without observable side effects - at the first glance check, again, as long as the original function was pure.

In other words, evaluation time and memory consumption are inherent side effects of every piece of code that runs on an actual computer. We can’t get away from that. By using memoization, we only get to trade one for the other, i.e. we save some time by consuming more memory.

However, our current implementation of memoize has one big drawback. It only ever adds values to the lookup table and never removes them. That wouldn’t be a problem in a world where computers have infinite memory. But in the universe we’re stuck in, that’s eventually going to eat up all the available RAM and simply crash the application. A spontaneous app crash after, say, hours or days or months of runtime can be considered a side effect. And a bad one, too.

The simplest thing we can do to fix this is introduce a capacity constraint on the lookup table, and then make sure that some entries are removed when that capacity is reached. We can illustrate that point with a naive implementation of memoize with capacity management:

let memoize = function(f){
  let lookup = {};
  let cleanup = function(){
    let keys = Object.keys(lookup);
    if(keys.length > 1000){
      let keysToRemove = keys.slice(0, 10);
      for(let i = 0; i < keysToRemove.length; i++){
        delete lookup[keysToRemove[i]];
      }
    }
  }
  return function(a){
    let existingResult = lookup[a];
    if(existingResult){
      return existingResult;
    }
    let newResult = f(a);
    lookup[a] = newResult;
    cleanup();
    return newResult;
  }
}

We added cleanup, a local function accessible only from the closure, that runs every time a new entry is added to the lookup table. This function checks if the total number of entries in the table exceeded its capacity and, if so, removes some entries from it. This is a very naive cleanup implementation that does spend too much effort to determine which elements to remove, it just deletes the first 10 entries it finds in the object. A better implementation could possibly remove least recently used entries or perhaps least commonly used ones.

Nevertheless, this small addition did put an upper bound on memory usage and that made our memoization produce pure functions!

Now that we are familiar with both memoization and lazy evaluation, we can re-visit the example from previous chapter and see how those two powerful concepts fit together. As a reminder, we had a higher order function wrapArray that takes an array and a function and returns a data structure where every element is the result of the provided function applied on the respective element of the array.

The issue we had with the existing implementation was that each time we accessed a particular element, the function would calculate the same result over and over. We can see evidence of the function running multiple times if we add some side effect to it, e.g. console output:

let numbers = [3, 4, 1, 7, 3, 10];

let wrapArray = function(array, f){
  return {length: array.length,
          get:    function(index){
                    return f(array[index]);
          }}
}

let double = function(n){
  console.log("Doubling number " + n);
  return n * 2;
}

let doubles = wrapArray(numbers, double);

doubles.get(0);
//=> Doubling number 3
//=> 6
doubles.get(0);
//=> Doubling number 3
//=> 6
doubles.get(0);
//=> Doubling number 3
//=> 6

Ideally, we’d only need to calculate double(3) the first time we try running doubles.get(0) and then cache that result for future reference. Or in other words, we’d need to memoize the wrapping function:

let wrapArrayCached = function(array, f){
  let memoizedFunction = memoize(f);
  return {length: array.length,
          get:    function(index){
                    return memoizedFunction(array[index]);
          }}
}

let doubles = wrapArrayCached(numbers, double);

doubles.get(0);
//=> Doubling number 3
//=> 6
doubles.get(0);
//=> 6
doubles.get(0);
//=> 6

And just like that our lazy data structure now has result caching. This concludes the chapter about memoization. In next chapter we will play with function arity a bit when we explore partial application and currying.