In this chapter, we’ll revisit some ideas we introduced previously - function purity and inherent side effects. Let’s start by remembering that pure functions can, in principle, be replaced with a lookup table.
Obviously, that is not a very practical approach with functions where there is too many input parameter combinations, e.g. a lookup table for sum of two numbers would need to be infinitely large:
a | b | sum |
---|---|---|
… | … | … |
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 2 |
1 | 2 | 3 |
2 | 1 | 3 |
2 | 2 | 4 |
2 | 3 | 5 |
… | … | … |
On the other hand, when the number of different input combinations is fairly small, this is a perfectly reasonable option. For example, if we need a function that returns month’s ordinal number in the year, given its name, there are only 12 possible inputs. So, the table is finite and rather small:
month | number |
---|---|
January | 1 |
February | 2 |
March | 3 |
April | 4 |
May | 5 |
June | 6 |
July | 7 |
August | 8 |
September | 9 |
October | 10 |
November | 11 |
December | 12 |
So, we could implement this function imperative-style using if
:
let monthByNameImperative = function(name){
if(name == "January"){
return 1;
}
if(name == "February"){
return 2;
}
//omitted for brevity... you get the point
if(name == "December"){
return 12;
}
}
Or, if we don’t like so many if
statements, we can use switch case
, but that’s going to give us pretty much
the same thing, just with a slightly nice syntax:
let monthByNameImperative = function(name){
switch(name) {
case "January":
return 1;
case "February":
return 2;
//again, omitting for brevity
case "December":
return 12;
}
}
Either way it’s tedious to type out, even with lines omitted. And it’s just as ugly to read. The alternative is to implement the lookup table:
let monthByNameLookup = function(name){
let monthsByName = {"January" : 1,
"February" : 2,
"March" : 3,
"April" : 4,
"May" : 5,
"June" : 6,
"July" : 7,
"August" : 8,
"September" : 9,
"October" : 10,
"November" : 11
"December" : 12};
return monthsByName[name];
}
monthByNameLookup("January");
//=> 1
monthByNameLookup("March");
//=> 3
monthByNameLookup("November");
//=> 11
Works fine, but our lookup table is being created every time we run the function. That’s wasteful - this is something that’s always going to be the same, we should only create it once. Let’s try doing that:
let monthsByName = {"January" : 1,
"February" : 2,
"March" : 3,
"April" : 4,
"May" : 5,
"June" : 6,
"July" : 7,
"August" : 8,
"September" : 9,
"October" : 10,
"November" : 11
"December" : 12};
let monthByNameLookup = function(name){
return monthsByName[name];
}
monthByNameLookup("January");
//=> 1
monthByNameLookup("March");
//=> 3
monthByNameLookup("November");
//=> 11
Now we defined the lookup table exactly once and we did that outside the function. We certainly fixed one problem - the table is not being created over and over again, every time the function is called. But now we have another problem:
monthByNameLookup("January");
//=> 1
monthsByName["January"] = 42;
monthByNameLookup("January");
//=> 42
Our lookup table is both exposed and mutable. That’s not good. In fact, this breaks the function purity - note that we can get two completely different results for the same input parameter.
There are two straightforward ways to solve this:
- make the lookup table immutable
- encapsulate it so that it can’t be accessed by arbitrary code.
In plain JavaScript, we can make an object immutable by using both const
and Object.freeze()
, i.e.
const monthsByName = {"January" : 1,
"February" : 2,
"March" : 3,
"April" : 4,
"May" : 5,
"June" : 6,
"July" : 7,
"August" : 8,
"September" : 9,
"October" : 10,
"November" : 11
"December" : 12};
Object.freeze(monthsByName);
let monthByNameImmutable = function(name){
return monthsByName[name];
}
monthByNameImmutable("January");
//=> 1
monthsByName["January"] = 42; //prevented by Object.freeze()
monthByNameImmutable("January");
//=> 1
monthsByName = {"January" : 42}; //prevented by const
//=> Uncaught TypeError: Assignment to constant variable.
monthByNameImmutable("January");
//=> 1
Keyword const
makes sure that variable monthsByName
cannot be assigned to point at another object and
Object.freeze()
makes sure that the object itself cannot be modified. The result is that our lookup table is
now immutable which makes the function pure.
The other thing we can do is encapsulate monthsByName
so that it is not accessible to the rest of the code.
We’ll turn our function into a closure to make that happen:
let createMonthByNameFn = function(){
let monthsByName = {"January" : 1,
"February" : 2,
"March" : 3,
"April" : 4,
"May" : 5,
"June" : 6,
"July" : 7,
"August" : 8,
"September" : 9,
"October" : 10,
"November" : 11
"December" : 12};
return function(name){
return monthsByName[name];
}
}
let monthByNameClosure = createMonthByNameFn();
monthByNameClosure("January");
//=> 1
monthByNameClosure("March");
//=> 3
monthByNameClosure("November");
//=> 11
Lookup table gets created exactly once, during function creation within createMonthByNameFn
. The table is
technically stored in a mutable data structure, but since its encapsulated within the local state of
createMonthByNameFn
function it is not accessible by any other code and thus effectively cannot be mutated.
So, the function is pure and will always behave correctly.
Of course, these two techniques are not mutually exclusive. There is nothing preventing us to use both immutable data and closures, e.g:
let createMonthByNameFn = function(){
const monthsByName = {"January" : 1,
"February" : 2,
"March" : 3,
"April" : 4,
"May" : 5,
"June" : 6,
"July" : 7,
"August" : 8,
"September" : 9,
"October" : 10,
"November" : 11
"December" : 12};
Object.freeze(monthsByName);
return function(name){
return monthsByName[name];
}
}
let monthByNameClosure = createMonthByNameFn();
All right, that’s one nice lookup table based pure function. But it’s easy to make one when the number of different inputs is finite and reasonably small. What do we do when that’s not the case?
Let’s consider one simple sumFirst
function:
let sumFirst = function(n){
if(n == 0){
return 0;
}
return n + sumFirst(n - 1);
}
This recursive function will take much longer to run for a large n
, than for a small one. Inversely, it will
be called much more often with small numbers, because of the recursion.
In order to measure this, we’ll introduce a high order function that measures the time it takes to run code
included in the function supplied in the parameter. Since evaluation times can be very short, we’ll use the built-in
performance
object, which measures time in high resolution.
let withStopwatch = function(f){
let t1 = performance.now();
let result = f();
let t2 = performance.now();
let duration = t2 - t1;
return {duration : duration,
result : result};
}
withStopwatch(function() {
return sumFirst(5);
});
//=> {duration: 0.06499997107312083, result: 15}
withStopwatch(function() {
return sumFirst(1000);
});
//=> {duration: 0.11999998241662979, result: 500500}
Note that withStopwatch
expects a function with no parameters, while sumFirst
takes one parameter.
We’re working around that mismatch by passing the function call wrapped in an anonymous function. We’ll show a
more elegant solution to this kind of problem in the next chapter when we talk about partial application.
In any case, we can see that the sum of first 1000 numbers took nearly twice the time than the sum of only first 5 numbers. If we had a lookup table, that time could be a lot shorter and it would be the same for any supplied value, no matter how large.
The only problem is that there is infinitely many possible integers we can input. Well, technically speaking, it’s not really infinite number of integers, rather a very large but finite number of them. Still, the table would need to be so big it wouldn’t be practical.
But not all inputs are equally important. In our example, inputs 2
, 5
and 10
can be considered very
likely to be passed to the function, both manually and in a recursive call from a larger number. On the other
hand number 784125
might not be something likely to be passed to the function.
So, what we’re going to do is implement a function where we start with an empty lookup table and contribute to
it every time sumFirst
is called. Then, the next time sumFirst
is called, we can look it up in the table.
If the result is there, that’s it. Otherwise, we have to calculate it and add it to the table to be used the
next time it’s called with the same value.
let createSumFirstCached = function(){
let lookup = {};
let sumFirstCached = function(n){
let existingResult = lookup[n];
if(existingResult){
return existingResult;
}
if(n == 0){
return 0;
}
let newResult = n + sumFirstCached(n - 1);
lookup[n] = newResult;
return newResult;
}
return sumFirstCached;
}
We used the same trick as before - protecting the lookup table from external tampering with a closure.
This time, however, we want the table to be mutable, since we’re building it as we go. It’s just that we want
it mutated only from within sumFirstCached
. Also note that we internally named our closure so that we could
call the cached version recursively. Let’s see how it performs:
//function is created with an empty table
let sumFirstCached = createSumFirstCached();
//first run with 5 will need to calculate the result
withStopwatch(function() {
return sumFirstCached(5);
});
//=> {duration: 0.06500002928078175, result: 15}
//but the second one will only need to look it up
withStopwatch(function() {
return sumFirstCached(5);
});
//=> {duration: 0.034999975468963385, result: 15}
//also, first run with 1000 needs to calculate it all
//well, not all of it, but only down to 5, as that
//result was just cached during the first call to sumFirstCached(5)
withStopwatch(function() {
return sumFirstCached(1000);
});
//=> {duration: 0.24500000290572643, result: 500500}
//the second call only looks up the result
withStopwatch(function() {
return sumFirstCached(1000);
});
//=> {duration: 0.020000035874545574, result: 500500}
Looks ok, but the code is a bit messy. We’ve got the function logic (sum of first n
numbers) mixed together
with the caching logic. If we separate those two things, we’ll have a much more readable code and we’ll be
able to reuse the caching algorithm with other functions as well. We’ll do this by implementing a higher order
function that accepts a pure function as its parameter and returns a cached version of the same function.
For the sake of simplicity, we’ll assume that the passed function always takes exactly one parameter.
let memoize = function(f){
let lookup = {};
return function(a){
let existingResult = lookup[a];
if(existingResult){
return existingResult;
}
let newResult = f(a);
lookup[a] = newResult;
return newResult;
}
}
So far we kept saying caching, but now all the sudden our function is called memoize
. What exactly is
memoization?
Memoization is an optimization technique used to speed up evaluation of expensive pure functions by storing the result in a data structure to be used upon any repeated function call with the same parameters.
Practically, it’s a very specific application of caching, where the resource being cached is a function.
So, let’s re-run our sumFirst
example with memoize
:
let sumFirstMemoized = memoize(sumFirst);
withStopwatch(function() {
return sumFirstMemoized(5000);
});
//=> {duration: 2.915000019595027, result: 12502500}
withStopwatch(function() {
return sumFirstMemoized(5000);
});
//=> {duration: 0.04000001400709152, result: 12502500}
withStopwatch(function() {
return sumFirstMemoized(5000);
});
//=> {duration: 0.05000003054738045, result: 12502500}
Seems to be working fine, but there is one small issue when we try to run it with a slightly larger input:
withStopwatch(function() {
return sumFirstMemoized(5001);
});
//=> {duration: 0.2799999751150608, result: 12507501}
We get the full calculation time even though the result for 5000
is already cached. Ideally, our memoized
function would use that value and just add 5001
on top of it. We had this logic implemented in sumFirstCached
,
note that we’re calling sumFirstCached
in the recursive call.
Unfortunately, that’s not the case in with our generic memoize
since it just calls whatever function it was
supplied with and our sumFirst
function just calls itself as it is - un-memoized.
But there is a trick that we can use to get around it if we break the rules a bit. Let’s take another look at
sumFirst
:
let sumFirst = function(n){
if(n == 0){
return 0;
}
return n + sumFirst(n - 1);
}
We say that it recursively calls itself, but what it really does is call whatever function is referred by
variable called sumFirst
. So far we never intentionally mutated the value of this variable, i.e. made it
point to a different function. This time, that’s exactly what we’re doing.
//step 1 - define recursive base function
let sumFirst = function(n){
if(n == 0){
return 0;
}
return n + sumFirst(n - 1);
}
//step 2 - create a memoized function and overwrite the original
// function to point to the memoized version
sumFirst = memoize(sumFirst);
//test step 1 - run it once with 5000 to create the results
withStopwatch(function() {
return sumFirst(5000);
});
//=> {duration: 1.9950000569224358, result: 12502500}
//test step 2 - run it again with 5000 to check if it was cached
withStopwatch(function() {
return sumFirst(5000);
});
//=> {duration: 0.04999997466802597, result: 12502500}
//test step 3 - run it with 5001 and see that it evaluated
// quickly even though we never called it with this parameter.
// that's because with the value for 5000 already known,
// this result boils down to a simple addition.
withStopwatch(function() {
return sumFirst(5001);
});
//=> {duration: 0.059999991208314896, result: 12507501}
This is typically only a problem with recursive functions and only if we want to help evaluation with partial solutions. Otherwise, assigning the memoized function to a new variable is sufficient.
The final question we’re going to question is whether or not memoized function is pure. The algorithm depends on a lookup table that’s being mutated, so from that point of view, we might think that the answer is a simple no, but it turns out it’s more complicated than that. It really depends on how we implement the memoization algorithm.
Let’s refresh on purity conditions. In chapter 3.1 Pure functions we said:
Pure function is a function where the return value is:
- only determined by its input values - check, whether the result was cached or not, we’re always going to get the same value for the same inputs, as long as the original function was pure to begin with.
- without observable side effects - at the first glance check, again, as long as the original function was pure.
In other words, evaluation time and memory consumption are inherent side effects of every piece of code that runs on an actual computer. We can’t get away from that. By using memoization, we only get to trade one for the other, i.e. we save some time by consuming more memory.
However, our current implementation of memoize
has one big drawback. It only ever adds values to the lookup
table and never removes them. That wouldn’t be a problem in a world where computers have infinite memory.
But in the universe we’re stuck in, that’s eventually going to eat up all the available RAM and simply crash
the application. A spontaneous app crash after, say, hours or days or months of runtime
can be considered a side effect. And a bad one, too.
The simplest thing we can do to fix this is introduce a capacity constraint on the lookup table, and then make
sure that some entries are removed when that capacity is reached. We can illustrate that point with a naive
implementation of memoize
with capacity management:
let memoize = function(f){
let lookup = {};
let cleanup = function(){
let keys = Object.keys(lookup);
if(keys.length > 1000){
let keysToRemove = keys.slice(0, 10);
for(let i = 0; i < keysToRemove.length; i++){
delete lookup[keysToRemove[i]];
}
}
}
return function(a){
let existingResult = lookup[a];
if(existingResult){
return existingResult;
}
let newResult = f(a);
lookup[a] = newResult;
cleanup();
return newResult;
}
}
We added cleanup
, a local function accessible only from the closure, that runs every time a new entry is
added to the lookup table. This function checks if the total number of entries in the table exceeded its
capacity and, if so, removes some entries from it. This is a very naive cleanup implementation that does spend
too much effort to determine which elements to remove, it just deletes the first 10 entries it finds in the
object. A better implementation could possibly remove least recently used entries or perhaps least commonly
used ones.
Nevertheless, this small addition did put an upper bound on memory usage and that made our memoization produce pure functions!
Now that we are familiar with both memoization and lazy evaluation, we can re-visit the example from previous
chapter and see how those two powerful concepts fit together. As a reminder, we had a higher order function
wrapArray
that takes an array and a function and returns a data structure where every element is the result
of the provided function applied on the respective element of the array.
The issue we had with the existing implementation was that each time we accessed a particular element, the function would calculate the same result over and over. We can see evidence of the function running multiple times if we add some side effect to it, e.g. console output:
let numbers = [3, 4, 1, 7, 3, 10];
let wrapArray = function(array, f){
return {length: array.length,
get: function(index){
return f(array[index]);
}}
}
let double = function(n){
console.log("Doubling number " + n);
return n * 2;
}
let doubles = wrapArray(numbers, double);
doubles.get(0);
//=> Doubling number 3
//=> 6
doubles.get(0);
//=> Doubling number 3
//=> 6
doubles.get(0);
//=> Doubling number 3
//=> 6
Ideally, we’d only need to calculate double(3)
the first time we try running doubles.get(0)
and then cache
that result for future reference. Or in other words, we’d need to memoize the wrapping function:
let wrapArrayCached = function(array, f){
let memoizedFunction = memoize(f);
return {length: array.length,
get: function(index){
return memoizedFunction(array[index]);
}}
}
let doubles = wrapArrayCached(numbers, double);
doubles.get(0);
//=> Doubling number 3
//=> 6
doubles.get(0);
//=> 6
doubles.get(0);
//=> 6
And just like that our lazy data structure now has result caching. This concludes the chapter about memoization. In next chapter we will play with function arity a bit when we explore partial application and currying.