Friday, April 29, 2016

Introduction to Caching in JavaScript and NodeJS

Disclaimer:  This article is intended for new web developers who don't know much about caching and would like to have a basic idea.  It is not a comprehensive post, nor is it supposed to accurately reflect advanced caching systems.  All of the scripts referenced in this post can be downloaded from my GitHub Repository.

When I was first starting out in web development, I heard people talk a lot about this "cash" system that you could use in websites and your code, but I honestly didn't know much about it.  I eventually figured out it was a way to store data so you could access it faster, but I still thought it was some complicated architecture or a special system.  I was really just overthinking it.

A cache is simply a place in memory where you can store data that is accessed often.  In the context of JavaScript, this can be as simple as a global variable that you put the results of an AJAX call into.  Then you write your function to look there first: if something is there, return it; otherwise, make the original call.  There are a lot of libraries out there that will do this for you, but we're going to build a very basic caching system and walk through each of the parts.

I'm going to use NodeJS for this blog post, but this works just as well in the browser with an AJAX request or something similar.

First, I'll create a function that calls out to the GitHub API and returns a value.  I'll walk through the function and then add some caching to show how it's done.  Below we have the basic script, which you can run with:

node nocache.js

For this tutorial, I am using the npm library request for ease of use and am making a request to the GitHub API to query my repositories for data about them.  This is very similar to a request you would make in Express or another web framework, and I have done very similar things in other projects.  One thing to note is that the request library lets you easily specify the options for the HTTP request, and the GitHub API requires a User-Agent in the header, so that's why that's there.  If you don't send one, GitHub will return an error and reject your request.
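As a minimal sketch of what such an options object could look like (not the actual script from the repository): the helper name buildOptions, the username octocat, and the User-Agent value caching-tutorial are all placeholders chosen for illustration.

```javascript
// Hypothetical helper: builds the options object for a GitHub API call.
// The username and the User-Agent value are placeholders, not the
// values used in the original script.
function buildOptions(username) {
  return {
    url: 'https://api.github.com/users/' + username + '/repos',
    headers: {
      // GitHub rejects API requests that arrive without a User-Agent header.
      'User-Agent': 'caching-tutorial'
    }
  };
}

const options = buildOptions('octocat');
console.log(options.url);

// With the request library installed, the object would be passed like this:
// request(options, function (error, response, body) { /* ... */ });
```

Keeping the options in one object means the same request can be repeated unchanged, which is what makes it a good candidate for caching later on.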

Next, on line 15, I created the function to make the request.  For this tutorial I have a bunch of logs, plus start and end times to track how long the entire request takes in milliseconds.  So I set the start to the current time converted to a number (in ms), and then logged the start time.  The data in the body comes as a JSON string, so I parsed that and logged the name of the first item, since it's an array of information.
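The timing and parsing steps can be sketched roughly as follows.  This is an illustration under assumptions, not the original script: summarize is a hypothetical helper, and sampleBody is made-up data standing in for the real API response.

```javascript
// Hypothetical helper: parses a response body and reports timing.
// `body` is assumed to be a JSON string holding an array of repositories.
function summarize(body, start) {
  const repos = JSON.parse(body);   // the body arrives as a JSON string
  const end = Date.now();           // current time in milliseconds
  return {
    firstName: repos[0].name,       // name of the first repository
    elapsed: end - start            // total duration in milliseconds
  };
}

const start = Date.now();
console.log('Start: ' + start);

// Made-up sample body standing in for the real API response.
const sampleBody = JSON.stringify([{ name: 'example-repo' }, { name: 'other-repo' }]);

const summary = summarize(sampleBody, start);
console.log('First repo: ' + summary.firstName);
console.log('Elapsed: ' + summary.elapsed + ' ms');
```

In the real script the call to summarize would sit inside the request callback, so the elapsed time would include the network round trip.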

Finally, there are some logs to output the ending time, and the total time elapsed for the request.  On average I get roughly 400 ms on a good connection.  And if this is inside a web request, you don't want to be adding half a second to a second to every request, on top of everything else the server is doing.  To simulate this, the script nocache_multi.js has a few setTimeouts to repeatedly call the same function.  As you can see to the right when you run it, each time you get a similar response time.

This script is the perfect place to add caching because the request parameters never change and we can expect the response to be the same most of the time.  So instead of making the request every time the function is run, I'm going to add a storage object so that I can store the response and use that when the function is called again.

In the script below, you can see I've added a very basic cache named repo_cache on line 14, and added a result field to the object to store the data.  In a bit you'll see why I split it into a separate field, but for now you can see how it's being used below.  On line 25 I added a check to see if we have any result data.  If so, we simply log the results from that data and return; otherwise, we continue with the original process.  In addition, I split the logging out into a separate function so we can call it from each path.  The last change I made was to store the parsed result in repo_cache.result whenever I successfully get data.
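Here is a minimal sketch of this kind of cache, under some assumptions.  It is not the script from the repository: fakeRequest is a hypothetical stand-in for the real HTTP call so the example runs without a network connection, and logResult and fetchCount are illustrative names.

```javascript
// A very basic cache: one object with a `result` field.
const repo_cache = { result: null };

// Counts how many "requests" were actually made.
let fetchCount = 0;

// Hypothetical stand-in for the real HTTP request: hands back a JSON
// string through a callback, similar to how a response body arrives.
function fakeRequest(callback) {
  fetchCount += 1;
  callback(null, JSON.stringify([{ name: 'example-repo' }]));
}

// Logging lives in its own function so both paths can call it.
function logResult(data, source) {
  console.log(source + ': ' + data[0].name);
}

function getRepos(callback) {
  // Cache hit: return the stored data without making a request.
  if (repo_cache.result) {
    logResult(repo_cache.result, 'cache');
    return callback(null, repo_cache.result);
  }
  // Cache miss: make the request, then store the parsed result.
  fakeRequest(function (error, body) {
    if (error) return callback(error);
    repo_cache.result = JSON.parse(body);
    logResult(repo_cache.result, 'request');
    callback(null, repo_cache.result);
  });
}

getRepos(function () {});
getRepos(function () {});
console.log('requests made: ' + fetchCount); // prints "requests made: 1"
```

The second call never reaches fakeRequest, which is the whole point: the expensive step runs once and every later call reads from memory.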

When you run this function, you'll see that the first request takes some time, and then the next three are almost instantaneous.  Here's what my output looked like to the right.

As you can see the first request duration was 440 ms, and the rest were zero because we had the data in memory.

So I successfully "cached" the response and had a much better response time thereafter.  But we have a problem: this kind of cache isn't very useful as it stands.  The data is going to get stale, and if the web server stays running for a while the data will become inaccurate, so there needs to be some way to invalidate the cache.  That's pretty easy to do.  We just need to store a timestamp of when we generated the data for the cache, along with a timeout.  Then, if the timestamp + the timeout is less than the current time, we make a new request and refresh the cache.

Here is a snippet of the script cache_advanced.js from the GitHub repo below:

The changes I made were adding a last_updated and a timeout field to the cache object, checking those two fields during the cache check, and updating the last_updated field after the request.
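The expiry check can be sketched like this.  Again, this is an illustration rather than the contents of cache_advanced.js: isFresh and fakeRequest are hypothetical names, and the current time is passed in as a plain number so the behavior can be followed with fixed timestamps instead of real waiting.

```javascript
const repo_cache = {
  result: null,
  last_updated: 0,   // timestamp (ms) of the last refresh
  timeout: 1000      // how long (ms) a cached result stays valid
};

// The cached data is usable only if it exists and has not expired.
// It counts as expired once last_updated + timeout is less than `now`.
function isFresh(cache, now) {
  return cache.result !== null && cache.last_updated + cache.timeout >= now;
}

let fetchCount = 0;

// Hypothetical stand-in for the real HTTP request.
function fakeRequest(callback) {
  fetchCount += 1;
  callback(null, JSON.stringify([{ name: 'example-repo' }]));
}

// `now` is supplied by the caller; a real script would use Date.now().
function getRepos(now, callback) {
  if (isFresh(repo_cache, now)) {
    return callback(null, repo_cache.result);
  }
  fakeRequest(function (error, body) {
    if (error) return callback(error);
    repo_cache.result = JSON.parse(body);
    repo_cache.last_updated = now;   // remember when the cache was refreshed
    callback(null, repo_cache.result);
  });
}

getRepos(0, function () {});      // miss: first request
getRepos(1000, function () {});   // hit: still within the timeout
getRepos(2000, function () {});   // expired: cache is refreshed
getRepos(3000, function () {});   // hit: served from the refreshed cache
console.log('requests made: ' + fetchCount); // prints "requests made: 2"
```

Passing the time in as an argument is a design choice made only for this sketch; it keeps the example deterministic and makes the expiry rule easy to check by hand.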

I got the following results to the right when I ran the script.  Since the timeout was set to 1 second and the time between requests was 1 second, I was able to cache the result for a single request, and then it was invalidated and refreshed and then accessed for the final function call.

So that's, at a VERY basic level, what caching is.  There are a lot of things you can do to improve it and learn more about caching, like using function names and parameters to create a hash to store the results in.  This function was just a quick and dirty way to create a cache for a script.  So hopefully, if you're new to web development, this cleared up caching a bit and made it a little easier to understand.
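To give a rough idea of the function-name-and-parameters approach, here is one possible sketch.  The wrapper cached and the example function add are hypothetical, and a plain string key built with JSON.stringify stands in for a real hash.

```javascript
// One shared store for all cached results.
const cache = new Map();

// Hypothetical helper: wraps `fn` so its results are stored under a key
// built from the function's name and its arguments.
function cached(fn) {
  return function (...args) {
    const key = fn.name + ':' + JSON.stringify(args);
    if (!cache.has(key)) {
      cache.set(key, fn(...args));   // compute once, then remember
    }
    return cache.get(key);
  };
}

// Example function with a counter to show how often it really runs.
let calls = 0;
function add(a, b) {
  calls += 1;
  return a + b;
}

const cachedAdd = cached(add);
console.log(cachedAdd(1, 2)); // computed
console.log(cachedAdd(1, 2)); // served from the cache
console.log(cachedAdd(2, 3)); // different arguments, computed again
console.log('calls: ' + calls); // prints "calls: 2"
```

This only works for arguments that JSON.stringify can represent faithfully, and it never expires entries, so it would need the same kind of timeout shown earlier before being used for data that changes.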

Feel free to download the GitHub Repository and grab all the scripts here:
