Caching Data in a Distributed System
By: Jared White, Senior Software Developer, SPECTRUM
Cached Data – Data held in working memory of software.
Distributed System – Software installed and run in separate environments but working together as a single system.
SPECTRUM is a large enterprise application that provides massive amounts of functionality in the financial sector.
SPECTRUM can be installed on a single application server. However, some SPECTRUM users have processing volumes or uptime requirements that a single server cannot meet. In these cases SPECTRUM is installed and run in a clustered environment (a distributed system).
Cached data is data that is proactively stored in working memory, usually to speed up the processing of individual system requests. Examples include a list of states in the USA, a list of web pages in the application, or a list of valid account status codes. Long-term storage of these lists is maintained in a database, and when the application needs a list it gets it from the database. Database operations can be very time-consuming, which in turn hurts application performance. To speed things up, this data can be cached in working memory.
Think of it like this: the application is one office and the database is another office. When the application needs data, a worker leaves the application office, walks to the database office, gets the data, and brings it back to the application office. After the information is used, it is put in the trash. Now repeat that one million times. The application office is now full of trash, and we need to take it out. The data is gone, so the next request means another trip to the database office.
Now let us work smarter. We get a filing cabinet to hold data that we will use frequently. After we get the list of account statuses from the database, we put that list in the filing cabinet. Now when the list is needed, we go to the filing cabinet instead of the database office. We just saved a lot of time. The filing cabinet is the cache. (In practice, a trip to the database takes far longer than a lookup in local memory.)
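The filing cabinet idea can be sketched in a few lines of Java. This is only an illustration, not SPECTRUM code: the class names, the `accountStatuses` key, and the status codes are made up, and the "database" is a plain function that counts how often it is called.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache: the "filing cabinet" in front of the "database office".
class ReadThroughCache<K, V> {
    private final Map<K, V> cabinet = new ConcurrentHashMap<>();
    private final Function<K, V> databaseLoader; // stand-in for a real database query

    ReadThroughCache(Function<K, V> databaseLoader) {
        this.databaseLoader = databaseLoader;
    }

    // Returns the cached value; the loader runs only when the key is not in the cabinet yet.
    V get(K key) {
        return cabinet.computeIfAbsent(key, databaseLoader);
    }
}

class ReadThroughCacheDemo {
    public static void main(String[] args) {
        AtomicInteger databaseTrips = new AtomicInteger();
        ReadThroughCache<String, List<String>> cache = new ReadThroughCache<>(key -> {
            databaseTrips.incrementAndGet();            // count each walk to the database office
            return List.of("OPEN", "CLOSED", "FROZEN"); // made-up account status codes
        });

        cache.get("accountStatuses"); // first request: goes to the database
        cache.get("accountStatuses"); // second request: served from the filing cabinet
        System.out.println("Database trips: " + databaseTrips.get());
    }
}
```

Two requests for the same list result in a single trip to the database; the second request is answered from memory.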
Now the application office has become very busy, so we create a duplicate application office. The new application office has its own filing cabinet for frequently used data. Everything works well until one office modifies the list of account status codes in its filing cabinet. Now the application offices have different data and no longer behave the same way. This is a problem. The two application offices represent a clustered environment. We call each application instance a node.
We need to find a way to keep data in the two application nodes the same. When data is changed in one node (office) how do we keep it the same in the other node (office)? There needs to be a mechanism to keep each node’s cached data the same (in sync).
Both open source and commercial software is available to keep cached data in sync, but in our experience this software is heavy and difficult to configure and maintain. So we wanted to create a simple solution that would work well in SPECTRUM.
Since we have multiple nodes but only one database, we could update the database and have each node periodically check it for changes to cached data. The problem is that a node's cached data is out of sync from the time the database is updated until the node next checks for the update.
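To make the polling problem concrete, here is a small Java sketch. Everything in it is hypothetical: the version counter, the class names, and the status codes are assumptions for illustration, and the poll is triggered by hand rather than by a timer.

```java
import java.util.List;

// Hypothetical single shared database holding the list and a version counter.
class SharedDatabase {
    private List<String> statuses = List.of("OPEN", "CLOSED");
    private long version = 1;

    synchronized void update(List<String> newStatuses) {
        statuses = List.copyOf(newStatuses);
        version++;
    }

    synchronized long version() { return version; }
    synchronized List<String> statuses() { return statuses; }
}

// Hypothetical node that refreshes its cache only when it polls.
class PollingNode {
    private final SharedDatabase db;
    private List<String> cached;
    private long cachedVersion; // starts at 0, so the first poll always loads

    PollingNode(SharedDatabase db) {
        this.db = db;
        poll();
    }

    // A real system would call this on a timer; here it is invoked by hand.
    void poll() {
        long current = db.version();
        if (current != cachedVersion) {
            cached = db.statuses();
            cachedVersion = current;
        }
    }

    List<String> cached() { return cached; }
}

class PollingDemo {
    public static void main(String[] args) {
        SharedDatabase db = new SharedDatabase();
        PollingNode node = new PollingNode(db);

        db.update(List.of("OPEN", "CLOSED", "FROZEN")); // another node writes a change
        System.out.println("Before next poll: " + node.cached()); // still the old list
        node.poll();
        System.out.println("After next poll:  " + node.cached()); // now up to date
    }
}
```

Between the update and the next poll, the node keeps serving the old list. A shorter polling interval narrows that window but adds database traffic, which is the cost caching was meant to avoid.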
Because the database is our long-term storage, the database must be updated when the data is changed. When the application is shut down, the data stays in the database until the application starts and gets the data again. But how can we instantly update each node when the data is changed on one of the other nodes? The answer: we should notify each node when the data is modified.
Going back to the office metaphor, when cached data (https://en.wikipedia.org/wiki/Cache_(computing)) is modified, a worker leaves the application office and takes the modification to the database. In order to keep the other nodes in sync, we need to send another worker to notify the other application offices that a change has been made. In software we can do that with a message system. When a node gets a message that cached data has changed, it will send a worker to the database to get the updated data. This will shorten the time that the data is out of sync to milliseconds.
Now we can scale our application to handle large amounts of processing by adding new application nodes. Notifications through a message system keep the cached data in sync, so processing on each node stays fast as well.
This is a high-level description of how we do this in SPECTRUM.
Cached data must be in a class that implements the SPECTRUMCache interface, and that class must be registered with the SPECTRUMCacheManager.
The SPECTRUMCacheManager is responsible for maintaining a list of SPECTRUMCache objects. It is also responsible for communication between nodes. When a SPECTRUMCache object is updated, it calls refreshRemote on itself. Then refreshRemote calls refreshOtherNodes on the SPECTRUMCacheManager. The SPECTRUMCacheManager then sends a message to the RabbitMQ server. The other nodes subscribe to RabbitMQ and receive all messages that notify of cached data changes. When a message comes in, the SPECTRUMCacheManager on the receiving node finds the appropriate SPECTRUMCache object and calls refreshLocal. The SPECTRUMCache object then refreshes its own data.
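The flow described above can be approximated with the following Java sketch. It is not the SPECTRUM source. The names SPECTRUMCache, SPECTRUMCacheManager, refreshRemote, refreshOtherNodes, and refreshLocal come from the description, but their signatures are assumptions. The `name()` method, `onRefreshMessage`, `AccountStatusCache`, and `InMemoryBroker` are hypothetical, and the broker is a simple in-memory stand-in for the RabbitMQ server rather than the RabbitMQ client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Method names come from the description; the signatures are assumptions.
interface SPECTRUMCache {
    String name();        // hypothetical: key used to find the matching cache on other nodes
    void refreshLocal();  // reload this node's copy from the database
    void refreshRemote(); // ask the manager to notify the other nodes
}

// In-memory stand-in for the RabbitMQ server (not the RabbitMQ client API).
class InMemoryBroker {
    private final List<SPECTRUMCacheManager> subscribers = new ArrayList<>();

    void subscribe(SPECTRUMCacheManager manager) { subscribers.add(manager); }

    void publish(SPECTRUMCacheManager sender, String cacheName) {
        for (SPECTRUMCacheManager m : subscribers) {
            if (m != sender) m.onRefreshMessage(cacheName); // skip the originating node
        }
    }
}

// One manager per node: keeps the node's caches and talks to the broker.
class SPECTRUMCacheManager {
    private final Map<String, SPECTRUMCache> caches = new HashMap<>();
    private final InMemoryBroker broker;

    SPECTRUMCacheManager(InMemoryBroker broker) {
        this.broker = broker;
        broker.subscribe(this);
    }

    void register(SPECTRUMCache cache) { caches.put(cache.name(), cache); }

    void refreshOtherNodes(SPECTRUMCache cache) { broker.publish(this, cache.name()); }

    // Hypothetical handler invoked when a refresh message arrives.
    void onRefreshMessage(String cacheName) {
        SPECTRUMCache cache = caches.get(cacheName);
        if (cache != null) cache.refreshLocal();
    }
}

// Hypothetical cache of account status codes.
class AccountStatusCache implements SPECTRUMCache {
    private final SPECTRUMCacheManager manager;
    private final Supplier<List<String>> database; // stand-in for a database query
    private List<String> statuses;

    AccountStatusCache(SPECTRUMCacheManager manager, Supplier<List<String>> database) {
        this.manager = manager;
        this.database = database;
        manager.register(this);
        refreshLocal(); // initial load
    }

    public String name() { return "accountStatuses"; }
    public void refreshLocal() { statuses = database.get(); }
    public void refreshRemote() { manager.refreshOtherNodes(this); }
    List<String> statuses() { return statuses; }
}

class CacheSyncDemo {
    public static void main(String[] args) {
        List<String> databaseRows = new ArrayList<>(List.of("OPEN", "CLOSED"));
        Supplier<List<String>> database = () -> List.copyOf(databaseRows);
        InMemoryBroker broker = new InMemoryBroker();

        AccountStatusCache nodeA = new AccountStatusCache(new SPECTRUMCacheManager(broker), database);
        AccountStatusCache nodeB = new AccountStatusCache(new SPECTRUMCacheManager(broker), database);

        databaseRows.add("FROZEN"); // node A writes the change to the database...
        nodeA.refreshLocal();       // ...updates its own copy...
        nodeA.refreshRemote();      // ...and notifies the other nodes
        System.out.println(nodeB.statuses()); // prints [OPEN, CLOSED, FROZEN]
    }
}
```

Note that the message carries only the name of the cache that changed, not the data itself. Each receiving node goes back to the database for the new values, which matches the office metaphor: the second worker only delivers the news that something changed.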
[Figure: Initialize a SPECTRUMCache object]
[Figure: Refresh from the modified SPECTRUMCache object]
ReferenceRoot is a SPECTRUMCache object
[Figure: Refresh message is received]