In-Memory, Distributed Data Structures in Node.js with Hazelcast

This article is about In-Memory Data Structures. But this time, about the “magic” ones: Lists, Maps, Queues, Locks, Streams… that work distributed across multiple nodes and across different runtimes (Java, C#, C++, Node.js, Python, Go).

Motivations

I had the opportunity to participate in the #AWSSummit2018 Berlin. There were many great talks, an awesome environment… and special people!
In one of the talks, “Reactive Microservices Architecture on AWS”, I noticed how the speaker was using the Vert.x framework in his examples and promoting the use of multiple caching layers…

Long story short, this was my favourite lesson:

Even if you are rich enough, you should NOT waste computing resources…, use caching!

But what is the relation between Vert.x and the topic of this story? Vert.x uses Hazelcast as its default cluster manager.
Hazelcast: The Operational In-Memory Computing Platform

Meet the Hazelcast IMDG project — A developer introduction

Running a Hazelcast single server instance using Docker:

docker run -p 5701:5701 --rm -ti hazelcast/hazelcast

In-Memory Data Grids

“In-memory processing has been a pretty hot topic lately. Many companies that historically would not have considered using in-memory technology because it was cost prohibitive are now changing their core systems’ architectures to take advantage of the low-latency transaction processing that in-memory technology offers. This is a consequence of the fact that the price of RAM is dropping significantly and rapidly and as a result, it has become economical to load the entire operational dataset into memory with performance improvements of over 1000x faster. In-Memory Compute and Data Grids provide the core capabilities of an in-memory architecture…”
https://www.gridgain.com/resources/blog/in-memory-data-grid-explained

Distributed Data Structures

“In computer science, a data structure is a data organization and storage format that enables efficient access and modification.[1][2][3] More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.[4]…”
https://en.wikipedia.org/wiki/Data_structure

Leaving aside all the theory and extensive literature related to data structures, if you are a software developer, you are already using data structures such as Maps, Lists/Arrays or Sets; they are simply part of any process of retrieval, mapping or aggregation of data.
With Hazelcast, you still have access to these kinds of structures, with the big benefit that they are efficiently distributed across a network of nodes and accessible from multiple runtimes and programming languages.

The “distributed counter” case study

Let’s illustrate the idea above with an example:

“I have the need to count the number of realtime notifications my services have pushed to remote Web clients. I need the counter retrieval to be fast, as multiple other services check it on a per-second basis…”

Classic solution using a SQL/No-SQL persistent database:

  1. Create a “PushLogs” table/collection to store the logs related to message push activity. The model schema will look like this: {senderId: String, room: String, timestamp: Date, msgId: String}
  2. On every push request, log the action by adding a new entry to the collection.
  3. On count request, execute a count query to the collection.

This solution is simple and even works when I scale my application to multiple distributed instances…, but it is not fast/efficient enough! 👍

Extending the solution with Hazelcast:

  1. On instance start, also create/reference a distributed “Atomic Long” counter. If value is 0, assign the existing PushLogs table/collection entries count.
  2. On every push request, also increment the counter by 1.
  3. On count request, return counter value.

This solution remains simple and even works when I scale my application to multiple distributed instances…, but this time it is fast/efficient enough! 🙌 🚀

// on application instance start...
const counter = hazelcast.getAtomicLong('distributed-pushes-counter')
await counter.compareAndSet(0, await PushLogs.count({}))

// on push...
await counter.incrementAndGet()

// on request
await counter.get()
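
If you want to see why compareAndSet(0, …) makes the seeding step safe when several instances start at the same time, here is a minimal sketch you can run without a Hazelcast cluster. Everything in it is an assumption for illustration: FakeAtomicLong is a hypothetical in-memory stand-in that mimics only the three calls used above, and seedCounter, countPersistedLogs and demo are made-up names, not part of the Hazelcast client.

```javascript
// Hypothetical in-memory stand-in for a distributed atomic long.
// It mimics only compareAndSet, incrementAndGet and get.
class FakeAtomicLong {
  constructor () {
    this.value = 0
  }

  async compareAndSet (expected, update) {
    if (this.value !== expected) return false
    this.value = update
    return true
  }

  async incrementAndGet () {
    return ++this.value
  }

  async get () {
    return this.value
  }
}

// Seed the counter from the persisted log count.
// Only a caller that still sees 0 gets to write the value.
async function seedCounter (counter, countPersistedLogs) {
  return counter.compareAndSet(0, await countPersistedLogs())
}

async function demo () {
  const counter = new FakeAtomicLong()
  // Pretend the PushLogs collection already holds 42 entries.
  const countPersistedLogs = async () => 42

  // Two "instances" start together and both try to seed the counter.
  const results = await Promise.all([
    seedCounter(counter, countPersistedLogs),
    seedCounter(counter, countPersistedLogs)
  ])

  await counter.incrementAndGet() // one new push arrives
  return { results, total: await counter.get() }
}

demo().then(console.log)
```

Only one of the two seeding attempts succeeds, so the persisted count is applied once and later increments are not overwritten by a late starter.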

The other data structures Hazelcast supports in each runtime are summarised at https://hazelcast.org

Distributed Locks

A Distributed Lock is a very special data structure: it allows us to request and receive exclusive read/write access to a shared resource in a distributed environment.

The following example is self-explanatory: “Nobody else touches the console!” 😁

const lock = hazelcast.getLock('my-distributed-lock')
lock.lock().then(() => {
  // this function runs on only one instance
  // at a time across all distributed instances
  // ...
  console.log('Happy Locking!')
}).finally(() => lock.unlock())
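
One detail worth a closer look: the lock should only be released by whoever actually acquired it. Here is a small sketch of that pattern with async/await. Treat it as an illustration, not as the client's API: withLock is a hypothetical helper, and FakeLock is an in-memory stand-in that works inside a single process only, so the example runs without a cluster. The only assumption about the real lock is that lock() and unlock() return promises, as in the snippet above.

```javascript
// Hypothetical helper: run `task` while holding `lock`, then always release it.
// unlock() is reached only if lock() succeeded.
async function withLock (lock, task) {
  await lock.lock()
  try {
    return await task()
  } finally {
    await lock.unlock()
  }
}

// In-memory stand-in for a distributed lock (single process only).
class FakeLock {
  constructor () {
    this.locked = false
    this.waiters = []
  }

  async lock () {
    if (!this.locked) {
      this.locked = true
      return
    }
    // Somebody holds the lock: wait until it is handed over.
    await new Promise(resolve => this.waiters.push(resolve))
  }

  async unlock () {
    const next = this.waiters.shift()
    if (next) {
      next() // hand the lock straight to the next waiter
    } else {
      this.locked = false
    }
  }
}

async function demo () {
  const lock = new FakeLock()
  const events = []
  const job = name => withLock(lock, async () => {
    events.push(`${name}:start`)
    await new Promise(resolve => setTimeout(resolve, 10))
    events.push(`${name}:end`)
  })

  // Both jobs are requested together, but they never overlap.
  await Promise.all([job('a'), job('b')])
  return events
}

demo().then(console.log)
```

The try/finally keeps the release next to the guarded code, and a failure while acquiring the lock never triggers an unlock on a lock that was not taken.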

Near Cache (in-RAM synced Maps)

“If you ask a mathematician how slow is a network I/O operation compared to RAM based data retrieval, he will just answer: It is infinitely slower!”

Map or Cache entries in Hazelcast are partitioned across the cluster members. Hazelcast clients do not have local data at all. Suppose you read the key k a number of times from a Hazelcast client or k is owned by another member in your cluster. Then each map.get(k) or cache.get(k) will be a remote operation, which creates a lot of network trips. 
If you have a data structure that is mostly read, then you should consider creating a local Near Cache, so that reads are sped up and less network traffic is created.
http://docs.hazelcast.org/docs/latest-development/manual/html/Performance/Near_Cache/index.html
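
To make the idea concrete, here is a rough sketch of what a Near Cache does conceptually: keep a local copy after the first read and drop it when the entry changes. This is not how Hazelcast implements it. FakeRemoteMap, NearCachedMap and the trips counter are hypothetical names invented for this example, and the "network" is simulated in memory.

```javascript
// Hypothetical stand-in for a remote map; it counts simulated network trips.
class FakeRemoteMap {
  constructor () {
    this.data = new Map()
    this.trips = 0
    this.listeners = []
  }

  async get (key) {
    this.trips++
    return this.data.get(key)
  }

  async put (key, value) {
    this.trips++
    this.data.set(key, value)
    // Tell every near cache that this key changed.
    this.listeners.forEach(listener => listener(key))
  }

  onChange (listener) {
    this.listeners.push(listener)
  }
}

// Minimal near cache: serve repeated reads locally, drop entries on change.
class NearCachedMap {
  constructor (remote) {
    this.remote = remote
    this.local = new Map()
    remote.onChange(key => this.local.delete(key))
  }

  async get (key) {
    if (this.local.has(key)) return this.local.get(key)
    const value = await this.remote.get(key)
    this.local.set(key, value)
    return value
  }
}

async function demo () {
  const remote = new FakeRemoteMap()
  await remote.put('k', 'v1') // first trip
  const near = new NearCachedMap(remote)

  await near.get('k') // miss: goes to the remote map
  await near.get('k') // hit: served from local memory
  const tripsAfterReads = remote.trips

  await remote.put('k', 'v2') // invalidates the local copy of 'k'
  const fresh = await near.get('k') // miss again, reads the new value
  return { tripsAfterReads, fresh }
}

demo().then(console.log)
```

The second read costs no trip at all, and the invalidation on change is what keeps the local copy from going stale.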

The “Locations Management API” case study

Let’s illustrate the idea above with an example:

Within my X organization, almost all business processes revolve around locations. The locations API response time for data retrieval endpoints is required to be the lowest, so it does not impact the performance of dependent APIs…

// on application instance start, init near cache Map
const {Client, Config} = require('hazelcast-client')
const nearCachedMapName = 'my-holy-fast-map'

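// the near cache config is registered on the client config, keyed by map name
const cfg = new Config.ClientConfig()
const nearCacheCfg = new Config.NearCacheConfig()
nearCacheCfg.name = nearCachedMapName
nearCacheCfg.invalidateOnChange = true
cfg.nearCacheConfigs[nearCachedMapName] = nearCacheCfg

const hazelcast = await Client.newHazelcastClient(cfg)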
const locations = hazelcast.getMap(nearCachedMapName)

// on application instance start, fill map with locations once
if (await locations.size() === 0) {
  const all = await LocationsService.findAll()
  // wait for every put so a failed write is not silently dropped
  await Promise.all(all.map(location => locations.put(location._id, location)))
}

// on location change
await locations.put(location._id, location)

// on locations retrieval "GET /locations"
await locations.values()

Polyglot

Hazelcast Data Structures are “magically” accessible from different runtimes. This means the Distributed Lock, the Near Cache Map and many others can be used and shared between the programming languages and runtimes you love.

So, let’s get that Node.js Distributed Lock instance from C++:

// Get a distributed lock called "my-distributed-lock"
ILock lock = hz.getILock("my-distributed-lock");
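// Now acquire the lock and execute some guarded code.
lock.lock();
try {
  // do something here
} catch (...) {
  // release the lock on failure, then let the error propagate
  lock.unlock();
  throw;
}
// release the lock on the normal path as well
lock.unlock();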

The data structures Hazelcast supports in each runtime are summarised at https://hazelcast.org

Conclusions

“Caching is the last mile for performance optimisations, and the one that brings you the biggest improvements… when implemented right!”

Out of laziness, many developers (including me) tend to stress SQL/No-SQL databases by using them as the central storage for our applications’ “distributed state” management.

When using Hazelcast IMDG, “GOD” mode Data Structures just become available, everywhere. Give it a try, that is my invitation!
