Case Study Intro

Please forgive some of the vagueness here. Our Enterprise Client asked us not to reveal their name, so we’ve replaced their business name with “Enterprise Client.” They were also concerned that their competition or a malicious actor might learn certain other details, so we replaced some of the real functionality with something more generic. We respect our Enterprise Client and we have an ongoing long-term business relationship with them. But we’re also proud of our work and want the world to know we’re here and this is what we do.


When Big Room was introduced to this Enterprise Client’s needs, we certainly had our work cut out for us: we were tasked with replatforming their existing legacy application to Node.js to modernize their platform and overcome some long-standing challenges. The application is extensive. A million complex records churn through it every day, and they are fully searchable. End users can manage their data and documents with minimal friction, and business owners can manage interactions via a robust tracking system while monitoring the performance of their account (to which they subscribe, as this is a SaaS platform). Though we don't have enough room here to cover our approach to all of these topics, I’ve selected a nice variety of challenges to discuss that I think you might find interesting.

Now to the tech! As I mentioned, we were tasked with rebuilding the entire existing legacy application with feature parity, plus new functionality, on our stack of choice: Node.js and React. Exciting! This meant full-on greenfield development using tech we love and stand by 🎉!! This was a key opportunity to show our client that the best practices and open source tools we’ve been working so hard on can handle their big ol’ Enterprise application. The project was also great for us because a ton of feedback from it made its way back into code contributions that improved our tools. In this next section, we’ll go over some notable challenges and how we solved them with our tooling, complete with examples!


Problem Solving

Application Services

Problem: “We want to add an email to our Mailchimp list when a user submits an email signup form.”

Solution: Mailchimp is a web service that revolves around building lists of email addresses for the purpose of running email campaigns. Our Enterprise Client collects the email addresses of users on the SaaS platform so that it can send relevant messages to each user.

The first thing we do is identify how we intend to interact with Mailchimp. After some research we find there’s a REST API we can use to update mailing lists. We notice there’s no other code regarding Mailchimp in the app yet, so we’ll create a place for it in a new application service. Organizing code into app services helps enforce proper separation of concerns.

What exactly is an application service? Here are some thoughts on how we view them from our documentation:

Speaking generally, one might write code in a "service" as a way to group related business logic, data transactions (e.g. saving records to a database), or calls to external APIs. Sometimes the service layer denotes a "headless" interface to all the actions you can take in an application, independent of any transport (such as HTTP), and hiding the app's data layer (or model) from its consumers.

Pro Tip: One great thing about organizing your app in services is they are in prime position to be shared across codebases! Just be sure there isn’t any app-specific code and you’re all set. I’ve even published npm packages that just hold a hapipal server with a registered schmervice. hapipal’s boilerplate encourages all app code to be stored in a hapi plugin so this makes things very composable!

More complex app service example: An article service for a Medium blog type application

Now when a user fills out the front-end form, the payload will be sent to a route which will call the method mailchimpService.subscribe(). The general goal is to keep complex logic out of the route and keep it in a service (or across multiple services). Services are always welcome to utilize other services as long as there’s a clear separation of concerns.
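To make the route-to-service handoff concrete, here is a minimal sketch rather than the project's actual code. It assumes Mailchimp's Marketing API v3 "add list member" endpoint with HTTP Basic auth. The option names (apiKey, listId, fetchImpl) and the signupHandler helper are hypothetical. In a hapipal app the class would typically extend Service from schmervice; that dependency is left out so the sketch runs on its own in Node 18+.

```javascript
'use strict';

// Sketch only: a plain class standing in for an application service.
// apiKey, listId, and fetchImpl are hypothetical option names.
class MailchimpService {

    constructor({ apiKey, listId, fetchImpl = globalThis.fetch }) {

        this.apiKey = apiKey;
        this.listId = listId;
        this.fetchImpl = fetchImpl;
        // Mailchimp API keys end in a data-center suffix, e.g. "abc123-us6"
        this.dataCenter = apiKey.split('-').pop();
    }

    // Pure helper: describes the HTTP request without sending it
    buildSubscribeRequest(email) {

        const credentials = Buffer.from(`anystring:${this.apiKey}`).toString('base64');

        return {
            url: `https://${this.dataCenter}.api.mailchimp.com/3.0/lists/${this.listId}/members`,
            options: {
                method: 'POST',
                headers: {
                    'content-type': 'application/json',
                    authorization: `Basic ${credentials}`
                },
                body: JSON.stringify({ email_address: email, status: 'subscribed' })
            }
        };
    }

    async subscribe(email) {

        const { url, options } = this.buildSubscribeRequest(email);
        const fetchImpl = this.fetchImpl;
        const res = await fetchImpl(url, options);

        if (!res.ok) {
            throw new Error(`Mailchimp responded with status ${res.status}`);
        }

        return await res.json();
    }
}

// The route handler stays thin: read the payload, then delegate to the service
const signupHandler = (mailchimpService) => async (request) => {

    const { email } = request.payload;

    await mailchimpService.subscribe(email);

    return { subscribed: true };
};
```

Passing fetchImpl in through the constructor is a design choice for this sketch: it lets the service be exercised in tests with a fake HTTP client, and it keeps all Mailchimp-specific details out of the route.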


CLI Interface

Problem: “Is there a quick way to get us arbitrary stats from the database on command? We don’t want to spend the effort to build a UI; we’d rather just be able to ask you for these stats from time to time.”

Solution: This is a fairly unique situation, where the client wants to be able to see some app insights at any time but they don’t want to spend the effort to have a UI built for it just yet. They want their experience to be: Ask the developer for some statistics, then the developer provides them.

Use this hypothetical conversation to get an idea of the thinking behind a decision to use a CLI:

> Where do you think this logic should go?

>> We could put it in the appropriate app service!

> Yes, good thinking — but since this logic won’t be used anywhere else in the app, I think we could also be fine putting the logic in a CLI. Later on, it’s perfectly fine to refactor the logic to a service function and then call that from the CLI!

>> O-K that sounds good to me! The only thing I can think of is that creating a CLI seems out-of-scope for this, doesn’t it?

> Yes — absolutely. We’ve got a tool to help us, though! hpal takes care of all the boilerplate CLI setup. Out-of-the-box you can run and list any specified commands for a hapipal plugin. You can create your own CLI “commands” by plopping logic in named functions in a special file. It’s pretty awesome!

Pro Tip: hpal commands are a great way to take advantage of an application service’s position to share logic between routes and other parts of the application (for example, non-HTTP things like CLIs).

Here’s how we might write the logic for a CLI command recent-entries using hpal:

Example: Gist
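The linked gist isn't reproduced here, so the following is a rough sketch of what such a command could look like, not the project's actual code. It assumes the hpal command signature (server, args, root, ctx) and that a plugin exposes its commands under the "commands" key, with a camelCased key such as recentEntries run as recent-entries; please confirm both against the hpal docs. The recordService name and its countCreatedSince() method are hypothetical.

```javascript
'use strict';

// Turns the first CLI argument into a number of days (defaults to 30)
const parseDays = (arg, fallback = 30) => {

    if (arg === undefined) {
        return fallback;
    }

    const days = Number(arg);

    if (!Number.isInteger(days) || days <= 0) {
        throw new Error(`Expected a positive whole number of days, got "${arg}"`);
    }

    return days;
};

// Assumed hpal command shape: async (server, args, root, ctx).
// recordService and countCreatedSince() are hypothetical names.
const recentEntries = async (server, args) => {

    const days = parseDays(args[0]);
    const since = new Date(Date.now() - days * 24 * 60 * 60 * 1000);

    // server.services() is the decoration added by schmervice
    const { recordService } = server.services();
    const count = await recordService.countCreatedSince(since);

    console.log(`${count} entries created in the past ${days} days`);

    return count;
};

// Inside the plugin's register(), these would be exposed with
// server.expose('commands', commands)
const commands = {
    recentEntries: {
        description: 'Counts entries created in the past N days',
        command: recentEntries
    }
};
```

Because the command only talks to the server through a service, the same counting logic could later be reused by a route or a dashboard without touching the CLI.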

Usage:

>> How do I use this CLI?

> At any time list your available commands with

$ hpal run --list

The output will be

...
Here are some commands found on your server:
hpal run enterprise-client-api:recent-entries

Running the command with an argument:

$ hpal run enterprise-client-api:recent-entries 90
...

(command output for the past 90 days)


Processing ~1 Million records daily

Problem: “We have a number of customers who have us automatically scrape data from their site to be put into our search. There are approximately 1 million records we currently process every day.”

Dev Note: Processing these feeds will be CPU-intensive, so we dedicate a single EC2 instance to handle this load.

Solution: I won’t be able to fit the full discussion and solution here, but I’ll talk at a high level and touch on a few cool things 😁

The application previously used cron jobs to implement these XML feed processors, which we call “bots.” Cron jobs made it difficult to scale this CPU-intensive work horizontally across multiple machines, and were error-prone since they required manual maintenance. When we were planning how to run these bots on a timer, we saw an opportunity to improve the system while we worked to revamp it.

Queues! To solve our timer-hosting and on-demand processing issues, we decided to use a message queue to transport commands to run each individual bot. A message queue is a piece of distributed infrastructure that receives, holds, and hands out data to a number of workers. We chose Amazon SQS to implement our queue. This helps solve our problem because Amazon hosts that piece of infrastructure, and our on-demand processing happens as soon as a message is picked up by one of the servers polling the queue.
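As a rough illustration of the worker side, here is a transport-agnostic sketch, not our production code. The message shape ({ bot: "name" }) and the queue.receive() / queue.remove() interface are assumptions made for this example; with SQS they would wrap the ReceiveMessage and DeleteMessage operations.

```javascript
'use strict';

// Parses a queue message body into a command.
// The { bot: 'name' } shape is an assumption for this sketch.
const parseCommand = (body) => {

    const { bot } = JSON.parse(body);

    if (typeof bot !== 'string' || bot === '') {
        throw new Error('Message is missing a bot name');
    }

    return { bot };
};

// Processes one batch of messages and returns how many succeeded.
// `queue` is any object with receive() -> [{ id, body }] and remove(id).
const processBatch = async ({ queue, runBot }) => {

    const messages = await queue.receive();
    let handled = 0;

    for (const message of messages) {
        try {
            const { bot } = parseCommand(message.body);

            await runBot(bot);

            // Remove only after success, so a failed message stays on the queue for a retry
            await queue.remove(message.id);
            handled++;
        }
        catch (err) {
            console.error(`Failed to process message ${message.id}: ${err.message}`);
        }
    }

    return handled;
};
```

A worker process would call processBatch() in a loop. Adding capacity then means starting more workers against the same queue.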

Architecture Overview

Here is our architecture at a very high level, with details below:

Diagram of Architecture

  • A queue service to interface with AWS SQS to receive commands to process feeds. No Enterprise Client-specific code exists in this service.
  • A models folder with files for each database model we use during feed processing. We’ll need the Record model, for example, since our bots need to insert records into the database.
  • An Elasticsearch service to load records into our search engine.
  • A bots folder that includes a file for each of the 90+ feeds we process. These files are where we put logic particular to each feed. Every feed will be different since every source is different — some are from similar sources, which helps, but almost all of them require some tailoring to get imported records just right!
  • A feed application service that ties everything together. It has one method, createRecordsFromFeed(), which receives streams for (possibly very large) feeds and turns them into records in the database and search engine. Creating records from a feed is a big task because not all feeds are the same size or type: some are gigantic, while others have only a handful of relevant records. Some feeds are XML, others JSON. This service calls on the logic from our individual bot files to run specific feed code. It also uses our connected database models to add records to the database, and uses our Elasticsearch service to push new records into our search engine.
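Here is a simplified sketch of how these pieces could fit together. Only createRecordsFromFeed() is a name from our app; everything else is an assumption for illustration: the acmeBot file, its toRecord() method, recordModel.insertMany(), search.index(), and the batchSize option. XML/JSON parsing is left out, so items is assumed to be an (async) iterable of already-parsed feed items.

```javascript
'use strict';

// A hypothetical bot file: feed-specific logic lives here
const acmeBot = {
    name: 'acme',
    // Returns null to skip items that aren't relevant
    toRecord: (item) => {

        if (!item.title) {
            return null;
        }

        return { title: item.title.trim(), sourceId: String(item.id), source: 'acme' };
    }
};

// Sketch of the feed service's single method.
// recordModel.insertMany() and search.index() are hypothetical stand-ins
// for the database model and the Elasticsearch service.
const createRecordsFromFeed = async ({ bot, items, recordModel, search, batchSize = 500 }) => {

    let batch = [];
    let total = 0;

    const flush = async () => {

        if (batch.length === 0) {
            return;
        }

        const saved = await recordModel.insertMany(batch);

        await search.index(saved);
        total += saved.length;
        batch = [];
    };

    // Consuming the feed item by item avoids loading a gigantic feed into memory
    for await (const item of items) {
        const record = bot.toRecord(item);

        if (record) {
            batch.push(record);
        }

        if (batch.length >= batchSize) {
            await flush();
        }
    }

    // Write whatever is left over after the last full batch
    await flush();

    return total;
};
```

Batching keeps the number of database and search engine round trips low, while the per-bot toRecord() function is the only piece that changes from feed to feed.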

With this setup, we’re able to individually toggle bots on and off (by sending or not sending messages on our queue for a certain bot), and add new ones as customers are onboarded. We have nothing to worry about in terms of scalability of the message queue infrastructure, and we can always add more workers to the message queue if needed. Adding bots otherwise involves acquiring feeds, preparing our infrastructure to send messages to our queue, writing custom bot logic to format records from the feed into database records, and launching.

Having this very nice and repeatable system for adding new bots makes estimating much easier. The unknowns to consider are “How similar is this feed to others that we’ve created bots for already?” and “Is there any fundamentally new functionality, such as supporting a format other than XML or JSON?” I’m very pleased with the effort we spent on architecture because we’re still reaping the benefits as requests to add new bots trickle in, and I can leverage a base estimate that I’ve already prepared for each new bot.
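The on/off toggle can be pictured with a small sketch of the scheduling side. The bot registry, its enabled flag, the message shape, and the injected send() function are all hypothetical names chosen for this example.

```javascript
'use strict';

// Hypothetical bot registry; `enabled` is the on/off switch
const bots = [
    { name: 'acme', enabled: true },
    { name: 'globex', enabled: false },
    { name: 'initech', enabled: true }
];

// Builds one queue message per enabled bot; disabled bots simply get no message
const messagesForEnabledBots = (registry) => {

    return registry
        .filter((bot) => bot.enabled)
        .map((bot) => ({ body: JSON.stringify({ bot: bot.name }) }));
};

// A scheduler would call this on a timer; `send` is an injected function
// that puts a single message on the queue
const enqueueRuns = async (registry, send) => {

    const messages = messagesForEnabledBots(registry);

    for (const message of messages) {
        await send(message);
    }

    return messages.length;
};
```

Turning a bot off is then a data change rather than a deploy: flip its flag and no further messages are sent for it.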


Just a Slice 🍕

Here we discussed just a slice of the complicated yet delicious pie that is the overall Enterprise Client project! As the lead maintainer of this project, I feel at ease having battle-tested best practices and plenty of examples at my disposal to confidently continue building new features and refactoring existing ones. Throughout our examples, we mentioned modules under the hapipal umbrella. We’re very proud that our best practices proved to scale with this application, and that this suite of tools has proven itself as a provider of great DX at scale.

Itching to get something going?

At any time, run npx hpal new my-project

On hapi

hapi is a Node.js framework

(From Why You Should Consider hapi)

hapi practically invented node framework plugins, the request lifecycle, server methods, and user extensions

What is hapipal?

hapipal is an ecosystem of tools and best practices for the working hapijs developer

> Are you currently using hapi or open to learning about it? Planning out a large project? Looking for architecture inspiration? Don’t know where to put stuff? hapipal’s got something to say about it!

Concepts

Some detailed reading and best practices:

If you learn best by reading and running code

We have some example apps that take on some Real World challenges, implement hapipal best practices, and show off our patterns!

  • hapipal examples repo
    • This repo will be kept up-to-date with all the awesome hapipal examples we have for users to look at!
  • hapipal Real World
    • Real World is an app specification for a “Medium blog clone”. It’s a very helpful project for comparing how different frameworks approach the same set of problems.
    • Make sure to check out the cool pattern used for the display service in this project. This pattern helps answer the question: “What’s the best way to organize DTOs on the back-end?” We didn’t have time to go over it here, but we make great use of this for our Enterprise Client.
  • Fishbowl
    • Fishbowl is a realtime game (using websockets) built by Devin Ivy where users play a pictionary-esque game with each other. It is intended for use with a separate group video chat solution, built to help people come together to have fun during quarantine.
  • Also check out the gists linked to previously in this article for more clips and ideas.

That’s it!

🎉 🎊 🎉 Thank you so much for reading my first article! I’d like to especially thank the Big Room team for their editing help and support with this! If you have any questions about Big Room Studios or hapipal, would like to get in touch with the hapipal crew, or just want to say hi, feel free to reach out at hello@bigroomstudios.com!

