Background
Our customer, a biotech company specializing in cancer therapy, was recently acquired by a global Fortune 500 company. The acquisition and subsequent growth exposed severe challenges in updating and managing data in their Amazon Web Services (AWS) data lakes. Loading the lakes took 18 hours, and adding new users required complete database rebuilds. These pain points, along with other constraints of their monolithic data lake architecture, meant the databases were only updated weekly, far too slow for the people who needed daily dashboards and reports to run the business.
SingleStone was hired for a one-month Domain-Driven Discovery project to assess their current state, design a future-state data model and data lake environment, and create a roadmap to implementation.
Response
Our team quickly framed the problem around our customer’s desired outcomes:
- Increase business self-sufficiency
- Increase data publishing frequency
- Improve data warehouse maintainability
Using tools like Event Storming, we explored our customer’s patient journey, from feeling sick to cured, to understand how value (and data) flowed through the organization. In parallel, we mapped the current data architecture and its friction points using C4 models and a deep dive into their existing data model and entity relationships.
Our team worked closely with our customer to explore several design options before refining them into a proposed future state that organized their enterprise data model around the natural bounded contexts of their business. Instead of one large and unwieldy analytics database for the entire company, we proposed a set of eight smaller context-aligned warehouses, each of which could be loaded independently on a faster daily cadence.
Leveraging best practices from our Enterprise Data Framework, we designed a tiered approach to loading and transforming data from source systems, applying governance and capturing lineage along the way. For scale, we designed an Amazon Redshift cluster with one database per context. This created a clean separation of contexts at the database level, from source code through deployment, with the ability to run cross-database queries when needed. A single cluster simplified maintenance and security and gave their team a starting point that could scale to a multi-region solution in the future.
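To make the "one database per context, one shared cluster" idea concrete, here is a minimal sketch of a cross-database query issued through the Amazon Redshift Data API with boto3. It assumes an RA3 node type (which supports cross-database queries) and uses placeholder cluster, database, user, and table names, not the customer's actual schema or contexts.

```python
import boto3

# Sketch: two context-aligned databases live in the same Redshift cluster,
# so a report can join across them with three-part (database.schema.table)
# names when needed. All identifiers below are illustrative placeholders.
client = boto3.client("redshift-data", region_name="us-east-1")

sql = """
    SELECT p.patient_id, t.treatment_date
    FROM patients_db.public.patients AS p       -- one bounded context
    JOIN treatments_db.public.treatments AS t   -- another bounded context
      ON p.patient_id = t.patient_id
    WHERE t.treatment_date >= CURRENT_DATE - 30;
"""

response = client.execute_statement(
    ClusterIdentifier="analytics-cluster",  # the single shared cluster
    Database="patients_db",                 # database the session connects to
    DbUser="analytics_user",                # or pass SecretArn for credentials
    Sql=sql,
)

# The Data API is asynchronous: poll describe_statement / get_statement_result
# with this id to retrieve the rows.
print(response["Id"])
```

Each context's database can still be loaded and deployed on its own schedule; the shared cluster simply keeps cross-context reporting a query away rather than requiring data movement.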
We organized our designs and recommendations into an actionable roadmap that enabled our customer’s data warehouse team to modernize their architecture over time while working towards a clear future state design.
Result
In a single month, our customer went from pain to a plan. Using our visual methods and tools to drive collaboration, we helped our customer see how the patient journey could directly shape their data model and lakes. To continue modernizing and prepare for the future, we equipped their data team with a ready-to-execute roadmap aligned with their desired outcomes.