Corios was hired by a prominent insurance carrier to modernize their analytics and data practices across underwriting, pricing, claims, repairs, coverage, compliance, and regulatory support. They wanted to reduce the cost of data storage, align all their analysts on a consolidated set of tools and environments, and modernize the enterprise so they could react to climate events and other large-scale adverse events faster and more efficiently.
The Corios solution we use in these engagements is Corios Rosetta, which combines Corios software with a service methodology to inventory, score, prioritize, and modernize our clients’ SAS data and analytics assets. After inventorying their workloads, data, and teams, and interviewing leadership and subject matter experts, we recommended centralizing the workloads that relied on their primary atomic-level data warehouse (in Oracle), and moving their non-warehouse workloads and analysts to Python on Domino Data Labs for virtual analytic environment provisioning and archiving. We then spent the next six months modernizing the work of their 800+ analysts along this roadmap.
We brought several perspectives into this engagement that were critically important to the client’s success.
Change can be hard
Analyst behavior and readiness to adopt change are critically important, and there are four mindsets we commonly encounter. We segment the analyst community into groups who produce similar work and determine each group’s readiness for modernization. Then we target our migration efforts and coaching differently for each group.
- The “Ready to Go!” group (usually 10% of the analyst community) is already bought in to the benefits of modernization and serves as an internal champion for change. They often need very little direct support other than to point them in the right direction, but we stay in contact with them to help them socialize their early successes.
- The “Help Me Get Started” group (20% of the analyst community) prefers to have us provide written and video materials, especially case studies, side-by-side before-and-after examples, and “learn by doing” training.
- The third and largest group, “Coach Me On This Journey” (roughly 40%), needs active coaching and teaching experiences through multiple media, benefit reinforcement, peer teaming, and in some cases daily check-ins with their coach.
- Finally, every organization has members of the “Doubting Thomas” group (about 10% of the analyst community) who actively avoid change. We’ve found the best approach is to help the “Ready To Go!” group create their own success stories, to socialize those successes and lessons learned, and to encourage behavior change in this last group by demystifying the journey and making it less scary.
Finding the edges of the jellyfish without getting stung
The traditional means of defining the modernization roadmap and success criteria (i.e., walking the halls and interviewing analysts in conference rooms, especially during the pandemic) is like trying to pick up a jellyfish on the beach: no discrete edges, but lots of stinging tentacles. We’ve found that to succeed in this endeavor, you need a data-driven inventory to answer the who, which, when, where, and how-many questions, and then rely on interviews to address the why and how questions. In this engagement we scanned data about every analyst, every data file, every workload, and every line of code. We found, for instance, that out of 800 registered analysts, only 250 were actually active in performing analytics work. More interesting still, hundreds of workloads and thousands of data files were owned by analysts who no longer worked at the company.
Share the power
It’s far more efficient to enable the analysts in the enterprise to do most of the migration while we focus on the inventory, targeting, vision creation, training, coaching, and change management. If you try to throw a hundred consultants at hundreds of analysts, thousands of workloads, and hundreds of thousands of data files, you will end up with a chaotic and expensive mess. Instead, we found it smarter to leverage the knowledge the analysts already possess to ensure the quality and validation of the work they migrate, while we provide the supervision and tutelage to do it consistently and rapidly.
Meet the data where it rests
By having a complete inventory of all the workloads, we found that more than 40 analysts spent most of their time querying and exporting massive amounts of data (hundreds of terabytes over five years) out of the atomic-level data warehouse and across the network into their ad hoc analytics workloads, even though they didn’t fundamentally augment or combine new data on top of the warehouse data. All of that work could instead be performed inside the database, where the data was secure, backed up, and easily validated. The analysts only had to learn to convert their ad hoc analytics workloads to run in SQL, and then use a BI tool like Tableau, already connected to the warehouse, to build their reports. This reduced the volume of one-time and orphaned data on the file system and its archiving platform, and let the analysts spend more time actually analyzing and finding insights rather than moving data from one place to another.
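A minimal sketch of that shift, using an in-memory SQLite database as a stand-in for the client’s Oracle warehouse (the table and column names here are hypothetical):

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the Oracle warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claims (region TEXT, paid REAL);
    INSERT INTO claims VALUES ('NW', 100.0), ('NW', 50.0), ('SE', 75.0);
""")

# Before: export every detail row across the network, then aggregate locally.
all_rows = pd.read_sql("SELECT region, paid FROM claims", conn)
local_totals = all_rows.groupby("region")["paid"].sum()

# After: push the aggregation into the database; only summary rows move.
db_totals = pd.read_sql(
    "SELECT region, SUM(paid) AS paid FROM claims "
    "GROUP BY region ORDER BY region",
    conn,
    index_col="region",
)["paid"]

assert local_totals.equals(db_totals)  # same answer, far less data in flight
```

The same pattern applies at warehouse scale: the heavy lifting (joins, filters, aggregations) stays where the data is secured and backed up, and only the result set crosses the network.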
Scale the mountain
About 10-15% of their workloads would benefit from horizontal scaling beyond what a single-node Python/Pandas framework can easily support. Moving those workloads to Dask and Spark, which scale compute resources horizontally, became a natural next step.
Cloud storage cost efficiencies
When the analytics computations cannot be run in the database, heavily used analytic data benefits from distributed object storage in the cloud (e.g., Parquet datasets on Amazon S3), which aligns well with horizontally scaled computation.
Now that our client has upgraded their analytics workloads to take better advantage of corporate resources such as their primary relational database, with their open source analytics running on their on-premises Domino Data Labs environment, some of the next steps include:
- Achieving further storage efficiency by leveraging cloud storage for their large flat file analytic data on AWS S3.
- Migrating their largest and least efficient Python workloads to Spark. This can be done both in Domino Data Labs (on premises) and on Amazon EMR (in the cloud).
- Designing and writing their net new workloads to leverage best practices from the beginning so that they don’t build up more technical debt related to their analytics data and workloads.