Watching the Inventives

Lessons Learned from 5 years of SAS workload migration and modernization


(tip of the cap to Elvis Costello)
Corios specializes in modernizing an enterprise’s legacy SAS data and analytics assets by migrating them into modern cloud platforms like AWS and integrating these workloads with open-source frameworks. The Corios Rosetta Scanner is a software and consulting offering that starts this process by inventorying, scoring and prioritizing all your legacy data and analytics assets at a very detailed level.
What we’ve learned through these analytics asset inventory projects over the past 18 months is that you can sort and prioritize assets along four dimensions: value, cost, risk and transformation potential. Once assets are scored on those dimensions, we’ve identified markers that indicate which strategy best fits each one: migration, modernization, or even leaving the existing asset in place (because sometimes, when something’s not broken, it’s better to leave it alone).

Here are some useful metrics for benchmarking your own organization against the SAS clients whose assets we’ve moved to the cloud, integrated with open-source options, or both. For the metrics below, we define an “enterprise” SAS client as an organization with more than 100 registered user seats.
It’s important to note that these findings are not a result of using SAS software, nor are they limited to SAS users; indeed, many of these behaviors hold for users of any major software platform. We’re simply reporting the metrics we’ve gathered for SAS clients, because that is our area of expertise.

Analyst Community Value

A typical enterprise client of SAS Institute has hundreds of analysts.
  • Among this cohort, roughly 10% are high value, 20% are moderate value and the remainder are “learning their craft” or have inherited processes from predecessors and are not actively involved in tweaking or refining these processes.
  • Among these analysts, fewer than 10% of their workloads use high-powered analytic techniques, and fewer still use any machine learning techniques, leaving an enormous amount of potential business value untapped for the enterprise.
  • 15% of these analysts are high-value information producers, who build information assets used by two or more downstream consumers. On average, each producer serves roughly 15-30 downstream consumers.
  • Some organizations have a higher ratio of analysts who are both producers and consumers, and we have found this is a marker of an organization that has an effective culture for information sharing and cross-pollination.
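The producer and consumer metrics above can be derived from lineage data. The sketch below is a minimal illustration, assuming you already have hypothetical (asset owner, reader) pairs extracted from your workloads; `producer_stats` and its input format are our own illustration, not part of Corios Rosetta. It counts an analyst as a producer when two or more other analysts read their assets, and measures how many analysts both produce and consume.

```python
from collections import defaultdict


def producer_stats(edges: list[tuple[str, str]], analysts: set[str]) -> dict:
    """Summarize information sharing from hypothetical (owner, reader) lineage pairs."""
    consumers_of = defaultdict(set)  # owner -> distinct analysts who read the owner's assets
    consumers = set()                # analysts who read at least one asset owned by someone else
    for owner, reader in edges:
        if owner != reader:          # reading your own output is not sharing
            consumers_of[owner].add(reader)
            consumers.add(reader)
    # A producer builds assets used by two or more downstream consumers.
    producers = {a for a, readers in consumers_of.items() if len(readers) >= 2}
    both = producers & consumers
    avg = sum(len(consumers_of[p]) for p in producers) / len(producers) if producers else 0.0
    n = len(analysts)
    return {
        "producer_share": len(producers) / n if n else 0.0,
        "avg_consumers": avg,
        "both_share": len(both) / n if n else 0.0,
    }


analysts = {"ann", "bo", "cy", "di"}
edges = [("ann", "bo"), ("ann", "cy"), ("ann", "di"), ("bo", "ann"), ("bo", "cy"), ("cy", "cy")]
stats = producer_stats(edges, analysts)
print(stats)  # {'producer_share': 0.5, 'avg_consumers': 2.5, 'both_share': 0.5}
```

A rising `both_share` over time is one way to track the information-sharing culture described above.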

Workload Portfolio Value

A typical enterprise client of SAS maintains thousands of SAS workloads.
  • Of these workloads, only 20% produce high business value to the enterprise.
  • These workloads are owned by only 10-15% of the enterprise’s analyst community.
  • We define high business value using a workload scorecard assessed across multiple dimensions:
    • Workload sophistication and complexity
    • Number and size of data sources read by and produced by the workload
    • Data transformation and analytic components of the workload: e.g., creating new derived attributes, combining disparate sources of data, creating descriptive and/or predictive assets such as models
    • Number of downstream users of this workload’s results, and
    • Sophistication score of the user who is the creator/owner of the workload.
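A scorecard like this is typically a weighted sum of normalized dimension scores. The sketch below assumes each dimension has already been scored on a 0-1 scale (except downstream users, which is normalized against the portfolio maximum). The weights, field names and `high_value` cutoff are hypothetical placeholders for illustration, not the weights Corios uses.

```python
from dataclasses import dataclass

# Hypothetical weights for the five scorecard dimensions; they sum to 1.0.
WEIGHTS = {
    "complexity": 0.20,
    "data_footprint": 0.20,
    "transformation": 0.25,
    "downstream_users": 0.20,
    "owner_sophistication": 0.15,
}


@dataclass
class Workload:
    name: str
    complexity: float            # 0-1: workload sophistication and complexity
    data_footprint: float        # 0-1: number and size of sources read and produced
    transformation: float        # 0-1: derived attributes, joins, models
    downstream_users: int        # raw count of consumers of the results
    owner_sophistication: float  # 0-1: sophistication score of the creator/owner


def score(w: Workload, max_users: int) -> float:
    """Weighted sum of normalized dimension scores, in [0, 1]."""
    users = w.downstream_users / max_users if max_users else 0.0
    parts = {
        "complexity": w.complexity,
        "data_footprint": w.data_footprint,
        "transformation": w.transformation,
        "downstream_users": users,
        "owner_sophistication": w.owner_sophistication,
    }
    return sum(WEIGHTS[k] * v for k, v in parts.items())


def high_value(workloads: list[Workload], top_share: float = 0.20) -> list[str]:
    """Names of the top `top_share` fraction of workloads, ranked by score."""
    max_users = max((w.downstream_users for w in workloads), default=0)
    ranked = sorted(workloads, key=lambda w: score(w, max_users), reverse=True)
    k = max(1, round(len(ranked) * top_share)) if ranked else 0
    return [w.name for w in ranked[:k]]


workloads = [
    Workload("monthly_churn_model", 0.9, 0.8, 0.9, 30, 0.9),
    Workload("adhoc_extract", 0.2, 0.3, 0.1, 1, 0.3),
    Workload("regional_summary", 0.5, 0.5, 0.4, 12, 0.6),
    Workload("legacy_copy_job", 0.1, 0.6, 0.0, 0, 0.2),
    Workload("fraud_features", 0.8, 0.9, 0.8, 20, 0.8),
]
print(high_value(workloads))  # ['monthly_churn_model']
```

The 20% cutoff mirrors the share of high-value workloads reported above; in practice you would calibrate both the weights and the cutoff against analyst and business-owner review.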

Workload, Data and Infrastructure Costs

A typical enterprise client of SAS Institute:
  • Maintains hundreds of terabytes (some in the petabytes) of data files. But 80% of those data files haven’t been read or used in over 24 months, so millions of dollars are spent keeping unused data on high-cost storage.
  • Maintains thousands of workloads and hundreds of thousands of data files. But 15% of the analysts who created those assets don’t actively use SAS, and a whopping 40% of those analysts are no longer employed by your company. That’s a massive exposure in terms of intellectual property and compliance risk. Yet these enterprises don’t know how to find these files and fix the problem.
  • Among these thousands of workloads, roughly 30% unnecessarily move data out of your enterprise databases and across the network to local file systems and hard drives, placing an enormous burden on shared resources.
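Finding stale and orphaned files is largely a matter of walking the file system and comparing metadata against a threshold and a staff roster. The sketch below assumes SAS data sets use the `.sas7bdat` extension, uses the file’s last-access time as a proxy for “last read” (this is unreliable on file systems mounted with `noatime`), and treats the numeric owner ID as the owner unless you supply a hypothetical `owner_of` mapping to user names. The `active_users` set would come from your HR or identity system.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataFile:
    path: str
    last_access: datetime
    owner: str


def collect(root: str, owner_of=lambda st: str(st.st_uid)) -> list[DataFile]:
    """Walk `root` and record every .sas7bdat file with its last-access time and owner."""
    files = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            if name.lower().endswith(".sas7bdat"):
                full = os.path.join(dirpath, name)
                st = os.stat(full)
                accessed = datetime.fromtimestamp(st.st_atime, tz=timezone.utc)
                files.append(DataFile(full, accessed, owner_of(st)))
    return files


def triage(files: list[DataFile], active_users: set[str], now: datetime,
           stale_after: timedelta = timedelta(days=730)) -> tuple[list[str], list[str]]:
    """Split files into stale (unread for ~24 months) and orphaned (owner no longer active)."""
    stale = [f.path for f in files if now - f.last_access > stale_after]
    orphaned = [f.path for f in files if f.owner not in active_users]
    return stale, orphaned


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
files = [
    DataFile("/sas/a.sas7bdat", now - timedelta(days=900), "alice"),
    DataFile("/sas/b.sas7bdat", now - timedelta(days=30), "bob"),
    DataFile("/sas/c.sas7bdat", now - timedelta(days=800), "carol"),
]
stale, orphaned = triage(files, {"alice", "bob"}, now)
print(stale)     # ['/sas/a.sas7bdat', '/sas/c.sas7bdat']
print(orphaned)  # ['/sas/c.sas7bdat']
```

Files that are both stale and orphaned are the natural first candidates for archiving or deletion review.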

Workload, Data and Infrastructure Risks

A typical enterprise client of SAS:
  • Maintains thousands of workloads. 15-20% of those workloads were authored by analysts who pasted passwords to critical systems of record into their code in plain text. Finding and mitigating these security exposures is a needle-in-a-haystack detective hunt.
  • More than 40% of those thousands of workloads were authored by analysts who wrote essentially no documentation. This not only makes validation harder but also creates compliance risk.
  • Maintains hundreds of terabytes of SAS data files. Roughly 5% of the records in these files contain Personally Identifiable Information (PII), such as names, account numbers, email addresses and tax IDs. This is another substantial compliance risk exposure.
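Both the password hunt and the PII estimate can start with pattern matching. The sketch below is a simplified, line-based heuristic, assuming credentials appear as `PASSWORD=`, `PWD=` or `PW=` options with a literal value. It skips values that look like macro variable references (`&...`) or strings encoded by PROC PWENCODE (which begin with a `{SAS...}` tag), and reports only the line number and option name so the scan itself doesn’t leak secrets. The PII patterns cover only email addresses and SSN-formatted IDs; names and account numbers need richer rules, and regexes will produce both false positives and misses.

```python
import re

# Heuristic: a password option followed by a quoted or bare literal value.
CREDENTIAL = re.compile(
    r"""\b(password|pwd|pw)\s*=\s*("[^"]*"|'[^']*'|[^\s;]+)""", re.IGNORECASE
)

# Simplified PII patterns (assumed formats; extend for your own data).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_credentials(code: str) -> list[tuple[int, str]]:
    """Return (line number, option name) for each apparent plain-text password."""
    hits = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for m in CREDENTIAL.finditer(line):
            value = m.group(2).strip("\"'")
            # Macro references and PWENCODE output are not plain-text passwords.
            if value.startswith(("{SAS", "&")):
                continue
            hits.append((lineno, m.group(1).lower()))
    return hits


def pii_record_share(records: list[str]) -> float:
    """Fraction of text records that match at least one PII pattern."""
    if not records:
        return 0.0
    flagged = sum(1 for r in records if any(p.search(r) for p in PII_PATTERNS.values()))
    return flagged / len(records)


sample = """libname crm oracle user=analyst password="Summer2019" path=prod;
libname dw oracle user=analyst pw="{SAS002}1D5793" path=dw;
libname mart oracle user=analyst pwd=&dbpass path=mart;"""
print(find_credentials(sample))  # [(1, 'password')]

rows = ["Jane Doe, jane.doe@example.com", "acct 12345", "SSN 123-45-6789", "no pii here"]
print(pii_record_share(rows))  # 0.5
```

Because SAS statements can span multiple lines, a production scanner would tokenize whole statements rather than scan line by line.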

Workload Transformation Potential

A typical enterprise client of SAS Institute has:
  • Hundreds of analysts. Among these analysts:
    • 20% are well prepared for analytics modernization
    • 40% need significant direction to get started, and
    • 10% tell us they’re desperately clinging to the tools, skills and practices they learned in the 1980s and 90s.
  • Thousands of workloads. Among these workloads:
    • Roughly a third of all SAS workloads pull data out of the database, conduct simple transformations in SAS, and then write the results to BI applications or flat files. Most of this work could have stayed in the database, avoiding the data transfer load on the network, the need for redundant backup and lineage tracking, and additional validation.
    • Another third would benefit from integration with open-source analytics techniques in Python and Spark, reducing the cost of analyzing data that arrives from cloud sources in highly distributed architectures.
    • 25% would benefit from moving data files from on-premises file systems to cloud object stores like Amazon S3, where frequently used flat-file data is easier to store and access because storage capacity is effectively unlimited, unlike finite-sized on-premises file systems.
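The difference between pulling data out and keeping work in the database is easiest to see side by side. The sketch below uses an in-memory SQLite table as a stand-in for an enterprise database; the `txns` table and its contents are hypothetical. Both functions produce the same totals, but the extract-first version transfers every detail row, while the pushdown version transfers only one row per group. In SAS, the analogous move is writing queries that the SAS/ACCESS engine can pass to the database, for example through SQL pass-through.

```python
import sqlite3
from collections import defaultdict


def make_db() -> sqlite3.Connection:
    """Hypothetical transactions table standing in for an enterprise database."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE txns (region TEXT, amount REAL)")
    con.executemany(
        "INSERT INTO txns VALUES (?, ?)",
        [("east", 10.0), ("east", 5.0), ("west", 7.5), ("west", 2.5), ("north", 1.0)],
    )
    return con


def extract_then_aggregate(con: sqlite3.Connection) -> tuple[dict, int]:
    """Anti-pattern: pull every row to the client, then summarize locally."""
    rows = con.execute("SELECT region, amount FROM txns").fetchall()
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals), len(rows)


def aggregate_in_database(con: sqlite3.Connection) -> tuple[dict, int]:
    """Pushdown: the database summarizes; only result rows are transferred."""
    rows = con.execute("SELECT region, SUM(amount) FROM txns GROUP BY region").fetchall()
    return dict(rows), len(rows)


con = make_db()
local, rows_moved_local = extract_then_aggregate(con)
pushed, rows_moved_pushed = aggregate_in_database(con)
print(local == pushed, rows_moved_local, rows_moved_pushed)  # True 5 3
```

With five rows the savings are trivial; with hundreds of millions of rows, the same pattern is the difference between moving a summary and moving a table.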
If you want to conduct this benchmarking for your own enterprise, learn more about Corios Rosetta. We can help you rapidly inventory and score all your SAS data and analytics workloads, prioritize them for migration and modernization, and integrate them with your cloud and open source strategies. Or email me today at president@coriosgroup.com.

Robin Way

The Founder and President of Corios, Robin’s professional passion lies in democratizing and demystifying the science of applied analytics. An established thought leader with 30 years’ experience in the design, development, execution and improvement of applied analytics models, Robin welcomes every opportunity to move the analytics conversation forward.

Connect with him on LinkedIn, or reach out to Corios to get in touch.