A Sit-Down with EBM Chief Systems Architect Mike Lietzau
Mike Lietzau is a founding partner, Chief Systems Architect and Director of IT for EBM Software. Since 2010 he has guided the company's technology direction and has been instrumental in the creation and ongoing development of Catalyst, a business performance software tool that more and more companies are using to find answers to their Big Data questions.
We recently sat down with Mike to talk about the Big Data challenges many companies are working through today and how new technologies are being used to handle these issues.
What are some of the most common problems that companies are facing with Big Data?
Surprisingly, it's not access to Big Data that's the issue. A lot of companies have access to Big Data; it's the understanding and utilization of that data that's the problem. I mean, we're talking about trillions of rows of data. Companies are used to handling millions of records; they can do that in Excel or in a small Microsoft Access database. But when you start talking about the scale of Big Data, they're just not equipped to handle it.
Most companies have access to very large sets of data: inventory data, production data, social data… they just don't know what to do with it. They can spend weeks or even months trying to catalog that information and pull it together, but typically it gets pulled together for a one-time analysis that isn't repeatable. With the Catalyst Data Lake the same tools and reports can be created, but through a very repeatable process. As new data comes into the Data Lake, your reports and visualizations are automatically refreshed.
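To make that repeatability concrete, here is a minimal sketch in Python, assuming pandas is available; it is not Catalyst's actual refresh mechanism, and the folder and column names are purely illustrative. The report is simply a function of whatever extracts currently sit in the lake, so rerunning it after new data lands refreshes the numbers with no rework.

```python
# Minimal sketch of a repeatable report over a lake folder (illustrative paths/columns).
from pathlib import Path

import pandas as pd

LAKE_SALES = Path("data_lake/sales")        # hypothetical landing folder for sales extracts
LAKE_SALES.mkdir(parents=True, exist_ok=True)


def refresh_sales_report() -> pd.DataFrame:
    """Rebuild the sales summary from every extract currently in the lake."""
    files = sorted(LAKE_SALES.glob("*.csv"))
    if not files:                            # nothing has landed yet
        return pd.DataFrame(columns=["store_id", "amount"])
    sales = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
    return sales.groupby("store_id", as_index=False)["amount"].sum()


# Day 1: one extract lands, and the report reflects it.
(LAKE_SALES / "2024-01-01.csv").write_text("store_id,amount\n101,19.99\n102,4.50\n")
print(refresh_sales_report())

# Day 2: a new extract lands; the same call refreshes the report with no extra wrangling.
(LAKE_SALES / "2024-01-02.csv").write_text("store_id,amount\n101,7.25\n")
print(refresh_sales_report())
```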
For many people, the Data Lake is a pretty new concept. Could you explain the difference between a Data Lake and a Data Warehouse?
A Data Warehouse is a somewhat older technology for storing a company's data. It's a very structured type of data set, and it requires a lot of IT involvement to set up and to make any sort of change; those changes can take weeks or months, depending on their scope.
A Data Lake is a newer evolution of the Data Warehouse. It employs some of the same techniques a Data Warehouse would, but it's very flexible. It's able to store structured and unstructured information, typically on a scale not seen in a Data Warehouse; we're talking about terabytes of information. Because of those very large amounts of information, companies typically need to employ data scientists to actually make use of it.
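As a rough illustration of that flexibility, here is a small sketch, assuming pandas with pyarrow installed; the folder layout is hypothetical and this is not how Catalyst is implemented. Structured records land in a columnar format, unstructured documents land as-is, and a schema is only applied when the data is read.

```python
# Minimal sketch of structured and unstructured data living side by side in a lake.
import json
from pathlib import Path

import pandas as pd

lake = Path("data_lake")                                  # local stand-in for lake storage
(lake / "structured/pos").mkdir(parents=True, exist_ok=True)
(lake / "raw/social").mkdir(parents=True, exist_ok=True)

# Structured: point-of-sale rows with a known schema, stored in a columnar format.
pos = pd.DataFrame({"store_id": [101, 102], "sku": ["A-1", "B-7"], "amount": [19.99, 4.50]})
pos.to_parquet(lake / "structured/pos/2024-01-01.parquet", index=False)

# Unstructured: a raw social post dropped in untouched; no schema required up front.
post = {"user": "shopper42", "text": "loving the new store layout!"}
(lake / "raw/social/post_0001.json").write_text(json.dumps(post))

# Schema on read: structure is imposed only when someone asks a question of the data.
df = pd.read_parquet(lake / "structured/pos/2024-01-01.parquet")
print(df.groupby("store_id")["amount"].sum())
```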
With Catalyst, we play the role of the data scientist. You’re up and running with the Data Lake very quickly and relatively inexpensively – you don’t have to bring in new hardware, you don’t have to bring in people to monitor that hardware – everything is taken care of for you. Your people can be working with this data, self-service, in a very short period of time.
Give me an idea of what it’s like to try and perform this Big Data analysis with Catalyst vs. without.
Without Catalyst, you're basically left to fend for yourself in doing data discovery and pulling information for reports. When you're dealing with the scale of information a Data Lake is really built for, that process can take weeks or even months. We're talking about enormous amounts of time and effort. With the Catalyst Data Lake, you're up and running very quickly, the data models are already built, and we really work with you to ensure you're getting clean and precise data with very little effort on your part.
As an example, we've had a couple of retail clients who are very, very large and have millions of transactions every day. With one company in particular we had trillions of rows of point-of-sale data, and Catalyst handled that with no problem at all. We had Data Cubes, visualizations and reports based on that information, and it was information the company had never seen before because they had been unable to pull that data together.
The Data Lake concept can also be helpful if you've changed ERPs, correct? How can Catalyst help you access this legacy information?
Catalyst will connect with just about any ERP system. Many of our clients have gone through an ERP change like you've described, where they're moving from one ERP system to another. With EBM Catalyst, we extract all the information from the legacy systems and your current systems, and we build a data model that sits on top of that. That effectively hides the fact that you're looking at two different data sets. Your reports and visualizations just look like they're coming from one data set, when it could be two, three, or four data sets underneath.
We had one company with 23 different ERP systems, and we were able to pull all that information together, put it into one data structure, and the company could report on it as if it were one set of data.
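As a sketch of the data-model idea Mike describes, here is one way such a unified view might look, assuming pandas; the source systems, column names, and mappings are hypothetical rather than EBM's actual model. Each ERP extract is mapped onto a shared schema, so a report sees a single data set and never notices the seams.

```python
# Minimal sketch of a unified data model over extracts from two ERP systems (hypothetical schemas).
import pandas as pd

legacy_erp = pd.DataFrame(              # export from the retired system
    {"ORDNO": ["L-100", "L-101"], "CUST": ["Acme", "Globex"], "AMT_USD": [1200.0, 450.0]}
)
current_erp = pd.DataFrame(             # export from the live system
    {"order_id": ["C-900"], "customer_name": ["Acme"], "total": [880.0]}
)

# Map each source onto the shared model, tagging where each row came from.
unified = pd.concat(
    [
        legacy_erp.rename(columns={"ORDNO": "order_id", "CUST": "customer", "AMT_USD": "amount"})
        .assign(source="legacy"),
        current_erp.rename(columns={"customer_name": "customer", "total": "amount"})
        .assign(source="current"),
    ],
    ignore_index=True,
)

# A report runs against the unified model as if it were one set of data.
print(unified.groupby("customer")["amount"].sum())
```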
What’s the difference between Catalyst and other business performance software solutions that use Data Lakes?
There are plenty of places out there that can build a Data Lake, but you're typically starting from scratch. You have to write all the connections to your information and build the data models yourself (or hire someone to do it). With our Data Lake, we already have that base set of information in Catalyst: we have all the connections to your ERP systems, and we have the expertise to help you add the additional data sets you want to bring into the Data Lake.
We also have pre-built administrative tools for managing hierarchies and attributes, along with pre-built visualization and reporting packages and Data Cubes.