Here are three observations I plan to make at Data Modeling Zone that summarize the last 12 months in our data modeling profession:
- Our customer, the data scientist. More and more, I am finding a strong overlap between the activities of a data modeler and those of a data scientist. In the book Julia for Data Science, Zack Voulgaris covers the data science tasks, including data preparation. Data preparation is all about understanding the data, including its definitions, formatting, and lineage. It’s an activity we do quite often as modelers. Over these past 12 months we have started working more closely with data scientists, and I can see this is only the beginning of a very close relationship in which we sometimes work with the data scientist and sometimes work for the data scientist. Bill Inmon calls a subset of data preparation textual disambiguation.
- DW’s reincarnation as the data lake. Speaking of Bill, I always quote Bill Inmon’s original definition of a data warehouse (DW) as a subject-oriented, non-volatile, time-variant, integrated set of data. Bill has recently written Data Lake Architecture. Over the past months, the concept of a data lake has come up in many of my training classes and in several of my consulting assignments. I see the data lake as meeting Bill’s original DW definition while also accommodating more complex types of data (e.g., big data), such as sensor readings and documents.
- Beyond “NoSQL does not mean NoDataModeling.” At the last two Data Modeling Zone events, we had sessions emphasizing the need for data modeling when NoSQL databases such as MongoDB or NoSQL environments such as Hadoop are being deployed. Over the last 12 months, the focus has changed from “NoSQL does not mean NoDataModeling” to “How do we do it?”, and this shift is also reflected in the sessions at Data Modeling Zone. That is, what modeling techniques allow us to capture and communicate the more varied types of structures (such as nested arrays) common in NoSQL? I see this as a critical step for keeping the data modeler in the much-needed role of analyzing, documenting, and designing data requirements.
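
To make the nested-array point in the last observation concrete, here is a minimal sketch in Python of one possible way to document the structure of a document-style record. Everything in it is an assumption for illustration: the `order` record, its field names, and the `describe_structure` helper are hypothetical and do not come from any particular tool or Data Modeling Zone session. It also assumes each array holds elements of the same shape, which real NoSQL data does not guarantee.

```python
import json

# Hypothetical order document with a nested array, similar in shape to
# what a document database might store. All field names are made up.
order = {
    "order_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "lines": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def describe_structure(value):
    """Return a nested description of field names and Python type names."""
    if isinstance(value, dict):
        return {key: describe_structure(item) for key, item in value.items()}
    if isinstance(value, list):
        # Describe an array by its first element only.
        # Assumption: arrays are homogeneous; an empty array stays empty.
        return [describe_structure(value[0])] if value else []
    return type(value).__name__


model = describe_structure(order)
print(json.dumps(model, indent=2))
```

The printed result is a compact outline of the record, with the nested array shown as a list containing one described element. An outline like this is only a starting point for discussion; it says nothing about definitions, optionality, or lineage, which still have to be captured by the modeler.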
These are some of my observations. Please add your thoughts and also share your own observations.