Observations on the data modeling profession

Here are three observations I plan to make at Data Modeling Zone that summarize the last 12 months in our data modeling profession:

  1. Our customer, the data scientist. More and more, I am finding a strong overlap between the activities of a data modeler and those of a data scientist. In the book Julia for Data Science, Zack Voulgaris covers data science tasks, including data preparation. Data preparation is all about understanding the data, including its definitions, formatting, and lineage. It’s an activity we do quite often as modelers. Over these past 12 months we have started working more closely with data scientists, and I can see this is only the beginning of a very close relationship in which we sometimes work with the data scientist and sometimes work for the data scientist. Bill Inmon calls a subset of data preparation textual disambiguation.
  2. DW’s reincarnation as the data lake. Speaking of Bill, I always quote Bill Inmon’s original definition of a data warehouse (DW) as a subject-oriented, non-volatile, time-variant, integrated set of data. Bill has recently written Data Lake Architecture. In recent months, the concept of a data lake has come up in many of my training classes and during several of my consulting assignments. I see the data lake as meeting Bill’s original DW definition while also accommodating more complex types of data (e.g. big data), such as sensor readings and documents.
  3. Beyond “NoSQL does not mean NoDataModeling.” For the last two Data Modeling Zone events, we have had sessions emphasizing the need for data modeling when NoSQL databases such as MongoDB or NoSQL environments such as Hadoop are being deployed. Over the last 12 months (and it is also reflected in the sessions at Data Modeling Zone), the focus has changed from “NoSQL does not mean NoDataModeling” to “How do we do it?” That is, what modeling techniques allow us to capture and communicate the more varied types of structures (such as nested arrays) common in NoSQL? I see this as a critical step for keeping the data modeler in the much-needed role of analyzing, documenting, and designing data requirements.
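
To make the "nested arrays" point in the third observation concrete, here is a minimal Python sketch. It is not a technique presented at Data Modeling Zone; the `order` document and the `describe` helper are hypothetical, invented only for illustration. It walks a document of the kind a document database might store and flattens its structure into path-to-type pairs, which is one simple way a modeler might start documenting such a structure.

```python
from typing import Any

# Hypothetical order document with a nested object and nested arrays.
order = {
    "order_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "lines": [
        {"sku": "A-1", "qty": 2, "tags": ["gift", "fragile"]},
        {"sku": "B-7", "qty": 1, "tags": ["bulk"]},
    ],
}


def describe(value: Any, path: str = "") -> dict[str, str]:
    """Return a flat map of path -> type name; '[]' marks an array level."""
    if isinstance(value, dict):
        out: dict[str, str] = {}
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            out.update(describe(child, child_path))
        return out
    if isinstance(value, list):
        out = {}
        # Every element is visited; an empty array contributes nothing.
        for item in value:
            out.update(describe(item, path + "[]"))
        return out
    return {path: type(value).__name__}


structure = describe(order)
for field_path, type_name in structure.items():
    print(f"{field_path}: {type_name}")
```

A path such as `lines[].tags[]` shows an array nested inside another array, exactly the kind of structure that a flat relational notation does not express directly.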

These are some of my observations. Please add your thoughts and also share your own observations.

3 Comments

  1. Chip Hartney 3 years ago

    I really look forward to DMZ insight on “How do we do it?”

  2. Ken Hansen 3 years ago

    The critical distinction between a DW and a DL is that data in a DL is not usually “integrated” and integration is neither required nor expected. Additionally, DLs include a lot of data duplication – there is no normalisation. DLs reflect the data in multiple source systems and MDM is rare.

  3. Navin Ladda 3 years ago

    With the increasing use of Agile methodology, which advocates a single person playing many roles in a project team, coupled with floating enterprise licenses for tools such as Power Designer, I am seeing the importance of the Data Architect role and of data modeling going down in the eyes of senior management. I see many developers and systems analysts playing this role and creating data models with nowhere near the analysis, long-term thinking, and attention to re-use that a Data Architect would have applied, as the end goal seems to be simply getting out a DDL. Nor is any thought given to the re-use or maintenance of data models as an asset. And adding metadata to data models is not even considered an important task, as Agile has made people think documentation is optional, or technical debt that can be dealt with later, and that later never comes because the next Agile project is ready for the crew to tackle. Adding to this mix is the Data Lake, which is being used as a big sandbox and a license to bypass all good design principles in the name of data exploration initiatives. The Data Lake has its place, where it can co-exist with a Data Warehouse, but while organizations are still figuring out what to make of it, users are cementing its use in their daily activities in a variety of ways (e.g., spreadsheets are becoming master sources of data), which will make it harder to change and correct in the future.
