Evolutionary or revolutionary approach to modeling NoSQL?


The symbols we use on our data models were created with relational databases in mind. Luckily, conceptual and logical data models are independent of technology, so we can still use our symbols for modeling non-relational (NoSQL) databases. When we reach the physical level, however, things get more complicated: a NoSQL database such as MongoDB or Cassandra supports different kinds of constructs than a relational database does. Arrays are just one example; there are many cases where our physical modeling symbols fall short of communicating the structure. There are two approaches we can take to model new types of structures such as arrays: evolutionary or revolutionary. An evolutionary approach builds upon our existing set of data modeling symbols. For example, we may add a new symbol to represent an array. A revolutionary approach comes up with a new set of symbols for modeling. Which approach do you think would be more successful, and why?


  1. Hal Metz 4 years ago

    The revolutionary approach makes sense because the entire data modeling landscape has changed completely. Our new understanding of data relationships, and the marriage of data to usage that Big Data brings, makes not just the symbols but the SQL mindset antiquated.
    However, the politics of change make universal acceptance of a new set of more accurate and meaningful symbols for old ideas impossible or prohibitively expensive.
    Worse yet, it would create a ‘two standards’ environment, which is the bane of any technology.
    So . . . we create new symbols and add them to the portfolio of data modelling symbols in ways that accommodate the new without ‘desecrating the old’. Agile took this approach and added agile terminology, sometimes simply duplicating existing system development life cycle terminology with new ‘agile terms’.

  2. mukesh 4 years ago

    As I understand it, working with NoSQL systems requires a fundamental mindset change. You stated that arrays are one example, and there are complex things like graphs and documents as well. The list is also expected to keep growing rapidly in the future.
    So with my limited knowledge I feel a revolutionary approach is needed. There are people who are more expert than me in this area; I will just share my understanding of why…
    In my reading I came across the observation that developers creating NoSQL-based applications frequently skip the traditional step of building conceptual and logical data models upfront and focus solely on low-level physical data models incorporated directly into the application logic. If we add Agile into the discussion, this approach may speed up the development cycle or help scale applications quickly in the short term, but it may end up hurting database manageability in the future. From a DevOps point of view, is it really good to keep data models only in the application logic? If we are dealing with ever-growing data and complex applications that depend on or consume it, having well-documented (if not 100% perfect) models will always be better. One reason people may not approach NoSQL data modeling the way they approach relational database modeling could be that it is very dynamic and complex in the context of very big applications. And there is probably not enough documentation and guidance on how the modeling can be done effectively. So if the lack of appropriate symbols is stopping people from modeling data effectively, then a revolution should help.

  3. Eric 4 years ago

    I am for the evolutionary approach. The Entity/Relation Model as developed by Codd was meant to model and document logical data organization. The RDBMSs that were developed to support this model have been consistent with the model, so modeling the physical implementation is almost the same. However, arrays, stacks, queues, linked lists, and documents are data structures that existed before any RDBMS was developed, before the computer was even invented. I do not think these NoSQL data structures are something new. New technology maybe, but similar data structure concepts.

    I propose that we stop modeling the physical NoSQL data structures. Here is why.
    1. The major point of the Entity/Relation Model and normal forms is to minimize data anomalies. To minimize data anomalies in an array is a lot of work. That is why new technology is able to help do the work. However, the people who consume the data do not want to see arrays; they want to consume data without data anomalies. In fact, they still want to see data presented in its Entity/Relation model because it makes sense.

    2. NoSQL is not schemaless or multiple-schema. It is more deferred schema. Store all the data, and when it comes time to render the information on mobile devices, display monitors, or reports, the schema(s) are required at that time! Would you like to see 10 online records of your birth certificate, some duplicates, some displaying a different location, birth date, and parents? No, you would like to see one copy with the right information. (Person with zero or one validated birth certificate)

    So I propose that we model the schema at the point of consumption of the data. A typical webpage will have a main subject with zero, one, or more pieces of auxiliary information, and zero, one, or more pieces of auxiliary information to the auxiliary information =). The business rules on these schema(s) will then be retroactively applied to their NoSQL physical data structures, enforced by the technology itself. Data are for people’s consumption. Technologies and computer systems merely generate, maintain, and process them.
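
    The deferred-schema idea above can be pictured with a small Python sketch in which raw records are accepted as-is and the rule "person with zero or one validated birth certificate" is applied only when the data is read. The record layout, the field names (`person`, `birth_date`, `location`, `validated`), the sample data, and the function `read_person` are all assumptions made for illustration; they are not taken from any particular NoSQL product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BirthCertificate:
    birth_date: str
    location: str


@dataclass(frozen=True)
class PersonView:
    """Hypothetical schema applied at the point of consumption."""
    name: str
    certificate: Optional[BirthCertificate]


def read_person(name: str, raw_records: List[Dict]) -> PersonView:
    """Build the consumer-facing view from loosely structured records."""
    # Keep only validated, complete records; identical duplicates
    # collapse because the certificates are collected in a set.
    certificates = {
        BirthCertificate(r["birth_date"], r["location"])
        for r in raw_records
        if r.get("person") == name
        and r.get("validated") is True
        and "birth_date" in r
        and "location" in r
    }
    # Business rule: zero or one validated birth certificate per person.
    if len(certificates) > 1:
        raise ValueError(f"conflicting validated certificates for {name}")
    return PersonView(name, next(iter(certificates), None))


# Stored data (made up): duplicates, conflicts, and gaps are accepted on write.
RAW_RECORDS = [
    {"person": "Ana", "birth_date": "1990-01-02", "location": "Lima", "validated": True},
    {"person": "Ana", "birth_date": "1990-01-02", "location": "Lima", "validated": True},
    {"person": "Ana", "birth_date": "1990-02-01", "location": "Cusco"},
    {"person": "Ben", "location": "Quito"},
]

ana_view = read_person("Ana", RAW_RECORDS)
ben_view = read_person("Ben", RAW_RECORDS)
```

    In this sketch the storage layer never rejects a record; the consumer sees one clean certificate for Ana and none for Ben, and two different validated certificates for the same person would surface as an error at read time.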

  4. Ted Hills 4 years ago

    It is true that the E-R model does not accommodate arrays and nested data structures, which are also not directly supported by SQL-only databases but are directly supported by most NOSQL databases. At the same time, the relational structures that E-R can model exist in NOSQL databases, too. So that is why I prefer an evolutionary approach. @Hal Metz mentioned how the agile approach added new terms to the existing terminology of system development, thereby accommodating the new without desecrating the old. I like this approach, too.

    In particular, I think we should preserve the use of rectangles to represent entity types (which we all incorrectly abbreviate to “entities”), and the use of lines between rectangles to represent relationships between entity types. That’s pretty basic. But we need to expand the “vocabulary” of relationship lines to include nesting, so that we can say that entity type B is nested in entity type A–but preserve the ability to say that entity type C is referenced by entity type A, because that still occurs in NOSQL databases, too.
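
    The distinction between nesting and referencing can be made concrete with a short Python sketch using plain dictionaries as stand-ins for documents. The entity types here (an order as A, its order lines as B, a customer as C), the field names, and the helper `resolve_order` are hypothetical and chosen only for illustration.

```python
from copy import deepcopy

# Entity type C: lives in its own collection and is looked up by id.
CUSTOMERS = {"c1": {"_id": "c1", "name": "Acme"}}

# Entity type A, with B nested inside it and C referenced by key.
ORDER = {
    "_id": "o1",
    "customer_id": "c1",  # reference: A points at C, C is stored elsewhere
    "lines": [            # nesting: each B exists only inside its A
        {"sku": "widget", "qty": 2},
        {"sku": "gadget", "qty": 1},
    ],
}


def resolve_order(order, customers):
    """Return a copy of the order with the customer reference followed."""
    resolved = deepcopy(order)
    customer_id = resolved.pop("customer_id")
    resolved["customer"] = deepcopy(customers[customer_id])
    return resolved


resolved_order = resolve_order(ORDER, CUSTOMERS)
```

    Reading the nested lines needs no lookup, while the customer has to be fetched through its key, which is the difference a notation would need two kinds of relationship line to show.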

    I’ve developed these ideas into a proposal for a new notation which I’m calling Concept and Object Modeling (COM) notation. I have a short article published by Dataversity (see link below) that describes it. To be more accurate, the graphical notation is evolutionary, but I think the concepts behind it, and the breadth of expression you get with COM, are revolutionary. But you can judge for yourself if you download and read the paper.

    I am working hard to get other materials ready to post in time for the NOSQL conference August 18-20. Please check my Web site, http://www.tewdur.com, at that time for a complete reference, a Visio stencil, and additional articles on COM.


  5. Elangovan N 4 years ago

    The evolutionary approach is the appropriate way, because:
    1) The conceptual and logical layers are still intact, and it is a question of how to extend the physical model to show structures like arrays. Moreover, arrays resemble the “weak entity” of Peter Chen. Graphs, social data, documents, and videos clearly fall into the category of text or binary objects, which are already represented in data modeling.
    2) I would assume that an array is a weak entity and represent it in my logical model, and in the physical model I would want an additional marker to show that it is an array, as well as additional data types like “StringArray”, “NumberArray”, etc. To me it looks similar to the “PIC X(10) or PIC 9(10) Occurs N times” in COBOL data structures. The marker in the physical model should be able to show whether the column of array type should be created within the parent table (or collection) or as a separate collection. The modeling tool will understand this and create the DDL accordingly.
    3) While the reasons for modeling remain the same, it is just a question of accommodating the data structures that deviate from relational principles at the physical level (equivalent to de-normalization). Therefore, the evolutionary approach is better than the revolutionary one.
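
    The marker described in point 2 can be sketched in Python as a single flag that decides whether an array ends up inside the parent document or in a separate collection. This is only an illustration under assumptions: the function `to_physical`, the `embed` flag, the `parent_id` back-reference, and the sample customer and phone data are all hypothetical, and the output is plain dictionaries rather than real DDL.

```python
def to_physical(parent_collection, parent, array_name, items, embed):
    """Map one parent and its array to {collection_name: [documents]}.

    embed=True  -> the array is stored inside the parent document.
    embed=False -> the array becomes its own collection, and each item
                   carries a back-reference to the parent.
    """
    if embed:
        document = dict(parent)
        document[array_name] = [dict(item) for item in items]
        return {parent_collection: [document]}
    children = [{**item, "parent_id": parent["_id"]} for item in items]
    return {parent_collection: [dict(parent)], array_name: children}


# Made-up logical content: a customer and its weak entity, phone numbers.
CUSTOMER = {"_id": "c1", "name": "Acme"}
PHONES = [
    {"type": "office", "number": "555-0100"},
    {"type": "support", "number": "555-0101"},
]

embedded = to_physical("customers", CUSTOMER, "phones", PHONES, embed=True)
separate = to_physical("customers", CUSTOMER, "phones", PHONES, embed=False)
```

    The logical model is the same in both calls; only the physical marker changes, which is the kind of decision a modeling tool could carry from the diagram into generated structures.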
