Reverse Engineering a MongoDB Document

Reverse Engineering a MongoDB Document

You are the data modeler responsible for building a logical data model of an existing NoSQL MongoDB database. Your plan is to reverse engineer the existing database into a physical data model using your data modeling tool, and then translate this physical data model into a logical data model. Many of the MongoDB concepts translate nicely into traditional data modeling components, but the concept of an array does not seem to fit.

For example, there is a one-to-many relationship between Order and Order Line which can be implemented in one of three ways:

1) The physical data model can resemble in structure the logical data model and be implemented as a one-to-many relationship:


2) The Order entity can be collapsed down (embedded) into the Order Line child:



3) The Order Line attributes can be rolled up into the Order entity. The challenge with this repeating attributes technique is that we need to determine the maximum number of lines an order can contains (in this example, three order lines):



An array concept in MongoDB is most similar to this third model (rolling up), except that there is no limit to the number of children the parent can have. An Order can have three lines, but can also have three hundred lines.

How would you show this array concept on the data model diagram?



  1. Michael Silves 7 years ago

    I think arrays by definition have a sequence that should be preserved. Also, for the example given there may be a requirement that a given product could appear on more than one order line.

    I can’t copy a picture here!

    Order Number
    Order Number key
    Order Line Sequence key
    Product Id
    Product Quantity
    Product Amount

  2. To 7 years ago

    Conceptually, there is no difference between the first method, the standard 1nf normalized form, and the third, the MongoDB array. They both allow an Order to relate to zero, one or many OrderLines. In fact, it would not surprise me at all to find out that MongoDB, behind the scenes, physically maintained arrays in just that manner (normalized). So the whole “array” concept may just be syntactical sugar to save developers from having to form a pivot.

    Whether that is the case is immaterial. The standard logical data model accurately represents the array. So what’s the problem?

  3. Luc 7 years ago

    If the purpose of the exercise is to produce a logical model, then I endorse the solution given by Michael Silves. The constraint that a product appear only once in an order is probably not present in the physical implementation, and therefore it might be dangerous to introduce it in the logical model.
    If the goal is to produce a physical data model diagram using only “traditional” relational tools that don’t support the concept of repeating groups, than you have to be inventive. E.g. you could add the number of the repeating group between brackets after the attribute name:
    ProductId (1)
    ProductQuantity (1)
    This is assuming there could be more than one “array” in one record (like there can be many “OCCURS”-elements in a record in an IDMS (codasyl) db) so you would need some way of showing which attributes belong together in which group.
    If you’re using an UML-based modeling tool you could draw the repeating groups as separate classes (with a 1-many relationship) but add an appropriate stereotype to the relation to show that the child class is implemented as an “array” within the parent class.

  4. Madani BASHA 7 years ago

    I guess one could completely abstract the order lines as a BLOB as “order_lines” in a logical data model.
    Or show as “order_line (nnn)” if the data modeling tool would allow such a notation.

    If and when clarity is needed, it can be expressed as multiple occurrences of an “order_line” entity.
    I too share the caution expressed that it is not a good idea to assume that “Product ID” will be a unique key.
    In many cases either the same product appear multiple times in the same order or non-product lines are present.
    For example, “Buy n items of product X and get one free product X” schemes tend to list the free product X as a separate line with either $0.00 price or -ve $.cc figure indicating the discount.

    Physical table design would of course depend on the underlying data storage system.

  5. Mike Sadka 7 years ago

    I think diagram 1 is correct for the logical model, which should be independent of any implementation technology. But I don’t think any of the diagrams works as the physical model for a non-relational implementation. So I would not try to create a physical model for a noSQL database using relational notation, but would describe it using the native Mongo BSON and translate that into the logical model.

    But if I really had to represent this in a relational diagram I’d use a blob or lengthy text field named something like Order_Lines_Array. You could use two such fields – one for each array dimension, perhaps greater clarity? – but I think a single blob more accurately models the Mongo array datatype at the physical level.

  6. Clifford Heath 7 years ago

    Whether or not the lines of an order conceptually needs to be kept in sequence, users normally prefer the sequence to remain stable. When I check my supermarket docket, I expect to find things in the order they went through the checkout, not in alphabetical order, or by product code, or worse yet, random. Thus there is no real alternative to maintaining either a sequence number (I name these XyzOrdinal).

    To do it using a linked list (using a foreign key to the surrogate key on the previous entry) would be an exercise in pain requiring CTEs that rebuild the reversed sequence for display.

    Note: For legacy reasons, the approach using an Ordinal is not what I show in my Warehousing model (in ORM) at and in CQL here:

    (I hope those links get through the WordPress software! In case they don’t here they are in another form: Warehousing in ORM and Warehousing.cql)

Leave a reply

Your email address will not be published. Required fields are marked *