Response to Design Challenge #4 - Techniques in reviewing a data model

So what are some of the handy techniques we use when reviewing a data model? Click here to read the complete challenge. The more models we review, the more techniques we tend to apply either formally or instinctively. This design challenge contains several of the very useful techniques that were sent to me for reviewing a design, thank you for the great responses! A special thanks goes out to data modeling legend Graeme Simsion for his responses.

I placed each of these techniques within categories and there is no priority to the order I listed the categories in or the techniques within each category. They are all important techniques, and this design challenge focused on identifying the techniques rather than rating them. As you read each of the techniques, do you see any new ones you can start using?

(The name in parenthesis following each technique is the person who submitted the technique.)

[NOTE: Whenever you see notes like this within the square brackets, they are additional thoughts I have on the technique.]

Validating the model with the business and functional experts

Express relationships and subtype hierarchies as business assertions eg "Each Person may hold one or more accounts and each account must be held by one person."  Richard Barker's technique was the first I saw; Graham Witt's which incorporates entity and attribute definitions to provide a complete description of the model is the best.  The user can work alone on this (initially at least!) ticking off - or querying - assertions serially. (Graeme Simsion)  [NOTE: This is a very powerful technique, especially when the business or functional experts prefer to stay clear of the boxes and lines on a data model. We are simply reading them the story on the design. When you do this, I strongly recommend reading them what the relationship includes, as well as what the relationship excludes. For example, in Graeme’s example, if a Person can hold one or more accounts, that would mean a Person can not hold zero accounts. I would ask the business experts whether this is what was intended, and if so, we have it modeled correctly.]

  • Start to gently query the author about some of the non-obvious business rules during a model-walkthrough session. (Ashraf Kandil)

  • Perform transactional analysis. Ensure that each of the target transactions or desired reports can be produced based on the suggested model. (Ashraf Kandil) [This is a very useful technique when designing a data mart where validation can be done by reviewing a report and making sure everything on the report can be represented somewhere in the model. A word of advice here: there is a very good chance your design might require more flexibility than what appears in a given report. For example, if there are four levels of product reporting within a certain report, build your design with enough flexibility to handle the maximum number of levels your experts feel might exist in the near future.]

  • Data element and entity names

    • Check entity names for non-singular and group names - eg Customer List, Product Catalogue. For consistency (and in building assertions) entity names should describe one instance. (Graeme Simsion)

    • I look at each data element and quickly identify which data elements do not end in a valid classword. A classword represents a general domain for the data element and examples include AMOUNT, DATE, NAME, IDENTIFIER, INDICATOR, and CODE (there are others too). Finding data elements that don’t end in a valid classword represent a naming standard violation at a minimum, and can also signal a larger problem where multiple pieces of business information might be stored in the same data element. For example, is Gross Sales an amount, a weight, a value, or worse yet, a combination of these? That is, should the right data element name be Gross Sales Amount, Gross Sales Weight, etc.? (Me)

    • Ensure that if an entity/table/attribute/column naming standard already exists in the organization that it has been adhered to. Otherwise improvise with one and apply it consistently across the model. (Ashraf Kandil) [NOTE: As modeling tools continue to add features, naming standards validation should eventually be an automatic and not a tedious manual activity.]

     Model appearance

    • If you're presenting a diagram to a non-specialist, start with a "high level" diagram stripped of the "less important" entities.  Use your judgement here: the purpose is to get the person oriented; the actual verification will take place against the full model.  Another option is to draw the important entities larger - if your tool will let you. (Graeme Simsion) [NOTE: I build “high level” models often for the reason Graeme mentions. It’s similar to the concept of reading a table of contents to a book to get the high level understanding of that book as opposed to jumping right into page 1 of the text.]

    • A favorite technique of mine when review with the business, is to divide the logical model into business subject areas.  This is easily done with ERwin's subject areas.  Most often, the subject areas correspond to discrete business processes, e.g., add a customer, send an invoice, etc.  I walk the business through each subject area  - one at a time -  describing entities, data and relationships in the context of the business process and business rules. (Mary Niva)

    • Check model clarity suggesting the use of subject areas if overly complex. Try for a one page fit for each subject area. Investigate the potential of using colors to highlight important entities eg, yellow for transaction entities and plain white for valid-value set (lookup list) entities. (Ashraf Kandil)

    Rules of normalization 

    • Verify each entity is in third normal form. Ask intelligent questions about the ones that aren’t and evaluate the answers. (Ashraf Kandil)

    • Look for 3 (or more)-way intersection entities.  Check the rules to ensure 4th and 5th Normal form have not been violated (not so hard when you think "rules" rather than "multi-valued dependencies!") (Graeme Simsion)

    Definitions

    • Look at a selection of definitions - poor definitions are one of the most common faults in a model, and often overlooked as the focus is on the graphical representation.  A quick glance will usually tell you whether the modeler has taken this task seriously.  (Graeme Simsion)

    • Ensure that entities and attributes have good solid meaningful definitions that are more than just a repetition of the name. Encourage the provision of examples to enhance understanding. (Ashraf Kandil) [NOTE: How many of us have seen definitions that simply rephrase the data element name? These “quick and dirty” definitions can escape notice when searching for missing definitions.]

     Model flexibility

    • Look for "head scratchers" (self-referencing one-to-many and many-to-many relationships).  Is the flexibility of an infinite hierarchy / network really required or has the modeler just been too lazy to model the real structures? (Graeme Simsion)

    • Look at how "parties" have been handled.  Many, many modelers will generalise "all people and organizations of interest to the enterprise" into some sort of Party, Party role etc structure without checking whether the business needs that level of generalization eg the business may be quite happy to treat suppliers and shareholders as totally disjoint (even if in the real world they overlap). (Graeme Simsion)

    • Listen to the business case/problem and verify that the model is addressing it and not an academic exercise DON’T MODEL FOR THE SALE OF MODELING (Ashraf Kandil) [NOTE: There has to be good business reasons why we chose a particular design structure.]

     Keys

    • Look for meaningful keys (ie non-surrogate keys). My view is that these are acceptable sometimes, but we should always check that they are applicable for all cases, unique, and stable (the last of these three is the most common problem). (Graeme Simsion) [NOTE: If you do decide to create a surrogate key, make sure you define an alternate key on something real such as what the business would view as the primary key.]

    Specific to the physical data model

    • Check for Recursion. Look for any recursive relationships. Verify that they have been correctly resolved. Ensure that the resolution comes with a Type entity. (Ashraf Kandil)

    • Suggest a Super/Sub type resolution approach. (Ashraf Kandil)

    • Identify if the nature of the target system and verify if OLTP or DW. Check for any controlled redundancies for the latter. (Ashraf Kandil)

    • Verify if any logical model entities should be combined into physical tables to achieve performance expectations. Verify correct dimensional Hierarchies have been established taking into account business nature and physical modeling science. (Ashraf Kandil) [NOTE: Remember to apply the Denormalization Survival Guide here from Chapter 8 in my book – a shameless plug for my book!]

    • Perform throughput analysis. Identify that, where possible, high volume transactions have been well catered for to achieve performance expectations. (Ashraf Kandil)

    Meta data quality

    Does the meta data match the data? If at all possible, try to get at the actual data that will be stored in the data elements you are modeling. You might be surprised at what you find, but better to be surprised early on in a project than during the actual development. For example, I recently worked on an application where an email address was being stored in a company name field. It can sometimes be challenging to actually get to the data, but once you get access to the data simple SQL queries can help validate that the data element meta data matches the actual data. (Me)

  •  

    Need a refresher of this design challenge? Please click here