Techniques in reviewing a data model

There are a number of techniques I apply when I review a data model. Using these
techniques over the years has helped me quickly summarize models and look for
ways to improve them. Many times the 80/20 rule seems to prevail when reviewing
a model. That is, in 20% of the time we can validate 80% of the model. (That
remaining 20% can be a bit tough, however!)

Many of these techniques I plan to publish one day and make a topic in my
modeling course. An example of such a technique is:

I look at each data element and quickly identify which data elements do not end
in a valid classword. A classword represents a general domain for the data
element and examples include AMOUNT, DATE, NAME, IDENTIFIER, INDICATOR, and CODE
(there are others too). Finding data elements that don’t end in a valid
classword represents a naming standard violation at a minimum, and can also
signal a larger problem where multiple pieces of business information might be
stored in the same data element. For example, is Gross Sales an amount, a
weight, a value, or worse yet, a combination of these? That is, should the right
data element name be Gross Sales Amount, Gross Sales Weight, etc.?

Now I know there are a lot more tips out there that you might use when you
review a model. Share these with me. I will compile some of these techniques
(including more of my own) and send them back out in this design challenge
response. Note that as with all design challenges if I use your technique (and
it’s not already on my list) I will give you credit in any documentation.
Looking forward to hearing your techniques!

Response

So what are some of the handy techniques we use when reviewing a data model? The
more models we review, the more techniques we tend to apply, either formally or
instinctively. This design challenge contains several very useful techniques
that were sent to me for reviewing a design. Thank you for the great responses!
A special thanks goes out to data modeling legend Graeme Simsion for his
responses.

I placed each of these techniques within categories; there is no priority to the
order of the categories or of the techniques within each category. They are all
important techniques, and this design challenge focused on identifying the
techniques rather than rating them. As you read each of the techniques, do you
see any new ones you can start using?

(The name in parentheses following each technique is the person who submitted
the technique.)

[NOTE: Whenever you see notes like this within the square brackets, they are
additional thoughts I have on the technique.]

Validating the model with the business and functional
experts

  • Express relationships and subtype hierarchies as business assertions, e.g.,
    “Each Person may hold one or more accounts and each account must be held by
    one person.” Richard Barker’s technique was the first I saw; Graham Witt’s,
    which incorporates entity and attribute definitions to provide a complete
    description of the model, is the best. The user can work alone on this
    (initially at least!), ticking off – or querying – assertions serially.
    (Graeme Simsion) [NOTE: This is a very powerful technique, especially when
    the business or functional experts prefer to stay clear of the boxes and
    lines on a data model. We are simply reading them the story of the design.
    When you do this, I strongly recommend reading them what the relationship
    includes, as well as what the relationship excludes. For example, in
    Graeme’s example, if a Person can hold one or more accounts, that would mean
    a Person cannot hold zero accounts. I would ask the business experts whether
    this is what was intended, and if so, we have it modeled correctly.]
  • Start to gently query the author about some of the non-obvious business rules
    during a model-walkthrough session. (Ashraf Kandil)
  • Perform transactional analysis. Ensure that each of the target transactions or
    desired reports can be produced based on the suggested model. (Ashraf Kandil)
    [This is a very useful technique when designing a data mart where validation can
    be done by reviewing a report and making sure everything on the report can be
    represented somewhere in the model. A word of advice here: there is a very good
    chance your design might require more flexibility than what appears in a given
    report. For example, if there are four levels of product reporting within a
    certain report, build your design with enough flexibility to handle the maximum
    number of levels your experts feel might exist in the near future.]
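The assertion-reading technique above can be partly automated by generating the
sentences from relationship metadata. Below is a minimal sketch; the tuple
format and the `assertion` helper are my own illustration, not the API of any
modeling tool:

```python
# Sketch: render relationship metadata as readable business assertions.
# The (parent, verb, child, min_card, max_card) format is hypothetical;
# max_card of None means "many" (unbounded).

def assertion(parent, verb, child, min_card, max_card):
    """Render one direction of a relationship as a business assertion."""
    modality = "must" if min_card >= 1 else "may"
    if max_card is None:
        count = "one or more" if min_card >= 1 else "zero or more"
    elif min_card == max_card:
        count = f"exactly {max_card}"
    else:
        count = f"up to {max_card}"
    return f"Each {parent} {modality} {verb} {count} {child}(s)."

relationships = [
    ("Person", "hold", "Account", 1, None),      # excludes holding zero accounts
    ("Account", "be held by", "Person", 1, 1),   # exactly one owner
]

for parent, verb, child, lo, hi in relationships:
    print(assertion(parent, verb, child, lo, hi))
```

Reading the generated sentences back to the business experts, including what
each one excludes, is the verification step; the code only saves the typing.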

Data element and entity names

  • Check entity names for non-singular and group names – e.g., Customer List,
    Product Catalogue. For consistency (and in building assertions) entity names
    should describe one instance. (Graeme Simsion)
  • I look at each data element and quickly identify which data elements do not end
    in a valid classword. A classword represents a general domain for the data
    element and examples include AMOUNT, DATE, NAME, IDENTIFIER, INDICATOR, and CODE
    (there are others too). Finding data elements that don’t end in a valid
    classword represents a naming standard violation at a minimum, and can also
    signal a larger problem where multiple pieces of business information might be
    stored in the same data element. For example, is Gross Sales an amount, a
    weight, a value, or worse yet, a combination of these? That is, should the right
    data element name be Gross Sales Amount, Gross Sales Weight, etc.? (Me)
  • Ensure that, if an entity/table/attribute/column naming standard already
    exists in the organization, it has been adhered to. Otherwise improvise one
    and apply it consistently across the model. (Ashraf Kandil) [NOTE: As modeling
    tools continue to add features, naming standards validation should eventually be
    an automatic and not a tedious manual activity.]
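Until the tools catch up, the classword check is easy to script. A minimal
sketch, using the classword list from the technique above (extend the set with
your own standard's classwords):

```python
# Sketch: flag data element names whose last word is not a valid classword.
# The classword list below comes from the article's examples; a real naming
# standard would likely have more.

VALID_CLASSWORDS = {"AMOUNT", "DATE", "NAME", "IDENTIFIER", "INDICATOR", "CODE"}

def classword_violations(data_elements):
    """Return the data element names that do not end in a valid classword."""
    return [name for name in data_elements
            if name.split()[-1].upper() not in VALID_CLASSWORDS]

elements = ["Gross Sales", "Order Date", "Customer Name", "Ship Indicator"]
print(classword_violations(elements))   # ['Gross Sales']
```

Each flagged name is then a question for the business: is Gross Sales an
amount, a weight, or (worse) both in one element?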

Model appearance

  • If you’re presenting a diagram to a non-specialist, start with a “high
    level” diagram stripped of the “less important” entities. Use your judgement
    here: the purpose is to get the person oriented; the actual verification
    will take place against the full model. Another option is to draw the
    important entities larger – if your tool will let you. (Graeme Simsion)
    [NOTE: I build “high level” models often for the reason Graeme mentions.
    It’s similar to reading the table of contents of a book to get a high level
    understanding of that book, as opposed to jumping right into page 1 of the
    text.]
  • A favorite technique of mine when reviewing with the business is to divide
    the logical model into business subject areas. This is easily done with
    ERwin’s subject areas. Most often, the subject areas correspond to discrete
    business processes, e.g., add a customer, send an invoice, etc. I walk the
    business through each subject area – one at a time – describing entities,
    data, and relationships in the context of the business process and business
    rules. (Mary Niva)
  • Check model clarity, suggesting the use of subject areas if the model is
    overly complex. Try for a one-page fit for each subject area. Investigate
    the potential of using colors to highlight important entities, e.g., yellow
    for transaction entities and plain white for valid-value set (lookup list)
    entities. (Ashraf Kandil)

Rules of normalization

  • Verify each entity is in third normal form. Ask intelligent questions about the
    ones that aren’t and evaluate the answers. (Ashraf Kandil)
  • Look for three (or more)-way intersection entities. Check the rules to
    ensure fourth and fifth normal forms have not been violated (not so hard
    when you think “rules” rather than “multi-valued dependencies”!). (Graeme
    Simsion)
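Sample data can help with the third normal form check above. One common 3NF
violation is a transitive dependency: a non-key column determined by another
non-key column. A rough sketch (the table and column names are illustrative,
and a data sample can only suggest a dependency, not prove one):

```python
# Sketch: hunt for possible transitive dependencies in sample rows.
# If non-key column a determines non-key column b in every sampled row,
# b may belong in a separate entity keyed by a.

def determines(rows, a, b):
    """True if, in this sample, each value of column a maps to one value of b."""
    seen = {}
    for row in rows:
        if row[a] in seen and seen[row[a]] != row[b]:
            return False
        seen[row[a]] = row[b]
    return True

orders = [  # key: order_id; zip -> city is the smell to ask about
    {"order_id": 1, "zip": "10001", "city": "New York"},
    {"order_id": 2, "zip": "10002", "city": "New York"},
    {"order_id": 3, "zip": "60601", "city": "Chicago"},
]

non_key = ["zip", "city"]
suspects = [(a, b) for a in non_key for b in non_key
            if a != b and determines(orders, a, b)]
print(suspects)   # [('zip', 'city')]
```

Each suspect pair is then one of those "intelligent questions" to put to the
model's author rather than an automatic verdict.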

Definitions

  • Look at a selection of definitions – poor definitions are one of the most common
    faults in a model, and often overlooked as the focus is on the graphical
    representation. A quick glance will usually tell you whether the modeler
    has taken this task seriously. (Graeme Simsion)
  • Ensure that entities and attributes have good solid meaningful definitions that
    are more than just a repetition of the name. Encourage the provision of examples
    to enhance understanding. (Ashraf Kandil) [NOTE: How many of us have seen
    definitions that simply rephrase the data element name? These “quick and dirty”
    definitions can escape notice when searching for missing definitions.]
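Definitions that merely rephrase the name can also be flagged mechanically. A
minimal sketch, assuming a simple word-overlap heuristic (the threshold and
stopword list are arbitrary illustrations):

```python
# Sketch: flag "quick and dirty" definitions that just restate the element
# name. A high word overlap between name and definition is the smell.

def rephrased(name, definition, threshold=0.8):
    """True if most of the definition's words already appear in the name."""
    name_words = set(name.lower().split())
    def_words = [w.strip(".,") for w in definition.lower().split()
                 if w not in {"the", "a", "an", "of", "for"}]
    if not def_words:
        return True   # an empty definition is the worst offender
    overlap = sum(1 for w in def_words if w in name_words)
    return overlap / len(def_words) >= threshold

print(rephrased("Customer Name", "The name of the customer."))         # True
print(rephrased("Customer Name", "Legal name used on billing docs."))  # False
```

Flagged definitions still need a human read; the point is only to surface the
ones a glance would miss because they are present but empty of meaning.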

Model flexibility

  • Look for “head scratchers” (self-referencing one-to-many and many-to-many
    relationships). Is the flexibility of an infinite hierarchy / network
    really required or has the modeler just been too lazy to model the real
    structures? (Graeme Simsion)
  • Look at how “parties” have been handled. Many, many modelers will
    generalise “all people and organizations of interest to the enterprise” into
    some sort of Party, Party role etc structure without checking whether the
    business needs that level of generalization eg the business may be quite happy
    to treat suppliers and shareholders as totally disjoint (even if in the real
    world they overlap). (Graeme Simsion)
  • Listen to the business case/problem and verify that the model is addressing
    it and is not an academic exercise. DON’T MODEL FOR THE SAKE OF MODELING.
    (Ashraf Kandil) [NOTE: There have to be good business reasons why we chose a
    particular design structure.]

Keys

  • Look for meaningful keys (i.e., non-surrogate keys). My view is that these
    are acceptable sometimes, but we should always check that they are
    applicable for all cases, unique, and stable (the last of these three is the
    most common problem). (Graeme Simsion) [NOTE: If you do decide to create a
    surrogate key, make sure you define an alternate key on something real, such
    as what the business would view as the primary key.]
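The uniqueness part of Graeme's check is easy to run against sample data. A
minimal sketch (table and column names are illustrative; stability and
applicability still need a conversation with the business):

```python
# Sketch: before trusting a meaningful (natural) key, check that it is
# actually unique across sample rows.

from collections import Counter

def duplicate_key_values(rows, key_columns):
    """Return candidate-key values that occur more than once."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]

employees = [
    {"ssn": "111-22-3333", "email": "a@x.com"},
    {"ssn": "444-55-6666", "email": "b@x.com"},
    {"ssn": "111-22-3333", "email": "c@x.com"},   # duplicate SSN!
]
print(duplicate_key_values(employees, ["ssn"]))   # [('111-22-3333',)]
```

A clean result on a sample proves nothing forever, but a duplicate proves the
candidate key is broken today.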

Specific to the physical data model

  • Check for Recursion. Look for any recursive relationships. Verify that they have
    been correctly resolved. Ensure that the resolution comes with a Type entity.
    (Ashraf Kandil)
  • Suggest a Super/Sub type resolution approach. (Ashraf Kandil)
  • Identify the nature of the target system and verify whether it is OLTP or
    DW. Check for any controlled redundancies in the latter case. (Ashraf
    Kandil)
  • Verify if any logical model entities should be combined into physical tables to
    achieve performance expectations. Verify correct dimensional Hierarchies have
    been established taking into account business nature and physical modeling
    science. (Ashraf Kandil) [NOTE: Remember to apply the Denormalization Survival
    Guide here from Chapter 8 in my book – a shameless plug for my book!]
  • Perform throughput analysis. Verify that, where possible, high-volume
    transactions have been well catered for to achieve performance expectations.
    (Ashraf Kandil)

Meta data quality

Does the meta data match the data? If at all possible, try to get at the actual
data that will be stored in the data elements you are modeling. You might be
surprised at what you find, but better to be surprised early on in a project
than during the actual development. For example, I recently worked on an
application where an email address was being stored in a company name field. It
can sometimes be challenging to actually get to the data, but once you get
access to the data simple SQL queries can help validate that the data element
meta data matches the actual data. (Me)
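The "simple SQL queries" mentioned above might look like the following sketch,
which recreates the email-in-a-company-name example in an in-memory SQLite
database (the table and column names, and the email pattern, are illustrative):

```python
# Sketch: check that actual data matches the meta data by profiling a column.
# Here we look for email-like values hiding in a company name column.

import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (company_name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?)",
                 [("Acme Corp",), ("jane.doe@example.com",), ("Globex",)])

# Pull the values and flag anything that looks like an email address.
suspicious = [name for (name,) in
              conn.execute("SELECT company_name FROM customer")
              if re.search(r"\S+@\S+\.\S+", name)]
print(suspicious)   # ['jane.doe@example.com']
```

Against a real database the same idea applies: a handful of SELECTs with
pattern or range predicates per data element, run early, before development
locks the surprise in.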

1 Comment

  1. Madani BASHA 5 years ago

    Often a data model diagram will be given to me by itself with the request “Can you see if this model is ok?”.
    My polite and firm response always is “Please give me the business context. Only then will I review.”

    Also I am often asked to review a data model after reverse engineering – with little else to go on.
    And the tables in such models can have single-element surrogate IDs, without any clue about the “natural key”.

    It is essential to avoid deep discussions about primary key, alternate key and/or surrogate key with business folks.
    Instead I found the following question very effective in extracting the required details:
    “How can you tell apart one row from another?”
    I persevere and persist until the following sentence can be completed:
    “Each row of this table represents <>”.

    And I am simply amazed just how poorly formed these surrogate keys can be.
    At times, a single table is indeed a mash of multiple tables, with different natural keys,
    all hiding in the same table, courtesy of an ill-conceived surrogate ID.

    All I can say is that the above technique works all the time. 🙂
