Logical Data Modeling in practice

Logical Data Modeling in practice

The Logical Data Model (LDM) is a model showing the rules behind how something works independent of context. Context includes factors such as database platform, reporting tools, and usage requirements. The LDM purely focuses on content, showing how a business area or application work at a point in time. This is the theory, but in practice what else do we typically include on the LDM? For this design challenge, we would like to know from a practical viewpoint what you typically show on a LDM. There are no wrong answers here. Simply answer Yes, No, or Sometimes to the following questions and feel free to add comments to explain your response.

  • Employee Hire Date
  • Employee End Date
  • Original Product Name (Note not necessarily what the product name is today but what it was when the product first appeared in the marketplace.)
  • First Salesperson Id (The relationship between a salesperson and a customer stores not just the current relationship but also the relationship of the salesperson who first made contact with the customer.)
  • Gross Sales Amount (There is a calculation that produces this measure.)
  • Customer Source System Id (Note this is the unique identifier to a customer in the Order System. It is a surrogate key.)
  • Monthly Sales Reporting Entity (This is a summary view of order line details.)
  • Employee Access Id (An employee foreign key on the Claim entity which is used to enforce security as to which employee is allowed to view which Claims.)

Response

The list of fields for this challenge was carefully chosen to cover a gray area that sometimes exist between logical and physical modeling: when to add derived data, certain time elements, and data elements containing terms such as ‘Original’. So would you include each of the following fields on the logical data model? There are no right or wrong answers – what do YOU do in practice when faced with these fields? I added my comments to the list of everybody’s comments including those from Dave Hay and Graeme Simsion.

Our Design Challenge group is growing rapidly, as indicated by the record number of responses to this challenge. Here are a few of the rules I extracted from our group’s comments as they might help you in your modeling decisions:

  • If you’re taking a photograph of how the business works, and you need this field to describe how the business works, this field belongs on the logical data model (LDM). If you’re capturing how field values change over time or calculated fields, in general they belong only on the physical.
  • My rule of thumb, is that if the data supports business-level rules rather than implementation-related needs, then it’s a strong candidate for the LDM.
  • I’m working on the basis that the business needs this data for the purposes that the resulting database is to serve – I’m modeling for a database, not reality in isolation.

By the way, our raffle winner for this challenge is Ben Ettlinger. Congratulations Ben! Ben, email me your address to get your goodie!

Field       Would you show this field on the Logical Data Model?
Employee Hire Date?

Why this field ALWAYS belongs on the LDM:

Brings to mind that while hire date is a good idea; hire data and start date could be 2 different things as employees might start different positions within the same company at different points in their career.

For most business purposes, critical – for benefits, salary actions scheduling, etc.

An employee will always have been hired.

In order to determine if an instance of an employee is ‘available’ for relationships in the model, the logical processing date would have to be between the hire date and end date. Physically, this might be replaced by an active flag or employment status code, but the presence of start and end dates provides the ability to determine if the employee is active.

This attribute is so fundamental to any usage of the data that I would probably choose to always include it.

While it is more straight forward to simply put the hire date as a separate attribute on the Employee entity doing so does not readily handle the case of a rehired employee.  That being said, the number of rehires is relatively low, so the impact is minimal.

Many business rules will depend on this, so in almost all cases where you’re capturing employee info, it’s a business-level attribute to know an employee’s date of hire.

Why this field SOMETIMES belongs on the LDM:

Only if it is relevant to an HR application….and we only did one….an exit interview application.

Depending on the complexity of the design, this would probably be generalized to an Employment Event. In a world where lifetime employment has vanished, employees may be hired multiple times, yet retain a consistent identity.

Dates are often required for time period analysis, typically in data warehouse models.

I’ll make this comment once here but it applies to all attributes about which you’re asking.  It depends on the subject of the model.  In an HR model we’d include Employee Hire and End dates.  In a model dealing with purchased or sold products we’d include Original Product Name and other sales information.

Depends on how generic the user is comfortable with, and how large the database will be – is it for a comprehensive system which will support employee life cycles, for example?   If it is, the preferred way to handle this date is to have an entity for employee status, with start and end dates for each relevant status. (Although it is possible that the active status start date might not be the same as the hire date).

Field       Would you show this field on the Logical Data Model?
Employee End Date?

Why this field ALWAYS belongs on the LDM:

As with the Employee Hire Date, in order to determine if an instance of an employee is ‘available’ for relationships in the model, the logical processing date would have to be between the hire date and end date.  Physically, this might be replaced by an active flag or employment status code, but the presence of start and end dates provides the ability to logically determine if the employee is active.

Must be nullable.  May need to be delineated by type of end.  For example, employee may be laid off but on salary continuance, requiring an end date for each of those termination phases.

Why this field NEVER belongs on the LDM:

Employment End date might make more sense – do Employees ever really end?

If employee is synonymous with employment, then the start and end dates are critical and required in the LDM.  If an employee can become unemployed then re-employed, then this date is probably in the wrong place.

Why this field SOMETIMES belongs on the LDM:

If this field is really the Employee Termination Date, I think it makes sense to always have on the model, as it is important business information. If this field is being used to capture begin and end employment over time, I would not put it on the logical but wait for the physical.

Only if it is relevant to an HR application….and we only did one….an exit interview application.

Employee may still be working for the company.

Dates are often required for time period analysis, typically in data warehouse models.

Same argument as Hire date, except that not as many business processes depend on this attribute.  It is NOT an implementation-related attribute.

Field       Would you show this field on the Logical Data Model?
Original Product Name?Note not necessarily what the product name is today but what it was when the product first appeared in the marketplace.

Why this field NEVER belongs on the LDM:

I say “never” because it’s not the part of the current point-in-time, but if it’s a business requirement identified during the logical modeling, I may have it there.

Can’t remember a situation where anybody would want to know what a product ‘used to be called’

If a product can change its name, this should be represented in a history of product information. There is probably no circumstance in which the “original” is any more important than the “eighth”.

We try not to include historical data in the LDM, they are included in the relational model.

On new databases, the answer would be that we would never have a field called “Original Product Name” – but would have a life product ID, with related nominative data assigned effective and end dates, kept in a table which would record history over time.

Why this field SOMETIMES belongs on the LDM:

This is good for a cross reference, sometime people only remember the original product name

This is part of history and I guess it depends on how you show “history”. If you archive the table when it changes (name of product in this case), then it can be shown as an older version of the same table (timestamp). No need to keep a column called Original Product Name as there may be many versions of Product Name.

If it is important for product service to identify a product by its original name, this should be retained as a history.

Depends on the usage of the application. If it’s basically a list of current employees then probably not. If it’s a project management system showing everybody that worked for the project then probably yes.

My gut feeling is other characteristics of the product would change in addition to the name, which would then trigger the question of whether or not changes are tracked, in general, and if a name change means it’s really a new product.  But to me, history is a physical implementation in which a change to a record results in history being recorded. So if the business wants it, the original name would be provided for on the physical side.  The logical side should represent the current name.

May be required for historical analysis – again in data warehouses.

This depends upon whether name changes can result from legitimate changes in state, or are merely the result of corrections. In either case, modified values should be audited in some fashion (this is more of a physical design and implementation question). In the case of the former, the state must either be explicitly captured, or a new instance of the entity recorded.

It seems inconceivable that the business has some requirement that concerns the first name of the product but not the intermediate product names.  Very commonly implemented, it would also be derivable if history was recorded.

This will often be needed to support ongoing business rules.  Such as legal or regulatory reporting.  If it’s just needed for trend analysis (as in a DW), or the like, then leave it off the LDM.

It’s probably a very valid business requirement to know the history of product names over time.  Will include if needed, although not necessarily by name — may be via inclusion of effective- / end-dating an entity containing “Product Name”.

Field       Would you show this field on the Logical Data Model?
First Salesperson Id?The relationship between a salesperson and a customer stores not just the current relationship but also the relationship of the salesperson who first made contact with the customer.

Why this field ALWAYS belongs on the LDM:

Once again, assuming it’s needed.  Message is I don’t “add the time dimension later”.

Assuming the requirement is that the original salesperson gets a residual, or if the current salesperson is not the original salesperson the commission is less

If it is important to the user community it should be included.  If it is calculated (derived) it should be flagged as such, and formula explicitly stored if it can be determined.  My views may be contrary to many on this question.  Many people think derived data should not be included in the logical model.  I disagree.  IF it is important to the users then include it, but with the proviso that it be flagged as derived.  Incidentally, SQL allows for derived data items, with ACTUAL (stored) or VIRTUAL (not stored, but realized at time of retrieval).

Why this field NEVER belongs on the LDM:

I think this answer depends on your business and what you are trying to capture.

I’ve never worked on a sales related application, but I could see the need if the would be a commission structure, or for example on an insurance premium application where the original salesman would be paid commission at each renewal.

As above, to me, the recording of history is a physical implementation.  On the logical side, the relationship between the two parties just exists and conceptually represents the current state.

If multiple sales people can have a relationship with a customer, this should be represented in a many-to-many relationship. The nature of that relationship (i.e., FIRST), might be a candidate attribute.

Again, this is an historical row in a resolution table between the salesperson and the customer. Depending on whether this represents a relationship or a contact, you would have appropriate type codes indicating the kind of contact or relationship ( i.e., cold call vs. assigned sales person).

This will be captured in the LDM via a relationship.

Why this field SOMETIMES belongs on the LDM:

If there is a reason to keep this information, then it is needed. As an example, if a commission is made for the first salesperson or if commissions are shared among all salespersons involved, then you need to show it. Or, if there is reporting involved. But if there is no reason and no one wants the information, then fuggedaboudit.

Important if there is either an effort to build repeat business, e.g., a model whereby the initial contact sends postcards or email to the customer, or if the salespeople are compensated by commission and therefore ‘own’ a customer’s future business, or if there is a sales efficiency measurement happening.

It depends if there are any benefits / rules connected to that first salesperson

First contact may have been made by someone other than sales…the service or marketing department for instance.

If the business requires this information (e.g. for payment of commissions).

May be required for historical analysis – again in data warehouses.

If a valid business requirement, this would probably be handled via an associative entity between “Salesperson” (perhaps a sub-type of “Individual” or “Party”, as needed) and “Contact” or “Sales Contact”.

Field       Would you show this field on the Logical Data Model?
Gross Sales Amount?There is a calculation that produces this measure.

Why this field ALWAYS belongs on the LDM:

Assuming you are talking about a sum of the extended values of the line items.  This is my main demonstration of how computed attributes work, including the “SUM-THROUGH” and “INFER-THROUGH” arguments.

Although calculated, this is something that people want to know. Therefore it must be in regardless whether it is actually stored anywhere or not

Why this field NEVER belongs on the LDM:

No. In a logical model there  is really no need for it if the gross amount is an aggregate. On some occasions we have de-normalized and included gross amounts for easier or faster retrieval. However, whenever we have do it, a trigger was created to update it from it’s aggregate parts, not directly from the application.

May be stored in the physical implementation for performance reasons.

This might be introduced in physical modeling.

I would generally not include calculated values in the logical model.

This is a derived field, and these generally have no place in the LDM.

Why this field SOMETIMES belongs on the LDM:

I try not to store columns that can be produced by calculations, unless the calculation takes a lot of work to produce or it is needed always.

If the information to produce the calculation is contained in the model, then it is not necessary to keep the end product. Again, if, for some reason, there is the possibility that the end result could change and you want to keep the original calculation as history, then you might need to keep it.

For accounting purposes.

Depends on tool, notation and if I’ve got a precursor conceptual model.  I don’t always get to make these choices.

Although this is a derived field, it is sometimes necessary to show derived data on the logical model to demonstrate to the business that a concept is in there.  To them, the model may not be complete without it and ‘trust me, it’s in there’ doesn’t always cut it.

Only in reporting databases, not in LDM’s for production databases.

If the business rules for gross sales amount keep changing over time, it cannot always be calculated (derived) and hence is required – especially in data warehouses for historical analysis.

If this attribute showed up as a requirement, the best way to identify this logically would be to include the Gross Sales Amount in the logical model, yet indicate that it is derived from other information.  Otherwise, you are not demonstrating it’s existence.  It then be physically implemented as a calculation, function, column

While I would make it a goal to never record this item in the row that contains the source fields (quantity and price), it might appear in other contexts in other tables, such as a particular salesperson’s monthly gross sales amount, on which other calculations might be made – i.e., compensation.

Although this is clearly derivable, I tend to include this type of attribute (as could appear in “Transaction Header” as a sum of all “Transaction Line Item” Sales Amounts), especially if there is repeated use of the attribute by multiple business processes.  (This is a great example of real-world practicality — the transaction is totaled up at the point of sale and printed on the receipt and therefore exists in the real world, so why not capture that total and avoid future joins and processing, no matter how negligible that would be.)

Field       Would you show this field on the Logical Data Model?
Customer Source System Id?Note this is the unique identifier to a customer in the Order System. It is a surrogate key.

Why this field ALWAYS belongs on the LDM:

If the logical model is being used to show that the business and user requirements are going to be captured in the database, and the source system id is required, then it’s existence can be modeled logically.

Why this field NEVER belongs on the LDM:

Part of the physical implementation.

I invariably have a PARTY ID, but solely as an internal identifier.  I rarely concern myself with source systems.

The best way to make sure a surrogate key doesn’t stay surrogate is by publishing it. Secondly a surrogate key doesn’t have any value for anybody in the business. Why did you create a surrogate key in the first place?

This name and definition are a little confusing.  Is there more than one source of customer ids?  What is it a surrogate key for? On the logical model, I don’t think you need to make a distinction between a customer id on one system vs. another.  A customer is identified by a customer identifier on the logical side.  How many there are on the physical side is a function of implementation.

Assuming that this question means a surrogate key to the CUSTOMER entity in the Order system, this should always be generalized to an external system relationship concept.

This is an implementation-level attribute.  Even if the model is modeling a customer management system, in which the customer’s source system may be a real business need, the ID is at the implementation level.  In the LDM it would be represented by a relationship.

Why this field SOMETIMES belongs on the LDM:

If a DW application with aggregates only, not important but might be part of the staging table record.

May be required for historical analysis – again in data warehouses.

It’s important to record this information in the logical model from the standpoint of capturing requirements. I would not however assert that it is a surrogate key, since it’s likely to be optional.

This depends on the usage. It would be preferable to have an alternate identifier cross reference table, in which the other identifiers are coded according to type (Order System).

This would be more likely seen in an enterprise-wide system, which would tie together all the systems with customer data.

A specific application system, however, is much more likely to simply contain the identifier in the customer table.

To the extent that this links disintegrated data among legacy systems, yes.  If you don’t do this, you may restrict or impede interaction among business units.  In this time of mergers and acquisitions, or just plain ol’ data integration efforts, you gotta’ do what ya’ gotta’ do

Field       Would you show this field on the Logical Data Model?
Monthly Sales Reporting Entity?This is a summary view of order line details.

Why this field NEVER belongs on the LDM:

Summaries don’t belong in a LDM

This is purely a physical construct to improve performance.  I see no reason to ever include it on a logical model.

This is a physical manifestation of a design, not a logical entity.

We will define the order line entity. In the relational model, we will indicate that this are monthly, yearly..  summaries.

Can be derived from detail data.

This seems very close to a calculated value and hence I would generally not include in the logical model.

We do not logically model summary entities; we treat them as physical only constructs.

Purely derived, possibly in the DW.

Why this field SOMETIMES belongs on the LDM:

If this was for a data warehouse, we may store it.

For DW aggregates, yes – for transactional system, the entity ID would be part of the sales records for the daily transactions.

This is something people want to know, therefore it should be in. However, even business people are so used to the notion that reports are extracted from a database that I might leave it out.

Possible reasons to model these entities as logical rather than physical only is that the logical key to the report may maintain it’s own business status.  For example an application that measures the compilation of monthly sales.  During the month, the lines in the report may be not-started, incomplete, full, etc…  There may be links from the report to the organizations responsible for filling in the report…etc.

One would prefer to retain this kind of information in a dimensional model, rather than in a relational model. I would not ordinarily include derived data in a logical model, but there have been instances when this was done, depending on the business requirements.

Certainly in a data warehouse.

If this means that I collect current month here and then delete it prior to collection for the next month, NO.  Monthly sales reporting is absolutely necessary, but should be accomplished by start and end date pairing.  (Some accounting months do not start on the first day of the month nor end on the last day of the month).

Field       Would you show this field on the Logical Data Model?
Employee Access Id?An employee foreign key on the Claim entity which is used to enforce security as to which employee is allowed to view which Claims.

Why this field ALWAYS belongs on the LDM:

This is a business rule

I find it hard to believe that only one employee would have the right to view a claim (what if he/she was sick or left the firm?) and I question if it’s really about employee access as opposed to responsibility for processing the claim.  If this is for viewing the claim, and only one person can do that, but another is processing the claim, we’ve got a problem. So, assuming the person who is allowed to view the claim is also the one working on the claim, then I’d definitely add it on the logical model.  If it truly is only for the ability to view the claim (and I’m having a lot of trouble with that concept), then I’d still add it, but I probably wouldn’t model it this way in the first place.

Why this field NEVER belongs on the LDM:

I would prefer a code that can classify a group of employees.  Each employee would be associated to a security group.  I like this better because the way it is stated on the left you would only be able to designate a single employee to a claim and that would be a mistake.

This is not part of the LDM

I have difficulty believing that only one employee would ever be able to view a claim. At a minimum, this should be a many-to-many relationship between EMPLOYEE and CLAIM, but I would go further and suggest two many-to-many relationships: one between EMPLOYEE and WORKGROUP, one between WORKGROUP and CLAIM.

Needed to enforce security

I would generally argue that this attribute doesn’t belong in ANY data model and that the related data ought not be stored in the database at all. It’s application-specific permissioning functionality.

This does not seem like a good location for that attribute.  Security would be best managed by a flexible, independent, which is best not implemented on the actual claim itself.

This is more likely to be retained in a cross reference type table, which would contain entitlements or varying levels of authorization.  This would avoid conflict where more than one employee would need access to a particular claim, especially considering the hierarchy needed for various levels of approval.  If the processing employee were the only FK in this table, the manager would not have access, unless an additional field were added – better to have a cross ref table than to be constantly adding columns.

Why this field SOMETIMES belongs on the LDM:

Could be designed in an associative entity (preferred) – the presence of it in the Claim entity is a denormalization.

Part of the physical implementation.

Depends how much rules are connected to this ID. If this key plays a major role in the process that is to be supported, I’d show it.

It could be possible that the physical security system can not handle this, so would have to be included in the architected system.  To that extent, I might allow this.  As stated, however (“[…]foreign key on the claim entity […]”) this is quite restrictive — should be on an associative entity between “Claim” and “Employee”.  If the situation is in direct response to Sarbanes-Oxley, HIPAA, GLBA, or the raft of other data protection considerations, I would be more inclined to include it in the model.

1 Comment

  1. Simon Downes 4 years ago

    An excellent post! I’ve recently moved from an environment where there was a hard and fast rule that derived data in any form should NEVER appear on an LDM, and the LDM was purely for atomic data.
    My current post has no such rules and I’m taking a more ‘it depends’ approach, so it was good to find this post support my thoughts! 🙂

Leave a reply

Your email address will not be published. Required fields are marked *

*