The model below is from an actual project (with the subject matter changed to protect the innocent entities):
The only demographics important to us in this model are State and Area Code. A Person can reside in at most one State (New Jersey, New York, etc.), and live within at most one Area Code (516, 212, etc.). A State contains many Area Codes and an Area Code belongs to one and only one State. Both a State and Area Code are optional for a Person. Therefore it is possible for a Person to have both a State an Area Code, either one, or neither.
Sounds like a simple model, right?
Here is the challenge: Is it possible to enforce on this model that if both an Area Code and State are populated for a given Person, that the Area Code is valid for that State? For example, the Person Bob lives in New York and therefore his area code must be one of the valid area codes for New York. This model enforces that the area code 212 for example belongs to New York, but does it enforce the valid combinations of State and Area Code within Person? If yes, explain how. If no, explain how (if it is possible) we can change this model to make it show this rule.
To summarize this challenge, is it possible to enforce on this model that if both an Area Code and State are populated for a given Person, that the Area Code is valid for that State?
It always helps to look at some sample data as well (which was conveniently omitted from the challenge description :)), so let’s look at 4 sample rows:
#1, Bob Smith, other fields…, New York, 212/New York
#2, Jane Doe, other fields…, California, 212/New York
#3, Francis Smithers, other fields…, New York, 973/New Jersey
#4, Steve Smith, other fields…,New Jersey
Note the ‘212/New York’ represents the foreign key back to Area Code, which is made up of the area code and the state code in this design.
This sample data reveals that our existing model cannot enforce a valid area code and state pair. We can do it through code but not directly in this model. One way of ‘fixing’ the design to handle this rule is using subtyping such as in the following diagram:
Before walking through this model, a quick disclaimer: I don’t like the ‘Person State’ and ‘Person Area Code’ either, but they will show the idea. A person communications entity with subtypes would probably be more ideal. So in our sample data, rows 1,2 and 3 use the Person Area Code subtype, which contains both the area code and state. Row 4 uses the Person State entity as only the state is known. By removing the optionality on the State and Area Code side of the relationships, we can enforce this rule in the model.
A warning though…it is easy to go ‘subtype happy’ and we need to realize that some rules simply may not belong on a data model, echoed in David Hay’s response below.
Speaking of responses, a lot of the responses explained the rule can be enforced because Area Code contains the state in its primary key and there is a relationship between Area Code and Person so State can always be derived. State will appear twice in the Person entity, once through Area Code and once directly from State. However, there are no data modeling police to ensure that they will always have the same value, as in rows #2 and #3 above.
Here are the responses (listed alphabetically)… some are similar to the thoughts you just read and some are quite different (which keeps our field so fun!). Due to the large number of responses, in several cases I selected one response to summarize several:
The person is not dependent or either the state or area code for its uniqueness because as stated the relationships are non-identifying. The model itself does not enforce the restriction of an area code for a person being restricted to a state. Application logic could make this a referential restriction but the model does not. Since state to area code is a many to one and area code to person is many to one, a hierarchy of state to area code to person could be constructed forcing the state code and area code to be included as part of the key to the person entity, thus enforcing the rule through the model. As the uniqueness of the person does not depend on either of these two entities, this model does not represent the real world but would enforce the rule.
This is a perfect example of how data models cannot enforce business rules. A data model is about structure. A business rule is a constraint on how the structure cannot be used, so it is not appropriate to represent such a rule on the same model.
In this case, if you remove the link from PERSON to STATE, you could say that the only way a PERSON can be recognized is to specify an AREA CODE (and by implication, a STATE), but that would not permit a PERSON to simply specify the STATE.
No, the constraint you describe must be described in the business rules model, which is a different model. (And is harder, since (Ron Ross notwithstanding) an acceptable modeling technique for rules has not yet been developed.
By the way, the entity class name should not be “Area Code”. That’s the name of the attribute that identifies a thing called a “Telephone calling area”. (We’re always getting entity classes and attributes mixed up.)
To enforce this rule, the model must be changed to include an exclusive arc on the relationship between STATE and PERSON, and the relationship between AREA CODE and PERSON. This would result in the relationships being read as “Each PERSON may reside in one STATE or may live within one AREA CODE”. In the case of a PERSON living within one AREA CODE, the identifying relationship between STATE and AREA CODE will specify the STATE associated with the selected AREA CODE, instead of the relationship between STATE and PERSON.
A sub-type of person_state or a person_state_code is needed to associate a person with his/her state of residence.
The model supports but does not enforce the rule; processing logic must be used to enforce the valid State / Area Code combination at the Person level. As long as State and Area Code are optional, the model can not be changed to show the rule.
Barry A. McConnell
You have a redundant fact here (either state or area code). Since you say that either is optional we cannot model this correctly, i.e. person is assigned to one area code which then determines exactly one state, we have to make the relationships from person mutually exclusive. If area code is known, it is recorded, otherwise record state. Implementation could determine how you go about changing from the more general (state) to the specific (area code) through DB triggers or program code.
Given a choice of two answers, I thought I’d go for the likely-to-be-less popular one and justify my choice. (The real answer is “can’t tell”).
Here’s my logic for the “yes” response: If the “contains” relationship between Person and State is defined to apply only when there is no relationship from the person to Area Code, then the rule is enforced. “Hey” you say, “where did you get that definition from? That’s a lot to read into ‘contains'”. Well, the alternative is that the modeler has specified an E-R model that supports redundant data (in the cases where we could determine a person’s area code via two different paths). In my experience, modelers are more likely to use ambiguous or misleading relationship names than to tolerate redundant data…
This sort of structure occurs really commonly – I had an email last week about exactly the same structure – and I’d hope Len Silverston and David Hay will be writing in saying “See page x of my book of patterns”. It illustrates a number of points quite nicely:
1. Relationship names matter. Change the name (and associated meaning!)here, and you change the rule. (“Who cares about relationship names?” said the DBA – “they don’t affect the DDL”)
2.There’s more than one possible answer to most data modeling problems, and one of the main reasons for differences in models is choices made as to where rules are held. In this case, if we don’t enforce the rule in the model, we have to enforce it elsewhere (code, manual procedure, data). And any solution will be a trade-off; here we can see trade-offs amongst rule enforcement, elegance, ease of programming, communication and redundancy.
3. Sometimes a simple additional diagramming convention can be useful. The “exclusivity” arc (Data Modeling Essentials pp140-141) is a well-known way of showing that only one relationship applies in each situation: add this to the diagram and the intent is clear.
4. We have several techniques for enforcing rules in data structure: basically domains (definitions of entities and attributes), subtypes, primary keys and relationships. It’s an interesting exercise to try to enforce the given rule with each of these. When I see an exclusivity arc, I always think “should I create a supertype?” – in this case a generic “location” entity, which might prompt some different implementation options.
Since you don’t show the attributes, the answer is “it depends”. Since State has an identifying relationship to Area Code, the primary key attribute(s) of State would be part of the primary key of Area Code. If no role names are assigned to the State primary key attributes on either relationship to Person then only one state would exist in Person and the FK relationship constraint to Area Code would ensure that the state value is valid for the Area Code and vice versa. If a role name is assigned to the state identifier on either relationship then 2 state values would exist and they could be inconsistent.