Let’s assume your organization has a globally agreed upon definition of a classification: “A way of grouping similar items together based on the physical attributes of the product.” So a candy product for example, might have these classifications:
- Category – sugar, chocolate
- Size of product – king-sized, fun-sized, etc.
- Seasonal product – yes/no
- Storage – frozen, room temperature
Your company a very large ERP tool that makes it relatively easy to add these new classifications without customizing the software. However, this ease of adding new classifications also has a dark side, as there have been attempts to add classifications that really aren’t classifications at all. They have been proposed as classifications because it is easier (and cheaper!) to add a new classification than to add a completely new field. In other words, it is easier to add a new row than a new column.
Would you add these “fake classifications” to the system? Lets say someone comes up to you, and says they want to add a new classification. It is the region the product sells in, such as North America, Europe, etc. You know that this has less to do with the physical attributes of a product and more to do with the relationship of a product to a geographical region. You believe it is definitely not a classification – would you add this one and others like it to the system? Yes or no, and why? If you answer “yes”, what effect will this have on your true classifications (and your business)? If you answer “no”, what good argument can you give to counter the time and money savings of adding these as classifications?
If we generalize this design challenge, it is really asking if it makes sense to use a data element for something not originally intended in the business or on the logical data model. So we need to first decide if these “fake” classifications are really classifications and if they are maybe we should broaden our logical definition (and therefore business view) to accommodate. If we decide that these are not valid logical classifications is there a chance that they are valid physical classifications? We know the physical data model can at times be substantially different from the logical to accommodate the reality of databases, reporting tools, existing applications, etc. This can include using techniques such as denormalization, partitioning, and indexes. Maybe there are good functionality or performance reasons why “fake” classifications should be stored with the rest of the classifications. If there is not a good logical or physical reason, I think it is safe to stay that this is a short-sighted move to save $$$ and time now (which, as we know does happen every once in a while, :O)). In this situation, I would tactfully present my thoughts to the project team and listen to their thoughts as well.
Before we read our responses, I know you’re wondering which piece of paper Sadie picked up for this challenge…drum roll please….Sadie picked up a piece of paper with David Hay’s name on it. Congratulations Dave and your prize is on the way. Enough from me, let’s hear what our group has to say. I’ve presented the responses in two categories, Yes and No, and within each category the responses are listed alphabetically by author. Hope you enjoy reading these responses as much as I did!
Yes, I would add these “fake” classifications
With the ERP tool, there are usually base tables and extension tables attached to them to handle either one-to-one or one-to-many relationships. One could probably use an existing column in a one-to-many extension (1:n) table to track where the candy/asset is being sold. This should not cause management heartburn b/c it would not require a schema change within the tool.
In a conceptual data model absolutely not. A data model is supposed to represent the true nature of all relationships, and the fact that a product may be sold in a region is a fact that deserves proper representation. Indeed, in your example, each of the kinds of classification may call for a separate entity.
Having said that, a particular physical database installation may only be concerned with products and their “predicates”, so as a practical matter, having a simple CLASSIFICATION and what I call CATEGORY and CATEGORY TYPE tables may be a very good solution.
Of course, once requirements change, the designer may be compelled to bring one or more CLASSIFICATIONS and CATEGORIES (of CATEGORY TYPE (“region”, for example) into separate tables.
Is that wishy-washy enough for you?
Pieter de Bruyn
Basically it is not a “fake classification”. If the Organisation has it’s data correctly classified, the region should be defined in a classification called Location. If the region has not been defined or added to the Location, first add the region and then use the relationship between Product and Location (the table indicating the specific location for a product, that includes the physical address) to indicate in which regions the product can be sold. This should not have any effect on the true classifications or the business, because it is already defined and part of the organisations solution.
I would prefer to keep the classification structure pure to its original intent. Considering it is a purchased application, one should confer with the vendor to understand the impact to the application and associated functions, such as reporting. Also, it would become very messy, if not impossible, to track attributes and other relationships associated with the “classifications in disguise”. However, I answer yes to this question because I am a realist and understand that in a crunch, things aren’t always implemented ideally. I would review all these limitations with the team members and if it is absolutely needed by the business, then I would add the fake classification. I would also discuss short term and long term implementation of this request and convince the decision makers to address the fake classification in a future release.
The ingredient list of a candy ultimately affects the salability of a product particularly as it fits the tastes preferences of a particular geography or cultural group. Thus, I would say that the information should be captured or tracked. For example, Hispanic and Spanish speaking people have some preference for cinnamon flavoring in chocolate. The problem is that the ingredient in this example cinnamon is a small constituent but important. Another consideration is that as people who favor this type of chocolate move about the world, the movement may manifest itself as geography.
I would add the row and have it act as a pointer to other model structures that contained the detail on these multi-faceted classifications.
These classifications are how the business see their information needs, they must be modelled at the information level. It is up to the system and database designers to decide how they can present the classifications to the business user in the most efficient manner. Modelling of classifications using tradition data modelling methods is fraught with danger. However using the Corporate Business Modelling Language (CBML), classifications can be handled rigourously and precisely, attributing information to the correct entity classes, keeping the subject entity-classes pure, while allowing them to be classified as the business requires them. So the true answer to your problem is of course Yes and No, since all business classifications need to be captured, but not as part of the data structure. If you require more information on CBML, then please initially see the web site cbml.info, which is still under development or contact me direct.
Given the forced choice of “yes” or “no” I could play it safe and say “no” for the moment – but I’ve said “yes” with qualifications, because I have more arguments on that side… The only immediate justification I can offer for standing in the way of this (presumably relevant) “someone who came up to me” is the statement that the classification has “more to do with the relationship with a geographical region”. If the database already contains information about geographical regions, then I don’t want to duplicate it.
I’m less swayed by the “you believe” and “true” vs “definitely not” arguments…
Let’s look at the problem a bit more carefully. It seems to me that the ERP package supports a “many-to-many” relationship between “classifications” of a particular kind and other entities e.g. in this example, a product could be sold in many regions. I base this on the statement that by using classifications, we avoid adding a column to the product table – which would be the usual way of supporting a many-to-one relationship with Region. Of course you can enforce a many-to-one relationship using constraints on the intersection table, but it’s not the most obvious way of doing it…
On this basis, we could use the “classification” facility to handle not only single valued attributes, but multi-valued attributes…
And at this point, I’m starting to say “if that’s the way the tool does it physically, get with the program!”… subject of course to not making a mess by duplicating stuff like Regional information.
Now there may be other good arguments against using the “classification” facility, but they’re not apparent in the description of the problem. What is apparent is an argument based on what is “true” vs “fake” rather than what is practical and workable. I don’t like these arguments: classification and entity definition is about usefulness, not truth.
Prima facie, “easy” and “cheap” is good! IF, as a data modeler, you have an uneasy feeling about the elegance of the solution, you’ll need to translate that into practical impacts on quality or cost: does it make the database less flexible, harder to understand, harder to program against, more difficult to enhance, less able to enforce business constraints…? Does it introduce redundancy with associated update costs? You may need to dig deep to answer these questions: and if the tool is well-designed and the “classification” technique solidly embedded, you may find that your fears are unwarranted.
More broadly, the issue of modeling and implementing classifications is one which is typically handled inconsistently and clumsily – even by modelers not constrained by ERP packages. Graham Witt and I have a good look at it and make some recommendations (essentially model consistently, implement according to facilities available) in our new Edition of Data Modeling Essentials (left the commercial for last…)
No, I would not add these “fake” classifications
Charles S. Anderson
I would not add the REGION as a classification. My argument would include the definition of a classification “A way of grouping similar items together based on the physical attributes of the product.” followed with the comment that location where a product sold is not a physical attribute of a product. I would also advise that the REGION or concept of a location can have many other values and relationships such as warehouse location, manufacture location. So by opening this short cut you leave yourself vulnerable to many other “exceptions”. I would recommend that the data model and ERP application be modified to include a LOCATION entity which would be much more beneficial by its flexibility and use in reporting sales information.
In order to facilitate my explanation, I would develop a data model diagram and use it to describe the need for separate tables. The picture would make it clearer for someone to understand that adding the new table provides the metadata that makes the model more understandable. Reuse of columns for multiple purposes leads to convoluted and difficult to understand database designs. One of the main complaints of business users is that the database is difficult to understand for users who are developing their own reports. I would also use this model with sample data to illustrate the potential for data quality issues from combining multiple non-related values in a column.
First, the “real” classifications would be corrupted with non related data and would likely add confusion to those who use the real classifications. Reporting is complicated as now, these non related “classifications” must be excluded from the real ones. Terminology is now corrupted and incorrect values are listed with correct ones. The initial meaning of the field will most likely be lost over time.
A region is a totally different domain with it’s own set of valid values. To make it a classification (and populate it with the other classification existing values) would be like adding apples and oranges into the same basket. Whenever I wanted fruit I could simply pull out something and I would get an apple or an orange. I wouldn’t care as long as I got a piece of fruit. But if I wanted only an orange, I might have to pull out a lot of apples to get to the orange. So, if someone wanted to list products by region, they may have to look through a long list of categories, sizes, seasonal, and storage lists to get to the region lists. Questions — If everything was in the same classification, how would they know they had a region list (or a category list, size list, seasonal list, or storage list) in the first place?
It’s important to keep the classifications standard across all parts of a company. Otherwise the classification loses its reason for being. Suppose you put “deer crossing” signs in places where there aren’t any deer. Eventually people will ignore the signs in areas with deer. Causing mountains of road kill. You don’t want the same chaos in your data models. Especially if you have a large ERP system that’s accessed by many people. Records will be inconsistent, data will slow down and costly mistakes will be made. In other words, don’t cut corners. It will come back to haunt you.
By adding a new “entity/column” we would be begin defining the geographical subject area which would support future development needs in addressing business requirements of a geographical nature. Kinda like a “pay now or pay later” situation as well as confusion to the business definition of the column because of “dual function” column content.
I would not add these “fake classifications”. It is true the one-time cost of design, implementation, and meta data creation occurs only once. However, the cost of:
1) Filtering out “product / region” from BI and other reporting will occur many times.
2) Explaining why “product / region” is not really a category will occur frequently.
3) One incorrect business decision based upon this bogus and confusing classification could be orders of magnitude times more than the cost of adding new columns.
This type of ERP tool customization (adding a new column), properly designed, documented, and communicated is justified in this particular case. However, “faking” the classification is not.
The ‘region’ data is probably already defined in another system, as a Sales Region, or a Financial ‘slice / view’. In this case there is no need to pay to add it to the Product data. The request from the user is for reporting and analyzing Sales, not for extending the product definition. If there is a ‘very large ERP tool’ at this company, then there is likely a data warehouse / mart / reporting / analytical environment as well. Assemble the required pieces for reporting in this DW environment.
1. There is always different column in the ERP application that would give this information like the inventory warehouse, Sold by sales rep attributes etc that would give this information.
2. There is a many to many relationship between product and region. It would be difficult for the business to maintain this data. Product can stop selling in a region or we can start selling the product in a region.
3.Product sold in region is a transational data and not attribute of the product. The data can be derived from transational data.
Christian A. Strasser
I would not primarily on the basis of mutual exclusivity. That is, a classification by definition consists of a set of choices that can only have one in each set be true. Thus, a product cannot be both a seasonal and non-seasonal product.
Adding fields to an item also allows for future expansion based on the fields. A better model would have product and sales_area as separate tables with a product-sales_area table between them. This would allow for much more flexibility.
As a classification, if something like the region changed or expanded, some sort of clumsy method of handling this would be needed (such as combo codes). Further if you wanted to track history of where a product was sold, it would not be possible.
As we know the entropy of the universe is increases , similary over time the entropy of the Data Model increases . The Universe slowly goes from a ordered energy/matter system to a ministronee soup of lower enger and matter , so a LDM goes from a Normalised model to a denormalised model ( star schema ). To keep a model in a normalized form requires energy or in this case BUDGET. Back to the question , altho it is easy to add the fake classification and uses less budget , this will be the start of the insidious decline of the model from a normalised model to a denormalised model.
While it would be relatively easy to accommodate the first couple of “fake classifications”, eventually the addition of more and more of these would cause enough database anomalies that the cost of having to write special code and/or queries for certain situations would outweigh the benefit of being able to include the “fake classifications” by just adding a new row. With the “region the product sells in” classification, I feel it would be a much better solution to add an associative table that links the region and product tables to indicate which products are sold in which regions.
The argument for adding these as classifications is incomplete. Existing reports and processes using ‘real’ classifications would have to be updated with logic to properly utilize/exclude the ‘false’ classifications. This will increase maintenance costs over time as specific data knowledge will be required for any new development or changes to existing work. Meta-data cost also rise. Finally, the logical model will end up generalized to the point that it will not be as useful as a tool to describe the business.