Data Model Size

A project manager is looking to estimate development and support tasks based on the number of entities in the physical data model. How many entities would you say characterize a small, medium, or large data model?


  1. Mark Hoff 5 years ago

Trying to estimate development and support tasks is, IMO, more complex than simply counting entities. In general, I would classify a model with fewer than 20 entities as small, 20–50 entities as medium, and over 50 entities as large.

  2. Tom Faulkner 5 years ago

It’s difficult to estimate data modeling efforts based on the number of entities because the majority of the time is really spent at the attribute level. For example, 3 entities could have 30 attributes (a smaller effort) or 300 attributes (more of a medium-sized effort). In most cases, requirements include a list of attributes, no business definitions, and the question “When can you have this done?” When estimating modeling efforts, I think it makes more sense to come up with a per-attribute estimate that takes into account tasks like data analysis, getting your questions answered, and logical and physical modeling. Depending on factors like the completeness of the requirements, the experience of the modeler, and workload, it could range from 15 to 45 minutes per attribute. The level of automation also comes into play: many ER diagramming tools have macro development facilities that can speed up manual tasks such as setting physical parameters, generating indexes, and naming physical objects. All that to say, I would never give a firm estimate without seeing the number of attributes.

  3. Pam Rivera 5 years ago

I also count the number of data elements, which will probably turn into attributes. How they get organized into entities is something we do while designing, not during estimation. After counting the attributes, I multiply by 10–20 minutes based on the factors itemized in Tom’s comments, and if it is a very new initiative (never been done before) I add a factor of 1/3 to allow for meetings, conversations, explanations, etc. While the estimates I’ve provided seemed large at the time, after the project is over and I’ve looked at our time-entry tools, my estimates have been within 10% of the actual time spent, and many times even closer than that.
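As a rough illustration, the per-attribute approach described in the two comments above can be sketched as a small calculator. The function name, defaults, and structure here are mine, not from the thread; the inputs are the per-attribute rate (15–45 minutes in Tom’s range, 10–20 in Pam’s) and Pam’s one-third uplift for a never-done-before initiative:

```python
def estimate_modeling_hours(num_attributes, minutes_per_attribute=15,
                            new_initiative=False):
    """Rough data-modeling estimate: attributes x per-attribute minutes,
    with an optional one-third uplift for a brand-new initiative
    (meetings, conversations, explanations, etc.)."""
    minutes = num_attributes * minutes_per_attribute
    if new_initiative:
        minutes *= 4 / 3  # Pam's "add a factor of 1/3" rule of thumb
    return minutes / 60  # convert to hours

# Example: 300 attributes at 15 min each on a brand-new initiative
# -> 300 * 15 * (4/3) / 60 = 100 hours
print(estimate_modeling_hours(300, 15, new_initiative=True))
```

This is only the arithmetic, of course; picking the right minutes-per-attribute figure for a given team and requirement quality is where the judgment discussed above comes in.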

  4. AJ 5 years ago

It depends on the number of different real-world concepts that are modelled. The logical model gives a more realistic view of this. If there are a lot of physical entities supporting one simple concept, like a credit card transaction, then the support effort would be less than when the same number of physical entities cover not only the credit card transaction but also the credit card, the customer, the merchant, the addresses of the merchant and the customer, the credit card balance, the job security of the customer, the spouse of the customer, the car history of the customer, et cetera.

    So it all comes down to the conceptual and logical data models.

  5. Gordon Everest 5 years ago

First, it is important to distinguish a large DATA MODEL from a large DATABASE (i.e., BIG DATA). And I consider the data model to be the “logical” model of the business or user domain, not the physical implementation model. One measure of the size of a data model would be the total number of data items. I have seen it reported that the number of data items for an organization with a single line of business (thus excluding a conglomerate) tends to max out at about 20,000 data items no matter the size of the business (a volume measure). I would then consider anything over, say, five thousand data items to be a large data model. If you take the average number of data items per record to be, say, 10 (pick your number), the 20,000-item case would give about 2,000 tables.

However, I think the better measure of time for a data modeling effort would be the number of logical data items (or objects in a fact-oriented modeling scheme such as ORM). Then there will be a range of minutes per data item to estimate the total effort. That number can be developed empirically, as some people have reported above. It would also be influenced by such factors as the complexity of the problem domain, the experience of the modelers, the tools available to support the modeling activity, the availability and knowledge of the subject matter or user domain experts, etc. I would not include physical implementation in that estimate, or at least I would keep it separate.

It is interesting to contrast this with the data model for a “big data” application. Some “big data” data models can have just a few (even just two) data items (if there is a logical data model at the outset at all!), which makes them very small data models. However, there could be billions++ of entries/records for that model, hence “big” data.

  6. John Giles 5 years ago

    Two quick comments, please.
Firstly, Steve, you are asking both about the size of a model and about the size of the related development effort. Some of the comments already provided quite rightly correlate the two: if there’s more data (entities, attributes), it’s probably quite reasonable to infer that you might typically expect more development effort. Some time back, in an ancient past, I encountered work on “Function Points”, used to size development effort. One factor (but only one) was the number of tables.
Secondly, and my major point: the level of generalisation can affect the count of entities. If I have separate entities for vehicles such as Car, Truck, Motor Bike, Boat, and Aircraft, that’s five more-specialised entities. Conversely, if I have one generalised Vehicle entity, it may replace the need for the five specialisations. Or I may implement the Vehicle supertype as well as the five subtypes, i.e., have six tables. Rather than entering into a debate on the relative merits of such approaches, I simply wish to point out that when counting the number of entities in a model, generalisation/specialisation may affect the counts for what could be considered the same business area.
