Can an indicator be NULL?

Can an indicator be NULL?

An attribute that is defined as an indicator has only two values, such as Active/Inactive, Yes/No, True/False, and On/Off. I was asked during a training class this week if NULL (empty) can also be a value. Can an indicator attribute have three values, such as Active, Inactive, and NULL? That is, can an indicator attribute be optional? If an indicator attribute value is always required, than only two values are possible (e.g. Active/Inactive) and a NULL value, if it occurs, will be replaced with a default value (such as Active).

Which do you think is better and why, an indicator attribute that is defined as NOT NULL and uses a default value, or an attribute that is defined as NULL and allows the attribute to be optional?

7 Comments

  1. Greg Long 3 months ago

    A great question and one I think we all grapple with at some point in any design effort. I think a fact can remain unknowable, especially when a record is created, so NULL – ‘unknown’ – is probably a legitimate response. It raises the issue of knowing the state of completeness of a record – in this instance a NULL suggests the record is incomplete. So then, how might we maintain incomplete records – are they unreliable until resolved, do they have a low quality code, will they ever ‘get’ resolved?

  2. Carol Lehn 3 months ago

    It depends because we have to consider the business requirements and business process…

    If the indicator is on a new table, it should be Not Null with a Default UNLESS, at the time the data is being captured, the business partner providing the data doesn’t know. For example, the business process could involve several stages of data acquisition where the indicator isn’t known until the second stage. In that case, it would have to be nullable without a default.

    In the real world, an indicator could also be added to an existing, populated table. In that case, it should be defined as nullable without a default if you are unable to make assumptions about the appropriate values for existing rows. Once the indicator is set to the correct value for all existing rows, it could be changed to Not Null with a default for all new rows going forward.

  3. Eugene Hoffman 3 months ago

    A UI control that might typically be used with the binary indicator: radio buttons. In the case of a Null value, where allowed, it could represent no selection, thus useful for a validation check of a required field.

  4. Robert Gordon 2 months ago

    Since we model to abstract concepts from physical reality, I would have to say that an indicator (like a lightswitch) in general should not allow null values. One should probably enforce a default value constraint if it makes sense. Possibly (definitely for a data warehouse) one should allow specific alternate states that would include “UNKNOWN”. Thus, for example, if the value for the sales region attribute was missing or failed validation in (otherwise legitimate) sales order transaction data, sales reports would still account for all legitimate sales, but simply report some of those sales to a region of “UNKNOWN”, along with legitimate regions—but all sales would be reported and appear in the total sales numbers.

    Nulls in general should be frowned on. For numeric attributes that may be involved in calculations and/or aggregations, nulls may be a necessary evil (although we could certainly code a numeric value that is out of bounds in the business rules, and handle it appropriately in applications—such as a negative value where none should appear, or a date of 31-Dec-9999). For text attributes one should generally favor either a default or else specific alternate values representing meaningful exception states—“we don’t know”, “the value didn’t pass validation”, or other discrete values that would be handled by applications according to strict business rules, and could direct data stewards to investigate and attempt to have data integrity problems fixed upstream. A goal of continuous data quality improvement would be to ferret out the root causes of quantities, counts and dollars being reported to “unknown” categories and ultimately reducing those to zero.

  5. Brian G Nagel 2 months ago

    It is based on the business rule that the indicator is representing. If a form filled out by a customer has optional indicators such as are you married if the customer does not enter yes or no this does not indicate they are married or not they just choose not to answer. So holding the null value actually indicates that they choose not to answer not whether or not they are married. So null representing unknown would apply.

    If the business rule is we want to default the answer to not married but have a second indicator that shows they did not answer then nulls would not be acceptable in either indicator.

  6. Ian Varley 2 months ago

    I think if you want to accurately model knowledge about the real world, like whether an actual light switch is turned on or off, NULL (I.e. “unknown”) is unavoidable; you cannot have perfect knowledge. (You could alternately go with a more clear 3-valued enum of “on”, “off”, and “unknown”, since NULL doesn’t have to mean “unknown” in all cases).

    If, on the other hand, you want to capture information about the state of a software system, it’s totally legit (and much simpler) to go with two values and a default, and dissow NULL. That’s like saying, “the only thing that can change the state of this value is the software, which we can control completely, so there’s no such thing as not knowing the value”. In that case, the state of that value is the ground truth. For example, if I flip a toggle switch on my phone to enable airplane mode, there’s no place for NULL; the truth of the matter is dictated by the position of the toggle.

  7. Matt Taylor 4 weeks ago

    We made this a standard set of values for an indicator at my company: Yes, No and Unknown (using 1,0,NULL). So if you want to know if I am holding a quarter in my hand the answer, when I show you my hand, is yes I have a quarter or no I don’t, but if you can’t see my hand, or haven’t asked me the question then the answer is “I don’t know”. There are many cases in business where we are collecting data about an object or an event and at the time we collect the data the answer of yes or no may not be known, so in that case the answer has to be “unknown”. Reminds me of “Schrödinger’s cat”…

Leave a reply

Your email address will not be published. Required fields are marked *

*