The Importance of Mutual Exclusivity and Data Collection Fatigue

As I stated in our last blog, “Data As An Asset > Data As An Expense”, there is an investment required in data to both create and maintain that data as one of your company’s most valuable assets. In product data, there are many best practices that guide businesses on how to properly apply the data as an asset concept. One of those best practices is understanding mutual exclusivity and applying it to your data collection mechanisms, especially where a lack of it generates data collection fatigue.

Data collection fatigue is the primary reason why initial data collection results in poor data quality. It is caused by a combination of bad UI/UX design and poor product taxonomy design. As users encounter friction in the product onboarding process, they make poor choices for the data they submit. The more friction they encounter, the poorer their choices become. We won’t deal with UI/UX friction in this blog, but we will deal with some elements of product taxonomy design and how it affects data collection fatigue.

Mutual exclusivity ensures that each piece of product data is distinct, non-overlapping, and meaningful within a structured hierarchy. This includes your taxonomy hierarchy design, attribution strategy, and the values you allow within those attributes. Redundant taxonomy nodes and attribution lead to confusion in the data collection process, which increases data collection fatigue, which results in a decrease in data quality.

There are three distinct areas where mutual exclusivity applies in product data collection: hierarchy, attribution, and allowable values. Most importantly, we’ll include examples where a lack of adherence to the best practices of mutual exclusivity causes issues in your data collection AND data presentation systems, leading to poor customer experiences and difficulty maintaining a cohesive brand message.

1. Mutual Exclusivity in Product Hierarchy

Mutual exclusivity within your product hierarchy is the most visible application of this best practice. Every product should have a single terminal node (the most granular node in a taxonomy), and the reasons for this are very important to maintaining data quality. Thirteen years ago, someone I worked with coined the phrase “A pea is a pea is a pea” (thank you, Tom) to explain mutual exclusivity, so let’s use that phrase as an example of why this is so important.

Grocery
└── Produce
    ├── Fresh Vegetables
    │   ├── Peas
    │   └── Beans
    ├── Canned Vegetables
    │   ├── Peas
    │   └── Beans
    └── Frozen Vegetables
        ├── Peas
        └── Beans

In this simplified example, we have groupings of nodes in the hierarchy for fresh, canned, and frozen vegetables. Under each of those sets of nodes are nodes for peas and beans in each format: fresh, canned, and frozen. Each of these seems to follow the concept of mutual exclusivity, as there is only one terminal node per product type: a frozen pea only has one location to be assigned.

However, this is problematic for two reasons. The first is that the attributable difference between a pea and a bean is that it is either a pea or a bean. This can be handled through an attribute, reducing the number of nodes required to classify a record while ensuring that the attributes collected for both peas and beans are kept in sync. If someone adds an attribute to Frozen Peas, in this example they must add it to the 5 other nodes, seeing as all of these nodes would have similar attribution. That is a point of failure in a product taxonomy maintenance process that can be easily avoided.

The second issue is that the attributes you would collect on frozen vegetables, canned vegetables, and fresh vegetables are basically the same attributes. The delivery formats (packaging type, storage type, etc.) should be attributes on a single node for “Vegetables”. The entire complexity of how this set of nodes is managed could be collapsed into a single node with a common attribution set, simplifying the data collection process.

That simplification avoids data collection fatigue, where users inputting the data become exhausted with the complexity and start submitting classifications and data in the quickest way possible without worrying about the quality of that data. If the data fatigue starts at the classification step, imagine how fatigued that user is by the time they start inputting data.
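To make the collapse concrete, here is a minimal sketch of what a single “Vegetables” node with a shared attribute set might look like. The attribute names “Vegetable” and “Format”, the `Node` class, and the `classify` helper are my own illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A taxonomy node with the attributes (and allowed values) it collects."""
    name: str
    attributes: dict[str, list[str]] = field(default_factory=dict)


# Hypothetical single node replacing the six format-specific terminal nodes.
vegetables = Node(
    name="Vegetables",
    attributes={
        "Vegetable": ["Peas", "Beans"],
        "Format": ["Fresh", "Canned", "Frozen"],
    },
)


def classify(vegetable: str, fmt: str) -> dict[str, str]:
    """Build a product record on the single node, validating both values."""
    for attr, value in (("Vegetable", vegetable), ("Format", fmt)):
        if value not in vegetables.attributes[attr]:
            raise ValueError(f"{value!r} is not an allowed value for {attr}")
    return {"node": vegetables.name, "Vegetable": vegetable, "Format": fmt}


frozen_peas = classify("Peas", "Frozen")
```

Adding a new attribute now means editing one node instead of six, which removes the sync problem described above.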

Why It Matters

The above examples are what is known in the taxonomy world as attribute stuffing. Any time the name of a node includes an attributable data point in that naming, it is frowned upon by taxonomists. The reason for this is simple: If a user searches for “Peas” in the above example, they could simply classify their product against the first peas node regardless of whether this is the correct classification. Now that the product is classified incorrectly, the data collection process starts in a deficient state. The data collection fatigue starts early in this example, and will cause issues throughout the rest of the data onboarding process.

This example might seem extreme, but you see this every day in websites you visit. How often do you click down the hierarchy on a website (for those of us that still do) and find the products you expect AND several products that just don’t make sense in that node? That site that used to just sell books is FILLED with examples like this. It always starts with classification issues, and the data quality only suffers beyond that.

There are more extreme examples of violations of mutual exclusivity in hierarchy design, like when laptops and tablets blend for 2-in-1 devices or when a product can be both furniture AND decor. However, this example is the most common one I see when reviewing product taxonomies: Attributes stuffed into the names of nodes that lead to additional unnecessary complexity in hierarchy design, which leads to mis-classifications of products. Mis-classifications of products lead to poor data collection, products not being findable where they should be, confusing search experiences, and missed sales opportunities that are all avoidable with some very simple tweaks to your taxonomy design.

What Can You Do About It

The simplest answer would be to let trained product taxonomists handle the development and maintenance of your product taxonomy, but that isn’t feasible for many businesses. Therefore, the following questions are good to ask when putting your nodes together:

Is there somewhere else someone would expect to find similar products?
Could I use an attribute to define the difference between two products I may expect in these nodes?
Can a data inputter find this node without having to click to multiple end nodes?
Are the attributes I would ask on these two nodes any different from each other?
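Some of these questions can be partly automated. The sketch below is one possible check, assuming your taxonomy can be exported as a nested dictionary (that format and the function names are hypothetical): it lists every terminal node name that appears under more than one parent, which is a common symptom of attribute stuffing higher up the tree.

```python
from collections import defaultdict

# The example hierarchy from above, as a nested dict (empty dict = terminal node).
taxonomy = {
    "Grocery": {
        "Produce": {
            "Fresh Vegetables": {"Peas": {}, "Beans": {}},
            "Canned Vegetables": {"Peas": {}, "Beans": {}},
            "Frozen Vegetables": {"Peas": {}, "Beans": {}},
        }
    }
}


def leaf_paths(tree, prefix=()):
    """Yield the full path (as a tuple of names) to every terminal node."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


def repeated_leaf_names(tree):
    """Map each terminal node name that occurs more than once to its paths."""
    by_name = defaultdict(list)
    for path in leaf_paths(tree):
        by_name[path[-1]].append(" > ".join(path))
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


duplicates = repeated_leaf_names(taxonomy)
for name, paths in duplicates.items():
    print(name, "appears in", len(paths), "places")
```

A repeated name is only a prompt for review, not proof of a problem, but in the peas example it flags both “Peas” and “Beans” as appearing in three places.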

2. Mutual Exclusivity in Product Attribution

Mutual exclusivity within product attributes is more difficult to see than within hierarchy design. The application of the mutual exclusivity best practice involves not having multiple attributes with the same definition applied to a single product. It seems simple, but it creates chaos when you look deeper into its application.

Let’s go back to our peas example. Let’s assume that the issue with the nodes is behind us and we are now filling out attribute data on that node. The following attributes are shown to you in the data onboarding system:

Storage Type
Type
Package Type

There are three “type” attributes shown. The storage type attribute states where you are supposed to store this product. The type attribute differentiates between fresh, canned, and frozen. The package type attribute states whether the product is in a bag, a can, or a plastic container.

It may seem simple to answer each of these questions, but imagine having to do this for 10, 50, or 1000 products. Eventually, the inputter of this data is going to start finding shortcuts to answer these attributes quickly. Data collection fatigue sets in when the friction in the onboarding process is filling out the same attribute multiple times. Although you could state that there are slight variations in the definitions of these attributes, those differences are not significant enough to warrant all three attributes. You could infer from one attribute everything that you gain from the next two attributes.
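If you already have product records, you can test the “infer one attribute from another” claim directly. This is a rough sketch under my own assumptions: the sample records, the “Pantry” and “Freezer” values, and the function names are all made up for illustration. It checks, for every ordered pair of attributes, whether each value of the first always comes with the same value of the second.

```python
from itertools import permutations

# Hypothetical sample of onboarded records for the Vegetables node.
records = [
    {"Type": "Fresh", "Storage Type": "Refrigerator", "Package Type": "Bag"},
    {"Type": "Canned", "Storage Type": "Pantry", "Package Type": "Can"},
    {"Type": "Frozen", "Storage Type": "Freezer", "Package Type": "Bag"},
    {"Type": "Fresh", "Storage Type": "Refrigerator", "Package Type": "Bag"},
    {"Type": "Frozen", "Storage Type": "Freezer", "Package Type": "Bag"},
]


def determines(rows, source, target):
    """True if every value of `source` maps to exactly one value of `target`."""
    seen = {}
    for row in rows:
        # setdefault stores the first target value seen for this source value.
        if seen.setdefault(row[source], row[target]) != row[target]:
            return False
    return True


def inferable_pairs(rows):
    """List (source, target) pairs where target can be inferred from source."""
    attributes = list(rows[0])
    return [
        (source, target)
        for source, target in permutations(attributes, 2)
        if determines(rows, source, target)
    ]


pairs = inferable_pairs(records)
```

In this sample, “Type” determines both of the other attributes, so asking for all three is asking the inputter the same question three times. On real data a result like this depends on the sample, so treat it as a candidate list for a taxonomist to review rather than a verdict.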

Why It Matters

This type of attribute confusion will increase data collection fatigue. When data collection fatigue sets in, data inputters start looking for the fastest possible button to click to move on to the next attribute or SKU. They aren’t looking for the right answer to the question: They are looking at how to not have to think about what you’re asking for by selecting the first possible value that makes any sort of sense, and the more fatigue that sets in, the less the value has to make sense to that person.

This has ecommerce and website experience implications. If someone experiencing this data collection fatigue chooses “Can” for a type but a storage of “Refrigerator”, as data onboarders will when they get to item 500 out of 1000, your canned peas may show up in your fresh peas experience. It will also look odd to the customer when the peas say they are canned but require refrigerator storage, which is a bad customer experience.

Again, this is a simplified example. The real examples of this are generally around the attributes defining color, material, and applications of products. In reviewing product taxonomies, our number 1 goal is to simplify this attribution, not simply add more attributes to collect every possible data point. Simplified attribution with clearly definable attributes reduces data collection fatigue. Seventeen color attributes do not.

What Can You Do About It

My number 1 suggestion when it comes to understanding attribution in a product taxonomy is clear: Balance your need for detailed attribution with the complexity the data inputter will find in completing that data. If you are building onboarding friction into your design, people inputting data will get fatigued, fill out data the fastest way possible, and your experiences will suffer. Therefore, define each of your attributes within a category and review those definitions against every other attribute definition in that category. If there aren’t significant differences in the definitions, use a single attribute.

3. Mutual Exclusivity in Allowable Values

When you set a list of allowable values within an attribute, often called a LOV or a Picklist, care must be taken to understand the definition of each value in that list. However, you must also understand how you are going to apply that list, as the difference between single select and multiple select lists is vitally important to understanding mutual exclusivity.

Let’s go back to our peas example. Imagine you have a single-select attribute labelled “Certifications” that contains the following list of values:

Certified Organic
Certified Vegan
Organic

It’s blatantly obvious in this example that Certified Organic and Organic overlap. Seeing as Certified Organic would show up first in the list, based on alphabetical order, I’d probably pick that value. But not everyone would, and seeing as only one value can be selected from this list, anyone selecting Organic would remove their products from any experience depending on the value Certified Organic.

Why It Matters

My example above is made up to make a point obvious, but mutual exclusivity in attribute values is a real problem. Every time someone adds “Blue”, “Light Blue”, and “Dark Blue” to a color attribute LOV they violate mutual exclusivity. But every time someone adds “Metal”, “Wood”, and “Metal and Wood” to a multi-select LOV for a material attribute they violate mutual exclusivity in a different way. Where all light blues are blue but not all blues are light, multi-select attributes create situations where someone can select “Metal” and “Wood” for one product and “Metal and Wood” for another, again compromising a customer experience.

Having to determine which is the right value to pick when onboarding product data into a system shouldn’t involve deciphering the intent of the person who created the list of values. Did they want you to select the individual materials or the combined values? Or both? What if you have plastic in your product as well? Do you need to click “Plastic” or ask for a new value of “Wood, Metal, and Plastic”?
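Overlaps like these can often be surfaced before anyone has to decipher them. The sketch below is a simple word-overlap heuristic of my own (the function names are hypothetical, and it only catches overlaps visible in the wording): it flags any value whose words are fully contained in another value in the same list.

```python
import re


def tokens(value):
    """Lowercased words in a value, ignoring the connector 'and'."""
    return set(re.findall(r"[a-z0-9]+", value.lower())) - {"and"}


def overlapping_values(lov):
    """Return (broad, narrow) pairs where broad's words are all in narrow."""
    issues = []
    for broad in lov:
        for narrow in lov:
            # '<' on sets is a strict subset test.
            if broad != narrow and tokens(broad) < tokens(narrow):
                issues.append((broad, narrow))
    return issues


certifications = ["Certified Organic", "Certified Vegan", "Organic"]
materials = ["Metal", "Wood", "Metal and Wood"]
colors = ["Blue", "Light Blue", "Dark Blue"]

for lov in (certifications, materials, colors):
    print(overlapping_values(lov))
```

Each flagged pair still needs a human decision. Synonyms and values that overlap in meaning but not in wording will slip past a check like this.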

What Can You Do About It

The easiest methodology for avoiding the data collection fatigue associated with LOV mutual exclusivity is a functioning data governance framework that includes a process for vetting LOV additions. It is far too easy for a single person in a silo to create this kind of problem unwittingly. It may seem like over-engineering to do this, but any values added to a LOV should be vetted by more than a single person. That gives the opportunity for the taxonomist asking for the value add to define their reasoning, and an opportunity for that reasoning to be challenged where appropriate.

Outside of a governance process, good product taxonomy discipline must be implemented. Trained product taxonomists will catch and understand how to resolve these issues without creating more friction in the data onboarding process. Whether your business utilizes a third party (shameless plug for our services here) or hires trained taxonomists, the ultimate goal should be an investment in your data quality that avoids the headaches of that call at 8pm the night before Black Friday, where the CMO wants to know why a specific experience is failing.

In Conclusion: Data Collection Fatigue Is the Enemy

There are many ways to spin what has been said above to turn it into a sales pitch or make you believe that TrailBreakers is the only company qualified to solve this issue. That’s not what this blog is about. Yes, TrailBreakers is passionate about product data, and we like solving these problems. But the more important takeaway from this blog should be: How am I going to fight data collection fatigue?

I’ll discuss UI/UX in a separate blog shortly, but product taxonomy goes hand in hand with UI/UX as a cause of data collection fatigue. You can have the best UI in the world, but it will still create poor data if you’re not applying other data collection principles correctly. Paying attention to the structures that enable data creation is just as important as monitoring the data itself.

The impacts of poor data quality are well known. If the data is poor, people can’t find your products on search engines or your website, don’t trust your data when they do find it, buy the wrong product and have to return it, or buy someone else’s products. This increases the cost of acquiring the next customer to replace the lost customer or the costs involved with the returns policy, sapping margins and diluting net income.

Therefore, the application of the concept of data as an asset requires that the product data asset be governed under many best practices, including mutual exclusivity.