Improvements to coding quality: summer 2022

Over the last 12 months our Workometry text analysis service has been undergoing rapid change. We’ve been focussing on improving both the model quality and how the results are communicated.

The following are a selection of the many changes we’ve made to improve the quality of the analysis. Our clients continue to tell us that we’re multiple times better than the so-called ‘text AI’ built into their current tools. They see us as a powerful upgrade to those systems.

A better base coding model

Our approach to almost all projects is to start with a well-developed coding model and then adapt and fine-tune it to better match client needs.

Our current base model includes about 250 themes that are frequently mentioned in employee comments. To build a model we need a large number of example sentences; currently the base model has about 2 million carefully labelled examples.

Workometry Coding Interface

A screenshot of the tool we’ve built for our coding team to fine-tune and customise coding models. With the tool, coder productivity has increased ~10x whilst removing the need for data science skills.

Significant improvement in coding quality

There are two measures that we are optimising to continually improve our coding quality:

  • False positives. This is the most important, and the number one metric for us: we’ve found users are far less tolerant of miscoded answers than of missed answers. Our goal is to have < 1% false positives, a target we are beating in most themes.

  • False negatives. These tend to increase naturally as false positives decrease; however, it is possible to decrease these by ‘shifting the curve’.

We’ve recently made changes to our ‘AI’ that have resulted in a 50% decrease in false negatives without impacting false positives.
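To make the two measures concrete, here is a minimal, illustrative sketch (not our production code) of how false-positive and false-negative rates can be computed for a single theme, assuming binary per-sentence labels where 1 means the theme is present:

```python
def error_rates(y_true, y_pred):
    """Compute false-positive and false-negative rates for one theme.

    y_true / y_pred contain 1 (theme present) or 0 (absent) per sentence.
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return {
        "false_positive_rate": fp / negatives if negatives else 0.0,
        "false_negative_rate": fn / positives if positives else 0.0,
    }

# Hypothetical labels for 8 sentences against one theme
rates = error_rates([1, 0, 0, 1, 0, 1, 0, 0],
                    [1, 0, 1, 1, 0, 0, 0, 0])
```

‘Shifting the curve’ then corresponds to improving the model so that, at the same false-positive rate, fewer true mentions are missed.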

The key benefit of these quality improvements is that they deliver notable downstream benefits when we use the classified text in later machine learning approaches.

How are themes being used?

Whilst our first step is to identify the themes used in text, to enable detailed understanding of comments it’s important to show how each theme is used. Often employees will discuss a theme in different ways.

Same theme, different uses

Identifying the different ways a theme is used is essential for improving comprehension and effective decision making.

A new approach looks at each sentence mentioning a theme and automatically groups similar sentences together. An algorithm then helps reveal the sentence that best summarises each group, and which sentences are most quote-worthy.

For each group we can show where in the organization respondents are most likely to be, which key words and phrases are specific to the group, and which other themes are mentioned alongside it.
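The grouping-and-summarising step above can be sketched in a simplified form. This is an illustrative toy, not our actual algorithm: it groups sentences greedily by token overlap (Jaccard similarity) and picks as representative the sentence most similar on average to the rest of its group:

```python
def jaccard(a, b):
    """Token-set similarity: shared tokens over all tokens."""
    return len(a & b) / len(a | b) if a | b else 0.0

def group_sentences(sentences, threshold=0.3):
    """Greedy grouping: a sentence joins the first group whose seed
    sentence it resembles; otherwise it starts a new group."""
    groups = []  # lists of sentence indices
    token_sets = [set(s.lower().split()) for s in sentences]
    for i, toks in enumerate(token_sets):
        for g in groups:
            if jaccard(toks, token_sets[g[0]]) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

def representative(group, sentences):
    """The most 'quote-worthy' sentence: highest average similarity
    to the other sentences in its group."""
    token_sets = [set(sentences[i].lower().split()) for i in group]
    best = max(range(len(group)),
               key=lambda k: sum(jaccard(token_sets[k], t) for t in token_sets))
    return sentences[group[best]]

# Hypothetical comments mentioning a 'pay' theme
comments = [
    "pay is too low for the work we do",
    "pay is too low compared to market",
    "pay rises are not linked to performance",
]
groups = group_sentences(comments)
```

In practice, semantic similarity from a language model would replace raw token overlap, but the group-then-summarise structure is the same.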

Which themes are important?

Whilst showing which themes are most used in an answer set is important, the decision maker will want to identify the themes most important for ‘driving’ the outcome they’re interested in improving. For example, bullying is a relatively infrequent topic in many organizations, but it is a leading indicator of many key outcomes.

We’ve recently implemented a new set of algorithms to help decision makers identify which themes to focus on for each defined outcome. When a theme is identified we provide interactive dashboards to reveal more information and make taking action easier and more focused.
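A minimal sketch of the idea, with entirely hypothetical data (our actual algorithms are more sophisticated): compare the average outcome score of respondents who mention a theme against those who don’t, and rank themes by the size of that gap rather than by how often they appear:

```python
def theme_impact(theme_flags, outcome):
    """Mean outcome for respondents mentioning the theme minus the
    mean for those who don't. A large gap suggests a 'driver'."""
    with_theme = [o for f, o in zip(theme_flags, outcome) if f]
    without = [o for f, o in zip(theme_flags, outcome) if not f]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(with_theme) - mean(without)

def rank_themes(themes, outcome):
    """Order themes by absolute impact on the outcome, largest first."""
    return sorted(themes,
                  key=lambda t: abs(theme_impact(themes[t], outcome)),
                  reverse=True)

# Hypothetical: per-respondent theme mentions and an outcome score
themes = {
    "bullying": [1, 0, 0, 1, 0],  # infrequent theme
    "parking":  [0, 1, 0, 0, 1],
}
outcome = [2, 8, 7, 3, 8]
ranked = rank_themes(themes, outcome)
```

Note how the infrequent ‘bullying’ theme ranks first because of its strong negative association with the outcome, mirroring the point above.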

Ability for a final user to improve results

Occasionally a user will disagree with our theming, or feel that a new theme should be added. Each of our returned sentences now includes a unique URL which, when clicked, enables a user to raise an issue or make a suggestion. These links can be embedded into dashboards, making it easy for users to alert our team.
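One simple way such per-sentence links can be generated (a sketch with a hypothetical endpoint, not our actual scheme) is to derive a stable identifier by hashing the project and sentence text:

```python
import hashlib

BASE = "https://feedback.example.com/sentence/"  # hypothetical endpoint

def feedback_url(project_id, sentence):
    """A stable, unique URL per returned sentence: the same project
    and sentence always map to the same link."""
    digest = hashlib.sha256(f"{project_id}:{sentence}".encode()).hexdigest()[:16]
    return BASE + digest

url = feedback_url("acme-2022", "Pay is too low for the work we do")
```

Because the identifier is deterministic, the same sentence always resolves to the same issue thread, however many dashboards embed it.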

At the backend we’ve built a review process to ensure one of our coders can update the model and data as needed.

This feedback loop is proving to be a useful addition to our quality improvements. It is especially useful when feedback incorporates organization-specific language.

Summary

We’re now providing analysis of higher quality than human researchers, at a fraction of the time and cost. We believe that as this quality improves it increases the value of your text data, improves the accuracy of decision making and encourages analysts to ask for richer qualitative data.

News · Andrew Marritt