In this special guest feature, Wilson Pang, CTO of Appen, offers a few quality controls that organizations can implement to allow for the most accurate and consistent data annotation process possible. Wilson joined Appen in November 2018 and is responsible for the company’s products and technology. Wilson has over seventeen years’ experience in software engineering and data science. Prior to joining Appen, Wilson was Chief Data Officer of CTrip in China, the second largest online travel agency in the world, where he led data engineers, analysts, data product managers, and scientists to improve user experience and increase the operational efficiency that grew the business. Before that, he was senior director of engineering at eBay in California, where he provided leadership across domains including data services and solutions, search science, marketing technology, and billing systems. Wilson obtained his Master’s and Bachelor’s degrees in Electrical Engineering from Zhejiang University in China.
Using poor-quality data to train your machine learning system is like preparing for a physics test by studying geometry. You’ll learn something, but your efforts probably won’t help you answer your test questions correctly. For example, if you train a computer vision system for autonomous vehicles with images of sidewalks mislabeled as streets, the results could be disastrous. In order to develop accurate algorithms, you will need high-quality training data. To generate high-quality data, you’ll need skilled annotators to carefully label the information you plan to use with your algorithm.
When we talk about quality training data, we’re talking about both the accuracy and consistency of those labels. Accuracy is how close a label is to the truth. Consistency is the degree to which multiple annotations on various training items agree with one another.
Here are a few quality controls that organizations can implement to allow for the most accurate and consistent data annotation process possible.
Standard Quality-Assurance Methods Offer a Baseline
Typically, organizations that are creating high-quality training data sets use three standard methods for ensuring accuracy and consistency: gold sets, consensus, and auditing.
- Gold sets, or benchmarks, measure accuracy by comparing annotations (or annotators) to a “gold set” or vetted example. This helps to measure how well a set of annotations from a group or individual matches the benchmark.
- Consensus, or overlap, measures consistency and agreement amongst a group, and does so by dividing the sum of agreeing data annotations by the total number of annotations. This is the most common method of quality control for projects with relatively objective rating scales. The goal is to arrive at a consensus decision for each item. Any disagreement amongst the overlapped judgments is typically arbitrated by an auditor.
- Auditing measures both accuracy and consistency by having an expert review the labels, either by spot-checking or reviewing them all. This method is important for projects where arriving at a consensus judgment may not be feasible—tasks such as transcription, where auditors review and rework the content until it reaches the highest levels of accuracy.
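The first two methods above boil down to simple ratios: gold-set accuracy is the fraction of annotations that match a vetted benchmark, and consensus is the sum of agreeing annotations divided by the total number of annotations. A minimal sketch of both checks might look like the following (the data shapes and function names are illustrative assumptions, not a specific tool’s API):

```python
from collections import Counter


def gold_set_accuracy(annotations, gold_labels):
    """Fraction of an annotator's labels that match the vetted gold set.

    Both arguments are dicts mapping item id -> label; only items that
    appear in the gold set are scored. (Illustrative sketch.)
    """
    scored = [item for item in gold_labels if item in annotations]
    if not scored:
        return 0.0
    correct = sum(annotations[item] == gold_labels[item] for item in scored)
    return correct / len(scored)


def consensus_rate(judgments):
    """Agreeing annotations divided by total annotations, plus items
    with no clear majority, which would be routed to an auditor.

    `judgments` maps item id -> list of labels from overlapped annotators.
    """
    agreeing = total = 0
    needs_audit = []
    for item, labels in judgments.items():
        majority_label, count = Counter(labels).most_common(1)[0]
        agreeing += count
        total += len(labels)
        if count <= len(labels) / 2:  # tie or no majority: escalate
            needs_audit.append(item)
    return agreeing / total, needs_audit
```

In practice, the items flagged by `consensus_rate` for arbitration are exactly where the auditing method takes over from automatic overlap scoring.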
Detailed Controls Can Provide In-Depth Quality Assessment
These baseline quality measurements are a solid foundation for monitoring the quality of data annotations. However, no two AI projects are alike, and organizations should establish quality assessments more tailored to a specific initiative. Organizational leaders responsible for AI initiatives can create in-depth quality controls by considering the following processes:
- Multiple quality measurement metrics: Leverage multiple methods of quality measurement discussed earlier.
- Weekly data deep-dives: Have a project management team investigate the data weekly and set stretch productivity and quality goals. For example, if you require data that is accurate 92% of the time, set a stretch goal of 95% and try to ensure your annotation process exceeds your initial goal.
- Management testing and auditing: To build up your project managers’ quality-assurance skill set, ask them to perform annotation work and quality audits so that they have first-hand experience of the annotation process. This gives the management team a 360-degree view of the project and a full understanding of the annotators’ process.
High-Quality Training Data Helps Mitigate Bias
High-quality training data ensures more accurate algorithms, and it can also help mitigate the potential bias in many AI projects. Bias can manifest as uneven voice or facial recognition performance for different genders, accents, or ethnicities. Fighting bias during your data annotation process is another way to infuse your training data set with quality.
To avoid bias at the project level, organizations should actively build diversity into the data teams defining goals, roadmaps, metrics, and algorithms. Hiring a diverse team of data talent is easier said than done, but if the makeup of your team isn’t representative of the population the algorithm will impact, then the end product risks only working for, or appealing to, a subset of people, or discriminating against certain subsets of the population.
It’s important to also account for bias at the data level. Here are considerations to help you mitigate bias in training data:
- When internal team members label the data, they will always add some bias because they have expectations about what their system should conclude. If you decide to use an internal team, consult an outside source to help foster an objective annotation environment.
- Find or create a representative training dataset. Quantity always helps, especially if you’re using data from internal systems. Try to find the most comprehensive data, and experiment with different datasets, metrics, and segmentation to ensure you’ve covered the bases.
- If you’re engineering or annotating data, take care to design the instructions and tasks for your annotators in ways that don’t bias them from the outset. It’s important that annotators have enough instruction to correctly perform the task, but not know what the data will be used for, which can bias behavior.
- Check for implicit bias in the data as part of your quality-assurance process.
- Once your product is live, monitor performance using the data it generates to determine whether it’s delivering equitable opportunities and outcomes for all users.
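The last two points, checking for implicit bias during quality assurance and monitoring live outcomes, can both be approximated by comparing performance across population segments. Here is a minimal sketch of such a parity check, assuming you have predictions and ground truth tagged by group (the record layout and function names are hypothetical):

```python
def accuracy_by_group(records):
    """Compute accuracy per demographic group to surface uneven performance.

    `records` is an iterable of (group, prediction, truth) tuples; the
    grouping key is an illustrative assumption, not a fixed schema.
    """
    correct, total = {}, {}
    for group, prediction, truth in records:
        total[group] = total.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (prediction == truth)
    return {g: correct[g] / total[g] for g in total}


def parity_gap(per_group_accuracy):
    """Largest accuracy difference between any two groups; a large gap
    is a signal that outcomes may not be equitable across users."""
    values = list(per_group_accuracy.values())
    return max(values) - min(values)
```

Run against held-out evaluation data before launch, and against production data afterward, a widening `parity_gap` is an early warning that the system is drifting toward serving only a subset of its users well.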
High-Quality Data Is an AI Keystone
High-quality training data is necessary for a successful AI initiative. And while the quality-assurance processes are numerous, they are a vital component of that initiative. Quality training data not only begets algorithms that work in the real world, it also helps mitigate some of the bias inherent in manual data annotation. Before you launch your AI initiative, develop data quality-assurance practices to realize the best return on your investment.