Stanford Researchers Detail New Method for Error Detection in Perception Data
Autonomous and semi-autonomous vehicles are increasingly common, and most rely heavily on AI-powered cameras that rapidly detect vehicles, people, and obstacles in the frame, then combine that information with other sensor data (such as depth readings) to operate the vehicle or augment a driver’s control. The perception models behind these systems are trained and validated on large labeled datasets. But, as Stanford researchers explained in a recent blog post, there’s a problem: “Unfortunately, many datasets are rife with errors!” In that post, the team (Daniel Kang, Nikos Arechiga, Sudeep Pillai, Peter Bailis, and Matei Zaharia) outlined how they used new tools to detect errors in those datasets.
Errors in these datasets pose serious problems because the models both learn from the labels and are evaluated against them: a model penalized for detecting an object that a human labeler simply missed looks worse than it actually is. The researchers demonstrated the scale of the problem by citing a public autonomous vehicle dataset from an otherwise-unidentified “leading labeling vendor that has produced labels for many autonomous vehicle companies” where “over 70% of the validation scenes contain at least one missing object box!”
To detect these errors, the researchers developed a new abstraction called learned observation assertions (LOA). “LOA is an abstraction designed to find errors in ML deployment pipelines with as little manual specification of error types as possible,” they wrote. “LOA achieves this [by] allowing users to specify features over ML pipelines.”
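The blog post doesn’t show what specifying a feature looks like in code, but a minimal sketch, assuming a simple register-a-function design, might resemble the following. The `Box` dataclass and the `feature` decorator here are hypothetical illustrations, not Fixy’s actual API:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Box:
    """A 3D bounding box from either a human label or a model prediction."""
    x: float           # center position (m)
    y: float
    z: float
    length: float      # box dimensions (m)
    width: float
    height: float
    velocity: float    # object speed (m/s), e.g. from tracking

# A feature is a deterministic function from an observation to a scalar.
Feature = Callable[[Box], float]

FEATURES: List[Feature] = []

def feature(fn: Feature) -> Feature:
    """Register a user-specified feature so its value distribution can be learned."""
    FEATURES.append(fn)
    return fn

@feature
def box_volume(box: Box) -> float:
    return box.length * box.width * box.height

@feature
def object_velocity(box: Box) -> float:
    return box.velocity
```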
The team created an example LOA system, called Fixy, to illustrate the process. “Fixy learns feature distributions that specify likely and unlikely values (e.g., that a speed of 30mph is likely but 300mph is unlikely),” reads the abstract of the paper. “It then uses these feature distributions to score labels for potential errors.”
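As a rough illustration of that idea (a sketch only, since the paper doesn’t commit to a particular distribution family), the snippet below fits a simple Gaussian to observed speeds and scores new values by log-likelihood; the `FeatureDistribution` class is an assumption made here for illustration:

```python
import numpy as np

class FeatureDistribution:
    """Fits a simple Gaussian to observed feature values and scores new ones."""

    def fit(self, values: np.ndarray) -> "FeatureDistribution":
        self.mean = float(values.mean())
        self.std = float(values.std()) + 1e-9  # guard against zero variance
        return self

    def log_likelihood(self, value: float) -> float:
        """Log-density of `value` under the fitted Gaussian; low means unlikely."""
        z = (value - self.mean) / self.std
        return -0.5 * z * z - np.log(self.std * np.sqrt(2.0 * np.pi))

# Speeds clustered around 30 mph are likely; 300 mph is wildly unlikely.
speeds = np.random.normal(loc=30.0, scale=10.0, size=10_000)
dist = FeatureDistribution().fit(speeds)
print(dist.log_likelihood(30.0))   # near the peak of the distribution
print(dist.log_likelihood(300.0))  # roughly 27 standard deviations out
```

In this toy model, 300 mph sits about 27 standard deviations from the mean, so its log-likelihood is vanishingly small and a label carrying it would rank near the top of the error list.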
“We can specify the following features over the data: box volume, object velocity, and a feature that selects only model-predicted boxes that don’t overlap with a human label,” the blog explained. “These features are computed deterministically with short code snippets from the human labels and ML model predictions. Fixy will then execute on the new data and produce a rank-ordered list of possible errors.”
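Putting those pieces together, an end-to-end sketch of that flow might look like the following. It reuses the hypothetical `Box`, `box_volume`, `object_velocity`, and `FeatureDistribution` helpers from the earlier sketches, and the center-distance `overlaps` check is a crude stand-in for whatever matching logic Fixy actually uses:

```python
from typing import List, Tuple

def overlaps(a: Box, b: Box, tol: float = 2.0) -> bool:
    """Crude stand-in for a real IoU test: box centers within `tol` meters."""
    return abs(a.x - b.x) < tol and abs(a.y - b.y) < tol

def rank_possible_missing_labels(
    predictions: List[Box],
    human_labels: List[Box],
    volume_dist: FeatureDistribution,
    velocity_dist: FeatureDistribution,
) -> List[Tuple[float, Box]]:
    """Score model-predicted boxes that match no human label; the most
    plausible unmatched predictions rank first, since a believable box
    that no human labeled suggests a missed object."""
    unmatched = [p for p in predictions
                 if not any(overlaps(p, h) for h in human_labels)]
    scored = [
        (volume_dist.log_likelihood(box_volume(p))
         + velocity_dist.log_likelihood(object_velocity(p)), p)
        for p in unmatched
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Ranking the most plausible unmatched predictions first mirrors the missing-box failure mode the researchers highlight: a predicted box with a believable volume and velocity that appears in no human label is a strong candidate for an object the labelers overlooked.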
The team evaluated Fixy against Lyft’s Level 5 perception dataset and a dataset from the Toyota Research Institute. “LOA was also able to find errors in every single validation scene that had an error, which shows the utility of using a tool like LOA,” they wrote. Further, LOA surfaced 75% of the total errors identified in a selected scene from the Toyota dataset.
To learn more, read the researchers’ full blog post.