AI is a common buzzword these days, but most consumers probably aren’t aware of how interwoven it is in their everyday lives. Some of us in the analyst and tech press communities may also scoff at how often the term is applied to technologies that hardly resemble true artificial intelligence. That said, there are a few platforms beyond powerful data centers that are a natural fit for AI processing and the neural networks (NNs) that drive it. One of those is AI inferencing (using a trained NN to infer information, versus training one) at the edge, and in your pocket, with a smartphone.
As you might imagine, smartphone platforms from Android to Apple’s iOS vary greatly, but there are common workloads, like speech-to-text transcription and the recommender engines behind assistants like Google Assistant and Siri, that make heavy use of common AI NN models, and they do so on-device for speed and latency advantages.
Measuring AI Performance In Mobile Devices
With any new device introduction and any hot new application, tech-savvy consumers and tech press members want to measure the relative performance of devices across the various mobile ecosystems. Further, AI processing performance is a trending topic in smartphone testing and reviews at the moment, so of course multiple tools are appearing in the major app stores claiming to measure the AI performance of phones and other mobile devices. And, you guessed it, these apps certainly aren’t all created equal.
In an effort to sort through this a bit, Marco and I took a deep dive at HotHardware into the performance of various flagship Android phones across three popular AI benchmarks, which in some cases produce vastly different results.
The key here is to understand what a specific benchmark metric is actually testing. Does the test represent real-world workloads as closely as possible? An ideal benchmark uses actual applications a consumer would run; short of that, it can employ the same core software components popular apps use, to represent realistic performance expectations. In this case, that means we need to understand which NNs these benchmark tools test against, and what mathematical precision and AI algorithms are used to process workloads on them.
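As a rough illustration of what such a tool does under the hood, here is a minimal benchmark-harness sketch in Python. The function names are hypothetical and a simple compute loop stands in for a real NN dispatch to the NPU; the structure is what matters: warm up first so caches and clock governors settle, then time many runs and report a robust summary statistic.

```python
import time
import statistics

def run_inference():
    """Stand-in for a real NN forward pass; an actual benchmark app
    would dispatch a model like ResNet-34 to the NPU here."""
    total = 0.0
    for i in range(1, 50_000):
        total += i * 0.5
    return total

def benchmark(fn, warmup=5, runs=30):
    # Warm-up iterations let caches, drivers, and frequency
    # scaling settle so they don't skew the timed runs.
    for _ in range(warmup):
        fn()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    # Median is more robust than mean against scheduler hiccups.
    return statistics.median(timings_ms)

latency_ms = benchmark(run_inference)
```

A benchmark whose headline score comes from a loop like this is only as representative as the workload inside `run_inference()`, which is exactly why the choice of NN model and precision matters so much.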
What’s The Right Yardstick For AI Benchmarks?
What makes for a good AI benchmark for mobile devices is a relatively deep, nuanced subject, but the long and short of it is that virtually all mobile NPUs (Neural Processing Units, or dedicated AI engines) employ either quantized INT8 integer precision or FP16 floating point precision to run popular NNs like ResNet-34 or Google’s DeepLab-v3 for image classification and segmentation in apps, for example. Is that a cat or a dog? What sort of color balance should be applied to this camera shot? These are the kinds of questions the AI is trying to infer answers to from the phone’s environment, in an imaging workload at least, though there are many others.
Currently, INT8 precision is considered good enough for most consumer mobile applications, and advancements in compression and quantization techniques continue to improve INT8 accuracy on mobile devices, while still reaping the benefit of lower power consumption versus FP16. FP16 offers better precision, but is more costly against critical smartphone power budgets.
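To make the INT8 trade-off concrete, here is a minimal sketch of affine (asymmetric) quantization, the basic scheme mobile inference stacks commonly use to shrink 32-bit floating point values into 8-bit integers. The weight values below are made up for illustration, not taken from any real model.

```python
# Hypothetical FP32 weights, as a network layer might hold them.
weights_fp32 = [-1.7, -0.3, 0.0, 0.42, 0.9, 2.1]

QMIN, QMAX = -128, 127  # signed 8-bit integer range

# Map the observed float range [lo, hi] onto the integer range.
lo, hi = min(weights_fp32), max(weights_fp32)
scale = (hi - lo) / (QMAX - QMIN)
zero_point = round(QMIN - lo / scale)

def quantize(x):
    q = round(x / scale) + zero_point
    return max(QMIN, min(QMAX, q))  # clamp into the INT8 range

def dequantize(q):
    return (q - zero_point) * scale

quantized = [quantize(w) for w in weights_fp32]
recovered = [dequantize(q) for q in quantized]

# The round trip loses at most half a quantization step per value.
max_error = max(abs(w - r) for w, r in zip(weights_fp32, recovered))
```

Each weight now fits in one byte instead of four, and integer math is cheaper in silicon than floating point, which is where the power savings come from; the cost is the small rounding error captured in `max_error`.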
As a result, the majority of AI-enabled mobile applications out there employ INT8 for its power efficiency. Not all benchmarks currently available weight a mobile platform’s performance the same way, however. Some apps emphasize FP16 precision, even though in practice it’s employed far less often than INT8. Further, AI platform SDKs (software development kits) from Qualcomm and others are highly optimized for INT8. So the question becomes: what do the various results from these benchmarking apps really mean, in terms of real-world AI performance in handsets and other mobile devices?

As you can see in the scores HotHardware collected, some rank leading mobile silicon platforms from Qualcomm and Huawei quite differently. It’s readily apparent, however, that Qualcomm Snapdragon 865 devices have a significant lead in INT8 NPU processing throughput, which is also likely closer to current real-world AI performance in mobile apps. It’s also worth pointing out that Qualcomm’s Snapdragon mobile platforms power the vast majority of Android phones in the US currently, so the company’s influence on the ecosystem is deep.
An Analyst's Take From The Very Early Innings Of Mobile AI
Artificial intelligence and machine learning at the edge are fast-changing fields delivering ever more capable and promising solutions that will enrich our everyday lives in many ways. As a result, benchmark metrics and the apps used to measure them will have to evolve quickly with the times as well. In addition, as with traditional PC benchmarks, press, tech enthusiasts, and savvy consumers are going to look more closely at AI benchmarks in the days ahead, as AI becomes an even more critical component of the mobile experience and of the platform solutions available on the market.
As such, it will be incumbent upon these benchmark app developers and the press to sort through the finer points of what makes for a quality mobile AI benchmark, and what is a truer measure of performance for your own personal pocket AI assistant. Right now, if a benchmark isn’t employing commonly used NNs and realistically representing the importance of INT8 precision, you have to question how valuable that test is for the average consumer. There are no absolutes here, however. The current landscape is shaping up this way, but again, AI technologies are moving at a frenzied pace, and the rest of the industry will need to keep up.