Although known for their networking prowess, Layerscape processors are gaining
traction in artificial-intelligence applications. These applications include
security and surveillance, home and building automation, factory safety and
machine inspection. The reason is that Layerscape’s connectivity and
general-purpose processing enable these processors to address applications
where wired and wireless communications are a key requirement and where
powerful multicore CPUs can tackle multiple computationally intensive tasks.
For those surprised that the networking-centric Layerscape family is
considered for AI designs, I’ve got news. Layerscape executes AI
algorithms quite well, and it’s a good fit for a lot of designs. On the
hardware side, Layerscape combines either the efficient Cortex-A53 or the
powerful Cortex-A72 CPUs from Arm with sizeable caches and DRAM bandwidth.
Figure 1 shows how key functions in a design using Layerscape for AI-based
image processing can map to a Layerscape LS1043A or LS1046A processor. Cameras
and radar sensors connect via USB or Ethernet. Ethernet can also connect to a
WAN uplink and to the LAN (also available via PCIe-connected Wi-Fi) if this
system is an edge gateway. The four CPUs handle application logic, networking
functions, capture of camera and radar data and AI-based classification of
this data.
Figure 1: Mapping AI-Enabled Application to Layerscape
The software side is at least as important. Frameworks (software
libraries for AI-related numerical computation) optimized for mobile
and embedded devices instead of servers are coming to market, enabling
performance increases. These include open-source frameworks, such as
Google’s TensorFlow Lite and Tencent’s NCNN, and commercial
engines like DeepView from Au-Zone. By optimizing models through judicious
pruning (eliminating less-useful neural-network parameters) and quantization
(for example, mapping floating-point values to eight-bit integers), these frameworks
reduce the memory and computation required to run models. In the case of video
analysis, the resulting speedup can amount to 5-10x gains in frames per second.
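To make pruning and quantization concrete, here is a minimal sketch in plain Python. It is not how TensorFlow Lite, NCNN or DeepView implement these steps; the function names, the pruning threshold and the symmetric scaling scheme are assumptions chosen for illustration.

```python
def prune(weights, threshold):
    """Magnitude pruning: zero out weights whose absolute value is below threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def quantize(weights):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # float value of one integer step
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical layer weights, for illustration only.
weights = [0.82, -0.03, 0.40, 0.01, -1.27]
pruned = prune(weights, threshold=0.05)
quantized, scale = quantize(pruned)
restored = dequantize(quantized, scale)
```

Each stored weight shrinks from a 32-bit float to one byte, and the pruned entries become zeros that an optimized runtime can skip. The price is a rounding error of at most half a quantization step per weight.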
Another software approach is to bypass generic frameworks and take a
bespoke approach, developing models optimized for a
specific hardware target. Optimizations beyond pruning and quantization (for example,
relying on the similarity among adjacent frames in a video stream to quickly
find previously detected objects) can extract further performance. Companies
like Pilot.AI and Invision.AI have ported their object-detection models to
Layerscape, achieving movie-quality frame rates.
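The frame-similarity idea can be sketched as follows. This is a deliberately simplified stand-in, not the method Pilot.AI or Invision.AI use: it reuses the previous detections outright when a frame barely differs from the last fully analyzed one. The class name, the threshold and the toy detector are all hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


class CachedDetector:
    """Runs the expensive detector only when the scene has changed enough."""

    def __init__(self, detect, threshold):
        self.detect = detect        # expensive full-frame detector (hypothetical)
        self.threshold = threshold  # mean pixel difference that triggers a re-run
        self.last_frame = None      # last frame that was fully analyzed
        self.last_result = None
        self.full_runs = 0

    def process(self, frame):
        changed = (self.last_frame is None
                   or mean_abs_diff(frame, self.last_frame) > self.threshold)
        if changed:
            self.last_result = self.detect(frame)
            self.last_frame = frame
            self.full_runs += 1
        return self.last_result


def toy_detect(frame):
    """Stand-in for a neural network: flags any very bright pixel."""
    return ["object"] if max(frame) > 200 else []


detector = CachedDetector(toy_detect, threshold=5.0)
frames = [[10, 10, 250, 10], [11, 10, 249, 10], [10, 10, 10, 10]]
results = [detector.process(f) for f in frames]
```

Here the second frame is nearly identical to the first, so the detector runs on only two of the three frames. A production tracker would instead search near each previously detected object, but the saving comes from the same observation: adjacent frames rarely differ much.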
Invision.AI, Au-Zone and a stealth startup with AI software optimized for
edge computing and IoT endpoints recently presented their software at a
webinar hosted by NXP. These companies made interesting points about the
cost, risk and time-to-market advantages of performing AI on Layerscape.
Companies already fielding
Layerscape-based designs can add AI capability without redesigning their
hardware, provided the design has CPU headroom. We’ve seen this with
companies looking to add video surveillance to their enterprise access points
or home automation to their residential gateways.
A system-level approach can also rationalize limited hardware resources
available for retrofitting AI and streamline upgrading systems already in the
field. For example, a first level of AI classification can be added to an IP
camera, smart door lock or other device, taking advantage of any available
processing headroom and memory. This level can extract features or do other
preliminary classification, cascading the results downstream to the associated
Layerscape-powered camera headend or home-automation hub to complete the
analysis process. If a deployment has insufficient resources, it need not be
ripped out and replaced but instead supplemented with an adjunct Layerscape
system or module for the AI functions.
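A rough sketch of this cascade appears below. The device names, the thresholds and the two toy classification rules are assumptions for illustration; a real deployment would run trained models at both stages.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Compact result an edge device forwards instead of raw sensor data."""
    sensor_id: str
    peak: float
    mean: float


def first_stage(sensor_id, samples, min_peak):
    """Runs on the camera or door lock: cheap filter plus feature extraction."""
    peak = max(samples)
    if peak < min_peak:
        return None  # nothing interesting, so nothing is sent downstream
    return Candidate(sensor_id, peak, sum(samples) / len(samples))


def second_stage(candidates):
    """Runs on the Layerscape hub: completes the analysis on forwarded features."""
    labels = {}
    for c in candidates:
        # Toy rule: a sharp spike above the average suggests a person.
        labels[c.sensor_id] = "person" if c.peak - c.mean > 50 else "background change"
    return labels


# Hypothetical readings from three edge devices.
readings = {
    "cam-1": [10, 12, 200, 11],
    "cam-2": [9, 10, 11, 10],
    "lock-1": [120, 125, 130, 128],
}
forwarded = [c for c in (first_stage(sid, s, min_peak=100)
                         for sid, s in readings.items()) if c is not None]
labels = second_stage(forwarded)
```

Only two of the three devices send anything to the hub, and what they send is a handful of numbers rather than raw frames. That is what keeps both the network load and the hub's processing budget manageable.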
Figure 2 shows this approach in the context of a roadside unit (RSU). These
are systems deployed throughout a smart city to help implement an intelligent
transportation system (ITS). They monitor roads and intersections with various
sensors and communicate with vehicles and adjacent RSUs. NXP has shown RSU
demos in the past; see https://www.nxp.com/intelligentRSU. In the Figure 2
example, the vehicles, cameras and radars preclassify the
data they capture, communicating their findings to the RSU. The RSU tracks and
plots vehicles and pedestrians, analyzes their motion and queuing, controls
traffic signals and communicates with other systems—a big load that
would be even bigger if a first level of processing hadn’t been done
near the various sensors.
Figure 2: Cascaded AI Can Play a Role in the Smart City
The Layerscape recipe of combining processing and I/O is well suited to
supporting AI. We find the Arm Cortex-A72 CPU—the workhorse used in
many Layerscape processors with one to 16 cores—performs about as well
as a single thread of a server-grade processor or a single core of a PC-grade
processor. We’ve seen this result on benchmarks in the SPEC suite, in
networking tasks and in video compression.
The Arm Cortex-A53 CPU, the lower-cost stablemate of the
Cortex-A72, works well when paired with optimized
software and in less-demanding situations. For example, a video surveillance
system operating at only 8 fps can compress this video in the H.264 format
using only a single Cortex-A53 CPU, with cycles remaining for other tasks. An
adjacent Cortex-A53 CPU running commercial AI software can identify bodies at
this frame rate or faster.
Layerscape’s abundant USB, Ethernet and PCIe ports can connect to
cameras, radar modules and other sensors generating input to be analyzed.
These I/O ports are also essential for LAN and WAN connections. It’s
hard to imagine a system using AI that doesn’t also communicate.
Competing processors may have useful multimedia engines but cannot match
Layerscape’s interfacing options and networking performance.
In conclusion, Layerscape can support AI functions. Developers need not rely
on an expensive coprocessor add-on or think their only option is a competing
chip with hardware acceleration but without Layerscape’s networking and
I/O or cost efficiency. Nor must one wait to implement AI. Get started today!