Arm adds neural network instructions to Cortex-M

Arm has added neural network processing instructions to its Cortex-M architecture, aimed at products at the outer edge of IoT networks, such as devices that can recognise a few spoken words without connecting to the cloud – voice wake commands, for example.

The ‘M-Profile Vector Extension’ (MVE) has been announced under the brand ‘Helium’, and is loosely analogous to the Neon SIMD (single-instruction multiple-data) extensions for the firm’s high-end Cortex-A cores. Helium will also handle digital signal processing, delivering more performance than the existing DSP instructions that were added to the Cortex-M3 to create the Cortex-M4.

Arm describes Helium as the “optimised SIMD capabilities of Neon technology, tailored to the M-profile architecture, plus new programming features and data types for emerging use cases”.

Along with the standard 32bit Armv8-M instructions come fixed-length 128bit vectors (with gather load and scatter store, low-overhead loops and predication) and broader arithmetic support – fixed and floating point, including half and single-precision float, 8bit integer and complex maths – covered by around 150 new instructions, among them an 8bit vector dot product.
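To make the 8bit vector dot product concrete, the sketch below shows the arithmetic it performs in plain scalar C: 8bit lanes multiplied pairwise and accumulated into a 32bit result. A Helium core would do this across a 128bit vector (16 int8 lanes) per instruction; the function name here is illustrative, not an Arm API.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar sketch of an MVE-style 8bit vector dot product:
   multiply int8 lanes pairwise and accumulate into a 32bit
   scalar, avoiding overflow of the narrow input type. */
int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}
```

The widening accumulate (int8 inputs, int32 sum) is the key point: it is what lets neural-network inference run on quantised 8bit weights without losing the running total.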

At the same time, the hardware that supports the new instructions includes security features via extensions to ‘TrustZone for Armv8-M’ and PSA principles.

Overall, Arm predicts up to a 5x performance increase from the Armv8.1-M instruction set architecture (ISA), as it will be known, compared with the existing Armv8-M architecture (an estimate based on a complex FFT in int32), and up to 15x in machine learning (based on matrix multiplication in int8).
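The int8 matrix multiplication behind that ~15x machine-learning estimate is, at its core, the kernel sketched below: 8bit inputs with 32bit accumulators. This scalar version only shows the arithmetic being benchmarked; a Helium implementation would vectorise the inner loop 16 int8 lanes at a time. The function name and layout are illustrative assumptions, not Arm's benchmark code.

```c
#include <stdint.h>
#include <stddef.h>

/* C = A * B, where A is M x K and B is K x N, both int8,
   row-major, with results accumulated into int32. */
void matmul_i8(const int8_t *A, const int8_t *B, int32_t *C,
               size_t M, size_t K, size_t N)
{
    for (size_t m = 0; m < M; m++)
        for (size_t n = 0; n < N; n++) {
            int32_t acc = 0;
            for (size_t k = 0; k < K; k++)
                acc += (int32_t)A[m * K + k] * (int32_t)B[k * N + n];
            C[m * N + n] = acc;
        }
}
```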

Armv8.1-M core implementation options will include:

  • Helium omitted, with optional scalar floating point (with or without double precision support)
  • Helium with support for vectored integer only, with optional scalar floating point (with or without double precision support)
  • Helium with vectored integer plus floating point (supporting vectored single and half-precision), with scalar floating point (with or without double precision support)

Will the new instructions result in a large increase in silicon footprint?

No, according to Arm, particularly as some existing v8-M hardware will be re-used when executing Helium instructions. No exact figures are being released.

As well as in voice processing, applications are foreseen in vibration analysis and vision.

Toolchains from multiple vendors are available today, as are models. Silicon products including Armv8.1-M cores are expected within two years.

In addition to vector processing, Helium highlights include:

  • Interleaving and de-interleaving load and store instructions (VLD2/VST2 with strides
    of 2, and VLD4/VST4 with strides of 4)
  • Vector gather load and vector scatter store – memory access of the elements in a
    vector register, with the address offset of each element defined by elements in
    another vector register. This allows software to handle arbitrary memory access
    patterns and can emulate special addressing modes, such as the circular addressing
    often used in signal processing. It can also accelerate non-sequential access to
    data elements in arrays in various data processing tasks
  • Vector complex value processing supporting integers (8, 16 and 32bit) and
    float (32bit) – for example VCADD, VCMUL, VCMLA instructions
  • Lane predication
  • 64bit integer support
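The gather-load behaviour in the list above can be sketched in plain C: each destination lane is loaded from the base address plus a per-lane offset held in a second vector. Paired with modulo offsets, this is how a gather can emulate the circular addressing the article mentions, for example when reading a DSP delay line. Function names here are illustrative, not MVE intrinsics.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar sketch of a vector gather load: lane i of dst is
   loaded from base[ofs[i]], where ofs is itself a vector. */
void gather_u32(const uint32_t *base, const uint32_t *ofs,
                uint32_t *dst, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        dst[i] = base[ofs[i]];
}

/* Build offsets start, start+1, ... wrapped modulo bufsize,
   emulating circular addressing over a buffer of bufsize words. */
void circular_offsets(uint32_t *ofs, size_t lanes,
                      uint32_t start, uint32_t bufsize)
{
    for (size_t i = 0; i < lanes; i++)
        ofs[i] = (start + (uint32_t)i) % bufsize;
}
```

With a four-word buffer {10, 20, 30, 40} and a read position starting at index 3, the offsets wrap to {3, 0, 1, 2} and the gather returns {40, 10, 20, 30} – the circular read order, without any special addressing hardware in the sketch itself.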


Steve Bush

Steve Bush is the long-standing technology editor for Electronics Weekly, covering electronics developments for more than 25 years. He has a particular interest in the Power and Embedded areas of the industry. He also writes for the Engineer In Wonderland blog, covering 3D printing, CNC machines and miscellaneous other engineering matters.
