Arm adds neural network instructions to Cortex-M

Arm has added neural network processing instructions to its Cortex-M architecture, aimed at products at the outer edge of IoT networks, such as devices that can recognise a few spoken words without connecting to the cloud – voice wake commands, for example.

The ‘M-Profile Vector Extension’ (MVE) has been announced under the brand ‘Helium’, and is loosely analogous to the Neon SIMD (single-instruction multiple-data) extensions for the firm’s high-end Cortex-A cores. Helium will also handle digital signal processing, delivering more performance than the existing DSP instructions that were added to the Cortex-M3 to create the Cortex-M4.

Arm describes Helium as the “optimised SIMD capabilities of Neon technology, tailored to the M-profile architecture, plus new programming features and data types for emerging use cases”.

Along with the standard 32bit Armv8-M instructions come fixed-length 128bit vectors (with gather load and scatter store, low-overhead loops and predication) and broader arithmetic support – fixed and floating point, including half and single-precision float, 8bit integer and complex maths – covered by around 150 new instructions, among them an 8bit vector dot product.
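To make the 8bit vector dot product concrete, the sketch below shows the arithmetic it performs in plain scalar C: 8bit lanes multiplied pairwise and accumulated into a 32bit result. A Helium core would do this across a 128bit vector (16 int8 lanes) per instruction; the function name here is illustrative, not an Arm API.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar sketch of an MVE-style 8bit vector dot product:
   multiply int8 lanes pairwise and accumulate into a 32bit
   scalar, avoiding overflow of the narrow input type. */
int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}
```

The widening accumulate (int8 inputs, int32 sum) is the key point: it is what lets neural-network inference run on quantised 8bit weights without losing the running total.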

At the same time, the hardware that supports the new instructions includes security features via extensions to ‘TrustZone for Armv8-M’ and PSA principles.

Overall, Arm predicts up to a 5x performance increase from the Armv8.1-M instruction set architecture (ISA), as it will be known, compared with the existing Armv8-M architecture (an estimate based on a complex FFT in int32), and up to 15x in machine learning (based on matrix multiplication in int8).
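The int8 matrix multiplication behind that ~15x machine-learning estimate is, at its core, the kernel sketched below: 8bit inputs with 32bit accumulators. This scalar version only shows the arithmetic being benchmarked; a Helium implementation would vectorise the inner loop 16 int8 lanes at a time. The function name and layout are illustrative assumptions, not Arm's benchmark code.

```c
#include <stdint.h>
#include <stddef.h>

/* C = A * B, where A is M x K and B is K x N, both int8,
   row-major, with results accumulated into int32. */
void matmul_i8(const int8_t *A, const int8_t *B, int32_t *C,
               size_t M, size_t K, size_t N)
{
    for (size_t m = 0; m < M; m++)
        for (size_t n = 0; n < N; n++) {
            int32_t acc = 0;
            for (size_t k = 0; k < K; k++)
                acc += (int32_t)A[m * K + k] * (int32_t)B[k * N + n];
            C[m * N + n] = acc;
        }
}
```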

Armv8.1-M core implementation options will include:

  • Helium omitted, with optional scalar floating point (with or without double precision support)
  • Helium with support for vectored integer only, with optional scalar floating point (with or without double precision support)
  • Helium with vectored integer plus floating point (supporting vectored single and half-precision), with scalar floating point (with or without double precision support)

Will the new instructions result in a large increase in silicon footprint?

No, according to Arm, particularly as some existing v8-M hardware will be re-used when executing Helium instructions. No exact figures are being released.

As well as in voice processing, applications are foreseen in vibration analysis and vision.

Toolchains from multiple vendors are available today, as are models. Silicon products including Armv8.1-M cores are expected within two years.

In addition to vector processing, Helium highlights include:

  • Interleaving and de-interleaving load and store instructions (VLD2/VST2 with strides
    of 2, and VLD4/VST4 with strides of 4)
  • Vector gather load and vector scatter store – memory access of the elements in a
    vector register, with the address offset of each element defined by elements in
    another vector register. This allows software to handle arbitrary memory access
    patterns and can emulate special addressing modes, such as the circular addressing
    often used in signal processing. It can also accelerate non-sequential access to
    data elements in arrays in various data processing tasks
  • Vector complex value processing supporting integers (8, 16 and 32bit) and
    float (32bit) – for example VCADD, VCMUL, VCMLA instructions
  • Lane predication
  • 64bit integer support
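The gather-load behaviour in the list above can be sketched in plain C: each destination lane is loaded from the base address plus a per-lane offset held in a second vector. Paired with modulo offsets, this is how a gather can emulate the circular addressing the article mentions, for example when reading a DSP delay line. Function names here are illustrative, not MVE intrinsics.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar sketch of a vector gather load: lane i of dst is
   loaded from base[ofs[i]], where ofs is itself a vector. */
void gather_u32(const uint32_t *base, const uint32_t *ofs,
                uint32_t *dst, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        dst[i] = base[ofs[i]];
}

/* Build offsets start, start+1, ... wrapped modulo bufsize,
   emulating circular addressing over a buffer of bufsize words. */
void circular_offsets(uint32_t *ofs, size_t lanes,
                      uint32_t start, uint32_t bufsize)
{
    for (size_t i = 0; i < lanes; i++)
        ofs[i] = (start + (uint32_t)i) % bufsize;
}
```

With a four-word buffer {10, 20, 30, 40} and a read position starting at index 3, the offsets wrap to {3, 0, 1, 2} and the gather returns {40, 10, 20, 30} – the circular read order, without any special addressing hardware in the sketch itself.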


Steve Bush

Steve Bush is the long-standing technology editor for Electronics Weekly, covering electronics developments for more than 25 years. He has a particular interest in the Power and Embedded areas of the industry. He also writes for the Engineer In Wonderland blog, covering 3D printing, CNC machines and miscellaneous other engineering matters.
