FPGA fabrics meet CPU architectures

Performance and integration challenges could lead to engineers re-examining the role of the ‘humble FPGA’, says Alok Sanghavi of Achronix Semiconductor.

Speedcore eFPGAs combine the advantages of FPGAs with CPUs or SoCs

Physical limitations are starting to restrict how much further process geometry shrinkage can boost processor throughput.

As this happens, designers are debating how to rethink their designs so that speed increases no longer depend on packing ever more transistors on to a chip.

One of the biggest innovations in this industry will come from a fundamental re-application of a technology that has been known and understood for some time: the humble FPGA.

A bit of history

The stand-alone FPGA dates back to the mid-1980s, when Xilinx and, later, Actel introduced products that were used primarily in low-volume industrial applications and prototyping, as a useful ‘plaster’ to patch holes in system logic.

Altera (acquired by Intel in 2015), Lucent and Agere drove FPGAs into networking and telecomms applications.

Subsequent process shrinkage, lower mask costs and the integration of SRAM blocks, large multiply-accumulate (MAC) blocks, sophisticated configurable I/O and banks of SerDes have driven growth in the use of FPGAs since 1995.

Over the past 10 years, FPGAs have continued to proliferate, and prices have fallen to the point where FPGAs add significant value even in high-volume applications, taking on functions previously associated only with DSPs, GPUs and MCUs.

Lately, with low-cost, low-power FPGAs, we are arguably entering a new age of FPGA development. Designers now integrate FPGAs into datacentre systems, for example as hardware accelerators for packet inspection, database acceleration, security, machine learning and software-defined networking.

Taking FPGA fabrics into CPUs

Integrating FPGA fabric into CPUs and SoCs has historically met with resistance: SoC developers were unwilling to set aside concerns about size, speed and cost. That is no longer the case.

FPGAs have advanced by orders of magnitude in every respect, while CPUs have not advanced to the point where they can reasonably take on the required processing loads by themselves. CPUs provide general-purpose compute flexibility, but they carry too much processing overhead for many highly repetitive tasks.

Most modern CPUs use a modified Harvard architecture, in which instructions and data travel along separate pathways (typically through split instruction and data caches), and many also follow a load-store model, in which only explicit load and store instructions access memory. Instructions are carried on what can be thought of as the control plane: they describe how data will be acted upon and also administer ‘housekeeping’ for the overall system.

Structural differences

Three structural features make the CPU relatively inefficient at handling complex, yet largely consistent, data-plane operations. It must continually load (often highly complex) instructions and store the resulting data; it must keep a number of different hard-wired functional blocks on standby to act on data within the same chip; and it must frequently switch context (every 100 cycles or so) to carry out different tasks.

An FPGA, by contrast, is an array of logic blocks connected by reconfigurable routing channels, which can be configured to perform very specific functions.

Both FPGAs and CPUs handle instructions (or configuration) and data separately, using memory and logic. However, there is one major difference in how they use that memory.

FPGAs use memory to configure lookup tables, multiplexers, partially populated interconnect matrices and a number of other elements; in other words, the contents of memory define the circuit itself.

A CPU, on the other hand, is optimised for rapid context switching. It loads instructions and data from registers and memories and then, within a few cycles, loads a whole new set of instructions and data.

Reconfiguring an FPGA’s functionality, by contrast, is relatively resource-intensive, because new configurations must be moved into the configuration RAM. It is therefore impractical to do too often. However, once configured, an FPGA can emulate digital logic at speeds comparable to ‘hard-wired’ circuits.
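The idea that memory contents define the circuit can be made concrete with a small software model of a lookup table (LUT). This is an illustrative sketch only, not Achronix's or any vendor's implementation: it assumes a hypothetical 4-input LUT (real fabrics use various LUT sizes), and the helper names `make_lut_config` and `lut_eval` are made up for this example. Building the configuration word stands in for the comparatively costly step of loading configuration RAM; once that word is loaded, every evaluation is a single fixed lookup.

```python
from typing import Callable, Sequence

# Assumption: a hypothetical 4-input LUT. Real FPGA fabrics use various LUT sizes.
LUT_INPUTS = 4


def make_lut_config(func: Callable[..., int], n_inputs: int = LUT_INPUTS) -> int:
    """Build a LUT configuration word by evaluating func on every input combination.

    This models 'loading the configuration RAM': it touches all 2**n_inputs
    entries, so it is the comparatively expensive step, done rarely.
    """
    config = 0
    for index in range(1 << n_inputs):
        # Input i of this combination is bit i of the table index.
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        if func(*bits) & 1:
            config |= 1 << index
    return config


def lut_eval(config: int, inputs: Sequence[int]) -> int:
    """Evaluate a configured LUT: the input bits form an address into the config word."""
    index = 0
    for i, bit in enumerate(inputs):
        index |= (bit & 1) << i
    return (config >> index) & 1


if __name__ == "__main__":
    # 'Configure' the LUT once as a 4-input parity (XOR) function.
    parity = make_lut_config(lambda a, b, c, d: a ^ b ^ c ^ d)
    print(hex(parity))  # 0x6996
    # Evaluations are then simple lookups, repeated as often as needed.
    print(lut_eval(parity, [1, 1, 1, 0]))  # 1
    # 'Reconfiguring' means loading a different word, e.g. a 4-input AND.
    and4 = make_lut_config(lambda a, b, c, d: a & b & c & d)
    print(hex(and4))  # 0x8000
```

In this model the same lookup hardware implements parity with the word `0x6996` and a 4-input AND with `0x8000`; only the stored bits change. That mirrors the trade-off described above: swapping the configuration is the costly, occasional step, while evaluation is cheap and can repeat for as long as the function stays the same.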

So, just as the CPU excels at performing varied tasks, the FPGA excels at repetitive (and particularly highly parallel) tasks that run for thousands of cycles and are only occasionally redefined.

Industry activity signals

The integration of FPGA fabrics into CPU devices is coming. The industry has shown considerable faith that these two device types can drive value through closer integration.

Intel’s $16.7bn acquisition of Altera is testimony to this. Intel is hoping to develop modules that use both Altera’s FPGAs and Intel’s CPUs to accelerate datacentre performance.

Given the drive towards greater performance, lower costs, and more efficient power use, it is inevitable that these two structures will now start to move into the same device, with FPGA fabric integrated on to CPUs as IP blocks.

Achronix hopes to be a major catalyst for this, with its Speedcore embeddable FPGA IP. This is built on its Speedster family of standalone FPGAs, already used in the industry and described by the company as having high performance and a sophisticated routing architecture. Speedcore eFPGA can be used to extract maximum advantage from the integration of FPGA fabric into CPU or SoC devices.

One such advantage is that SoCs with an eFPGA fabric are capable of higher performance than is possible with a separate FPGA chip.

This is partly due to the higher bandwidth that can be made available. Furthermore, because signals no longer need to pass through SerDes and protocol encoding such as PCIe, latency can be reduced by a factor of 10. eFPGA elements can also be built to be cache-coherent, which further improves system performance.

Power consumption

Along with these performance advantages, integrating an FPGA fabric significantly reduces power consumption. Cost savings are also realised at the wider system level, because supporting components such as cooling, passives, clock generators, level shifters and power regulators can be removed.

We’re approaching a point where FPGA fabrics can be integrated into everyday CPU or SoC designs usefully, affordably, and practically, without adding impractical amounts of time or complexity to the design process.

It is inevitable that the industry will head in this direction. The really exciting questions are what unexpected applications will be found for this general architectural approach, and how the eFPGA industry will respond to satisfy the demand.