All of this information is already in the Discord, but I wanted to share it with everyone.
Over the past few months we’ve developed an FPGA accelerator in the M.2 form factor (the same one NVMe drives use), designed to operate both standalone and in conjunction with GPUs.
The first version to be released has four high-speed PCIe lanes for communicating with the system and GPUs, 512MB or 1GB of onboard DDR3, and a high-speed-grade FPGA with either 100k+ or 200k+ LEs. We’ve named it the Acorn, and the three models are the CLE-101, CLE-215, and CLE-215+.
The general expectation is that it will deliver roughly the same price/performance as the VCU1525, but it fills a different role and is not applicable to all of the same algorithms. In that role, its performance is dominated by interconnect bandwidth rather than processing power.
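To make the bandwidth-bound point concrete, here is a rough back-of-the-envelope sketch. It assumes each hash moves one 256-bit or 512-bit state across the link once in each direction, and it uses a hypothetical 2 GB/s of usable bandwidth per direction (roughly what a four-lane PCIe 2.0 link carries before protocol overhead). The link generation is not stated here, so treat that number, and the helper name `max_hashrate`, purely as illustration.

```python
def max_hashrate(link_bytes_per_s: float, state_bits: int) -> float:
    """Upper bound on hashes/s when every hash ships one state over the link.

    PCIe is full duplex, so the same bound applies independently to the
    GPU -> FPGA and FPGA -> GPU directions.
    """
    state_bytes = state_bits // 8
    return link_bytes_per_s / state_bytes


# ASSUMPTION: usable bandwidth per direction; not a measured Acorn figure.
ASSUMED_LINK_BYTES_PER_S = 2.0e9

for bits in (256, 512):
    mh = max_hashrate(ASSUMED_LINK_BYTES_PER_S, bits) / 1e6
    print(f"{bits}-bit state: at most {mh:.2f} MH/s over the link")
```

Under these assumptions the ceiling depends only on link bandwidth and state size: halving the state that has to cross the link doubles the bound, no matter how fast the FPGA fabric is.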
It can provide up to 30 MH/s of lift to a GPU mining system on a handful of algorithms, or operate independently at higher-than-GPU hashrates on algorithms that are not memory intensive (Keccak, etc.). I will be releasing it alongside our mining software and bitstreams to support hybrid GPU acceleration. This project was not developed commercially; it grew out of a product for my day job and was built for internal use in our own mining systems, to give an edge to traditional PCs and gaming systems turned miners.
The accelerator works by streaming high-bandwidth hash state between the GPUs and the FPGA over PCIe, allowing each piece of hardware to handle the portion of the algorithm it is best at. In general this means the memory-bandwidth-heavy or area-heavy portions of an algorithm are handled by the GPU, while hash functions designed for hardware implementation are handled by the FPGA. This approach works for any algorithm whose internal state is 512 bits (x16r, Lyra2Rev2, etc.) or smaller, with 60 MH/s gains at 256 bits. The accelerator can rapidly reconfigure its algorithms from on-board DDR, which lets it handle per-block or periodic (TimeTravel10) re-sequencing. It was originally designed to provide performance gains (especially for older GPUs with poor cores) and power savings on ETH by offloading the opening and closing Keccak calculations, as well as hash selection to improve locality of reference in the early ETH rounds.
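The split described above can be pictured with a toy pipeline. This is only a sketch of the idea, not our mining software. The function names (`fpga_open`, `gpu_memory_stage`, `fpga_close`, `hybrid_hash`) are hypothetical, the memory stage is a made-up table walk rather than real Ethash, and `hashlib`'s SHA-3 stands in for Keccak (Ethash uses the original Keccak padding, which differs from standardized SHA-3).

```python
import hashlib
from typing import Sequence


def fpga_open(header: bytes, nonce: int) -> bytes:
    """Opening hash (FPGA side): header + nonce -> 512-bit state."""
    return hashlib.sha3_512(header + nonce.to_bytes(8, "little")).digest()


def gpu_memory_stage(state: bytes, dataset: Sequence[bytes], rounds: int = 64) -> bytes:
    """Memory-heavy stage (GPU side): state-dependent lookups into a big table.

    Toy mixing only: XOR in the selected table entry, then rotate by one byte.
    """
    mix = state[:32]
    for _ in range(rounds):
        index = int.from_bytes(mix[:4], "little") % len(dataset)
        mix = bytes(a ^ b for a, b in zip(mix, dataset[index]))
        mix = mix[1:] + mix[:1]
    return mix


def fpga_close(state: bytes, mix: bytes) -> bytes:
    """Closing hash (FPGA side): compress state + mix to the final 256-bit result."""
    return hashlib.sha3_256(state + mix).digest()


def hybrid_hash(header: bytes, nonce: int, dataset: Sequence[bytes]) -> bytes:
    state = fpga_open(header, nonce)        # 64 bytes cross the link to the GPU
    mix = gpu_memory_stage(state, dataset)  # 32 bytes cross the link back
    return fpga_close(state, mix)


# Small stand-in table; a real dataset would be gigabytes in GPU memory.
dataset = [hashlib.sha3_256(i.to_bytes(4, "little")).digest() for i in range(1024)]
print(hybrid_hash(b"example header", 42, dataset).hex())
```

The only data that has to travel between devices here is the fixed-size state and mix, which is why the size of an algorithm's internal state matters so much for this approach.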