Open hardware is less important than open software, but that doesn't mean it isn't desirable. Anyway, vector vs. GPU:
Think AVX2 instead of GPU. You keep a MIMD architecture with all the benefits of out-of-order execution and deep pipelines, but add data-level parallelism through SIMD or "vector" operations. This can allow significant performance gains for fundamentally sequential code that still needs to do linear algebra. Furthermore, a lot of the speedup relies on unrolling sequential math operations and manually breaking data dependencies in the code, which has diminishing returns for larger vector/matrix dimensions (on a GPU you'd spend more time loading/storing registers even if you had no bandwidth concerns over PCIe).
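To make the "unrolling and breaking data dependencies" bit concrete, here's a minimal sketch in plain C rather than AVX2 intrinsics, so it compiles anywhere. The function names `dot_naive` and `dot_unrolled` are made up for illustration, and the unroll factor of 4 is an arbitrary assumption, not a tuned value. The idea is that independent accumulators give an out-of-order core (and the compiler's vectorizer) several additions it can have in flight at once.

```c
#include <stddef.h>

/* Naive dot product: each addition into acc depends on the previous
   one, so the loop forms a single serial dependency chain. */
float dot_naive(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Unrolled by 4 (an illustrative choice) with independent accumulators.
   The four chains do not depend on each other, so they can overlap
   in the pipeline or be mapped onto SIMD lanes. */
float dot_unrolled(const float *a, const float *b, size_t n) {
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }

    /* Combine the partial sums, then handle the leftover elements. */
    float acc = (acc0 + acc1) + (acc2 + acc3);
    for (; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Note that the unrolled version adds in a different order, so for general floating-point inputs the two results can differ in the last bits. That reordering is why this is usually done by hand instead of left to the compiler at default settings.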
Vector processors/architectures are commonly found in special-purpose chips called Digital Signal Processors (DSPs), which are important for a variety of applications like automotive, aerospace, controls, audio, IoT, or anywhere with real-time data acquisition. FPGAs are also popular for this task.
However, a lot of those devices are either pretty cheap or really expensive (not a lot of middle ground, due to economies of scale), and either underclocked or overspec'd. That means you either pay out the ass for an overkill processor or pay out the ass for a complicated system architecture that uses multiple chips on the same board (with proprietary tooling, shout-out to ADI).
Cheap(ish), low(ish)-power, high(ish)-clock CPUs with 256-bit-wide vector operations are highly attractive in a number of markets. The fact that it's open source makes it even more attractive, if you can afford to do a run of them.