This might even allow some level of memory mapping in all directions. Unfortunately PCIe switches are not very hackable as such, but maybe I can spin a board for this purpose... after I get the hack more industrialized.
Yeah, I was thinking about a custom switch in an FPGA, for research purposes. Of course, if I wanted real number-crunching performance, it would be cheaper to just buy some GPUs.
Wouldn't the gigabit LAN be a better fit for this? If you want to make a cluster, you need some custom hardware to connect them that can facilitate the communication. At that point you're likely spending more than if you just bought a real desktop for more performance. I can see the fun factor in hacking the system together, though.
There are two ways of doing clusters: one is the message-passing paradigm, which you can do over Ethernet (to an extent; I’d still take USB3 for 4x the bandwidth), and the other is direct memory access à la Cray.
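To make the message-passing option concrete, here's roughly what the software side looks like regardless of the transport underneath. This is just a minimal sketch assuming mpi4py plus an MPI implementation like OpenMPI on every node, and a hostfile listing the Pis (pis.txt is made up for the example):

    # minimal_mpi_ping.py
    # run with: mpirun -hostfile pis.txt -np 2 python3 minimal_mpi_ping.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        comm.send(b"x" * 4096, dest=1, tag=0)   # push a 4 KB message to node 1
        print("rank 0 sent 4 KB")
    else:
        data = comm.recv(source=0, tag=0)       # blocking receive on node 1
        print(f"rank 1 got {len(data)} bytes")

The same two calls work whether the bytes travel over Ethernet, USB3 gadget networking, or something more exotic underneath.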
What really motivated me to do this hack is the relative abundance of stuff I can now plug into an FPGA :)
True. And with the RPi4 having 1000BASE-T, it’s not as painful as it seems. Perhaps the driver can even be coaxed into some form of DMA and MPI with a bit lower latency than the IP stack.
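If the goal is just shaving latency off the IP stack, one thing worth trying before touching the driver is sending raw Ethernet frames over AF_PACKET. A minimal sketch (Linux only, needs root; the interface name, MAC addresses and the 0x88B5 ethertype are placeholders, not anything Pi-specific):

    # raw_eth_send.py - skip IP/UDP entirely and push a raw Ethernet frame
    import socket, struct

    ETH_P_EXPERIMENTAL = 0x88B5                     # "local experimental" ethertype
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_EXPERIMENTAL))
    s.bind(("eth0", 0))                             # bind to the wired interface

    dst = bytes.fromhex("b827ebaabbcc")             # peer Pi's MAC (hypothetical)
    src = bytes.fromhex("b827eb112233")             # our MAC (hypothetical)
    frame = dst + src + struct.pack("!H", ETH_P_EXPERIMENTAL) + b"hello" * 100
    s.send(frame)                                   # one frame, no IP/TCP headers in the way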
With a secondary IP layer on 802.11, it might actually work reasonably well.
So your plan is to attach an FPGA to the PCIe bus? To allow the FPGA to access peripherals on the Pi side, or do you want the FPGA to make the Pi a lot more powerful?
Both. The FPGA can interconnect the 16 RPi4s at 40 Gbps, also interconnect their 16 1G Ethernet ports at 16 Gbps, and even their 16 HDMI, MIPI and GPIO interfaces, depending on the FPGA. It can also add 256 GB of DDR3 and lots of other I/Os like SATA and HDMI (see my other comment for a $159 FPGA).
The FPGA can act like a switch and an IO and memory extender, and still have room for up to 300 ARM or RISC-V soft cores.
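As a rough sanity check on those aggregates (my own assumptions about the link setup, not measured numbers): each Pi exposes a single PCIe Gen 2 lane, so the raw ceiling is about 4 Gbps per Pi after 8b/10b coding, and the quoted 40 Gbps presumably budgets further for protocol and switching overhead.

    # back_of_envelope.py - where the aggregate figures could come from
    pis         = 16
    gen2_gtps   = 5.0         # PCIe Gen 2: 5 GT/s per lane
    encoding    = 8 / 10      # Gen 2 still uses 8b/10b line coding
    per_pi_gbps = gen2_gtps * encoding            # ~4 Gbps usable per exposed lane

    print(f"PCIe aggregate: {pis * per_pi_gbps:.0f} Gbps")  # 64 Gbps raw, before TLP overhead
    print(f"1GbE aggregate: {pis * 1} Gbps")                # the 16 Gbps Ethernet figure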
See my other posts on this page.
You want FPGAs with cheap SERDES links that support PCIe Gen 2 at 5 GT/s. The best fit is the Lattice ECP5-5G, but that's $5 per link. The MPF300 is $10 per 12.5 Gbps link on the discounted development board (with desoldering). A retail-priced Cyclone 10 GX is also about $10 per link, with the smaller 10CX105 at $14.60 per link, but these are very potent FPGAs that could be a small GPU in themselves.
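For the MPF300 figure, the per-link math is just the board price divided by the transceiver count, using the numbers quoted above:

    # per_link_cost.py - where the ~$10/link MPF300 figure comes from
    board_price  = 159        # desoldered MPF300 from a discounted dev board, USD
    serdes_links = 16         # 16 x 12.5 Gbps transceivers on the MPF300
    print(f"${board_price / serdes_links:.2f} per link")   # -> $9.94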
I now plan to crowdfund our own FPGA as an ASIC, which would bring the cost down to $0.25 per link with a hundred links. This HN page shows me there will be enough buyers for a $300, 364-core, RPi4-compatible cluster (100 BCM2711 chips connected to 1 FPGA, plus 100 GB of DDR4), but without the RPi4 boards.
Instead of attaching RPi4s or BCM2711s, you could have 100 SATA SSDs, 30 HDMI ports, 10G Ethernet, or a rack full of PCIe servers connected to this interconnect FPGA.
You are welcome to help realise the project or the crowdfunding.
I suggest using the 16 x 12.5 Gbps SERDES links of the MPF300 PolarFire FPGA ($159 if you desolder one from a development board) at 5 Gbps speeds to interconnect 16 RPi4s. You get 64 ARM cores and 300 soft cores on the FPGA, with 264 Gbps of theoretical bandwidth and 64 GB of RAM on the Pis plus 256 to 512 GB of DDR3 on the FPGA side.
Around $719 for the RPis + FPGA, for 364 cores and 64 GB; more for the extra DRAM. You can add GPU cards of course.
If you make a 4-6 layer PCB, you could attach the 16 x 1G Ethernet and HDMI to the FPGA as well for even more interconnect bandwidth. Email me for the details or to collaborate on building it.
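For what it's worth, here's how those figures tally; the $35-per-board Pi price is my assumption (the 4 GB model costs more), the rest is as quoted above:

    # cluster_tally.py - how the 364-core / $719 figures add up
    pis          = 16
    cores_per_pi = 4          # BCM2711 is a quad-core
    softcores    = 300        # claimed headroom in the MPF300 fabric
    pi_price     = 35         # assuming the cheapest RPi4 board; adjust for more RAM
    fpga_price   = 159        # desoldered MPF300

    print(f"cores: {pis * cores_per_pi + softcores}")   # 64 + 300 = 364
    print(f"cost : ${pis * pi_price + fpga_price}")     # 560 + 159 = 719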
Is it really unused? The RPi4 schematic is incomplete, but it at least shows the USB 2.0 pins of the USB-C port going somewhere; they might be going directly to that built-in USB controller in the main SoC.
I wonder why? The silicon required is far, far simpler for host mode (since you only need a single memory buffer, and you fully dictate the timing, so you can pass all the complex stuff up to software).
That’s actually not the case. Historically, low amounts of RAM and IO were the bottlenecks; we’re talking about a quad-core ARMv8 with a pretty beefy vector GPU/coprocessor.