
Hmm, how about using this as a fast interconnect for building RPi clusters?


This might even allow some level of memory mapping in all directions. Unfortunately PCIe switches are not very hackable as such, but maybe I can spin a board for this purpose... after I get the hack more industrialized.
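To make the memory-mapping idea concrete: once the far side enumerates as a PCIe endpoint, Linux exposes its BARs as sysfs files you can mmap from userspace. A minimal sketch in C; the device address and 4 KiB window are made up (find the real ones with lspci), and it needs root:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical device address; substitute your own from lspci. */
        int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        /* 4 KiB window; the real BAR size is in the device's 'resource' file. */
        void *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        volatile uint32_t *bar = map;
        printf("first register: 0x%08x\n", (unsigned)bar[0]); /* read goes over PCIe */

        munmap(map, 4096);
        close(fd);
        return 0;
    }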


Yeah, I was thinking about a custom switch in an FPGA, for research purposes. Of course, if I wanted real number-crunching performance, it'd be cheaper to just buy some GPUs.


Wouldn't the gigabit LAN be a better fit for this? If you want to make a cluster, you'd need some custom hardware to connect to that can handle the communication. At this point you're likely spending more than if you just bought a real desktop with more performance. I can see the fun factor in hacking the system together, though.


There are two ways of doing clusters: one is a message-passing paradigm, which you can do over Ethernet (to an extent; I'd still take USB3 for 4x the bandwidth), and the other is direct memory access à la Cray.
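For the message-passing flavor, the program doesn't have to care what the wire is. A minimal MPI ping-pong sketch in C, assuming an MPI implementation such as OpenMPI or MPICH on the nodes:

    /* Minimal MPI ping-pong between exactly two ranks; the transport
       underneath (Ethernet, USB3 networking, ...) is MPI's problem. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, token = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 0 got the token back: %d\n", token);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }

Launched with something like "mpirun -np 2 -host pi1,pi2 ./pingpong" (host names made up), the same binary runs over whatever link MPI is configured to use.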

What really motivated me to do this hack is the relative abundance of stuff I can now plug into an FPGA :)


I was thinking Ethernet because, A: it's cheap to buy a switch and cluster 100 RPis; B: you can have a desktop with a faster NIC and keep the RPis busy.

But as with everything high performance, it depends entirely on the use case.


True. And with the RPi4 having 1000BASE-T, it's not as painful as it seems. Perhaps the driver can even be coaxed into some form of DMA and MPI with a bit lower latency than the IP stack.
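To sketch what dodging the IP stack could look like: Linux already lets you exchange raw Ethernet frames through an AF_PACKET socket, with no TCP/IP on top. A minimal sender in C, using the IEEE local-experimental EtherType 0x88B5; the interface name and destination MAC are placeholders, and it needs root (CAP_NET_RAW):

    /* Sketch: send one raw Ethernet frame, bypassing the IP stack. */
    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    #include <linux/if_packet.h>
    #include <net/if.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(AF_PACKET, SOCK_RAW, htons(0x88b5)); /* experimental EtherType */
        if (s < 0) { perror("socket"); return 1; }

        struct sockaddr_ll addr = {0};
        addr.sll_family  = AF_PACKET;
        addr.sll_ifindex = if_nametoindex("eth0");          /* placeholder interface */
        addr.sll_halen   = ETH_ALEN;
        unsigned char dst[ETH_ALEN] = {0x02, 0, 0, 0, 0, 0x01}; /* placeholder MAC */
        memcpy(addr.sll_addr, dst, ETH_ALEN);

        /* With SOCK_RAW we build the Ethernet header ourselves:
           dst MAC, src MAC (left zero here), then EtherType. */
        unsigned char frame[ETH_ZLEN] = {0};
        memcpy(frame, dst, ETH_ALEN);
        frame[12] = 0x88; frame[13] = 0xb5;

        if (sendto(s, frame, sizeof frame, 0,
                   (struct sockaddr *)&addr, sizeof addr) < 0)
            perror("sendto");
        close(s);
        return 0;
    }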

With a secondary IP layer on 802.11, it might actually work reasonably well.


Or use the PCIe mod with an InfiniBand card for low latency and high throughput.


At that price point, you could probably get more performance out of a server with 2-4 sockets.


I suggest the RoarVM message passing Smalltalk Virtual Machine [1][2]. Erlang/OTP with actors would be my second choice.

[1] Ungar, D., & Adams, S. S. (2009). Hosting an object heap on manycore hardware. https://sci-hub.tw/10.1145/1640134.1640149

[2] RoarVM demo. https://www.youtube.com/watch?v=GeDFcC6ps88


So your plan is to attach an FPGA to the PCIe bus? To allow the FPGA to access peripherals on the Pi side, or do you want the FPGA to make the Pi a lot more powerful?


Both. The FPGA can interconnect the 16 RPi4s at 40 Gbps and also interconnect the 16 1G Ethernet ports at 16 Gbps, even the 16 HDMI, MIPI and GPIO interfaces, depending on the FPGA. The FPGA can add 256 GB of DDR3 and lots of other I/O like SATA and HDMI (see my other comment for a $159 FPGA). The FPGA can act as a switch and an I/O and memory extender, and still have room for up to 300 ARM or RISC-V softcores.


What FPGA are you planning on using?


See my other posts on this page. You want FPGAs with cheap SERDES links that support PCIe Gen 2 at 5 GHz. The best fit is the Lattice ECP5-5G, at $5 per link. The MPF300 is $10 per 12.5 Gbps link on the discounted development board (with desoldering). A retail-priced Cyclone 10CX105 is also $10 per link, with a smaller part at $14.60 per link. But these are very potent FPGAs, each of which can be a small GPU in itself.

I now plan to crowdfund our own FPGA in an ASIC, which could bring the cost down to $0.25 per link with a hundred links. This HN page shows me there would be enough buyers for a $300, 364-core, RPi4-compatible cluster (100 BCM2711 chips connected to 1 FPGA, plus 100 GB of DDR4), but without the RPi4 boards. Instead of attaching RPi4s or BCM2711s, you could have 100 SATA or SSD devices, 30 HDMI, 10G, or a rack full of PCIe servers connected to this interconnect FPGA. You are welcome to help realise the project or the crowdfunding.


I have no idea what I'd use this for but it sounds awesome.


Particle systems [1].

AR and VR [2][3][4][5].

Image processing, neural nets.

But not in languages and environments like C, Java or Linux.

Only in massively parallel message-passing languages [4].

I suggest a Smalltalk message-passing scalable cluster VM like the RoarVM [6][7][8].

[1] Shadama 3D. Yoshiki Oshima. https://www.youtube.com/watch?v=rka3w90Odo8

[2] OpenCobalt Alpha. https://www.youtube.com/watch?v=1s9ldlqhVkM&t=13s

[4] OpenCroquet Alan Kay Turing lecture. https://www.youtube.com/watch?v=aXC19T5sJ1U&t=3581

[5] OpenCroquet Alan Kay et al. https://www.youtube.com/watch?v=XZO7av2ZFB8

[6] RoarVM paper. David Ungar et al. https://stefan-marr.de/renaissance/ungar-adams-2009-hosting-...

[7] RoarVM Demo. https://www.youtube.com/watch?v=8pmVAtk612M

[8] RoarVM Demo. https://www.youtube.com/watch?v=GeDFcC6ps88


I said what I'd use it for ;)


There's also InfiniBand, which can reach much greater bandwidths.


> Wouldn't the gigabit LAN be a better fit for this?

I might be showing my age but...

Imagine a Beowulf cluster of these!

(For those who don't know the reference, this used to be a common saying on Slashdot back in the day.)


+5 Funny


Until metamoderator wrath.

To everyone who feels left out: old slashdot.org "humor".


"I might be showing my age"

What, above 25?


I suggest using the 16 x 12.5 Gbps SERDES links of the MPF300 PolarFire FPGA ($159 if you desolder one from a development board) at 5 Gbps to interconnect 16 RPi4s. You get 64 ARM cores and 300 softcores on the FPGA, with 264 Gbps of theoretical bandwidth and 64, 256, or 512 GB of DDR3. Around $719 (16 x $35 + $159) for the RPis plus FPGA, for 364 cores and 64 GB; more for the extra DRAM. You can add GPU cards, of course. If you make a 4-6 layer PCB, you could attach the 16 x 1G Ethernet and HDMI to the FPGA as well for even more interconnect bandwidth. Email me for the details or to collaborate on building it.


You'll lose fast storage, so you end up with devices with 40 MB/s microSD cards and a multi-gigabit interconnect.


Well, with the 4 GB model, and the rumoured 8 GB one, you can just load everything into RAM on boot and not worry about storage at all.


You could just use a USB device for higher throughput. The SanDisk Extreme Go, for example, is basically a small SSD in a USB drive form factor.


If I’m reading it right, don’t you give up USB to get this PCIe access?


Bet the Broadcom SoC has a USB controller built in, but unused... Could you hook it up?


Is it really unused? The RPi4 schematic is incomplete, but it at least shows the USB 2.0 pins of the USB-C port going somewhere; they might be going directly to that built-in USB controller in the main SoC.


https://lb.raspberrypi.org/documentation/hardware/raspberryp...

This says there's an OTG controller intended to be used in "peripheral device only" mode.

I haven't found a datasheet for the BCM2711, so it's hard to tell.


> peripheral device only mode.

I wonder why? The silicon required is far, far simpler for host mode (since you only need a single memory buffer and you fully dictate the timing, so you can pass all the complex stuff up to software).


Maybe they don't have the VBUS circuitry set up for connecting self-powered devices in host mode.


There's an XHCI controller in the device tree, but my guess is that there's some silicon bug in it, hence the third-party chip hanging off PCIe instead.


Also, it's probably not routed out of the BGA, so it wouldn't be usable anyway.


Yes, that's my point.


The Raspberry Pi is slow enough that your network isn't going to be the biggest bottleneck.


That's actually not the case. Historically, low RAM and slow I/O were the bottlenecks; here we're talking about a quad-core ARMv8 with a pretty beefy vector GPU/coprocessor.



