Something I've struggled to implement on Linux is cross-process multicast notifi...

pdkl95 · on May 29, 2015

The solution for this kind of problem will depend heavily on what type of messages you're trying to send. When messaging becomes that complex, there are often other things that impact the overall design in important ways and need to be considered.

For example, if I were implementing something that is usually associated with user events (rare-ish, basically zero bandwidth, complex signal with stateful messaging semantics), I would probably just write a simple server to manage it all. This has the advantage of centralizing any messaging complexity and lets you manage any multi-message state easily. Rebroadcasting messages to allow peer-to-peer messaging would be a trivial addition. This would probably use UNIX sockets if the connections are persistent (which can be changed into AF_INET{,6} sockets easily, if you wanted to add network support).

For something that requires very low latency (e.g. audio/MIDI), in the past I have had to use shared memory, which has zero overhead once it is set up (no syscalls or context switches). Here, the need for low latency dictated the design. Of course, this means managing locks. Not fun, but a cost that is sometimes worth paying.

There really isn't a one-size-fits-all solution.

edit:

Or, as troglobit said, TIPC. I keep forgetting that we now have it as an option. :(

ridiculous_fish · on May 29, 2015

Thanks for the reply. My use case is very simple: a stateless "something happened" notification, which can be delivered asynchronously. Coalescing or even occasional drops are fine.

I did originally use a Unix domain socket server, but that added a lot of complexity: one has to arrange for it to be launched, guard against the possibility that it gets stuck, version it, deal with permissions, etc.

My new solution on Linux is a total hack: there's a FIFO, and to post a notification, you write to it. Clients see that the FIFO became readable, and that change represents the notification. The sender then drains the data it wrote, so that the FIFO becomes unreadable again. This is a total abuse of FIFOs, but it's proven to be much simpler than trying to manage a separate server.
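
The FIFO trick described above can be sketched roughly as follows. This is only an illustration, not the poster's actual code: the helper names (`open_fifo`, `post`, `drain`, `is_readable`) and the FIFO file name are made up, and it assumes every participant opens the FIFO with O_RDWR | O_NONBLOCK so that the pipe always has a writer attached and listeners never see EOF. Opening a FIFO read-write works on Linux but is left unspecified by POSIX.

```python
import os
import select
import tempfile


def open_fifo(path):
    # O_RDWR keeps a writer attached at all times, so select() never reports EOF.
    return os.open(path, os.O_RDWR | os.O_NONBLOCK)


def post(fd):
    # Making the FIFO readable is the notification; the byte itself carries no meaning.
    os.write(fd, b"\0")


def drain(fd):
    # Read until the FIFO is empty so that it becomes unreadable again.
    try:
        while os.read(fd, 4096):
            pass
    except BlockingIOError:
        pass


def is_readable(fd, timeout=0.0):
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)


def demo():
    # One sender and two listeners, all in one process for brevity.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notify.fifo")  # hypothetical name
        os.mkfifo(path, 0o600)
        sender, a, b = open_fifo(path), open_fifo(path), open_fifo(path)
        try:
            before = is_readable(a) or is_readable(b)
            post(sender)
            during = is_readable(a) and is_readable(b)
            drain(sender)
            after = is_readable(a) or is_readable(b)
            return before, during, after
        finally:
            for fd in (sender, a, b):
                os.close(fd)
```

A listener that does not get scheduled between `post` and `drain` misses the wakeup entirely, which is only tolerable because coalescing and occasional drops are acceptable here.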

I've never heard of TIPC. From a little searching it looks like it's very capable but geared towards clusters, and is overkill for my use case. What do you think?

teddyh · on May 29, 2015

> there's a FIFO, and to post a notification, you write to it. Clients see that the FIFO became readable, and that change represents the notification. The sender then drains the data it wrote, so that the FIFO becomes unreadable again.

Beware: I tried that once, and it was unreliable. Only some clients woke up.

bdash · on May 29, 2015

We had a similar problem to solve for iOS and OS X and settled on using FIFOs in that manner as well. A colleague wrote up a blog post about the various alternatives that were evaluated before settling on that approach: https://realm.io/news/thomas-goyne-fast-inter-process-commun...

ridiculous_fish · on May 29, 2015

Ha ha ha, that's great! If we are sent to programmer purgatory, at least we'll have each other.

pdkl95 · on May 29, 2015

If drops are fine, TIPC is probably overkill. I would probably just wrap something generic using UNIX domain sockets up into a library and re-use that as needed.

Depending on your permissions requirements[1], and if you really only need a signaling flag, have you considered the filesystem? Just touch a file in a well-defined directory named after the event that happened, and poll it periodically. Removing the file clears the flag. Signals can coalesce, but you should never drop any. You can poll a directory (that will normally be empty) without much CPU load (the directory inode will be cached most of the time). You could set up multiple listeners by giving each its own "inbox" directory, e.g.:

    # send a notification
    touch "${HOME}/.${app_name}rc/messages/${destination}/${signal_flag_name}"

Using the filesystem opens up the possibility of a message sender being anything that can generate - even indirectly - an open(O_CREAT). Your signals also persist across program shutdowns and crashes - you can send and receive even when the other side isn't running - and your state can persist across reboots. Also, you can leverage some of the guarantees provided by the kernel's VFS layer. For example, rename(2) is atomic, so you can send small data payloads by writing to a different name first.

    FLAG_PATH="${HOME}/.${app_name}rc/messages/${destination}/${signal_flag_name}"
    # using the PID ($$) to not collide with other message senders
    TEMP_PATH="${FLAG_PATH}-new-$$"
    # printf handles the \n escapes portably; "echo -e" is not supported by every shell
    printf 'foo=bar\nbaz=quux\ncount=42\n' > "${TEMP_PATH}"
    mv "${TEMP_PATH}" "${FLAG_PATH}"
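
For comparison, here is a rough Python sketch of the same write-then-rename idea together with a polling listener. The function names, the hidden (dot-prefixed) temporary names and the "claim by rename" step on the receiving side are illustrative assumptions rather than part of the scheme above; the claim step is there so that a flag replaced while it is being read is not deleted unseen. It assumes a single listener per inbox directory.

```python
import os
import tempfile


def send_flag(inbox, name, payload=b""):
    # Write under a hidden temporary name, then rename(2) it into place atomically.
    tmp_path = os.path.join(inbox, f".{name}-new-{os.getpid()}")
    with open(tmp_path, "wb") as tmp:
        tmp.write(payload)
    os.replace(tmp_path, os.path.join(inbox, name))


def collect_flags(inbox):
    # One polling pass: claim each flag by renaming it, read it, then remove it.
    received = {}
    for name in sorted(os.listdir(inbox)):
        if name.startswith("."):
            continue  # temporary or already claimed files
        claimed = os.path.join(inbox, f".{name}-claimed-{os.getpid()}")
        try:
            os.rename(os.path.join(inbox, name), claimed)
        except FileNotFoundError:
            continue  # flag vanished between listdir() and rename()
        with open(claimed, "rb") as flag:
            received[name] = flag.read()
        os.unlink(claimed)
    return received


def demo():
    with tempfile.TemporaryDirectory() as inbox:
        send_flag(inbox, "reload", b"count=41\n")
        send_flag(inbox, "reload", b"count=42\n")  # coalesces with the first
        first = collect_flags(inbox)
        second = collect_flags(inbox)
        return first, second, os.listdir(inbox)
```

Two sends before a poll coalesce into one flag carrying the latest payload, and a second poll finds the inbox empty.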

As an optional Linux-specific feature, you can extend that technique to be event-driven (no polling loop) by telling the kernel to notify you about file-create events: listen to the directory (NOT the file) with inotify(7) for IN_CREATE events. (Files delivered with the rename trick arrive as IN_MOVED_TO rather than IN_CREATE, so watch for both.) Those events can be received in a simple blocking style by letting poll(2) wake up your process. Alternatively, you can receive events in a non-blocking style with poll(2) if you give inotify_init1(2) the IN_NONBLOCK flag. The man page inotify(7) should have an example.
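
A minimal blocking-mode sketch of the inotify variant, in Python for brevity. Python's standard library has no inotify binding, so this assumes Linux and calls inotify_init1(2) and inotify_add_watch(2) through ctypes; the helper names are hypothetical, and the constants are the values from <sys/inotify.h>. It watches for IN_MOVED_TO as well as IN_CREATE, because a file that is renamed into the directory is reported as a move.

```python
import ctypes
import os
import struct
import tempfile

IN_MOVED_TO = 0x00000080
IN_CREATE = 0x00000100
# struct inotify_event header: int wd; uint32_t mask, cookie, len; then the name.
EVENT_HEADER = struct.Struct("iIII")

# dlopen(NULL): resolve libc symbols from the running process (Linux only).
libc = ctypes.CDLL(None, use_errno=True)
libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]


def watch_directory(path):
    fd = libc.inotify_init1(0)  # flags=0: reads on this descriptor block
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    if libc.inotify_add_watch(fd, os.fsencode(path), IN_CREATE | IN_MOVED_TO) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, os.strerror(err))
    return fd


def wait_for_events(fd):
    # Blocks until at least one event is queued, then parses the whole batch.
    data = os.read(fd, 4096)
    events, offset = [], 0
    while offset < len(data):
        _wd, mask, _cookie, name_len = EVENT_HEADER.unpack_from(data, offset)
        offset += EVENT_HEADER.size
        name = data[offset:offset + name_len].rstrip(b"\0")
        offset += name_len
        events.append((mask, os.fsdecode(name)))
    return events


def demo():
    with tempfile.TemporaryDirectory() as inbox:
        fd = watch_directory(inbox)
        try:
            open(os.path.join(inbox, "reload"), "w").close()
            return wait_for_events(fd)
        finally:
            os.close(fd)
```

Because the descriptor is blocking, a plain read is enough to sleep until something happens; poll(2) or select(2) only become necessary when other descriptors have to be watched at the same time.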

[1] this can get annoyingly complicated - but certainly not impossible - if you have to care about user/group permissions, esp. on the directory. Making a group specific to the message sending can help.

ridiculous_fish · on May 29, 2015

Thank you for this thoughtful reply. There's a variety of options if I'm willing to poll, including shared memory or the filesystem idea you outline, but I hope to avoid polling for hygienic reasons. I also explored inotify but found it to be unreliable (https://github.com/travis-ci/travis-ci/issues/2342).

pdkl95 · on May 29, 2015

That's why I like inotify in blocking mode - the call to poll() is just to wake up the process (I think you could just blocking-read the inotify file handle? I haven't tried it directly). The point of using inotify is that you don't need to poll, because the kernel sends your process reliable events over a file handle instead. The use of poll(2) is just a consequence of the interface using a file handle.

As I originally said, though, there is certainly no one-size-fits-all solution. These are just a few of the available options, which may not be appropriate for your situation.

ridiculous_fish · on May 29, 2015

I like blocking inotify in principle - the problem is that it just didn't work! I think there is a gap in the Linux APIs in this area. Its multicast IPC mechanisms are just too heavyweight.

rdtsc · on May 29, 2015

You can do it with shared memory. One writer writes and multiple readers can observe. I did this for both low latency and throughput reasons.

In general you have to be very careful how you handle it and consider various consistency and failure scenarios.

The main part of memory layout looks something like this:

  [write_counter][......buffer.....]

This is owned and updated by the writer. Readers have a read_counter that they maintain in their own context (not shared).

You'd probably have to declare this using the 'volatile' keyword. Otherwise the compiler may optimize away accesses to these (seemingly) unused variables.

Then it works like this:

The write_counter value always counts up as the writer writes. Data gets written to the buffer, then write_counter is incremented. Both reader and writer index into the buffer using {write|read}_counter % buffer_size.

Also, note that these counters double as total counts of items written, so each reader can determine how far ahead the writer is.

Another note: depending on the size of your counter, it will not necessarily be updated atomically. The compiler could split the update into multiple instructions and, say, increment the lower part of the value, then the upper part. The writer could get pre-empted between those two instructions, so you could get a strange torn value. In this case, because of the %, you'd still fall into the range of the buffer, but you might be reading data you didn't expect. Whether that works for your use case or not, you'll have to see.
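
The layout and counter arithmetic described above can be illustrated with a small Python sketch over an anonymous mmap region. It only shows the bookkeeping: the slot count, the 8-byte integer items and all names are made up for the example, and Python offers none of the memory-ordering control a real implementation would need (in C, volatile alone does not order the buffer write before the counter update on other CPUs, so atomics or explicit barriers would be required).

```python
import mmap
import struct

SLOTS = 4                     # hypothetical buffer size, in items
HEADER = struct.Struct("<Q")  # write_counter, owned by the writer
SLOT = struct.Struct("<q")    # one fixed-size item per slot


def create_region():
    # Anonymous shared mapping: [write_counter][......buffer.....]
    return mmap.mmap(-1, HEADER.size + SLOTS * SLOT.size)


def write_item(region, value):
    (count,) = HEADER.unpack_from(region, 0)
    # Store the data first, then publish it by bumping the counter.
    SLOT.pack_into(region, HEADER.size + (count % SLOTS) * SLOT.size, value)
    HEADER.pack_into(region, 0, count + 1)


class Reader:
    def __init__(self, region):
        self.region = region
        self.read_counter = 0  # private to this reader, never shared

    def poll(self):
        (written,) = HEADER.unpack_from(self.region, 0)
        # If the writer has lapped us, the oldest items are gone; skip ahead.
        lost = max(0, written - self.read_counter - SLOTS)
        self.read_counter += lost
        items = []
        while self.read_counter < written:
            offset = HEADER.size + (self.read_counter % SLOTS) * SLOT.size
            items.append(SLOT.unpack_from(self.region, offset)[0])
            self.read_counter += 1
        return items, lost
```

Each reader learns how far behind it is from the difference between the shared write_counter and its own read_counter, and can tell when items were overwritten before it got to them.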

EDIT: Also, don't forget about the slow and stupid multicast mechanism -- writing to a file. Some file operations can be atomic (renaming a file). And some operating systems let you watch files for changes.

gnachman · on May 29, 2015

I haven't tried it, but you should be able to have multiple processes listen on the same UDP port bound to localhost using SO_REUSEPORT. Send broadcast UDP packets (using SO_BROADCAST) so they all get it.

From here: http://www.kohala.com/start/mcast.api.txt

"More than one process may bind to the same SOCK_DGRAM UDP port if the bind() is preceded by:

int one = 1; setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one))

"In this case, every incoming multicast or broadcast UDP datagram destined to the shared port is delivered to all sockets bound to the port. For backwards compatibility reasons, THIS DOES NOT APPLY TO INCOMING UNICAST DATAGRAMS -- unicast datagrams are never delivered to more than one socket, regardless of how many sockets are bound to the datagram's destination port."

troglobit · on May 28, 2015

TIPC is pretty neat. It has been available in the kernel for quite a long time (AF_TIPC) and works between processes on one or many nodes.

http://tipc.sourceforge.net/

ploxiln · on May 29, 2015

Classic signals are tricky... but you can send a signal to all members of a "process group".

see: man killpg, man setpgid
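
A rough sketch of the process-group idea, in Python: the parent forks a few listeners, places them in a process group of their own with setpgid, and wakes all of them with one killpg call. The choice of SIGUSR1, the pipes used for bookkeeping and the function names are assumptions made for the example.

```python
import os
import signal


def listener(ready_w, done_w):
    # Block SIGUSR1 first so it cannot be delivered before sigwait() is ready.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    os.write(ready_w, b"r")
    signal.sigwait({signal.SIGUSR1})  # sleep until the group is signalled
    os.write(done_w, b"d")


def notify_group_demo(count=3):
    ready_r, ready_w = os.pipe()
    done_r, done_w = os.pipe()
    pgid, pids = 0, []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            try:
                listener(ready_w, done_w)
            finally:
                os._exit(0)
        pgid = pgid or pid     # the first child becomes the group leader
        os.setpgid(pid, pgid)  # move the child into the listeners' group
        pids.append(pid)
    for _ in range(count):
        os.read(ready_r, 1)    # wait until every listener is waiting
    os.killpg(pgid, signal.SIGUSR1)  # one call signals the whole group
    woken = sum(len(os.read(done_r, 1)) for _ in range(count))
    for pid in pids:
        os.waitpid(pid, 0)
    for fd in (ready_r, ready_w, done_r, done_w):
        os.close(fd)
    return woken
```

The sender stays outside the group here, so it does not signal itself. Unrelated processes that cannot be placed in a common process group are out of reach of this approach.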

also, "zeromq" has a pubsub mode, though I've never used it and I'm not sure about its limitations

easytiger · on May 29, 2015

zeromq over loopback or with pgm is prob as good/fast as UDS

vezzy-fnord · on May 29, 2015

You can actually implement a basic pubsub (both one-to-many and many-to-many) mechanism using FIFOs and file system permissions in a particular fashion known as a fifodir:

http://skarnet.org/software/s6/fifodir.html

http://skarnet.org/software/s6/ftrig.html

ekiru · on May 29, 2015

dbus signals can be multicast. Specifically, dbus signals without a destination are routed to all connections with match rules (added with org.freedesktop.DBus.AddMatch on the message bus) which match the signal.