Hacker News | mrkeen's comments

This is so dramatic it's hard to recover the original complaint.

Dansup has built a photo-sharing app on top of ActivityPub, and we humans are a lost cause because the app doesn't also do text-only messages?

Is that the gist of it?


Not really.

The gist of it is: it's as if Google built Gmail, but Gmail silently deleted the emails it didn't find entertaining enough, so you didn't even know they were ever sent to you.

The article is saying some people see ActivityPub as a communication protocol like email, where you expect all messages to be delivered, while others see it as an entertainment protocol, where the goal is to entertain the user.


It's more like Usenet users complaining that NZB downloaders don't let their users read text posts. Nobody using an NZB downloader gives a fuck about text posts. They're not there to chat with their fellow humans, they're there to download files. Both the text posts and binary files are transmitted by the same substrate, NNTP, but the protocol clearly has multiple groups of people using it for very different purposes.

Comparing it to email is inappropriate, because email is addressed to you, and you get upset if email servers/clients drop emails. But newsfeeds are not addressed to you. Neither is RSS/Atom. ActivityPub, generally speaking, isn't either. How you choose to experience messages coming your way is up to you. This whole article assumes that if you want something different, e.g. Pixelfed, PeerTube, Lemmy (Fediverse Instagram/YouTube/Reddit), it basically must also be Mastodon/Pleroma (Fediverse Twitter). Why must it?


Yes but the author lumps all decentralized and/or social networks together when all he really means to talk about is ActivityPub specifically.

What is the image sharing platform's criteria for "entertaining enough"? Is it whether or not the message is an image?

> This is so dramatic it's hard to recover the original complaint.

I'm curious if this message is new to many here?

What makes it feel "dramatic"? I get the impression people say something is "dramatic" when it doesn't really land or connect? Because when something punches me in the gut, I don't say "that was dramatic", I say "that was compelling".

I'm over 40, and these kinds of concerns (technology serving people's deeper needs rather than serving them up fleeting entertainment) have been on my radar for 15+ years. Back then, I was expecting to go into public administration, policy analysis, or "technology for good", to use what might be a naive phrase.


That will be the case when Alice stands close to where C happens, and Bob stands close to where D happens.

It's a little trickier to imagine introducing cause-and-effect though. (Alice sees that C caused D to happen, Bob sees that D caused C to happen).

I think a "light cone" is the thought-experiment to look up here.


There is a distinction between when events are seen to happen and when they really happened. The latter can be reconstructed by an observer.

In special relativity, time is relative, and when things actually happened can differ between frames. Causally linked events are always really in the same order. But disconnected events can be seen in different orders depending on the speed of the observer.
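
A minimal sketch of that last point (my own numbers, not from the article), pushing two spacelike-separated events through the Lorentz transformation with c = 1:

  from math import sqrt

  def t_prime(t, x, v):
      # time coordinate of event (t, x) for an observer moving at velocity v
      gamma = 1 / sqrt(1 - v * v)
      return gamma * (t - v * x)

  # Event A at (t=0, x=0), event B at (t=1, x=3).  Light can't cover a
  # distance of 3 in a time of 1 (c=1), so the two events are spacelike
  # separated: no signal from one can reach the other.
  A, B = (0.0, 0.0), (1.0, 3.0)

  for v in (-0.5, 0.0, 0.5):
      order = "A before B" if t_prime(*A, v) < t_prime(*B, v) else "B before A"
      print(f"observer at {v:+.1f}c: {order}")

The -0.5c and 0.0c observers see A first, the +0.5c observer sees B first, and none of them is wrong.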


> But disconnected events can be seen in different orders depending on the speed of the observer.

What are "disconnected events"? In a subtle but still real sense, are not all events causally linked? e.g. gravitationally, magnetically, subatomically or quantumly?

I can understand that our simple minds and computational abilities lead us to consider events "far away" from each other as "disconnected" for practical reasons. But are they really not causally connected in a subtle way?

There are pieces of spacetime that are clearly, obviously causally connected to each other. And there are far away regions of the universe that are, practically speaking, causally disconnected from things "around here". But wouldn't these causally disjoint regions overlap with each other, stringing together a chain of causality from anywhere to anywhere?

Or is there a complete vacuum of insulation between some truly disconnected events that don't overlap with any other observational light cone or frame of reference at all?


We now know that gravity moves at the speed of light. Imagine that there are two supernovas that, for some unknown reason, explode at essentially the same time. Just before you die from radiation exposure, you will see the light pulse from each supernova before each supernova can 'see' the gravitational disruption caused by the other. Maybe a gravity wave can push a chain reaction on the verge of happening into either a) happening or b) being delayed for a brief time, but the second explosion happened before the pulse from the first could have arrived. So you're pretty sure they aren't causally linked.

However if they were both triggered by a binary black hole merger, then they're dependent events but not on each other.

But I think the general discussion is more of a 'Han shot first' sort. One intelligent system reacting to an action of another intelligent system, and not being able to discern as a person from a different reference frame as to who started it and who reacted. So I suppose when we have relativistic duels we will have to preserve the role of the 'second' as a witness to the events. Or we will have to just shrug and find something else to worry about.


Causality moves at the speed of light. Events that are too far apart for light to travel between them are called spacelike separated and aren't causally connected.

I think you might be confusing events that have some shared history with events that influence each other. Say a Martian rover sends a message to Earth right as Earth sends a message to it: those two events aren't causally connected, because neither side knows about the other's message until the light-speed delay has passed.


> But wouldn't these causally disjoint regions overlap with each other

Yes.

> stringing together a chain of causality from anywhere to anywhere?

No? Causality reaching one edge of a sphere doesn't mean it instantaneously teleports to every point in that same sphere. This isn't a transitive relationship.

> What are "disconnected events"?

The sentence you're responding to seems like a decent definition. Disconnected events are events which might be observed in either order depending on the position of an observer.


If Bob and Alice are moving at half the speed of light in opposite directions.

Everything being passed by object reference just means every case is equally unclear.

  answer = frobnicate(foo)
Will frobnicate destroy foo or not?

If you mean that it can modify it, you should say that. It can't destroy it as that term is generally understood.

No. It can’t. It can only destroy its own reference to foo, not the calling scope’s reference.

Right, but I don't care about the reference to foo (that's a low-level detail that should be confined to systems languages, not application languages) I was asking about the foo.

foo is a name. It's not at all clear what you mean by "the foo" ... the called function can modify the object referenced by the symbol foo unless it's immutable. If this is your complaint, then solve it with documentation ... I never write a function, in any language, that modifies anything--via parameter or in an outer scope--without documenting it as doing so.

> I don't care about the reference to foo (that's a low-level detail that should be confined to systems languages, not application languages)

This is not true at all. There's a big difference, for instance, between assigning a reference and assigning an object ... the former results in two names referring to the same object, whereas in the latter case they refer to copies. I had a recent case where a bug was introduced when converting Perl to Python because Perl arrays have value semantics whereas Python lists have reference semantics.

There seem to be disagreements here due entirely to erroneous or poor use of terminology, which is frustrating ... I won't participate further.


>> I sorely miss it in Python, JS and other languages. They keep me guessing whether a function will mutate the parent structure, or a local copy in those languages!

> Python at least is very clear about this ... everything, lists, class instances, dicts, tuples, strings, ints, floats ... are all passed by object reference. (Of course it's not relevant for tuples and scalars, which are immutable.)

Then let me just FTFY based on what you've said later:

Python will not be very clear about this ... everything, lists, class instances, dicts, tuples, strings, ints, floats, they all require the programmer to read and write documentation every time.


Right, but that reference is all the function has. It can’t destroy another scope’s reference to the foo, and the Python GC won’t destroy the foo as long as a reference to it exists.

The function could mutate foo to be empty, if foo is mutable, but it can’t make it not exist.


>> I sorely miss it in Python, JS and other languages. They keep me guessing whether a function will mutate the parent structure, or a local copy in those languages!

No mention of references!

I don't care about references to foo. I don't care about facades to foo. I don't care about decorators of foo. I don't care about memory segments of foo.

"Did someone eat my lunch in the work fridge?"

"Well at least you wrote your name in permanent marker on your lunchbox, so that should help narrow it down"


Then I don’t know what you mean. If you have:

  foo = open('bar.txt')
  answer = frobnicate(foo)
  print(foo)
then frobnicate may call foo.close(), or it may read foo's contents so that you'd have to seek back to the beginning before you could read them a second time. There's literally nothing you can do in frobnicate that can make it such that the 3rd line raises a NameError because foo no longer exists.

In other words,

>> I sorely miss it in Python, JS and other languages. They keep me guessing whether a function will mutate the parent structure, or a local copy in those languages!


  #!/usr/bin/env python3
  import inspect
  
  def frobnicate(unfrobbed: any) -> None:
      frame = inspect.currentframe().f_back
      for name in [name for name, value in frame.f_locals.items() if value is unfrobbed]:
          del frame.f_locals[name]
      for name in [name for name, value in frame.f_globals.items() if value is unfrobbed]:
          del frame.f_globals[name]
  
  foo = open("bar.txt")
  answer = frobnicate(foo)
  print(foo)

  
  Traceback (most recent call last):
    File "hackers.py", line 20, in <module>
      print(foo)
            ^^^
  NameError: name 'foo' is not defined
Be careful with the absolutes now :)

Not that this is reasonable code to encounter in the wild, but you certainly can do this. You could even make it work properly when called from inside functions that use `fastlocals` if you're willing to commit even more reprehensible crimes and rewrite the `f_code` object.

Anyway, it's not really accurate to say that Python passes by reference, because Python has no concept of references. It passes by assignment. This is perfectly analogous to passing by pointer in C, which also can be used to implement reference semantics, but it ISN'T reference semantics. The difference comes in assignment, like in the following C++ program:

  #include <print>
  
  struct Object
  {
      char member{'a'};
  };
  
  void assign_pointer(Object *ptr)
  {
      Object replacement{'b'};
      ptr = &replacement;
  }
  
  void assign_reference(Object &ref)
  {
      Object replacement{'b'};
      ref = replacement;
  }
  
  int main()
  {
      Object obj{};
      std::println("Original value: {}", obj.member);
      assign_pointer(&obj);
      std::println("After assign_pointer: {}", obj.member);
      assign_reference(obj);
      std::println("After assign_reference: {}", obj.member);
      return 0;
  }

  $ ./a.out
  Original value: a
  After assign_pointer: a
  After assign_reference: b

Just like in Python, you can modify the underlying object in the pointer example by dereferencing it, but if you just assign the name to a new value, that doesn't rebind the original object. So it isn't an actual reference, it's a name that's assigned to the same thing.

ANYWAY, irrelevant nitpicking aside, I do think Python has a problem here, but its reference semantics are kind of a red herring. Python's concept of `const` is simply far too coarse. Constness is applied and enforced at the class level, not the object, function, or function call level. This, in combination with the pass-by-assignment semantics does indeed mean that functions can freely modify their arguments the vast majority of the time, with no real contract for making sure they don't do that.

In practice, I think this is handled well enough at a culture level that it's not the worst thing in the world, and I understand Python's general reluctance to introduce new technical concepts when it doesn't strictly have to, but it's definitely a bit of a footgun. Can be hard to wrap your head around too.
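
A tiny sketch of the footgun (the function and names are made up): a type hint like Sequence can express "I won't mutate this", but nothing enforces it at the call site or at runtime.

  from typing import Sequence

  def total(xs: Sequence[int]) -> int:
      # The Sequence hint suggests read-only use, but nothing enforces it:
      if isinstance(xs, list):
          xs.append(0)   # silently mutates the caller's list
      return sum(xs)

  nums = [1, 2, 3]
  print(total(nums))     # 6
  print(nums)            # [1, 2, 3, 0] -- the caller's object was modified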


They seem to be using "destroy" in some colloquial sense, actually meaning "modify".

I'm truly not sure, but you're probably right.

Damn that's fast! I'm gonna stick my business logic in there instead.

It's not even that bad.

As long as it's actual json, it doesn't matter if it's pretty-printed or not, since `jq` can fold and unfold it at will.

I frequently fold logs into single lines, grep for something, then unfold them again.
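
Rough sketch of that round trip in Python (field names made up); `jq -c .` and `jq .` are the one-liner equivalents of the fold and unfold steps:

  import json

  pretty = """{
    "level": "error",
    "msg": "upstream timeout",
    "request_id": 123
  }"""

  # fold: pretty-printed record -> one compact line
  folded = json.dumps(json.loads(pretty), separators=(",", ":"))
  print(folded)  # {"level":"error","msg":"upstream timeout","request_id":123}

  # ...grep the single-line form, then unfold the matches again
  if "error" in folded:
      print(json.dumps(json.loads(folded), indent=2))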


Weird, that's exactly how I feel reading Go:

  func (lst *List[T]) Push(v T) {
    if lst.tail == nil {
        lst.head = &element[T]{val: v}
        lst.tail = lst.head
    } else {
        lst.tail.next = &element[T]{val: v}
        lst.tail = lst.tail.next
    }
  }

And this one doesn't even have the infamous error-checking.


You cherry picked a contrived example but that's one of the cleanest generics implementations.

Now imagine if it had semicolons, ->, ! and '.


> You cherry picked a contrived example

List.add is contrived? What are you doing that's simpler than list.add?

> but that's one of the cleanest generics implementations.

You're saying it's typically worse than this?


He is referring to

  &element[T]{val: v}
the & takes the address of the new element (a pointer, which is common across most languages), but the [T] is the generic type parameter T. Otherwise it would be just

  &element{val: v}
He says that element[T] is a clean/simple implementation of generics.

Please, it supports a hole at best. Maybe a pit. No way will this let you construct an abyss.

Supporting recursion only to a depth of 1000 (or whatever) is equivalent to supporting loops of up to 1000 iterations.

If I put out a language that crashed after 1000 iterations of a loop, I'd welcome the rudeness.


Plenty of languages, including very serious ones like C and Rust, have bounded recursion depth.

Then let me rephrase:

If every iteration of a while-loop cost you a whole stack frame, then I'd be very rude about that language.

This works, btw (at least with optimizations on: gcc and clang at -O2 turn the tail call into a loop; an unoptimized build may still overflow the stack):

  #include <stdio.h>

  long calc_sum(int n, long acc) {
    return n == 0
      ? acc
      : calc_sum(n-1, acc+n);
  }

  int main(void) {
    int iters = 2000000;
    printf("Sum 1...%d = %ld\n", iters, calc_sum(iters, 0));
    return 0;
  }

> If every iteration of a while-loop cost you a whole stack frame, then I'd be very rude about that language.

Well, sure, but real programmers know how to do while loops without invoking a function call.


> Your logs are lying to you. Not maliciously. They're just not equipped to tell the truth.

The best way to equip logs to tell the truth is to have other parts of the system consume them as their source of truth.

Firstly: "what the system does" and "what the logs say" can't be two different things.

Secondly: developers can't put less info into the logs than they should, because their feature simply won't work without it.


That doesn't sound like a good plan. You're coupling logging with business logic. I don't want to have to think about whether changing a debug string is going to break something.

You're also assuming your log infrastructure is a lot more durable than most are. Generally, logging is not a guaranteed action. Writing a log message is not normally something where you wait for a disk sync before proceeding. Dropping a log message here or there is not a fatal error. Logs get rotated and deleted automatically. They are designed for retroactive use and best effort event recording, not assumed to be a flawless record of everything the system did.

> You're also assuming your log infrastructure is a lot more durable than most are.

Make actions, not assumptions. Instead of using a one machine storage system, distribute that storage across many machines. Then stop deleting them.

> Dropping a log message here or there is not a fatal error.

I would try to reallocate my effort budget to things that actually need to work.

Drop logging completely, and come back to it once you have a flawless record of everything the system did. Then reconsider whether you need it.


> You're coupling logging with business logic

Yes, the system shall not report "User null was created" if it was actually user 123 that was created.

String? Not a chance, make a proper type-safe struct. UserCreated { "id": 123}

> I don't want to have to think if i change a debug string am i going to break something.

Good point, you should probably have a unit test somewhere.


Your logic wouldn't be dependent on a debug string, but some enum in a structured field. Ex, event_type: CREATED_TRANSACTION.

Seeing logging as debugging is flawed imo. A log is technically just a record of what happened in your database.
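
Something along these lines, say (names invented for the example): the typed event drives the business logic, and the log line is just its serialization, so rewording a message can't break anything.

  import json
  from dataclasses import dataclass, asdict
  from enum import Enum

  class EventType(Enum):
      USER_CREATED = "USER_CREATED"
      CREATED_TRANSACTION = "CREATED_TRANSACTION"

  @dataclass(frozen=True)
  class UserCreated:
      event_type: EventType
      user_id: int

  def provision_mailbox(user_id: int) -> None:
      ...  # hypothetical downstream side effect

  def handle(event: UserCreated) -> None:
      # downstream logic keys off the enum, not a message string
      if event.event_type is EventType.USER_CREATED:
          provision_mailbox(event.user_id)
      # the "log line" is just the serialized event
      print(json.dumps({**asdict(event), "event_type": event.event_type.value}))

  handle(UserCreated(EventType.USER_CREATED, 123))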


> Design decisions like write-ahead logs, large page sizes, and buffering table writes in bulk were built around disks where I/O was SLOW, and where sequential I/O was order(s)-of-magnitude faster than random.

Overall speed is irrelevant; what mattered was the relative speed difference between sequential and random access.

And since there's still a massive difference between sequential and random access with SSDs, I doubt the overall approach of using buffers needs to be reconsidered.


Can you clarify? I thought a major benefit of SSDs is that there isn't any difference between sequential and random access. There's no physical head that needs to move.

Edit: thank you for all the answers -- very educational, TIL!


Let's take the Samsung 9100 Pro M.2 as an example. It has a sequential read rate of ~6700 MB/s and a 4k random read rate of ~80 MB/s:

https://i.imgur.com/t5scCa3.png

https://ssd.userbenchmark.com/ (click on the orange double arrow to view additional columns)

That is a latency of about 50 µs for a random read, compared to 4-5 ms latency for HDDs.
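
(Back of the envelope, assuming the 80 MB/s figure is measured at queue depth 1: 80 MB/s ÷ 4 KiB ≈ 20,000 IOPS, i.e. roughly 50 µs per read.)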


Datacenter storage will generally not be using M.2 client drives. They employ optimizations that win many benchmarks but sacrifice consistency in multiple dimensions (power loss protection, write performance that degrades as they fill, perhaps others).

With SSDs, the write pattern is very important to read performance.

Datacenter and enterprise class drives tend to have a maximum transfer size of 128k, which is seemingly the NAND block size. A block is the thing that needs to be erased before rewriting.

Most drives seem to have an indirection unit size of 4k. If a write is not a multiple of the IU size or not aligned, the drive will have to do a read-modify-write. It is the IU size that is most relevant to filesystem block size.

If a small write happens atop a block that was fully written with one write, a read of that LBA range will lead to at least two NAND reads until garbage collection fixes it.

If all writes are done such that they are 128k aligned, sequential reads will be optimal and with sufficient queue depth random 128k reads may match sequential read speed. Depending on the drive, sequential reads may retain an edge due to the drive’s read ahead. My own benchmarks of gen4 U.2 drives generally backs up these statements.

At these speeds, the OS or app performing buffered reads may lead to reduced speed because cache management becomes relatively expensive. Testing should be done with direct IO using libaio or similar.
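
To make the read-modify-write point concrete, a toy calculation (assuming a 4 KiB indirection unit, which your drive may or may not use):

  IU = 4096  # assumed indirection-unit size in bytes

  def partially_written_units(offset: int, length: int) -> int:
      # Units the write only partially covers -- each one costs the drive
      # a read-modify-write.
      end = offset + length
      first, last = offset // IU, (end - 1) // IU
      if first == last:
          return 0 if (offset % IU == 0 and end % IU == 0) else 1
      return (0 if offset % IU == 0 else 1) + (0 if end % IU == 0 else 1)

  print(partially_written_units(0, 4096))    # 0: aligned, full unit
  print(partially_written_units(0, 5000))    # 1: tail unit is partial
  print(partially_written_units(100, 4096))  # 2: head and tail both partial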


Are the 4K random reads impacted by the fact that you still cannot switch Samsung SSDs to 4K native clusters?

I think that is a bigger impact on writes than reads, but certainly means there is some gap from optimal.

To me a 4k read seems anachronistic from a modern application perspective, but I gather 4 KB pages are still common in many file systems. That doesn't mean the majority of reads are 4 KB random in a real-world scenario, though.


That means it's literally faster to do a full table scan below a particular table size.

SSDs have three block/page sizes:

- The access block size (LBA size). Either 512 bytes or 4096 bytes modulo DIF. Purely a logical abstraction.

- The programming page size. Something in the 4K-64K range. This is the granularity at which an erased block may be programmed with new data.

- The erase block size. Something in the 1-128 MiB range. This is the granularity at which data is erased from the flash chips.

SSDs always use some kind of journaled mapping to cope with the actual block size being roughly five orders of magnitude larger than the write API suggests. The FTL probably looks something like an LSM with some constant background compaction going on. If your writes are larger chunks, and your reads match those chunks, you would expect the FTL to perform better, because it can allocate writes contiguously and reads within the data structure have good locality as well. You can also expect for drives to further optimize sequential operations, just like the OS does.

(N.b. things are likely more complex, because controllers will likely stripe data with the FEC across NAND planes and chips for reliability, so the actual logical write size from the controller is probably not a single NAND page)
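
A toy sketch of the kind of mapping this implies (grossly simplified; the constants are invented): writes always land on the next free page and a table redirects logical addresses, which is why the API can expose tiny blocks on top of huge erase blocks.

  PAGES_PER_ERASE_BLOCK = 256   # made-up geometry

  ftl = {}                      # logical block address -> (erase_block, page)
  next_free = [0, 0]            # [erase_block, page] cursor

  def write(lba: int) -> None:
      # Out-of-place write: never erase in place, just remap the LBA.
      ftl[lba] = tuple(next_free)
      next_free[1] += 1
      if next_free[1] == PAGES_PER_ERASE_BLOCK:
          next_free[0], next_free[1] = next_free[0] + 1, 0
      # Stale pages pile up in old erase blocks until garbage collection
      # copies the live ones out and erases the block.

  for lba in (7, 8, 7):         # rewriting LBA 7 just moves it
      write(lba)
  print(ftl)                    # {7: (0, 2), 8: (0, 1)}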


SSD controllers and VFSs are often optimized for sequential access (e.g. readahead cache) which leads to software being written to do sequential access for speed which leads to optimization for that access pattern, and so on.

It depends on the size of the read - most SSDs have internal block sizes much larger than a typical (actual) random read, so they internally have to do a lot more work for a given byte of output in a random read situation than they would in a sequential one.

Most filesystems read in 4K chunks (or sometimes even worse, 512 bytes), and internally the actual block is often multiple MB in size, so this internal read multiplication is a big factor in performance in those cases.

Note the only real difference between a random read and a sequential one is the size of the read in one sequence before it switches location - is it 4K? 16mb? 2G?


That's not possible. It's not an SSD thing either, it always applies to everything [0].

Sequential access is just the simplest example of predictable access, which is always going to perform better than random access because it's possible to optimize around it. You can't optimize around randomness.

So if you give me your fanciest, fastest random access SSD, I can always hand you back that SSD but now with sequential access faster than the random access.

[0]: RAM access, CPU branch prediction, buying stuff in bulk...


Some discussion in the FragPicker paper (2021) FWIW: https://dl.acm.org/doi/10.1145/3477132.3483593

> Our extensive experiments discover that, unlike HDDs, the performance degradation of modern storage devices incurred by fragmentation mainly stems from request splitting, where a single I/O request is split into multiple ones.


SSD block sizes are far bigger than 4 kB. They still benefit from sequential writes.

Read up on IOPS, conjoined with requests for sequential reads.

Same with doing things in RAM as well. Sequential writes and cache-friendly reads, which b-trees tend to achieve for any definition of cache. Some compaction/GC/whatever step at some point. Nothing's fundamentally changed, right?

Pity Optane, which solved for this quite well, was discontinued.

It really is a shame optane is discontinued. For durable low latency writes there really is nothing else out there.
