
These strategies only really work for stream processing. You also want idempotent APIs, which won't really work with these. You'd probably go for the strategy they pass over, which is having it be an arbitrary string key and just writing it down with some TTL.
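
To sketch that last strategy in Go (all names here are hypothetical, and a real service would back this with Redis or a database rather than a map): the server writes each client-supplied key down with a TTL and replays the stored result on retries.

    package idem

    import (
        "sync"
        "time"
    )

    // entry holds a recorded result and its expiry.
    type entry struct {
        result    []byte
        expiresAt time.Time
    }

    // Store is an in-memory idempotency-key store with per-key TTLs.
    type Store struct {
        mu sync.Mutex
        m  map[string]entry
    }

    func NewStore() *Store { return &Store{m: map[string]entry{}} }

    // Lookup returns the stored result if the key is known and unexpired.
    func (s *Store) Lookup(key string) ([]byte, bool) {
        s.mu.Lock()
        defer s.mu.Unlock()
        e, ok := s.m[key]
        if !ok || time.Now().After(e.expiresAt) {
            return nil, false
        }
        return e.result, true
    }

    // Record writes the key down with a TTL so retries can replay the result.
    func (s *Store) Record(key string, result []byte, ttl time.Duration) {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.m[key] = entry{result: result, expiresAt: time.Now().Add(ttl)}
    }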


More of a data structure than an algorithm, if we're being pedantic.


How many trillions of dollars have we spent on these things?

Would we not expect similar levels of progress in other industries given such massive investment?


I’m not sure even $1T has been spent. Pledged != spent.

Some estimates have it at ~$375B by the end of 2025. It makes sense, there are only so many datacenters and engineers out there and a trillion is a lot of money. It’s not like we’re in health care. :)

https://hai.stanford.edu/ai-index/2025-ai-index-report/econo...


I wonder how much is spent refining oil and how much that industry has evolved.

Or mass transit.

Or food.


Or on "a cure for cancer" (according to Gemini, $2.2T 2024 US dollars...)


10 year survival is 50% in 2024 in the UK. It was 25% in the 1970s.

Age-standardised deaths in the US are down by a third since the 1990s.


> Rather than defining a single interface and mock on the producer side that can be reused by all these packages

This is the answer. The domain that exports the API should also provide a high fidelity test double that is a fake/in memory implementation (not a mock!) that all internal downstream consumers can use.

New method on the interface (or behavioral change to existing methods)? Update the fake in the same change (you have to, otherwise the fake won't satisfy the interface and its uses won't compile!), and your build system can run all tests that use it.
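
A minimal Go sketch of the pattern (the kv package and its methods are made up for illustration): the package that defines the interface ships the fake right next to it.

    package kv

    import "errors"

    // Store is the interface this package exports.
    type Store interface {
        Get(key string) (string, error)
        Put(key, value string) error
    }

    // ErrNotFound mirrors the real implementation's error contract.
    var ErrNotFound = errors.New("kv: not found")

    // Fake is an in-memory implementation shipped alongside the
    // interface for downstream consumers to use in their tests.
    type Fake struct {
        data map[string]string
    }

    func NewFake() *Fake { return &Fake{data: map[string]string{}} }

    func (f *Fake) Get(key string) (string, error) {
        v, ok := f.data[key]
        if !ok {
            return "", ErrNotFound
        }
        return v, nil
    }

    func (f *Fake) Put(key, value string) error {
        f.data[key] = value
        return nil
    }

    // Compile-time check: adding a method to Store breaks this line
    // until Fake is updated in the same change.
    var _ Store = (*Fake)(nil)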


> The domain that exports the API should also provide a high fidelity test double that is a fake/in memory implementation (not a mock!)

Not a mock? But that's exactly what a mock is: An implementation that isn't authentic, but that doesn't try to deceive. In other words, something that behaves just like the "real thing" (to the extent that matters), but is not authentically the "real thing". Hence the name.


There are different definitions of the term "mock". You described the generic usage where "mock" is a catch-all for "not the real thing", but there are several terms in this space to refer to more precise concepts.

What I've seen:

* "test double" - a catch-all term for "not the real thing". What you called a "mock". But this phrasing is more general so the term "mock" can be used elsewhere.

* "fake" - a simplified implementation, complex enough to mimic real behavior. It probably uses a lot of the real thing under the hood, but with unnecessary testing-related features removed. ie: a real database that only runs in memory.

* "stub" - a very thin shim that only provides look-up style responses. Basically a map of which inputs produce which outputs.

* "mock" - an object that has expectations about how it is to be used. It encodes some test logic itself.

The Go ecosystem seems to prefer avoiding test objects that encode expectations about how they are used and the community uses the term "mock" specifically to refer to that. This is why you hear "don't use mocks in Go". It refers to a specific type of test double.
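
For concreteness, here is a hand-rolled Go sketch (hypothetical names, not canonical community code) of a test double in that narrow sense: one that encodes expectations about its own usage.

    package mail

    import "testing"

    // MockMailer is a mock in the narrow sense: it encodes expectations
    // about how it is called and fails the test if they aren't met.
    type MockMailer struct {
        T      *testing.T
        WantTo string
        called bool
    }

    func (m *MockMailer) Send(to, body string) error {
        m.called = true
        if to != m.WantTo {
            m.T.Errorf("Send called with to=%q, want %q", to, m.WantTo)
        }
        return nil
    }

    // Verify asserts the expected interaction actually happened.
    func (m *MockMailer) Verify() {
        if !m.called {
            m.T.Error("expected Send to be called")
        }
    }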

By these definitions, OP was referring to a "fake". And I agree with OP that there is much benefit to providing canonical test fakes, so long as you don't lock users into only using your test fake because it will fall short of someone's needs at some point.

Unfortunately there's no authoritative source for these terms (that I'm aware of), so there's always arguing about what exactly words mean.

Martin Fowler's definitions are closely aligned with the Go community I'm familiar with: https://martinfowler.com/articles/mocksArentStubs.html

Wikipedia has chosen to cite him as well: https://en.wikipedia.org/wiki/Test_double#General .

My best guess is that software development co-opted the term "mock" from the vocabulary of other fields, and the folks who were into formalities used the term for a more specific definition, but the software dev discipline doesn't follow much formal vocabulary and a healthy portion of devs intuitively use the term "mock" generically. (I myself was in the field for years before I encountered any formal vocabulary on the topic.)


> "mock" - an object that has expectations about how it is to be used. It encodes some test logic itself.*

Something doesn't add up. Your link claims that mock originated from XP/TDD, but mock as you describe here violates the core principles of TDD. It also doesn't fit the general definition of mock, whereas what you described originally does.

Beck seemed to describe a mock as something that:

1. Imitates the real object.

2. Records how it is used.

3. Allows you to assert expectations on it.

#2 and #3 sound much like what is sometimes referred to as a "spy". This does not speak to the test logic being in the object itself. But spies do not satisfy #1. So it seems clear that what Beck was thinking of is more like, say, an in-memory database implementation where it:

1. Behaves like a storage-backed database.

2. Records changes in state. (e.g. update record)

3. Allows you to make assertions on that change in state. (e.g. fetch record and assert it has changed)

I'm quite sure Fowler's got it wrong here. He admits to being wrong about it before, so the odds are that he still is. The compounding evidence is not in his favour.

Certainly if anyone used what you call a mock in their code you'd mock (as in make fun of) them for doing so. It is not a good idea. But I'm not sure that equates to the pattern itself also being called a mock.


> 3. Allows you to assert expectations on it.

I think this is the crux that separates Fowler's mock, spy, and stub: Who places what expectations.

Fowler's mock is about testing behavioral interaction with the test double. In Fowler's example, the mock is given the expectations about what APIs will be used (warehouseMock.expects()) then those expectations are later asserted (warehouseMock.Verify()).

Behavioral interaction encodes some of the implementation detail. It asserts that certain calls must be made, possibly with certain parameters, and possibly in a certain order. The danger is that it is somewhat implementation specific. A refactoring that keeps the input/output stable but achieves the goal through different means must still update the tests, which is generally a red flag.

This is what my original statement referred to, the interaction verification. Generally the expectations are encoded in the mock itself for ergonomics sake, but it's technically possible to do the interaction testing without putting it in the mock. Regardless of exactly where the assertion logic goes, if the test double is testing its interactions then it is a Fowler mock.

(As an example: An anti-pattern I've seen in Python mocks is asserting that every mocked object function call happens. The tests end up being basically a simplified version of the original code and logic flaws in the code can be copied over to the tests because they're basically written as a pseudo stack trace of the test case.)

In contrast, a stub does not assert any interaction behavior. In fact it asserts nothing and lets the test logic itself assert expectations by calling the API, i.e.:

> 3. Allows you to make assertions on that change in state. (e.g. fetch record and assert it has changed)

How is that change asserted?

A Fowler stub would be:

    myService = service.New(testDB.New())
    myService.write("myKey", 42)
    assert(myService.read("myKey") == 42)

A Fowler mock would be:

    testDB = testDB.New()
    testDB.Expect(write, "myKey", 42)
    myService = service.New(testDB)
    myService.write("myKey", 42)
    testDB.Verify()

These concepts seem different enough to warrant keeping "mock" as a distinct term.

Fowler's spy seems to sit half-way between mock and stub: It doesn't assert detailed interaction expectations, but it does check some of the internals. A spy is open-ended, you can write any sort of validation logic, whereas a mock is specifically about how it is used.

I have used spies in Go basically whenever I need to verify side effect behavior that is not attainable via the main API.
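
For example, a minimal Go spy (names are illustrative) just records what happened and leaves the assertions to the test:

    // SpyNotifier records interactions but asserts nothing itself;
    // the test inspects the record afterward.
    type SpyNotifier struct {
        Sent []string
    }

    func (s *SpyNotifier) Notify(msg string) error {
        s.Sent = append(s.Sent, msg)
        return nil
    }

    // In a test (NewService is assumed to accept a notifier):
    //   spy := &SpyNotifier{}
    //   svc := NewService(spy)
    //   svc.DoThing()
    //   // assert spy.Sent contains the expected message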

By Fowler's definition, mocks are a niche test double and I suspect that what many folks would call a mock is not technically a mock.


This is pretty great! The main things you need for durable execution are 1) retries (absurd does this) and 2) idempotency (absurd does this via steps - but would be better handled with the APIs themselves being idempotent, then not using steps. Though absurd would certainly _help_ mitigate some APIs not being idempotent, but not completely).


> idempotency (absurd does this via steps - but would be better handled with the APIs themselves being idempotent, then not using steps

That is very hard to do with agents, which are just all probabilistic. However, if you do have an API that is either idempotent or uses idempotency keys, you can derive an idempotency key from the task: const idempotencyKey = `${ctx.taskID}:payment`;

That said: many APIs that support the idempotency-key header only support replays of an hour to 24 hours, so for long running workflows you need to capture the state output anyways.


I was not thinking of the agent case specifically. But yes, you have to make the APIs idempotent, either with these step checkpoints or by wrapping the underlying API. It's not hard to make a postgres-transaction-based idempotency layer wrapper, then you can have a much longer idempotency TTL.
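
A rough Go sketch of such a wrapper (the table and function names are hypothetical): the key is claimed inside a Postgres transaction, so the claim only persists if the wrapped call succeeds and commits.

    package idem

    import "database/sql"

    // RunOnce runs fn at most once per key. Assumes a table like:
    //   CREATE TABLE idempotency_keys (key text PRIMARY KEY, created_at timestamptz)
    // The long TTL comes from periodically pruning old rows.
    func RunOnce(db *sql.DB, key string, fn func() error) error {
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        defer tx.Rollback() // no-op once Commit has succeeded

        // The primary key makes a duplicate claim fail; a real version
        // would inspect the error code to distinguish "already claimed"
        // from other failures.
        if _, err := tx.Exec(
            `INSERT INTO idempotency_keys (key, created_at) VALUES ($1, now())`, key,
        ); err != nil {
            return nil // key already claimed: treat the work as done
        }

        if err := fn(); err != nil {
            return err // rollback releases the claim so a retry can run fn again
        }
        return tx.Commit()
    }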

> so for long running workflows you need to capture the state output anyways.

That would be a _very_ long running workflow. Probably worth breaking up into different subtasks or, I guess as Absurd does it, step checkpoints.


Why not both? They are not mutually exclusive, really. Perhaps the first skews the initial population and the second exacerbates it.


Sure, but is one more relevant?


It's mostly happy for people who don't have to deal with it.

If you are an "end user" who just wants to run your damn code without caring about your dev environment, then `bazel run|build|test //thing/to/run:target` is about as good as you can get! _If bazel is already set up_, I don't have to worry about my environment! It just works.

If your environment has a lot of churn and there isn't a team making sure bazel is actually configured correctly, then, yeah, it is massive overkill for a lot of things. And if you try to do things how you normally would, rather than the bazel way, you'll have a bad time.

There are other benefits - sometimes you want a public API so it _can_ be used, but you want visibility rules to limit _who_ can use it. It is great for its cacheability and dependency tracking - if you need advanced build tooling, it has what you need!

But there is a very real chance you don't need any of these things and so the cost is not worth it.

(I, personally, hate dev environment churn, so just having the CLI tooling uniformity is enough for me.)


Why does this 1 hour old post have comments from 29 days ago?


Mods sometimes merge posts. Whether any particular merger makes sense can be a bit of a toss-up.


There is (some) truth to this but it fundamentally _does not_ replace foundational learning.


Not saying it does, but what I am saying is that it’s not two brackets anymore. It’s not Math & Literacy defined by one’s ability to do polynomial long division and identify the problematic grammar of some subparagraph from a book from the 1880s.

The reality is that the tests and scores don’t reflect reality, and everyone’s up in arms over it instead of looking at the testing methodology.


But they do reflect one's ability to learn to get to the heights of education where you can be trusted to wield the new tools wisely.


Right… like a trained monkey putting the blocks in the hole. I have yet to see anyone outside of CERN wisely wield tools that require higher education.


Visibility systems are great!

> If a change in module A has a problem, you must roll back the entire monolith, preventing a good change in module B from reaching users.

eh. In these setups you really want to be fixing forward for the reason you describe - so you revert the commit for feature A or turn off the feature flag for it or something. You don't really want to be reverting deployments. If you have to, well, then it's probably worth the small cost of feature B being delayed. But there are good solutions to shipping multiple features at the same time without conflicting.


There is a point in commit throughput at which finding a working combination of reverts becomes unsustainable. I've seen it. Feature flagging can delay this point but probably not prevent it unless you’re isolating literally every change with its own flag (at which point you have a weird VCS).

