The memory layout of your code has such a big impact on performance on modern computers that measuring performance without controlling for that variable leads to wild goose chases: you think you improved something, but in reality you just got the compiler to move the code around a bit.
Emery Berger has an excellent talk on this [1], and a causal profiler they developed called Coz [2].
Branch mispredicts might be somewhat invariant to that, but still, one of the main points of the talk is that people do too much eyeball statistics, mistaking the variance of the underlying stochastic process for actual signal.
Causal profiling still uses hypothesis testing, so I don't get your point. Ultimately, one needs to know if software A is 'better' or 'worse' than software A' for any given definition of those terms.
Coz appears to use a very good method for profiling multi-threaded applications, but it does not seem applicable to single-threaded ones, and I do not see how it would avoid the variability caused by different memory layouts, which, as you rightly point out, can confound profiling results.
Yes; just because it's from a team connected with the old Stabilizer tool doesn't mean it shares the insights and benefits of that kind of testing. At least, that's my understanding based on my limited knowledge of Coz, and a little more knowledge of https://github.com/ccurtsinger/stabilizer
I've been working on algorithm benchmarking software, and I observed that small changes to algorithm code, even in parts that never executed, could have a large effect on execution speed.
Even better, they seem to have a solution to the problem. Going to explore further to see how we can use this technique.
Kind of off-topic, but has anyone had any success using perf_event_open from within WSL, or is that unsupported? I tried `sysctl kernel.perf_event_paranoid=0` and the following, which works under "real" Linux but fails under WSL:
/* NB: glibc has no perf_event_open() wrapper, so this goes
   through syscall(SYS_perf_event_open, ...) */
struct perf_event_attr attr = {
    .type = PERF_TYPE_HARDWARE,
    .size = sizeof(struct perf_event_attr),
    .config = PERF_COUNT_HW_INSTRUCTIONS,
    .disabled = 1, /* only need to read once, not updating values, but fails with 0 also */
    .exclude_kernel = 1,
    .exclude_hv = 1
};
int fd = perf_event_open(&attr, 0, -1, -1, 0); /* fails, returns -1 */
Microsoft's kernel appears to have the right CONFIG_* as well, so I feel like it must be supported in some capacity, even if limited.
The name "poop" does not travel well across cultures. In northern Dutch, "poep" means shit; in southern Dutch, "poepen" means having sexual intercourse.
I've never used Zig or C++, mostly Bash, Python, and Lisps, but I have recently begun learning C. I see C++ in search results, and it looks like C with higher-level features in a register that reminds me of Java. I see this, and it looks similar: C with higher-level features that remind me more of Python. I'm sure it's got other important features and priorities that set it apart (oft discussed here), but that's what I see, and I dig it.
To be clear, there are definitely a few idioms I'd need to learn before considering this readable (it ain't pseudocode), but I generally see all the building blocks and recognize what they're achieving. Maybe I'd have put them together slightly differently, but that could be equally true in Python. I'd rather learn this next than C++.
The density of how it's been written reminds me of some Lisp code I've read; some implementations of state machines can be hard to follow too (in the classic "but where does it do the thing?" sense). I attribute a lot of what I don't follow to the lower-level C influence; I'm working for the first time with memory management, structs, static types, pointers, bitwise operations, etc.
Think I made my point in there somewhere >z>
Would love to know where you're coming at it from; I'd imagine the C or "kinda functional control flow" aspects might explain what looks most foreign to you too (unless you have broader experience with low-level languages)?
You're mostly right, but Zig is not higher level than C. In some ways it's lower level: in C the allocator you use is mostly abstracted away, while in Zig you usually have to specify it explicitly.
Looks fine to me. Syntax is the run-of-the-mill post-Java style currently favored, as exemplified by e.g. Haskell or Rust. Not a fan personally, but this is what is popular now, so that's what we have to live with ¯\_(ツ)_/¯.
Style-wise, looks fine. Code split up into functions where it makes sense, inlined where it doesn't. I've seen worse - for example, code that takes the equivalent of that, and splits it into 100 functions across 10+ files.
> I've seen worse - for example, code that takes the equivalent of that, and splits it into 100 functions across 10+ files.
This, so many times. I will never understand why people are so afraid of a multi-thousand-line file of properly structured code but will happily giggle when the same structure is split across 10-20 files. And no, it’s not for reuse’s sake.
Many thousand lines of structured code that does not get broken into logical modules ends up an unnavigable and unmaintainable mess. I would not hesitate to split up such code.
Encapsulation and modularity are good engineering practices. Personally, I will not contribute to any project that opposes such things
Coz is pretty trivial to set up with Zig too.
1. https://m.youtube.com/watch?v=r-TLSBdHe1A
2. https://github.com/plasma-umass/coz