Wait a sec. You can't bust out numbers like that without being clear about a few key points:
1. What was std::vector's removal algorithm like? Are you sure that .remove() wasn't shifting all elements in the vector? The "correct" removal algorithm in that case looks like this: Swap the target element to the end, and then decrease the vector's size by 1. But if I remember correctly, std::vector's remove method doesn't work like that, so of course it will have terrible worst-case performance if it was shifting all elements of the array when you remove the first one.
2. What's causing the worst-case slowdown? Are you sure you weren't simply seeing the effects of the L2 CPU cache being invalidated?
A linked list will thrash the CPU cache constantly, whereas vectors thrash the cache much less since the elements are stored in contiguous memory. For a modern game running on a modern CPU, vectors are almost always the better choice for that reason.
EDIT: I'm not sure your numbers are trustworthy. For example, there's no way it rendered 35859663 frames in 100 seconds. That would be 358,596 FPS.
Assuming your FPS was actually something like 358, not 358,596, that's still an extremely high FPS. When your FPS is so high, it becomes difficult to accurately measure the reasons for worst-case performance. A sudden spike within a single frame could be for any of a dozen reasons, such as CPU cache eviction, a GPU pipeline stall, or some other program on your computer taking CPU resources.
>Assuming your FPS was actually something like 358, not 358,596, that's still an extremely high FPS. When your FPS is so high, it becomes difficult to accurately measure the reasons for worst-case performance. A sudden spike within a single frame could be for any of a dozen reasons, such as CPU cache eviction, a GPU pipeline stall, or some other program on your computer taking CPU resources.
As an aside, programmers need to stop using FPS and other inverse units of measurement when performance tuning. It's much easier to reason about 2.79~ milliseconds than it is to reason about 358 FPS. For instance, assuming a single threaded application, let's say I know my game simulation alone can be updated at 125FPS, while my rendering code can be updated at 30FPS. How many FPS does my game run at when I run both together? You'd be hard pressed to figure that out without converting the two to 8 and 33⅓ milliseconds per tick, respectively, making the combination of the two obviously 41⅓ milliseconds, and thus about 24.2FPS.
Yes, your comment about performance measurements under lights loads not accurately reflecting performance under greater loads is true, but 2.79 milliseconds really aren't so far from the standard target of 16.66 milliseconds per frame than the 358 vs. 60FPS comparison would make one guess.
> Are you sure you weren't simply seeing the effects of the L2 CPU cache being invalidated
Perhaps, but I doubt it's the cause of the worst case scenario, particularly since I'm doing identical loops over the physics and renderable objects, and as you say the linked list should have the worse performance.
EDIT: One more small point - if the programmer is relying on the order of the items at all, the "tail tuck" method of deletion would not work.
EDIT[2]: Addressing your edit - the numbers are only for the action of applying physics and adding/deleting objects - the code should reflect that. There is no actual rendering time or AI or anything else which would prevent us from seeing upwards of 100,000 "frames" per second.
The times are also fairly stable (the standard deviation of max times is around 20%) between runs.
Well, your comment led with "Linked lists have more consistent performance than std::vector." But the tests just don't support that conclusion, and the truth is the opposite in a real-world environment: lists will kill your performance.
Before you'll see any downsides of list, you'll have to introduce memory fragmentation into your test and ensure that your list's allocations end up all over your address space, like they would be in a real engine. At that point you'll see a huge slowdown compared to vector due to CPU cache thrashing.
I apologize that my comment probably came off as a bit too upfront. I need to work on that.
I was only saying that it isn't really a good idea to present numbers like that without having a clear idea of what's causing the worst-case performance. Glancing over the code, and assuming the first bullet fired is always the first one to impact the ground, it looks like the code is always going to run std::vector::erase on the first element of the vector, which will always shift the 499 remaining elements. It's possible that's the reason for the worst-case performance, rather than vector being inherently a worse choice than list.
It was very counterintuitive and weird to realize that vector is almost always a better choice than list due to how powerful the CPU caches are. CPU caching is so good that taking full advantage of caching will usually outweigh algorithmic tweaks, like whether you're using a list or a vector. So the power of vector is that its elements are in contiguous memory, whereas the downfall of list is that they usually aren't.
It's difficult to prove that's true except in real-world scenarios. Unless a test is designed to introduce memory fragmentation, the cache performance penalty of using list won't really show up in the results. All of the list allocations will be pretty much contiguous in a test environment, so the tests will show performance on par with vector. But in the real engine, you'll be left scratching your head why everything's running so slowly, when the answer is because the elements are sprayed all across your memory space and it's causing the CPU cache to stall when they're accessed...
EDIT:
There is no actual rendering time or AI or anything else which would prevent us from seeing upwards of 100,000 "frames" per second.
When your FPS is higher than, say, 1,000, it becomes very difficult to pin down the reasons for performance fluctuations. An engine under load behaves differently than an artificial test, so you'll need to measure the performance impact in a real engine, especially when the framerates in the test are so high.
Your monitor is capped at 70 fps, or some high-end monitors can do ~130. Something like 100,000 fps isn't an accurate reflection of the behavior of the real system. Most of the discrepancy is due to memory fragmentation causing CPU thrashing, but your specific test might be running into some other discrepancies, like:
- since your test is running for 100+ seconds and your timing measurements are taking into account an entire frame, some frames are going to run interrupts. If an interrupt takes a long time to process, that load will show up in your timing measurements. The result is that vector (or list) will be unfairly pinned as the reason for the performance issue, when interrupts were the cause.
I'm not confident about the second one, but the point is that there are all kinds of conflating variables.
> So the power of vector is that its elements are in contiguous memory
In either case, you'll end jumping all over memory for the actual items in either a std::vec or a linked list, since they are both holding pointers to the actual objects (and based on how the nodes are being created, the linked list containers are closer in memory to the nodes).
That said, my premise was not that linked lists were faster to iterate over, it's that they offer a consistent addition/removal time, which is frequently better for a game engine than access with huge deviations.
There are many games out there which boast 100+ FPS on my gaming computer. However, they still look like crap because they will have intermittent drops to 1 or even .5 fps. I (and many others) would prefer a constant 50 (or 30) FPS than a stuttering 140.
Real time OSes offer the same commitments: constant gaps between program resumes, even if that constant delay is large.
Okay, there's some kind of failure to communicate, and the frustration level is rising. The failure is probably on my end, so I apologize.
In either case, you'll end jumping all over memory for the actual items in either a std::vec or a linked list, since they are both holding pointers to the actual objects (and based on how the nodes are being created, the linked list containers are closer in memory to the nodes).
That said, my premise was not that linked lists were faster to iterate over, it's that they offer a consistent addition/removal time, which is frequently better for a game engine than access with huge deviations.
CPU caching doesn't work like that. When you access any pointer, the CPU cache will automatically cache 256 bytes from the access address. So if your object is at address "x", everything from [x, x+256) will be put into the CPU cache.
Your Projectile class contains 6 floats, so each Projectile is 24 bytes. When you're using std::vector, all of your projectiles are stored contiguously in memory. That means when you access a projectile, you'll automatically cache (256 bytes / sizeof(Projectile) - 1) = the next 9 projectiles. So you get a bunch of performance for free, simply by using vector.
When you use list, you lose all of that caching. The reason is because list's memory allocations are all over the place. When you use list, the memory region [x, x+256) is solely that one projectile. That means your CPU will end up caching a bunch of irrelevant bytes of memory. That also means the CPU will have to cache each projectile individually as they're accessed. You're trading automatic caching of 10 projectiles for having to cache each individual projectile. Since taking advantage of CPU caching gives you 10x or 100x speedups, that's a very bad tradeoff.
Addition/removal from a container is a tiny, tiny fraction of tininess of what a modern game engine does. It's so tiny that add/removal performance is almost always irrelevant unless you're adding/removing a huge number of elements every frame, which is rare. Iteration performance is almost the only thing that matters for a game engine, because every single frame you'll be iterating over hundreds of containers. If each iteration causes your CPU cache to thrash, then you're in for a world of pain.
The takeaway is that list is an awful choice for any realtime game, and it's dangerous to give the impression that it's better. If someone were to take your tests at face value and start using list in their own game engines, a few months later they're going to be wondering why their performance is so terrible and why their profilers aren't showing the cause. As far as performance characteristics go, memory fragmentation / CPU caching matter so much that discounting them is a recipe for disaster.
There are many games out there which boast 100+ FPS on my gaming computer. However, they still look like crap because they will have intermittent drops to 1 or even .5 fps. I (and many others) would prefer a constant 50 (or 30) FPS than a stuttering 140.
Agreed! But list vs vector isn't the cause. A lot of the stuttering has to do with loading resources from disk, flushing the GPU pipeline, or other I/O intensive actions. On a modern CPU, add/removals from containers aren't the reason for the slowdown.
Either way, good luck, and keep on making cool stuff.
That's true, but in modern game engines, objects of the same type share an allocation pool. So a bunch of Projectiles should still be in contiguous memory, since they're using the same pool allocator.
At that point, vector is still a better choice because it prevents the CPU from accessing a bunch of intermediate list nodes which will pollute the CPU cache. If you use a vector, there are no intermediate objects, so less stuff winds up in the cache. There's also less memory fragmentation overall.
As long as you're using a vector implementation that grows by a factor of 2 every time it expands, then there shouldn't be any significant cost of enlarging the vector. If it's just storing pointers, it won't invoke any copy constructors. It can even be optimized to a straight-line memcpy, which is blazingly fast on a modern CPU.
> That's true, but in modern game engines, objects of the same type share an allocation pool.
I don't think pool allocation guarantees memory contiguity: what if we have a pool with two elements free, those elements being the first and last ones, and we allocate two elements and stick pointers to them in our vector?
> a vector implementation that grows by a factor of 2....shouldn't be any significant cost
It depends on what you mean by "cost". I agree that the enlargement cost is acceptable from an overall CPU budget sense, but each enlargement can still cause a latency spike, since the cost for enlargement is still large (especially considering the cost of taking the heap lock) for _that_ insertion. Sometimes consistency is more important than throughput.
> vector is still a better choice because it prevents the CPU from accessing a bunch of intermediate list nodes which will pollute the CPU cache
Who said anything about intermediate list nodes? The nice thing about an intrusive linked list is that once you pay for accessing an element's cache line, the rest of the accesses are nice and local. There are no intermediate nodes, and all costs are smooth and consistent. Besides: you can issue a memory prefetch instruction for node B while you process node A, further hiding latency.
While a vector is value types might be best in most situations, if you can't use that, an intrusive linked list may or may not be better than a vector of pointers.
Good points! I love tech chats like this. There are a few caveats:
A pool should be trying to make new allocations as contiguous as possible. That's accomplished by wrapping the allocations. For example, let's say we have a pool with slots A, B, C, D, E. We allocate A through D, then C is later freed. The pool shouldn't put the next allocation at C. It should be placed at E. That way, the objects are still roughly sorted by creation time: A is younger than B, which is younger than D, and so on.
(The next allocation should go in C after that, because there are no more slots. But by that time D and E might have been freed, so it's still in sorted order.)
That way, if you access any given element, it will cache the next element in memory. Due to the sorted nature of the allocations, that means even if you're iterating over them using some other container like a list or vector of pointers, accessing an element will still likely cache the next iterator's element. Not guaranteed, but likely.
About the vector enlargement: Usually the cost is nil, because as long as the vector stays under 1MB, you can simply preallocate large blocks of memory at startup for each type of vector. Therefore the cost of expansion is just a few arithmetic instructions for bookkeeping purposes, not anything expensive like a heap lock.
Good point about the intrusive list. I personally wouldn't like it because it makes the code more complicated, but as long as you're using a pool allocator, the memory characteristics of an intrusive list shouldn't be too different from the scheme I described above.
1. What was std::vector's removal algorithm like? Are you sure that .remove() wasn't shifting all elements in the vector? The "correct" removal algorithm in that case looks like this: Swap the target element to the end, and then decrease the vector's size by 1. But if I remember correctly, std::vector's remove method doesn't work like that, so of course it will have terrible worst-case performance if it was shifting all elements of the array when you remove the first one.
2. What's causing the worst-case slowdown? Are you sure you weren't simply seeing the effects of the L2 CPU cache being invalidated?
A linked list will thrash the CPU cache constantly, whereas vectors thrash the cache much less since the elements are stored in contiguous memory. For a modern game running on a modern CPU, vectors are almost always the better choice for that reason.
EDIT: I'm not sure your numbers are trustworthy. For example, there's no way it rendered 35859663 frames in 100 seconds. That would be 358,596 FPS.
Assuming your FPS was actually something like 358, not 358,596, that's still an extremely high FPS. When your FPS is so high, it becomes difficult to accurately measure the reasons for worst-case performance. A sudden spike within a single frame could be for any of a dozen reasons, such as CPU cache eviction, a GPU pipeline stall, or some other program on your computer taking CPU resources.