Well, yeah, but if you need some of the allocated data at the end of the operation, you'll end up copying it. That has a runtime cost and might introduce bugs (pointers get invalidated, and it's easy to forget to update some of them), and neither of the two gets counted as "cost of manual memory management." C++ with its value semantics tremendously encourages unnecessary copying, and that's never counted as "cost of memory allocation" or "producing garbage." On the contrary, Stroustrup says things like "C++ is my favorite GC language because it generates so little garbage to begin with."
This is not to say that it's fair to call OCaml "efficient" in the memory department based on a GC benchmark; TFA is full of examples where OCaml allocates things on the heap that TFA recommends allocating elsewhere, and it shows you how to maul your code to get there.
My only point is that how well a programming system uses memory is a very hard question, because (A) there are many different use cases and (B) you can't reduce "memory performance" to a few easily measurable things like time spent allocating, time spent in GC and peak memory use. There are other things, like what your program has to do outside of the allocator to cope with its semantics, and how the layout encouraged by the allocator affects the performance of code using the memory objects. These cannot be measured in isolation from the rest of the program.
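To make the copying point concrete, here is a minimal C++ sketch. The names in it (`Tracked`, `tracked_copies`, `total_length_by_value` and so on) are made up for illustration and don't come from TFA or any library. It counts copy-constructor calls, which is work that never shows up as "time spent in the allocator" or "time spent in GC":

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical global counter: how many times a Tracked object was copied.
static std::size_t tracked_copies = 0;

// Hypothetical instrumented type. Copies bump the counter, moves do not.
struct Tracked {
    std::string payload;

    explicit Tracked(std::string p) : payload(std::move(p)) {}
    Tracked(const Tracked& other) : payload(other.payload) { ++tracked_copies; }
    Tracked(Tracked&& other) noexcept : payload(std::move(other.payload)) {}
    Tracked& operator=(const Tracked& other) {
        payload = other.payload;
        ++tracked_copies;
        return *this;
    }
    Tracked& operator=(Tracked&& other) noexcept {
        payload = std::move(other.payload);
        return *this;
    }
};

// Pass by value: calling this with an lvalue copies every element.
std::size_t total_length_by_value(std::vector<Tracked> items) {
    std::size_t n = 0;
    for (const Tracked& t : items) n += t.payload.size();
    return n;
}

// Pass by const reference: same result, no element copies.
std::size_t total_length_by_ref(const std::vector<Tracked>& items) {
    std::size_t n = 0;
    for (const Tracked& t : items) n += t.payload.size();
    return n;
}

// Builds `count` elements in place, then reports how many copies one call makes.
std::size_t copies_made(std::size_t count, bool by_value) {
    std::vector<Tracked> items;
    items.reserve(count);  // no reallocation, so construction itself copies nothing
    for (std::size_t i = 0; i < count; ++i) items.emplace_back("item");

    tracked_copies = 0;
    if (by_value) {
        (void)total_length_by_value(items);
    } else {
        (void)total_length_by_ref(items);
    }
    return tracked_copies;
}
```

The two functions differ only in their signature, which is roughly why this kind of cost is easy to introduce and hard to attribute: an allocator benchmark would charge it to the program, not to the memory management scheme.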