Note that this implementation has undefined behavior according to the standard b...

wahern · on Jan 31, 2020

I'm not very familiar with the C++ standard, but in the C standard a load from a different union member than the last store isn't undefined behavior, it's unspecified behavior. That's a big difference. So long as the store doesn't generate a trap representation (and most architectures don't have such representations anymore) when reinterpreted as the new type, then you can rely on the behavior, it's just that you'll need to refer to compiler documentation to understand the values reliably produced.

Moreover, there are additional guarantees when reading through char types, as well as unsigned types, so depending on the precise code things may very well be totally well defined.
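
To make the char-type guarantee concrete, here is a minimal C++ sketch, assuming 8-bit bytes. The helper names `byte_at` and `sum_of_bytes` are made up for illustration and are not from any library. It inspects an object's bytes through `unsigned char`, which the aliasing rules allow for any object type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns byte `i` of the object representation of `value`.
// Reading any object through `unsigned char` is allowed by the aliasing rules.
unsigned char byte_at(const std::uint32_t& value, std::size_t i) {
    assert(i < sizeof value);  // stay inside the object
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    return bytes[i];
}

// Hypothetical helper: adds up all bytes of `value`. Because every byte is
// visited, the total is the same on little-endian and big-endian machines.
unsigned sum_of_bytes(const std::uint32_t& value) {
    unsigned total = 0;
    for (std::size_t i = 0; i < sizeof value; ++i) {
        total += byte_at(value, i);
    }
    return total;
}
```

Which index holds which byte depends on the machine's endianness, so only order-independent results like the sum are portable here.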

C doesn't disallow type punning. The gotchas mostly have to do with visibility to the compiler, otherwise the compiler may reorder loads and stores. The safest way to do type punning is through union members as the standard makes additional guarantees that restrict the types of optimizations a compiler can make.

slavik81 · on Jan 31, 2020

Type punning through unions is not defined behaviour in the C++ standard, but is a common compiler extension.
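
A rough sketch of the difference, assuming a 32-bit `float`. The names `FloatBits`, `bits_via_union` and `bits_via_memcpy` are hypothetical. The first function relies on the compiler extension described above. The second uses `std::memcpy`, a different technique that the C++ standard does define for trivially copyable types.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical union used only for illustration.
union FloatBits {
    float f;
    std::uint32_t u;
};

// Writes one member and reads the other. The C++ standard does not define
// this; it only works where the compiler supports it as an extension.
std::uint32_t bits_via_union(float value) {
    FloatBits fb;
    fb.f = value;
    return fb.u;
}

// Copies the object representation instead of reading an inactive member.
// This is defined by the standard for trivially copyable types.
std::uint32_t bits_via_memcpy(float value) {
    std::uint32_t out;
    std::memcpy(&out, &value, sizeof out);
    return out;
}

// Usage sketch, assuming IEEE 754 floats, where +0.0f is all-zero bits.
void usage_example() {
    assert(bits_via_memcpy(0.0f) == 0u);
}
```

Optimising compilers typically turn the `memcpy` call into a plain register move, so the portable version usually costs nothing extra.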

kps · on Jan 31, 2020

Early C used type punning instead of casts; e.g. from the Sixth Edition kernel:

    /*
     * structure to access an integer
     */
    struct
    {
        int integ;
    };

w0utert · on Jan 31, 2020

>> Note that this implementation has undefined behavior according to the standard because union members are accessed that haven't been written to.

Except when the union member you are reading is a 'common initial sequence' [1] of the union member that was written, to be precise ;-). But that's not the case here.

[1] https://stackoverflow.com/q/34616086/2757035
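
For readers unfamiliar with the rule, a minimal sketch of a common initial sequence follows. The types `IntMsg`, `FloatMsg`, `Msg` and the helper functions are invented for illustration. Both structs are standard-layout and start with the same `int tag`, so `tag` may be read through either union member regardless of which one was written.

```cpp
#include <cassert>
#include <type_traits>

// Two hypothetical message types that share their first member.
struct IntMsg   { int tag; int   payload; };
struct FloatMsg { int tag; float payload; };

static_assert(std::is_standard_layout<IntMsg>::value &&
              std::is_standard_layout<FloatMsg>::value,
              "the common-initial-sequence rule needs standard-layout structs");

union Msg {
    IntMsg   i;
    FloatMsg f;
};

// Builds a Msg whose active member is `i`.
Msg make_int_msg(int tag, int payload) {
    Msg m;
    m.i = IntMsg{tag, payload};
    return m;
}

// Reads `tag` through `f` even when `i` is the active member. This is allowed
// because `tag` lies in the common initial sequence of IntMsg and FloatMsg.
int tag_of(const Msg& m) {
    return m.f.tag;
}

// Usage sketch.
void usage_example() {
    assert(tag_of(make_int_msg(7, 42)) == 7);
}
```

Reading `m.f.payload` after writing `m.i` would fall outside the rule, because the `payload` members have different types.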

zvrba · on Jan 31, 2020

The implementation is allowed to define UB to have a defined behaviour.

Inityx · on Jan 31, 2020

> standard library is allowed to do stuff normal libraries are not allowed to do.

How does this work, given that it's still written in C++? Is there special casing in the compiler to define the behavior?

jlarocco · on Jan 31, 2020

Undefined behavior in C++ and C means that it's up to the implementor to decide what happens in a situation. Usually it ends up being whatever is most convenient or most performant on the target architecture.

One consequence is that portable code can't depend on any particular behavior because it can be different between implementations. Every implementation will do something, but each one may do something completely different.

Another consequence is that "undefined behavior" isn't undefined in the context of a specific implementation because you can look at what it does and see how it's defined in that implementation. In this case, libc++ is essentially part of the implementation, so it's fair game for it to depend on implementation details of clang.

MaxBarraclough · on Jan 31, 2020

> Undefined behavior in C++ and C means that it's up to the implementor to decide what happens in a situation

> Every implementation will do something, but each one may do something completely different.

If I'm reading this correctly, you're saying that we can depend on each compiler providing a consistent way of handling each kind of undefined behaviour.

That's not correct. That describes implementation-defined behaviour, which is different. [0]

Compilers do not have to decide what behaviour should result from a particular kind of undefined behaviour, and then commit to ensuring that behaviour occurs consistently. That's the point of undefined behaviour: the compiler is permitted to assume the absence of undefined behaviour, and to optimise accordingly.

If you have undefined behaviour in your C++ code, you are not guaranteed to see consistent program behaviour. Your program is ill-formed. All bets are off, throughout the entire lifetime of your program. [1] (In C++, undefined behaviour can 'travel back in time', meaning that if your program invokes undefined behaviour, the behaviour across the entire lifetime of your program is made undefined.)

A compiler may choose to commit to a certain behaviour for a certain type of undefined behaviour (such as guaranteeing wrap-around behaviour for signed overflow), but it is not required to.

A compiler is required to define a consistent value for sizeof(int), because that's implementation-defined. [0]
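
A small sketch of the distinction, with hypothetical function names. The naive check overflows a signed `int` when `x == INT_MAX`, which is undefined, so an optimiser is permitted to assume it never happens and treat the comparison as always false. The second check asks the same question without overflowing. `sizeof(int)` is implementation-defined, but it is a fixed, documented value on any given implementation.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Naive check: `x + 1` overflows when x == INT_MAX, which is undefined
// behaviour. The compiler may assume that never happens and is allowed to
// compile this function as if it always returned false.
bool increment_overflows_naive(int x) {
    return x + 1 < x;
}

// Well-defined check: no arithmetic is performed that could overflow.
bool increment_overflows(int x) {
    return x == INT_MAX;
}

// Implementation-defined: the value varies between platforms, but each
// implementation must pick one and use it consistently.
constexpr std::size_t int_bytes = sizeof(int);
static_assert(int_bytes * CHAR_BIT >= 16, "int is at least 16 bits wide");

// Usage sketch: only inputs with defined behaviour are passed to the naive check.
void usage_example() {
    assert(increment_overflows(INT_MAX));
    assert(!increment_overflows(0));
    assert(!increment_overflows_naive(0));
}
```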

> Another consequence is that "undefined behavior" isn't undefined in the context of a specific implementation because you can look at what it does and see how it's defined in that implementation.

This isn't right.

Unless the compiler's documentation tells you that you can rely upon its handling of the relevant undefined behaviour, the compiler is not required to provide consistent behaviour for any particular kind of undefined behaviour.

> In this case, libc++ is essentially part of the implementation

It's not, as it's not tied to one compiler. [2]

[0] https://stackoverflow.com/a/4105123/

[1] https://stackoverflow.com/a/39915175/

[2] https://news.ycombinator.com/item?id=22202355

chacham15 · on Jan 31, 2020

It's about maintenance. Things which are "undefined behavior" can actually have a defined behavior if the compiler provides it. By doing something undefined in the STL, the compiler authors are saying that the compiler will support it, and that if the compiler changes, the authors will also change the library. If you rely on this behavior yourself, there's no guarantee that the compiler won't change it in a future update, thereby breaking your code.

aidenn0 · on Jan 31, 2020

It will only ever be compiled with clang, so as long as clang doesn't implement this behavior in a way that will cause it to be incorrect, that's fine.

If it were to be special-cased, it would probably require an attribute or a pragma or something. While it's not unheard of for compilers to automatically detect they are compiling the standard library, it's fairly rare.

bregma · on Jan 31, 2020

libc++ is compiled by compilers other than clang. It's a completely invalid assumption that it will only ever be compiled by clang, because it has never only ever been compiled by clang.

I say this as someone with a full time paid job supporting libc++ compiled by another compiler for a commercial organization in a safety context.

aidenn0 · on Feb 1, 2020

Oh, that's good to know. If obscure C++ compiler X were to cause incorrect behavior of this code, would it be easy to upstream the fix?

cryptonector · on Jan 31, 2020

Why is this worth noting?