Thanks for the feedback!
Umm. Funny thing. Remember way back, when we found out that the clever pattern to avoid null checks before function calls
Code:
class X
{
public:
    void f()
    {
        if(!this) return; // the "clever" null check
        // do some actual work
    }
};
was undefined behavior, so the compiler was allowed to assume 'this' was never null and turn the check into a NOP? And when gcc actually decided to do just that, some bits of network code broke and crashed? (We were not the only ones hit, by the way.)
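For anyone wondering what the well-defined version of that pattern looks like: the null check has to happen on the pointer, before the member call, not inside it. A minimal sketch follows. CallIfNotNull, the calls counter and Example are made-up names for illustration, not anything from our code base.

```cpp
#include <cassert>

// Hypothetical stand-in for the class X above.
class X
{
public:
    int calls = 0;

    void f()
    {
        // No 'this' check here: in a well-formed call, 'this' is never null.
        ++calls; // do some actual work
    }
};

// Hypothetical helper: check the pointer first, then call the member.
// Returns true if f() was actually called.
inline bool CallIfNotNull( X * x )
{
    if ( !x )
        return false;
    x->f();
    return true;
}

// Usage: a null pointer is skipped, a valid one gets its call.
inline void Example()
{
    X x;
    X * none = nullptr;
    assert( !CallIfNotNull( none ) );
    assert( CallIfNotNull( &x ) );
    assert( x.calls == 1 );
}
```

The check is the same one line as before; it just lives where the compiler is not allowed to assume its outcome.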
Anyway. Something along those lines happened again. This time, it was the humble loop
Code:
while ( sn_FirstServer )
{
    nServerInfo * server = sn_FirstServer;
    server->Remove();
    server->Insert( sn_masterList );
}
when loading the master server list. sn_FirstServer is the head pointer of the list of just-loaded nServerInfo structures, and server->Remove() removes the object from that list, modifying sn_FirstServer and setting it to the next element. Only it does so not via the pointer's real type, 'nServerInfo *', but as a pointer to the list base class, 'tListItemBase *'.
These two pointer types are unrelated as far as the aliasing rules are concerned, so the compiler is allowed to assume that a write through a 'tListItemBase *' lvalue never changes an object of type 'nServerInfo *'. And thus it is allowed to assume the tListItemBase * manipulations in server->Remove() cannot possibly modify sn_FirstServer. It can then remove both the repeated check in the while loop and the re-fetch into the server variable, turning the whole thing into an infinite loop. Even though infinite loops are also undefined behavior.
Now, what protected us from that happening was that server->Remove() was implemented in a separate .cpp file, where the compiler could not see what it was doing and therefore could not draw any conclusions.
That protection broke down. The Fedora build apparently applies some whole program optimization and manages to inline the call, setting the whole thing in motion. Therefore, this.
Anyway, the fix was straightforward: get rid of the reinterpret_cast shenanigans in tLinkedList.h and of the non-template base class. So far I have only pushed it to the beta and legacy_0.2.9 branches; the release_0.2.9 branch will follow shortly.
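To illustrate the idea behind the fix (this is NOT the actual tLinkedList.h, just a minimal sketch with made-up names): if the list item is a template on the element type, the head pointer is always read and written as 'T *', so there is no cast and nothing for the aliasing rules to trip over. ListItem, ServerInfo, MoveAll and Example are all hypothetical.

```cpp
#include <cassert>

// Hypothetical intrusive list item, templated on the element type T.
// T is expected to derive publicly from ListItem< T >.
template< class T >
class ListItem
{
public:
    ListItem() : next_( nullptr ), anchor_( nullptr ) {}

    // Insert this object at the front of the list starting at 'head'.
    void Insert( T *& head )
    {
        Remove();
        next_ = head;
        if ( ListItem * next = next_ )
            next->anchor_ = &next_;
        anchor_ = &head;
        head = static_cast< T * >( this );
    }

    // Unlink this object from whatever list it is in, if any.
    void Remove()
    {
        if ( !anchor_ )
            return;
        *anchor_ = next_; // a write through 'T **', the head's real type
        if ( ListItem * next = next_ )
            next->anchor_ = anchor_;
        next_ = nullptr;
        anchor_ = nullptr;
    }

    T * Next() const { return next_; }

private:
    T *  next_;   // next element, or null at the end of the list
    T ** anchor_; // the pointer that points at this element, or null
};

// Hypothetical stand-in for nServerInfo.
struct ServerInfo : public ListItem< ServerInfo >
{
    int id;
    explicit ServerInfo( int i ) : id( i ) {}
};

// Same shape as the loop above: drain 'from' into 'to'.
// Returns the number of elements moved.
inline int MoveAll( ServerInfo *& from, ServerInfo *& to )
{
    int moved = 0;
    while ( from )
    {
        ServerInfo * server = from;
        server->Remove(); // updates 'from' as a 'ServerInfo *'
        server->Insert( to );
        ++moved;
    }
    return moved;
}

// Usage: build a two-element list, move it over, check the result.
inline void Example()
{
    ServerInfo * first = nullptr;
    ServerInfo * master = nullptr;
    ServerInfo a( 1 ), b( 2 );
    a.Insert( first );
    b.Insert( first ); // first -> b -> a
    assert( MoveAll( first, master ) == 2 );
    assert( first == nullptr );
    assert( master == &a ); // b was moved first, then a went in front of it
    assert( master->Next() == &b );
}
```

The sketch leaves out destructors and copy handling to stay short; a real list item would need to unlink itself on destruction.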
This is mostly a problem for those building from source. Our own builds were not affected because we use very old compilers.