Skip to content

Conversation

qmfrederik
Copy link
Collaborator

Under some circumstances, apparently when two classes with the same name exist in two different modules, the dtable for a class can be null. This would cause a crash in msgSend.

Update msgSend to return nil instead.

Under some circumstances, apparently when two classes with the same name exist in two different modules, the dtable for a class can be null.  This would cause a crash in `msgSend`.

Update `msgSend` to return `nil` instead.
@qmfrederik
Copy link
Collaborator Author

Here's an example of GNUstep including the GSColorSliderCell class in two separate modules/bundles: https://github.com/gnustep/libs-gui/blob/gui-0_31_1/ColorPickers/GNUmakefile#L47-L49. Loading a NSColorPanel would cause a crash without this change when using libobjc2+clang. Seems to work fine when using GCC.

@qmfrederik
Copy link
Collaborator Author

You should be able to reproduce this with the GSTest app in the gnustep/tests-examples repository; without this change, you'll get a segfault when clicking Format, Font, Colors....

@hmelder
Copy link
Member

hmelder commented Nov 18, 2024

apparently when two classes with the same name exist in two different modules

This is probably undefined behaviour. AFAIK there will be two copies of _OBJC_METACLASS_<NAME> and _OBJC_CLASS_<NAME> that are essential orthogonal to each other, despite sharing a common super class.

the dtable for a class can be null

The dtable is nil, because the class was not loaded:

libobjc2/class_table.c

Lines 410 to 425 in 771f7c4

PRIVATE void objc_load_class(struct objc_class *class)
{
struct objc_class *existingClass = class_table_get_safe(class->name);
if (Nil != existingClass)
{
if (objc_developer_mode_developer != mode)
{
fprintf(stderr,
"Loading two versions of %s. The class that will be used is undefined\n",
class->name);
return;
}
reload_class(class, existingClass);
return;
}

It also seems like gnustep-base calls various private functions from libobjc2 which we should probably fix.

@qmfrederik
Copy link
Collaborator Author

qmfrederik commented Nov 19, 2024

I agree on this being undefined behavior. The question is, what should the runtime do? In particular in this case, where the same class exists in two modules, but it is the exact same class (same source code file being compiled into two modules).

Currently, it causes an access violation in a macro in assembly code. That's probably the worst place to error out, as it makes it very difficult to troubleshoot.

This code has been in gnustep-gui for 17 years (gnustep/libs-gui@f22c862) and has been working for the past 17 years with GCC's objc runtime.

So perhaps it's worth for libobjc2 to (try to) handle this gracefully? I may be addressing a symptom here and there may be a better place to handle this, though?

@hmelder
Copy link
Member

hmelder commented Nov 19, 2024

has been working for the past 17 years

Because it depends on a quirk in the GCC runtime. I am really not sure if we should really handle this at all (or at least in this way) because dtable being NULL is just one side-effect. How about just fixing the code instead of working around in the runtime?

Additionally, we are inserting a potential branch into the fast path for something that is UB. We would need to adjust our assumption on what is passed into objc_msgSend.

@davidchisnall
Copy link
Member

I don’t see a way of fixing this without an additional branch in the message send function. This is the hot path for most Objective-C programs and so every extra instruction there can have a measurable impact on performance. A branch that’s always not-taken might be fine, but it’s not great.

I believe the Apple runtime gives a linker failure in these cases.

@qmfrederik
Copy link
Collaborator Author

I opened gnustep/libs-gui#321 to address it on the libs-gui side.

I'd be curious to understand the performance impact of this. Are there any (micro)benchmarks I could use to quantify the impact? If not, a framework you'd recommend?

@hmelder
Copy link
Member

hmelder commented Nov 19, 2024

Are there any (micro)benchmarks I could use to quantify the impact? If not, a framework you'd recommend?

Google Test is a good start. For scheduling analysis, llvm-mca is great but heavily depends on the quality of the CPU's Scheduling Model and does not simulate caching.

@davidchisnall
Copy link
Member

We don’t have good performance testing, but using sample-based profilers you can see how much time is spent in objc_msgSend in a codebase you care about.

@qmfrederik
Copy link
Collaborator Author

I used Google's performance testing famework, here is are the results (on a Windows VM):

branch mean stddev
master 4.12ns 0.073ns
msgSend-no-dtable 4.41ns 0.175ns

So looks like this has a 7% performance impact (which was more than what I had expected). This has also been fixed upstream: gnustep/libs-gui#321 .

So I think we're good to close this, but would appreciate any feedback on the performance benchmark.

@triplef
Copy link
Member

triplef commented Nov 21, 2024

Side note: can we in general integrate that benchmark into CI to ensure future changes don’t impact performance in a negative way? For our app a 7% performance impact could have been fairly significant.

@qmfrederik
Copy link
Collaborator Author

Benchmark signals on CI machines may be a bit noisy, but what I've done in other projects is to add two steps to a CI pipeline:

  • Build and run the benchmark on the current branch
  • Build and run the benchmark on the main branch

That at least gives some kind of signal w.r.t. any changes in the performance characteristics being introduced. I can open a separate PR for that, if there's consensus that the benchmark is useful (and everyone is happy with using Google's benchmark framework)

@rfm
Copy link
Contributor

rfm commented Nov 21, 2024

For what it's worth, I think benchmarking is very valuable as long as the margins aren't set so tight that we get lots of false positives due to other activity on the machine.

@qmfrederik
Copy link
Collaborator Author

Closing this for now. See #319 for the benchmarks

@qmfrederik qmfrederik closed this Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants