Skip to content

Conversation

@pcc
Copy link
Contributor

@pcc pcc commented Feb 6, 2026

On systems with more than one PMU for the CPUs (e.g. Apple M series SOCs), generic hardware events are only created for an arbitrary PMU. Usually this is the big cluster's PMU, which can cause inaccuracies when the process is scheduled onto a little core. To fix this, teach PerfCounters to register generic hardware events on all CPU PMUs.

CPU PMUs are identified using the same method as perf.

@pcc
Copy link
Contributor Author

pcc commented Feb 6, 2026

cc @captain5050

@LebedevRI
Copy link
Collaborator

Does this need any kind of guarding or is this always the right thing to do?

@pcc
Copy link
Contributor Author

pcc commented Feb 7, 2026

Does this need any kind of guarding or is this always the right thing to do?

This is always correct as far as I'm aware. I've tested that it does the right thing on an M2 Ultra Mac Studio, a Pixel 10 and a regular x86 PC (where it just finds the single PMU which returns the same results as not specifying a PMU).

LebedevRI
LebedevRI previously approved these changes Feb 7, 2026
Copy link
Collaborator

@LebedevRI LebedevRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't test, but seems fine.

@LebedevRI LebedevRI requested a review from dmah42 February 7, 2026 01:44
@dmah42
Copy link
Member

dmah42 commented Feb 9, 2026

can we add google test unit tests for this please? i don't know how tricky it is to test it but i'm a little concerned having no tests at all.

@pcc
Copy link
Contributor Author

pcc commented Feb 9, 2026

can we add google test unit tests for this please? i don't know how tricky it is to test it but i'm a little concerned having no tests at all.

We can test it by verifying that counters are non-zero when the process is pinned to each CPU in the system. We can adapt the test PerfCountersTest.Read1Counter for this purpose. Actually, on my Mac Studio that test fails at head (after patching BCR to fix the build failures with --define pfm=1: bazelbuild/bazel-central-registry#7502 bazelbuild/bazel-central-registry#7503) and it also fails after this change (because it didn't expect more than one counter with the same name). I'll take care of that as well.

@pcc
Copy link
Contributor Author

pcc commented Feb 10, 2026

can we add google test unit tests for this please? i don't know how tricky it is to test it but i'm a little concerned having no tests at all.

We can test it by verifying that counters are non-zero when the process is pinned to each CPU in the system. We can adapt the test PerfCountersTest.Read1Counter for this purpose. Actually, on my Mac Studio that test fails at head (after patching BCR to fix the build failures with --define pfm=1: bazelbuild/bazel-central-registry#7502 bazelbuild/bazel-central-registry#7503) and it also fails after this change (because it didn't expect more than one counter with the same name). I'll take care of that as well.

Done

dmah42
dmah42 previously approved these changes Feb 10, 2026
@dmah42
Copy link
Member

dmah42 commented Feb 10, 2026

looks like we need to update this from head and do some clang-format fun.

On systems with more than one PMU for the CPUs (e.g. Apple M series SOCs),
generic hardware events are only created for an arbitrary PMU. Usually
this is the big cluster's PMU, which can cause inaccuracies when the
process is scheduled onto a little core. To fix this, teach PerfCounters
to register generic hardware events on all CPU PMUs.

CPU PMUs are identified using the same method as perf.
@pcc
Copy link
Contributor Author

pcc commented Feb 10, 2026

looks like we need to update this from head and do some clang-format fun.

Done

@dmah42 dmah42 merged commit 84732c8 into google:main Feb 11, 2026
82 of 84 checks passed
@dmah42
Copy link
Member

dmah42 commented Feb 11, 2026

thank you so much :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants