
Conversation

@azhai219 azhai219 commented Sep 12, 2025

Details:

  • 3-D matmul => FC => dnnl::matmul.
    • Only a 3-D matmul with constant weights is converted to FC; otherwise it falls back to a regular matmul node.
    • This change mainly targets the gpt-oss model, which uses 3-D matmuls.
  • Only the x64 architecture is affected; both ARM and RISC-V keep the previous behaviour.

Tickets:

  • 171918

@azhai219 azhai219 requested review from a team as code owners September 12, 2025 07:38
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Sep 12, 2025
@azhai219 azhai219 force-pushed the az/matmul_fp2 branch 2 times, most recently from aabe78c to e9622d8 on September 15, 2025 08:53
@azhai219 azhai219 force-pushed the az/matmul_fp2 branch 8 times, most recently from a732ef7 to c630d12 on September 23, 2025 07:09
@maxnick maxnick self-assigned this Sep 23, 2025
@azhai219 azhai219 force-pushed the az/matmul_fp2 branch 8 times, most recently from a2fa5a5 to b33342e on September 28, 2025 08:58
@maxnick maxnick added this to the 2025.4 milestone Oct 13, 2025
@maxnick maxnick commented Oct 13, 2025

@EgorDuplensky , could you please review this PR?

@EgorDuplensky EgorDuplensky commented Oct 13, 2025

@azhai219 Does the PR also fix the usage of dnnl::matmul as the FullyConnected node executor for compressed weights?
If yes, could you please double-check by running the 'matmul_weights_decompression.cpp' test scope, using 'sde -gnr' for example?

@azhai219 azhai219 force-pushed the az/matmul_fp2 branch 3 times, most recently from 5946198 to d5505d9 on October 22, 2025 11:52
     auto wDims = weiDesc.get_dims();
     std::swap(wDims[wDims.size() - 1], wDims[wDims.size() - 2]);
     const auto wDataType = weiDesc.get_data_type();
     if (wDims.size() == 3 && !weightsNonTransposed) {
@EgorDuplensky EgorDuplensky commented Oct 23, 2025

Could you please clarify this change?
Does it mean we never want to reshape down to rank 2 anymore (the logic below this if)?
If so, can we simplify the code by adding a lambda:

     auto weightsFormat = [](const size_t rank, const bool weightsNonTransposed) {
         switch (rank) {
             case 2:
                 return weightsNonTransposed ? dnnl::memory::format_tag::ab : dnnl::memory::format_tag::ba;
             case 3:
                 return weightsNonTransposed ? dnnl::memory::format_tag::abc : dnnl::memory::format_tag::bac;
             default:
                 OPENVINO_THROW("DnnlMatmulExecutor: unsupported weights rank: ", rank);
         }
     };

@azhai219 azhai219 (author) replied:

@EgorDuplensky In the 3-D case, when weightsNonTransposed is false, the tag should be acb, not bac.

@EgorDuplensky EgorDuplensky replied:

Sure, acb.
So, can we implement it this way?

     return;
     }
     const auto rank = shape.getRank();
     if (rank == 3 && shape.getDims()[2] > 1) {
@EgorDuplensky EgorDuplensky commented:

Could you please use a named variable for this combination of checks? The name should help explain why we disable tensor parallel in this case (e.g. is3DwithMultipleInputChannels, or perhaps a better name if we are talking about transformers).
Also, it is better to use shape.getDims().back() here.

@azhai219 azhai219 (author) replied:

OK, I will fix it.

@EgorDuplensky EgorDuplensky replied:

What about the variable?

     const bool is3DwithMultipleInputChannels = rank == 3 && shape.getDims()[2] > 1;
     if (is3DwithMultipleInputChannels) ...

@yuxu42 yuxu42 commented Oct 27, 2025

Hi @EgorDuplensky @praasz, could you please review again? Thanks!

@praasz praasz left a comment

@azhai219
It is OK for me; waiting for approval from a CPU team member.

Labels

category: CPU OpenVINO CPU plugin

7 participants