Tensorflow huggingface BERT Model is slower in ARM compared to Intel #194

@abhishek-rn

Description

Docker Container Version/Tag : r23.07-tf-2.12.0-onednn-acl
ARM System : Graviton 3 (c7g.8xlarge)
Architecture: aarch64
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: ARM
Model: 1
Thread(s) per core: 1
Core(s) per socket: 32
Caches (sum of all):
L1d: 2 MiB (32 instances)
L1i: 2 MiB (32 instances)
L2: 32 MiB (32 instances)
L3: 32 MiB (1 instance)

Intel System: Icelake (c6i.8xlarge)
Architecture: x86_64
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
CPU family: 6
Model: 106
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 6
Virtualization features:
Hypervisor vendor: KVM
Virtualization type: full
Caches (sum of all):
L1d: 768 KiB (16 instances)
L1i: 512 KiB (16 instances)
L2: 20 MiB (16 instances)
L3: 54 MiB (1 instance)

As per this blog, ACL-based inference on Arm systems should be faster than on Intel systems for transformer models.

We ran the TensorFlow Hugging Face BERT model for inference (Python code attached as a txt file here):
TF_bert_inf - Copy.txt
Below are the inference times in seconds:

| Env Variables | Graviton (s) | Icelake (s) |
| --- | --- | --- |
| No opts | 0.2294 | 0.145099 |
| TF_ENABLE_ONEDNN_OPTS=1 | 0.2191 | 0.144636 |
| ONEDNN_DEFAULT_FPMATH_MODE=BF16 | 1.49034 | 0.145511 |
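For reference, a minimal sketch of how the runs were driven (the actual attached script is not reproduced here; the timing-harness shape is an assumption, but the environment variables are the ones listed above and must be set before TensorFlow is imported):

```python
import os
import time

# oneDNN settings must be in the environment before TensorFlow is
# imported; setting them after the import has no effect.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"
os.environ["ONEDNN_DEFAULT_FPMATH_MODE"] = "BF16"

def time_inference(fn, warmup=5, iters=20):
    """Average wall-clock seconds per call of fn after a warm-up phase."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters
```

With the Hugging Face model this would wrap something like `fn = lambda: model(encoded_inputs)`; the 2-core restriction mentioned below can be applied externally, e.g. with `taskset -c 0,1`.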

From the results above, the FP32 runs are roughly 1.5x slower on the ARM cores than on the Intel ones (0.2191 s vs. 0.1446 s with TF_ENABLE_ONEDNN_OPTS=1).
The code is run on 2 cores on both the Intel and ARM systems.
Another issue is that setting ONEDNN_DEFAULT_FPMATH_MODE=BF16 degrades performance on Graviton instead of improving it.
From the oneDNN logs, we see that when BF16 is enabled there are overheads while executing reorder on the ARM cores:

| Env Variables | Reorder Time (ms) |
| --- | --- |
| TF_ENABLE_ONEDNN_OPTS=1 | 0.582031 |
| ONEDNN_DEFAULT_FPMATH_MODE=BF16 | 11.1628 |

This is observed only for larger matmul operations; here the size was 768x768, and the reorder uses the "simple:any" implementation instead of "jit:uni" in oneDNN.
Attaching the oneDNN verbose logs for both scenarios:
Bert_TF12_issue_verbose_BF16.txt
Bert_TF12_issue_verbose_OPTS.txt
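The attached logs can be summarized with a small parser like this (a sketch; the line layout is an assumption based on typical `ONEDNN_VERBOSE=1` output, where fields are comma-separated, `reorder` appears as the primitive kind, and the last field is the execution time in ms — field order may vary across oneDNN versions):

```python
def reorder_times_ms(verbose_log):
    """Collect execution times (ms) of reorder primitives from
    ONEDNN_VERBOSE=1 output.  Assumes comma-separated lines starting
    with 'onednn_verbose' whose last field is the time in ms."""
    times = []
    for line in verbose_log.splitlines():
        if not line.startswith("onednn_verbose"):
            continue
        fields = line.split(",")
        if "reorder" in fields:
            try:
                times.append(float(fields[-1]))
            except ValueError:
                continue  # skip marker/header lines without a time field
    return times
```

Summing the returned list per scenario gives the totals in the table above.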

Requesting your views and comments on whether any other settings are needed to improve performance on Graviton.
