Can the decoder of a Transformer be accelerated by the NPU? #2612

@119458

Description

🌱 Describe Feature Request

I trained a Transformer model. When I converted the whole model into an mlmodel, I found that inference ran only on the CPU. After splitting it into an encoder and a decoder, the encoder was accelerated by the NPU as expected, but the decoder still ran only on the CPU. Is this because the decoder is autoregressive rather than a static graph? If the decoder can be accelerated by the NPU, could a method for converting the .pt model to an mlmodel or mlpackage be provided?
Thanks

Metadata


Labels

feature request: Functionality does not currently exist, would need to be created as a new feature
