Transfer torch::Tensor from cuda to cpu slow #32

@datlt4

While testing your repo, I found a problem at line 72 of nn_matching.h.
When nn_cosine_distance is called in the for loop (line 55), the .cpu() call takes more than 22,000 microseconds when i == 0, but only 10-20 microseconds for the other indices. If there is a way to reduce the cost of that first call, performance would improve dramatically.
