CUDA speedup on smaller tensors? #3

@fiftysevendegreesofrad

Hi, this module is great :-)

However, I'm wondering whether there are any options on the table for reducing the fixed CUDA overheads, and hence getting a speedup on smaller tensors. For example, if I modify perf.py to interpolate fewer points:
X, Y = np.meshgrid(np.arange(-.5, 2.5, .1), np.arange(-.5, 2.5, .01))

I'm getting

Interpolating 9000 points on 300 by 300 grid
PyTorch took 1.319 +\- 0.235 ms
PyTorch Cuda took 1.322 +\- 0.869 ms
Scipy took 0.803 +\- 0.052 ms
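
For anyone reproducing this, the 9000 in the output above follows from the modified grid: 30 samples along one axis times 300 along the other. A minimal sketch, assuming NumPy is installed and that perf.py treats each (X, Y) pair as one query point (the names `xs`, `ys` and `n_points` are only for illustration and are not taken from perf.py):

```python
import numpy as np

# Query grid from the modified perf.py line quoted above.
xs = np.arange(-.5, 2.5, .1)   # 30 samples, step 0.1
ys = np.arange(-.5, 2.5, .01)  # 300 samples, step 0.01
X, Y = np.meshgrid(xs, ys)

# Assumption: each (X, Y) pair is one interpolation query.
n_points = X.size
print(X.shape, n_points)
```

With so few query points, the GPU timing is plausibly dominated by per-call overhead rather than by the interpolation arithmetic itself, which is what the question below is about.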

Do you think there is some way to combine CUDA kernels to get the 20x speed boost on a tensor of this size?
