CUDA speedup on smaller tensors?

Hi, this module is great :-)

I'm wondering, however, whether there are any options on the table for reducing the fixed CUDA overhead, and hence getting a speedup on smaller tensors. For example, modifying perf.py to interpolate fewer points:
`X, Y = np.meshgrid(np.arange(-.5, 2.5, .1), np.arange(-.5, 2.5, .01))`

With that change I'm getting:
```
Interpolating 9000 points on 300 by 300 grid
PyTorch took 1.319 +\- 0.235 ms
PyTorch Cuda took 1.322 +\- 0.869 ms
Scipy took 0.803 +\- 0.052 ms
```

Do you think there is some way to combine CUDA kernels to get the 20x speed boost on a tensor this size?
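One direction that might be worth trying is CUDA Graphs, which record a sequence of kernel launches once and then replay them with a single launch, so the per-kernel fixed cost is paid less often. Below is a minimal sketch of that idea, not this module's code: `small_op_chain` is a hypothetical stand-in for the interpolation, `make_runner` is a helper name made up for illustration, and I have not checked whether the interpolator's own ops can be captured in a graph. It falls back to plain eager execution when no GPU is present.

```python
import torch


def small_op_chain(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the interpolation: a few small elementwise ops,
    # each of which is typically a separate kernel launch in eager mode.
    y = x * 2.0
    y = y + 1.0
    y = torch.sin(y)
    return y * y


def make_runner(fn, example: torch.Tensor):
    """Return a callable that replays `fn` through a CUDA graph on GPU,
    or simply `fn` itself when the example tensor lives on the CPU."""
    if not example.is_cuda:
        return fn

    # Graph capture works on fixed ("static") buffers with a fixed shape.
    static_in = example.clone()

    # Warm up on a side stream before capture.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            fn(static_in)
    torch.cuda.current_stream().wait_stream(side)

    # Record the whole op chain once.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_out = fn(static_in)

    def run(x: torch.Tensor) -> torch.Tensor:
        # New data must be copied into the captured input buffer.
        static_in.copy_(x)
        graph.replay()
        return static_out.clone()

    return run


device = "cuda" if torch.cuda.is_available() else "cpu"
# Same number of query points as in the timing above (assumed 1-D here for brevity).
points = torch.linspace(-0.5, 2.5, 9000, device=device)
runner = make_runner(small_op_chain, points)
result = runner(points)
```

The main restriction is that every replay has to use an input of the same shape as the one used at capture time, so this only fits workloads where the number of query points is fixed between calls.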


CUDA speedup on smaller tensors? #3

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

CUDA speedup on smaller tensors? #3

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions