**Is your feature request related to a problem? Please describe.** The GPU gives good performance, but in some cases accuracy is important; SIMD may be a good choice there.