-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Open
Labels
Description
Some of our OpenCL formats try to allocate pinned memory for buffers with a fallback to normal buffers if the former fails. I always thought the code looked fishy so I looked a bit closer now and it's worse than I expected:
Current code (this from opencl_rawmd4_fmt_plug.c):
pinned_saved_keys = clCreateBuffer(context[gpu_id], CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, BUFSIZE * kpc, NULL, &ret_code);
if (ret_code != CL_SUCCESS) {
saved_plain = (cl_uint *) mem_alloc(BUFSIZE * kpc);
if (saved_plain == NULL)
HANDLE_CLERROR(ret_code, "Error creating page-locked memory pinned_saved_keys.");
}
else {
saved_plain = (cl_uint *) clEnqueueMapBuffer(queue[gpu_id], pinned_saved_keys, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, BUFSIZE * kpc, 0, NULL, NULL, &ret_code);
HANDLE_CLERROR(ret_code, "Error mapping page-locked memory saved_plain.");
}
(...)
buffer_keys = clCreateBuffer(context[gpu_id], CL_MEM_READ_ONLY, BUFSIZE * kpc, NULL, &ret_code);
(...)
BENCH_CLERROR(clEnqueueWriteBuffer(queue[gpu_id], buffer_keys, CL_TRUE, 0, 4 * key_idx, saved_plain, 0, NULL, NULL), "failed in clEnqueueWriteBuffer buffer_keys.");- I emulated a failure for the pinned alloc and sure enough the fallback just makes the format crash soon after.
- The current code weirdly allocates twice the size needed: First a pinned buffer, then a normal one.
- Even worse, the current code doesn't even use the pinned buffer ever: It uses that extra, non-pinned, one. Duh!
It's not hard to make it work as intended, the question is if it's even needed at all. I think not: When an allocation using CL_MEM_ALLOC_HOST_PTR fails to get pinned memory it will simply return non-pinned instead (which is why we never saw the mentioned crash). IMHO we should just drop that fallback code and start actually using the pinned buffer.
Something like this:
device_buffer = clCreateBuffer(context[gpu_id], CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, BUFSIZE * kpc, NULL, &ret_code);
HANDLE_CLERROR(ret_code, "Error creating device buffer");
host_buffer = clEnqueueMapBuffer(queue[gpu_id], device_buffer, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, BUFSIZE * kpc, 0, NULL, NULL, &ret_code);
HANDLE_CLERROR(ret_code, "Error mapping host buffer");
(...)
BENCH_CLERROR(clEnqueueWriteBuffer(queue[gpu_id], device_buffer, CL_TRUE, 0, 4 * key_idx, host_buffer, 0, NULL, NULL), "failed in clEnqueueWriteBuffer buffer_keys.");I authored some of the cases with b0rken code myself, blindly copied from existing formats.