@WeiqunZhang
Member

When redistributing particles, we need to resize a number of vectors. On GPU runs this sometimes causes out-of-memory errors, because the resizes can fragment the memory arena. To address this, we minimize the number of memory reallocations by reserving space before unpacking the local and remote communication buffers. Previously, resize could trigger a reallocation in each of the two unpacking operations.

We have also collected all the vectors that need to be resized into one place. This will allow us to explore different strategies for resizing them.
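
To illustrate the idea, here is a minimal sketch using plain `std::vector`. It is not the code in this PR: `ParticleTile` and `unpack_with_single_reserve` are hypothetical names, and the real particle tiles hold more arrays and use an arena allocator. It only shows that reserving once for the final size lets both unpacking steps grow the vectors without reallocating, and that keeping `reserve`/`resize` in one place gives a single point to change the strategy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one particle tile: several parallel arrays
// that must all grow together during redistribution.
struct ParticleTile {
    std::vector<double> x, y, w;
    std::vector<int> id;

    // The one place where every array is reserved or resized.
    void reserve(std::size_t n) { x.reserve(n); y.reserve(n); w.reserve(n); id.reserve(n); }
    void resize(std::size_t n)  { x.resize(n);  y.resize(n);  w.resize(n);  id.resize(n); }
    std::size_t size() const { return x.size(); }
};

// Reserve once for the final size, then grow in two steps (local unpack,
// remote unpack). Returns true if the storage of `x` did not move after
// the single up-front reserve.
bool unpack_with_single_reserve(ParticleTile& tile,
                                std::size_t n_local, std::size_t n_remote)
{
    const std::size_t n_old = tile.size();
    tile.reserve(n_old + n_local + n_remote);   // the only possible reallocation
    const double* p = tile.x.data();

    tile.resize(n_old + n_local);               // local buffer would be unpacked here
    tile.resize(n_old + n_local + n_remote);    // remote buffer would be unpacked here
    return tile.x.data() == p;
}
```

Because the capacity already covers the final size, neither `resize` call can reallocate, so the arena sees at most one allocation per array instead of two.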

}
}

// xxxxx TODO: Can we come up with a better strategy?
Member Author

Maybe we can first shrink any vector whose capacity is, say, more than 2x what it needs. Then we can order the vectors by their future capacity and reserve the biggest ones first.
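
A rough sketch of that idea with plain `std::vector`, for discussion only. `Request`, `shrink_then_reserve`, and the `slack` parameter are made-up names, and the 2x threshold is just the number floated above. Note that the shrink step here allocates a tight copy before the old storage is freed, so it needs both blocks to fit for a moment; an arena-aware version might do this differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// One pending request: a vector and the number of elements it will need.
struct Request {
    std::vector<double>* vec;
    std::size_t needed;
};

// Step 1: release excess capacity from vectors holding more than `slack`
//         times what they will need.
// Step 2: reserve in descending order of needed size, so the largest
//         blocks are requested first.
// Returns the indices of `reqs` in the order the reservations were made.
std::vector<std::size_t> shrink_then_reserve(std::vector<Request>& reqs,
                                             std::size_t slack = 2)
{
    for (auto& r : reqs) {
        auto& v = *r.vec;
        if (v.size() <= r.needed && v.capacity() > slack * r.needed) {
            // Copy into a tightly reserved vector and swap; shrink_to_fit()
            // alone is only a non-binding request.
            std::vector<double> tight;
            tight.reserve(r.needed);
            tight.assign(v.begin(), v.end());
            v.swap(tight);
        }
    }

    std::vector<std::size_t> order(reqs.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return reqs[a].needed > reqs[b].needed;
                     });

    for (std::size_t i : order) {
        reqs[i].vec->reserve(reqs[i].needed);
    }
    return order;
}
```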

Member Author

If a reserve fails, maybe we should try to defragment the vectors that are under our control. We can order them by pointer address, and then try to move them to lower addresses.
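
As a toy model of that compaction step (not the real arena): the pool below is a host byte buffer, and `Block` and `compact` are hypothetical names. A GPU version would need device copies and would have to update the owning vectors' pointers. The sketch only shows the ordering argument: if blocks are visited in ascending address order, each one can slide down without overwriting a block that has not been moved yet.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// A live allocation inside the pool, identified by offset and size in bytes.
struct Block {
    std::size_t offset;
    std::size_t size;
};

// Slide live, non-overlapping blocks toward offset 0 in address order.
// Returns the size of the contiguous free region left at the end of the pool.
std::size_t compact(std::vector<char>& pool, std::vector<Block*>& blocks)
{
    std::sort(blocks.begin(), blocks.end(),
              [](const Block* a, const Block* b) { return a->offset < b->offset; });

    std::size_t next = 0;  // lowest offset not yet occupied by a compacted block
    for (Block* b : blocks) {
        if (b->offset != next) {
            // Source and destination may overlap, so use memmove.
            std::memmove(pool.data() + next, pool.data() + b->offset, b->size);
            b->offset = next;
        }
        next += b->size;
    }
    return pool.size() - next;
}
```

After compaction all free space is in one region at the high end, so a reserve that failed because of fragmentation could be retried against it.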
