Conversation
|
/ok to test 307d18e |
@aamijar, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/ |
|
/ok to test e52ad48 |
viclafargue
left a comment
There was a problem hiding this comment.
Thanks! Here are some comments.
| indptr = indptr.astype(np.int32) | ||
| indices = indices.astype(np.int32) |
There was a problem hiding this comment.
Isn't there an integer overflow risk here? I think we should at least warn the user that the indices will be converted.
| template <typename ValueTypeT> | ||
| struct sparse_svd_config { | ||
| /** @brief Number of singular values/vectors to compute */ | ||
| int n_components; |
There was a problem hiding this comment.
We should maybe set a default for n_components. A C++ user that forgets to specify a value would face an undefined behavior. Setting it to 0 would allow it to be catched immediately by parameters validation.
| cdef sparse_svd_config[float] config_float | ||
| cdef sparse_svd_config[double] config_double |
There was a problem hiding this comment.
Using module-level global config is not thread-safe. Please instantiate them in the function where they are needed.
| void sparse_randomized_svd( | ||
| raft::resources const& handle, | ||
| sparse_svd_config<ValueTypeT> const& config, | ||
| raft::device_csr_matrix_view<ValueTypeT, int, int, NNZTypeT> A, |
There was a problem hiding this comment.
| raft::device_csr_matrix_view<ValueTypeT, int, int, NNZTypeT> A, | |
| raft::device_csr_matrix_view<const ValueTypeT, int, int, NNZTypeT> A, |
The input matrix should be const.
Also for the doc, we sometimes use the paradigm @param [in], @param [inout], @param [out]. @param [in] is supposed to be usable with const data.
| Omega.data_handle(), n, block_size), | ||
| Y.view()); | ||
| } // Omega freed here | ||
| cholesky_qr2(handle, Y.view()); |
There was a problem hiding this comment.
We should maybe check the return value of cholesky_qr2 calls and emit a warning when there's a fallback to standard QR.
| int rows() const { return m_; } | ||
| int cols() const { return n_; } |
There was a problem hiding this comment.
| int rows() const { return m_; } | |
| int cols() const { return n_; } | |
| int rows() const { return A_.structure_view().get_n_rows(); } | |
| int cols() const { return A_.structure_view().get_n_cols(); } |
| raft::device_csr_matrix_view<const ValueTypeT, int, int, NNZTypeT> A_; | ||
| int m_; | ||
| int n_; |
There was a problem hiding this comment.
A_ already stores the dimensions of the matrix, storing m_ and n_ here is redundant, adds complexity and introduces the possibility of bugs. The number of rows and cols can be derived directly from A_. Also A_ should probably be a private member.
| void sparse_randomized_svd( | ||
| raft::resources const& handle, | ||
| sparse_svd_config<ValueTypeT> const& config, | ||
| OperatorT const& op, |
There was a problem hiding this comment.
Even though the csr_linear_operator struct is certainly useful, the user should not ever need to construct this exotic type. The public function should expose an interface with a raft::device_csr_matrix_view and the csr_linear_operator utility should be built inside of the function.
| rmm::device_uvector<ValueTypeT> Q_copy(m * k, stream); | ||
| raft::copy(Q_copy.data(), Q.data_handle(), m * k, stream); | ||
| raft::linalg::qrGetQ(handle, Q_copy.data(), Q.data_handle(), m, k, stream); |
There was a problem hiding this comment.
| rmm::device_uvector<ValueTypeT> Q_copy(m * k, stream); | |
| raft::copy(Q_copy.data(), Q.data_handle(), m * k, stream); | |
| raft::linalg::qrGetQ(handle, Q_copy.data(), Q.data_handle(), m, k, stream); | |
| raft::linalg::qrGetQ(handle, Q.data_handle(), Q.data_handle(), m, k, stream); |
qrGetQ already does a copy internally. Using Q.data_handle() twice would allow the operation to work inplace (even the copy could be avoided as src==dst). To double check though.
Additionally, m * k has an integer overflow risk.
| rmm::device_uvector<ValueTypeT> Q_copy(m * k, stream); | ||
| raft::copy(Q_copy.data(), Q.data_handle(), m * k, stream); | ||
| raft::linalg::qrGetQ(handle, Q_copy.data(), Q.data_handle(), m, k, stream); |
This PR adds randomized SVDs to raft based on (Halko et al. 2009) and (Tomás et al. 2024). I also added the possibility for a very limited linear operator. This one is C++ and might be useful for sparse PCA in cuml. It's tested in the C++ layer but not exposed in Python. This PR mimics the implementation I did for rapids-singlecell