Add Randomized SVDs by Intron7 · Pull Request #2999 · rapidsai/raft

Intron7 · 2026-04-09T11:19:57Z

This PR adds randomized SVDs to raft, based on (Halko et al. 2009) and (Tomás et al. 2024). I also added support for a very limited linear operator. The operator is C++-only and might be useful for sparse PCA in cuml; it is tested in the C++ layer but not exposed in Python. This PR mirrors the implementation I wrote for rapids-singlecell.

copy-pr-bot · 2026-04-09T11:20:01Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

aamijar · 2026-04-10T00:16:33Z

/ok to test 307d18e

copy-pr-bot · 2026-04-10T00:16:36Z

/ok to test 307d18e

@aamijar, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

aamijar · 2026-04-10T00:17:36Z

/ok to test e52ad48

viclafargue

Thanks! Here are some comments.

viclafargue · 2026-04-10T09:46:32Z

python/pylibraft/pylibraft/sparse/linalg/svds.pyx

+        indptr = indptr.astype(np.int32)
+        indices = indices.astype(np.int32)


Isn't there an integer overflow risk here? I think we should at least warn the user that the indices will be converted.
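To make the overflow concern concrete, here is a minimal C++ sketch of a checked narrowing conversion. The helper name `narrow_to_int32` is hypothetical and is not part of raft or pylibraft; the snippet under review is Cython, so this only illustrates the idea of verifying the range before casting rather than wrapping silently.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not a raft/pylibraft API): narrow 64-bit CSR index
// values to 32-bit only if every value is representable as int32.
// Returns false (and leaves `out` empty) when any value would overflow,
// so the caller can warn or raise instead of silently wrapping around.
inline bool narrow_to_int32(const std::vector<std::int64_t>& in,
                            std::vector<std::int32_t>& out)
{
  const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
  const std::int64_t hi = std::numeric_limits<std::int32_t>::max();

  out.clear();
  out.reserve(in.size());
  for (std::int64_t v : in) {
    if (v < lo || v > hi) {
      out.clear();  // do not hand back a partially converted array
      return false;
    }
    out.push_back(static_cast<std::int32_t>(v));
  }
  return true;
}
```

A caller could treat a `false` result as the point at which to emit the warning (or an error) suggested in the comment above.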

viclafargue · 2026-04-10T09:50:50Z

cpp/include/raft/sparse/solver/svds_config.hpp

+template <typename ValueTypeT>
+struct sparse_svd_config {
+  /** @brief Number of singular values/vectors to compute */
+  int n_components;


We should maybe set a default for n_components. A C++ user who forgets to specify a value would run into undefined behavior. Setting it to 0 would let parameter validation catch the mistake immediately.
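A small sketch of what this could look like, assuming a default member initializer plus a validation step. The names `svd_config_sketch` and `validate` are hypothetical stand-ins, not the PR's actual types, and only the field under discussion is shown.

```cpp
#include <stdexcept>

// Hypothetical, simplified stand-in for the config struct in the PR.
template <typename ValueTypeT>
struct svd_config_sketch {
  /** @brief Number of singular values/vectors to compute (0 means "not set") */
  int n_components = 0;  // default member initializer: never indeterminate
};

// Hypothetical validation helper: rejects a config whose n_components was
// left at its default or set to a non-positive value.
template <typename ValueTypeT>
void validate(const svd_config_sketch<ValueTypeT>& config)
{
  if (config.n_components <= 0) {
    throw std::invalid_argument("n_components must be a positive integer");
  }
}
```

With the default in place, a forgotten field produces a clear validation error instead of reading an indeterminate value.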

viclafargue · 2026-04-10T09:53:12Z

python/pylibraft/pylibraft/sparse/linalg/svds.pyx

+cdef sparse_svd_config[float] config_float
+cdef sparse_svd_config[double] config_double


Using module-level global config is not thread-safe. Please instantiate them in the function where they are needed.

viclafargue · 2026-04-10T10:02:01Z

cpp/include/raft/sparse/solver/randomized_svds.cuh

+void sparse_randomized_svd(
+  raft::resources const& handle,
+  sparse_svd_config<ValueTypeT> const& config,
+  raft::device_csr_matrix_view<ValueTypeT, int, int, NNZTypeT> A,


Suggested change

-  raft::device_csr_matrix_view<ValueTypeT, int, int, NNZTypeT> A,
+  raft::device_csr_matrix_view<const ValueTypeT, int, int, NNZTypeT> A,

The input matrix should be const.

Also, for the docs, we sometimes use the convention @param [in], @param [inout], @param [out]. @param [in] is meant for parameters that can be passed as const data.
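As an illustration of both points, the sketch below pairs a const element type with `@param[in]` / `@param[out]` annotations. `span_sketch` and `sum_values` are hypothetical names used only for this example; they stand in for a real view type and are not raft APIs.

```cpp
#include <cstddef>

// Hypothetical minimal view type standing in for a device matrix view.
template <typename T>
struct span_sketch {
  T* data;
  std::size_t size;
};

/**
 * @brief Sum the elements of a read-only buffer.
 *
 * @param[in]  values  input data; the const element type documents and
 *                     enforces that this function does not modify it
 * @param[out] result  receives the sum of all elements
 */
inline void sum_values(span_sketch<const float> values, float& result)
{
  result = 0.0f;
  for (std::size_t i = 0; i < values.size; ++i) {
    result += values.data[i];
    // values.data[i] = 0.0f;  // would not compile: element type is const
  }
}
```

Because the element type is const, callers holding read-only data can pass it without a cast.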

viclafargue · 2026-04-10T10:08:44Z

cpp/include/raft/sparse/solver/detail/randomized_svds.cuh

+               Omega.data_handle(), n, block_size),
+             Y.view());
+  }  // Omega freed here
+  cholesky_qr2(handle, Y.view());


We should maybe check the return value of cholesky_qr2 calls and emit a warning when there's a fallback to standard QR.
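The pattern being asked for could look roughly like the sketch below. It assumes the factorization step reports success as a boolean, which is an assumption about `cholesky_qr2` rather than something confirmed in this thread. `try_cholesky` and `orthogonalize_with_warning` are hypothetical host-side stand-ins, not the PR's code.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical helper: in-place Cholesky of a symmetric k x k row-major
// matrix. On success the lower triangle holds L. Returns false as soon as a
// non-positive pivot shows the matrix is not numerically positive definite.
inline bool try_cholesky(std::vector<double>& G, int k)
{
  for (int j = 0; j < k; ++j) {
    double d = G[j * k + j];
    for (int p = 0; p < j; ++p) { d -= G[j * k + p] * G[j * k + p]; }
    if (!(d > 0.0)) { return false; }  // also rejects NaN
    const double ljj = std::sqrt(d);
    G[j * k + j]     = ljj;
    for (int i = j + 1; i < k; ++i) {
      double s = G[i * k + j];
      for (int p = 0; p < j; ++p) { s -= G[i * k + p] * G[j * k + p]; }
      G[i * k + j] = s / ljj;
    }
  }
  return true;
}

// Caller-side pattern: inspect the status and warn when falling back.
// Returns true if the Cholesky path was used, false if the fallback is needed.
inline bool orthogonalize_with_warning(std::vector<double> gram, int k)
{
  const bool used_cholesky = try_cholesky(gram, k);
  if (!used_cholesky) {
    std::fprintf(stderr, "warning: Cholesky QR failed, falling back to standard QR\n");
    // A real implementation would invoke the standard QR path here.
  }
  return used_cholesky;
}
```

The point is only that the status is checked and surfaced; how the warning is logged in raft is left open here.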

viclafargue · 2026-04-10T10:51:12Z

cpp/include/raft/sparse/solver/svds_types.hpp

+  int rows() const { return m_; }
+  int cols() const { return n_; }


Suggested change

-  int rows() const { return m_; }
-  int cols() const { return n_; }
+  int rows() const { return A_.structure_view().get_n_rows(); }
+  int cols() const { return A_.structure_view().get_n_cols(); }

viclafargue · 2026-04-10T10:55:21Z

cpp/include/raft/sparse/solver/svds_types.hpp

+  raft::device_csr_matrix_view<const ValueTypeT, int, int, NNZTypeT> A_;
+  int m_;
+  int n_;


A_ already stores the dimensions of the matrix, so storing m_ and n_ here is redundant, adds complexity, and introduces the possibility of bugs. The number of rows and columns can be derived directly from A_. Also, A_ should probably be a private member.
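A minimal sketch of that layout, using hypothetical stand-in types (`csr_view_sketch`, `csr_operator_sketch`) rather than the PR's actual classes: the view is the single source of truth for the shape, and it is kept private.

```cpp
// Hypothetical stand-in for a CSR matrix view that already knows its shape.
struct csr_view_sketch {
  int n_rows;
  int n_cols;
  int get_n_rows() const { return n_rows; }
  int get_n_cols() const { return n_cols; }
};

// Sketch of an operator wrapper that keeps the view private and derives its
// dimensions from it instead of caching separate m_/n_ copies that could
// drift out of sync with the view.
class csr_operator_sketch {
 public:
  explicit csr_operator_sketch(csr_view_sketch A) : A_(A) {}

  int rows() const { return A_.get_n_rows(); }
  int cols() const { return A_.get_n_cols(); }

 private:
  csr_view_sketch A_;
};
```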

viclafargue · 2026-04-10T10:58:18Z

cpp/include/raft/sparse/solver/randomized_svds.cuh

+void sparse_randomized_svd(
+  raft::resources const& handle,
+  sparse_svd_config<ValueTypeT> const& config,
+  OperatorT const& op,


Even though the csr_linear_operator struct is certainly useful, the user should not ever need to construct this exotic type. The public function should expose an interface with a raft::device_csr_matrix_view and the csr_linear_operator utility should be built inside of the function.
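One way to picture the suggested layering is sketched below, with hypothetical names throughout (`dense_view_sketch`, `linear_operator_sketch`, `run_solver_sketch`). A dense matrix-vector product stands in for the real algorithm; the only point is that the public function accepts a plain view and builds the operator internally.

```cpp
#include <vector>

// Hypothetical stand-in for the user-facing matrix view (row-major, dense).
struct dense_view_sketch {
  const double* data;
  int n_rows;
  int n_cols;
};

namespace detail {
// Internal operator type: users never need to construct this directly.
struct linear_operator_sketch {
  dense_view_sketch A;

  // Computes y = A * x.
  std::vector<double> apply(const std::vector<double>& x) const
  {
    std::vector<double> y(A.n_rows, 0.0);
    for (int i = 0; i < A.n_rows; ++i) {
      for (int j = 0; j < A.n_cols; ++j) { y[i] += A.data[i * A.n_cols + j] * x[j]; }
    }
    return y;
  }
};

// Generic implementation written against the operator interface; a single
// apply() call is a placeholder for the actual solver.
template <typename OperatorT>
std::vector<double> run_solver_impl(const OperatorT& op, const std::vector<double>& x)
{
  return op.apply(x);
}
}  // namespace detail

// Public entry point: takes the view, wraps it, and forwards to the detail layer.
inline std::vector<double> run_solver_sketch(dense_view_sketch A, const std::vector<double>& x)
{
  detail::linear_operator_sketch op{A};
  return detail::run_solver_impl(op, x);
}
```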

viclafargue · 2026-04-10T11:18:07Z

cpp/include/raft/sparse/solver/detail/cholesky_qr.cuh

+    rmm::device_uvector<ValueTypeT> Q_copy(m * k, stream);
+    raft::copy(Q_copy.data(), Q.data_handle(), m * k, stream);
+    raft::linalg::qrGetQ(handle, Q_copy.data(), Q.data_handle(), m, k, stream);


Suggested change

-    rmm::device_uvector<ValueTypeT> Q_copy(m * k, stream);
-    raft::copy(Q_copy.data(), Q.data_handle(), m * k, stream);
-    raft::linalg::qrGetQ(handle, Q_copy.data(), Q.data_handle(), m, k, stream);
+    raft::linalg::qrGetQ(handle, Q.data_handle(), Q.data_handle(), m, k, stream);

qrGetQ already makes a copy internally. Passing Q.data_handle() for both arguments would let the operation work in place (the copy could even be skipped since src == dst). This needs to be double-checked, though.

Additionally, m * k has an integer overflow risk.
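A short sketch of the usual fix, assuming `m` and `k` are non-negative `int` values and that `std::size_t` is 64 bits wide on the target platform. The helper name `element_count` is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: compute the number of elements in an m x k buffer.
// Each operand is widened to std::size_t *before* the multiplication, so the
// product is evaluated in the wider type rather than overflowing in int.
// Assumes m >= 0, k >= 0, and a 64-bit std::size_t.
inline std::size_t element_count(int m, int k)
{
  return static_cast<std::size_t>(m) * static_cast<std::size_t>(k);
}
```

For example, `m = 100000` and `k = 50000` give 5,000,000,000 elements, which exceeds the range of a 32-bit `int` but is computed correctly after widening.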

viclafargue · 2026-04-10T11:18:18Z

cpp/include/raft/sparse/solver/detail/cholesky_qr.cuh

+    rmm::device_uvector<ValueTypeT> Q_copy(m * k, stream);
+    raft::copy(Q_copy.data(), Q.data_handle(), m * k, stream);
+    raft::linalg::qrGetQ(handle, Q_copy.data(), Q.data_handle(), m, k, stream);


Similar comment here.

Intron7 added 2 commits April 9, 2026 12:33

add randomized svds

d20ea68

add more tests

307d18e

Intron7 requested review from a team as code owners April 9, 2026 11:19

github-project-automation bot added this to Unstructured Data Processing Apr 9, 2026

Merge branch 'main' into randomized_svds

e52ad48

aamijar assigned Intron7 Apr 10, 2026

aamijar added the non-breaking (Non-breaking change) and feature request (New feature or request) labels Apr 10, 2026

aamijar changed the title ~~add randomized svds~~ Add randomized svds Apr 10, 2026

aamijar changed the title ~~Add randomized svds~~ Add Randomized SVDs Apr 10, 2026

viclafargue reviewed Apr 10, 2026

adress issues

8a08ffc

3 participants