The main advantage of the approach taken here is that it's friendly to sparse matrices and GPU arrays. We need tests covering both of those cases.