[linux-nvidia-6.8] arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults #335
Open
nvmochs wants to merge 1 commit intoNVIDIA:24.04_linux-nvidiafrom
contpte_ptep_set_access_flags() compared the gathered ptep_get() value
against the requested entry to detect no-ops. ptep_get() ORs AF/dirty
from all sub-PTEs in the CONT block, so a dirty sibling can make the
target appear already-dirty. When the gathered value matches entry, the
function returns 0 even though the target sub-PTE still has PTE_RDONLY
set in hardware.
For a CPU with FEAT_HAFDBS this gathered view is fine, since hardware may
set AF/dirty on any sub-PTE and CPU TLB behavior is effectively gathered
across the CONT range. But page-table walkers that evaluate each
descriptor individually (e.g. a CPU without DBM support, or an SMMU
without HTTU, or with HA/HD disabled in CD.TCR) can keep faulting on the
unchanged target sub-PTE, causing an infinite fault loop.
Gathering can therefore cause false no-ops when only a sibling has been
updated:
- write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
- read faults: target still lacks PTE_AF
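The false no-op can be reproduced in a small user-space model. This is a hedged sketch, not kernel code: the bit positions are modelled on arm64 but should be treated as illustrative, `gathered_get()` and `old_check_is_noop()` are hypothetical stand-ins for the gathered ptep_get() view and the old comparison, and the `pte_*` helpers are simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values modelled on arm64; not taken from kernel headers. */
#define CONT_PTES   16
#define PTE_RDONLY  (1ULL << 7)
#define PTE_AF      (1ULL << 10)
#define PTE_WRITE   (1ULL << 51)
#define PTE_DIRTY   (1ULL << 55)

typedef uint64_t pte_t;

static bool pte_write(pte_t p) { return (p & PTE_WRITE) != 0; }

/* Dirty if software-dirty, or writable with PTE_RDONLY already cleared. */
static bool pte_dirty(pte_t p)
{
    return (p & PTE_DIRTY) || (pte_write(p) && !(p & PTE_RDONLY));
}

static pte_t pte_mkdirty(pte_t p)
{
    p |= PTE_DIRTY;
    if (pte_write(p))
        p &= ~PTE_RDONLY;
    return p;
}

static pte_t pte_mkyoung(pte_t p) { return p | PTE_AF; }

/* Hypothetical model of the gathered read: OR AF/dirty from every sub-PTE. */
static pte_t gathered_get(const pte_t *block, int idx)
{
    assert(idx >= 0 && idx < CONT_PTES);
    pte_t v = block[idx];
    for (int i = 0; i < CONT_PTES; i++) {
        if (pte_dirty(block[i]))
            v = pte_mkdirty(v);
        if (block[i] & PTE_AF)
            v = pte_mkyoung(v);
    }
    return v;
}

/* Model of the old no-op check: compare the gathered view with entry. */
static bool old_check_is_noop(const pte_t *block, int idx, pte_t entry)
{
    return gathered_get(block, idx) == entry;
}

/* Write fault on sub-PTE 0 while only sub-PTE 3 is dirty. */
static bool demo_false_noop(void)
{
    const pte_t clean = PTE_AF | PTE_WRITE | PTE_RDONLY;
    pte_t block[CONT_PTES];

    for (int i = 0; i < CONT_PTES; i++)
        block[i] = clean;
    block[3] = pte_mkdirty(clean);          /* dirty sibling */

    pte_t entry = pte_mkdirty(pte_mkyoung(block[0]));

    /* Reported as a no-op although the target is still read-only. */
    return old_check_is_noop(block, 0, entry) && (block[0] & PTE_RDONLY);
}
```

In this model `demo_false_noop()` returns true: the gathered value equals entry, yet the raw target descriptor still carries PTE_RDONLY, which is the state a walker that reads descriptors individually keeps faulting on.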
Fix by checking each sub-PTE against the requested AF/dirty/write state
(the same bits consumed by __ptep_set_access_flags()), using raw
per-PTE values rather than the gathered ptep_get() view, before
returning no-op. Keep using the raw target PTE for the write-bit unfold
decision.
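The per-sub-PTE check described above might look roughly like the following user-space sketch. It is not the patch itself: `all_subptes_match_access_flags()` is a hypothetical name, the bit values are illustrative, and the mask of RDONLY/AF/WRITE/DIRTY is an assumption about which bits __ptep_set_access_flags() consumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values modelled on arm64; not taken from kernel headers. */
#define CONT_PTES   16
#define PTE_RDONLY  (1ULL << 7)
#define PTE_AF      (1ULL << 10)
#define PTE_WRITE   (1ULL << 51)
#define PTE_DIRTY   (1ULL << 55)

/* Assumed set of bits that an access-flags update can change. */
#define ACCESS_MASK (PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY)

typedef uint64_t pte_t;

/*
 * Return true only if every raw sub-PTE in the block already carries the
 * requested access bits. No gathering: each descriptor is read as-is.
 */
static bool all_subptes_match_access_flags(const pte_t *block, pte_t entry)
{
    assert(block != NULL);
    const pte_t want = entry & ACCESS_MASK;

    for (int i = 0; i < CONT_PTES; i++) {
        if ((block[i] & ACCESS_MASK) != want)
            return false;
    }
    return true;
}

/* Fill a block with young, writable, clean PTEs and one dirty sibling. */
static void fill_with_dirty_sibling(pte_t *block, int dirty_idx)
{
    assert(dirty_idx >= 0 && dirty_idx < CONT_PTES);
    for (int i = 0; i < CONT_PTES; i++)
        block[i] = PTE_AF | PTE_WRITE | PTE_RDONLY;
    block[dirty_idx] = PTE_AF | PTE_WRITE | PTE_DIRTY;
}
```

With one dirty sibling and a requested entry of `PTE_AF | PTE_WRITE | PTE_DIRTY`, the check returns false, so the update proceeds instead of being skipped. It returns true only once every sub-PTE holds the requested bits.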
Per Arm ARM (DDI 0487) D8.7.1 ("The Contiguous bit"), any sub-PTE in a CONT
range may become the effective cached translation and software must
maintain consistent attributes across the range.
Fixes: 4602e57 ("arm64/mm: wire up PTE_CONT for user mappings")
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: stable@vger.kernel.org
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: James Houghton <jthoughton@google.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Acked-by: Balbir Singh <balbirs@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(backported from commit 97c5550)
[mochs: minor context adjustment due to lack of contpte_clear_young_dirty_ptes()]
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
clsotog (Collaborator) approved these changes on Mar 9, 2026:
Acked-by: Carol L Soto <csoto@nvidia.com>

jamieNguyenNVIDIA (Collaborator) approved these changes on Mar 9, 2026.

Author comment: PR submitted to Canonical.
This v7.0 fix patch is needed to resolve a hang in pageable D2H copy. See nvb#5931592.
The fix was tested using the cuda_d2h_pageable.py script attached to the bug.
LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/2143602