-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[Flang][OpenMP] Fix Fortran automap handling #162501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Move automapped Fortran descriptor mappings from map() to has_device_addr in target regions, updating block args and uses accordingly. - In libomptarget, detect CFI descriptors during pointer attachment and compute the correct descriptor size to transfer full metadata (including bounds). - Resolves lost bounds for automapped Fortran arrays on device; no change for C/C++. This fixes test offload/test/offloading/fortran/declare-target-automap.f90 reported broken in llvm#161265.
@llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-offload Author: Akash Banerjee (TIFitis) Changes
This fixes the Full diff: https://github.com/llvm/llvm-project/pull/162501.diff 3 Files Affected:
diff --git a/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp b/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp
index 8b9991301aae8..2edbef151ea4a 100644
--- a/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp
+++ b/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp
@@ -18,9 +18,11 @@
#include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/Operation.h"
+#include "mlir/IR/SymbolTable.h"
#include "mlir/Pass/Pass.h"
#include "llvm/Frontend/OpenMP/OMPConstants.h"
+#include <algorithm>
namespace flangomp {
#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS
@@ -154,6 +156,104 @@ class AutomapToTargetDataPass
addMapInfo(globalOp, loadOp);
}
}
+
+ // Move automapped descriptors from map() to has_device_addr in target ops.
+ auto originatesFromAutomapGlobal = [&](mlir::Value varPtr) -> bool {
+ if (auto decl = mlir::dyn_cast_or_null<hlfir::DeclareOp>(
+ varPtr.getDefiningOp())) {
+ if (auto addrOp = mlir::dyn_cast_or_null<fir::AddrOfOp>(
+ decl.getMemref().getDefiningOp())) {
+ if (auto g =
+ mlir::SymbolTable::lookupNearestSymbolFrom<fir::GlobalOp>(
+ decl, addrOp.getSymbol()))
+ return automapGlobals.contains(g);
+ }
+ }
+ return false;
+ };
+
+ module.walk([&](mlir::omp::TargetOp target) {
+ // Collect candidates to move: descriptor maps of automapped globals.
+ llvm::SmallVector<mlir::Value> newMapOps;
+ llvm::SmallVector<unsigned> removedIndices;
+ llvm::SmallVector<mlir::Value> movedToHDA;
+ llvm::SmallVector<mlir::BlockArgument> oldMapArgsForMoved;
+
+ auto mapRange = target.getMapVars();
+ newMapOps.reserve(mapRange.size());
+
+ auto argIface = llvm::dyn_cast<mlir::omp::BlockArgOpenMPOpInterface>(
+ target.getOperation());
+ llvm::ArrayRef<mlir::BlockArgument> mapBlockArgs =
+ argIface.getMapBlockArgs();
+
+ for (auto [idx, mapVal] : llvm::enumerate(mapRange)) {
+ auto mapOp =
+ mlir::dyn_cast<mlir::omp::MapInfoOp>(mapVal.getDefiningOp());
+ if (!mapOp) {
+ newMapOps.push_back(mapVal);
+ continue;
+ }
+
+ mlir::Type varTy = fir::unwrapRefType(mapOp.getVarType());
+ bool isDescriptor = mlir::isa<fir::BaseBoxType>(varTy);
+ if (isDescriptor && originatesFromAutomapGlobal(mapOp.getVarPtr())) {
+ movedToHDA.push_back(mapVal);
+ removedIndices.push_back(idx);
+ oldMapArgsForMoved.push_back(mapBlockArgs[idx]);
+ } else {
+ newMapOps.push_back(mapVal);
+ }
+ }
+
+ if (movedToHDA.empty())
+ return;
+
+ // Update map vars to exclude moved entries.
+ mlir::MutableOperandRange mapMutable = target.getMapVarsMutable();
+ mapMutable.assign(newMapOps);
+
+ // Append moved entries to has_device_addr and insert corresponding block
+ // arguments.
+ mlir::MutableOperandRange hdaMutable =
+ target.getHasDeviceAddrVarsMutable();
+ llvm::SmallVector<mlir::Value> newHDA;
+ newHDA.reserve(hdaMutable.size() + movedToHDA.size());
+ llvm::for_each(hdaMutable.getAsOperandRange(),
+ [&](mlir::Value v) { newHDA.push_back(v); });
+
+ unsigned hdaStart = argIface.getHasDeviceAddrBlockArgsStart();
+ unsigned oldHdaCount = argIface.numHasDeviceAddrBlockArgs();
+ llvm::SmallVector<mlir::BlockArgument> newHDAArgsForMoved;
+ unsigned insertIndex = hdaStart + oldHdaCount;
+ for (mlir::Value v : movedToHDA) {
+ newHDA.push_back(v);
+ target->getRegion(0).insertArgument(insertIndex, v.getType(),
+ v.getLoc());
+ // Capture the newly inserted block argument.
+ newHDAArgsForMoved.push_back(
+ target->getRegion(0).getArgument(insertIndex));
+ insertIndex++;
+ }
+ hdaMutable.assign(newHDA);
+
+ // Redirect uses in the region: replace old map block args with the
+ // corresponding new has_device_addr block args.
+ for (auto [oldArg, newArg] :
+ llvm::zip_equal(oldMapArgsForMoved, newHDAArgsForMoved))
+ oldArg.replaceAllUsesWith(newArg);
+
+ // Finally, erase corresponding map block arguments (descending order).
+ unsigned mapStart = argIface.getMapBlockArgsStart();
+ // Convert indices to absolute argument numbers before erasing.
+ llvm::SmallVector<unsigned> absArgNos;
+ absArgNos.reserve(removedIndices.size());
+ for (unsigned idx : removedIndices)
+ absArgNos.push_back(mapStart + idx);
+ std::sort(absArgNos.begin(), absArgNos.end(), std::greater<>());
+ for (unsigned absNo : absArgNos)
+ target->getRegion(0).eraseArgument(absNo);
+ });
}
};
} // namespace
diff --git a/offload/libomptarget/omptarget.cpp b/offload/libomptarget/omptarget.cpp
index a1950cbb62908..02c83b4a51078 100644
--- a/offload/libomptarget/omptarget.cpp
+++ b/offload/libomptarget/omptarget.cpp
@@ -33,6 +33,9 @@
#include "llvm/Frontend/OpenMP/OMPConstants.h"
#include "llvm/Object/ObjectFile.h"
+#include "flang/ISO_Fortran_binding.h"
+
+#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>
@@ -378,6 +381,34 @@ static void *calculateTargetPointeeBase(void *HstPteeBase, void *HstPteeBegin,
return TgtPteeBase;
}
+// Fortran pointer attachments treated descriptors as plain pointers, so
+// automapped arrays lose their declared bounds on the device. Recognize
+// CFI descriptors to compute their actual size before copying, ensuring the
+// full descriptor (including bounds) is transferred during attachment.
+static int64_t getFortranDescriptorSize(void **HstPtrAddr,
+ int64_t ReportedSize) {
+ constexpr int64_t VoidPtrSize = sizeof(void *);
+
+ if (!HstPtrAddr || ReportedSize > VoidPtrSize)
+ return ReportedSize;
+
+ const CFI_cdesc_t *Desc = reinterpret_cast<const CFI_cdesc_t *>(HstPtrAddr);
+
+ if (Desc->version != CFI_VERSION)
+ return ReportedSize;
+
+ if (Desc->rank > CFI_MAX_RANK)
+ return ReportedSize;
+
+ const char *RawDesc = reinterpret_cast<const char *>(Desc);
+ const char *DimsAddr = reinterpret_cast<const char *>(&Desc->dim);
+ size_t HeaderBytes = static_cast<size_t>(DimsAddr - RawDesc);
+ size_t DimsBytes = static_cast<size_t>(Desc->rank) * sizeof(CFI_dim_t);
+ size_t TotalBytes = HeaderBytes + DimsBytes;
+
+ return std::max<int64_t>(ReportedSize, static_cast<int64_t>(TotalBytes));
+}
+
/// Utility function to perform a pointer attachment operation.
///
/// For something like:
@@ -445,6 +476,7 @@ static int performPointerAttachment(DeviceTy &Device, AsyncInfoTy &AsyncInfo,
"Need a valid pointer entry to perform pointer-attachment");
constexpr int64_t VoidPtrSize = sizeof(void *);
+ HstPtrSize = getFortranDescriptorSize(HstPtrAddr, HstPtrSize);
assert(HstPtrSize >= VoidPtrSize && "PointerSize is too small");
void *TgtPteeBase =
diff --git a/offload/test/offloading/fortran/declare-target-automap.f90 b/offload/test/offloading/fortran/declare-target-automap.f90
index b44c0b2815274..b9c2d34c834fa 100644
--- a/offload/test/offloading/fortran/declare-target-automap.f90
+++ b/offload/test/offloading/fortran/declare-target-automap.f90
@@ -1,9 +1,6 @@
!Offloading test for AUTOMAP modifier in declare target enter
! REQUIRES: flang, amdgpu
-! FIXME: https://github.com/llvm/llvm-project/issues/161265
-! XFAIL: amdgpu
-
! RUN: %libomptarget-compile-fortran-run-and-check-generic
program automap_program
use iso_c_binding, only: c_loc
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes Fortran automap handling in OpenMP target regions by addressing lost bounds information for automapped Fortran arrays on the device. The changes ensure that Fortran descriptor metadata, including array bounds, is properly transferred during pointer attachment operations.
Key changes:
- Move automapped Fortran descriptors from
map()
tohas_device_addr
in target regions - Add CFI descriptor detection in libomptarget to compute correct descriptor sizes
- Remove test failure expectations for the now-fixed functionality
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
offload/test/offloading/fortran/declare-target-automap.f90 |
Removes XFAIL directive as the test now passes |
offload/libomptarget/omptarget.cpp |
Adds CFI descriptor size calculation for proper Fortran metadata transfer |
flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp |
Implements logic to move automapped descriptors from map to has_device_addr |
Ping for review. |
I'm happy with the compiler changes, but I think we should drop the runtime changes and keep the test XFAIL'd for the moment. The runtime fix is trying to address a problem that will be resolved without a runtime change when swapping to attach mapping, which will happen in the near future if I can land the PR stack I have up finally and can advance it forward. I've done a bit of a deep dive into this and the runtime fix appears to be trying to address two things, the main thing is that we don't actually support declare target to upstream fully until the aforementioned PR stack lands, so what we're actually doing in this test currently is still using a kernel input argument instead of the actual declare target global that's generated by the compiler and should be used in it's place (PR stack addresses this and a few dozen other things). Currently we map allocatables as a form of record type/C struct, where we have the descriptor as a parent, and the base address as a pointer map member that points to some data, like the below: omptarget --> Entry 0: Base=0x00005e1e0039c0a0, Begin=0x00005e1e0039c0a0, Size=48, Type=0x0, Name=/home/agozillo/git/aomp22.0/llvm-project/offload/test/offloading/fortran/declare-target-automap.f90 The 8-bytes is what the OMP_PTR_AND_OBJ flag gets applied to and eventually runs through the runtime changes we add in this PR. The change is effectively tricking the runtime into thinking the 8-byte mapping should map the same bits as Entry 1. Later the has_device_addr map we give to the compiler has the same address as entry 2's begin and entry 3's base address, which is likely confusing the runtime a little. It likely should have the same address as entry 0/1 (but this alone isn't going to fix the issue I think). But I believe the main issue is we've made an example that depends on declare target to/enter, which is only partially working upstream at the moment and we're trying to fix an issue that can't be addressed correctly/fully yet. So my thoughts are we keep the test xfailed for a bit longer, and land the compiler changes and wait a bit for the rest of the support to land to revisit this (can revisit when the PR stack lands to see if it'll work with just that, but I've verified that this will work downstream with the compiler changes in this PR and the attach map work) :-) |
@agozillon Thanks for the review and explanation. In that case I'll hold off on landing this PR and keep an eye out for when your changes land. Once landed, I'll rebase this PR and verify the test works before landing it. Thanks. |
Thank you very much for the understanding and great idea @TIFitis I'll drop a message in here when it's landed, it unfortunately might be a while as the review is a little held up at the moment unfortunately and then a subsequent PR will need to land as well. Was very cool to see the runtime addition for checking the descriptors! |
I agree, please avoid the runtime changes. libomptarget should not need to internally detect descriptor shapes/sizes.
Once Andrew's flang changes to use the ATTACH map-type in the codegen are upstreamed, then
Which would imply mapping the pointee data, and doing an attachment to a descriptor that's already present on the device using the ATTACH map-type bit. The ATTACH map-type already accepts the size of the descriptor as the size argument (the assumption it makes is that the pointee address is in the first field of the descriptor). That way, the descriptor for the declare-target version of It is not correct to use |
I second Abhinav's assessment. I believe |
This fixes the
offload/test/offloading/fortran/declare-target-automap.f90
test reported broken in #161265.