Skip to content

Conversation

TIFitis
Copy link
Member

@TIFitis TIFitis commented Oct 8, 2025

  • Move automapped Fortran descriptor mappings from map() to has_device_addr in target regions, updating block args and uses accordingly.
  • In libomptarget, detect CFI descriptors during pointer attachment and compute the correct descriptor size to transfer full metadata (including bounds).
  • Resolves lost bounds for automapped Fortran arrays on device; no change for C/C++.

This fixes the offload/test/offloading/fortran/declare-target-automap.f90 test reported broken in #161265.

	- Move automapped Fortran descriptor mappings from map() to has_device_addr in target regions, updating block args and uses accordingly.
	- In libomptarget, detect CFI descriptors during pointer attachment and compute the correct descriptor size to transfer full metadata (including bounds).
	- Resolves lost bounds for automapped Fortran arrays on device; no change for C/C++.

This fixes test offload/test/offloading/fortran/declare-target-automap.f90 reported broken in llvm#161265.
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir offload labels Oct 8, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 8, 2025

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-offload

Author: Akash Banerjee (TIFitis)

Changes
  • Move automapped Fortran descriptor mappings from map() to has_device_addr in target regions, updating block args and uses accordingly.
  • In libomptarget, detect CFI descriptors during pointer attachment and compute the correct descriptor size to transfer full metadata (including bounds).
  • Resolves lost bounds for automapped Fortran arrays on device; no change for C/C++.

This fixes the offload/test/offloading/fortran/declare-target-automap.f90 test reported broken in #161265.


Full diff: https://github.com/llvm/llvm-project/pull/162501.diff

3 Files Affected:

  • (modified) flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp (+100)
  • (modified) offload/libomptarget/omptarget.cpp (+32)
  • (modified) offload/test/offloading/fortran/declare-target-automap.f90 (-3)
diff --git a/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp b/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp
index 8b9991301aae8..2edbef151ea4a 100644
--- a/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp
+++ b/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp
@@ -18,9 +18,11 @@
 #include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
 #include "mlir/IR/BuiltinAttributes.h"
 #include "mlir/IR/Operation.h"
+#include "mlir/IR/SymbolTable.h"
 #include "mlir/Pass/Pass.h"
 
 #include "llvm/Frontend/OpenMP/OMPConstants.h"
+#include <algorithm>
 
 namespace flangomp {
 #define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS
@@ -154,6 +156,104 @@ class AutomapToTargetDataPass
           addMapInfo(globalOp, loadOp);
       }
     }
+
+    // Move automapped descriptors from map() to has_device_addr in target ops.
+    auto originatesFromAutomapGlobal = [&](mlir::Value varPtr) -> bool {
+      if (auto decl = mlir::dyn_cast_or_null<hlfir::DeclareOp>(
+              varPtr.getDefiningOp())) {
+        if (auto addrOp = mlir::dyn_cast_or_null<fir::AddrOfOp>(
+                decl.getMemref().getDefiningOp())) {
+          if (auto g =
+                  mlir::SymbolTable::lookupNearestSymbolFrom<fir::GlobalOp>(
+                      decl, addrOp.getSymbol()))
+            return automapGlobals.contains(g);
+        }
+      }
+      return false;
+    };
+
+    module.walk([&](mlir::omp::TargetOp target) {
+      // Collect candidates to move: descriptor maps of automapped globals.
+      llvm::SmallVector<mlir::Value> newMapOps;
+      llvm::SmallVector<unsigned> removedIndices;
+      llvm::SmallVector<mlir::Value> movedToHDA;
+      llvm::SmallVector<mlir::BlockArgument> oldMapArgsForMoved;
+
+      auto mapRange = target.getMapVars();
+      newMapOps.reserve(mapRange.size());
+
+      auto argIface = llvm::dyn_cast<mlir::omp::BlockArgOpenMPOpInterface>(
+          target.getOperation());
+      llvm::ArrayRef<mlir::BlockArgument> mapBlockArgs =
+          argIface.getMapBlockArgs();
+
+      for (auto [idx, mapVal] : llvm::enumerate(mapRange)) {
+        auto mapOp =
+            mlir::dyn_cast<mlir::omp::MapInfoOp>(mapVal.getDefiningOp());
+        if (!mapOp) {
+          newMapOps.push_back(mapVal);
+          continue;
+        }
+
+        mlir::Type varTy = fir::unwrapRefType(mapOp.getVarType());
+        bool isDescriptor = mlir::isa<fir::BaseBoxType>(varTy);
+        if (isDescriptor && originatesFromAutomapGlobal(mapOp.getVarPtr())) {
+          movedToHDA.push_back(mapVal);
+          removedIndices.push_back(idx);
+          oldMapArgsForMoved.push_back(mapBlockArgs[idx]);
+        } else {
+          newMapOps.push_back(mapVal);
+        }
+      }
+
+      if (movedToHDA.empty())
+        return;
+
+      // Update map vars to exclude moved entries.
+      mlir::MutableOperandRange mapMutable = target.getMapVarsMutable();
+      mapMutable.assign(newMapOps);
+
+      // Append moved entries to has_device_addr and insert corresponding block
+      // arguments.
+      mlir::MutableOperandRange hdaMutable =
+          target.getHasDeviceAddrVarsMutable();
+      llvm::SmallVector<mlir::Value> newHDA;
+      newHDA.reserve(hdaMutable.size() + movedToHDA.size());
+      llvm::for_each(hdaMutable.getAsOperandRange(),
+                     [&](mlir::Value v) { newHDA.push_back(v); });
+
+      unsigned hdaStart = argIface.getHasDeviceAddrBlockArgsStart();
+      unsigned oldHdaCount = argIface.numHasDeviceAddrBlockArgs();
+      llvm::SmallVector<mlir::BlockArgument> newHDAArgsForMoved;
+      unsigned insertIndex = hdaStart + oldHdaCount;
+      for (mlir::Value v : movedToHDA) {
+        newHDA.push_back(v);
+        target->getRegion(0).insertArgument(insertIndex, v.getType(),
+                                            v.getLoc());
+        // Capture the newly inserted block argument.
+        newHDAArgsForMoved.push_back(
+            target->getRegion(0).getArgument(insertIndex));
+        insertIndex++;
+      }
+      hdaMutable.assign(newHDA);
+
+      // Redirect uses in the region: replace old map block args with the
+      // corresponding new has_device_addr block args.
+      for (auto [oldArg, newArg] :
+           llvm::zip_equal(oldMapArgsForMoved, newHDAArgsForMoved))
+        oldArg.replaceAllUsesWith(newArg);
+
+      // Finally, erase corresponding map block arguments (descending order).
+      unsigned mapStart = argIface.getMapBlockArgsStart();
+      // Convert indices to absolute argument numbers before erasing.
+      llvm::SmallVector<unsigned> absArgNos;
+      absArgNos.reserve(removedIndices.size());
+      for (unsigned idx : removedIndices)
+        absArgNos.push_back(mapStart + idx);
+      std::sort(absArgNos.begin(), absArgNos.end(), std::greater<>());
+      for (unsigned absNo : absArgNos)
+        target->getRegion(0).eraseArgument(absNo);
+    });
   }
 };
 } // namespace
diff --git a/offload/libomptarget/omptarget.cpp b/offload/libomptarget/omptarget.cpp
index a1950cbb62908..02c83b4a51078 100644
--- a/offload/libomptarget/omptarget.cpp
+++ b/offload/libomptarget/omptarget.cpp
@@ -33,6 +33,9 @@
 #include "llvm/Frontend/OpenMP/OMPConstants.h"
 #include "llvm/Object/ObjectFile.h"
 
+#include "flang/ISO_Fortran_binding.h"
+
+#include <algorithm>
 #include <cassert>
 #include <cstdint>
 #include <vector>
@@ -378,6 +381,34 @@ static void *calculateTargetPointeeBase(void *HstPteeBase, void *HstPteeBegin,
   return TgtPteeBase;
 }
 
+// Fortran pointer attachments treated descriptors as plain pointers, so
+// automapped arrays lose their declared bounds on the device. Recognize
+// CFI descriptors to compute their actual size before copying, ensuring the
+// full descriptor (including bounds) is transferred during attachment.
+static int64_t getFortranDescriptorSize(void **HstPtrAddr,
+                                        int64_t ReportedSize) {
+  constexpr int64_t VoidPtrSize = sizeof(void *);
+
+  if (!HstPtrAddr || ReportedSize > VoidPtrSize)
+    return ReportedSize;
+
+  const CFI_cdesc_t *Desc = reinterpret_cast<const CFI_cdesc_t *>(HstPtrAddr);
+
+  if (Desc->version != CFI_VERSION)
+    return ReportedSize;
+
+  if (Desc->rank > CFI_MAX_RANK)
+    return ReportedSize;
+
+  const char *RawDesc = reinterpret_cast<const char *>(Desc);
+  const char *DimsAddr = reinterpret_cast<const char *>(&Desc->dim);
+  size_t HeaderBytes = static_cast<size_t>(DimsAddr - RawDesc);
+  size_t DimsBytes = static_cast<size_t>(Desc->rank) * sizeof(CFI_dim_t);
+  size_t TotalBytes = HeaderBytes + DimsBytes;
+
+  return std::max<int64_t>(ReportedSize, static_cast<int64_t>(TotalBytes));
+}
+
 /// Utility function to perform a pointer attachment operation.
 ///
 /// For something like:
@@ -445,6 +476,7 @@ static int performPointerAttachment(DeviceTy &Device, AsyncInfoTy &AsyncInfo,
          "Need a valid pointer entry to perform pointer-attachment");
 
   constexpr int64_t VoidPtrSize = sizeof(void *);
+  HstPtrSize = getFortranDescriptorSize(HstPtrAddr, HstPtrSize);
   assert(HstPtrSize >= VoidPtrSize && "PointerSize is too small");
 
   void *TgtPteeBase =
diff --git a/offload/test/offloading/fortran/declare-target-automap.f90 b/offload/test/offloading/fortran/declare-target-automap.f90
index b44c0b2815274..b9c2d34c834fa 100644
--- a/offload/test/offloading/fortran/declare-target-automap.f90
+++ b/offload/test/offloading/fortran/declare-target-automap.f90
@@ -1,9 +1,6 @@
 !Offloading test for AUTOMAP modifier in declare target enter
 ! REQUIRES: flang, amdgpu
 
-! FIXME: https://github.com/llvm/llvm-project/issues/161265
-! XFAIL: amdgpu
-
 ! RUN: %libomptarget-compile-fortran-run-and-check-generic
 program automap_program
    use iso_c_binding, only: c_loc

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes Fortran automap handling in OpenMP target regions by addressing lost bounds information for automapped Fortran arrays on the device. The changes ensure that Fortran descriptor metadata, including array bounds, is properly transferred during pointer attachment operations.

Key changes:

  • Move automapped Fortran descriptors from map() to has_device_addr in target regions
  • Add CFI descriptor detection in libomptarget to compute correct descriptor sizes
  • Remove test failure expectations for the now-fixed functionality

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
offload/test/offloading/fortran/declare-target-automap.f90 Removes XFAIL directive as the test now passes
offload/libomptarget/omptarget.cpp Adds CFI descriptor size calculation for proper Fortran metadata transfer
flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp Implements logic to move automapped descriptors from map to has_device_addr

@TIFitis
Copy link
Member Author

TIFitis commented Oct 14, 2025

Ping for review.

@agozillon
Copy link
Contributor

I'm happy with the compiler changes, but I think we should drop the runtime changes and keep the test XFAIL'd for the moment.

The runtime fix is trying to address a problem that will be resolved without a runtime change when swapping to attach mapping, which will happen in the near future if I can land the PR stack I have up finally and can advance it forward.

I've done a bit of a deep dive into this and the runtime fix appears to be trying to address two things, the main thing is that we don't actually support declare target to upstream fully until the aforementioned PR stack lands, so what we're actually doing in this test currently is still using a kernel input argument instead of the actual declare target global that's generated by the compiler and should be used in it's place (PR stack addresses this and a few dozen other things).

Currently we map allocatables as a form of record type/C struct, where we have the descriptor as a parent, and the base address as a pointer map member that points to some data, like the below:

omptarget --> Entry 0: Base=0x00005e1e0039c0a0, Begin=0x00005e1e0039c0a0, Size=48, Type=0x0, Name=/home/agozillo/git/aomp22.0/llvm-project/offload/test/offloading/fortran/declare-target-automap.f90
omptarget --> Entry 1: Base=0x00005e1e0039c0a0, Begin=0x00005e1e0039c0a0, Size=48, Type=0x1000000000001, Name=/home/agozillo/git/aomp22.0/llvm-project/offload/test/offloading/fortran/declare-target-automap.f90
omptarget --> Entry 2: Base=0x00005e1e0039c0a0, Begin=0x00005e1dfedc8c90, Size=8, Type=0x1000000000001, Name=/home/agozillo/git/aomp22.0/llvm-project/offload/test/offloading/fortran/declare-target-automap.f90
omptarget --> Entry 3: Base=0x00005e1dfedc8c90, Begin=0x00005e1e0039c0a0, Size=40, Type=0x1000000000011, Name=/home/agozillo/git/aomp22.0/llvm-project/offload/test/offloading/fortran/declare-target-automap.f90

The 8-bytes is what the OMP_PTR_AND_OBJ flag gets applied to and eventually runs through the runtime changes we add in this PR. The change is effectively tricking the runtime into thinking the 8-byte mapping should map the same bits as Entry 1.

Later the has_device_addr map we give to the compiler has the same address as entry 2's begin and entry 3's base address, which is likely confusing the runtime a little. It likely should have the same address as entry 0/1 (but this alone isn't going to fix the issue I think).

But I believe the main issue is we've made an example that depends on declare target to/enter, which is only partially working upstream at the moment and we're trying to fix an issue that can't be addressed correctly/fully yet. So my thoughts are we keep the test xfailed for a bit longer, and land the compiler changes and wait a bit for the rest of the support to land to revisit this (can revisit when the PR stack lands to see if it'll work with just that, but I've verified that this will work downstream with the compiler changes in this PR and the attach map work) :-)

@TIFitis
Copy link
Member Author

TIFitis commented Oct 15, 2025

@agozillon Thanks for the review and explanation. In that case I'll hold off on landing this PR and keep an eye out for when your changes land. Once landed, I'll rebase this PR and verify the test works before landing it.

Thanks.

@agozillon
Copy link
Contributor

Thank you very much for the understanding and great idea @TIFitis I'll drop a message in here when it's landed, it unfortunately might be a while as the review is a little held up at the moment unfortunately and then a subsequent PR will need to land as well. Was very cool to see the runtime addition for checking the descriptors!

@abhinavgaba
Copy link
Contributor

abhinavgaba commented Oct 15, 2025

I agree, please avoid the runtime changes. libomptarget should not need to internally detect descriptor shapes/sizes.

Currently we map allocatables as a form of record type/C struct, where we have the descriptor as a parent, and the base address as a pointer map member that points to some data, like the below:

omptarget --> Entry 0: Base=0x00005e1e0039c0a0, Begin=0x00005e1e0039c0a0, Size=48, Type=0x0...
omptarget --> Entry 1: Base=0x00005e1e0039c0a0, Begin=0x00005e1e0039c0a0, Size=48, Type=0x1000000000001,...
omptarget --> Entry 2: Base=0x00005e1e0039c0a0, Begin=0x00005e1dfedc8c90, Size=8, Type=0x1000000000001, ...
omptarget --> Entry 3: Base=0x00005e1dfedc8c90, Begin=0x00005e1e0039c0a0, Size=40, Type=0x1000000000011, ...

Once Andrew's flang changes to use the ATTACH map-type in the codegen are upstreamed, then automap(automap_array) can just be translated into map(alloc: automap_array(:)) on any allocate statement.

&pointee_of_automap_array, &pointee_of_automap_array, size_of_pointee, ALLOC
&descriptor_or_pointer_of_automap_array, &pointee_of_automap_array, size_of_descriptor_or_pointer, ATTACH

Which would imply mapping the pointee data, and doing an attachment to a descriptor that's already present on the device using the ATTACH map-type bit. The ATTACH map-type already accepts the size of the descriptor as the size argument (the assumption it makes is that the pointee address is in the first field of the descriptor).

That way, the descriptor for the declare-target version of automap_array will get updated on device.

It is not correct to use has_device_addr to pass the value in, because has_device_addr only works for the lexical scope of the target construct. A declare target variable may be accessed in a declare target subroutine. So the device version of the descriptor has to be updated using the ATTACH map as shown above.

@vzakhari
Copy link
Contributor

I second Abhinav's assessment. I believe libomptarget is supposed to be language independent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flang:fir-hlfir flang Flang issues not falling into any other category offload

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants