[mlir] Use MLIR op names when generating FileCheck variables in generate-test-checks.py #160820

banach-space · 2025-09-26T07:23:31Z

Motivation

Improve readability and stability of autogenerated CHECK lines by using
operation-aware FileCheck variable names instead of generic VAL_N.

What changes

When possible, variable names are derived from the MLIR op name, e.g.
vector.transfer_read → TRANSFER_READ_0.
Unknown ops (e.g., from out-of-tree dialects) fall back to the prior
VAL_N scheme.

Before

  // CHECK: %[[VAL_4:.*]] = vector.transfer_read ...
  // CHECK: %[[VAL_5:.*]] = "val_use"(%[[VAL_4]]) : ...

After

  // CHECK: %[[TRANSFER_READ_0:.*]] = vector.transfer_read ...
  // CHECK: %[[VAL_1:.*]] = "val_use"(%[[TRANSFER_READ_0]]) : ...

Rationale

Using op-derived names (e.g., TRANSFER_READ_0) makes tests easier to
read and audit, while remaining more stable across unrelated edits (e.g.
there will always be fewer TRANSFER_READ_#N variables than VAL_#N).
The fallback to VAL_N preserves compatibility for unknown ops.

llvmbot · 2025-09-26T07:24:05Z

@llvm/pr-subscribers-mlir

Author: Andrzej Warzyński (banach-space)

Changes

Motivation

Improve readability and stability of autogenerated CHECK lines by using
operation-aware FileCheck variable names instead of generic VAL_N.

What changes

When possible, variable names are derived from the MLIR op name, e.g.
vector.transfer_read → TRANSFER_READ_0.
Unknown ops (e.g., from out-of-tree dialects) fall back to the prior
VAL_N scheme.

Before

  // CHECK: %[[VAL_4:.*]] = vector.transfer_read ...
  // CHECK: %[[VAL_5:.*]] = "val_use"(%[[VAL_4]]) : ...

After

  // CHECK: %[[TRANSFER_READ_0:.*]] = vector.transfer_read ...
  // CHECK: %[[VAL_1:.*]] = "val_use"(%[[TRANSFER_READ_0]]) : ...

Rationale

Using op-derived names (e.g., TRANSFER_READ_0) makes tests easier to
read and audit, while remaining more stable across unrelated edits (e.g.
there will always be fewer TRANSFER_READ_#N variables than VAL_#N).
The fallback to VAL_N preserves compatibility for unknown ops.

Full diff: https://github.com/llvm/llvm-project/pull/160820.diff

1 Files Affected:

(modified) mlir/utils/generate-test-checks.py (+30-4)

diff --git a/mlir/utils/generate-test-checks.py b/mlir/utils/generate-test-checks.py
index f80a1811f418c..02ed4a24e122c 100755
--- a/mlir/utils/generate-test-checks.py
+++ b/mlir/utils/generate-test-checks.py
@@ -31,6 +31,7 @@
 import os  # Used to advertise this file's name ("autogenerated_note").
 import re
 import sys
+from collections import Counter
 
 ADVERT_BEGIN = "// NOTE: Assertions have been autogenerated by "
 ADVERT_END = """
@@ -45,6 +46,14 @@
 SSA_RE_STR = "[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*"
 SSA_RE = re.compile(SSA_RE_STR)
 
+# Regex matching `dialect.op_name`, where `dialect` is an upstream MLIR
+# dialect (e.g. `vector.transfer_read`).
+DIALECTS = "acc|affine|amdgpu|amx|arith|arm_neon|arm_sve|arm_sme|async|bufferization|cf|complex|dlti|emitc|\
+    func|gpu|index|irdl|linalg|llvm|math|memref|ml_program|mpi|nvgpu|nvvm|omp|pdl_interp|pdl|ptr|quant|\
+    rocdl|scf|shape|shard|smt|sparse_tensor|tensor|ub|vcix|vector|wasmssa|x86vector|xegpu|xevm|spirv|tosa|\
+    transform"
+SSA_OP_NAME_RE = re.compile(rf'\b(?:{DIALECTS})[.]([a-z_]+)\b')
+
 # Regex matching the left-hand side of an assignment
 SSA_RESULTS_STR = r'\s*(%' + SSA_RE_STR + r')(\s*,\s*(%' + SSA_RE_STR + r'))*\s*='
 SSA_RESULTS_RE = re.compile(SSA_RESULTS_STR)
@@ -63,7 +72,12 @@
 class VariableNamer:
     def __init__(self, variable_names):
         self.scopes = []
+        # Counter for generic FileCHeck names, e.g. VAL_#N
         self.name_counter = 0
+        # Counters for FileCheck names derived from Op names, e.g.
+        # TRANSFER_READ_#N (based on `vector.transfer_read`). Note, there's a
+        # dedicated counter for every Op type present in the input.
+        self.op_name_counter = Counter()
 
         # Number of variable names to still generate in parent scope
         self.generate_in_parent_scope_left = 0
@@ -77,7 +91,7 @@ def generate_in_parent_scope(self, n):
         self.generate_in_parent_scope_left = n
 
     # Generate a substitution name for the given ssa value name.
-    def generate_name(self, source_variable_name, use_ssa_name):
+    def generate_name(self, source_variable_name, use_ssa_name, op_name = ""):
 
         # Compute variable name
         variable_name = self.variable_names.pop(0) if len(self.variable_names) > 0 else ''
@@ -86,8 +100,16 @@ def generate_name(self, source_variable_name, use_ssa_name):
             # a FileCHeck substation string. As FileCheck requires these
             # strings to start with a character, skip MLIR variables starting
             # with a digit (e.g. `%0`).
+            #
+            # The next fallback option is to use the op name, if the
+            # corresponding match succeeds.
+            #
+            # If neither worked, use a generic name: `VAL_#N`.
             if use_ssa_name and source_variable_name[0].isalpha():
                 variable_name = source_variable_name.upper()
+            elif op_name != "":
+                variable_name = op_name.upper() + "_" + str(self.op_name_counter[op_name])
+                self.op_name_counter[op_name] += 1
             else:
                 variable_name = "VAL_" + str(self.name_counter)
                 self.name_counter += 1
@@ -123,6 +145,7 @@ def num_scopes(self):
     def clear_names(self):
         self.name_counter = 0
         self.used_variable_names = set()
+        self.op_name_counter.clear()
 
 class AttributeNamer:
 
@@ -170,8 +193,10 @@ def process_line(line_chunks, variable_namer, use_ssa_name=False, strict_name_re
 
     # Process the rest that contained an SSA value name.
     for chunk in line_chunks:
-        m = SSA_RE.match(chunk)
-        ssa_name = m.group(0) if m is not None else ''
+        ssa = SSA_RE.match(chunk)
+        op_name_with_dialect = SSA_OP_NAME_RE.search(chunk)
+        ssa_name = ssa.group(0) if ssa is not None else ''
+        op_name = op_name_with_dialect.group(1) if op_name_with_dialect is not None else ''
 
         # Check if an existing variable exists for this name.
         variable = None
@@ -185,7 +210,8 @@ def process_line(line_chunks, variable_namer, use_ssa_name=False, strict_name_re
             output_line += "%[[" + variable + "]]"
         else:
             # Otherwise, generate a new variable.
-            variable = variable_namer.generate_name(ssa_name, use_ssa_name)
+            variable = variable_namer.generate_name(ssa_name, use_ssa_name,
+                                                    op_name)
             if strict_name_re:
                 # Use stricter regexp for the variable name, if requested.
                 # Greedy matching may cause issues with the generic '.*'

github-actions · 2025-09-26T07:27:18Z

✅ With the latest revision this PR passed the Python code formatter.

…ate-test-checks.py Motivation ---------- Improve readability and stability of autogenerated CHECK lines by using operation-aware FileCheck variable names instead of generic VAL_N. What changes ------------ - When possible, variable names are derived from the MLIR op name, e.g. `vector.transfer_read` → `TRANSFER_READ_0`. - Unknown ops (e.g., from out-of-tree dialects) fall back to the prior `VAL_N` scheme. Before ------ ```mlir // CHECK: %[[VAL_4:.*]] = vector.transfer_read ... // CHECK: %[[VAL_5:.*]] = "val_use"(%[[VAL_4]]) : ... ``` After ----- ```mlir // CHECK: %[[TRANSFER_READ_0:.*]] = vector.transfer_read ... // CHECK: %[[VAL_1:.*]] = "val_use"(%[[TRANSFER_READ_0]]) : ... ``` Rationale --------- Using op-derived names (e.g., `TRANSFER_READ_0`) makes tests easier to read and audit, while remaining more stable across unrelated edits (e.g. there will always be fewer `TRANSFER_READ_#N` variables than `VAL_#N`). The fallback to `VAL_N` preserves compatibility for unknown ops.

joker-eph · 2025-09-26T12:31:13Z

I concerned about this because the script is not meant to provide something readable in a "ready to use" kind of way.

We shouldn't have tests with [[VAL0]] but we equally should not have [[ADDI]] : instead the author should use descriptive names in the context of the test.

Making the script better here may be counter-productive in that it would be encouraging people to not pay more attention to the naming. And reviewer are also less likely to notice when the output of the script is used "as-is".

michaelmaitland · 2025-09-26T12:53:58Z

I concerned about this because the script is not meant to provide something readable in a "ready to use" kind of way.

We shouldn't have tests with [[VAL0]] but we equally should not have [[ADDI]] : instead the author should use descriptive names in the context of the test.

Why is that the case? If tests shouldn't have VAL0 nor ADDI, then what should they have?

This is how the llvm update_test_checks script works, and its quite helpful. I am in favor of this change.

I concerned about this because the script is not meant to provide something readable in a "ready to use" kind of way.

Also, the update_test_checks are intended to provide something that is ready to use. Why do we feel differently about the MLIR version? I consider the checks this script generates as "ready to use" -- same goes for folks on the LLVM side.

joker-eph · 2025-09-26T13:27:05Z

Why is that the case? If tests shouldn't have VAL0 nor ADDI, then what should they have?

Semantically meaningful names.

This is how the llvm update_test_checks script works, and its quite helpful.

It may be helpful for quickly writing tests: these are however fragile and unmaintainable. "Tests should be minimal, and only check what is absolutely necessary."

Also, the update_test_checks are intended to provide something that is ready to use. Why do we feel differently about the MLIR version? I consider the checks this script generates as "ready to use" -- same goes for folks on the LLVM side.

I don't know how LLVM got there, or what is the rationale behind this.
I personally strongly object to having tests are not minimally checking the output (that is remove everything that isn't relevant to the check. This seems fundamentally incompatible with scripting.

banach-space · 2025-09-26T15:47:48Z

Thanks for taking a look and for the feedback!

I concerned about this because the script is not meant to provide something readable in a "ready to use" kind of way.

Note, the high-level rationale behind this script does not change:

llvm-project/mlir/utils/generate-test-checks.py

Lines 22 to 23 in 0d989b2

    
           The script is designed to make adding checks to a test case fast, it is *not* 
        
           designed to be authoritative about what constitutes a good test!

I personally strongly object to having tests are not minimally checking the output

Agreed - that’s exactly what we document under FileCheck Best Practices. This PR doesn’t change that principle; it only tweaks how placeholders are named.

Making the script better here may be counter-productive in that it would be encouraging people to not pay more attention to the naming.

To me, moving from VAL_0 to something like ADDI_0 is a small step in the right direction, and probably sufficient in many cases. Ultimately, it’s contributors and reviewers who make sure our Testing Guide is followed.

Totally agreed that no script or automation should replace human input. Personally though, I’d much rather see [[ADDI_0]] than [[VAL_0]] when the human input is skipped.

Btw, I’ll also highlight the Testing Guide at LLVM Dev to reinforce it.

Just a heads up: I’ll be offline most of next week, so my responses will be slower than usual.

joker-eph · 2025-09-26T15:50:12Z

Can we update

llvm-project/mlir/utils/generate-test-checks.py

Lines 36 to 41 in 0d989b2

    
           ADVERT_END = """ 
        
           // The script is designed to make adding checks to 
        
           // a test case fast, it is *not* designed to be authoritative 
        
           // about what constitutes a good test! The CHECK should be 
        
           // minimized and named to reflect the test intent. 
        
           """

to be even more explicit about reducing the test case and improving the naming?
(Also including a link to the testing guideline alongside)

vzakhari · 2025-09-26T15:57:40Z

I like this change! It will be a bit easier to rename the check variables with meaningful names, if they are more verbose initially.

I also support the updates suggested by Mehdi.

amirBish · 2025-09-26T20:06:17Z

The change looks good!

I wonder if there is an easy way to match also the unknown dialects in a conservative way, so downstream users can get additional benefits from it, and we will not have a hardcoded list of dialects in the modified script.

joker-eph · 2025-09-26T21:10:02Z

mlir/utils/generate-test-checks.py

Why this list of dialects? The MLIR syntax is quite strict: there is necessarily a = after the SSA value(s) and then the op name is always formed by dialect.op_name . The RE should be able to find this without hardcoding any list.

jpienaar · 2025-09-30T06:02:58Z

mlir/utils/generate-test-checks.py

+    func|gpu|index|irdl|linalg|llvm|math|memref|ml_program|mpi|nvgpu|nvvm|omp|pdl_interp|pdl|ptr|quant|\
+    rocdl|scf|shape|shard|smt|sparse_tensor|tensor|ub|vcix|vector|wasmssa|x86vector|xegpu|xevm|spirv|tosa|\
+    transform"
+SSA_OP_NAME_RE = re.compile(rf"\b(?:{DIALECTS})[.]([a-z_]+)\b")


What happens in the case where the dialect is elided? +1 to removing list of dialects.

What happens in the case where the dialect is elided?

In the input IR? Is it ever?

+1 to removing list of dialects.

Done

jpienaar

Folks should still refine test post generation indeed, but this makes the changes they need to make potentially smaller/often sufficient names and so makes it easier to trim the rest.

banach-space · 2025-10-05T16:32:40Z

Hi folks, I've removed the hard-coded list of dialects and update the note generated by the script, thanks for the suggestions!

I wonder if there is an easy way to match also the unknown dialects in a conservative way, so downstream users can get additional benefits from it, and we will not have a hardcoded list of dialects in the modified script.

The latest revision will work for downstream dialects as well :)

llvmbot added the mlir label Sep 26, 2025

banach-space requested review from River707 and vzakhari September 26, 2025 07:23

banach-space requested a review from michaelmaitland September 26, 2025 07:24

banach-space force-pushed the andrzej/update_generate_checks branch from bd34708 to 70cb5b6 Compare September 26, 2025 09:07

joker-eph reviewed Sep 26, 2025

View reviewed changes

jpienaar reviewed Sep 30, 2025

View reviewed changes

Update ADVERT_END note and remove the list of dialects

43ded5b

[mlir] Use MLIR op names when generating FileCheck variables in generate-test-checks.py #160820

Are you sure you want to change the base?

[mlir] Use MLIR op names when generating FileCheck variables in generate-test-checks.py #160820

Uh oh!

Conversation

banach-space commented Sep 26, 2025

Motivation

What changes

Before

After

Rationale

Uh oh!

llvmbot commented Sep 26, 2025

Motivation

What changes

Before

After

Rationale

Uh oh!

github-actions bot commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joker-eph commented Sep 26, 2025

Uh oh!

michaelmaitland commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joker-eph commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

banach-space commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joker-eph commented Sep 26, 2025

Uh oh!

vzakhari commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

amirBish commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joker-eph Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

jpienaar Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

banach-space Oct 5, 2025

Choose a reason for hiding this comment

Uh oh!

jpienaar left a comment

Choose a reason for hiding this comment

Uh oh!

banach-space commented Oct 5, 2025

Uh oh!

Uh oh!

github-actions bot commented Sep 26, 2025 •

edited

Loading

michaelmaitland commented Sep 26, 2025 •

edited

Loading

joker-eph commented Sep 26, 2025 •

edited

Loading

banach-space commented Sep 26, 2025 •

edited

Loading

vzakhari commented Sep 26, 2025 •

edited

Loading

amirBish commented Sep 26, 2025 •

edited

Loading