-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[mlir] Use MLIR op names when generating FileCheck variables in generate-test-checks.py #160820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[mlir] Use MLIR op names when generating FileCheck variables in generate-test-checks.py #160820
Conversation
@llvm/pr-subscribers-mlir Author: Andrzej Warzyński (banach-space) ChangesMotivationImprove readability and stability of autogenerated CHECK lines by using What changes
Before // CHECK: %[[VAL_4:.*]] = vector.transfer_read ...
// CHECK: %[[VAL_5:.*]] = "val_use"(%[[VAL_4]]) : ... After // CHECK: %[[TRANSFER_READ_0:.*]] = vector.transfer_read ...
// CHECK: %[[VAL_1:.*]] = "val_use"(%[[TRANSFER_READ_0]]) : ... RationaleUsing op-derived names (e.g., Full diff: https://github.com/llvm/llvm-project/pull/160820.diff 1 Files Affected:
diff --git a/mlir/utils/generate-test-checks.py b/mlir/utils/generate-test-checks.py
index f80a1811f418c..02ed4a24e122c 100755
--- a/mlir/utils/generate-test-checks.py
+++ b/mlir/utils/generate-test-checks.py
@@ -31,6 +31,7 @@
import os # Used to advertise this file's name ("autogenerated_note").
import re
import sys
+from collections import Counter
ADVERT_BEGIN = "// NOTE: Assertions have been autogenerated by "
ADVERT_END = """
@@ -45,6 +46,14 @@
SSA_RE_STR = "[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*"
SSA_RE = re.compile(SSA_RE_STR)
+# Regex matching `dialect.op_name`, where `dialect` is an upstream MLIR
+# dialect (e.g. `vector.transfer_read`).
+DIALECTS = "acc|affine|amdgpu|amx|arith|arm_neon|arm_sve|arm_sme|async|bufferization|cf|complex|dlti|emitc|\
+ func|gpu|index|irdl|linalg|llvm|math|memref|ml_program|mpi|nvgpu|nvvm|omp|pdl_interp|pdl|ptr|quant|\
+ rocdl|scf|shape|shard|smt|sparse_tensor|tensor|ub|vcix|vector|wasmssa|x86vector|xegpu|xevm|spirv|tosa|\
+ transform"
+SSA_OP_NAME_RE = re.compile(rf'\b(?:{DIALECTS})[.]([a-z_]+)\b')
+
# Regex matching the left-hand side of an assignment
SSA_RESULTS_STR = r'\s*(%' + SSA_RE_STR + r')(\s*,\s*(%' + SSA_RE_STR + r'))*\s*='
SSA_RESULTS_RE = re.compile(SSA_RESULTS_STR)
@@ -63,7 +72,12 @@
class VariableNamer:
def __init__(self, variable_names):
self.scopes = []
+ # Counter for generic FileCHeck names, e.g. VAL_#N
self.name_counter = 0
+ # Counters for FileCheck names derived from Op names, e.g.
+ # TRANSFER_READ_#N (based on `vector.transfer_read`). Note, there's a
+ # dedicated counter for every Op type present in the input.
+ self.op_name_counter = Counter()
# Number of variable names to still generate in parent scope
self.generate_in_parent_scope_left = 0
@@ -77,7 +91,7 @@ def generate_in_parent_scope(self, n):
self.generate_in_parent_scope_left = n
# Generate a substitution name for the given ssa value name.
- def generate_name(self, source_variable_name, use_ssa_name):
+ def generate_name(self, source_variable_name, use_ssa_name, op_name = ""):
# Compute variable name
variable_name = self.variable_names.pop(0) if len(self.variable_names) > 0 else ''
@@ -86,8 +100,16 @@ def generate_name(self, source_variable_name, use_ssa_name):
# a FileCHeck substation string. As FileCheck requires these
# strings to start with a character, skip MLIR variables starting
# with a digit (e.g. `%0`).
+ #
+ # The next fallback option is to use the op name, if the
+ # corresponding match succeeds.
+ #
+ # If neither worked, use a generic name: `VAL_#N`.
if use_ssa_name and source_variable_name[0].isalpha():
variable_name = source_variable_name.upper()
+ elif op_name != "":
+ variable_name = op_name.upper() + "_" + str(self.op_name_counter[op_name])
+ self.op_name_counter[op_name] += 1
else:
variable_name = "VAL_" + str(self.name_counter)
self.name_counter += 1
@@ -123,6 +145,7 @@ def num_scopes(self):
def clear_names(self):
self.name_counter = 0
self.used_variable_names = set()
+ self.op_name_counter.clear()
class AttributeNamer:
@@ -170,8 +193,10 @@ def process_line(line_chunks, variable_namer, use_ssa_name=False, strict_name_re
# Process the rest that contained an SSA value name.
for chunk in line_chunks:
- m = SSA_RE.match(chunk)
- ssa_name = m.group(0) if m is not None else ''
+ ssa = SSA_RE.match(chunk)
+ op_name_with_dialect = SSA_OP_NAME_RE.search(chunk)
+ ssa_name = ssa.group(0) if ssa is not None else ''
+ op_name = op_name_with_dialect.group(1) if op_name_with_dialect is not None else ''
# Check if an existing variable exists for this name.
variable = None
@@ -185,7 +210,8 @@ def process_line(line_chunks, variable_namer, use_ssa_name=False, strict_name_re
output_line += "%[[" + variable + "]]"
else:
# Otherwise, generate a new variable.
- variable = variable_namer.generate_name(ssa_name, use_ssa_name)
+ variable = variable_namer.generate_name(ssa_name, use_ssa_name,
+ op_name)
if strict_name_re:
# Use stricter regexp for the variable name, if requested.
# Greedy matching may cause issues with the generic '.*'
|
✅ With the latest revision this PR passed the Python code formatter. |
…ate-test-checks.py Motivation ---------- Improve readability and stability of autogenerated CHECK lines by using operation-aware FileCheck variable names instead of generic VAL_N. What changes ------------ - When possible, variable names are derived from the MLIR op name, e.g. `vector.transfer_read` → `TRANSFER_READ_0`. - Unknown ops (e.g., from out-of-tree dialects) fall back to the prior `VAL_N` scheme. Before ------ ```mlir // CHECK: %[[VAL_4:.*]] = vector.transfer_read ... // CHECK: %[[VAL_5:.*]] = "val_use"(%[[VAL_4]]) : ... ``` After ----- ```mlir // CHECK: %[[TRANSFER_READ_0:.*]] = vector.transfer_read ... // CHECK: %[[VAL_1:.*]] = "val_use"(%[[TRANSFER_READ_0]]) : ... ``` Rationale --------- Using op-derived names (e.g., `TRANSFER_READ_0`) makes tests easier to read and audit, while remaining more stable across unrelated edits (e.g. there will always be fewer `TRANSFER_READ_#N` variables than `VAL_#N`). The fallback to `VAL_N` preserves compatibility for unknown ops.
bd34708
to
70cb5b6
Compare
I concerned about this because the script is not meant to provide something readable in a "ready to use" kind of way. We shouldn't have tests with Making the script better here may be counter-productive in that it would be encouraging people to not pay more attention to the naming. And reviewer are also less likely to notice when the output of the script is used "as-is". |
Why is that the case? If tests shouldn't have VAL0 nor ADDI, then what should they have? This is how the llvm update_test_checks script works, and its quite helpful. I am in favor of this change.
Also, the update_test_checks are intended to provide something that is ready to use. Why do we feel differently about the MLIR version? I consider the checks this script generates as "ready to use" -- same goes for folks on the LLVM side. |
Semantically meaningful names.
It may be helpful for quickly writing tests: these are however fragile and unmaintainable. "Tests should be minimal, and only check what is absolutely necessary."
I don't know how LLVM got there, or what is the rationale behind this. |
Thanks for taking a look and for the feedback!
Note, the high-level rationale behind this script does not change: llvm-project/mlir/utils/generate-test-checks.py Lines 22 to 23 in 0d989b2
Agreed - that’s exactly what we document under FileCheck Best Practices. This PR doesn’t change that principle; it only tweaks how placeholders are named.
To me, moving from Totally agreed that no script or automation should replace human input. Personally though, I’d much rather see Btw, I’ll also highlight the Testing Guide at LLVM Dev to reinforce it. Just a heads up: I’ll be offline most of next week, so my responses will be slower than usual. |
Can we update llvm-project/mlir/utils/generate-test-checks.py Lines 36 to 41 in 0d989b2
(Also including a link to the testing guideline alongside) |
I like this change! It will be a bit easier to rename the check variables with meaningful names, if they are more verbose initially. I also support the updates suggested by Mehdi. |
The change looks good! I wonder if there is an easy way to match also the unknown dialects in a conservative way, so downstream users can get additional benefits from it, and we will not have a hardcoded list of dialects in the modified script. |
mlir/utils/generate-test-checks.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this list of dialects? The MLIR syntax is quite strict: there is necessarily a =
after the SSA value(s) and then the op name is always formed by dialect.op_name
. The RE should be able to find this without hardcoding any list.
mlir/utils/generate-test-checks.py
Outdated
func|gpu|index|irdl|linalg|llvm|math|memref|ml_program|mpi|nvgpu|nvvm|omp|pdl_interp|pdl|ptr|quant|\ | ||
rocdl|scf|shape|shard|smt|sparse_tensor|tensor|ub|vcix|vector|wasmssa|x86vector|xegpu|xevm|spirv|tosa|\ | ||
transform" | ||
SSA_OP_NAME_RE = re.compile(rf"\b(?:{DIALECTS})[.]([a-z_]+)\b") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in the case where the dialect is elided? +1 to removing list of dialects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in the case where the dialect is elided?
In the input IR? Is it ever?
+1 to removing list of dialects.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Folks should still refine test post generation indeed, but this makes the changes they need to make potentially smaller/often sufficient names and so makes it easier to trim the rest.
Hi folks, I've removed the hard-coded list of dialects and update the note generated by the script, thanks for the suggestions!
The latest revision will work for downstream dialects as well :) |
Motivation
Improve readability and stability of autogenerated CHECK lines by using
operation-aware FileCheck variable names instead of generic VAL_N.
What changes
vector.transfer_read
→TRANSFER_READ_0
.VAL_N
scheme.Before
After
Rationale
Using op-derived names (e.g.,
TRANSFER_READ_0
) makes tests easier toread and audit, while remaining more stable across unrelated edits (e.g.
there will always be fewer
TRANSFER_READ_#N
variables thanVAL_#N
).The fallback to
VAL_N
preserves compatibility for unknown ops.