-
Notifications
You must be signed in to change notification settings - Fork 183
cmd-diff: extract files instead of using FUSE. #4333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Using the FUSE mount was unstable in my testing. This approach use more disk space but is faster. Files will also survive under `tmp-diff` will also still be there after git diff returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors cmd-diff
to replace the unstable FUSE-based mounting of disk images with a more robust approach that extracts files using libguestfs
and tar
. This aligns with the goal of improving stability and speed. The changes also introduce new generic command-line arguments, --artifact
and --artifact-part-table
, for diffing artifacts.
My review has identified a critical issue where the functions to handle these new arguments are missing, which will cause the script to crash. I've also pointed out a medium-severity issue in the new diff_metal
implementation related to exception handling that could mask the root cause of errors. Addressing these points will ensure the new functionality is complete and robust.
if args.artifact: | ||
diff_artifact(diff_from, diff_to, args.artifact) | ||
if args.artifact_part_table: | ||
diff_artifact_partitions(diff_from, diff_to, args.artifact_part_table) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functions diff_artifact
and diff_artifact_partitions
are called here but are not defined anywhere in the file. This will cause a NameError
at runtime when using the --artifact
or --artifact-part-table
arguments.
To fix this, you need to implement these functions. They should likely act as dispatchers to the specific diffing functions based on the artifact name. For example:
# A map of artifact names to their diffing functions
ARTIFACT_DIFFERS = {
'metal': diff_metal,
}
# A map of artifact names to their partition table diffing functions
ARTIFACT_PARTITION_DIFFERS = {
'metal': diff_metal_partitions,
}
def diff_artifact(diff_from, diff_to, artifact_name):
"""Looks up and calls the diffing function for a given artifact."""
differ = ARTIFACT_DIFFERS.get(artifact_name)
if not differ:
raise ValueError(f"No differ found for artifact: {artifact_name}")
print(f"Diffing {artifact_name} artifact...")
differ(diff_from, diff_to)
def diff_artifact_partitions(diff_from, diff_to, artifact_name):
"""Looks up and calls the partition diffing function for a given artifact."""
differ = ARTIFACT_PARTITION_DIFFERS.get(artifact_name)
if not differ:
raise ValueError(f"No partition differ found for artifact: {artifact_name}")
print(f"Diffing {artifact_name} artifact partition table...")
differ(diff_from, diff_to)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Maybe you didn't mean to include this bit in this PR?
finally: | ||
if g: | ||
g.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an exception occurs in the try
block, and g.close()
also raises an exception in this finally
block, the original exception will be hidden. This can make debugging more difficult.
To make the error handling more robust, you can wrap the call to g.close()
in its own try...except
block. This ensures that an error during cleanup doesn't mask the root cause of a failure.
finally: | |
if g: | |
g.close() | |
finally: | |
if g: | |
try: | |
g.close() | |
except Exception as e: | |
print(f"Error closing guestfs for {image_path}, ignoring: {e}", file=sys.stderr) |
finally: | ||
if g: | ||
g.close() | ||
diff_cmd_outputs(['sfdisk', '--dump'], metal_from, metal_to) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is probably worth its own commit (i.e. I'm interested in the reasoning and future people may be interested in that reasoning too).
|
||
for image_path, target_dir in [(metal_from, dir_from), (metal_to, dir_to)]: | ||
if os.path.exists(target_dir): | ||
shutil.rmtree(target_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I run cosa diff --metal && cosa diff --metal
will the results of the previous run survive? i.e. it would be nice to detect if the work had already been done and use the previous work here.
|
||
g = None | ||
try: | ||
g = guestfs.GuestFS(python_return_dict=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we're still using GuestFS here, which is completely fine (I'm not sure of another unprivileged way to mount things anyway), but I'm wondering if the previous problems about all files not being detected/used are taken care of with this PR?
i.e. if we're still using GuestFS, what's different now that fixes the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was because I had I/O errors with FUSE so I thought dumping everything would be more reliable.
I can't say if it's more reliable, i haven't done enough testing
excludes = ["*backingFsBlockDev"] | ||
|
||
with tempfile.NamedTemporaryFile(suffix=".tar", delete=True) as tmp_tar: | ||
g.tar_out("/", tmp_tar.name, xattrs=True, selinux=True, excludes= excludes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
g.tar_out("/", tmp_tar.name, xattrs=True, selinux=True, excludes= excludes) | |
g.tar_out("/", tmp_tar.name, xattrs=True, selinux=True, excludes=excludes) |
Using the FUSE mount was unstable in my testing. This approach use more disk space but is faster.
Files will also survive under
tmp-diff
will also still be there after git diff returns.