diff --git a/tools/verifyio/README.md b/tools/verifyio/README.md
index b9f10e2..eb40bfd 100644
--- a/tools/verifyio/README.md
+++ b/tools/verifyio/README.md
@@ -1,36 +1,66 @@
+# VerifyIO
+VerifyIO collects execution traces, detects data conflicts, and verifies proper synchronization against specified consistency models.
+The workflow for VerifyIO can generally be divided into three independent steps (Steps 1-3). This process requires [Recorder](https://github.com/uiuc-hpc/Recorder/tree/dev), [VerifyIO](https://github.com/uiuc-hpc/Recorder/tree/dev/tools/verifyio), and the corresponding traces.
-### Verify if I/O operations are properly synchronized uder specific semantics
------------------------------
+Make sure `$RECORDER_INSTALL_PATH` is the install location of Recorder.
+Additional [scripts](https://github.com/uiuc-hpc/Recorder/blob/dev/tools/verifyio/scripts/) are available for exporting results as CSV files and visualizing them (Step 4 and Step 5). The verification can also be performed in a batched manner; the alternative steps described below handle multiple traces more efficiently.
+## Step 1: Run the application with Recorder to generate traces
-Make sure `$RECORDER_INSTALL_PATH` is the install location of Recorder.
+```bash
+mpirun -np N -env LD_PRELOAD $RECORDER_INSTALL_PATH/lib/librecorder.so ./test_mpi
+
+# On HPC systems, you may need to use srun or
+# other job schedulers to replace mpirun, e.g.,
+srun -n4 -N1 --overlap --export=ALL,LD_PRELOAD=$RECORDER_INSTALL_PATH/lib/librecorder.so ./test_mpi
+```
+For detailed information on Recorder and guidance on its usage, please refer to: https://recorder.readthedocs.io/latest/overview.html
-Steps:
-1. Run program with Recorder to generate traces.
-2. Run the conflict detector to report **potential** conflicting I/O accesses.
-   Those acesses are only potentially conflicting as here we do not take happens-before order into consideration yet.
+## Step 2: Conflict Detection
+Run the conflict detector to report **potential** conflicting I/O accesses. These accesses are only potentially conflicting because happens-before ordering is not taken into consideration yet.
+To detect conflicts, use the `conflict-detector` tool from Recorder:
-   `$RECORDER_INSTALL_PATH/bin/conflict_detector /path/to/traces`
-
-   This command will write all potential conflicts found to the file `/path/to/traces/conflicts.txt`
-
-3. Finally run the verification code, which checks if those potential conflicting operations are properly synchronzied.
+```bash
+$RECORDER_INSTALL_PATH/bin/conflict-detector /path/to/traces
+```
+This command will write all potential conflicts found to the file `/path/to/traces/conflicts.txt`.
-   The `semantics` option needs to match the one provided by the underlying file system. For example, if the traces were collected on UnifyFS, set it to "commit".
-
-   ```python
-   python ./verifyio.py -h # print out usage
-
-   #Example:
-   python ./verifyio.py /path/to/traces --semantics=mpi
-   ```
-
-
-
-#### Note on the third step:
+Alternatively, the [`auto_detect.sh`](https://github.com/uiuc-hpc/Recorder/blob/dev/tools/verifyio/scripts/auto_detect.sh) script can be used to process multiple traces and detect conflicts. The script requires the path to the directory containing the traces as an argument. Please adjust the resource-manager configuration accordingly.
+
+```bash
+./auto_detect.sh /path/to/target/traces
+```
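+
+Conceptually, two accesses are reported as potentially conflicting when they touch overlapping byte ranges of the same file from different processes and at least one of them is a write. The Python sketch below illustrates only this overlap test; the record fields are hypothetical placeholders, not the conflict-detector's actual data structures.
+
+```python
+# Illustration only: not the conflict-detector's implementation.
+def potentially_conflicting(a, b):
+    same_file = a["file"] == b["file"]
+    different_rank = a["rank"] != b["rank"]
+    # Byte ranges [offset, offset + count) overlap.
+    overlap = a["offset"] < b["offset"] + b["count"] and b["offset"] < a["offset"] + a["count"]
+    at_least_one_write = a["is_write"] or b["is_write"]
+    return same_file and different_rank and overlap and at_least_one_write
+
+write = {"file": "out.dat", "rank": 0, "offset": 0, "count": 1024, "is_write": True}
+read = {"file": "out.dat", "rank": 1, "offset": 512, "count": 1024, "is_write": False}
+print(potentially_conflicting(write, read))  # True: both ranks touch bytes 512-1023
+```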
+
+## Step 3: Semantic Verification
+The next step is to run the semantic verification using [`verifyio.py`](https://github.com/uiuc-hpc/Recorder/tree/dev/tools/verifyio). It checks whether the potentially conflicting operations found in Step 2 are properly synchronized. By default, MPI-IO semantics and the vector clock algorithm are used for verification.
+
+### Dependencies for step 3:
+
+Ensure the following dependencies are installed:
+
+#### Python Libraries
+- **recorder-viz**: For reading and visualizing Recorder traces. Install using `pip install recorder-viz`.
+- **networkx**: For creating, analyzing, and manipulating complex networks and graphs. Install using `pip install networkx`.
+
+```bash
+python /path/to/verifyio.py /path/to/traces
+```
+Available arguments:
+* `--semantics`: Specifies the I/O semantics to verify. Choices are: POSIX, MPI-IO (default), Commit, Session, Custom.
+* `--algorithm`: Specifies the algorithm for verification. Choices are: 1: graph reachability, 2: transitive closure, 3: vector clock (default), 4: on-the-fly MPI check.
+* `--semantic_string`: A custom semantic string for verification. Default is: `"c1:+1[MPI_File_close, MPI_File_sync] & c2:-1[MPI_File_open, MPI_File_sync]"`
+* `--show_details`: Displays details of the conflicts.
+* `--show_summary`: Displays a summary of the conflicts.
+* `--show_full_chain`: Displays the full call chain of the conflicts.
+
+Alternatively, the [`auto_verify.sh`](https://github.com/uiuc-hpc/Recorder/blob/dev/tools/verifyio/scripts/auto_verify.sh) script can be used for verification. This script requires the path to the directory containing the traces and the path to `verifyio.py` as arguments. By default, it verifies all supported semantics (POSIX, MPI-IO, Commit, Session) with the vector clock algorithm.
+
+```bash
+./auto_verify.sh /path/to/target/traces /path/to/verifyio.py
+```
+
+#### Note on step 3:
 
 The code first matches all MPI calls to build a graph representing the happens-before order.
-Each node in the graph represents a MPI call, if there is a path from node A to node B, then A must happens-before B.
+Each node in the graph represents an MPI call; if there is a path from node A to node B, then A must happen before B.
@@ -48,3 +78,40 @@ However, things are a little different with default MPI user-imposed semantics (
-With user-imposed semantics, the **"sync-barrier-sync"** construct is required to guarnatee sequencial consistency. Barrier can be replaced by a send-recv or the collectives listed above. Sync is one of MPI_File_open, MPI_File_close or MPI_File_sync.
+With user-imposed semantics, the **"sync-barrier-sync"** construct is required to guarantee sequential consistency. Barrier can be replaced by a send-recv or the collectives listed above. Sync is one of MPI_File_open, MPI_File_close or MPI_File_sync.
 
-Now, given two potential conflicting accesses op1 an op2, our job is to find out if there is a **sync-barrier-sync** in between.
+Now, given two potentially conflicting accesses op1 and op2, our job is to find out if there is a **sync-barrier-sync** in between.
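+
+The reachability idea can be illustrated with `networkx` (one of the dependencies listed above). The sketch below builds a tiny happens-before graph by hand and checks whether a path exists between two conflicting operations; the node labels and edges are invented for illustration, whereas `verifyio.py` derives its graph from the matched MPI calls in the trace.
+
+```python
+import networkx as nx
+
+G = nx.DiGraph()
+# Each node stands for one MPI call on one rank; an edge A -> B means A happens-before B.
+G.add_edges_from([
+    ((0, "MPI_File_write_at"), (0, "MPI_File_sync")),    # program order on rank 0
+    ((0, "MPI_File_sync"),     (0, "MPI_Barrier")),
+    ((0, "MPI_Barrier"),       (1, "MPI_Barrier")),      # synchronization edge
+    ((1, "MPI_Barrier"),       (1, "MPI_File_sync")),
+    ((1, "MPI_File_sync"),     (1, "MPI_File_read_at")), # program order on rank 1
+])
+
+op1 = (0, "MPI_File_write_at")  # conflicting write on rank 0
+op2 = (1, "MPI_File_read_at")   # conflicting read on rank 1
+
+# Reachability: op1 happens-before op2 via a sync-barrier-sync chain.
+print(nx.has_path(G, op1, op2))  # True
+```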
+
+
+## Step 4: Export Results to CSV
+
+### Dependencies for step 4 & 5:
+
+Ensure the following dependencies are installed:
+
+#### Python Libraries
+- **argparse**: For parsing command-line arguments (part of the Python standard library).
+- **pandas**: For data manipulation and analysis. Install using `pip install pandas`.
+- **seaborn**: For advanced data visualization. Install using `pip install seaborn`.
+- **matplotlib**: For creating static, animated, and interactive plots. Install using `pip install matplotlib`.
+- **numpy**: For numerical computations. Install using `pip install numpy`.
+
+VerifyIO results can be exported to CSV format for further analysis using [`verifyio_to_csv.py`](https://github.com/uiuc-hpc/Recorder/blob/dev/tools/verifyio/scripts/verifyio_to_csv.py). Use the following command, providing the VerifyIO output (usually the captured stdout of the verification run) and the path of the CSV file to create as arguments:
+
+```bash
+python verifyio_to_csv.py /path/to/verifyio/output /path/to/output.csv
+```
+Optional arguments:
+* `--dir_prefix="/prefix/to/remove"`: Removes a specified prefix from the directory paths in the output.
+* `--group_by_api`: Groups the output by API, producing one row per trace directory with a separate set of columns per verified semantics.
+
+## Step 5: Heatmap Visualization
+
+To visualize the results from VerifyIO, use the [`verifyio_plot_violation_heatmap.py`](https://github.com/uiuc-hpc/Recorder/blob/dev/tools/verifyio/scripts/verifyio_plot_violation_heatmap.py) script:
+
+```bash
+python verifyio_plot_violation_heatmap.py --file=/path/to/output.csv
+```
+Alternatively, to visualize up to three CSV files in a single figure, pass them to `--files`:
+
+```bash
+python verifyio_plot_violation_heatmap.py --files /path/to/output1.csv /path/to/output2.csv /path/to/output3.csv
+```
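+
+The CSV produced with `--group_by_api` contains one `total_semantic_violation_<semantics>` column per verified semantics; these are the columns the heatmap script reads. For a quick check without plotting, the file can also be inspected directly with pandas (the path below is a placeholder):
+
+```python
+import pandas as pd
+
+df = pd.read_csv("/path/to/output.csv")
+cols = [c for c in df.columns if c.startswith("total_semantic_violation_")]
+# List the test cases that violate at least one of the verified semantics.
+violations = df[df[cols].astype(float).gt(0).any(axis=1)]
+print(violations[["directory_name"] + cols])
+```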
+
diff --git a/tools/verifyio/scripts/auto_detect.sh b/tools/verifyio/scripts/auto_detect.sh
new file mode 100644
index 0000000..60c3caf
--- /dev/null
+++ b/tools/verifyio/scripts/auto_detect.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+if [ -z "$1" ]; then
+    echo "Usage: $0 <traces_dir>"
+    exit 1
+fi
+
+BASE_DIR="$1"
+PROGRAM="$RECORDER_INSTALL_PATH/bin/conflict-detector"
+
+if [[ ! -x "$PROGRAM" ]]; then
+    echo "Program $PROGRAM not found or is not executable."
+    exit 1
+fi
+
+for dir in "$BASE_DIR"/*/; do
+    if [ -d "$dir" ]; then
+        echo "Entering directory: $dir"
+        (
+            cd "$dir" || exit
+            $PROGRAM "$dir"
+        )
+        echo "command: $PROGRAM $dir"
+        echo "finished"
+    fi
+done
\ No newline at end of file
diff --git a/tools/verifyio/scripts/auto_verify.sh b/tools/verifyio/scripts/auto_verify.sh
new file mode 100644
index 0000000..d91d8cc
--- /dev/null
+++ b/tools/verifyio/scripts/auto_verify.sh
@@ -0,0 +1,31 @@
+#!/bin/bash
+
+
+#SBATCH -N 1
+#SBATCH -J jobname
+#SBATCH -t 01:00:00
+#SBATCH -p partition
+#SBATCH -o /path/to/out.out
+
+
+if [ -z "$1" ] || [ -z "$2" ]; then
+    echo "Usage: $0 <traces_dir> <path_to_verifyio.py>"
+    exit 1
+fi
+
+BASE_DIR="$1"
+PROGRAM="$2"
+SEMANTICS=("POSIX" "MPI-IO" "Commit" "Session")
+
+for semantic in "${SEMANTICS[@]}"; do
+    for dir in "$BASE_DIR"/*/; do
+        if [ -d "$dir" ]; then
+            echo "Entering directory: $dir"
+            (
+                cd "$dir" || exit
+                python "$PROGRAM" "$dir" "--semantics=$semantic"
+            )
+            echo "Finished --semantic=$semantic"
+        fi
+    done
+done
diff --git a/tools/verifyio/scripts/verifyio_plot_violation_heatmap.py b/tools/verifyio/scripts/verifyio_plot_violation_heatmap.py
new file mode 100644
index 0000000..f126738
--- /dev/null
+++ b/tools/verifyio/scripts/verifyio_plot_violation_heatmap.py
@@ -0,0 +1,239 @@
+import argparse
+import pandas as pd
+import seaborn as sns
+import matplotlib.pyplot as plt
+import os
+import matplotlib.gridspec as gridspec
+from matplotlib.colors import LogNorm
+import numpy as np
+
+def remove_after_last_dash(column_name):
+    parts = column_name.rsplit('-', 1)
+    return parts[0]
+
+
+def plot_multi_heat_map(file_paths):
+
+    plt.rcParams.update({
+        "text.usetex": True,
+        "font.family": "Times Roman"
+    })
+    title_fontsize = 14
+    label_fontsize = 14
+    tick_fontsize = 10
+
+    fig = plt.figure(figsize=(12, 12))
+    gs = gridspec.GridSpec(nrows=2, ncols=2, width_ratios=[1, 1])  # 2 rows, 2 columns with different widths
+
+    # Define subplot positions
+    ax1 = plt.subplot(gs[0, 0])  # Top left
+    ax2 = plt.subplot(gs[1, 0])  # Bottom left
+    ax3 = plt.subplot(gs[:, 1])  # Right column spanning both rows
+
+    axes = [ax1, ax2, ax3]
+
+    # Step 1: Calculate the global maximum value (vmax) across all datasets
+    vmax = -np.inf
+    for file_path in file_paths:
+        df = pd.read_csv(file_path)
+        heatmap_data = df.set_index('directory_name')[[
+            'total_semantic_violation_POSIX',
+            'total_semantic_violation_Commit',
+            'total_semantic_violation_Session',
+            'total_semantic_violation_MPI-IO',
+        ]]
+        vmax = max(vmax, heatmap_data.max().max())  # Find the global max across all files
+
+    # Step 2: Generate heatmaps using the global vmax
+    for i, file_path in enumerate(file_paths):
+        file_name = os.path.basename(file_path).replace('.csv', '')
+        df = pd.read_csv(file_path)
+
+        heatmap_data = df.set_index('directory_name')[[
+            'total_semantic_violation_POSIX',
+            'total_semantic_violation_Commit',
+            'total_semantic_violation_Session',
+            'total_semantic_violation_MPI-IO',
+        ]]
+        heatmap_data.index = heatmap_data.index.map(remove_after_last_dash)
+        simplified_columns = ['POSIX', 'Commit', 'Session', 'MPI-IO']
+        heatmap_data.columns = simplified_columns
+
+        # Fill missing data
+        heatmap_data.fillna(-1, inplace=True)
+
+        # Mask zeros for separate color mapping
+        masked_data = heatmap_data.copy()
+        masked_data.replace(0, np.nan, inplace=True)  # Mask zeros for the heatmap
+        grey_mask = heatmap_data == -1
+
+        ax = axes[i]
+
+        cmap = sns.color_palette("gist_heat_r", as_cmap=True)
+        cmap.set_bad('green')
+
+        # Step 3: Use the global vmax for consistent coloring across subplots
+        sns.heatmap(
+            heatmap_data,
+            annot=True,
+            fmt='g',
+            cmap=cmap,
+            linewidths=0.5,
+            norm=LogNorm(vmin=1, vmax=vmax),  # Use global vmax here
+            cbar=False,
+            annot_kws={'size': 10, 'weight': 'bold', 'color': 'white'},
+            mask=masked_data.isna(),
+            ax=ax
+        )
+
+        for t in ax.texts:
+            if t.get_text() == '0':
+                t.set_text('')
+            if t.get_text() == '-1':
+                t.set_text('')
+
+        # Grey mask for missing data
+        sns.heatmap(
+            grey_mask,
+            annot=False,
+            cmap=sns.color_palette(['grey']),
+            linewidths=0.5,
+            cbar=False,  # No color bar for grey mask
+            mask=~grey_mask,  # Mask everything except -1 values
+            ax=ax
+        )
+
+        ax.set_title(f'{file_name}', fontsize=title_fontsize)
+        ax.set_xlabel('consistency semantics', fontsize=label_fontsize)
+        ax.set_ylabel('test case', fontsize=label_fontsize)
+        ax.tick_params(axis='x', rotation=0, labelsize=tick_fontsize)
+        ax.tick_params(axis='y', labelsize=tick_fontsize)
+        ax.set_aspect('auto')
+
+    # Create a single color bar for all subplots
+    cbar_ax = fig.add_axes([0.92, 0.15, 0.02, 0.7])  # Adjust color bar position
+    sm = plt.cm.ScalarMappable(cmap=sns.color_palette("gist_heat_r", as_cmap=True), norm=LogNorm(vmin=1, vmax=vmax))
+    fig.colorbar(sm, cax=cbar_ax, label='number of data races')
+
+    plt.tight_layout(rect=[0, 0, 0.9, 1])
+    plt.savefig('heatmap.png', dpi=800, bbox_inches='tight')
+
+
+def plot_single_heat_map(file_path):
+    def remove_after_last_dash(text):
+        """Helper function to process the directory names."""
+        return text.rsplit('-', 1)[0]
+
+    plt.rcParams.update({
+        "text.usetex": True,
+        "font.family": "Times Roman"
+    })
+    title_fontsize = 14
+    label_fontsize = 14
+    tick_fontsize = 10
+
+    # Read data
+    df = pd.read_csv(file_path)
+    heatmap_data = df.set_index('directory_name')[[
+        'total_semantic_violation_POSIX',
+        'total_semantic_violation_Commit',
+        'total_semantic_violation_Session',
+        'total_semantic_violation_MPI-IO',
+    ]]
+
+    heatmap_data.index = heatmap_data.index.map(remove_after_last_dash)
+    simplified_columns = ['POSIX', 'Commit', 'Session', 'MPI-IO']
+    heatmap_data.columns = simplified_columns
+
+    # Fill missing data
+    heatmap_data.fillna(-1, inplace=True)
+
+    # Mask zeros for separate color mapping
+    masked_data = heatmap_data.copy()
+    masked_data.replace(0, np.nan, inplace=True)  # Mask zeros for the heatmap
+    grey_mask = heatmap_data == -1
+
+    # Determine vmax
+    vmax = heatmap_data.max().max()
+
+    # Plot heatmap
+    fig, ax = plt.subplots(figsize=(8, 6))
+    cmap = sns.color_palette("gist_heat_r", as_cmap=True)
+    cmap.set_bad('green')
+
+    sns.heatmap(
+        heatmap_data,
+        annot=True,
+        fmt='g',
+        cmap=cmap,
+        linewidths=0.5,
+        norm=LogNorm(vmin=1, vmax=vmax),
+        cbar=False,
+        annot_kws={'size': 10, 'weight': 'bold', 'color': 'white'},
+        mask=masked_data.isna(),
+        ax=ax
+    )
+
+    for t in ax.texts:
+        if t.get_text() == '0':
+            t.set_text('')
+        if t.get_text() == '-1':
+            t.set_text('')
+
+    # Grey mask for missing data
+    sns.heatmap(
+        grey_mask,
+        annot=False,
+        cmap=sns.color_palette(['grey']),
+        linewidths=0.5,
+        cbar=False,
+        mask=~grey_mask,
+        ax=ax
+    )
+
+    # Set labels and title
+    file_name = os.path.basename(file_path).replace('.csv', '')
+    ax.set_title(f'{file_name}', fontsize=title_fontsize)
+    ax.set_xlabel('consistency semantics', fontsize=label_fontsize)
+    ax.set_ylabel('test case', fontsize=label_fontsize)
+    ax.tick_params(axis='x', rotation=0, labelsize=tick_fontsize)
+    ax.tick_params(axis='y', labelsize=tick_fontsize)
+
+    # Create a color bar
+    cbar = fig.colorbar(
+        plt.cm.ScalarMappable(cmap=cmap, norm=LogNorm(vmin=1, vmax=vmax)),
+        ax=ax,
+        fraction=0.046,
+        pad=0.04,
+        label='number of data races'
+    )
+
+    plt.tight_layout()
+    plt.savefig(f'{file_name}_heatmap.png', dpi=800, bbox_inches='tight')
+    plt.show()
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Process csv file")
+    parser.add_argument(
+        "--file",
+        type=str,
+        default=None,
+        help="Path to a single CSV file."
+    )
+    parser.add_argument(
+        "--files",
+        type=str,
+        nargs='+',
+        default=["/p/lustre3/zhu22/traces/HDF5.csv",
+                 "/p/lustre3/zhu22/traces/NetCDF.csv",
+                 "/p/lustre3/zhu22/traces/PnetCDF.csv"],
+        help="Paths to the CSV files (space-separated)."
+    )
+    args = parser.parse_args()
+    if args.file:
+        plot_single_heat_map(args.file)
+    else:
+        plot_multi_heat_map(args.files)
+
\ No newline at end of file
diff --git a/tools/verifyio/scripts/verifyio_to_csv.py b/tools/verifyio/scripts/verifyio_to_csv.py
new file mode 100644
index 0000000..4c99bab
--- /dev/null
+++ b/tools/verifyio/scripts/verifyio_to_csv.py
@@ -0,0 +1,103 @@
+import csv
+import re
+import argparse
+import pandas as pd
+
+
+def parser(path, dir_prefix=None):
+    with open(path, "r") as file:
+        log_data = file.read()
+
+    data = []
+    dir_regex = re.compile(r"Entering directory:\s+(.+)")
+    io_time_regex = re.compile(r"Step 1. read trace records and conflicts time:\s+([\d.]+)\s+secs")
+    mpi_calls_regex = re.compile(r"Step 2. match mpi calls:\s+([\d.]+)\s+secs, mpi edges:\s+(\d+)")
+    happens_before_regex = re.compile(r"Step 3. build happens-before graph:\s+([\d.]+)\s+secs, nodes:\s+(\d+)")
+    run_the_algorithm_regex = re.compile(r"Step 4. run vector clock algorithm:\s+([\d.]+)\s+secs")
+    verification_regex = re.compile(r"semantics verification time:\s+(\d+)")
+    semantic_violation_regex = re.compile(r"Total semantic violations:\s+(\d+)")
+    conflict_pairs_regex = re.compile(r"Total conflict pairs:\s+(\d+)")
+    ap_regex = re.compile(r"Finished\s+--semantic=(POSIX|MPI-IO|Commit|Session)")
+
+    lines = log_data.strip().split("\n")
+    entry = {}
+    for line in lines:
+        line = line.strip()
+        if dir_match := dir_regex.match(line):
+            if entry:
+                data.append(entry)
+            match_str = dir_match.group(1)
+            if dir_prefix is None:
+                entry = {"directory_name": match_str}
+            else:
+                split_str = match_str.split(dir_prefix)
+                if len(split_str) > 1:
+                    entry = {"directory_name": split_str[1]}
+                else:
+                    entry = {"directory_name": None}
+        elif io_time_match := io_time_regex.search(line):
+            entry["io_time"] = io_time_match.group(1)
+        elif mpi_calls_match := mpi_calls_regex.search(line):
+            entry["match_mpi_calls"] = mpi_calls_match.group(1)
+            entry["mpi_edges"] = mpi_calls_match.group(2)
+        elif happens_before_match := happens_before_regex.search(line):
+            entry["build_happens-before_graph"] = happens_before_match.group(1)
+            entry["nodes"] = happens_before_match.group(2)
+        elif run_the_algorithm_match := run_the_algorithm_regex.search(line):
+            entry["run_the_algorithm"] = run_the_algorithm_match.group(1)
+        elif verification_match := verification_regex.search(line):
+            entry["verification_time"] = verification_match.group(1)
+        elif semantic_violation_match := semantic_violation_regex.search(line):
+            entry["total_semantic_violation"] = semantic_violation_match.group(1)
+        elif conflict_pairs_match := conflict_pairs_regex.search(line):
+            entry["total_conflict_pairs"] = conflict_pairs_match.group(1)
+        elif ap_match := ap_regex.search(line):
+            entry["api"] = ap_match.group(1)
+    if entry:
+        data.append(entry)
+
+    return data
+
+
+def to_api_format(data):
+    df = pd.DataFrame(data, columns=[
+        'directory_name', 'io_time', 'match_mpi_calls', 'mpi_edges', 'build_happens-before_graph', 'nodes',
+        'run_the_algorithm', 'verification_time', 'total_semantic_violation', 'total_conflict_pairs', 'api'
+    ])
+    df_no_conflicts = df.drop(columns='total_conflict_pairs')
+    df_pivot = df_no_conflicts.pivot(index='directory_name', columns='api')
+    df_pivot.columns = ['_'.join(col).strip() for col in df_pivot.columns.values]
+    df_pivot = df_pivot.reset_index()
+    df_total_conflicts = df[['directory_name', 'total_conflict_pairs']].drop_duplicates()
+    df_pivot = pd.merge(df_pivot, df_total_conflicts, on='directory_name')
+    return df_pivot
+
+
+
+def write_to_csv(data, csv_file):
+    with open(csv_file, "a", newline="") as file:
+        writer = csv.DictWriter(file, fieldnames=["directory_name", "io_time", "match_mpi_calls", "mpi_edges", "build_happens-before_graph", "nodes", "run_the_algorithm", "verification_time", "total_semantic_violation", "total_conflict_pairs", "api"])
+        writer.writeheader()
+        writer.writerows(data)
+
+    print(f"Data has been written to {csv_file}")
+
+def write_df_to_csv(df, csv_file):
+    df.to_csv(csv_file, index=False)
+    print(f"Data has been written to {csv_file}")
+
+if __name__ == '__main__':
+    arg_parser = argparse.ArgumentParser(description="Parse log file and export data to CSV")
+    arg_parser.add_argument("path", type=str, help="Path to the log file")
+    arg_parser.add_argument("csv_file", type=str, help="Name of the CSV file to save the data")
+    arg_parser.add_argument("--dir_prefix", type=str, help="Remove the directory prefix", required=False)
arg_parser.add_argument("--group_by_api", action="store_true", help="Group data by API", required=False) + + args = arg_parser.parse_args() + + parsed_data = parser(args.path, args.dir_prefix) + if args.group_by_api: + to_api_format(parsed_data) + write_df_to_csv(to_api_format(parsed_data), args.csv_file) + else: + write_to_csv(parsed_data, args.csv_file) \ No newline at end of file