refactor: compact duplicate detection output format and simplify CLI #40
Conversation
The terminal output is now CSV-compatible (comma-separated) and readable with color-coded fields, eliminating the need for a separate -o/--output option. Users can now pipe output directly to files: `spfm spotify find-duplicate-ids > file.csv`

Changes:
- Remove -o/--output argument from CLI
- Remove output_file parameter from find_duplicate_ids() and find_duplicate_names()
- Simplify output logic to only print to terminal
- Duplicate detection output format: playlists,artists,track,... with colors

Users can optionally strip ANSI codes if needed: `spfm spotify find-duplicate-ids | sed 's/\x1b\[[0-9;]*m//g' > file.csv`

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Pull request overview
This PR removes the -o/--output option from the Spotify duplicate-detection CLI commands and changes duplicate output to be printed directly to the terminal (with ANSI coloring) rather than written to a file.
Changes:
- Removes `-o/--output` from the Spotify CLI parser and stops passing `output_file` into duplicate detection functions.
- Updates `find_duplicate_ids()`/`find_duplicate_names()` to print “CSV-like” colored rows to stdout and adjusts sorting/normalization.
- Modifies the duplicate CSV writer schema for ID duplicates.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| spotfm/spotify/dupes.py | Removes output_file params for duplicate detection, adds ANSI-colored terminal output, and adjusts playlist normalization/sorting. |
| spotfm/cli.py | Removes -o/--output from Spotify CLI args and updates duplicate command dispatch accordingly. |
Comments suppressed due to low confidence (2)
spotfm/cli.py:303
- Removing the global `-o/--output` argument from `spotify_parser` will break other Spotify commands that still reference `args.output` (e.g., `find-relinked-tracks` and `find-tracks` in `spotify_cli`), causing an `AttributeError` at runtime. Either keep `-o/--output` on the Spotify parser (and ignore it for the duplicate commands), or refactor to per-command subparsers so only duplicate commands lose the option.
spotify_parser.add_argument(
    "-p", "--playlists", nargs="+", help="Playlist ID(s) or name pattern(s) (use %% as wildcard for LIKE syntax)"
)
spotify_parser.add_argument("-f", "--file")
spotify_parser.add_argument("--start-date", help="Filter by album release start date (YYYY-MM-DD)")
spotify_parser.add_argument("--end-date", help="Filter by album release end date (YYYY-MM-DD)")
spotify_parser.add_argument("--genre", help="Filter by genre using a regex pattern")
spotify_parser.add_argument(
spotfm/spotify/dupes.py:125
- This is a breaking API change: `find_duplicate_ids` no longer accepts `output_file`, but the current test suite (e.g., tests/test_dupes.py:228) and any external callers using this parameter will fail. If the CSV-output path is being removed, the tests and any callers should be updated accordingly, or consider keeping `output_file` temporarily with a deprecation path.
def find_duplicate_ids(excluded_playlist_ids=None):
    """Find tracks that appear multiple times (exact ID match).

    Args:
        excluded_playlist_ids: List of playlist IDs to exclude

    Returns:
        List of dicts with duplicate track information
    """
    if excluded_playlist_ids is None:
        excluded_playlist_ids = []
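If the parameter were kept temporarily, as the review suggests, a deprecation shim might look like the following sketch (the warning wording and removal timeline are assumptions, and the detection logic is elided):

```python
import warnings


def find_duplicate_ids(excluded_playlist_ids=None, output_file=None):
    """Find tracks that appear multiple times (exact ID match)."""
    if output_file is not None:
        # Hypothetical deprecation path: warn but keep accepting the argument
        # for one release cycle so existing callers don't break.
        warnings.warn(
            "output_file is deprecated; redirect stdout to a file instead",
            DeprecationWarning,
            stacklevel=2,
        )
    if excluded_playlist_ids is None:
        excluded_playlist_ids = []
    ...  # detection logic unchanged
```

Existing callers passing `output_file` would then keep working while being nudged toward `spfm spotify find-duplicate-ids > file.csv`.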
spotify_parser.add_argument(
    "-p", "--playlists", nargs="+", help="Playlist ID(s) or name pattern(s) (use %% as wildcard for LIKE syntax)"
)
spotify_parser.add_argument("-f", "--file")
spotify_parser.add_argument("-o", "--output", help="Output CSV file path")
spotify_parser.add_argument("--start-date", help="Filter by album release start date (YYYY-MM-DD)")
spotify_parser.add_argument("--end-date", help="Filter by album release end date (YYYY-MM-DD)")
The CLI flag removal here changes user-facing behavior, but repo docs still reference -o for these commands (e.g., README.md:84 and SPEC.md:357 show spfm spotify find-duplicate-ids -o ...). Please update the documentation (and any changelog notes) to match the new CLI behavior.
MAGENTA = "\033[35m"
BLUE = "\033[34m"
MAGENTA and BLUE are defined but unused in this module. Please remove unused color constants (or start using them) to avoid dead code and keep linting clean.
Suggested change (remove the unused constants):
- MAGENTA = "\033[35m"
- BLUE = "\033[34m"
playlist_names = [pname for _pid, pname in track_info["playlists"]]
# Normalize playlists: sort them alphabetically so "A,B" and "B,A" both become "A,B"
normalized_playlists = ",".join(sorted(playlist_names))
duplicates.append(
    {
        "type": "ID",
        "track": track_info["full_name"],
        "track_name": track_info["name"],
        "artists": track_info["artists"],
        "count": track_info["playlist_count"],
-       "playlists": ",".join(playlist_names),
+       "playlists": normalized_playlists,
The terminal output is not reliably CSV-parsable because fields are separated with commas while playlists is also a comma-joined list, and playlist/artist/track names can themselves contain commas/newlines. Consider emitting CSV via csv.writer (with quoting) to stdout, or using a delimiter that cannot appear in the data (consistent with the existing ;-delimited CSV writer).
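The reviewer's `csv.writer` approach might look like the following sketch (the function name `emit_csv_rows` and the `stream` parameter are hypothetical; the dict keys mirror the snippet under review):

```python
import csv
import sys


def emit_csv_rows(duplicates, stream=None):
    """Write duplicate rows as real CSV: fields containing commas,
    quotes, or newlines are quoted, so playlist/artist/track names
    can no longer corrupt the output."""
    writer = csv.writer(stream or sys.stdout, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["Playlists", "Artists", "Track"])
    for dup in duplicates:
        writer.writerow([dup["playlists"], dup["artists"], dup["track_name"]])
```

With this, a `playlists` value like `A,B` comes out as `"A,B"` and survives a round-trip through `csv.reader`.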
if dup["artists"]:
    print(f"{playlists_str},{artists_str},{track_str}")
else:
    print(f"{playlists_str},{track_str}")
This output emits a different number of comma-separated fields depending on whether artists is empty (2 fields vs 3), which makes downstream parsing brittle. For CSV-compatible output, always emit the same columns (e.g., print an empty artists field when missing).
Suggested change:
- if dup["artists"]:
-     print(f"{playlists_str},{artists_str},{track_str}")
- else:
-     print(f"{playlists_str},{track_str}")
+ print(f"{playlists_str},{artists_str},{track_str}")
# Output results to terminal
for dup in duplicates:
    playlists_str = f"{CYAN}{dup['playlists']}{RESET}"
    artists_str = f"{GREEN}{dup['artists']}{RESET}" if dup["artists"] else ""
    track_str = f"{YELLOW}{dup['track_name']}{RESET}"
    if dup["artists"]:
        print(f"{playlists_str},{artists_str},{track_str}")
ANSI color codes are always included in the printed output. When users pipe to a file, this produces non-CSV content by default and contradicts the “CSV-compatible” claim. Consider enabling colors only when sys.stdout.isatty() (and/or honoring NO_COLOR) so piped output is plain CSV.
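A common pattern for gating colors on TTY detection, sketched here with hypothetical helper names (`use_color`, `colorize`):

```python
import os
import sys

CYAN = "\033[36m"
RESET = "\033[0m"


def use_color(stream=None):
    """Color only when writing to an interactive terminal, and honor
    the NO_COLOR convention (any non-empty NO_COLOR disables color)."""
    if os.environ.get("NO_COLOR"):
        return False
    stream = stream or sys.stdout
    return hasattr(stream, "isatty") and stream.isatty()


def colorize(text, code):
    """Wrap text in ANSI codes only when color output is enabled."""
    return f"{code}{text}{RESET}" if use_color() else text
```

Piped output (`spfm spotify find-duplicate-ids > file.csv`) would then contain no escape codes, making the `sed` stripping step unnecessary.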
playlist1_names = [pname for _pid, pname in track1["playlists"]]
playlist2_names = [pname for _pid, pname in track2["playlists"]]

playlists1_str = ",".join(playlist1_names)
playlists2_str = ",".join(playlist2_names)

# Normalize playlist pairs: ensure they're always in alphabetical order
if playlists1_str > playlists2_str:
    # Swap if needed to maintain alphabetical order
    playlists1_str, playlists2_str = playlists2_str, playlists1_str
    # Also swap artists and tracks to maintain correspondence
playlists1_str/playlists2_str are built from the raw playlist order coming from SQL GROUP_CONCAT (no ORDER BY), so the joined strings can be nondeterministic and the if playlists1_str > playlists2_str swap can flip between runs for the same pair. Sort playlist*_names before joining (and ideally compare normalized lists) to make output stable and normalization correct.
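The fix the reviewer describes amounts to sorting each group before joining, then letting the swap operate on normalized strings; a sketch (the function name and the returned `swapped` flag are assumptions, the flag so callers can also swap the corresponding artists/tracks):

```python
def normalized_pair(playlist1_names, playlist2_names):
    """Sort each group before joining so GROUP_CONCAT's undefined order
    cannot leak through, then order the pair lexicographically so the
    same two groups always render identically across runs."""
    a = ",".join(sorted(playlist1_names))
    b = ",".join(sorted(playlist2_names))
    swapped = a > b
    if swapped:
        a, b = b, a
    return a, b, swapped
```

Both `(["B", "A"], ["C"])` and `(["C"], ["A", "B"])` then normalize to the same `("A,B", "C")` pair, which is what makes deduplication across passes meaningful.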
playlist1_names = [pname for _pid, pname in track1["playlists"]]
playlist2_names = [pname for _pid, pname in track2["playlists"]]

playlists1_str = ",".join(playlist1_names)
playlists2_str = ",".join(playlist2_names)

# Normalize playlist pairs: ensure they're always in alphabetical order
if playlists1_str > playlists2_str:
    # Swap if needed to maintain alphabetical order
    playlists1_str, playlists2_str = playlists2_str, playlists1_str
    # Also swap artists and tracks to maintain correspondence
Same determinism issue as in pass 1: playlists1_str/playlists2_str are joined from an undefined playlist ordering, so the normalization swap can vary between runs. Sort playlist names (or IDs) before joining to make the output stable and the alphabetical normalization meaningful.
# Output results to terminal
for dup in duplicates:
    # Pair 1 in Cyan
    playlists1_str = f"{CYAN}{dup['playlists1']}{RESET}"
    artists1_str = f"{CYAN}{dup['artists1']}{RESET}"
    track1_str = f"{CYAN}{dup['track1']}{RESET}"

    # Pair 2 in Green
    playlists2_str = f"{GREEN}{dup['playlists2']}{RESET}"
    artists2_str = f"{GREEN}{dup['artists2']}{RESET}"
    track2_str = f"{GREEN}{dup['track2']}{RESET}"

    # Score in Yellow (2 decimal places)
    score_str = f"{YELLOW}{dup['score']:.2f}{RESET}"

    print(f"{playlists1_str},{playlists2_str},{artists1_str},{artists2_str},{track1_str},{track2_str},{score_str}")
The printed output here uses commas as separators but does not quote/escape fields; artists* values are comma-separated (and track/playlist names can contain commas), so consumers cannot reliably parse this as CSV. If CSV compatibility is desired, write via csv.writer to stdout with proper quoting (and optionally disable ANSI when not a TTY).
with open(output_path, "w", newline="") as csvfile:
-   fieldnames = ["Type", "Track", "Count", "Playlists"]
+   fieldnames = ["Playlists", "Artists", "Track", "Count"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=";")
    writer.writeheader()

    for dup in duplicates:
        writer.writerow(
            {
-               "Type": dup["type"],
-               "Track": dup["track"],
-               "Count": dup["count"],
                "Playlists": dup["playlists"],
+               "Artists": dup["artists"],
+               "Track": dup["track_name"],
+               "Count": dup["count"],
write_duplicates_csv now expects each duplicate dict to contain artists and track_name keys and no longer writes the previous Type/Track columns. This will break existing callers (and current tests in tests/test_dupes.py that pass the older shape). Either keep backward compatibility (accept both key shapes / preserve old columns) or update all callers/tests that rely on the old CSV schema.
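One way to keep backward compatibility, as the reviewer suggests, is to accept both key shapes with `dict.get` fallbacks; a sketch (the fallback behavior is an assumption, not the PR's actual implementation):

```python
import csv


def write_duplicates_csv(duplicates, output_path):
    """Write the new Playlists/Artists/Track/Count schema while still
    accepting dicts in the old shape ('type'/'track'/'count'/'playlists')."""
    with open(output_path, "w", newline="") as csvfile:
        writer = csv.DictWriter(
            csvfile,
            fieldnames=["Playlists", "Artists", "Track", "Count"],
            delimiter=";",
        )
        writer.writeheader()
        for dup in duplicates:
            writer.writerow(
                {
                    "Playlists": dup["playlists"],
                    "Artists": dup.get("artists", ""),
                    # old callers only have 'track'; new ones have 'track_name'
                    "Track": dup.get("track_name", dup.get("track", "")),
                    "Count": dup["count"],
                }
            )
```

Old-shape dicts then still produce valid rows (with an empty Artists field), so existing tests could be migrated gradually rather than all at once.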
@@ -359,7 +376,6 @@ def find_duplicate_names(excluded_playlist_ids=None, output_file=None, threshold

    Args:
        excluded_playlist_ids: List of playlist IDs to exclude
-       output_file: Path to CSV output file (optional)
        threshold: Minimum similarity score (0-100) to consider a duplicate

    Returns:
This signature change removes output_file, but the existing tests (e.g., tests/test_dupes.py:501) and any external callers passing output_file will break. If file output is being removed, the tests/callers should be updated; otherwise consider keeping output_file with a deprecation plan.
- Restore -o/--output option for other Spotify commands (find-relinked-tracks, find-tracks)
- Update docstrings for find_duplicate_ids/names to document CSV-compatible output format
- Remove tests for dead CSV writing functions
- All tests passing

Copilot identified: other commands still referenced args.output, and tests called functions with removed output_file parameter. Kept -o option on global parser but deprecated for duplicate commands.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Update README with piping examples instead of -o option
- Update SPEC.md with CSV-compatible format details and pipe examples
- Add CHANGELOG entry documenting output format improvements
- Clarify that -o option is only for find-tracks and find-relinked-tracks

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Core fixes for CSV output issues identified in Copilot review:
- Remove unused MAGENTA/BLUE color constants (dead code cleanup)
- Use csv.writer with proper quoting instead of naive comma separation
  - Handles special chars (commas, quotes, newlines) in playlist/artist/track names
  - Fields are always quoted when needed
- Check sys.stdout.isatty() before adding ANSI codes
  - TTY output: includes color codes (as before)
  - Piped output: plain CSV without ANSI codes
- Sort playlist names before joining to fix determinism issue
  - GROUP_CONCAT returns undefined order from SQL
  - Sorting ensures "A,B" and "B,A" always normalize to same output
  - Fixes swap logic in playlist pair normalization
- Always emit same number of columns in find_duplicate_ids
  - Artists field always included (empty if no artists)
  - Prevents brittle downstream parsing

Fixes identified Copilot comments:
- Remove unused color constants
- Make output truly CSV-compliant with proper quoting
- Only include ANSI codes for TTY output
- Fix nondeterministic playlist ordering in both passes
- Ensure consistent column count in all output rows

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add implementation details, parsing examples, and development guidance:

SPEC.md:
- Document two-pass algorithm (prefix-based + cross-prefix)
- Explain playlist pair normalization and determinism
- Clarify CSV output quoting behavior and TTY detection

README.md:
- Add "Parsing Duplicate Detection Output" section
- Show how to strip ANSI codes with sed for clean CSV
- Document column reference for both commands
- Note on field quoting for special characters

CONTRIBUTING.md:
- Add "Working with Duplicate Detection (dupes.py)" section
- Document four invariants: playlist ordering, CSV encoding, ANSI codes, normalization
- Explain why each matters and how to maintain them
- Prevent future regressions with clear guidance

Addresses feedback from PR review on CSV determinism and output format.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Summary
Simplifies duplicate detection commands by removing separate CSV output functionality and optimizing terminal output format. Output is now comma-separated (CSV-compatible) with color-coded fields and truncated precision, improving both readability and line length in terminals.
Changes
Output Format:
- Separator changed to `,` (saves ~14 chars/line)
- Similarity scores truncated to two decimal places (91.67 instead of 91.66666666666666)
- Removed ratio_type from find-duplicate-names output (unnecessary verbosity)

CLI:
- Kept -o/--output on global parser for other commands (find-tracks, find-relinked-tracks)

Code:

Testing:
Usage
Resolved Issues
✅ Fixed Copilot review: restored -o option for other commands
✅ Fixed breaking changes: updated tests, restored compatibility
✅ Improved output readability: better use of screen real estate with colors
🤖 Generated with Claude Code