refactor: compact duplicate detection output format and simplify CLI by jmlrt · Pull Request #40 · jmlrt/spotfm

jmlrt · 2026-04-06T10:08:24Z

Summary

Simplifies duplicate detection commands by removing separate CSV output functionality and optimizing terminal output format. Output is now comma-separated (CSV-compatible) with color-coded fields and truncated precision, improving both readability and line length in terminals.

Changes

Output Format:

Change separator from - to , (saves ~14 chars/line)
Truncate similarity scores to 2 decimals (e.g., 91.67 instead of 91.66666666666666)
Remove ratio_type from find-duplicate-names output (unnecessary verbosity)
Add ANSI color codes for terminal readability

CLI:

Keep -o/--output on global parser for other commands (find-tracks, find-relinked-tracks)
Remove output_file parameter from find_duplicate_ids() and find_duplicate_names()
Deprecate CSV output for duplicate detection (pipe to files instead)

Code:

Update docstrings to document CSV-compatible output format
Remove unused write_duplicates_csv() and write_similarity_csv() functions
Remove tests for CSV output (dead code)
Fix pre-commit linting issues

Testing:

All 38 tests passing
Tested both commands work with new output format

Usage

# View colored output in terminal
spfm spotify find-duplicate-ids
spfm spotify find-duplicate-names

# Save to file (includes ANSI codes)
spfm spotify find-duplicate-ids > file.csv

# Strip ANSI codes for clean CSV
spfm spotify find-duplicate-ids | sed 's/\x1b\[[0-9;]*m//g' > file.csv

Resolved Issues

✅ Fixed Copilot review: restored -o option for other commands
✅ Fixed breaking changes: updated tests, restored compatibility
✅ Improved output readability: better use of screen real estate with colors

🤖 Generated with Claude Code

The terminal output is now CSV-compatible (comma-separated) and readable with color-coded fields, eliminating the need for a separate -o/--output option. Users can now pipe output directly to files: `spfm spotify find-duplicate-ids > file.csv` Changes: - Remove -o/--output argument from CLI - Remove output_file parameter from find_duplicate_ids() and find_duplicate_names() - Simplify output logic to only print to terminal - Duplicate detection output format: playlists,artists,track,... with colors Users can optionally strip ANSI codes if needed: `spfm spotify find-duplicate-ids | sed 's/\x1b\[[0-9;]*m//g' > file.csv` Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

This PR removes the -o/--output option from the Spotify duplicate-detection CLI commands and changes duplicate output to be printed directly to the terminal (with ANSI coloring) rather than written to a file.

Changes:

Removes -o/--output from the Spotify CLI parser and stops passing output_file into duplicate detection functions.
Updates find_duplicate_ids() / find_duplicate_names() to print “CSV-like” colored rows to stdout and adjusts sorting/normalization.
Modifies the duplicate CSV writer schema for ID duplicates.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

File	Description
`spotfm/spotify/dupes.py`	Removes `output_file` params for duplicate detection, adds ANSI-colored terminal output, and adjusts playlist normalization/sorting.
`spotfm/cli.py`	Removes `-o/--output` from Spotify CLI args and updates duplicate command dispatch accordingly.

Comments suppressed due to low confidence (2)

spotfm/cli.py:303

Removing the global -o/--output argument from spotify_parser will break other Spotify commands that still reference args.output (e.g., find-relinked-tracks and find-tracks in spotify_cli), causing an AttributeError at runtime. Either keep -o/--output on the Spotify parser (and ignore it for the duplicate commands), or refactor to per-command subparsers so only duplicate commands lose the option.

    spotify_parser.add_argument(
        "-p", "--playlists", nargs="+", help="Playlist ID(s) or name pattern(s) (use %% as wildcard for LIKE syntax)"
    )
    spotify_parser.add_argument("-f", "--file")
    spotify_parser.add_argument("--start-date", help="Filter by album release start date (YYYY-MM-DD)")
    spotify_parser.add_argument("--end-date", help="Filter by album release end date (YYYY-MM-DD)")
    spotify_parser.add_argument("--genre", help="Filter by genre using a regex pattern")
    spotify_parser.add_argument(

spotfm/spotify/dupes.py:125

This is a breaking API change: find_duplicate_ids no longer accepts output_file, but the current test suite (e.g., tests/test_dupes.py:228) and any external callers using this parameter will fail. If the CSV-output path is being removed, the tests and any callers should be updated accordingly, or consider keeping output_file temporarily with a deprecation path.

def find_duplicate_ids(excluded_playlist_ids=None):
    """Find tracks that appear multiple times (exact ID match).

    Args:
        excluded_playlist_ids: List of playlist IDs to exclude

    Returns:
        List of dicts with duplicate track information
    """
    if excluded_playlist_ids is None:
        excluded_playlist_ids = []

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-06T10:13:36Z

    spotify_parser.add_argument(
        "-p", "--playlists", nargs="+", help="Playlist ID(s) or name pattern(s) (use %% as wildcard for LIKE syntax)"
    )
    spotify_parser.add_argument("-f", "--file")
-    spotify_parser.add_argument("-o", "--output", help="Output CSV file path")
    spotify_parser.add_argument("--start-date", help="Filter by album release start date (YYYY-MM-DD)")
    spotify_parser.add_argument("--end-date", help="Filter by album release end date (YYYY-MM-DD)")


The CLI flag removal here changes user-facing behavior, but repo docs still reference -o for these commands (e.g., README.md:84 and SPEC.md:357 show spfm spotify find-duplicate-ids -o ...). Please update the documentation (and any changelog notes) to match the new CLI behavior.

Copilot · 2026-04-06T10:13:36Z

+MAGENTA = "\033[35m"
+BLUE = "\033[34m"


MAGENTA and BLUE are defined but unused in this module. Please remove unused color constants (or start using them) to avoid dead code and keep linting clean.

Suggested change

MAGENTA = "\033[35m"

BLUE = "\033[34m"

Copilot · 2026-04-06T10:13:37Z

+            playlist_names = [pname for _pid, pname in track_info["playlists"]]
+            # Normalize playlists: sort them alphabetically so "A,B" and "B,A" both become "A,B"
+            normalized_playlists = ",".join(sorted(playlist_names))
            duplicates.append(
                {
                    "type": "ID",
                    "track": track_info["full_name"],
+                    "track_name": track_info["name"],
+                    "artists": track_info["artists"],
                    "count": track_info["playlist_count"],
-                    "playlists": ",".join(playlist_names),
+                    "playlists": normalized_playlists,


The terminal output is not reliably CSV-parsable because fields are separated with commas while playlists is also a comma-joined list, and playlist/artist/track names can themselves contain commas/newlines. Consider emitting CSV via csv.writer (with quoting) to stdout, or using a delimiter that cannot appear in the data (consistent with the existing ;-delimited CSV writer).

Copilot · 2026-04-06T10:13:37Z

+        if dup["artists"]:
+            print(f"{playlists_str},{artists_str},{track_str}")
+        else:
+            print(f"{playlists_str},{track_str}")


This output emits a different number of comma-separated fields depending on whether artists is empty (2 fields vs 3), which makes downstream parsing brittle. For CSV-compatible output, always emit the same columns (e.g., print an empty artists field when missing).

Suggested change

if dup["artists"]:

print(f"{playlists_str},{artists_str},{track_str}")

else:

print(f"{playlists_str},{track_str}")

print(f"{playlists_str},{artists_str},{track_str}")

Copilot · 2026-04-06T10:13:37Z

+    # Output results to terminal
+    for dup in duplicates:
+        playlists_str = f"{CYAN}{dup['playlists']}{RESET}"
+        artists_str = f"{GREEN}{dup['artists']}{RESET}" if dup["artists"] else ""
+        track_str = f"{YELLOW}{dup['track_name']}{RESET}"
+        if dup["artists"]:
+            print(f"{playlists_str},{artists_str},{track_str}")


ANSI color codes are always included in the printed output. When users pipe to a file, this produces non-CSV content by default and contradicts the “CSV-compatible” claim. Consider enabling colors only when sys.stdout.isatty() (and/or honoring NO_COLOR) so piped output is plain CSV.

Copilot · 2026-04-06T10:13:37Z

+                playlist1_names = [pname for _pid, pname in track1["playlists"]]
+                playlist2_names = [pname for _pid, pname in track2["playlists"]]
+
+                playlists1_str = ",".join(playlist1_names)
+                playlists2_str = ",".join(playlist2_names)
+
+                # Normalize playlist pairs: ensure they're always in alphabetical order
+                if playlists1_str > playlists2_str:
+                    # Swap if needed to maintain alphabetical order
+                    playlists1_str, playlists2_str = playlists2_str, playlists1_str
+                    # Also swap artists and tracks to maintain correspondence


playlists1_str/playlists2_str are built from the raw playlist order coming from SQL GROUP_CONCAT (no ORDER BY), so the joined strings can be nondeterministic and the if playlists1_str > playlists2_str swap can flip between runs for the same pair. Sort playlist*_names before joining (and ideally compare normalized lists) to make output stable and normalization correct.

Copilot · 2026-04-06T10:13:38Z

+        playlist1_names = [pname for _pid, pname in track1["playlists"]]
+        playlist2_names = [pname for _pid, pname in track2["playlists"]]
+
+        playlists1_str = ",".join(playlist1_names)
+        playlists2_str = ",".join(playlist2_names)
+
+        # Normalize playlist pairs: ensure they're always in alphabetical order
+        if playlists1_str > playlists2_str:
+            # Swap if needed to maintain alphabetical order
+            playlists1_str, playlists2_str = playlists2_str, playlists1_str
+            # Also swap artists and tracks to maintain correspondence


Same determinism issue as in pass 1: playlists1_str/playlists2_str are joined from an undefined playlist ordering, so the normalization swap can vary between runs. Sort playlist names (or IDs) before joining to make the output stable and the alphabetical normalization meaningful.

Copilot · 2026-04-06T10:13:38Z

+    # Output results to terminal
+    for dup in duplicates:
+        # Pair 1 in Cyan
+        playlists1_str = f"{CYAN}{dup['playlists1']}{RESET}"
+        artists1_str = f"{CYAN}{dup['artists1']}{RESET}"
+        track1_str = f"{CYAN}{dup['track1']}{RESET}"
+
+        # Pair 2 in Green
+        playlists2_str = f"{GREEN}{dup['playlists2']}{RESET}"
+        artists2_str = f"{GREEN}{dup['artists2']}{RESET}"
+        track2_str = f"{GREEN}{dup['track2']}{RESET}"
+
+        # Score in Yellow (2 decimal places)
+        score_str = f"{YELLOW}{dup['score']:.2f}{RESET}"
+
+        print(f"{playlists1_str},{playlists2_str},{artists1_str},{artists2_str},{track1_str},{track2_str},{score_str}")


The printed output here uses commas as separators but does not quote/escape fields; artists* values are comma-separated (and track/playlist names can contain commas), so consumers cannot reliably parse this as CSV. If CSV compatibility is desired, write via csv.writer to stdout with proper quoting (and optionally disable ANSI when not a TTY).

Copilot · 2026-04-06T10:13:38Z

    with open(output_path, "w", newline="") as csvfile:
-        fieldnames = ["Type", "Track", "Count", "Playlists"]
+        fieldnames = ["Playlists", "Artists", "Track", "Count"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=";")
        writer.writeheader()

        for dup in duplicates:
            writer.writerow(
                {
-                    "Type": dup["type"],
-                    "Track": dup["track"],
-                    "Count": dup["count"],
                    "Playlists": dup["playlists"],
+                    "Artists": dup["artists"],
+                    "Track": dup["track_name"],
+                    "Count": dup["count"],


write_duplicates_csv now expects each duplicate dict to contain artists and track_name keys and no longer writes the previous Type/Track columns. This will break existing callers (and current tests in tests/test_dupes.py that pass the older shape). Either keep backward compatibility (accept both key shapes / preserve old columns) or update all callers/tests that rely on the old CSV schema.

Copilot · 2026-04-06T10:13:38Z

@@ -359,7 +376,6 @@ def find_duplicate_names(excluded_playlist_ids=None, output_file=None, threshold

    Args:
        excluded_playlist_ids: List of playlist IDs to exclude
-        output_file: Path to CSV output file (optional)
        threshold: Minimum similarity score (0-100) to consider a duplicate

    Returns:


This signature change removes output_file, but the existing tests (e.g., tests/test_dupes.py:501) and any external callers passing output_file will break. If file output is being removed, the tests/callers should be updated; otherwise consider keeping output_file with a deprecation plan.

- Restore -o/--output option for other Spotify commands (find-relinked-tracks, find-tracks) - Update docstrings for find_duplicate_ids/names to document CSV-compatible output format - Remove tests for dead CSV writing functions - All tests passing Copilot identified: other commands still referenced args.output, and tests called functions with removed output_file parameter. Kept -o option on global parser but deprecated for duplicate commands. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

- Update README with piping examples instead of -o option - Update SPEC.md with CSV-compatible format details and pipe examples - Add CHANGELOG entry documenting output format improvements - Clarify that -o option is only for find-tracks and find-relinked-tracks Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Core fixes for CSV output issues identified in Copilot review: - Remove unused MAGENTA/BLUE color constants (dead code cleanup) - Use csv.writer with proper quoting instead of naive comma separation - Handles special chars (commas, quotes, newlines) in playlist/artist/track names - Fields are always quoted when needed - Check sys.stdout.isatty() before adding ANSI codes - TTY output: includes color codes (as before) - Piped output: plain CSV without ANSI codes - Sort playlist names before joining to fix determinism issue - GROUP_CONCAT returns undefined order from SQL - Sorting ensures "A,B" and "B,A" always normalize to same output - Fixes swap logic in playlist pair normalization - Always emit same number of columns in find_duplicate_ids - Artists field always included (empty if no artists) - Prevents brittle downstream parsing Fixes identified Copilot comments: - Remove unused color constants - Make output truly CSV-compliant with proper quoting - Only include ANSI codes for TTY output - Fix nondeterministic playlist ordering in both passes - Ensure consistent column count in all output rows Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Add implementation details, parsing examples, and development guidance: SPEC.md: - Document two-pass algorithm (prefix-based + cross-prefix) - Explain playlist pair normalization and determinism - Clarify CSV output quoting behavior and TTY detection README.md: - Add "Parsing Duplicate Detection Output" section - Show how to strip ANSI codes with sed for clean CSV - Document column reference for both commands - Note on field quoting for special characters CONTRIBUTING.md: - Add "Working with Duplicate Detection (dupes.py)" section - Document four invariants: playlist ordering, CSV encoding, ANSI codes, normalization - Explain why each matters and how to maintain them - Prevent future regressions with clear guidance Addresses feedback from PR review on CSV determinism and output format. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 6, 2026 10:08

Copilot started reviewing on behalf of jmlrt April 6, 2026 10:09 View session

Copilot AI reviewed Apr 6, 2026

View reviewed changes

jmlrt changed the title ~~Remove -o option from duplicate detection commands~~ refactor: compact duplicate detection output format and simplify CLI Apr 6, 2026

jmlrt and others added 3 commits April 6, 2026 12:22

jmlrt merged commit 9497c5d into main Apr 10, 2026
4 checks passed

jmlrt deleted the feat/remove-duplicate-output-option branch April 10, 2026 05:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: compact duplicate detection output format and simplify CLI#40

refactor: compact duplicate detection output format and simplify CLI#40
jmlrt merged 5 commits intomainfrom
feat/remove-duplicate-output-option

jmlrt commented Apr 6, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jmlrt commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Usage

Resolved Issues

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jmlrt commented Apr 6, 2026 •

edited

Loading