Skip to content

refactor: compact duplicate detection output format and simplify CLI#40

Merged
jmlrt merged 5 commits intomainfrom
feat/remove-duplicate-output-option
Apr 10, 2026
Merged

refactor: compact duplicate detection output format and simplify CLI#40
jmlrt merged 5 commits intomainfrom
feat/remove-duplicate-output-option

Conversation

@jmlrt
Copy link
Copy Markdown
Owner

@jmlrt jmlrt commented Apr 6, 2026

Summary

Simplifies duplicate detection commands by removing separate CSV output functionality and optimizing terminal output format. Output is now comma-separated (CSV-compatible) with color-coded fields and truncated precision, improving both readability and line length in terminals.

Changes

Output Format:

  • Change separator from - to , (saves ~14 chars/line)
  • Truncate similarity scores to 2 decimals (e.g., 91.67 instead of 91.66666666666666)
  • Remove ratio_type from find-duplicate-names output (unnecessary verbosity)
  • Add ANSI color codes for terminal readability

CLI:

  • Keep -o/--output on global parser for other commands (find-tracks, find-relinked-tracks)
  • Remove output_file parameter from find_duplicate_ids() and find_duplicate_names()
  • Deprecate CSV output for duplicate detection (pipe to files instead)

Code:

  • Update docstrings to document CSV-compatible output format
  • Remove unused write_duplicates_csv() and write_similarity_csv() functions
  • Remove tests for CSV output (dead code)
  • Fix pre-commit linting issues

Testing:

  • All 38 tests passing
  • Tested both commands work with new output format

Usage

# View colored output in terminal
spfm spotify find-duplicate-ids
spfm spotify find-duplicate-names

# Save to file (includes ANSI codes)
spfm spotify find-duplicate-ids > file.csv

# Strip ANSI codes for clean CSV
spfm spotify find-duplicate-ids | sed 's/\x1b\[[0-9;]*m//g' > file.csv

Resolved Issues

✅ Fixed Copilot review: restored -o option for other commands
✅ Fixed breaking changes: updated tests, restored compatibility
✅ Improved output readability: better use of screen real estate with colors

🤖 Generated with Claude Code

The terminal output is now CSV-compatible (comma-separated) and readable with
color-coded fields, eliminating the need for a separate -o/--output option.
Users can now pipe output directly to files: `spfm spotify find-duplicate-ids > file.csv`

Changes:
- Remove -o/--output argument from CLI
- Remove output_file parameter from find_duplicate_ids() and find_duplicate_names()
- Simplify output logic to only print to terminal
- Duplicate detection output format: playlists,artists,track,... with colors

Users can optionally strip ANSI codes if needed:
`spfm spotify find-duplicate-ids | sed 's/\x1b\[[0-9;]*m//g' > file.csv`

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 6, 2026 10:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the -o/--output option from the Spotify duplicate-detection CLI commands and changes duplicate output to be printed directly to the terminal (with ANSI coloring) rather than written to a file.

Changes:

  • Removes -o/--output from the Spotify CLI parser and stops passing output_file into duplicate detection functions.
  • Updates find_duplicate_ids() / find_duplicate_names() to print “CSV-like” colored rows to stdout and adjusts sorting/normalization.
  • Modifies the duplicate CSV writer schema for ID duplicates.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

File Description
spotfm/spotify/dupes.py Removes output_file params for duplicate detection, adds ANSI-colored terminal output, and adjusts playlist normalization/sorting.
spotfm/cli.py Removes -o/--output from Spotify CLI args and updates duplicate command dispatch accordingly.
Comments suppressed due to low confidence (2)

spotfm/cli.py:303

  • Removing the global -o/--output argument from spotify_parser will break other Spotify commands that still reference args.output (e.g., find-relinked-tracks and find-tracks in spotify_cli), causing an AttributeError at runtime. Either keep -o/--output on the Spotify parser (and ignore it for the duplicate commands), or refactor to per-command subparsers so only duplicate commands lose the option.
    spotify_parser.add_argument(
        "-p", "--playlists", nargs="+", help="Playlist ID(s) or name pattern(s) (use %% as wildcard for LIKE syntax)"
    )
    spotify_parser.add_argument("-f", "--file")
    spotify_parser.add_argument("--start-date", help="Filter by album release start date (YYYY-MM-DD)")
    spotify_parser.add_argument("--end-date", help="Filter by album release end date (YYYY-MM-DD)")
    spotify_parser.add_argument("--genre", help="Filter by genre using a regex pattern")
    spotify_parser.add_argument(

spotfm/spotify/dupes.py:125

  • This is a breaking API change: find_duplicate_ids no longer accepts output_file, but the current test suite (e.g., tests/test_dupes.py:228) and any external callers using this parameter will fail. If the CSV-output path is being removed, the tests and any callers should be updated accordingly, or consider keeping output_file temporarily with a deprecation path.
def find_duplicate_ids(excluded_playlist_ids=None):
    """Find tracks that appear multiple times (exact ID match).

    Args:
        excluded_playlist_ids: List of playlist IDs to exclude

    Returns:
        List of dicts with duplicate track information
    """
    if excluded_playlist_ids is None:
        excluded_playlist_ids = []

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread spotfm/cli.py
Comment on lines 296 to 301
spotify_parser.add_argument(
"-p", "--playlists", nargs="+", help="Playlist ID(s) or name pattern(s) (use %% as wildcard for LIKE syntax)"
)
spotify_parser.add_argument("-f", "--file")
spotify_parser.add_argument("-o", "--output", help="Output CSV file path")
spotify_parser.add_argument("--start-date", help="Filter by album release start date (YYYY-MM-DD)")
spotify_parser.add_argument("--end-date", help="Filter by album release end date (YYYY-MM-DD)")
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI flag removal here changes user-facing behavior, but repo docs still reference -o for these commands (e.g., README.md:84 and SPEC.md:357 show spfm spotify find-duplicate-ids -o ...). Please update the documentation (and any changelog notes) to match the new CLI behavior.

Copilot uses AI. Check for mistakes.
Comment thread spotfm/spotify/dupes.py Outdated
Comment on lines +21 to +22
MAGENTA = "\033[35m"
BLUE = "\033[34m"
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAGENTA and BLUE are defined but unused in this module. Please remove unused color constants (or start using them) to avoid dead code and keep linting clean.

Suggested change
MAGENTA = "\033[35m"
BLUE = "\033[34m"

Copilot uses AI. Check for mistakes.
Comment thread spotfm/spotify/dupes.py Outdated
Comment on lines +133 to +143
playlist_names = [pname for _pid, pname in track_info["playlists"]]
# Normalize playlists: sort them alphabetically so "A,B" and "B,A" both become "A,B"
normalized_playlists = ",".join(sorted(playlist_names))
duplicates.append(
{
"type": "ID",
"track": track_info["full_name"],
"track_name": track_info["name"],
"artists": track_info["artists"],
"count": track_info["playlist_count"],
"playlists": ",".join(playlist_names),
"playlists": normalized_playlists,
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The terminal output is not reliably CSV-parsable because fields are separated with commas while playlists is also a comma-joined list, and playlist/artist/track names can themselves contain commas/newlines. Consider emitting CSV via csv.writer (with quoting) to stdout, or using a delimiter that cannot appear in the data (consistent with the existing ;-delimited CSV writer).

Copilot uses AI. Check for mistakes.
Comment thread spotfm/spotify/dupes.py Outdated
Comment on lines +155 to +158
if dup["artists"]:
print(f"{playlists_str},{artists_str},{track_str}")
else:
print(f"{playlists_str},{track_str}")
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This output emits a different number of comma-separated fields depending on whether artists is empty (2 fields vs 3), which makes downstream parsing brittle. For CSV-compatible output, always emit the same columns (e.g., print an empty artists field when missing).

Suggested change
if dup["artists"]:
print(f"{playlists_str},{artists_str},{track_str}")
else:
print(f"{playlists_str},{track_str}")
print(f"{playlists_str},{artists_str},{track_str}")

Copilot uses AI. Check for mistakes.
Comment thread spotfm/spotify/dupes.py Outdated
Comment on lines +150 to +156
# Output results to terminal
for dup in duplicates:
playlists_str = f"{CYAN}{dup['playlists']}{RESET}"
artists_str = f"{GREEN}{dup['artists']}{RESET}" if dup["artists"] else ""
track_str = f"{YELLOW}{dup['track_name']}{RESET}"
if dup["artists"]:
print(f"{playlists_str},{artists_str},{track_str}")
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ANSI color codes are always included in the printed output. When users pipe to a file, this produces non-CSV content by default and contradicts the “CSV-compatible” claim. Consider enabling colors only when sys.stdout.isatty() (and/or honoring NO_COLOR) so piped output is plain CSV.

Copilot uses AI. Check for mistakes.
Comment thread spotfm/spotify/dupes.py Outdated
Comment on lines +479 to +489
playlist1_names = [pname for _pid, pname in track1["playlists"]]
playlist2_names = [pname for _pid, pname in track2["playlists"]]

playlists1_str = ",".join(playlist1_names)
playlists2_str = ",".join(playlist2_names)

# Normalize playlist pairs: ensure they're always in alphabetical order
if playlists1_str > playlists2_str:
# Swap if needed to maintain alphabetical order
playlists1_str, playlists2_str = playlists2_str, playlists1_str
# Also swap artists and tracks to maintain correspondence
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

playlists1_str/playlists2_str are built from the raw playlist order coming from SQL GROUP_CONCAT (no ORDER BY), so the joined strings can be nondeterministic and the if playlists1_str > playlists2_str swap can flip between runs for the same pair. Sort playlist*_names before joining (and ideally compare normalized lists) to make output stable and normalization correct.

Copilot uses AI. Check for mistakes.
Comment thread spotfm/spotify/dupes.py Outdated
Comment on lines +601 to +611
playlist1_names = [pname for _pid, pname in track1["playlists"]]
playlist2_names = [pname for _pid, pname in track2["playlists"]]

playlists1_str = ",".join(playlist1_names)
playlists2_str = ",".join(playlist2_names)

# Normalize playlist pairs: ensure they're always in alphabetical order
if playlists1_str > playlists2_str:
# Swap if needed to maintain alphabetical order
playlists1_str, playlists2_str = playlists2_str, playlists1_str
# Also swap artists and tracks to maintain correspondence
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same determinism issue as in pass 1: playlists1_str/playlists2_str are joined from an undefined playlist ordering, so the normalization swap can vary between runs. Sort playlist names (or IDs) before joining to make the output stable and the alphabetical normalization meaningful.

Copilot uses AI. Check for mistakes.
Comment thread spotfm/spotify/dupes.py Outdated
Comment on lines +651 to +666
# Output results to terminal
for dup in duplicates:
# Pair 1 in Cyan
playlists1_str = f"{CYAN}{dup['playlists1']}{RESET}"
artists1_str = f"{CYAN}{dup['artists1']}{RESET}"
track1_str = f"{CYAN}{dup['track1']}{RESET}"

# Pair 2 in Green
playlists2_str = f"{GREEN}{dup['playlists2']}{RESET}"
artists2_str = f"{GREEN}{dup['artists2']}{RESET}"
track2_str = f"{GREEN}{dup['track2']}{RESET}"

# Score in Yellow (2 decimal places)
score_str = f"{YELLOW}{dup['score']:.2f}{RESET}"

print(f"{playlists1_str},{playlists2_str},{artists1_str},{artists2_str},{track1_str},{track2_str},{score_str}")
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The printed output here uses commas as separators but does not quote/escape fields; artists* values are comma-separated (and track/playlist names can contain commas), so consumers cannot reliably parse this as CSV. If CSV compatibility is desired, write via csv.writer to stdout with proper quoting (and optionally disable ANSI when not a TTY).

Copilot uses AI. Check for mistakes.
Comment thread spotfm/spotify/dupes.py
Comment on lines 682 to +693
with open(output_path, "w", newline="") as csvfile:
fieldnames = ["Type", "Track", "Count", "Playlists"]
fieldnames = ["Playlists", "Artists", "Track", "Count"]
writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=";")
writer.writeheader()

for dup in duplicates:
writer.writerow(
{
"Type": dup["type"],
"Track": dup["track"],
"Count": dup["count"],
"Playlists": dup["playlists"],
"Artists": dup["artists"],
"Track": dup["track_name"],
"Count": dup["count"],
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write_duplicates_csv now expects each duplicate dict to contain artists and track_name keys and no longer writes the previous Type/Track columns. This will break existing callers (and current tests in tests/test_dupes.py that pass the older shape). Either keep backward compatibility (accept both key shapes / preserve old columns) or update all callers/tests that rely on the old CSV schema.

Copilot uses AI. Check for mistakes.
Comment thread spotfm/spotify/dupes.py
Comment on lines 370 to 381
@@ -359,7 +376,6 @@ def find_duplicate_names(excluded_playlist_ids=None, output_file=None, threshold

Args:
excluded_playlist_ids: List of playlist IDs to exclude
output_file: Path to CSV output file (optional)
threshold: Minimum similarity score (0-100) to consider a duplicate

Returns:
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This signature change removes output_file, but the existing tests (e.g., tests/test_dupes.py:501) and any external callers passing output_file will break. If file output is being removed, the tests/callers should be updated; otherwise consider keeping output_file with a deprecation plan.

Copilot uses AI. Check for mistakes.
- Restore -o/--output option for other Spotify commands (find-relinked-tracks, find-tracks)
- Update docstrings for find_duplicate_ids/names to document CSV-compatible output format
- Remove tests for dead CSV writing functions
- All tests passing

Copilot identified: other commands still referenced args.output, and tests called
functions with removed output_file parameter. Kept -o option on global parser but
deprecated for duplicate commands.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@jmlrt jmlrt changed the title Remove -o option from duplicate detection commands refactor: compact duplicate detection output format and simplify CLI Apr 6, 2026
jmlrt and others added 3 commits April 6, 2026 12:22
- Update README with piping examples instead of -o option
- Update SPEC.md with CSV-compatible format details and pipe examples
- Add CHANGELOG entry documenting output format improvements
- Clarify that -o option is only for find-tracks and find-relinked-tracks

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Core fixes for CSV output issues identified in Copilot review:

- Remove unused MAGENTA/BLUE color constants (dead code cleanup)
- Use csv.writer with proper quoting instead of naive comma separation
  - Handles special chars (commas, quotes, newlines) in playlist/artist/track names
  - Fields are always quoted when needed
- Check sys.stdout.isatty() before adding ANSI codes
  - TTY output: includes color codes (as before)
  - Piped output: plain CSV without ANSI codes
- Sort playlist names before joining to fix determinism issue
  - GROUP_CONCAT returns undefined order from SQL
  - Sorting ensures "A,B" and "B,A" always normalize to same output
  - Fixes swap logic in playlist pair normalization
- Always emit same number of columns in find_duplicate_ids
  - Artists field always included (empty if no artists)
  - Prevents brittle downstream parsing

Fixes identified Copilot comments:
- Remove unused color constants
- Make output truly CSV-compliant with proper quoting
- Only include ANSI codes for TTY output
- Fix nondeterministic playlist ordering in both passes
- Ensure consistent column count in all output rows

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add implementation details, parsing examples, and development guidance:

SPEC.md:
- Document two-pass algorithm (prefix-based + cross-prefix)
- Explain playlist pair normalization and determinism
- Clarify CSV output quoting behavior and TTY detection

README.md:
- Add "Parsing Duplicate Detection Output" section
- Show how to strip ANSI codes with sed for clean CSV
- Document column reference for both commands
- Note on field quoting for special characters

CONTRIBUTING.md:
- Add "Working with Duplicate Detection (dupes.py)" section
- Document four invariants: playlist ordering, CSV encoding, ANSI codes, normalization
- Explain why each matters and how to maintain them
- Prevent future regressions with clear guidance

Addresses feedback from PR review on CSV determinism and output format.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@jmlrt jmlrt merged commit 9497c5d into main Apr 10, 2026
4 checks passed
@jmlrt jmlrt deleted the feat/remove-duplicate-output-option branch April 10, 2026 05:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants