Commit acb1999

Add analyze orphaned-files command
- Implements new command to find files with no incoming references
- Scans all RST and YAML files to build complete reference map
- Identifies files not referenced by include, literalinclude, io-code-block, or toctree
- Supports --include-toctree flag to consider navigation links
- Supports --exclude pattern to skip certain paths
- Provides text, JSON, count-only, and paths-only output formats
- Includes comprehensive tests and documentation
- Useful for finding unused includes, unlinked pages, and legacy content
1 parent 96c7def commit acb1999
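
The commit message describes building a reference map and reporting files that nothing points to. A minimal sketch of that idea follows; it is not the code from this commit. The function name `findOrphans`, the in-memory `map[string]string` input, and the regular expression are all assumptions made for illustration, and only `include` and `literalinclude` are handled here because they take a path argument on the directive line.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"sort"
	"strings"
)

// directiveRe matches lines such as ".. include:: /includes/fact.rst".
// The pattern is an assumption for this sketch, not taken from the commit.
var directiveRe = regexp.MustCompile(`(?m)^\s*\.\.\s+(include|literalinclude)::\s+(\S+)`)

// findOrphans returns, in sorted order, every file that no file references.
// files maps a source-relative path to the file's contents (hypothetical input shape).
func findOrphans(files map[string]string) []string {
	// Pass 1: collect every referenced target into a set.
	referenced := make(map[string]bool)
	for _, content := range files {
		for _, m := range directiveRe.FindAllStringSubmatch(content, -1) {
			// Treat targets as source-relative, with or without a leading slash.
			target := path.Clean(strings.TrimPrefix(m[2], "/"))
			referenced[target] = true
		}
	}

	// Pass 2: anything scanned but never referenced is an orphan.
	var orphans []string
	for name := range files {
		if !referenced[name] {
			orphans = append(orphans, name)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	files := map[string]string{
		"index.rst":             ".. include:: /includes/fact.rst\n",
		"includes/fact.rst":     "A fact.\n",
		"includes/old-fact.rst": "Stale content.\n",
	}
	fmt.Println(findOrphans(files)) // prints [includes/old-fact.rst index.rst]
}
```

In this sketch `index.rst` is reported as orphaned because nothing includes it, which matches the README note that entry point files typically show up in the results.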

7 files changed: +859 -1 lines changed

audit-cli/README.md

Lines changed: 110 additions & 1 deletion
@@ -59,7 +59,8 @@ audit-cli
 │   └── find-string
 ├── analyze # Analyze RST file structures
 │   ├── includes
-│   └── file-references
+│   ├── file-references
+│   └── orphaned-files
 └── compare # Compare files across versions
     └── file-contents
 ```
@@ -515,6 +516,114 @@ include : 3 files, 4 references
 ./audit-cli analyze file-references ~/docs/source/includes/fact.rst --exclude "*/deprecated/*"
 ```

+#### `analyze orphaned-files`
+
+Find files that have no incoming references from other files. This command scans all RST and YAML files in a source directory to identify files that are not referenced by any include, literalinclude, io-code-block, or toctree directive.
+
+**Use Cases:**
+
+This command helps writers:
+- Find unused include files that can be removed
+- Identify documentation pages not linked in the navigation
+- Discover legacy content that needs cleanup
+- Maintain documentation hygiene by removing dead files
+- Identify entry points (like index.rst) that are referenced externally
+
+**Basic Usage:**
+
+```bash
+# Find orphaned files in a source directory
+./audit-cli analyze orphaned-files ~/docs/source
+
+# Include toctree references (consider navigation links)
+./audit-cli analyze orphaned-files ~/docs/source --include-toctree
+
+# Get JSON output for automation
+./audit-cli analyze orphaned-files ~/docs/source --format json
+
+# Show detailed scanning progress
+./audit-cli analyze orphaned-files ~/docs/source --verbose
+
+# Quick check: just show the count
+./audit-cli analyze orphaned-files ~/docs/source --count-only
+
+# Get list of files for piping to other commands
+./audit-cli analyze orphaned-files ~/docs/source --paths-only
+
+# Exclude certain paths from analysis
+./audit-cli analyze orphaned-files ~/docs/source --exclude "*/archive/*"
+```
+
+**Flags:**
+
+- `--format <format>` - Output format: `text` (default) or `json`
+- `-v, --verbose` - Show detailed information during scanning
+- `-c, --count-only` - Only show the count of orphaned files
+- `--paths-only` - Only show the file paths, one per line (useful for piping to other commands)
+- `--include-toctree` - Include toctree references when determining orphaned status
+- `--exclude <pattern>` - Exclude paths matching this glob pattern (e.g., `*/archive/*` or `*/deprecated/*`)
+
+**Output:**
+
+Text format shows:
+- Source directory path
+- Total files scanned
+- Number of orphaned files
+- List of orphaned files (relative paths)
+- Helpful suggestions for what to do with orphaned files
+
+JSON format provides:
+```json
+{
+  "source_dir": "/path/to/source",
+  "total_files": 150,
+  "total_orphaned": 12,
+  "orphaned_files": [
+    "unused-include.rst",
+    "legacy-page.rst",
+    "includes/old-fact.rst"
+  ],
+  "included_toctree": false
+}
+```
+
+**Exit Codes:**
+
+- `0` - Success (whether orphaned files found or not)
+- `1` - Error (invalid arguments, directory not found, etc.)
+
+**Note:** By default, only content inclusion directives (include, literalinclude, io-code-block) are considered. Use `--include-toctree` to also consider toctree entries (navigation links) when determining orphaned status. Entry point files like `index.rst` will typically appear as orphaned since they're referenced externally by the build system.
+
+**Examples:**
+
+```bash
+# Find orphaned files (content inclusion only)
+./audit-cli analyze orphaned-files ~/docs/source
+
+# Find orphaned files (including navigation links)
+./audit-cli analyze orphaned-files ~/docs/source --include-toctree
+
+# Get machine-readable output for scripting
+./audit-cli analyze orphaned-files ~/docs/source --format json | jq '.total_orphaned'
+
+# See detailed progress for large repositories
+./audit-cli analyze orphaned-files ~/docs/source --verbose
+
+# Quick check: just show the count
+./audit-cli analyze orphaned-files ~/docs/source --count-only
+# Output: 12
+
+# Get list of files for piping to other commands
+./audit-cli analyze orphaned-files ~/docs/source --paths-only
+# Output:
+# unused-include.rst
+# legacy-page.rst
+# includes/old-fact.rst
+
+# Exclude archived files from analysis
+./audit-cli analyze orphaned-files ~/docs/source --exclude "*/archive/*"
+```
+
 ### Compare Commands

 #### `compare file-contents`
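
For automation in Go rather than `jq`, the JSON report documented in the README hunk above could be decoded along these lines. This is a sketch under assumptions: the JSON keys come from the README example, while the `OrphanReport` type, its field names, and `parseReport` are hypothetical and not part of the commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OrphanReport mirrors the JSON keys shown in the README example.
// The Go type and field names are hypothetical.
type OrphanReport struct {
	SourceDir       string   `json:"source_dir"`
	TotalFiles      int      `json:"total_files"`
	TotalOrphaned   int      `json:"total_orphaned"`
	OrphanedFiles   []string `json:"orphaned_files"`
	IncludedToctree bool     `json:"included_toctree"`
}

// parseReport decodes the output of `--format json` into an OrphanReport.
func parseReport(data []byte) (OrphanReport, error) {
	var r OrphanReport
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	// Sample payload copied from the README's JSON example.
	sample := []byte(`{
	  "source_dir": "/path/to/source",
	  "total_files": 150,
	  "total_orphaned": 12,
	  "orphaned_files": ["unused-include.rst", "legacy-page.rst", "includes/old-fact.rst"],
	  "included_toctree": false
	}`)

	r, err := parseReport(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%d of %d files orphaned (toctree considered: %t)\n",
		r.TotalOrphaned, r.TotalFiles, r.IncludedToctree)
	for _, f := range r.OrphanedFiles {
		fmt.Println(f)
	}
}
```

Because the command exits with `0` whether or not orphans are found, a script that should fail on orphans would need to check `total_orphaned` itself rather than rely on the exit code.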

audit-cli/commands/analyze/analyze.go

Lines changed: 4 additions & 0 deletions
@@ -4,13 +4,15 @@
 // Currently supports:
 // - includes: Analyze include directive relationships in RST files
 // - file-references: Find all files that reference a target file
+// - orphaned-files: Find files with no incoming references
 //
 // Future subcommands could include analyzing cross-references, broken links, or content metrics.
 package analyze

 import (
 	"github.com/mongodb/code-example-tooling/audit-cli/commands/analyze/includes"
 	filereferences "github.com/mongodb/code-example-tooling/audit-cli/commands/analyze/file-references"
+	orphanedfiles "github.com/mongodb/code-example-tooling/audit-cli/commands/analyze/orphaned-files"
 	"github.com/spf13/cobra"
 )

@@ -27,13 +29,15 @@ func NewAnalyzeCommand() *cobra.Command {
 Currently supports:
 - includes: Analyze include directive relationships (forward dependencies)
 - file-references: Find all files that reference a target file (reverse dependencies)
+- orphaned-files: Find files with no incoming references

 Future subcommands may support analyzing cross-references, broken links, or content metrics.`,
 	}

 	// Add subcommands
 	cmd.AddCommand(includes.NewIncludesCommand())
 	cmd.AddCommand(filereferences.NewFileReferencesCommand())
+	cmd.AddCommand(orphanedfiles.NewOrphanedFilesCommand())

 	return cmd
 }
