|
| 1 | +# Fix Plan: Issue #95 - Content Not Being Updated or Skipped During Fetch |
| 2 | + |
| 3 | +**Issue:** Some content is not being updated or skipped during fetch operations. |
| 4 | + |
| 5 | +**Investigation Date:** 2025-11-27 |
| 6 | + |
| 7 | +**Related Commit:** 2f2a47a (Incremental sync feature added 2025-11-26) |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## Root Causes Identified |
| 12 | + |
| 13 | +### 1. Dead Code: `filterChangedPages` Not Used |
| 14 | +- **Severity:** Medium (technical debt, potential future bug) |
| 15 | +- **File:** `scripts/notion-fetch/pageMetadataCache.ts` |
| 16 | +- **Problem:** Function is imported but never called in production code |
| 17 | +- **Impact:** Confusing codebase, potential for bugs if used later |
| 18 | + |
| 19 | +### 2. Logic Discrepancy: Missing Path Change Detection |
| 20 | +- **Severity:** High (could cause skipped updates) |
| 21 | +- **File:** `scripts/notion-fetch/pageMetadataCache.ts:179-206` |
| 22 | +- **Problem:** `filterChangedPages` lacks the path change check that inline logic has |
| 23 | +- **Impact:** If used, would skip renamed/moved pages incorrectly |
| 24 | + |
| 25 | +### 3. Sub-page Timeout Issues |
| 26 | +- **Severity:** Medium (causes incomplete content) |
| 27 | +- **File:** `scripts/fetchNotionData.ts:189` |
| 28 | +- **Problem:** 10-second timeout may be too aggressive for slow API responses |
| 29 | +- **Impact:** Sub-pages timeout and get skipped silently |
| 30 | + |
| 31 | +### 4. Insufficient Logging |
| 32 | +- **Severity:** Low (makes debugging difficult) |
| 33 | +- **Problem:** Limited visibility into why pages are skipped |
| 34 | +- **Impact:** Hard to diagnose issues in production |
| 35 | + |
| 36 | +--- |
| 37 | + |
| 38 | +## Implementation Plan |
| 39 | + |
| 40 | +### Phase 1: Immediate Fixes (High Priority) |
| 41 | + |
| 42 | +#### Task 1.1: Add Detailed Skip Reason Logging |
| 43 | +**File:** `scripts/notion-fetch/generateBlocks.ts:713-719` |
| 44 | + |
| 45 | +**Current Code:** |
| 46 | +```typescript |
| 47 | +if (!needsProcessing) { |
| 48 | + console.log(chalk.gray(` ⏭️ Skipping unchanged page: ${pageTitle}`)); |
| 49 | +} |
| 50 | +``` |
| 51 | + |
| 52 | +**New Code:** |
| 53 | +```typescript |
| 54 | +if (!needsProcessing) { |
| 55 | + const reason = |
| 56 | + !cachedPage ? "not in cache (NEW)" : |
| 57 | + hasMissingOutputs(metadataCache, page.id) ? "missing output files" : |
| 58 | + !cachedPage.outputPaths?.includes(filePath) ? `path changed (cached: ${cachedPage.outputPaths?.join(", ")}, current: ${filePath})` : |
| 59 | + new Date(page.last_edited_time).getTime() <= new Date(cachedPage.lastEdited).getTime() |
| 60 | + ? `unchanged since ${cachedPage.lastEdited}` |
| 61 | + : "UNKNOWN"; |
| 62 | + |
| 63 | + console.log( |
| 64 | + chalk.gray(` ⏭️ Skipping page: ${pageTitle}`) |
| 65 | + ); |
| 66 | + console.log( |
| 67 | + chalk.dim(` Reason: ${reason}`) |
| 68 | + ); |
| 69 | +} |
| 70 | +``` |
| 71 | + |
| 72 | +**Benefits:** |
| 73 | +- Immediate visibility into why pages are skipped |
| 74 | +- Helps diagnose cache issues |
| 75 | +- No breaking changes |
| 76 | + |
| 77 | +#### Task 1.2: Enhance Sub-page Timeout Logging |
| 78 | +**File:** `scripts/fetchNotionData.ts:221-228` |
| 79 | + |
| 80 | +**Current Code:** |
| 81 | +```typescript |
| 82 | +console.warn( |
| 83 | + `⚠️ Skipping sub-page ${rel.subId} (parent: "${rel.parentTitle}"): ${pageError.message}` |
| 84 | +); |
| 85 | +``` |
| 86 | + |
| 87 | +**New Code:** |
| 88 | +```typescript |
| 89 | +console.warn( |
| 90 | + `⚠️ Skipping sub-page ${rel.subId} (parent: "${rel.parentTitle}")` |
| 91 | +); |
| 92 | +console.warn( |
| 93 | + ` Error: ${pageError.message}` |
| 94 | +); |
| 95 | +console.warn( |
| 96 | + ` Type: ${pageError instanceof Error ? pageError.constructor.name : typeof pageError}` |
| 97 | +); |
| 98 | +if (pageError.message.includes("timeout")) { |
| 99 | + console.warn( |
| 100 | + ` 💡 Hint: Consider increasing TIMEOUT_MS in fetchNotionData.ts` |
| 101 | + ); |
| 102 | +} |
| 103 | +``` |
| 104 | + |
| 105 | +**Benefits:** |
| 106 | +- Better error categorization (timeout vs. API error vs. permission) |
| 107 | +- Actionable hints for resolution |
| 108 | +- Track patterns in failures |
| 109 | + |
| 110 | +#### Task 1.3: Increase Sub-page Timeout |
| 111 | +**File:** `scripts/fetchNotionData.ts:189` |
| 112 | + |
| 113 | +**Current Code:** |
| 114 | +```typescript |
| 115 | +const TIMEOUT_MS = 10000; // 10 second timeout |
| 116 | +``` |
| 117 | + |
| 118 | +**New Code:** |
| 119 | +```typescript |
| 120 | +// Increased from 10s to 30s to handle slow Notion API responses |
| 121 | +// particularly for pages with large blocks or many nested children |
| 122 | +const TIMEOUT_MS = 30000; // 30 second timeout |
| 123 | +``` |
| 124 | + |
| 125 | +**Rationale:** |
| 126 | +- 10 seconds may be too aggressive for complex pages |
| 127 | +- CI/GitHub Actions can have slower network |
| 128 | +- Better to wait longer than skip content |
| 129 | +- Still prevents indefinite hangs |
| 130 | + |
| 131 | +--- |
| 132 | + |
| 133 | +### Phase 2: Fix Core Logic (Medium Priority) |
| 134 | + |
| 135 | +#### Task 2.1: Update `filterChangedPages` Function |
| 136 | +**File:** `scripts/notion-fetch/pageMetadataCache.ts:179-206` |
| 137 | + |
| 138 | +**Problem:** Missing path change detection |
| 139 | + |
| 140 | +**Solution:** Add optional callback parameter for path resolution |
| 141 | + |
| 142 | +**New Signature:** |
| 143 | +```typescript |
| 144 | +export function filterChangedPages< |
| 145 | + T extends { id: string; last_edited_time: string }, |
| 146 | +>( |
| 147 | + pages: T[], |
| 148 | + cache: PageMetadataCache | null, |
| 149 | + options?: { |
| 150 | + /** |
| 151 | + * Callback to get the current file path for a page. |
| 152 | + * If provided, pages with changed paths will be marked as needing update. |
| 153 | + */ |
| 154 | + getFilePath?: (page: T) => string; |
| 155 | + } |
| 156 | +): T[] |
| 157 | +``` |
| 158 | + |
| 159 | +**New Logic:** |
| 160 | +```typescript |
| 161 | +return pages.filter((page) => { |
| 162 | + const cached = cache.pages[page.id]; |
| 163 | +
|
| 164 | + // New page - not in cache |
| 165 | + if (!cached) { |
| 166 | + return true; |
| 167 | + } |
| 168 | +
|
| 169 | + // If any expected outputs are missing, force regeneration |
| 170 | + if (hasMissingOutputs(cache, page.id)) { |
| 171 | + return true; |
| 172 | + } |
| 173 | +
|
| 174 | + // NEW: Check if path changed (only if callback provided) |
| 175 | + if (options?.getFilePath) { |
| 176 | + const currentPath = options.getFilePath(page); |
| 177 | + if (!cached.outputPaths?.includes(currentPath)) { |
| 178 | + return true; |
| 179 | + } |
| 180 | + } |
| 181 | +
|
| 182 | + // Compare timestamps |
| 183 | + const notionTime = new Date(page.last_edited_time).getTime(); |
| 184 | + const cachedTime = new Date(cached.lastEdited).getTime(); |
| 185 | +
|
| 186 | + return notionTime > cachedTime; |
| 187 | +}); |
| 188 | +``` |
| 189 | + |
| 190 | +**Benefits:** |
| 191 | +- Backwards compatible (options parameter is optional) |
| 192 | +- Matches inline logic behavior |
| 193 | +- Can be used for upfront filtering if needed |
| 194 | +- Fixes potential future bug |
| 195 | + |
| 196 | +#### Task 2.2: Update Tests |
| 197 | +**File:** `scripts/notion-fetch/__tests__/pageMetadataCache.test.ts` |
| 198 | + |
| 199 | +**Add new test:** |
| 200 | +```typescript |
| 201 | +describe("filterChangedPages with path changes", () => { |
| 202 | + it("should include pages when path changes even if timestamp unchanged", () => { |
| 203 | + const pages = [ |
| 204 | + { id: "page-1", last_edited_time: "2024-01-01T00:00:00.000Z" }, |
| 205 | + ]; |
| 206 | +
|
| 207 | + const cache: PageMetadataCache = { |
| 208 | + version: CACHE_VERSION, |
| 209 | + scriptHash: "test", |
| 210 | + lastSync: "2024-01-01", |
| 211 | + pages: { |
| 212 | + "page-1": { |
| 213 | + lastEdited: "2024-01-01T00:00:00.000Z", |
| 214 | + outputPaths: ["/docs/old-path.md"], |
| 215 | + processedAt: "2024-01-01", |
| 216 | + }, |
| 217 | + }, |
| 218 | + }; |
| 219 | +
|
| 220 | + // Path changed from /docs/old-path.md to /docs/new-path.md |
| 221 | + const result = filterChangedPages(pages, cache, { |
| 222 | + getFilePath: () => "/docs/new-path.md", |
| 223 | + }); |
| 224 | +
|
| 225 | + expect(result).toHaveLength(1); |
| 226 | + expect(result[0].id).toBe("page-1"); |
| 227 | + }); |
| 228 | +}); |
| 229 | +``` |
| 230 | + |
| 231 | +--- |
| 232 | + |
| 233 | +### Phase 3: Add Cache Diagnostics (Low Priority) |
| 234 | + |
| 235 | +#### Task 3.1: Add Cache Validation Command |
| 236 | +**New File:** `scripts/notion-fetch/validateCache.ts` |
| 237 | + |
| 238 | +**Purpose:** Detect and report cache issues |
| 239 | + |
| 240 | +**Features:** |
| 241 | +```typescript |
| 242 | +export async function validateCache(): Promise<{ |
| 243 | + valid: boolean; |
| 244 | + issues: string[]; |
| 245 | +}> { |
| 246 | + const cache = loadPageMetadataCache(); |
| 247 | + const issues: string[] = []; |
| 248 | +
|
| 249 | + if (!cache) { |
| 250 | + return { valid: false, issues: ["No cache found"] }; |
| 251 | + } |
| 252 | +
|
| 253 | + // Check 1: Verify output files exist |
| 254 | + for (const [pageId, metadata] of Object.entries(cache.pages)) { |
| 255 | + for (const outputPath of metadata.outputPaths) { |
| 256 | + if (!fs.existsSync(outputPath)) { |
| 257 | + issues.push( |
| 258 | + `Missing output file for page ${pageId}: ${outputPath}` |
| 259 | + ); |
| 260 | + } |
| 261 | + } |
| 262 | + } |
| 263 | +
|
| 264 | + // Check 2: Find orphaned output files |
| 265 | + const cachedPaths = new Set( |
| 266 | + Object.values(cache.pages).flatMap(p => p.outputPaths) |
| 267 | + ); |
| 268 | + // Scan docs/ for .md files not in cache... |
| 269 | +
|
| 270 | + // Check 3: Verify script hash is current |
| 271 | + const currentHash = await computeScriptHash(); |
| 272 | + if (cache.scriptHash !== currentHash.hash) { |
| 273 | + issues.push( |
| 274 | + `Script hash mismatch (cache: ${cache.scriptHash.slice(0, 8)}, current: ${currentHash.hash.slice(0, 8)})` |
| 275 | + ); |
| 276 | + } |
| 277 | +
|
| 278 | + return { |
| 279 | + valid: issues.length === 0, |
| 280 | + issues, |
| 281 | + }; |
| 282 | +} |
| 283 | +``` |
| 284 | + |
| 285 | +**CLI Command:** |
| 286 | +```bash |
| 287 | +bun run notion:validate-cache |
| 288 | +``` |
| 289 | + |
| 290 | +--- |
| 291 | + |
| 292 | +## Testing Plan |
| 293 | + |
| 294 | +### Test 1: Verify Skip Logging |
| 295 | +```bash |
| 296 | +# Run with incremental sync enabled |
| 297 | +bun run notion:fetch-all |
| 298 | +
|
| 299 | +# Expected: See detailed reasons for each skipped page |
| 300 | +# Example output: |
| 301 | +# ⏭️ Skipping page: Getting Started |
| 302 | +# Reason: unchanged since 2025-11-26T10:00:00.000Z |
| 303 | +``` |
| 304 | + |
| 305 | +### Test 2: Verify Timeout Increase |
| 306 | +```bash |
| 307 | +# Monitor for "Skipping sub-page" warnings |
| 308 | +bun run notion:fetch-all 2>&1 | grep -i "skipping sub-page" |
| 309 | +
|
| 310 | +# Expected: Fewer timeout-related skips |
| 311 | +``` |
| 312 | + |
| 313 | +### Test 3: Test Path Change Detection |
| 314 | +```bash |
| 315 | +# Manually edit cache to have wrong path |
| 316 | +# Run fetch |
| 317 | +# Expected: Page is regenerated despite unchanged timestamp |
| 318 | +``` |
| 319 | + |
| 320 | +### Test 4: Dry Run Validation |
| 321 | +```bash |
| 322 | +# Run dry run to see what would be processed |
| 323 | +bun run notion:fetch-all --dry-run |
| 324 | +
|
| 325 | +# Expected: Clear indication of what will be skipped and why |
| 326 | +``` |
| 327 | + |
| 328 | +--- |
| 329 | + |
| 330 | +## Rollback Plan |
| 331 | + |
| 332 | +If issues occur after implementation: |
| 333 | + |
| 334 | +1. **Revert logging changes:** Safe, can be undone without side effects |
| 335 | +2. **Revert timeout increase:** Restore to 10000ms if causing other issues |
| 336 | +3. **Revert filterChangedPages changes:** Backwards compatible, can revert |
| 337 | +4. **Force full rebuild:** Run with `--force` flag to bypass cache entirely |
| 338 | + |
| 339 | +--- |
| 340 | + |
| 341 | +## Success Metrics |
| 342 | + |
| 343 | +1. **Reduced Skip Complaints:** Issue #95 should be resolved |
| 344 | +2. **Better Diagnostics:** Team can quickly identify why pages are skipped |
| 345 | +3. **Fewer Timeouts:** Sub-page timeout warnings decrease |
| 346 | +4. **No Regressions:** All pages that should update do update |
| 347 | + |
| 348 | +--- |
| 349 | + |
| 350 | +## Documentation Updates |
| 351 | + |
| 352 | +After implementation, update: |
| 353 | + |
| 354 | +1. **`context/workflows/notion-commands.md`**: Document new logging format |
| 355 | +2. **`context/development/roadmap.md`**: Mark issue as resolved |
| 356 | +3. **`NOTION_FETCH_ARCHITECTURE.md`**: Add section on skip logic |
| 357 | +4. **`CLAUDE.md`**: Add debugging tips for cache issues |
| 358 | + |
| 359 | +--- |
| 360 | + |
| 361 | +## Timeline |
| 362 | + |
| 363 | +- **Phase 1 (Immediate):** 1-2 hours (logging + timeout) |
| 364 | +- **Phase 2 (Core Fix):** 2-3 hours (function update + tests) |
| 365 | +- **Phase 3 (Diagnostics):** 2-4 hours (optional, can defer) |
| 366 | +- **Testing:** 1 hour |
| 367 | +- **Documentation:** 30 minutes |
| 368 | + |
| 369 | +**Total Estimated Time:** 6-10 hours |
| 370 | + |
| 371 | +--- |
| 372 | + |
| 373 | +## Notes |
| 374 | + |
| 375 | +- Incremental sync was just added yesterday (2025-11-26) |
| 376 | +- This issue likely surfaced immediately after that change |
| 377 | +- The fixes maintain backwards compatibility |
| 378 | +- All changes are testable with `--dry-run` flag |
0 commit comments