Skip to content

Commit b0b9f4b

Browse files
committed
docs(notion-fetch): add comprehensive plan for issue #95
Investigate and document root causes for content being skipped during fetch: 1. Dead code: filterChangedPages imported but never used 2. Logic discrepancy: missing path change detection in filterChangedPages 3. Sub-page timeout issues: 10s timeout too aggressive 4. Insufficient logging: hard to diagnose skip reasons Plan includes: - Phase 1: Immediate fixes (detailed logging, increased timeout) - Phase 2: Core logic fixes (update filterChangedPages function) - Phase 3: Diagnostics (cache validation command) Related to incremental sync feature added in 2f2a47a (2025-11-26) Estimated timeline: 6-10 hours
1 parent 2f2a47a commit b0b9f4b

File tree

1 file changed

+378
-0
lines changed

1 file changed

+378
-0
lines changed

ISSUE_95_PLAN.md

Lines changed: 378 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,378 @@
1+
# Fix Plan: Issue #95 - Content Not Being Updated or Skipped During Fetch
2+
3+
**Issue:** Some content is not being updated or skipped during fetch operations.
4+
5+
**Investigation Date:** 2025-11-27
6+
7+
**Related Commit:** 2f2a47a (Incremental sync feature added 2025-11-26)
8+
9+
---
10+
11+
## Root Causes Identified
12+
13+
### 1. Dead Code: `filterChangedPages` Not Used
14+
- **Severity:** Medium (technical debt, potential future bug)
15+
- **File:** `scripts/notion-fetch/pageMetadataCache.ts`
16+
- **Problem:** Function is imported but never called in production code
17+
- **Impact:** Confusing codebase, potential for bugs if used later
18+
19+
### 2. Logic Discrepancy: Missing Path Change Detection
20+
- **Severity:** High (could cause skipped updates)
21+
- **File:** `scripts/notion-fetch/pageMetadataCache.ts:179-206`
22+
- **Problem:** `filterChangedPages` lacks the path change check that inline logic has
23+
- **Impact:** If used, would skip renamed/moved pages incorrectly
24+
25+
### 3. Sub-page Timeout Issues
26+
- **Severity:** Medium (causes incomplete content)
27+
- **File:** `scripts/fetchNotionData.ts:189`
28+
- **Problem:** 10-second timeout may be too aggressive for slow API responses
29+
- **Impact:** Sub-pages timeout and get skipped silently
30+
31+
### 4. Insufficient Logging
32+
- **Severity:** Low (makes debugging difficult)
33+
- **Problem:** Limited visibility into why pages are skipped
34+
- **Impact:** Hard to diagnose issues in production
35+
36+
---
37+
38+
## Implementation Plan
39+
40+
### Phase 1: Immediate Fixes (High Priority)
41+
42+
#### Task 1.1: Add Detailed Skip Reason Logging
43+
**File:** `scripts/notion-fetch/generateBlocks.ts:713-719`
44+
45+
**Current Code:**
46+
```typescript
47+
if (!needsProcessing) {
48+
console.log(chalk.gray(` ⏭️ Skipping unchanged page: ${pageTitle}`));
49+
}
50+
```
51+
52+
**New Code:**
53+
```typescript
54+
if (!needsProcessing) {
55+
const reason =
56+
!cachedPage ? "not in cache (NEW)" :
57+
hasMissingOutputs(metadataCache, page.id) ? "missing output files" :
58+
!cachedPage.outputPaths?.includes(filePath) ? `path changed (cached: ${cachedPage.outputPaths?.join(", ")}, current: ${filePath})` :
59+
new Date(page.last_edited_time).getTime() <= new Date(cachedPage.lastEdited).getTime()
60+
? `unchanged since ${cachedPage.lastEdited}`
61+
: "UNKNOWN";
62+
63+
console.log(
64+
chalk.gray(` ⏭️ Skipping page: ${pageTitle}`)
65+
);
66+
console.log(
67+
chalk.dim(` Reason: ${reason}`)
68+
);
69+
}
70+
```
71+
72+
**Benefits:**
73+
- Immediate visibility into why pages are skipped
74+
- Helps diagnose cache issues
75+
- No breaking changes
76+
77+
#### Task 1.2: Enhance Sub-page Timeout Logging
78+
**File:** `scripts/fetchNotionData.ts:221-228`
79+
80+
**Current Code:**
81+
```typescript
82+
console.warn(
83+
`⚠️ Skipping sub-page ${rel.subId} (parent: "${rel.parentTitle}"): ${pageError.message}`
84+
);
85+
```
86+
87+
**New Code:**
88+
```typescript
89+
console.warn(
90+
`⚠️ Skipping sub-page ${rel.subId} (parent: "${rel.parentTitle}")`
91+
);
92+
console.warn(
93+
` Error: ${pageError.message}`
94+
);
95+
console.warn(
96+
` Type: ${pageError instanceof Error ? pageError.constructor.name : typeof pageError}`
97+
);
98+
if (pageError.message.includes("timeout")) {
99+
console.warn(
100+
` 💡 Hint: Consider increasing TIMEOUT_MS in fetchNotionData.ts`
101+
);
102+
}
103+
```
104+
105+
**Benefits:**
106+
- Better error categorization (timeout vs. API error vs. permission)
107+
- Actionable hints for resolution
108+
- Track patterns in failures
109+
110+
#### Task 1.3: Increase Sub-page Timeout
111+
**File:** `scripts/fetchNotionData.ts:189`
112+
113+
**Current Code:**
114+
```typescript
115+
const TIMEOUT_MS = 10000; // 10 second timeout
116+
```
117+
118+
**New Code:**
119+
```typescript
120+
// Increased from 10s to 30s to handle slow Notion API responses
121+
// particularly for pages with large blocks or many nested children
122+
const TIMEOUT_MS = 30000; // 30 second timeout
123+
```
124+
125+
**Rationale:**
126+
- 10 seconds may be too aggressive for complex pages
127+
- CI/GitHub Actions can have slower network
128+
- Better to wait longer than skip content
129+
- Still prevents indefinite hangs
130+
131+
---
132+
133+
### Phase 2: Fix Core Logic (Medium Priority)
134+
135+
#### Task 2.1: Update `filterChangedPages` Function
136+
**File:** `scripts/notion-fetch/pageMetadataCache.ts:179-206`
137+
138+
**Problem:** Missing path change detection
139+
140+
**Solution:** Add optional callback parameter for path resolution
141+
142+
**New Signature:**
143+
```typescript
144+
export function filterChangedPages<
145+
T extends { id: string; last_edited_time: string },
146+
>(
147+
pages: T[],
148+
cache: PageMetadataCache | null,
149+
options?: {
150+
/**
151+
* Callback to get the current file path for a page.
152+
* If provided, pages with changed paths will be marked as needing update.
153+
*/
154+
getFilePath?: (page: T) => string;
155+
}
156+
): T[]
157+
```
158+
159+
**New Logic:**
160+
```typescript
161+
return pages.filter((page) => {
162+
const cached = cache.pages[page.id];
163+
164+
// New page - not in cache
165+
if (!cached) {
166+
return true;
167+
}
168+
169+
// If any expected outputs are missing, force regeneration
170+
if (hasMissingOutputs(cache, page.id)) {
171+
return true;
172+
}
173+
174+
// NEW: Check if path changed (only if callback provided)
175+
if (options?.getFilePath) {
176+
const currentPath = options.getFilePath(page);
177+
if (!cached.outputPaths?.includes(currentPath)) {
178+
return true;
179+
}
180+
}
181+
182+
// Compare timestamps
183+
const notionTime = new Date(page.last_edited_time).getTime();
184+
const cachedTime = new Date(cached.lastEdited).getTime();
185+
186+
return notionTime > cachedTime;
187+
});
188+
```
189+
190+
**Benefits:**
191+
- Backwards compatible (options parameter is optional)
192+
- Matches inline logic behavior
193+
- Can be used for upfront filtering if needed
194+
- Fixes potential future bug
195+
196+
#### Task 2.2: Update Tests
197+
**File:** `scripts/notion-fetch/__tests__/pageMetadataCache.test.ts`
198+
199+
**Add new test:**
200+
```typescript
201+
describe("filterChangedPages with path changes", () => {
202+
it("should include pages when path changes even if timestamp unchanged", () => {
203+
const pages = [
204+
{ id: "page-1", last_edited_time: "2024-01-01T00:00:00.000Z" },
205+
];
206+
207+
const cache: PageMetadataCache = {
208+
version: CACHE_VERSION,
209+
scriptHash: "test",
210+
lastSync: "2024-01-01",
211+
pages: {
212+
"page-1": {
213+
lastEdited: "2024-01-01T00:00:00.000Z",
214+
outputPaths: ["/docs/old-path.md"],
215+
processedAt: "2024-01-01",
216+
},
217+
},
218+
};
219+
220+
// Path changed from /docs/old-path.md to /docs/new-path.md
221+
const result = filterChangedPages(pages, cache, {
222+
getFilePath: () => "/docs/new-path.md",
223+
});
224+
225+
expect(result).toHaveLength(1);
226+
expect(result[0].id).toBe("page-1");
227+
});
228+
});
229+
```
230+
231+
---
232+
233+
### Phase 3: Add Cache Diagnostics (Low Priority)
234+
235+
#### Task 3.1: Add Cache Validation Command
236+
**New File:** `scripts/notion-fetch/validateCache.ts`
237+
238+
**Purpose:** Detect and report cache issues
239+
240+
**Features:**
241+
```typescript
242+
export async function validateCache(): Promise<{
243+
valid: boolean;
244+
issues: string[];
245+
}> {
246+
const cache = loadPageMetadataCache();
247+
const issues: string[] = [];
248+
249+
if (!cache) {
250+
return { valid: false, issues: ["No cache found"] };
251+
}
252+
253+
// Check 1: Verify output files exist
254+
for (const [pageId, metadata] of Object.entries(cache.pages)) {
255+
for (const outputPath of metadata.outputPaths) {
256+
if (!fs.existsSync(outputPath)) {
257+
issues.push(
258+
`Missing output file for page ${pageId}: ${outputPath}`
259+
);
260+
}
261+
}
262+
}
263+
264+
// Check 2: Find orphaned output files
265+
const cachedPaths = new Set(
266+
Object.values(cache.pages).flatMap(p => p.outputPaths)
267+
);
268+
// Scan docs/ for .md files not in cache...
269+
270+
// Check 3: Verify script hash is current
271+
const currentHash = await computeScriptHash();
272+
if (cache.scriptHash !== currentHash.hash) {
273+
issues.push(
274+
`Script hash mismatch (cache: ${cache.scriptHash.slice(0, 8)}, current: ${currentHash.hash.slice(0, 8)})`
275+
);
276+
}
277+
278+
return {
279+
valid: issues.length === 0,
280+
issues,
281+
};
282+
}
283+
```
284+
285+
**CLI Command:**
286+
```bash
287+
bun run notion:validate-cache
288+
```
289+
290+
---
291+
292+
## Testing Plan
293+
294+
### Test 1: Verify Skip Logging
295+
```bash
296+
# Run with incremental sync enabled
297+
bun run notion:fetch-all
298+
299+
# Expected: See detailed reasons for each skipped page
300+
# Example output:
301+
# ⏭️ Skipping page: Getting Started
302+
# Reason: unchanged since 2025-11-26T10:00:00.000Z
303+
```
304+
305+
### Test 2: Verify Timeout Increase
306+
```bash
307+
# Monitor for "Skipping sub-page" warnings
308+
bun run notion:fetch-all 2>&1 | grep -i "skipping sub-page"
309+
310+
# Expected: Fewer timeout-related skips
311+
```
312+
313+
### Test 3: Test Path Change Detection
314+
```bash
315+
# Manually edit cache to have wrong path
316+
# Run fetch
317+
# Expected: Page is regenerated despite unchanged timestamp
318+
```
319+
320+
### Test 4: Dry Run Validation
321+
```bash
322+
# Run dry run to see what would be processed
323+
bun run notion:fetch-all --dry-run
324+
325+
# Expected: Clear indication of what will be skipped and why
326+
```
327+
328+
---
329+
330+
## Rollback Plan
331+
332+
If issues occur after implementation:
333+
334+
1. **Revert logging changes:** Safe, can be undone without side effects
335+
2. **Revert timeout increase:** Restore to 10000ms if causing other issues
336+
3. **Revert filterChangedPages changes:** Backwards compatible, can revert
337+
4. **Force full rebuild:** Run with `--force` flag to bypass cache entirely
338+
339+
---
340+
341+
## Success Metrics
342+
343+
1. **Reduced Skip Complaints:** Issue #95 should be resolved
344+
2. **Better Diagnostics:** Team can quickly identify why pages are skipped
345+
3. **Fewer Timeouts:** Sub-page timeout warnings decrease
346+
4. **No Regressions:** All pages that should update do update
347+
348+
---
349+
350+
## Documentation Updates
351+
352+
After implementation, update:
353+
354+
1. **`context/workflows/notion-commands.md`**: Document new logging format
355+
2. **`context/development/roadmap.md`**: Mark issue as resolved
356+
3. **`NOTION_FETCH_ARCHITECTURE.md`**: Add section on skip logic
357+
4. **`CLAUDE.md`**: Add debugging tips for cache issues
358+
359+
---
360+
361+
## Timeline
362+
363+
- **Phase 1 (Immediate):** 1-2 hours (logging + timeout)
364+
- **Phase 2 (Core Fix):** 2-3 hours (function update + tests)
365+
- **Phase 3 (Diagnostics):** 2-4 hours (optional, can defer)
366+
- **Testing:** 1 hour
367+
- **Documentation:** 30 minutes
368+
369+
**Total Estimated Time:** 6-10 hours
370+
371+
---
372+
373+
## Notes
374+
375+
- Incremental sync was just added yesterday (2025-11-26)
376+
- This issue likely surfaced immediately after that change
377+
- The fixes maintain backwards compatibility
378+
- All changes are testable with `--dry-run` flag

0 commit comments

Comments
 (0)