Commit 7a63eb4

Fix hyperlinked images not rendering in markdown (#98)
* fix(notion-fetch): preserve hyperlinks on images from Notion

  Images that are hyperlinked in Notion were losing their links during the
  markdown conversion process. This fix adds:

  1. Custom image transformer in notionClient.ts that detects hyperlinks on
     Notion image blocks (from caption rich_text or link property) and wraps
     images in markdown link syntax: [![alt](img-url)](link-url)
  2. Enhanced image extraction regex in imageReplacer.ts to handle both:
     - Regular images: ![alt](url)
     - Hyperlinked images: [![alt](img-url)](link-url)
  3. Updated image replacement logic to preserve hyperlink wrappers when
     replacing Notion image URLs with local paths
  4. Comprehensive tests for hyperlinked image handling

  Fixes #96

* fix(notion-fetch): improve hyperlink detection in image captions

  Based on research into Notion's linking behavior, hyperlinks on images are
  stored in the caption as link annotations. Updated the image transformer to:

  1. Check caption rich_text items for link annotations first
  2. Extract the link URL from the first linked caption item
  3. Use non-linked caption text as alt text
  4. Add fallback checks for image.link and block-level link properties
  5. Add debug logging to help diagnose hyperlink detection issues

  This should correctly detect and preserve image hyperlinks from Notion.

  Related to #96

* feat(notion-fetch): add workaround for image hyperlinks via captions

  Since Notion's "Add link" menu option doesn't expose hyperlinks via the API,
  implement a workaround that detects URLs in image captions:

  Method 1: Link annotations in caption rich_text
  - Detects when URLs are formatted as links in captions
  - Uses the link.url property from the rich_text annotation

  Method 2: Plain text URL detection (NEW)
  - Detects URLs typed as plain text in captions
  - Uses regex to extract http(s) URLs: /https?:\/\/[^\s]+/
  - Separates URL from alt text

  Method 3 & 4: Future-proofing
  - Checks for image.link and block.link properties
  - Will work if Notion adds API support in the future

  Users can now make images clickable by:
  1. Typing a URL in the image caption (preferred workaround)
  2. Pasting a link in the caption (Notion converts it automatically)

  Note: The Notion UI "Add link" feature is not supported by the API. Images
  using that feature will not have clickable links in the output.

  Related to #96

* refactor(notion-fetch): clean up console logging in image transformer

  - Wrap console.log in IS_TEST_ENV check for cleaner test output
  - Use consistent ✓ prefix for success messages
  - Shorten log message for brevity
1 parent 6b95a02 commit 7a63eb4
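
Method 2 in the commit message (plain-text URL detection in captions) can be illustrated in isolation. The following is a minimal sketch, assuming the caption has already been flattened to a single string; `splitCaption` and `CaptionParts` are hypothetical names used only for illustration and do not appear in the commit.

```typescript
// Hypothetical helper, not part of the commit: splits a flattened caption
// string into a link URL and the remaining alt text.
interface CaptionParts {
  linkUrl: string;
  altText: string;
}

function splitCaption(caption: string): CaptionParts {
  // Same pattern the commit message quotes: the first http(s) URL,
  // running up to the next whitespace character.
  const urlMatch = caption.match(/https?:\/\/[^\s]+/);
  if (!urlMatch) {
    // No URL present: the whole caption is used as alt text.
    return { linkUrl: "", altText: caption };
  }
  const linkUrl = urlMatch[0];
  // Remove the URL and trim what is left to obtain the alt text.
  return { linkUrl, altText: caption.replace(linkUrl, "").trim() };
}

// Usage
const parts = splitCaption("Docs logo https://example.com/docs");
console.log(parts.linkUrl, "|", parts.altText);
```

Because the pattern runs to the next whitespace, punctuation typed directly after a URL (for example a trailing period) would become part of the detected link.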

3 files changed: +278 -8 lines changed

scripts/notion-fetch/imageReplacer.test.ts

Lines changed: 103 additions & 0 deletions
@@ -159,6 +159,63 @@ Some text
         "![alt text](https://example.com/image.png)"
       );
     });
+
+    it("should extract hyperlinked images", () => {
+      const markdown =
+        "[![alt text](https://example.com/image.png)](https://example.com/link)";
+      const matches = extractImageMatches(markdown);
+
+      expect(matches).toHaveLength(1);
+      expect(matches[0]).toMatchObject({
+        alt: "alt text",
+        url: "https://example.com/image.png",
+        linkUrl: "https://example.com/link",
+        idx: 0,
+      });
+      expect(matches[0].full).toBe(
+        "[![alt text](https://example.com/image.png)](https://example.com/link)"
+      );
+    });
+
+    it("should extract both regular and hyperlinked images", () => {
+      const markdown = `
+![regular](https://example.com/regular.png)
+[![hyperlinked](https://example.com/linked.png)](https://example.com)
+`;
+      const matches = extractImageMatches(markdown);
+
+      expect(matches).toHaveLength(2);
+
+      // Regular image
+      expect(matches[0].alt).toBe("regular");
+      expect(matches[0].url).toBe("https://example.com/regular.png");
+      expect(matches[0].linkUrl).toBeUndefined();
+
+      // Hyperlinked image
+      expect(matches[1].alt).toBe("hyperlinked");
+      expect(matches[1].url).toBe("https://example.com/linked.png");
+      expect(matches[1].linkUrl).toBe("https://example.com");
+    });
+
+    it("should handle hyperlinked images with escaped parentheses", () => {
+      const markdown =
+        "[![alt](https://example.com/image\\).png)](https://link.com/page\\))";
+      const matches = extractImageMatches(markdown);
+
+      expect(matches).toHaveLength(1);
+      expect(matches[0].url).toBe("https://example.com/image).png");
+      expect(matches[0].linkUrl).toBe("https://link.com/page)");
+    });
+
+    it("should handle hyperlinked images with empty alt text", () => {
+      const markdown =
+        "[![](https://example.com/image.png)](https://example.com/link)";
+      const matches = extractImageMatches(markdown);
+
+      expect(matches).toHaveLength(1);
+      expect(matches[0].alt).toBe("");
+      expect(matches[0].linkUrl).toBe("https://example.com/link");
+    });
   });

   describe("processAndReplaceImages", () => {
@@ -385,5 +442,51 @@ Some text after
       expect(result.metrics).toHaveProperty("skippedResize");
       expect(result.metrics).toHaveProperty("fullyProcessed");
     });
+
+    it("should preserve hyperlinks when replacing image URLs", async () => {
+      const markdown =
+        "[![alt text](https://example.com/image.png)](https://example.com/link)";
+      const result = await processAndReplaceImages(markdown, "test-file");
+
+      // Should replace the image URL but keep the hyperlink wrapper
+      expect(result.markdown).toContain("/images/downloaded-image.png");
+      expect(result.markdown).toContain("https://example.com/link");
+      expect(result.markdown).toBe(
+        "[![alt text](/images/downloaded-image.png)](https://example.com/link)"
+      );
+      expect(result.stats.successfulImages).toBe(1);
+    });
+
+    it("should handle multiple hyperlinked images", async () => {
+      const markdown = `
+[![img1](https://example.com/1.png)](https://link1.com)
+[![img2](https://example.com/2.png)](https://link2.com)
+`;
+      const result = await processAndReplaceImages(markdown, "test-file");
+
+      expect(result.stats.successfulImages).toBe(2);
+      expect(result.markdown).toContain(
+        "[![img1](/images/downloaded-1.png)](https://link1.com)"
+      );
+      expect(result.markdown).toContain(
+        "[![img2](/images/downloaded-2.png)](https://link2.com)"
+      );
+    });
+
+    it("should handle mix of regular and hyperlinked images", async () => {
+      const markdown = `
+![regular](https://example.com/regular.png)
+[![linked](https://example.com/linked.png)](https://example.com)
+`;
+      const result = await processAndReplaceImages(markdown, "test-file");
+
+      expect(result.stats.successfulImages).toBe(2);
+      expect(result.markdown).toContain(
+        "![regular](/images/downloaded-regular.png)"
+      );
+      expect(result.markdown).toContain(
+        "[![linked](/images/downloaded-linked.png)](https://example.com)"
+      );
+    });
   });
 });
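
The replacement tests above expect the link wrapper to survive when the image URL is swapped for a local path. A minimal sketch of why a plain first-occurrence string replacement achieves this is shown here; `swapImageUrl` is a hypothetical name, and the real logic lives in `processAndReplaceImages`, which this snippet does not reproduce.

```typescript
// Hypothetical helper, not part of the commit: replaces the image URL inside
// a matched markdown fragment while leaving everything else untouched.
function swapImageUrl(full: string, oldUrl: string, newPath: string): string {
  // With a string pattern, String.prototype.replace only touches the first
  // occurrence. In [![alt](img-url)](link-url) the image URL precedes the
  // link URL, so the link target stays as it was.
  return full.replace(oldUrl, newPath);
}

// Usage
const linked =
  "[![alt text](https://example.com/image.png)](https://example.com/link)";
console.log(
  swapImageUrl(linked, "https://example.com/image.png", "/images/downloaded-image.png")
);
```

Two caveats apply to this sketch: if the alt text itself contained the image URL, that occurrence would be replaced instead, and a replacement path containing `$` sequences such as `$&` would be interpreted as a replacement pattern.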

scripts/notion-fetch/imageReplacer.ts

Lines changed: 71 additions & 8 deletions
@@ -29,7 +29,7 @@ import { ProgressTracker } from "./progressTracker";
  * Image match information extracted from markdown
  */
 export interface ImageMatch {
-  /** Full markdown image syntax */
+  /** Full markdown image syntax (including link wrapper if present) */
   full: string;
   /** Image URL */
   url: string;
@@ -41,6 +41,8 @@ export interface ImageMatch {
   start: number;
   /** End position in source markdown */
   end: number;
+  /** Hyperlink URL if image is wrapped in a link */
+  linkUrl?: string;
 }

 /**
@@ -76,22 +78,60 @@ const MAX_CONCURRENT_IMAGES = 5;
 /**
  * Extracts all image matches from markdown content
  *
- * Uses an improved regex pattern that:
- * - Matches until ')' not preceded by '\'
- * - Allows spaces (trimmed)
- * - Handles escaped parentheses in URLs
+ * Handles both regular images and hyperlinked images:
+ * - Regular: ![alt](url)
+ * - Hyperlinked: [![alt](img-url)](link-url)
+ *
+ * Uses improved regex patterns that:
+ * - Match until ')' not preceded by '\'
+ * - Allow spaces (trimmed)
+ * - Handle escaped parentheses in URLs
  *
  * @param sourceMarkdown - Source markdown content
  * @returns Array of image matches with position information
  */
 export function extractImageMatches(sourceMarkdown: string): ImageMatch[] {
-  // Improved URL pattern: match until a ')' not preceded by '\', allow spaces trimmed
-  const imgRegex = /!\[([^\]]*)\]\(\s*((?:\\\)|[^)])+?)\s*\)/g;
   const imageMatches: ImageMatch[] = [];
-  let m: RegExpExecArray | null;
   let tmpIndex = 0;
   let safetyCounter = 0;

+  // First, extract hyperlinked images: [![alt](img-url)](link-url)
+  const hyperlinkedImgRegex =
+    /\[!\[([^\]]*)\]\(\s*((?:\\\)|[^)])+?)\s*\)\]\(\s*((?:\\\)|[^)])+?)\s*\)/g;
+  let m: RegExpExecArray | null;
+
+  while ((m = hyperlinkedImgRegex.exec(sourceMarkdown)) !== null) {
+    if (++safetyCounter > SAFETY_LIMIT) {
+      console.warn(
+        chalk.yellow(
+          `⚠️ Image match limit (${SAFETY_LIMIT}) reached; skipping remaining.`
+        )
+      );
+      break;
+    }
+    const start = m.index;
+    const full = m[0];
+    const end = start + full.length;
+    const rawImgUrl = m[2];
+    const rawLinkUrl = m[3];
+    const unescapedImgUrl = rawImgUrl.replace(/\\\)/g, ")");
+    const unescapedLinkUrl = rawLinkUrl.replace(/\\\)/g, ")");
+
+    imageMatches.push({
+      full,
+      url: unescapedImgUrl,
+      alt: m[1],
+      idx: tmpIndex++,
+      start,
+      end,
+      linkUrl: unescapedLinkUrl,
+    });
+  }
+
+  // Then, extract regular images: ![alt](url)
+  // But skip positions already matched by hyperlinked images
+  const imgRegex = /!\[([^\]]*)\]\(\s*((?:\\\)|[^)])+?)\s*\)/g;
+
   while ((m = imgRegex.exec(sourceMarkdown)) !== null) {
     if (++safetyCounter > SAFETY_LIMIT) {
       console.warn(
@@ -101,11 +141,23 @@ export function extractImageMatches(sourceMarkdown: string): ImageMatch[] {
       );
       break;
     }
+
     const start = m.index;
     const full = m[0];
     const end = start + full.length;
+
+    // Skip if this position overlaps with a hyperlinked image
+    const overlaps = imageMatches.some(
+      (existing) => start >= existing.start && start < existing.end
+    );
+
+    if (overlaps) {
+      continue;
+    }
+
     const rawUrl = m[2];
     const unescapedUrl = rawUrl.replace(/\\\)/g, ")");
+
     imageMatches.push({
       full,
       url: unescapedUrl,
@@ -116,6 +168,14 @@ export function extractImageMatches(sourceMarkdown: string): ImageMatch[] {
     });
   }

+  // Sort by start position to maintain order
+  imageMatches.sort((a, b) => a.start - b.start);
+
+  // Reassign indices after sorting
+  imageMatches.forEach((match, index) => {
+    match.idx = index;
+  });
+
   return imageMatches;
 }

@@ -296,6 +356,9 @@ export async function processAndReplaceImages(

       let replacementText: string;
       if (processResult.success && processResult.newPath) {
+        // Replace the image URL with the new local path
+        // This preserves the hyperlink wrapper if present, as match.full
+        // contains the complete markdown syntax: [![alt](url)](link) or ![alt](url)
         replacementText = match.full.replace(
           processResult.imageUrl!,
           processResult.newPath
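
The two-pass extraction in `extractImageMatches` (linked images first, then plain images with an overlap check, then a sort) can be condensed into a standalone sketch. The regular expressions are copied from the diff; `ImageHit`, `findImages`, and `unescapeParens` are hypothetical names, and the safety counter and `idx` bookkeeping from the real function are omitted for brevity.

```typescript
// Illustrative sketch only; names here are hypothetical, not the commit's.
interface ImageHit {
  full: string;
  alt: string;
  url: string;
  start: number;
  end: number;
  linkUrl?: string;
}

function findImages(markdown: string): ImageHit[] {
  // Patterns copied from the diff above.
  const linked =
    /\[!\[([^\]]*)\]\(\s*((?:\\\)|[^)])+?)\s*\)\]\(\s*((?:\\\)|[^)])+?)\s*\)/g;
  const plain = /!\[([^\]]*)\]\(\s*((?:\\\)|[^)])+?)\s*\)/g;
  const unescapeParens = (s: string): string => s.replace(/\\\)/g, ")");
  const hits: ImageHit[] = [];
  let m: RegExpExecArray | null;

  // Pass 1: linked images, so the whole wrapper is captured as one match.
  while ((m = linked.exec(markdown)) !== null) {
    hits.push({
      full: m[0],
      alt: m[1],
      url: unescapeParens(m[2]),
      start: m.index,
      end: m.index + m[0].length,
      linkUrl: unescapeParens(m[3]),
    });
  }

  // Pass 2: plain images, skipping any that start inside an earlier match.
  while ((m = plain.exec(markdown)) !== null) {
    const start = m.index;
    if (hits.some((h) => start >= h.start && start < h.end)) continue;
    hits.push({
      full: m[0],
      alt: m[1],
      url: unescapeParens(m[2]),
      start,
      end: start + m[0].length,
    });
  }

  // Restore document order, since pass 1 results were pushed first.
  return hits.sort((a, b) => a.start - b.start);
}

// Usage
const found = findImages(
  "![a](https://x.test/a.png)\n[![b](https://x.test/b.png)](https://x.test/page)"
);
console.log(found.map((h) => `${h.alt}:${h.linkUrl ?? "no link"}`));
```

Without the overlap check, the inner `![b](...)` of a linked image would be reported a second time as a plain image, which is why the linked pass runs first.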

scripts/notionClient.ts

Lines changed: 104 additions & 0 deletions
@@ -330,6 +330,110 @@ const paragraphTransformer: BlockToMarkdown = async (block) => {

 n2m.setCustomTransformer("paragraph", paragraphTransformer);

+/**
+ * Custom image transformer that preserves hyperlinks from Notion.
+ * When an image has a hyperlink in Notion, this transformer wraps the
+ * markdown image syntax with a link: [![alt](img-url)](link-url)
+ */
+const imageTransformer: BlockToMarkdown = async (block) => {
+  const imageBlock = block as any;
+
+  if (imageBlock?.type !== "image") {
+    return "";
+  }
+
+  const image = imageBlock.image;
+  if (!image) {
+    return "";
+  }
+
+  // Get image URL from external or file
+  const imageUrl = image.external?.url || image.file?.url || image.url || "";
+
+  if (!imageUrl) {
+    return "";
+  }
+
+  // Check if image has a hyperlink
+  // WORKAROUND: Since Notion's "Add link" feature doesn't expose links via the API,
+  // we detect URLs in captions as an alternative approach
+  let linkUrl = "";
+  let altText = "";
+
+  // Method 1: Check for links in caption rich_text (when URL is formatted as a link)
+  if (image.caption && Array.isArray(image.caption)) {
+    for (const captionItem of image.caption) {
+      // Check if this caption item has a link annotation
+      if (captionItem.type === "text" && captionItem.text?.link?.url) {
+        linkUrl = captionItem.text.link.url;
+        if (!IS_TEST_ENV) {
+          console.log(chalk.green(`✓ Found link in caption: ${linkUrl}`));
+        }
+        // Don't use the linked text as alt text - it's the URL destination
+        break;
+      } else if (captionItem.plain_text && !linkUrl) {
+        // Use non-linked caption text as alt text
+        altText += captionItem.plain_text || "";
+      }
+    }
+
+    // Method 2: Check for plain text URLs in caption (fallback)
+    // This catches cases where users type URLs without Notion converting them
+    if (!linkUrl) {
+      const fullCaption = image.caption
+        .map((item: any) => item.plain_text || "")
+        .join("");
+
+      // Simple URL regex to detect http(s) URLs
+      const urlMatch = fullCaption.match(/https?:\/\/[^\s]+/);
+      if (urlMatch) {
+        linkUrl = urlMatch[0];
+        if (!IS_TEST_ENV) {
+          console.log(
+            chalk.green(`✓ Found plain text URL in caption: ${linkUrl}`)
+          );
+        }
+        // Use the rest of the caption as alt text
+        altText = fullCaption.replace(linkUrl, "").trim();
+      } else {
+        // No URL found, use full caption as alt text
+        altText = fullCaption;
+      }
+    }
+  }
+
+  // Method 3: Check for dedicated link property on the image object (API support if added)
+  if (!linkUrl && image.link) {
+    linkUrl = image.link;
+    if (!IS_TEST_ENV) {
+      console.log(chalk.green(`✓ Found image link property: ${linkUrl}`));
+    }
+  }
+
+  // Method 4: Check for link on the block level (API support if added)
+  if (!linkUrl && imageBlock.link) {
+    linkUrl = imageBlock.link;
+    if (!IS_TEST_ENV) {
+      console.log(chalk.green(`✓ Found block-level link: ${linkUrl}`));
+    }
+  }
+
+  // Generate markdown
+  const imageMarkdown = `![${altText}](${imageUrl})`;
+
+  // If there's a hyperlink, wrap the image in a link
+  if (linkUrl) {
+    if (!IS_TEST_ENV) {
+      console.log(chalk.green(`✓ Creating hyperlinked image: ${linkUrl}`));
+    }
+    return `[${imageMarkdown}](${linkUrl})` as MarkdownBlock;
+  }
+
+  return imageMarkdown as MarkdownBlock;
+};
+
+n2m.setCustomTransformer("image", imageTransformer);
+
 export const DATABASE_ID = resolvedDatabaseId;

 // For v5 API compatibility - export data source ID
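
To see what the transformer's Method 1 produces without a Notion client, the core of it can be reduced to a pure function. This is a simplified sketch under stated assumptions: `CaptionItem` is a hypothetical minimal type modeling only the fields the diff reads (`type`, `plain_text`, `text.link.url`), `renderImage` is a hypothetical name, and Methods 2 to 4 and the logging are left out.

```typescript
// Hypothetical minimal shape of a caption rich_text item; only the fields
// read by the transformer in the diff are modeled here.
interface CaptionItem {
  type: string;
  plain_text?: string;
  text?: { link?: { url: string } | null };
}

function renderImage(imageUrl: string, caption: CaptionItem[]): string {
  let linkUrl = "";
  let altText = "";

  for (const item of caption) {
    if (item.type === "text" && item.text?.link?.url) {
      // First linked caption item wins; its text is not used as alt text.
      linkUrl = item.text.link.url;
      break;
    }
    // Caption text seen before any link accumulates as alt text.
    altText += item.plain_text ?? "";
  }

  const imageMarkdown = `![${altText}](${imageUrl})`;
  // Wrap the image in link syntax only when a link was found.
  return linkUrl ? `[${imageMarkdown}](${linkUrl})` : imageMarkdown;
}

// Usage
console.log(
  renderImage("https://x.test/a.png", [
    { type: "text", plain_text: "Logo", text: { link: null } },
    { type: "text", plain_text: "site", text: { link: { url: "https://x.test" } } },
  ])
);
```

Because the loop stops at the first linked item, any caption text after that item is not added to the alt text in this sketch, which matches the `break` in the transformer above.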
