feat: propagate input loading/collecting errors to top level #1864

katrinafyi · 2025-10-04T10:09:16Z

previously, errors which happened while collecting links (e.g., due to invalid base join) would print a terse message to the console and then they would be forgotten. this looked like:

[WARN] Error creating request: InvalidPathToUri("/relative-link/")
🔍 0 Total (in 0s) ✅ 0 OK 🚫 0 Errors

this is unhelpful because it only shows the internal enum name and it's not reported in the final error count, so it's very easy to miss.

now, these "early" errors get propagated into the final error list and displayed alongside the HTTP failures.

[a.md]:
   [ERROR] error:// | Error building URL for "/relative-link/" (Attribute: Some("href")): Cannot convert path '/relative-link/' to a URI: To resolve relative links in local files, provide a root dir

🔍 1 Total (in 0s) ✅ 0 OK 🚫 1 Error

this makes them more obvious and lets us make use of the existing functions for displaying error details and suggested fixes.

this is implemented by changing the request-constructing functions to return a Result<_, RequestError> rather than Result<_, ErrorKind>. RequestError is a new error type which contains information about the URL or input source which caused the error. it also contains the underlying ErrorKind.

lychee/lychee-lib/src/types/request.rs

Lines 7 to 19 in 976904a

    
           /// An error which occurs while trying to construct a [`Request`] object. 
        
           /// That is, an error which happens while trying to load links from an input 
        
           /// source. 
        
           #[derive(Error, Debug, PartialEq, Eq, Hash)] 
        
           pub enum RequestError { 
        
               /// Unable to construct a URL for a link appearing within the given source. 
        
               #[error("Error building URL for {0}: {2}")] 
        
               CreateRequestItem(RawUri, ResolvedInputSource, #[source] Box<ErrorKind>), 
        
               /// Unable to load the content of an input source. 
        
               #[error("Error reading input '{0}': {1}")] 
        
               GetInputContent(InputSource, #[source] Box<ErrorKind>), 
        
           }

i think it's nice to have it as a new error type because it makes it clear to the user that they only have to handle these two errors, and it avoids making the ErrorKind type recursive.

this makes it easier to handle the returned error. previously, if an Err did occur in the collected request stream, this would lead to an unexpected early exit and a tokio error. so, i think that changing this to Result<_, RequestError> is no great burden because lychee-bin did not have very good handling of the old ErrorKind.

some more examples of the new error messages:

$ cargo run -- https://google.com 'noasdjfi/[' non-existing-fjdaifdsjai.com no-perms https://example.com 
[http://non-existing-fjdaifdsjai.com/]:
   [ERROR] error:// | Error reading input 'http://non-existing-fjdaifdsjai.com/': Network error: Connection failed. Check network connectivity and firewall settings

[https://google.com/]:
     [404] https://google.com/images/branding/googlelogo/1x/googlelogo_white_background_color_272x92dp.png | Rejected status code (this depends on your "accept" configuration): Not Found

[no-perms]:
   [ERROR] error:// | Error reading input 'no-perms': Cannot traverse input directory: no-perms: IO error for operation on no-perms: Permission denied (os error 13): Directory traversal failed: no-perms: IO error for operation on no-perms: Permission denied (os error 13). Check directory permissions

[noasdjfi/[]:
   [ERROR] error:// | Error reading input 'noasdjfi/[': UNIX glob pattern is invalid: Invalid glob pattern: Pattern syntax error near position 9: invalid range pattern. Check pattern syntax

🔍 33 Total (in 4s) ✅ 5 OK 🚫 4 Errors 🔀 24 Redirects

compared to before (below). also note that the old behaviour is nondeterministic. any one of the errors could be the one which is printed, and it aborts the program with no other output.

$ cargo run -- https://google.com 'noasdjfi/[' non-existing-fjdaifdsjai.com no-perms 
thread 'tokio-runtime-worker' panicked at lychee-bin/src/commands/check.rs:256:18:
cannot send response to queue: SendError { .. }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: UNIX glob pattern is invalid

Caused by:
    Pattern syntax error near position 9: invalid range pattern

this implements step (2) of the plan here #1624 (comment)

outdated commentary. i changed it to a new RequestError type now

however, i don't like the way this is implemented. it has to "smuggle" the early errors through a new error case,

ErrorKind::CreateRequestItem(RawUri, ResolvedInputSource, Box<ErrorKind>)

and all of the commands have to deliberately handle this case by bypassing the check logic and directly constructing a failed Response. it would be very easy for a user to forget to do this. originally, i wanted to make this new error separate from the usual ErrorKind, but the lychee_lib::Result<T> type with ErrorKind is so pervasive and it would've needed extensive changes.

maybe, instead, i could embed this into the Request type by making its Uri field into a Result<Uri, CreatRequestError> but that seems not great too.

another downside of the current approach is it uses the fake error:// URL to display these messages. this is because the ResponseBody needs a "valid" Uri.

TODO:

tests. fix old and add new.
move check handling into handle function

also, along the way, this changes the ErrorKind::details cases for invalid path and invalid base join. these details previously just repeated the error message and would look like "Cannot convert path '/../../' to a URI: Cannot convert path to URI: '/../../'. Check path format".

related to #1265

This reverts commit a9fe9ad.

This reverts commit f7488dd.

This reverts commit 963b276.

katrinafyi · 2025-10-04T10:50:19Z

I see that this PR contradicts an existing test case which says the current behaviour is intentional.

lychee/lychee-bin/tests/cli.rs

Lines 1458 to 1462 in 54e425c

    
               /// If `base-dir` is not set, don't throw an error in case we encounter 
        
               /// an absolute local link (e.g. `/about`) within a file. 
        
               /// Instead, simply ignore the link. 
        
               #[test] 
        
               fn test_ignore_absolute_local_links_without_base() -> Result<()> {

Maybe there should be an option about whether to treat absolute links as an error when root-dir is not specified? At the moment, it would not be possible to get the old "ignoring" behaviour after this PR. Or, we just change the test

we still ned to feed it back to Status and that requires an old ErrorKind. maybe we add a new case to Status.

this slightly winds back the ResolvedInputSource usage and changes Response back to an ordinary InputSource

this shouldn't be needed anymore since we no longer use a recursive case in ErrorKind. instead, we use the new RequestError type.

This reverts commit f0eda83.

thomas-zahner · 2025-10-09T07:49:42Z

Thanks for the PR and thanks for the extensive description and thoughts. I like it and this should improve UX quite a bit.

Since we now pass errors through the Response struct we might consider making Uri an Option<Uri> so we could get rid of this sentinel value error://. But I'm not sure if it's really worth it.

Then it would also be great to add a CLI test which covers the example you provided in this PR cargo run -- https://google.com 'noasdjfi/[' non-existing-fjdaifdsjai.com no-perms https://example.com. To assert that lychee reports the errors as expected etc.

katrinafyi · 2025-10-09T07:55:21Z

I can try to change Response and report back on how it goes. I should add a CLI test as well.

Did you have thoughts about whether absolute links should be flagged as errors in the absence of root-dir/base-url? I noticed in the feature table, there is a tick for the "Skip relative URLs" feature in Lychee. Making root-relative URLs into failures by default would also conflict with this, in addition to the unit test I mentioned earlier.

thomas-zahner · 2025-10-09T10:05:52Z

Regarding the handling of absolute URLs I think it probably make sense to flag them as errors, as you proposed. We are now at quite a different stage compared to when Skip relative URLs was added to the feature table. This Skip relative URLs "feature" in the table predates the root-dir feature by many years so it makes sense to reconsider the behaviour. I think an error is much clearer than a warning that users easily glance over and I personally prefer that behaviour.

Maybe there should be an option about whether to treat absolute links as an error when root-dir is not specified? At the moment, it would not be possible to get the old "ignoring" behaviour after this PR. Or, we just change the test

So for me it makes sense to just change that behaviour, update/remove the test and update/remove the "feature" in the README.

thomas-zahner · 2025-10-09T10:21:47Z

@mre and I had a quick chat about this. We realised that this PR improves the current situation of panicking and crashing (among improving other things like skipping over relative URLs) but also that this new behaviour is not ideal. In our opinion erroneous user inputs should not be handled the same way as link check errors. For example take a look at the exit codes and how we differentiate between user errors and link check errors. We think user provided erroneous arguments should fail fast and make lychee exit before link checking.

Ideally we handle errors properly (which happens in this PR but not on master) while exiting early upon encountering user errors. The output could look something like this:

$ lychee https://google.com 'noasdjfi/[' non-existing-fjdaifdsjai.com no-perms https://example.com 

Problems reading these inputs:
- 'http://non-existing-fjdaifdsjai.com/': Network error: Connection failed. Check network connectivity and firewall settings
- 'no-perms': Cannot traverse input directory: no-perms: IO error for operation on no-perms: Permission denied (os error 13)
- 'noasdjfi/[': UNIX glob pattern is invalid: Pattern syntax error near position 9: invalid range pattern. Check pattern syntax

Please check for typos and try again.

So, could you try to keep user errors from making the program fail early instead of moving and converting user errors into link check errors?

katrinafyi · 2025-10-09T12:41:15Z

It's an interesting perspective, and thanks for discussing it and giving your thoughts. I agree it could do with more thinking.

With this PR, however, I think the errors that arise within an input file when resolving a URL are reasonably flagged as link checking errors. I'm interested in getting this part of the feature merged sooner, so would it be okay to revert the input reading errors to the old behaviour and merge the "Error building URL" errors on their own? That said, the changes to the Result type are such that the input reading errors would still have to be passed through the new RequestError::GetInputContent. So, I would make it behave the same on the outside while going through different logic on the inside.

About the erroneous inputs case, I think there is a scale. While some inputs are obviously wrong, like an invalid glob syntax, I think that the HTTP and filesystem errors are more nuanced. Unlike an invalid glob, these errors are affected by external state and this has some ramifications.

For one, they take time and resources to check - this may noticeably delay starting up Lychee.
For two, this means that, even if the inputs are checked preemptively, you must always account for the possibility that the error state happens after the check (for example, a laptop might lose wifi) (this is a TOCTOU-style problem). Lychee must still handle errors sensibly in these situations.
As another nuance, directories can obviously contain other directories. Since using a directory input walks recursively, it's possibly for a deep subdirectory to have no permissions. Would an early check recurse through all subdirectories?
Also, there is the possibility that a Lychee library user does not go through the usual CLI parsing and validating logic.

All this is to say, regardless of whether the inputs are validated beforehand, the link checking phase will need some way to sensibly handle these transient errors. I think this PR achieves that, and that it could complement - rather than replace - any extra early input checking which you want to add. (The glob syntax errors, however, should be handled through stricter checking in the InputSource constructor).

To summarise, I want URL building errors the most. For the input loading errors, I believe the current implementation doesn't conflict with your plans, but I'm also happy to scale it back. About implementing the early checks, I think that is best done in another PR (and I'm not the best person to do it, because I don't think special early checking is super beneficial (aside from the glob syntax)).

Sorry for the long text, let me know what you want :))

now, when providing an input with glob characters but an invalid glob pattern, this will cause a "cannot parse inputs" error rather than erroring out later during link checking. this makes use of the existing ErrorKind::InvalidGlobPattern error case. for example, a glob with unbalanced brackets is invalid and now looks like this: ```console $ cargo run -- 'noasdjfi/[' Error: Cannot parse inputs from arguments Caused by: 0: UNIX glob pattern is invalid 1: Pattern syntax error near position 9: invalid range pattern ``` i've also added some TODO notes which would make the logic neater. i didn't do those refactors in this PR because i am mindful of another currently-open (and much bigger) PR which touches these lines (https://www.github.com/lycheeverse/lychee/pull/1837) i wonder if the reason FsGlob stores strings rather than a Pattern is for flexibility to switch to different glob packages. this PR was prompted by findings in lycheeverse#1864 (comment)

thomas-zahner · 2025-10-10T10:32:45Z

Thanks for the thoughtful comment.

With this PR, however, I think the errors that arise within an input file when resolving a URL are reasonably flagged as link checking errors. I'm interested in getting this part of the feature merged sooner, so would it be okay to revert the input reading errors to the old behaviour and merge the "Error building URL" errors on their own?

Yes, splitting this PR up into two separate PRs would be great, if that's possible. Errors that arise within an input file should definitely be flagged as link check errors, as indicated by my previous comment. So it's just the direct user input which we are discussing about and as this might be getting somewhat complicated this would deserve a separate issue and PR. This would allow us to quickly merge the part you are most interested in and then we could continue on discussions with user input handling where you didn't need to participate when you mention you aren't as interested in that area.

About the erroneous inputs case, I think there is a scale. While some inputs are obviously wrong, like an invalid glob syntax, I think that the HTTP and filesystem errors are more nuanced. Unlike an invalid glob, these errors are affected by external state and this has some ramifications.

Thanks, that makes sense. Maybe I was focusing too much on "early exit" in my previous comment. It's true that some errors are impossible to catch "early". However, it is still possible to differentiate between user errors and link-check errors, which makes sense from a UX perspective. For example running lychee https://invalid.example.com should result in a clear user error with exit code 1 because the user supplied an invalid argument to lychee. (note that this is not the same as echo https://invalid.example.com | lychee -) But this PR moves/converts the error into a link check error with exit code 2 which means that the currently documented exit codes are no longer accurate. We would prefer to keep the separation between the two different types of errors with their respective exit codes.

I realise that the output example of my previous comment might not be worth the effort. It would be much simpler to exit on the first user error, as is the current behaviour, rather than listing all user errors.

In any case it would make sense to split up this PR if possible and potentially create a new issue.

katrinafyi · 2025-10-10T10:42:12Z

For example running lychee https://invalid.example.com should result in a clear user error with exit code 1 because the user supplied an invalid argument to lychee.

Right, yeah, I see. The old code would've exited with 1 because rust uses 1 for an Err returned from the main function. I think the error handling just looked so noisy that I hadn't considered that some aspects of it could be desired.

I'll see what I can do. It'll need some way to distinguish input input errors from discovered input errors - maybe an extra case in RequestError.

…rrors Conflicts: lychee-lib/src/utils/request.rs

should this just be a boolean in GetInputContent case?

`make lint` fails error[E0658]: `let` expressions in this position are unstable --> /home/x/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wiremock-0.6.5/src/matchers.rs:214:12 | 214 | if let Ok(url) = Url::parse(&path) | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |

adjusted existing test case `test_ignore_absolute_local_links_without_base`

should it be replaced with something else instead?

katrinafyi · 2025-10-24T11:12:35Z

Bloop! I think the review comments should be addressed now. I've brought back the early exit with code 1 for failing to load CLI inputs, and I've updated the tests and readme to match the new error behaviour for InvalidBaseJoin.

In the implementation, the early user errors should behave the same as before, but the path they go through is changed. Now, they are packaged into a RequestError::UserInputContent and they get sent to the request channel task before being sent back to the main thread as an error.

Some other misc notes:

the Response inputsource is changed from ResolvedInputSource to InputSource, slightly reverting refactor!: use ResolvedInputSource downstream of InputContent #1840. this is because Response has to represent both real HTTP responses and fake "responses" which arise from failed input resolution. this is kind of undesirable
similar to above, i find it's too hard to change ResponseBody's uri field to an Option, just because of how often it's used. so, we have to use the fake error:// URL to represent URL resolving failures

katrinafyi added 10 commits October 4, 2025 18:06

stash

f7488dd

stash very broken

a9fe9ad

Revert "stash very broken"

cef997e

This reverts commit a9fe9ad.

Revert "stash"

25f130b

This reverts commit f7488dd.

feat: add CreateRequestItem error kind

d272a2b

wrap

f5fd8df

smuggle request errors through CreateRequestItem

e4c5482

stash Result with RawUri

963b276

Revert "stash Result with RawUri"

9461325

This reverts commit 963b276.

add help

c1bd7ca

katrinafyi changed the title ~~feat: propagate URL parsing/resolving errors to top level~~ feat: propagate URL parsing/joining errors to top level Oct 4, 2025

katrinafyi added 2 commits October 4, 2025 20:19

add CollectResult to be less dubious

65afcdd

fix lints

891255d

katrinafyi added 13 commits October 4, 2025 20:56

touch

ff87745

introduce separate RequestError type. but...

024772a

we still ned to feed it back to Status and that requires an old ErrorKind. maybe we add a new case to Status.

add Status::RequestError

3ee280e

mark as error

1ad2020

remove ErrorKind::CreateRequestItem case

25d32ce

blah

c7536d1

uncomment

d10190a

restore old panic behaviour for input-source errors

4f74a77

docs

3bca6c8

remove unused imports

49c550e

fix example

6a007c2

clippy. includes boxing errorkind because it's big

d7ebbc0

propagate input loading errors too

17fad2c

this slightly winds back the ResolvedInputSource usage and changes Response back to an ordinary InputSource

katrinafyi changed the title ~~feat: propagate URL parsing/joining errors to top level~~ feat: propagate input loading/collecting errors to top level Oct 7, 2025

katrinafyi and others added 2 commits October 7, 2025 16:53

handle is no longer fallible. add helper function

976904a

Delete clippy.toml

f0eda83

this shouldn't be needed anymore since we no longer use a recursive case in ErrorKind. instead, we use the new RequestError type.

katrinafyi added 2 commits October 9, 2025 17:32

Revert "Delete clippy.toml"

704ad98

This reverts commit f0eda83.

explain ignore-interior-mutability

e5152cc

katrinafyi mentioned this pull request Oct 9, 2025

refactor: check glob validity when parsing input source arguments #1869

Open

thomas-zahner force-pushed the master branch 2 times, most recently from fcdf77c to e0912ab Compare October 21, 2025 12:53

katrinafyi added 7 commits October 23, 2025 23:16

Merge remote-tracking branch 'upstream/master' into propagate-early-e…

6c959f3

…rrors Conflicts: lychee-lib/src/utils/request.rs

fix compilation

eef8b78

review: add new error case for user-provided input failures, but

1a1b5df

should this just be a boolean in GetInputContent case?

lint

f331dbe

move UserInputContent case into fn handle

5716ee9

add early checking for file and dir permissions (and fmt 🙊)

e22f13c

katrinafyi force-pushed the propagate-early-errors branch from 2cb4afd to e22f13c Compare October 24, 2025 05:25

katrinafyi added 4 commits October 24, 2025 15:25

lint

5cc0c1b

use DirTraversal error for dir failures

45906e8

update tests, including adjusting existing

2397577

adjusted existing test case `test_ignore_absolute_local_links_without_base`

remove "Skip relative URLs" from readme feature table

802427b

should it be replaced with something else instead?

katrinafyi mentioned this pull request Oct 31, 2025

new srcset parsing breaks on sentry-docs #1190

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat: propagate input loading/collecting errors to top level #1864

feat: propagate input loading/collecting errors to top level #1864

Uh oh!

katrinafyi commented Oct 4, 2025 •

edited

Loading

Uh oh!

katrinafyi commented Oct 4, 2025

Uh oh!

thomas-zahner commented Oct 9, 2025

Uh oh!

katrinafyi commented Oct 9, 2025

Uh oh!

thomas-zahner commented Oct 9, 2025

Uh oh!

thomas-zahner commented Oct 9, 2025

Uh oh!

katrinafyi commented Oct 9, 2025 •

edited

Loading

Uh oh!

thomas-zahner commented Oct 10, 2025 •

edited

Loading

Uh oh!

katrinafyi commented Oct 10, 2025 •

edited

Loading

Uh oh!

katrinafyi commented Oct 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	/// An error which occurs while trying to construct a [`Request`] object.
	/// That is, an error which happens while trying to load links from an input
	/// source.
	#[derive(Error, Debug, PartialEq, Eq, Hash)]
	pub enum RequestError {
	/// Unable to construct a URL for a link appearing within the given source.
	#[error("Error building URL for {0}: {2}")]
	CreateRequestItem(RawUri, ResolvedInputSource, #[source] Box<ErrorKind>),

	/// Unable to load the content of an input source.
	#[error("Error reading input '{0}': {1}")]
	GetInputContent(InputSource, #[source] Box<ErrorKind>),
	}

Uh oh!

feat: propagate input loading/collecting errors to top level #1864

Are you sure you want to change the base?

feat: propagate input loading/collecting errors to top level #1864

Uh oh!

Conversation

katrinafyi commented Oct 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

katrinafyi commented Oct 4, 2025

Uh oh!

thomas-zahner commented Oct 9, 2025

Uh oh!

katrinafyi commented Oct 9, 2025

Uh oh!

thomas-zahner commented Oct 9, 2025

Uh oh!

thomas-zahner commented Oct 9, 2025

Uh oh!

katrinafyi commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thomas-zahner commented Oct 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

katrinafyi commented Oct 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

katrinafyi commented Oct 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

katrinafyi commented Oct 4, 2025 •

edited

Loading

katrinafyi commented Oct 9, 2025 •

edited

Loading

thomas-zahner commented Oct 10, 2025 •

edited

Loading

katrinafyi commented Oct 10, 2025 •

edited

Loading