-
-
Notifications
You must be signed in to change notification settings - Fork 183
feat: propagate input loading/collecting errors to top level #1864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
I see that this PR contradicts an existing test case which says the current behaviour is intentional. lychee/lychee-bin/tests/cli.rs Lines 1458 to 1462 in 54e425c
Maybe there should be an option about whether to treat absolute links as an error when root-dir is not specified? At the moment, it would not be possible to get the old "ignoring" behaviour after this PR. Or, we just change the test |
we still ned to feed it back to Status and that requires an old ErrorKind. maybe we add a new case to Status.
this slightly winds back the ResolvedInputSource usage and changes Response back to an ordinary InputSource
this shouldn't be needed anymore since we no longer use a recursive case in ErrorKind. instead, we use the new RequestError type.
This reverts commit f0eda83.
|
Thanks for the PR and thanks for the extensive description and thoughts. I like it and this should improve UX quite a bit. Since we now pass errors through the Then it would also be great to add a CLI test which covers the example you provided in this PR |
|
I can try to change Response and report back on how it goes. I should add a CLI test as well. Did you have thoughts about whether absolute links should be flagged as errors in the absence of root-dir/base-url? I noticed in the feature table, there is a tick for the "Skip relative URLs" feature in Lychee. Making root-relative URLs into failures by default would also conflict with this, in addition to the unit test I mentioned earlier. |
|
Regarding the handling of absolute URLs I think it probably make sense to flag them as errors, as you proposed. We are now at quite a different stage compared to when
So for me it makes sense to just change that behaviour, update/remove the test and update/remove the "feature" in the README. |
|
@mre and I had a quick chat about this. We realised that this PR improves the current situation of panicking and crashing (among improving other things like skipping over relative URLs) but also that this new behaviour is not ideal. In our opinion erroneous user inputs should not be handled the same way as link check errors. For example take a look at the exit codes and how we differentiate between user errors and link check errors. We think user provided erroneous arguments should fail fast and make lychee exit before link checking. Ideally we handle errors properly (which happens in this PR but not on master) while exiting early upon encountering user errors. The output could look something like this: So, could you try to keep user errors from making the program fail early instead of moving and converting user errors into link check errors? |
|
It's an interesting perspective, and thanks for discussing it and giving your thoughts. I agree it could do with more thinking. With this PR, however, I think the errors that arise within an input file when resolving a URL are reasonably flagged as link checking errors. I'm interested in getting this part of the feature merged sooner, so would it be okay to revert the input reading errors to the old behaviour and merge the "Error building URL" errors on their own? That said, the changes to the Result type are such that the input reading errors would still have to be passed through the new RequestError::GetInputContent. So, I would make it behave the same on the outside while going through different logic on the inside. About the erroneous inputs case, I think there is a scale. While some inputs are obviously wrong, like an invalid glob syntax, I think that the HTTP and filesystem errors are more nuanced. Unlike an invalid glob, these errors are affected by external state and this has some ramifications.
All this is to say, regardless of whether the inputs are validated beforehand, the link checking phase will need some way to sensibly handle these transient errors. I think this PR achieves that, and that it could complement - rather than replace - any extra early input checking which you want to add. (The glob syntax errors, however, should be handled through stricter checking in the InputSource constructor). To summarise, I want URL building errors the most. For the input loading errors, I believe the current implementation doesn't conflict with your plans, but I'm also happy to scale it back. About implementing the early checks, I think that is best done in another PR (and I'm not the best person to do it, because I don't think special early checking is super beneficial (aside from the glob syntax)). Sorry for the long text, let me know what you want :)) |
now, when providing an input with glob characters but an invalid glob
pattern, this will cause a "cannot parse inputs" error rather than
erroring out later during link checking. this makes use of the
existing ErrorKind::InvalidGlobPattern error case.
for example, a glob with unbalanced brackets is invalid and now looks
like this:
```console
$ cargo run -- 'noasdjfi/['
Error: Cannot parse inputs from arguments
Caused by:
0: UNIX glob pattern is invalid
1: Pattern syntax error near position 9: invalid range pattern
```
i've also added some TODO notes which would make the logic neater.
i didn't do those refactors in this PR because i am mindful of another
currently-open (and much bigger) PR which touches these lines
(https://www.github.com/lycheeverse/lychee/pull/1837)
i wonder if the reason FsGlob stores strings rather than a Pattern is
for flexibility to switch to different glob packages.
this PR was prompted by findings in
lycheeverse#1864 (comment)
|
Thanks for the thoughtful comment.
Yes, splitting this PR up into two separate PRs would be great, if that's possible. Errors that arise within an input file should definitely be flagged as link check errors, as indicated by my previous comment. So it's just the direct user input which we are discussing about and as this might be getting somewhat complicated this would deserve a separate issue and PR. This would allow us to quickly merge the part you are most interested in and then we could continue on discussions with user input handling where you didn't need to participate when you mention you aren't as interested in that area.
Thanks, that makes sense. Maybe I was focusing too much on "early exit" in my previous comment. It's true that some errors are impossible to catch "early". However, it is still possible to differentiate between user errors and link-check errors, which makes sense from a UX perspective. For example running I realise that the output example of my previous comment might not be worth the effort. It would be much simpler to exit on the first user error, as is the current behaviour, rather than listing all user errors. In any case it would make sense to split up this PR if possible and potentially create a new issue. |
Right, yeah, I see. The old code would've exited with 1 because rust uses 1 for an Err returned from the main function. I think the error handling just looked so noisy that I hadn't considered that some aspects of it could be desired. I'll see what I can do. It'll need some way to distinguish input input errors from discovered input errors - maybe an extra case in RequestError. |
fcdf77c to
e0912ab
Compare
…rrors Conflicts: lychee-lib/src/utils/request.rs
should this just be a boolean in GetInputContent case?
`make lint` fails
error[E0658]: `let` expressions in this position are unstable
--> /home/x/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wiremock-0.6.5/src/matchers.rs:214:12
|
214 | if let Ok(url) = Url::parse(&path)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
2cb4afd to
e22f13c
Compare
adjusted existing test case `test_ignore_absolute_local_links_without_base`
should it be replaced with something else instead?
|
Bloop! I think the review comments should be addressed now. I've brought back the early exit with code 1 for failing to load CLI inputs, and I've updated the tests and readme to match the new error behaviour for InvalidBaseJoin. In the implementation, the early user errors should behave the same as before, but the path they go through is changed. Now, they are packaged into a RequestError::UserInputContent and they get sent to the request channel task before being sent back to the main thread as an error. Some other misc notes:
|
previously, errors which happened while collecting links (e.g., due to invalid base join) would print a terse message to the console and then they would be forgotten. this looked like:
this is unhelpful because it only shows the internal enum name and it's not reported in the final error count, so it's very easy to miss.
now, these "early" errors get propagated into the final error list and displayed alongside the HTTP failures.
this makes them more obvious and lets us make use of the existing functions for displaying error details and suggested fixes.
this is implemented by changing the request-constructing functions to return a
Result<_, RequestError>rather thanResult<_, ErrorKind>. RequestError is a new error type which contains information about the URL or input source which caused the error. it also contains the underlying ErrorKind.lychee/lychee-lib/src/types/request.rs
Lines 7 to 19 in 976904a
i think it's nice to have it as a new error type because it makes it clear to the user that they only have to handle these two errors, and it avoids making the ErrorKind type recursive.
this makes it easier to handle the returned error. previously, if an Err did occur in the collected request stream, this would lead to an unexpected early exit and a tokio error. so, i think that changing this to
Result<_, RequestError>is no great burden because lychee-bin did not have very good handling of the old ErrorKind.some more examples of the new error messages:
compared to before (below). also note that the old behaviour is nondeterministic. any one of the errors could be the one which is printed, and it aborts the program with no other output.
this implements step (2) of the plan here #1624 (comment)
outdated commentary. i changed it to a new RequestError type now
however, i don't like the way this is implemented. it has to "smuggle" the early errors through a new error case,
and all of the commands have to deliberately handle this case by bypassing the check logic and directly constructing a failed Response. it would be very easy for a user to forget to do this. originally, i wanted to make this new error separate from the usual ErrorKind, but the
lychee_lib::Result<T>type with ErrorKind is so pervasive and it would've needed extensive changes.maybe, instead, i could embed this into the Request type by making its Uri field into a
Result<Uri, CreatRequestError>but that seems not great too.another downside of the current approach is it uses the fake
error://URL to display these messages. this is because the ResponseBody needs a "valid" Uri.TODO:
also, along the way, this changes the
ErrorKind::detailscases for invalid path and invalid base join. these details previously just repeated the error message and would look like "Cannot convert path '/../../' to a URI: Cannot convert path to URI: '/../../'. Check path format".related to #1265