-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add a doc_comments_missing_terminal_punctuation
lint
#15758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add a doc_comments_missing_terminal_punctuation
lint
#15758
Conversation
Lintcheck changes for 0df651d
This comment will be updated if you push new changes |
48a2590
to
1025af5
Compare
Are the lint warnings found by the lintcheck check something that should be addressed in any way or is it just to get an idea of what it looks like to run the new lint on popular crates? |
Some changes occurred in clippy_lints/src/doc cc @notriddle |
r? @Alexendoo rustbot has assigned @Alexendoo. Use |
This seems hard to maintain. I would prefer if it was based on the actual markdown parser instead of doing it like this. I've rewritten a version, that you can see here: AudaciousAxiom/rust-clippy@feat/doc_comments_missing_terminal_punctuation...notriddle:rust-clippy:feat/doc_comments_missing_terminal_punctuation If it looks better to everyone else, we can push it to your branch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@notriddle Thanks a lot for spending time on this! I did not go the parser route initially in part because of performance concerns, as the happy path (when there is punctuation) can be very fast, while this otherwise requires parsing doc comment all the time.
I have pushed your version to my branch, and modified it to handle sentences in parentheses, and exclude #[doc]
attributes and doc comments that end with a list/code block/table to avoid false positives as much as possible. Clearly this looks cleaner and more maintainable, and functionally better (handling autolinks properly seems very useful).
My only concern is possibly performance as currently that makes the Markdown parser run on every doc comment, but it should be trivial to short-circuit early if the doc comment ends with a period, before triggering the parser, if we realize later that performance is an issue.
If this looks good to you, I can reorganize the tests, and later rewrite the history if you want.
text_offset = Some(offset.end); | ||
}, | ||
Event::Text(..) if no_report_depth == 0 && !offset.is_empty() => { | ||
if doc_string[..offset.end].trim_end().ends_with(')') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@notriddle This now only checks for parentheses, I'm not sure why you were trimming brackets and curly braces as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually remember what my reasoning was, either...
TagEnd::CodeBlock | TagEnd::Heading(_) | TagEnd::HtmlBlock | TagEnd::List(_) | TagEnd::Table, | ||
) => { | ||
no_report_depth -= 1; | ||
text_offset = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split this arm so text_offset
gets overwritten if these constructs are at the end of the doc comments (and added some tests for these cases).
/// ``` | ||
#[doc = ""] | ||
/// ``` | ||
struct DocAttribute; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opted for simply ignoring interleaved #[doc]
attributes to reduce false positives, and removed the _unfixable
test file.
enum Trailers { | ||
/// (Sometimes the last sentence is in parentheses, and that's ok.) | ||
ParensPassing, | ||
/// (But sometimes it is missing a period.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is now properly handled: the period is added inside the parentheses instead of outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this example:
/// Perform a binary search, returning the first element of the list
/// that is equal (Ok) or the position where the element should be
/// inserted (Err)
fn binary_search<T: Eq>(slice: &[T], elem: T) -> Result<usize, usize> {
unimplemented!();
}
"(Err)" is not a complete sentence, so it should be "(Err).", not "(Err.)".
The more complex solution should be used if it's more common, but is it? Are parenthesis more commonly used for sentence fragments, or for whole sentences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more complex solution should be used if it's more common, but is it?
Clearly it isn't. I've considered also accepting as valid comments that end with .)
, but realized that would result in false positives for comments that end with etc.)
for instance (and just in the std there are three of them).
So after all I've completely removed any special treatment of parentheses (and updated the tests to reflect this). If you think it would be better to also accept .)
to avoid some false positives, I can re-add that easily.
(I've dropped the commit that was adding the special treatment of parens, I'm not sure what the stance is on rewriting history in PRs, but in any case I can easily bring it back later if needed.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just thought of another corner case...
/// Says "yes."
pub fn affirm()
Putting the period inside the quotes doesn't make sense, but it's standard US English orthography, so we don't want to warn on it. British orthography puts the full stop outside the quotes, the same way brackets work, so we need to support both.
Whatever solution you use for quotes-with-.
-inside can probably also be used for parens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I actually thought of that one before but decided against as I thought it was just too rare to actually happen in doc comments and especially at their very end. Out of curiosity I just grepped the std code and only found this one occurrence. Do you think this is worth the additional complexity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how common it is. Maybe lintcheck can be used to try to find out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accepting doc comments that end with double quotation marks (and with proper punctuation inside them) brought the linkcheck count from 6669 down to 6614. I don't think the slight additional complexity is an issue actually.
Event::Start(Tag::Link { .. }) | Event::End(TagEnd::Link) if no_report_depth == 0 && !offset.is_empty() => { | ||
text_offset = Some(offset.end); | ||
}, | ||
Event::Code(..) | Event::Text(..) if no_report_depth == 0 && !offset.is_empty() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the output of the lintcheck check, I realized that trailing code spans were not properly treated (the lint was suggesting to add a period just before the code span, instead of after it). Adding Event::Code
here fixes that.
f543c33
to
70a7dd6
Compare
}, | ||
Event::Text(..) if no_report_depth == 0 && !offset.is_empty() => { | ||
// American-style quotes require punctuation to be placed inside closing quotation marks. | ||
if doc_string[..offset.end].trim_end().ends_with('"') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so this code can be extended to handle parens with basically-trivial additional complexity, right?
if doc_string[..offset.end].trim_end().ends_with('"') { | |
if doc_string[..offset.end].trim_end().ends_with(|c| c == '"' || c == ')') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but that means that the fix suggestion would always add a period inside the parentheses, while it is more likely that adding it outside would be the correct choice. That makes me realize that the suggestion for quotes would also arbitrarily favor the American style over the British one.
Maybe we should treat these cases (quotes and parentheses) as unfixable: detect when terminal punctuation is missing but give no suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I think we are overfocusing on edge cases right now: this lint (even without handling quotes and parentheses) would already be really useful for rolling out on new projects to ensure a consistent style, and we know that we will not be able to handle every little edge case without full-blown NLP anyway. These edge cases can easily be circumvented by simply slightly rephrasing (and I understand that some people do not like tools dictating how they write, but in this case this seems pretty minor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should treat these cases (quotes and parentheses) as unfixable: detect when terminal punctuation is missing but give no suggestions?
Yes, I agree. That would be great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I think we are overfocusing on edge cases right now: this lint (even without handling quotes and parentheses) would already be really useful for rolling out on new projects to ensure a consistent style, and we know that we will not be able to handle every little edge case without full-blown NLP anyway.
It's certainly true that we can't reach perfection without full AI. When we inevitably err, though, I'd prefer to err on the side of false negatives instead of false positives. We could even solve the parens and quotes problem by adding them as accepted terminating punctuation:
const TERMINAL_PUNCTUATION_MARKS: &[char] = &['.', '?', '!', '…', ')', '"', '\''];
The point is that we don't want to make people #[allow]
the lint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed an update that does the following:
- Consider both
.)
and).
to be correct (thus there could be false negatives in
the former case, but nothing we can do about it). - Consider both
."
and".
to be correct (same). - Otherwise the lint triggers and offers help, but does not suggest a fix.
The implementation could likely use some refactoring, but we should agree on the
behavior before that.
I did not modify TERMINAL_PUNCTUATION_MARKS
because that was causing issues with Markdown links, which also end with a parenthesis. I also did not do anything about single quotation marks, because my understanding is that those are used in British English and I would expect American style not be used in that case (and therefore no special treatment is needed).
This introduces a new lint that aims to check for missing terminal punctuation in doc comments.
Lint scope and motivation
It partially addresses #8371 by currently focusing on the case that is both the easiest to look for without advanced NLP capacities and still very useful: missing punctuation at the end of the last sentence of doc comments. This is particularly useful when working in a team to enforce a style guide and make sure all doc comments end with some kind of punctuation, which is often forgotten in the case of short, single-sentence doc comments. The lint is biased towards avoiding false positives so it can be easily adopted without requiring an excessive number of
#[expect]
attributes.It is currently only concerned with Latin languages, but adding support for
。
for instance should be very easy.Even if consistently ending doc comments (even very short ones) with punctuation is somewhat of an opinion, it seems sufficiently well-established, at least in the
std
, to warrant its own lint, even if of course it would be allow-by-default.Lint category and possible evolution
Because it is unclear how useful and more complex it would be to also look for missing punctuation at the end of each Markdown paragraphs, I opted for the
nursery
category for now. My understanding of the stability guarantees is that would it not be acceptable to add such capability later if it was part ofpedantic
orrestriction
categories right away. I tried to make sure the lint name allows this kind of iterative evolution.Testing
I ran the lint against the
core
library which led me to introduce a few special cases (which seem common enough) to avoid false positives: e.g., the last sentence being in parentheses. After excluding specific modules which should likely be excluded (e.g.,core_arch
,intrinsincs
), it currently finds a bit more than 200 errors in thecore
library.This lint also helped find issues with a few doc comments, which are now fixed.
How to review this PR
I suppose it is easier to have a look at the UI tests first before checking the implementation.
Please write a short comment explaining your change (or "none" for internal only changes)
changelog: add new lint [
doc_comments_missing_terminal_punctuation
]