Skip to content

Conversation

AudaciousAxiom
Copy link

This introduces a new lint that aims to check for missing terminal punctuation in doc comments.

Lint scope and motivation

It partially addresses #8371 by currently focusing on the case that is both the easiest to look for without advanced NLP capacities and still very useful: missing punctuation at the end of the last sentence of doc comments. This is particularly useful when working in a team to enforce a style guide and make sure all doc comments end with some kind of punctuation, which is often forgotten in the case of short, single-sentence doc comments. The lint is biased towards avoiding false positives so it can be easily adopted without requiring an excessive number of #[expect] attributes.

It is currently only concerned with Latin languages, but adding support for for instance should be very easy.

Even if consistently ending doc comments (even very short ones) with punctuation is somewhat of an opinion, it seems sufficiently well-established, at least in the std, to warrant its own lint, even if of course it would be allow-by-default.

Lint category and possible evolution

Because it is unclear how useful and more complex it would be to also look for missing punctuation at the end of each Markdown paragraphs, I opted for the nursery category for now. My understanding of the stability guarantees is that would it not be acceptable to add such capability later if it was part of pedantic or restriction categories right away. I tried to make sure the lint name allows this kind of iterative evolution.

Testing

I ran the lint against the core library which led me to introduce a few special cases (which seem common enough) to avoid false positives: e.g., the last sentence being in parentheses. After excluding specific modules which should likely be excluded (e.g., core_arch, intrinsincs), it currently finds a bit more than 200 errors in the core library.

This lint also helped find issues with a few doc comments, which are now fixed.

How to review this PR

I suppose it is easier to have a look at the UI tests first before checking the implementation.


Please write a short comment explaining your change (or "none" for internal only changes)

changelog: add new lint [doc_comments_missing_terminal_punctuation]

Copy link

github-actions bot commented Sep 24, 2025

Lintcheck changes for 0df651d

Lint Added Removed Changed
clippy::doc_comments_missing_terminal_punctuation 6458 0 0

This comment will be updated if you push new changes

@AudaciousAxiom AudaciousAxiom force-pushed the feat/doc_comments_missing_terminal_punctuation branch from 48a2590 to 1025af5 Compare September 24, 2025 19:02
@AudaciousAxiom
Copy link
Author

Are the lint warnings found by the lintcheck check something that should be addressed in any way or is it just to get an idea of what it looks like to run the new lint on popular crates?

@AudaciousAxiom AudaciousAxiom marked this pull request as ready for review September 24, 2025 19:17
@rustbot
Copy link
Collaborator

rustbot commented Sep 24, 2025

Some changes occurred in clippy_lints/src/doc

cc @notriddle

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Sep 24, 2025
@rustbot
Copy link
Collaborator

rustbot commented Sep 24, 2025

r? @Alexendoo

rustbot has assigned @Alexendoo.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@notriddle
Copy link
Contributor

This seems hard to maintain. I would prefer if it was based on the actual markdown parser instead of doing it like this.

I've rewritten a version, that you can see here: AudaciousAxiom/rust-clippy@feat/doc_comments_missing_terminal_punctuation...notriddle:rust-clippy:feat/doc_comments_missing_terminal_punctuation

If it looks better to everyone else, we can push it to your branch?

Copy link
Author

@AudaciousAxiom AudaciousAxiom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@notriddle Thanks a lot for spending time on this! I did not go the parser route initially in part because of performance concerns, as the happy path (when there is punctuation) can be very fast, while this otherwise requires parsing doc comment all the time.

I have pushed your version to my branch, and modified it to handle sentences in parentheses, and exclude #[doc] attributes and doc comments that end with a list/code block/table to avoid false positives as much as possible. Clearly this looks cleaner and more maintainable, and functionally better (handling autolinks properly seems very useful).

My only concern is possibly performance as currently that makes the Markdown parser run on every doc comment, but it should be trivial to short-circuit early if the doc comment ends with a period, before triggering the parser, if we realize later that performance is an issue.

If this looks good to you, I can reorganize the tests, and later rewrite the history if you want.

View changes since this review

text_offset = Some(offset.end);
},
Event::Text(..) if no_report_depth == 0 && !offset.is_empty() => {
if doc_string[..offset.end].trim_end().ends_with(')') {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@notriddle This now only checks for parentheses, I'm not sure why you were trimming brackets and curly braces as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually remember what my reasoning was, either...

TagEnd::CodeBlock | TagEnd::Heading(_) | TagEnd::HtmlBlock | TagEnd::List(_) | TagEnd::Table,
) => {
no_report_depth -= 1;
text_offset = None;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split this arm so text_offset gets overwritten if these constructs are at the end of the doc comments (and added some tests for these cases).

/// ```
#[doc = ""]
/// ```
struct DocAttribute;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opted for simply ignoring interleaved #[doc] attributes to reduce false positives, and removed the _unfixable test file.

enum Trailers {
/// (Sometimes the last sentence is in parentheses, and that's ok.)
ParensPassing,
/// (But sometimes it is missing a period.)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is now properly handled: the period is added inside the parentheses instead of outside.

Copy link
Contributor

@notriddle notriddle Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider this example:

/// Perform a binary search, returning the first element of the list
/// that is equal (Ok) or the position where the element should be
/// inserted (Err)
fn binary_search<T: Eq>(slice: &[T], elem: T) -> Result<usize, usize> {
    unimplemented!();
}

"(Err)" is not a complete sentence, so it should be "(Err).", not "(Err.)".

The more complex solution should be used if it's more common, but is it? Are parenthesis more commonly used for sentence fragments, or for whole sentences?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more complex solution should be used if it's more common, but is it?

Clearly it isn't. I've considered also accepting as valid comments that end with .), but realized that would result in false positives for comments that end with etc.) for instance (and just in the std there are three of them).

So after all I've completely removed any special treatment of parentheses (and updated the tests to reflect this). If you think it would be better to also accept .) to avoid some false positives, I can re-add that easily.

(I've dropped the commit that was adding the special treatment of parens, I'm not sure what the stance is on rewriting history in PRs, but in any case I can easily bring it back later if needed.)

Copy link
Contributor

@notriddle notriddle Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just thought of another corner case...

/// Says "yes."
pub fn affirm()

Putting the period inside the quotes doesn't make sense, but it's standard US English orthography, so we don't want to warn on it. British orthography puts the full stop outside the quotes, the same way brackets work, so we need to support both.

Whatever solution you use for quotes-with-.-inside can probably also be used for parens.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I actually thought of that one before but decided against as I thought it was just too rare to actually happen in doc comments and especially at their very end. Out of curiosity I just grepped the std code and only found this one occurrence. Do you think this is worth the additional complexity?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how common it is. Maybe lintcheck can be used to try to find out?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepting doc comments that end with double quotation marks (and with proper punctuation inside them) brought the linkcheck count from 6669 down to 6614. I don't think the slight additional complexity is an issue actually.

Event::Start(Tag::Link { .. }) | Event::End(TagEnd::Link) if no_report_depth == 0 && !offset.is_empty() => {
text_offset = Some(offset.end);
},
Event::Code(..) | Event::Text(..) if no_report_depth == 0 && !offset.is_empty() => {
Copy link
Author

@AudaciousAxiom AudaciousAxiom Sep 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the output of the lintcheck check, I realized that trailing code spans were not properly treated (the lint was suggesting to add a period just before the code span, instead of after it). Adding Event::Code here fixes that.

@AudaciousAxiom AudaciousAxiom force-pushed the feat/doc_comments_missing_terminal_punctuation branch from f543c33 to 70a7dd6 Compare September 30, 2025 16:19
},
Event::Text(..) if no_report_depth == 0 && !offset.is_empty() => {
// American-style quotes require punctuation to be placed inside closing quotation marks.
if doc_string[..offset.end].trim_end().ends_with('"') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so this code can be extended to handle parens with basically-trivial additional complexity, right?

Suggested change
if doc_string[..offset.end].trim_end().ends_with('"') {
if doc_string[..offset.end].trim_end().ends_with(|c| c == '"' || c == ')') {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but that means that the fix suggestion would always add a period inside the parentheses, while it is more likely that adding it outside would be the correct choice. That makes me realize that the suggestion for quotes would also arbitrarily favor the American style over the British one.

Maybe we should treat these cases (quotes and parentheses) as unfixable: detect when terminal punctuation is missing but give no suggestions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I think we are overfocusing on edge cases right now: this lint (even without handling quotes and parentheses) would already be really useful for rolling out on new projects to ensure a consistent style, and we know that we will not be able to handle every little edge case without full-blown NLP anyway. These edge cases can easily be circumvented by simply slightly rephrasing (and I understand that some people do not like tools dictating how they write, but in this case this seems pretty minor).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should treat these cases (quotes and parentheses) as unfixable: detect when terminal punctuation is missing but give no suggestions?

Yes, I agree. That would be great!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I think we are overfocusing on edge cases right now: this lint (even without handling quotes and parentheses) would already be really useful for rolling out on new projects to ensure a consistent style, and we know that we will not be able to handle every little edge case without full-blown NLP anyway.

It's certainly true that we can't reach perfection without full AI. When we inevitably err, though, I'd prefer to err on the side of false negatives instead of false positives. We could even solve the parens and quotes problem by adding them as accepted terminating punctuation:

const TERMINAL_PUNCTUATION_MARKS: &[char] = &['.', '?', '!', '…', ')', '"', '\''];

The point is that we don't want to make people #[allow] the lint.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed an update that does the following:

  • Consider both .) and ). to be correct (thus there could be false negatives in
    the former case, but nothing we can do about it).
  • Consider both ." and ". to be correct (same).
  • Otherwise the lint triggers and offers help, but does not suggest a fix.

The implementation could likely use some refactoring, but we should agree on the
behavior before that.

I did not modify TERMINAL_PUNCTUATION_MARKS because that was causing issues with Markdown links, which also end with a parenthesis. I also did not do anything about single quotation marks, because my understanding is that those are used in British English and I would expect American style not be used in that case (and therefore no special treatment is needed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants