@bripeticca
Contributor

@bripeticca bripeticca commented Nov 11, 2025

This PR refactors diagnostic handling in the Swift build system by introducing a dedicated message handler and per-task output buffering to properly parse and emit compiler diagnostics individually.

Key Changes

SwiftBuildSystemMessageHandler

  • Introduced a new dedicated handler class to process SwiftBuildMessage events from the build operation
  • Moved message handling logic out of inline nested functions for better organization and testability
  • Maintains build state, progress animation, and diagnostic processing in a single cohesive component

Per-Task Data Buffering

  • Added taskBuffer: [String: Data] dictionary in BuildState to capture compiler output per task signature
  • Task output is accumulated in the buffer as .output messages arrive
  • Buffer contents are processed when tasks complete, ensuring all output is captured before parsing
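A minimal sketch of the per-task buffering described above, assuming output chunks arrive as `Data` keyed by a task signature string. `TaskOutputBuffer`, `append(_:for:)`, and `drain(_:)` are hypothetical names for illustration, not the PR's actual API.

```swift
import Foundation

/// Hypothetical stand-in for the buffering part of BuildState.
struct TaskOutputBuffer {
    private var taskBuffer: [String: Data] = [:]

    /// Accumulate one chunk of output for the given task signature.
    mutating func append(_ data: Data, for taskSignature: String) {
        taskBuffer[taskSignature, default: Data()].append(data)
    }

    /// Remove and return everything buffered for a task, decoded as UTF-8.
    /// Returns nil if nothing was buffered for that signature.
    mutating func drain(_ taskSignature: String) -> String? {
        guard let data = taskBuffer.removeValue(forKey: taskSignature) else { return nil }
        return String(decoding: data, as: UTF8.self)
    }
}

// Two chunks arrive for the same task, then the task completes.
var buffer = TaskOutputBuffer()
buffer.append(Data("main.swift:1:1: ".utf8), for: "task-A")
buffer.append(Data("error: oops\n".utf8), for: "task-A")
let drained = buffer.drain("task-A")
```

Draining removes the entry, so a second completion event for the same signature finds nothing left to process.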

Diagnostic Parsing & Emission

  • Implemented splitIntoDiagnostics() method that uses regex to parse compiler output into individual
    diagnostic segments (ParsedDiagnostic)
  • Recognizes standard Swift diagnostic format: path:line:column: severity: message
  • To prevent duplication, we selectively emit DiagnosticInfo depending on whether the flag appendToOutputStream is true.
  • Each parsed diagnostic includes file location, severity, message text, and full context (code snippets,
    carets, etc.)
  • Falls back to legacy emission behaviour when no structured diagnostics are found in output
  • Prevents duplicate emission by tracking processed task signatures
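The splitting step can be sketched roughly as follows. This is an illustration only: the `ParsedDiagnostic` fields, the exact regex, and the set of severities are assumptions, and the PR's real implementation may differ. A line matching `path:line:column: severity: message` starts a new diagnostic, and the lines that follow (code snippet, carets) are attached to it as context.

```swift
import Foundation

// Hypothetical shape; the PR's ParsedDiagnostic may carry different fields.
struct ParsedDiagnostic: Equatable {
    var path: String
    var line: Int
    var column: Int
    var severity: String
    var message: String
    var context: [String]  // code snippet, carets, etc.
}

func splitIntoDiagnostics(_ output: String) -> [ParsedDiagnostic] {
    // Assumed header format: path:line:column: severity: message
    let pattern = #"^(.+?):(\d+):(\d+): (error|warning|note|remark): (.*)$"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }

    var result: [ParsedDiagnostic] = []
    for line in output.components(separatedBy: "\n") {
        let range = NSRange(line.startIndex..., in: line)
        if let match = regex.firstMatch(in: line, range: range) {
            // Extract capture group i as a String (empty if missing).
            func group(_ i: Int) -> String {
                Range(match.range(at: i), in: line).map { String(line[$0]) } ?? ""
            }
            result.append(ParsedDiagnostic(
                path: group(1), line: Int(group(2)) ?? 0, column: Int(group(3)) ?? 0,
                severity: group(4), message: group(5), context: []))
        } else if !result.isEmpty, !line.isEmpty {
            // Non-header line: context for the most recent diagnostic.
            result[result.count - 1].context.append(line)
        }
    }
    return result
}

let sample = "/tmp/a.swift:3:5: error: cannot find 'x' in scope\n    x = 1\n    ^\n/tmp/a.swift:9:1: warning: unused variable\n"
let parsed = splitIntoDiagnostics(sample)
```

Output with no header line yields an empty array, which is where a fallback to the legacy emission path would apply.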

* Built tentative test class SwiftBuildSystemOutputParser to
  handle the compiler output specifically
* Added a handleDiagnostic method to possibly substitute the
  emitEvent local scope implementation of handling a
  SwiftBuildMessage diagnostic
* the flag `appendToOutputStream` helps us to determine whether
  a diagnostic is to be emitted or whether we'll be emitting
  the compiler output via OutputInfo
* separate the emitEvent method into the SwiftBuildSystemMessageHandler
@bripeticca bripeticca force-pushed the swb/diagnosticcodesnippet branch from a20f020 to f3aaabf Compare November 20, 2025 18:56
@bripeticca bripeticca force-pushed the swb/diagnosticcodesnippet branch from 1dcaaeb to c48e606 Compare November 20, 2025 18:58
@bripeticca
Contributor Author

@swift-ci please test

@bripeticca bripeticca changed the title from "[WIP] Capture code snippet from diagnostic compiler output" to "Capture code snippet from diagnostic compiler output" Nov 20, 2025
@bripeticca bripeticca marked this pull request as ready for review November 20, 2025 19:21
Contributor

@owenv owenv left a comment

This generally LGTM, but I have some concerns about the regex-based parsing when we emit the textual compiler output.

  1. Perf - It's important this is fast so that it doesn't block the end of the build if a command produces huge quantities of output. It's hard to say if this will be a real issue without some testing
  2. We're re-parsing information which we're already getting from the compiler in structured form. I see the appeal of not reporting a diagnostic twice if multiple compile jobs report it though

targetsByID[target.targetID] = target
}

mutating func target(for task: SwiftBuild.SwiftBuildMessage.TaskStartedInfo) throws -> SwiftBuild.SwiftBuildMessage.TargetStartedInfo? {
Contributor

Does this need to be mutating?

Contributor Author

It does not, will modify this

buildState.appendToBuffer(taskSignature, data: info.data)
}

return
Contributor

If we're unable to get a taskID, this may be target/global output, so I think we should print it the best we can instead of returning and silently dropping it

Contributor

Also suggested using a locationContext as the key, which might help.

It might be about time to clean up these deprecated APIs, too. I recall there was something missing from the new ones which blocked total removal.


// If we've captured the compiler output with formatted diagnostics, emit them.
// if let buffer = buildState.dataBuffer(for: startedInfo) {
emitDiagnosticCompilerOutput(startedInfo)
Contributor

This could be something other than a compile task which emits output with different formatting than we expect

Contributor Author

Definitely. There are some guards in emitDiagnosticCompilerOutput that check whether there is an existing data buffer that we've logged from the received SwiftBuildMessage.OutputInfo case, so if this is anything else it will not be emitted.

Is it preferred to have this check be more explicit re: asserting this before calling this method?


/// Split compiler output into individual diagnostic segments
/// Format: /path/to/file.swift:line:column: severity: message
private func splitIntoDiagnostics(_ output: String) -> [ParsedDiagnostic] {
Contributor

I don't think we should be doing any output parsing at this level, Swift Build already has an entire subsystem dedicated to doing this.

Contributor Author

@bripeticca bripeticca Nov 21, 2025

@jakepetroules I would agree, however the events we're receiving from SwiftBuild accumulate every diagnostic message into a single data buffer -- when emitting this as-is, it's possible that we'd be coupling some info-level diagnostics with error-level diagnostics with no way to separate the severity. Splitting the string on a per-diagnostic basis is the only way to achieve this with the current approach.

The observability scope that we use to emit these diagnostics will capture the entire string blob, and we have to decide with what severity to emit the entire message. I'm not sure if there's an alternative path here that would give us the same ergonomics on the user front, but IMO it's preferred to separate each diagnostic and ensure that we're emitting them with the appropriate severity, rather than sending the entire string of possibly many diagnostics with varying severities.

Contributor

@jakepetroules jakepetroules Nov 21, 2025

Let me try to clarify: Swift Build already parses the individual diagnostic messages into structured data objects independently of the singular output buffer, which is for the textual output of the tool. So Swift Build is already doing the equivalent of what splitIntoDiagnostics is doing, and those messages are given to you in the diagnostics message. They include rich information including severity, line number, and so on.

Similar to what I mentioned elsewhere about the taskStarted/taskEnded events, you want to capture those diagnostics' association to the specific task (e.g. unprocessedDiagnostics should be a dictionary rather than an array, and belongs in BuildState rather than in the top level object), and then replay them at task completion time, without attempting to do your own parsing.
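A rough sketch of the suggested shape, assuming structured diagnostics arrive with a task ID attached. All types here (`Severity`, `Diagnostic`, `ReplayState`) are hypothetical stand-ins rather than Swift Build's real API; the point is only that diagnostics are stored per task in a dictionary and handed back once at completion, each with its own severity, with no text parsing involved.

```swift
// Hypothetical stand-ins, not SwiftBuild types.
enum Severity { case note, warning, error }

struct Diagnostic {
    var severity: Severity
    var message: String
}

struct ReplayState {
    // Diagnostics received but not yet emitted, keyed by task ID.
    private var unprocessedDiagnostics: [Int: [Diagnostic]] = [:]

    /// Called for each structured diagnostic event.
    mutating func record(_ diagnostic: Diagnostic, forTask taskID: Int) {
        unprocessedDiagnostics[taskID, default: []].append(diagnostic)
    }

    /// Called at task completion: returns the task's diagnostics exactly once.
    mutating func takeDiagnostics(forTask taskID: Int) -> [Diagnostic] {
        unprocessedDiagnostics.removeValue(forKey: taskID) ?? []
    }
}

var state = ReplayState()
state.record(Diagnostic(severity: .warning, message: "unused variable"), forTask: 1)
state.record(Diagnostic(severity: .error, message: "cannot find 'x' in scope"), forTask: 1)
state.record(Diagnostic(severity: .note, message: "belongs to another task"), forTask: 2)

// Task 1 completes: replay only its diagnostics, in arrival order.
let replayed = state.takeDiagnostics(forTask: 1)
```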

struct BuildState {
private var targetsByID: [Int: SwiftBuild.SwiftBuildMessage.TargetStartedInfo] = [:]
private var activeTasks: [Int: SwiftBuild.SwiftBuildMessage.TaskStartedInfo] = [:]
private var taskBuffer: [String: Data] = [:]
Contributor

Instead of a string as the key, can you use a SwiftBuildMessage.LocationContext?

buildSystem.delegate?.buildSystem(buildSystem, didFinishCommand: BuildSystemCommand(startedInfo, targetInfo: targetInfo))
if let targetName = targetInfo?.targetName {
serializedDiagnosticPathsByTargetName[targetName, default: []].append(contentsOf: startedInfo.serializedDiagnosticsPaths.compactMap {
try? Basics.AbsolutePath(validating: $0.pathString)
Contributor

Just let this throw, if this isn't an absolute path something has gone very wrong.
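To illustrate the difference, here is a small sketch using a hypothetical `AbsolutePath` stand-in (not the real `Basics.AbsolutePath`, whose validation rules are more involved). With `compactMap` and `try?`, an invalid path is dropped silently; with `map` and `try`, the error propagates to the caller.

```swift
// Hypothetical stand-in for Basics.AbsolutePath, for illustration only.
struct PathError: Error { let path: String }

struct AbsolutePath: Equatable {
    let pathString: String
    init(validating string: String) throws {
        guard string.hasPrefix("/") else { throw PathError(path: string) }
        pathString = string
    }
}

/// Silently drops anything that fails validation.
func lenient(_ raw: [String]) -> [AbsolutePath] {
    raw.compactMap { try? AbsolutePath(validating: $0) }
}

/// Propagates the first validation failure instead of hiding it.
func strict(_ raw: [String]) throws -> [AbsolutePath] {
    try raw.map { try AbsolutePath(validating: $0) }
}

let inputs = ["/tmp/a.dia", "relative/b.dia"]
let kept = lenient(inputs)          // the relative path is dropped without a trace
let failed = (try? strict(inputs)) == nil  // the strict version surfaces the problem
```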

self.observabilityScope.emit(info: "\(info.executionDescription)")
}

if self.logLevel.isVerbose {
Contributor

We should not write any output on task started events and instead always defer its emission to task completed events. The reason is that multiple tasks can start in parallel, and the outputStream content emitted when the taskStarted and taskEnded events are emitted may not be reasonably ordered at all. So you could have:

  • task 1 header
  • task 2 output
  • task 1 header
  • task 2 output

...which will be really confusing since task 2's output will look like it belongs to task 1, and so on.

The observabilityScope stuff might be OK, provided that doesn't also end up in the textual output stream and is only for API clients.
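The deferral can be sketched like this, assuming a simplified event model. The `Event` enum and `render` function are hypothetical and exist only to show the ordering: a task's header and output are held back and written together when the task ends, so parallel tasks never interleave.

```swift
// Hypothetical, simplified event model (not SwiftBuildMessage).
enum Event {
    case taskStarted(id: Int, header: String)
    case output(id: Int, text: String)
    case taskEnded(id: Int)
}

/// Returns the lines that would be written to the output stream.
func render(_ events: [Event]) -> [String] {
    var pending: [Int: [String]] = [:]  // per-task lines not yet written
    var lines: [String] = []
    for event in events {
        switch event {
        case let .taskStarted(id, header):
            pending[id] = [header]                    // hold the header back
        case let .output(id, text):
            pending[id, default: []].append(text)     // hold the output back
        case let .taskEnded(id):
            // Write header and output as one contiguous block.
            lines.append(contentsOf: pending.removeValue(forKey: id) ?? [])
        }
    }
    return lines
}

// Two tasks running in parallel with interleaved events.
let rendered = render([
    .taskStarted(id: 1, header: "task 1 header"),
    .taskStarted(id: 2, header: "task 2 header"),
    .output(id: 2, text: "task 2 output"),
    .output(id: 1, text: "task 1 output"),
    .taskEnded(id: 2),
    .taskEnded(id: 1),
])
```

Each task's block appears in completion order, with its header directly above its own output.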

Contributor Author

Thanks for the detailed explanation! I'm in agreement -- this is some legacy implementation that I moved from the inline method to this class, so it's possible that this kind of behaviour is sprinkled elsewhere in emitEvent. I'll keep an eye on that :)

@jakepetroules
Contributor

> 2. We're re-parsing information which we're already getting from the compiler in structured form. I see the appeal of not reporting a diagnostic twice if multiple compile jobs report it though

I also pointed this out inline, though I'm not sure why the re-parsing relates to deduplication? We should be able to deduplicate diagnostics whether we parse them again or use the existing ones.

@bripeticca
Contributor Author

bripeticca commented Nov 21, 2025

> though I'm not sure why the re-parsing relates to deduplication

(@jakepetroules @owenv -- tagging for visibility, GitHub notifications can be weird)

Re-parsing doesn't affect the de-duplication -- when tracking the data buffer per task, I also track whether we've emitted the associated output for a given task (using its signature) and guard against this before emitting for that task again. We only go down this path (emitting for a given task) once we've received the task completed event.
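As a sketch of that guard, assuming task signatures are plain strings: `EmittedTaskTracker` and `shouldEmit(_:)` are hypothetical names, and in the PR this bookkeeping lives alongside the data buffer rather than in a separate type.

```swift
/// Hypothetical helper that remembers which task signatures were already emitted.
struct EmittedTaskTracker {
    private var emittedSignatures: Set<String> = []

    /// Returns true only the first time a signature is seen,
    /// so the caller can skip emission on any later attempt.
    mutating func shouldEmit(_ taskSignature: String) -> Bool {
        emittedSignatures.insert(taskSignature).inserted
    }
}

var tracker = EmittedTaskTracker()
let first = tracker.shouldEmit("compile-main")   // first completion event
let second = tracker.shouldEmit("compile-main")  // repeated attempt for the same task
```

`Set.insert` reports whether the element was newly inserted, which makes the check and the bookkeeping a single step.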

I mentioned this inline as well but for visibility: the re-parsing just addresses the fact that for a given task signature, we have an accumulated data buffer that contains all possible diagnostic messages coming from the compiler. I find that the user ergonomics aren't great when simply emitting the entire string blob through the observability scope, since these diagnostics can have varying severities and we'll have to decide up-front which severity to choose to emit the entire string of all diagnostics.

I also maintain a list of the DiagnosticInfo values that we omit in favour of emitting the OutputInfo, which contains the same diagnostic messages plus the pre-formatted code snippet. (The DiagnosticInfo lacks the pre-formatted code snippet but contains enough information to recreate it ourselves; it was suggested that we instead fall back to using the OutputInfo for that reason.)

Perhaps some more discussion is needed here. 😄

* Remove taskStarted outputStream emissions
* Propagate exception if unable to find path
@bripeticca
Contributor Author

@swift-ci please test

@bripeticca
Contributor Author

@swift-ci please test windows
