feat: telemetry v2 #2485
Conversation
🦋 Changeset detected. Latest commit: 81b9274. The changes in this PR will be included in the next version bump. This PR includes changesets to release 5 packages.
packages/platform-core/src/telemetry-data/telemetry_data_emitter.test.ts
```ts
// @public
export class BackendDeployerFactory {
  constructor(packageManagerController: PackageManagerController, formatter: BackendDeployerOutputFormatter, backendDeployerIOHost: AmplifyIOHost, sdkProfileResolver: SDKProfileResolver);
  constructor(packageManagerController: PackageManagerController, formatter: BackendDeployerOutputFormatter, backendDeployerIOHost: AmplifyIOHost, sdkProfileResolver: SDKProfileResolver, telemetryDataEmitter: TelemetryDataEmitter);
```
What's the reason that we're passing the telemetry emitter to the deployer?
We used to have the deployer return times to sandbox, with sandbox emitting the metric.
We used to use `private populateCDKOutputFromStdout = async (...)`. With this gone after the introduction of the CDK toolkit, we now have to rely on the events from CDK, which are already being captured in `AmplifyEventLogger`.
Also, the event logging captures when a hotswap happens. I can try to work on copying how we used to do this, but we lose the ability to differentiate hotswap from deployment times (so in that case maybe we remove hotswap from the telemetry schema).
> Also the event logging captures when a hotswap happens

That can also be achieved if we emit this data from the deployer via a return type and/or exception. Let's continue this discussion below in #2485 (comment).
```ts
  ['amplify'],
);

// can't test arguments of mockTelemetryEmitFailure because of flakiness around stack trace and latency times
```
If we can't assert values, we could at least assert the presence of certain elements in the event.
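For instance, a minimal sketch of asserting presence rather than values using `node:test` mocks — the emitter mock here is an illustrative stand-in, not the one from the test file:

```ts
import assert from 'node:assert';
import { mock, test } from 'node:test';

test('emitFailure is called with the expected shape', () => {
  // Illustrative stand-in for the telemetry emitter mock in the test file.
  const mockTelemetryEmitFailure = mock.fn();
  mockTelemetryEmitFailure(new Error('boom'), { deployment: 123.4 }, {
    subCommands: 'SandboxEvent',
  });

  const [error, latencyDetails, dimensions] =
    mockTelemetryEmitFailure.mock.calls[0].arguments;

  // Flaky values (stack trace, latency numbers) are not compared directly;
  // only the presence and rough type of each element are asserted.
  assert.ok(error instanceof Error);
  assert.ok(typeof latencyDetails.deployment === 'number');
  assert.deepStrictEqual(dimensions, { subCommands: 'SandboxEvent' });
});
```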
```ts
await this.telemetryDataEmitter.emitFailure(error, latencyDetails, {
  subCommands: 'SandboxEvent',
});
```
Here and below: we shouldn't be encoding "sandbox" logic in the deployer. Will this be correct if the top-level command is pipeline-deploy?

I'm guessing that the problem being solved here is to get some latency metrics in the error case. For that I would suggest two alternatives:

1. Develop `DeploymentError` (or so) that can carry both the root cause error and metrics back to the caller, and process that data in sandbox/pipeline-deploy (see the sketch below).
2. Perhaps introducing the concept of `UsageDataCollector` from "Emit notices metrics" #2698, which I opened in parallel, would solve this use case? I.e. we could inject the collector here to get these metrics but leave emission up to cli and sandbox.
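A rough sketch of alternative 1, with hypothetical names (`DeploymentLatencyDetails` and `rootCause` are illustrative, not from this PR):

```ts
// Hypothetical shape for latency metrics collected during a deployment.
type DeploymentLatencyDetails = {
  synthesis?: number;
  deployment?: number;
  hotSwap?: number;
};

// Carries both the root cause and the metrics back to the caller, so
// sandbox/pipeline-deploy decide how (and whether) to emit telemetry.
class DeploymentError extends Error {
  constructor(
    readonly rootCause: Error,
    readonly latencyDetails: DeploymentLatencyDetails,
  ) {
    super(rootCause.message);
    this.name = 'DeploymentError';
  }
}

// In the deployer, instead of emitting telemetry directly:
//   throw new DeploymentError(e as Error, { deployment: elapsedMs });
// In sandbox/pipeline-deploy, catch it and emit with the right dimensions.
```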
Edit: after reaching the end of this PR I noticed we haven't touched this piece:

amplify-backend/packages/cli/src/commands/sandbox/sandbox_event_handler_factory.ts (line 40 in 26f122d):

```ts
await usageDataEmitter.emitSuccess(
```

amplify-backend/packages/cli/src/commands/sandbox/sandbox_event_handler_factory.ts (line 76 in 26f122d):

```ts
await usageDataEmitter.emitFailure(deployError, {
```

Perhaps it would be possible to plumb metrics from the deployer up to there and just replace the emitter call there?
You're right, this won't always necessarily be exclusively sandbox, but I believe we would still want to capture this even for pipeline-deploy. It may be more suitable to rename this and other places to `DeploymentEvent` instead?

> after reaching the end of this PR I noticed we haven't touched this piece

I did touch those places before, but they were intentionally removed after merging in changes from pretty sandbox; see my previous comment #2485 (comment).
Btw, some other variation of 2. above would be `process.emit('amplifyMetrics', {...})` in the deployer, listening to that in the emitter, and emitting in sandbox event handlers.
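A minimal sketch of that variation, treating `process` as a plain `EventEmitter` (the event name and payload shape are illustrative):

```ts
import { EventEmitter } from 'node:events';

// Node's typings only know the built-in process events, so cast to a plain
// EventEmitter to use a custom 'amplifyMetrics' channel (name illustrative).
const channel = process as unknown as EventEmitter;

// Listener side: registered once at startup; forwards to the telemetry
// emitter (or sandbox event handler) when metrics arrive.
channel.on('amplifyMetrics', (metrics: { deployment: number }) => {
  // e.g. forward to telemetryDataEmitter.emitSuccess(metrics, ...) here
});

// Deployer side: fire-and-forget the measurement without knowing who emits.
channel.emit('amplifyMetrics', { deployment: 1234 });
```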
Some other thought: if we emit in the deployer, then for pipeline-deploy, would we double emit? Once after deployment and once after the command exits? If we do double emit, then this would be another reason to split metrics collection/recording from emission.

@Amplifiyer plz take a look at this discussion and share your thoughts.
Yes, we would double emit, which I believe is what is happening today for sandbox but not for pipeline-deploy: one emit for the sandbox command and another emit when the deployment succeeds/fails (which would be Sandbox in our telemetry).

Edit: and on this note, I believe we currently don't have the granularity for time metrics for pipeline-deploy (we have the total time it took to execute the pipeline-deploy command, but I guess we don't have it broken down into synthesis and deployment).
For the double emit, I don't see that as a problem: one is a deployment event and one is a command event. The deployment could be a failure from a customer issue, but the command would still have been a success.
Ah, I forgot I could add more `this.ioHost.notify` in the deployer for failures... Also, is there a benefit of one over the other for a new "logger" vs. a new event listener?

As for the double emit, since this would also happen for pipeline-deploy if we are using some form of event logger (new or existing), I would probably rename the subcommand `SandboxEvent` -> `DeploymentEvent`.
> Also, is there a benefit of one over the other for a new "logger" vs. a new event listener?

To keep concerns separate for logging versus emitting metrics, in case we are adding a bunch of logic for telemetry.

> add more `this.ioHost.notify` in the deployer for failures...

For this, why don't we emit from Sandbox when the deployer returns with an error? It's sandbox where we swallow errors, so we should be able to emit there?
> To keep concerns separate for logging versus emitting metrics, in case we are adding a bunch of logic for telemetry.

+1 on that one.
> For this, why don't we emit from Sandbox when the deployer returns with an error? It's sandbox where we swallow errors, so we should be able to emit there?

The problem is we don't know when a hotswap happens instead of a full deployment without the CDK toolkit code `CDK_TOOLKIT_I5403`. I can definitely move emits back to `sandbox_event_handler_factory.ts`, but we lose the ability to differentiate hotswap, so we would remove hotswap from `LatencyDetails` in that case.
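For reference, a sketch of how the deployer side could flag a hotswap from that toolkit message code; the notification shape here is an assumption, not the actual `AmplifyIOHost` contract:

```ts
// Assumed minimal notification shape; the real AmplifyIOHost message
// type may differ.
type IoMessage = { code?: string; message: string };

let usedHotswap = false;

const notify = async (msg: IoMessage): Promise<void> => {
  // CDK_TOOLKIT_I5403 is the toolkit message code mentioned above,
  // emitted when a hotswap deployment is performed.
  if (msg.code === 'CDK_TOOLKIT_I5403') {
    usedHotswap = true;
  }
};

// Later, when recording latency, file the elapsed time under either the
// hotSwap or the deployment bucket depending on the flag.
```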
```ts
  latencyDetails.deployment = deploymentTime;
}
await this.telemetryDataEmitter.emitSuccess(latencyDetails, {
  subCommands: 'SandboxEvent',
```
```diff
-  subCommands: 'SandboxEvent',
+  subCommands: 'SandboxDeployment',
```

Maybe that's a better name for this event?

Additionally, is this the right place to do this? Should we have

sandbox -> event logger -> metrics

or fan out:

sandbox -> event logger
sandbox -> metrics

I would say that event logging and metrics are two separate concerns that shouldn't be coupled together (see the sketch below).
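A minimal sketch of the fan-out shape, with illustrative interfaces (neither name is from this PR):

```ts
// Illustrative interfaces; names are not from this PR.
interface EventLogger {
  log(event: string): void;
}
interface MetricsEmitter {
  record(name: string, valueMs: number): void;
}

// Sandbox fans out to both concerns independently, so event logging
// and metrics never need to know about each other.
class Sandbox {
  constructor(
    private readonly logger: EventLogger,
    private readonly metrics: MetricsEmitter,
  ) {}

  onDeploymentFinished(durationMs: number) {
    this.logger.log('deployment finished');        // sandbox -> event logger
    this.metrics.record('deployment', durationMs); // sandbox -> metrics
  }
}
```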
packages/cli/src/ampx.ts (outdated)
```ts
// eslint-disable-next-line @typescript-eslint/no-misused-promises
process.on('beforeExit', async (code) => {
  if (telemetryEmitCount !== 0) {
    process.exit(code);
```
```diff
-    process.exit(code);
+    return;
```

The process is already exiting at this point.
```ts
  latencyDetails,
  extractCommandInfo(parser),
);
process.exit(code);
```
```diff
-process.exit(code);
```
Removing this causes an infinite loop, as awaiting `emitSuccess` adds to the event loop, which runs `beforeExit` again.
I tried `process.once('beforeExit')` and other tricks with no luck.
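A standalone repro of that loop, for reference (the timing and bail-out are illustrative): any awaited work inside the handler re-arms the event loop, so Node emits `beforeExit` again once the loop drains:

```ts
let fires = 0;

process.on('beforeExit', async () => {
  fires += 1;
  console.log(`beforeExit fired ${fires} time(s)`);
  // The await schedules more work, so when the event loop drains again,
  // 'beforeExit' is emitted again -- an infinite loop without a bail-out.
  await new Promise((resolve) => setTimeout(resolve, 10));
  if (fires >= 3) process.exit(0); // bail out so the demo terminates
});
```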
Makes sense, I learned something today.
> Removing this causes an infinite loop, as awaiting `emitSuccess` adds to the event loop, which runs `beforeExit` again.

Then I wonder if bringing back that flag tracking whether abortion was emitted at least once would help here. I.e. if we do that, the second firing of `beforeExit` would be a no-op (see the sketch below).
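A sketch of that flag, assuming the surrounding handler from this PR (the emitter call itself is elided):

```ts
let abortionEmitted = false;

// eslint-disable-next-line @typescript-eslint/no-misused-promises
process.on('beforeExit', async () => {
  if (abortionEmitted) {
    return; // second firing is a no-op, so the process can exit naturally
  }
  abortionEmitted = true;
  // await telemetryDataEmitter.emitAbortion(...) here
});
```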
I actually tried that this morning and, while it does fix the infinite loop, the `Warning: Detected unsettled top-level await` message now appears.
packages/cli/src/ampx.ts (outdated)
```ts
// Trying to do await telemetryDataEmitter.emitAbortion in errorHandler ends up with:
// Warning: Detected unsettled top-level await
let telemetryEmitCount = 0;
// eslint-disable-next-line @typescript-eslint/no-misused-promises
```
To sidestep this warning you could do something like:

```ts
const handleAbortion = async (code) => {
  // code from current handler
};
process.on('beforeExit', (code) => void handleAbortion(code));
```

Example: amplify-backend/packages/sandbox/src/file_watching_sandbox.ts (lines 86 to 87 in 26f122d):

```ts
process.once('SIGINT', () => void this.stop());
process.once('SIGTERM', () => void this.stop());
```
This let me remove the eslint disable comment, but the warning is coming from other async functions after the parser.
> but the warning is coming from other async functions after the parser.

It seems the linter suppression is gone in the recent iteration, right?
Yes, switching to your suggestion allowed me to remove the linter suppression. The comment for `Warning: Detected unsettled top-level await` is still there if I remove the `process.exit` at the end.

After testing it looks like:

```ts
try {
  // yargs processes 'ctrl+c' behavior and we capture and do stuff
  // before process exits with process.on('beforeExit') defined above
  await parser.parseAsync(hideBin(process.argv));
  ...
} catch (e) {
  if (e instanceof Error) {
    ...
    // these awaits would be unsettled
    await noticesRenderer.tryFindAndPrintApplicableNotices({
      event: 'postCommand',
      error: e,
    });
    await errorHandler(format.error(e), e);
  }
}
```

This warning is also seen in main, actually, if you run `sandbox delete` and ctrl+c without answering the prompt:

```
Warning: Detected unsettled top-level await at file:///Users/rotp/workplace/amplify-backend/packages/cli/lib/ampx.js:74
await noticesRenderer.tryFindAndPrintApplicableNotices({
^
```
I see, so the warning is runtime, not compile time.
I'll try to repro this locally to say more.
Closing in favor of #2717, which is merged.
Changes
Introduces a new version of telemetry.
Not in scope
Removal and cleanup of the old version of telemetry.
High level overview of changes (what's new in v2):
- Mark deployment events during the `sandbox` command with `Sandbox`, reported as `SandboxEvent`, to differentiate more from the `sandbox` command itself
- Keep the `telemetry.enabled` config so current users don't have to opt out of telemetry again
- `usage_data_preferences.json`:

```json
{
  "telemetry": {
    "enabled": false
  },
  "<path to project without homedir>": {
    "projectId": "<uuid>"
  }
}
```

- Emit abortion telemetry (ctrl+c)
Overall I try to call `telemetryDataEmitter` wherever `usageDataEmitter` is called; the only time I don't is for deployments because:

- `telemetryDataEmitter.emitSuccess` is called in `AmplifyEventLogger`
- In `CDKDeployer`, where latency data is available, `telemetryDataEmitter.emitFailure` is called there

Example emitSuccess payload:
Example emitFailure payload:
Example emitAbortion payload:
Corresponding docs PR, if applicable:
Validation
Unit tests, and manually with a snapshot release.
Checklist
- `run-e2e` label set.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.