[AE-782] Build dap collector job for incrementality experiments #387
base: main
Conversation
Force-pushed: ba47bdf → 65ef418, 6c6a067 → 50591b2, 2f54368 → 63053fc

Resolved review threads on:
- jobs/ads-incrementality-dap-collector/ads_incrementality_dap_collector/constants.py
- jobs/ads-incrementality-dap-collector/ads_incrementality_dap_collector/helpers.py
- jobs/ads-incrementality-dap-collector/ads_incrementality_dap_collector/main.py
- jobs/ads-incrementality-dap-collector/ads_incrementality_dap_collector/models.py
Co-Authored-By: Glenda Leonard <[email protected]>
… test_mocks, where experiment is instantiated
Co-authored-by: Chris Gramberg <[email protected]>
Co-authored-by: Luc Lisi <[email protected]>
> Some examples of existing metrics are
> - "url visit counting", which increments counters in DAP when a Firefox client visits an ad landing page.

Review comment: An example of the type of metric is not needed here.
> Great care is taken to preserve the privacy and anonymity of these metrics. The DAP technology itself aggregates counts
> in separate systems and adds noise. The DAP telemetry feature will only submit a count to DAP once per week per client.
> All DAP reports are deleted after 2 weeks.

Review comment: Details of DAP usage are not needed either, since they are not relevant to this ETL job.
> There is also a `dev_runbook.md` doc that walks through what is required to set up a DAP account, create some DAP
> tasks for testing, and the DAP credentials setup and management. The `public_key_to_hpke_config.py` utility will help
> with encoding the DAP credentials for consumption by this job.

Review comment: Remove DAP-specific comments.
> return tasks_to_collect
>
> # TODO Trigger Airflow errors

Review comment: Is this TODO still relevant?

Reply: No, this is good now. If we raise an exception from this function, it will bubble up to the top-level exception handling in main.py.
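As a hedged illustration of that pattern (the function and module names here are hypothetical stand-ins, not the PR's actual code), the top-level handler lets any collection error surface and exits non-zero so Airflow marks the task failed:

```python
import sys


def collect_task_results(task_id: str) -> dict:
    """Hypothetical collector step: errors are deliberately NOT caught here,
    so they bubble up to the top-level handler in main()."""
    raise RuntimeError(f"collection failed for task {task_id}")


def main() -> int:
    try:
        collect_task_results("example-task")
    except Exception as exc:
        # Log and return non-zero so the orchestrator (Airflow) marks the run failed.
        print(f"job failed: {exc}", file=sys.stderr)
        return 1
    return 0
```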
Review comment: This file should be removed and the contents moved to an internal page.
> "nimbus": {
>     "api_url": "https://stage.experimenter.nonprod.webservices.mozgcp.net/api/v6/experiments",
>     "experiments": [{
>         "slug": "traffic-impact-study-1"

Review comment: Rename the test experiment slugs to be more generic, e.g. exp-1.
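A sketch of the test fixture after the suggested rename; the slugs, URL, and second experiment entry are illustrative assumptions, not values from the PR:

```python
# Sketch of the test config after the rename; slugs and URL are placeholders.
example_config = {
    "nimbus": {
        "api_url": "https://nimbus.example.test/api/v6/experiments",  # dummy URL
        "experiments": [
            {"slug": "exp-1"},  # generic slug instead of a real experiment name
            {"slug": "exp-2"},
        ],
    }
}

slugs = [e["slug"] for e in example_config["nimbus"]["experiments"]]
```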
> "hpke_config": "AQAgAAEAAQAgpdceoGiuWvIiogA8SPCdprkhWMNtLq_y0GSePI7EhXE"
> },
> "nimbus": {
>     "api_url": "https://stage.experimenter.nonprod.webservices.mozgcp.net/api/v6/experiments",

Review comment: A dummy URL can be used here.
> "table": "incrementality"
> },
> "dap": {
>     "hpke_config": "AQAgAAEAAQAgpdceoGiuWvIiogA8SPCdprkhWMNtLq_y0GSePI7EhXE"

Review comment: Can an obvious dummy value be used here?
Review comment: This file isn't needed for the job and should be removed.
> "--task-id",
> "mubArkO3So8Co1X98CBo62-lSCM4tB-NZPOUGJ83N1o",
> "--leader",
> "https://dap-09-3.api.divviup.org",

Review comment: Use a dummy endpoint.
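One way to apply the suggestion in the test arguments; the placeholder values below are assumptions, not real DAP identifiers or endpoints:

```python
# Placeholder CLI arguments for tests: obvious dummy task id and leader endpoint.
test_args = [
    "--task-id", "dummy-task-id",
    "--leader", "https://dap-leader.example.test",
]

# Pair up flags and values for easy assertions in tests.
args = dict(zip(test_args[::2], test_args[1::2]))
```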
> ],
> ),
> bigquery.SchemaField(
>     "created_at",

Review comment: Update to created_timestamp.
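The rename can be expressed in BigQuery's JSON schema representation (a sketch only: the mode is an assumption, and the same rename would apply to the bigquery.SchemaField call in the job):

```python
# Renamed field: "created_at" -> "created_timestamp", as requested in review.
created_field = {
    "name": "created_timestamp",
    "type": "TIMESTAMP",
    "mode": "NULLABLE",  # assumed mode, for illustration only
}
```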
> @click.command()
> @click.option("--gcp_project", help="GCP project id", required=True)
> @click.option(

Review comment: Additional comment from @gleonard-m: Config file and BQ results table will be in different projects. This `gcp_project` should just be used for the location of the config bucket; rename it to be more specific. And we need to be able to separately specify the `gcp_project` for the BQ table where the results will be written. That will go in the BQ part of the config JSON file.
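A minimal sketch of the split being requested, with hypothetical names throughout: the renamed CLI flag would only locate the config bucket, while the destination project for results lives in the "bq" section of the config JSON:

```python
# Hypothetical config layout: the BQ destination project is carried in the
# config JSON rather than on the command line. Project and dataset names
# below are illustrative placeholders.
example_config = {
    "bq": {
        "project": "results-project-id",  # project for the results table
        "dataset": "ads_dataset",         # illustrative dataset name
        "table": "incrementality",
    }
}


def bq_destination(config: dict) -> str:
    """Build the fully qualified table id from the bq section of the config."""
    bq = config["bq"]
    return f'{bq["project"]}.{bq["dataset"]}.{bq["table"]}'
```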
Problem Statement:

Currently, we are running incrementality experiments on FFHNT, measuring whether our Tiles are effective by seeing if they give a lift in organic visits, or in checkouts, to the tile advertiser. We're doing this in a private, anonymized manner by sending the experiment results to DAP. However, we have no way to collect those results from DAP in an automated way and bring them into our usual data pipeline, so those results are currently being collected manually by an engineer.

Solution:

Create a docker-etl job that can go out to Nimbus to find out how to collect the results for each experiment branch, collect those results from DAP, and finally write those results into BQ.

Checklist for reviewer:
- Commits should reference a bug or GitHub issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title).
- Scan the PR and verify that no changes (particularly to `.circleci/config.yml`) will cause environment variables (particularly credentials) to be exposed in test logs.
- Ensure the container image will be using permissions granted to `telemetry-airflow` responsibly.
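The end-to-end flow described in the Solution can be sketched as three steps. All function names, payloads, and values below are hypothetical stand-ins, not the PR's actual code:

```python
def fetch_experiment_branches(nimbus_api_url: str) -> list[dict]:
    """Step 1: ask Nimbus which experiment branches need collection (stubbed)."""
    return [{"slug": "exp-1", "branch": "treatment"}]


def collect_from_dap(branch: dict) -> dict:
    """Step 2: collect the aggregated result for one branch from DAP (stubbed)."""
    return {**branch, "count": 0}


def write_to_bq(rows: list[dict]) -> int:
    """Step 3: write the collected rows to the BigQuery results table (stubbed);
    returns the number of rows written."""
    return len(rows)


def run(nimbus_api_url: str) -> int:
    branches = fetch_experiment_branches(nimbus_api_url)
    rows = [collect_from_dap(b) for b in branches]
    return write_to_bq(rows)
```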