
Conversation


@mashalifshin mashalifshin commented Aug 18, 2025

Problem Statement:

Currently, we are running incrementality experiments on FFHNT, measuring whether our Tiles are effective by checking whether they produce a lift in organic visits or checkouts for the tile advertiser. We do this in a private, anonymized manner by sending the experiment results to DAP. However, we have no automated way to collect those results from DAP and bring them into our usual data pipeline, so an engineer is currently collecting them manually.

Solution:

Create a docker-etl job that queries Nimbus to find out how to collect the results for each experiment branch, collects those results from DAP, and finally writes them into BQ.
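
A minimal sketch of that flow (`fetch_experiments` and `collect_task_results` are hypothetical helper names for illustration, not the job's actual API):

from google.cloud import bigquery


def run(config: dict) -> None:
    # Sketch only: fetch_experiments and collect_task_results are
    # hypothetical helpers, not functions from this PR.
    # 1. Query Nimbus for the experiments and the DAP tasks to collect
    #    for each experiment branch.
    experiments = fetch_experiments(config["nimbus"]["api_url"])

    # 2. Collect the aggregated results for each task from DAP.
    rows = [
        collect_task_results(config["dap"], task)
        for experiment in experiments
        for task in experiment["tasks_to_collect"]
    ]

    # 3. Write the collected results into the configured BQ table.
    client = bigquery.Client()
    client.insert_rows_json(config["bigquery"]["table"], rows)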

Checklist for reviewer:

  • Commits should reference a bug or GitHub issue, if relevant (if a bug is
    referenced, the pull request should include the bug number in the title)
  • Scan the PR and verify that no changes (particularly to
    .circleci/config.yml) will cause environment variables (particularly
    credentials) to be exposed in test logs
  • Ensure the container image will be using permissions granted to
    telemetry-airflow
    responsibly.

@mashalifshin mashalifshin force-pushed the ae-782-build-dap-collector-job branch from ba47bdf to 65ef418 on August 29, 2025 00:53
@mashalifshin mashalifshin marked this pull request as ready for review August 30, 2025 02:22
@mashalifshin mashalifshin force-pushed the ae-782-build-dap-collector-job branch from 6c6a067 to 50591b2 on September 3, 2025 01:13
@mashalifshin mashalifshin force-pushed the ae-782-build-dap-collector-job branch from 2f54368 to 63053fc on September 12, 2025 01:15
Comment on lines 17 to 18
Some examples of existing metrics are
- "url visit counting", which increments counters in DAP when a firefox client visits an ad landing page.

An example of the type of metric is not needed here.

Comment on lines 20 to 22
Great care is taken to preserve the privacy and anonymity of these metrics. The DAP technology itself aggregates counts
in separate systems and adds noise. The DAP telemetry feature will only submit a count to DAP once per week per client.
All DAP reports are deleted after 2 weeks.

Details of DAP usage are not needed either since they are not relevant to this ETL job.

Comment on lines 81 to 83
There is also a `dev_runbook.md` doc that walks through what is required to set up a DAP account, create some DAP
tasks for testing, and set up and manage the DAP credentials. The `public_key_to_hpke_config.py` utility will help
with encoding the DAP credentials for consumption by this job.

Remove DAP-specific comments.
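
(For context, the encoding that utility performs appears to be the DAP `HpkeConfig` wire format: the raw public key wrapped in an `HpkeConfig` struct and base64url-encoded without padding. A sketch based on the DAP spec, not this repo's actual utility; the cipher-suite IDs are assumptions:)

import base64
import struct


def public_key_to_hpke_config(public_key: bytes, config_id: int = 1) -> str:
    # HpkeConfig per the DAP spec: id (u8), kem_id (u16), kdf_id (u16),
    # aead_id (u16), then a length-prefixed opaque public key.
    # The cipher-suite IDs below are assumed, not read from this repo.
    header = struct.pack(
        ">BHHHH",
        config_id,
        0x0020,  # DHKEM(X25519, HKDF-SHA256)
        0x0001,  # HKDF-SHA256
        0x0001,  # AES-128-GCM
        len(public_key),
    )
    return base64.urlsafe_b64encode(header + public_key).rstrip(b"=").decode()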

return tasks_to_collect


# TODO Trigger Airflow errors

Is this TODO still relevant?

Author

No, this is good now. If we raise an exception from this function, it'll bubble up to the top-level exception handling in main.py.
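
(i.e., something along these lines; `run_collection` stands in for the actual entry point, and the real main.py may differ:)

import logging
import sys


def main() -> None:
    try:
        run_collection()  # hypothetical entry point for the job
    except Exception:
        # Exceptions raised in the collection helpers bubble up here;
        # logging and exiting non-zero is what surfaces the failure to Airflow.
        logging.exception("DAP collection failed")
        sys.exit(1)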

This file should be removed and the contents moved to an internal page.

"nimbus": {
"api_url": "https://stage.experimenter.nonprod.webservices.mozgcp.net/api/v6/experiments",
"experiments": [{
"slug": "traffic-impact-study-1"

Rename the test experiment slugs to be more generic, e.g. exp-1.

"hpke_config": "AQAgAAEAAQAgpdceoGiuWvIiogA8SPCdprkhWMNtLq_y0GSePI7EhXE"
},
"nimbus": {
"api_url": "https://stage.experimenter.nonprod.webservices.mozgcp.net/api/v6/experiments",

A dummy URL can be used here.

"table": "incrementality"
},
"dap": {
"hpke_config": "AQAgAAEAAQAgpdceoGiuWvIiogA8SPCdprkhWMNtLq_y0GSePI7EhXE"

Can an obvious dummy value be used here?

This file isn't needed for the job and should be removed.

"--task-id",
"mubArkO3So8Co1X98CBo62-lSCM4tB-NZPOUGJ83N1o",
"--leader",
"https://dap-09-3.api.divviup.org",

Use a dummy endpoint here.

],
),
bigquery.SchemaField(
"created_at",

Update to created_timestamp?
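
Something like the following, assuming the field holds a timestamp (the TIMESTAMP type is an assumption, not taken from this PR):

from google.cloud import bigquery

# Suggested rename; TIMESTAMP type is assumed from the field's purpose.
field = bigquery.SchemaField("created_timestamp", "TIMESTAMP")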


@click.command()
@click.option("--gcp_project", help="GCP project id", required=True)
@click.option(
Author

Additional comment from @gleonard-m:
Config file and BQ results table will be in different projects.

This gcp_project should just be used for the location of the config bucket. Rename to be more specific.

We also need to be able to separately specify the GCP project for the BQ table where the results will be written. That will go in the BQ part of the config JSON file.
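
A sketch of that split (the option and config key names here are illustrative, not final):

import click


@click.command()
@click.option(
    "--config_bucket_project",
    help="GCP project id containing the config bucket",
    required=True,
)
def main(config_bucket_project):
    # The BQ destination project is read from the config JSON rather
    # than the CLI, e.g. config["bigquery"]["project"]:
    #   "bigquery": {"project": "some-bq-project", "table": "incrementality"}
    ...


if __name__ == "__main__":
    main()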
