
Conversation

@sidsatyelp (Member) commented May 22, 2025

What this change does

To more accurately attribute infrastructure costs to feature development, we are changing how we submit adhoc Spark jobs. Users will need to pass an additional parameter, jira_ticket, in spark-args. The value of this parameter should be the top-level Jira ticket used to track your project; this will let us aggregate a project's entire Spark development cost over time.

This will only work after the flag spark.yelp.jira_ticket.enabled is set to true in srv-configs and the new versions with these changes are included in paasta and spark-tools.
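
For illustration, a minimal sketch of the gating logic described above. The function name, the srv-conf dict shape, and the ticket regex are hypothetical; only the flag name spark.yelp.jira_ticket.enabled comes from this PR.

import re

# e.g. ABC-123; the exact accepted ticket format is an assumption
_JIRA_TICKET_RE = re.compile(r'^[A-Z][A-Z0-9]*-\d+$')

def validate_jira_ticket(srv_conf, jira_ticket):
    # Rollout gate: behave as before until the srv-configs flag is flipped to 'true'.
    if srv_conf.get('spark.yelp.jira_ticket.enabled') != 'true':
        return
    if not jira_ticket or not _JIRA_TICKET_RE.match(jira_ticket):
        raise ValueError(
            'adhoc spark jobs must pass a jira_ticket (e.g. ABC-123) so that '
            'costs can be attributed to the project'
        )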

Testing

# Shouldn't launch because we don't provide jira_ticket
paasta spark-run --aws-profile=dev-cloud-economics --cmd "spark-submit /code/integration_tests/s3.py"
# output: https://fluffy.yelpcorp.com/i/XDdDpJB7vrXLvcqVNkLBxvBdtTFl44RX.html

# Should launch because we provide jira_ticket
paasta spark-run --aws-profile=dev-cloud-economics --jira-ticket=ABC-123 --cmd "spark-submit /code/integration_tests/s3.py"
# output: https://fluffy.yelpcorp.com/i/1VCTvzjk0wK6mbC97KF8JPRqt2WmsXTd.html

The jira_ticket label is being added to executor pods:
https://app.cloudzero.com/explorer?activeCostType=real_cost&granularity=hourly&partitions=k8slabel%3Aspark_yelp_com_jira_ticket&dateRange=Last%2024%20Hours&k8slabel%3Aspark_yelp_com_user=sids&showRightFlyout=filters
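
For context, Spark on Kubernetes supports attaching arbitrary labels to executor pods via spark.kubernetes.executor.label.[LabelName], so the plumbing presumably does something like the sketch below. The exact label key is inferred from the CloudZero partition in the URL above and is an assumption, not confirmed by this PR.

# Sketch: map the ticket to an executor pod label that CloudZero can partition on.
# The label key spark.yelp.com/jira_ticket is inferred; CloudZero sanitizes it
# to spark_yelp_com_jira_ticket in its partition names.
def executor_label_conf(jira_ticket):
    return {
        'spark.kubernetes.executor.label.spark.yelp.com/jira_ticket': jira_ticket,
    }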

Tech Spec
Jira Ticket

@sidsatyelp requested a review from chi-yelp May 22, 2025 06:17
@sidsatyelp requested a review from jhereth May 22, 2025 07:10
@chi-yelp (Contributor) commented May 22, 2025

How about letting users use a separate argument for the Jira ticket?
It's easy to miss (or be copied from other jobs along with other Spark configs) when the option is part of the job's --spark-args, and we normally use the spark.yelp.* prefix for Yelp-specific Spark configs for easier tracking.

I.e.

# Ad-hoc CLI
paasta spark-run --aws-profile=dev-core-ml --jira-ticket="ABC-123" --spark-args "..."

# Jupyter
spark = paasta.create_spark_session(
    {
        'spark.executor.cores': '4',
        ...
    },
    profile_name="dev-core-ml",
    jira_ticket="ABC-123",
)

@jhereth commented May 22, 2025

> How about letting users use a separate argument for the Jira ticket? It's easy to miss (or be copied from other jobs along with other Spark configs) when the option is part of the job's --spark-args, and we normally use the spark.yelp.* prefix for Yelp-specific Spark configs for easier tracking.
>
> I.e.
>
> # Ad-hoc CLI
> paasta spark-run --aws-profile=dev-core-ml --jira-ticket="ABC-123" --spark-args "..."
>
> # Jupyter
> spark = paasta.create_spark_session(
>     {
>         'spark.executor.cores': '4',
>         ...
>     },
>     profile_name="dev-core-ml",
>     jira_ticket="ABC-123",
> )

This is a cleaner solution, I think. Would this work out of the box, or do users need to update spark-tools to be able to use this additional kwarg?

@chi-yelp (Contributor) commented May 22, 2025

> How about letting users use a separate argument for the Jira ticket? It's easy to miss (or be copied from other jobs along with other Spark configs) when the option is part of the job's --spark-args, and we normally use the spark.yelp.* prefix for Yelp-specific Spark configs for easier tracking.
>
> I.e.
>
> # Ad-hoc CLI
> paasta spark-run --aws-profile=dev-core-ml --jira-ticket="ABC-123" --spark-args "..."
>
> # Jupyter
> spark = paasta.create_spark_session(
>     {
>         'spark.executor.cores': '4',
>         ...
>     },
>     profile_name="dev-core-ml",
>     jira_ticket="ABC-123",
> )
>
> This is a cleaner solution, I think. Would this work out of the box, or do users need to update spark-tools to be able to use this additional kwarg?

yeah, this will need updating spark_run in paasta and spark-tools, but both approaches require bumping service_configuration_lib in paasta and spark-tools anyway

@jhereth commented May 22, 2025

> This is a cleaner solution, I think. Would this work out of the box, or do users need to update spark-tools to be able to use this additional kwarg?
>
> yeah, this will need updating spark_run in paasta and spark-tools, but both approaches require bumping service_configuration_lib in paasta and spark-tools anyway

Then I'd prefer the cleaner way.

@sidsatyelp changed the title from "Add check for jira_ticket parameter being passed in spark_args for" to "Add check for jira_ticket parameter being passed via spark-run for adhoc spark jobs" May 27, 2025
@sidsatyelp (Member, Author) commented

@chi-yelp and @jhereth Can you please review these changes?

@chi-yelp (Contributor) left a comment


Commented for a nit, lgtm otherwise

user_spark_opts = _convert_user_spark_opts_value_to_str(user_spark_opts)

if self.mandatory_default_spark_srv_conf.get('spark.yelp.jira_ticket.enabled') == 'true':
    needs_jira_check = os.environ.get('USER', '') not in ['batch', 'TRON', '']

nit: can be simplified as follows, and add a comment

Suggested change:
-    needs_jira_check = os.environ.get('USER', '') not in ['batch', 'TRON', '']
+    # Check Jira ticket if the run is not a Tron run
+    needs_jira_check = os.environ.get('USER') not in ['batch', 'TRON']
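
Worth noting: the two variants are not strictly equivalent when USER is unset. The old '' default fell inside the skip list, while the simplified line treats a missing USER as needing the check. A quick standard-library repro (not from the PR):

import os

os.environ.pop('USER', None)  # simulate an environment with USER unset
print(os.environ.get('USER', '') not in ['batch', 'TRON', ''])  # False -> check skipped
print(os.environ.get('USER') not in ['batch', 'TRON'])          # True  -> check enforced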

@sidsatyelp merged commit 249370c into master May 29, 2025
1 check passed