
Conversation

@sidsatyelp (Member) commented May 22, 2025

What this change does

To more accurately attribute infrastructure costs to feature development, we are changing how we submit adhoc Spark jobs. Users will need to pass an additional parameter, jira_ticket, in spark-args. The value of this parameter should be the top-level Jira ticket used to track your project; this will let us aggregate a project's entire Spark development cost over time.

This will only work after the flag spark.yelp.jira_ticket.enabled is set to true in srv-configs and the new versions with these changes are included in paasta and spark-tools.
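
For illustration, a minimal sketch of the gating logic described above. The function name, the srv-conf dict shape, and the ticket regex are hypothetical; only the flag name spark.yelp.jira_ticket.enabled comes from this PR.

import re

# e.g. ABC-123; the exact accepted ticket format is an assumption
_JIRA_TICKET_RE = re.compile(r'^[A-Z][A-Z0-9]*-\d+$')

def validate_jira_ticket(srv_conf, jira_ticket):
    # Rollout gate: behave as before until the srv-configs flag is flipped to 'true'.
    if srv_conf.get('spark.yelp.jira_ticket.enabled') != 'true':
        return
    if not jira_ticket or not _JIRA_TICKET_RE.match(jira_ticket):
        raise ValueError(
            'adhoc spark jobs must pass a jira_ticket (e.g. ABC-123) so that '
            'costs can be attributed to the project'
        )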

Testing

# Shouldn't launch because we don't provide jira_ticket
paasta spark-run --aws-profile=dev-cloud-economics --cmd "spark-submit /code/integration_tests/s3.py"
# output: https://fluffy.yelpcorp.com/i/XDdDpJB7vrXLvcqVNkLBxvBdtTFl44RX.html

# Should launch because we provide jira_ticket
paasta spark-run --aws-profile=dev-cloud-economics --jira-ticket=ABC-123 --cmd "spark-submit /code/integration_tests/s3.py"
# output: https://fluffy.yelpcorp.com/i/1VCTvzjk0wK6mbC97KF8JPRqt2WmsXTd.html

The jira_ticket label is being added to executor pods:
https://app.cloudzero.com/explorer?activeCostType=real_cost&granularity=hourly&partitions=k8slabel%3Aspark_yelp_com_jira_ticket&dateRange=Last%2024%20Hours&k8slabel%3Aspark_yelp_com_user=sids&showRightFlyout=filters
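
For context, Spark on Kubernetes supports attaching arbitrary labels to executor pods via spark.kubernetes.executor.label.[LabelName], so the plumbing presumably does something like the sketch below. The exact label key is inferred from the CloudZero partition in the URL above and is an assumption, not confirmed by this PR.

# Sketch: map the ticket to an executor pod label that CloudZero can partition on.
# The label key spark.yelp.com/jira_ticket is inferred; CloudZero sanitizes it
# to spark_yelp_com_jira_ticket in its partition names.
def executor_label_conf(jira_ticket):
    return {
        'spark.kubernetes.executor.label.spark.yelp.com/jira_ticket': jira_ticket,
    }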

Tech Spec
Jira Ticket

@sidsatyelp requested a review from chi-yelp May 22, 2025 06:17
@sidsatyelp requested a review from jhereth May 22, 2025 07:10
@chi-yelp (Contributor) commented May 22, 2025

How about letting users use a separate argument for the Jira ticket?
It's easy to miss (or be copied from other jobs along with other Spark configs) when the option is part of the job's --spark-args, and we normally use the spark.yelp.* prefix for Yelp-specific Spark configs for easier tracking.

I.e.

# Ad-hoc CLI
paasta spark-run --aws-profile=dev-core-ml --jira-ticket="ABC-123" --spark-args "..."

# Jupyter
spark = paasta.create_spark_session(
    {
        'spark.executor.cores': '4',
        ...
    },
    profile_name="dev-core-ml",
    jira_ticket="ABC-123",
)

@jhereth commented May 22, 2025

> How about letting users use a separate argument for the Jira ticket? It's easy to miss (or be copied from other jobs along with other Spark configs) when the option is part of the job's --spark-args, and we normally use the spark.yelp.* prefix for Yelp-specific Spark configs for easier tracking.
>
> I.e.
>
> # Ad-hoc CLI
> paasta spark-run --aws-profile=dev-core-ml --jira-ticket="ABC-123" --spark-args "..."
>
> # Jupyter
> spark = paasta.create_spark_session(
>     {
>         'spark.executor.cores': '4',
>         ...
>     },
>     profile_name="dev-core-ml",
>     jira_ticket="ABC-123",
> )

This is a cleaner solution, I think. Would this work out of the box, or do users need to update spark-tools to be able to use this additional kwarg?

@chi-yelp (Contributor) commented May 22, 2025

> How about letting users use a separate argument for the Jira ticket? It's easy to miss (or be copied from other jobs along with other Spark configs) when the option is part of the job's --spark-args, and we normally use the spark.yelp.* prefix for Yelp-specific Spark configs for easier tracking.
>
> I.e.
>
> # Ad-hoc CLI
> paasta spark-run --aws-profile=dev-core-ml --jira-ticket="ABC-123" --spark-args "..."
>
> # Jupyter
> spark = paasta.create_spark_session(
>     {
>         'spark.executor.cores': '4',
>         ...
>     },
>     profile_name="dev-core-ml",
>     jira_ticket="ABC-123",
> )
>
> This is a cleaner solution, I think. Would this work out of the box, or do users need to update spark-tools to be able to use this additional kwarg?

yeah, this will need updating spark_run in paasta and spark-tools, but both approaches require bumping service_configuration_lib in paasta and spark-tools anyway

@jhereth commented May 22, 2025

> This is a cleaner solution, I think. Would this work out of the box, or do users need to update spark-tools to be able to use this additional kwarg?
>
> yeah, this will need updating spark_run in paasta and spark-tools, but both approaches require bumping service_configuration_lib in paasta and spark-tools anyway

Then I'd prefer the cleaner way.

@sidsatyelp changed the title from "Add check for jira_ticket parameter being passed in spark_args for" to "Add check for jira_ticket parameter being passed via spark-run for adhoc spark jobs" May 27, 2025
@sidsatyelp (Member, Author) commented

@chi-yelp and @jhereth Can you please review these changes?

@chi-yelp (Contributor) left a comment


Commented for a nit, lgtm otherwise

user_spark_opts = _convert_user_spark_opts_value_to_str(user_spark_opts)

if self.mandatory_default_spark_srv_conf.get('spark.yelp.jira_ticket.enabled') == 'true':
    needs_jira_check = os.environ.get('USER', '') not in ['batch', 'TRON', '']

nit: can be simplified as follows, and add a comment

Suggested change:
-    needs_jira_check = os.environ.get('USER', '') not in ['batch', 'TRON', '']
+    # Check Jira ticket if the run is not a Tron run
+    needs_jira_check = os.environ.get('USER') not in ['batch', 'TRON']
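
Worth noting: the two variants are not strictly equivalent when USER is unset. The old '' default fell inside the skip list, while the simplified line treats a missing USER as needing the check. A quick standard-library repro (not from the PR):

import os

os.environ.pop('USER', None)  # simulate an environment with USER unset
print(os.environ.get('USER', '') not in ['batch', 'TRON', ''])  # False -> check skipped
print(os.environ.get('USER') not in ['batch', 'TRON'])          # True  -> check enforced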

@sidsatyelp merged commit 249370c into master May 29, 2025
1 check passed