Replies: 1 comment
Hello @drewbanin, @jtcohen6, @emmyoop, @gshank, @MichelleArk, @cmcarthur, @dbeatty10. Please let me know if you have any questions.
Scalability of dbt for Large Projects
We’re starting this thread to discuss the scalability of dbt when handling a large number of models. In our organization, we have a dbt project with 2,000+ models. To execute these models, we’ve integrated dbt with Airflow, running dbt commands via the Airflow BashOperator.
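For reference, a minimal sketch of this kind of integration, assuming Airflow 2.4+ (the DAG id, schedule, project path, and selector below are illustrative placeholders, not our actual setup):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Sketch of the integration described above: each Airflow task shells out to
# the dbt CLI for a slice of the project. The DAG id, schedule, project path,
# and selector are placeholders.
with DAG(
    dag_id="dbt_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
):
    run_staging = BashOperator(
        task_id="dbt_run_staging",
        bash_command="cd /opt/dbt_project && dbt run --select tag:staging",
    )
```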
Current Challenge:
While executing models through bash commands, we pass certain variables using the --vars flag. However, due to a known dbt limitation, dbt cannot apply partial parsing in this setup and performs a full parse of the entire project for every Airflow dbt task run. This significantly increases parsing time as the number of models grows.
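The command with variables is built roughly as sketched below; the variable name, value, and selector are illustrative placeholders:

```python
import json

# Illustrative construction of the bash command with per-run variables
# (here, Airflow's logical date). As noted above, passing values via --vars
# prevents dbt from reusing its partial-parse state, so every task run pays
# the full parse cost.
run_vars = {"run_date": "{{ ds }}"}  # "{{ ds }}" is templated by Airflow at runtime

bash_command = (
    "cd /opt/dbt_project && "
    f"dbt run --select tag:daily --vars '{json.dumps(run_vars)}'"
)
print(bash_command)
# cd /opt/dbt_project && dbt run --select tag:daily --vars '{"run_date": "{{ ds }}"}'
```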
Optimization Efforts So Far:
We’ve explored the optimizations suggested in the dbt documentation. Despite these efforts, parsing time remains a bottleneck due to the sheer scale of our project.
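For anyone trying to reproduce the numbers, one rough way to isolate parse overhead from model execution is to time `dbt parse` on its own (the project path is a placeholder):

```python
import subprocess
import time

# `dbt parse` builds the manifest without executing any models, so timing it
# approximates the per-task parsing overhead discussed above. The project
# path is a placeholder.
start = time.monotonic()
subprocess.run(["dbt", "parse", "--project-dir", "/opt/dbt_project"], check=True)
print(f"dbt parse took {time.monotonic() - start:.1f}s")
```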
Request for Suggestions:
We’re seeking additional strategies or best practices to optimize dbt parsing time for large projects. If you’ve faced similar challenges or have insights to share, we’d love to hear from you!
Tagging top contributors for attention: @drewbanin, @jtcohen6, @emmyoop, @gshank, @MichelleArk, @cmcarthur, @dbeatty10