Since 2022, our data engineering team has been running Databricks and dbt Core to power our Data Vault environment. Everything ran smoothly until we encountered the “remote client cannot create a SparkContext” error, which forced us to switch our models over to a SparkSession and prompted a deep dive into its cause and solution.

That streak of reliability came to an abrupt stop last week when our dbt Python models running on Databricks started failing with the following error message:

[CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT] The remote client cannot create a SparkContext. Create SparkSession instead.

This unexpected error disrupted dbt runs that had been stable for years. At first it seemed related to how Spark contexts were being initialized, yet nothing in that part of our codebase had changed. A closer look at recent Databricks platform updates showed that they had altered dbt’s execution model when connecting remotely.
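Inside a dbt Python model, the direction the error message points to is to rely on the SparkSession that dbt passes into the model rather than constructing a SparkContext. A minimal sketch, with the upstream model name and column below as hypothetical placeholders:

```python
# Minimal dbt Python model sketch: use the SparkSession that dbt
# provides ("session") instead of creating a SparkContext.
# "stg_orders" and "order_total" are hypothetical placeholders.

def model(dbt, session):
    # dbt passes "session" as a SparkSession; no SparkContext is needed.
    df = dbt.ref("stg_orders")  # upstream model as a Spark DataFrame

    # Keep transformations in the DataFrame/SQL API, which works for
    # remote clients, rather than RDD-level calls like sc.parallelize().
    return df.filter(df.order_total > 0)
```

The key point is that the model never touches `SparkContext` directly; everything goes through the session and the DataFrame API.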

Why the Remote Client Cannot Create a SparkContext and How to Fix It

Initial Debugging Attempts

We spent hours debugging our code, testing different approaches, and combing through the Databricks and dbt documentation for clues, but nothing resolved the issue. The error persisted across multiple models and environments, leaving us puzzled. Eventually, we decided to experiment with the infrastructure itself: switching the cluster type finally got our dbt jobs running again. This confirmed that the problem wasn’t in our code or dbt configuration, but in the Databricks cluster environment.

Using the dbt_cli Cluster

During our investigation, we discovered Databricks’ dedicated dbt_cli cluster, a pre-configured environment in which dbt Core and its dependencies come pre-installed. It simplifies integration, speeds up setup, and reduces compatibility issues. However, it primarily supports job execution rather than interactive development or broader data processing: it cannot handle mixed workloads or serve ad-hoc queries as efficiently as an all-purpose cluster, so it trades flexibility and scalability for convenience. In our case, switching to the dbt_cli cluster resolved the SparkContext problem, although we had to adjust our workflow to match the job-oriented design of this cluster type.
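Switching cluster types required no code changes on our side; with the dbt-databricks adapter it comes down to pointing the connection profile at the new compute. A sketch of a `profiles.yml`, where the host, HTTP path, and schema values are placeholders to be replaced with your own:

```yaml
# profiles.yml sketch for dbt-databricks -- all values are placeholders.
my_project:
  target: dev
  outputs:
    dev:
      type: databricks
      host: adb-1234567890123456.7.azuredatabricks.net    # workspace hostname (placeholder)
      http_path: sql/protocolv1/o/0/0000-000000-abcdefgh  # HTTP path of the target cluster (placeholder)
      schema: data_vault
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      threads: 4
```

Changing which cluster executes the dbt job is then a matter of swapping the `http_path` entry, not touching the models.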

Exploring Serverless Clusters

In addition to the dbt_cli cluster, Databricks also offers serverless clusters, which have recently become a strong option for development and debugging. We found that when the cluster configuration includes "spark.databricks.serverless.environmentVersion": "3", dbt runs complete without the SparkContext issue. Serverless clusters start up quickly, scale efficiently, and provide a clean environment that is ideal for testing and interactive development. The trade-off is that these clusters have limited direct access to Unity Catalog in notebooks.
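For reference, the setting we found sits in the Spark configuration section of the cluster specification. A hedged sketch of the relevant fragment, with the cluster name as a placeholder:

```json
{
  "cluster_name": "dbt-serverless-dev",
  "spark_conf": {
    "spark.databricks.serverless.environmentVersion": "3"
  }
}
```

Only the `spark.databricks.serverless.environmentVersion` key is taken from our working setup; the surrounding fields are illustrative.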

Why All-Purpose Clusters Remain the Best Choice

In the end, we found that all-purpose clusters remain the best and fastest option for running our dbt workloads in Databricks. Their flexibility, performance, and compatibility with our Data Vault framework make them ideal for both development and production. While the recent issue forced us to explore alternatives, the dbt_cli and serverless clusters kept our pipelines running and gave us valuable insight into Databricks’ evolving infrastructure. Hopefully, future updates will restore full support for running dbt Python models directly on all-purpose clusters and bring back the seamless experience we’ve enjoyed since 2022.