
Investigation: New timeseries metrics models are too large for current dbt configuration #1970

@ravenac95

Description

What is it?

The timeseries metrics models have proven too large for dbt and BigQuery to execute the way we need them to be executed.

Several approaches were attempted to get these models working properly, which has unfortunately led to a larger investigation.

Things tried:

  • Split the CTEs that compose the int_timeseries_ models into smaller models. This led to the discovery that generating 0 events for all artifacts creates far too large a model (55TB for developers * artifacts * days); see the back-of-envelope sketch after this list.
  • SQLMesh against ClickHouse
    • Sadly, SQLMesh doesn't have a ClickHouse "engine" yet, so I attempted to use the MySQL engine instead. Because ClickHouse isn't a complete MySQL implementation, that failed.

The remaining methods to try (in order) now are:

  • Use SQLMesh against BigQuery
  • Use Dagster against ClickHouse, executing our own SQL against ClickHouse (a rough sketch of what this could look like follows this list)
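
As a rough sketch of the Dagster-against-ClickHouse option, assuming clickhouse-connect as the client library: a Dagster asset that pushes the aggregation down into ClickHouse by executing our own SQL. The table names, SQL, and connection settings here are hypothetical placeholders, not a proposed implementation.

```python
# Minimal sketch of a Dagster asset that runs our own SQL directly against
# ClickHouse. Table names, SQL, and connection settings are hypothetical
# placeholders; assumes the clickhouse-connect client library.
import os

import clickhouse_connect
from dagster import asset


@asset
def int_timeseries_events_daily() -> None:
    """Materialize a daily timeseries rollup by executing SQL in ClickHouse."""
    client = clickhouse_connect.get_client(
        host=os.environ["CLICKHOUSE_HOST"],
        username=os.environ.get("CLICKHOUSE_USER", "default"),
        password=os.environ.get("CLICKHOUSE_PASSWORD", ""),
        secure=True,
    )
    # Let ClickHouse do the heavy aggregation instead of pulling rows out;
    # the INSERT ... SELECT below stands in for the real metrics SQL.
    client.command(
        """
        INSERT INTO metrics.int_timeseries_events_daily
        SELECT
            toDate(time) AS bucket_day,
            artifact_id,
            event_type,
            sum(amount) AS amount
        FROM source.events
        GROUP BY bucket_day, artifact_id, event_type
        """
    )
```

Partitioning, scheduling, and retries would layer on top through Dagster's usual asset configuration.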

Other things that have been considered and not tried:

  • Spark
    • For the queries we'd like to run, having a Spark cluster could make them feasible against our BigQuery dataset
  • Trino
    • Similar to Spark. This gives us the leverage needed to run those queries.
  • Apache Iceberg
  • Delta Lake

The main reason we haven't pursued most of these is that we want to be able to expose exploration of our metrics' dimensions to users. These options could still be useful in that world.
