How to convert delete+insert SQL into DBT module

I'm learning DBT and would like to rewrite following Snowflake procedure with DBT model. Unfortunately I don't know how to express SQL delete/inserts in DBT. Here is my procedure:

create or replace procedure staging.ingest_google_campaigns_into_master() returns varchar language sql
as
$$ begin DELETE FROM GOOGLE_ADWORD_CAMPAIGN WHERE DT IN (SELECT DISTINCT ORIGINALDATE AS DT FROM GOOGLEADWORDS_CAMPAIGN); INSERT INTO GOOGLE_ADWORD_CAMPAIGN SELECT DISTINCT * FROM ( SELECT g.* , YEAR(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)) AS YEAR, LPAD(MONTH(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)),2,0) AS MONTH, LPAD(DAY(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)),2,0) AS DAY, TO_DATE(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR) AS DT FROM GOOGLEADWORDS_CAMPAIGN g ) t; end;
$$
;

The procedure first remove old rows from table GOOGLE_ADWORD_CAMPAIGN and later replace them with the fresh one.

1 Answer

This pattern is called an "incremental" materialization in dbt. See the docs for more background.

On Snowflake, there are a few different "strategies" you can use for incremental materializations. One strategy is called delete+insert, which does exactly what your stored procedure does. Another option is merge, which may perform better.

Adapting your code to dbt would look like this:

{{ config( materialized='incremental', unique_key='ORIGINALDATE', incremental_strategy='delete+insert', )
}}
SELECT DISTINCT g.* , YEAR(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)) AS YEAR, LPAD(MONTH(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)),2,0) AS MONTH, LPAD(DAY(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)),2,0) AS DAY, TO_DATE(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR) AS DT
FROM GOOGLEADWORDS_CAMPAIGN g

Note that this assumes there is just a single record in GOOGLEADWORDS_CAMPAIGN for each ORIGINALDATE -- if that is not true, you will want to substitute your own unique_key in the config block.

This also assumes that GOOGLEADWORDS_CAMPAIGN contains a sliding date window already. If that isn't the case, and you want to filter the dates contained in that table (and only update a subset of the data), you want to add a conditional WHERE statement when the model is built in incremental mode:

...
FROM GOOGLEADWORDS_CAMPAIGN g
{% if is_incremental() %}
WHERE DT >= (SELECT MAX(DT) FROM {{ this }}) - INTERVAL '7 DAYS'
{% endif %}

Your Answer

Sign up or log in

Sign up using Google Sign up using Facebook Sign up using Email and Password

Post as a guest

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge that you have read and understand our privacy policy and code of conduct.

You Might Also Like