I'm learning DBT and would like to rewrite following Snowflake procedure with DBT model. Unfortunately I don't know how to express SQL delete/inserts in DBT. Here is my procedure:
create or replace procedure staging.ingest_google_campaigns_into_master() returns varchar language sql
as
$$ begin DELETE FROM GOOGLE_ADWORD_CAMPAIGN WHERE DT IN (SELECT DISTINCT ORIGINALDATE AS DT FROM GOOGLEADWORDS_CAMPAIGN); INSERT INTO GOOGLE_ADWORD_CAMPAIGN SELECT DISTINCT * FROM ( SELECT g.* , YEAR(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)) AS YEAR, LPAD(MONTH(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)),2,0) AS MONTH, LPAD(DAY(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)),2,0) AS DAY, TO_DATE(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR) AS DT FROM GOOGLEADWORDS_CAMPAIGN g ) t; end;
$$
;The procedure first remove old rows from table GOOGLE_ADWORD_CAMPAIGN and later replace them with the fresh one.
1 Answer
This pattern is called an "incremental" materialization in dbt. See the docs for more background.
On Snowflake, there are a few different "strategies" you can use for incremental materializations. One strategy is called delete+insert, which does exactly what your stored procedure does. Another option is merge, which may perform better.
Adapting your code to dbt would look like this:
{{ config( materialized='incremental', unique_key='ORIGINALDATE', incremental_strategy='delete+insert', )
}}
SELECT DISTINCT g.* , YEAR(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)) AS YEAR, LPAD(MONTH(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)),2,0) AS MONTH, LPAD(DAY(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)),2,0) AS DAY, TO_DATE(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR) AS DT
FROM GOOGLEADWORDS_CAMPAIGN gNote that this assumes there is just a single record in GOOGLEADWORDS_CAMPAIGN for each ORIGINALDATE -- if that is not true, you will want to substitute your own unique_key in the config block.
This also assumes that GOOGLEADWORDS_CAMPAIGN contains a sliding date window already. If that isn't the case, and you want to filter the dates contained in that table (and only update a subset of the data), you want to add a conditional WHERE statement when the model is built in incremental mode:
...
FROM GOOGLEADWORDS_CAMPAIGN g
{% if is_incremental() %}
WHERE DT >= (SELECT MAX(DT) FROM {{ this }}) - INTERVAL '7 DAYS'
{% endif %}