Data Engineering Foundations: A Practical Introduction to Snowflake, Fivetran, and dbt

Replicating a CSV dataset: Import a CSV file into Snowflake using Fivetran, then create a basic dbt model to clean and select essential columns

ยท

10 min read

Data Engineering Foundations: A Practical Introduction to Snowflake, Fivetran, and dbt

Photo by Dollar Gill on Unsplash

Welcome to our exciting journey through the world of data integration and transformation! Today, we're diving into a practical project that involves a trio of modern data tools: Snowflake, Fivetran, and dbt. Whether you're a data enthusiast, a professional looking to sharpen your skills, or just curious about how data moves and transforms in the cloud, you're in for a treat!

Engineering Your Data Path

Embark on a practical and enlightening journey through the landscape of data engineering with Snowflake, Fivetran, and dbt. This blog serves as your roadmap, detailing each step to transform raw data into insightful information.

Here's what we'll cover:

  1. Setting Up Your Workspace: Begin by securing a trial account with Snowflake.

  2. Acquiring the Data: Dive into Kaggle to fetch the Video Game Sales dataset, your key to hands-on learning, and prepare it for its cloud journey by loading it into a Google Cloud Storage (GCS) bucket.

  3. Integrating with Fivetran: Create your Fivetran account and connect it with your GCS bucket.

  4. Connecting the Dots with Snowflake: With Fivetran as your conduit, establish a link to Snowflake, setting up the database and configuring access within Fivetran to facilitate data synchronization.

  5. Synchronizing Your Data: Initiate the sync process, transferring the Video Game Sales data from your GCS bucket directly into Snowflake.

  6. Preparing for Transformation with dbt: Install the dbt-Snowflake adapter on your computer, laying the groundwork for data modeling and transformation.

  7. Launching Your dbt Project: Create a simple dbt project tailored to the Video Game Sales database in Snowflake.

  8. Executing Your Transformation: Run the dbt sync to materialize your new view in Snowflake, witnessing the culmination of your efforts as your data is transformed and ready for exploration.

1. Setting Up Your Workspace

๐Ÿ’ก
Snowflake โ€“ Where Data Comes to Play

Diving into the world of cloud data warehousing has never been easier, especially with Snowflake's offer of a trial account that allows you to test-drive its impressive features without spending a dime.

  1. Access the Snowflake Signup Page:

  2. Complete the Signup Form:

    • Basic Information: Provide your first name, last name, email address, company name, and phone number.

    • Account Setup: Create a username and choose your preferred cloud provider (AWS, Azure, or GCP) and region.

    • Edition: Select the 'Enterprise Edition' for your trial.

    • Agree to Terms: Review and agree to Snowflake's terms of service.

    • Click "Continue":

  3. Activate Your Account:

    • You'll receive an email from Snowflake with an activation link. Click on the link to confirm your email and activate your trial account.
  4. Create Username and Password:

    • Set up a secure username and password to access your Snowflake account.

2. Acquiring the Data

To acquire the data necessary for your analysis, you'll need to download the "Video Game Sales" dataset from Kaggle.

  1. Download Dataset
    This dataset is available in CSV format and can be obtained by visiting the following link: Video Game Sales Dataset on Kaggle.

    (direct download link: Video Game Sales Dataset)

  2. Upload the file to Google Cloud Storage Bucket

    • go to Google Cloud Storage Console

    • create a bucket (this blog will assume that the name of the buckets is: my-video-game-sales

    • upload the csv file downloaded from Kaggle into the bucket.

3. Integrating with Fivetran

๐Ÿ’ก
Fivetran โ€“ The Data Conductor

Fivetran simplifies data synchronization, offering a plug-and-play solution that requires minimal configuration. To start, visit Fivetran's website and sign up for a trial.

  1. Access the Fivetran Signup Page:

  2. Complete the form with your details or signup with Google

  3. Verify your email address by clicking the link sent to your inbox.

4. Connecting the Dots with Snowflake

We've just embarked on a spectacular journey, setting up a pipeline from the ground of Google Cloud Storage all the way to the lofty heights of Snowflake, with Fivetran as our trusty guide. It's a tale of connecting dots, a dance of data, where each step is meticulously choreographed to ensure the smooth flow of information.

  1. Configure Snowflake "Destination" in Fivetran

    • Sign in to Fivetran.

    • Click "Add Destination".

    • Name: Enter "my_snowflake"

    • Destination Type: Select "Snowflake"

    • Host: Enter your Snowflake host URL in the format your-account.snowflakecomputing.com (find this in your Snowflake welcome email or profile).

    • Password: Choose a strong password and record it securely.

    • Configure Snowflake

    • In Fivetran, click "Save & Test".

    • Wait for verification checks to pass, then click "View Destination".

  2. Configure Fivetran "Connector" (to GCS bucket)

    1. Click "Connectors".

    2. Click "Add Connector".

    3. Destination: Select "my_snowflake" (created earlier).

    4. Data Source: Select "Google Cloud Storage" and click "Setup".

    5. Destination Schema: Enter "my_fivetran_schema".

    6. Destination Table: Enter "my_video_game_sales".

    7. GCS Bucket Name: Enter "my-video-game-sales".

    8. Note the Fivetran email address: (e.g., 'g-gradation-scalping@fivetran-production.iam.gserviceaccount.com')

      Note: Look for text starting with Share your Google Cloud Storage bucket with this email address

    9. Configure Permissions (GCS):

      • Go to the Google Cloud Storage console.

      • Click on your bucket "my-video-game-sales".

      • Click "Permissions".

      • Click "Grant Access".

      • Paste the Fivetran email address.

      • Select "Storage Object Viewer" role.

      • Click "Save".

    10. In Fivetran, click "Save & Test".

    11. Wait for verification checks to pass, then click "Continue".

5. Synchronizing Your Data

  1. Initiate Data Sync:

    • Within Fivetran, navigate to the "Connectors" section.

    • Select the newly configured connector.

    • Click "Start Initial Sync" to begin the data transfer process.

  2. Verify Data in Snowflake

    • Open a new SQL worksheet in your Snowflake environment.

    • Perform the following queries to check your transferred data:

      Query 1:

        SELECT *
        FROM FIVETRAN_DATABASE.MY_FIVETRAN_SCHEMA.MY_VIDEO_GAME_SALES
        LIMIT 10;
      

      Expected Output:

        RANK    NAME                PLATFORM    YEAR        GENRE
        1        Wii Sports            Wii            2006        Sports
        2        Super Mario Bros.    NES            1985        Platform
        3        Mario Kart Wii        Wii            2008        Racing
        ...
      
      • Query 2:

          SELECT count(*)
          FROM FIVETRAN_DATABASE.MY_FIVETRAN_SCHEMA.MY_VIDEO_GAME_SALES;
        

        Expected Output:

          COUNT(*)
          16598
        

6. Preparing for Transformation with dbt

๐Ÿ’ก
dbt โ€“ The Data Sculptor

This is where dbt, our Data Build Tool, comes into play. Think of dbt as a sculptor, skilled at transforming raw blocks of data into meaningful insights. We'll set up a dbt project, connecting it to Snowflake, and prepare to mold our data.

Before using dbt with Snowflake, you need to install the correct adapter:

  1. Open a terminal or command prompt.

  2. Execute the following command:

     python -m pip install dbt-snowflake
    

    (https://docs.getdbt.com/docs/core/pip-install#installing-the-adapter)

7. Launching Your dbt Project

  1. Initializing Your dbt Project

    1. Open a terminal

    2. Create a dbt Project: Utilize the dbt init command to setup a new directory specifically for your dbt project.

      • Execute the command: dbt init videogamesales

      • This command kick-starts the creation of your new dbt project named "videogamesales."

    3. Configure Snowflake Details: After executing the dbt init command, you'll be prompted to enter your Snowflake credentials.

       dbt init videogamesales                                                                                                                                                
       07:58:36  Running with dbt=1.5.10                                                                                                                                           
       Which database would you like to use?
       [1] snowflake
      
       (Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)
      
       Enter a number: 1
       account (https://<this_value>.snowflakecomputing.com): <your snowflake account identifier>
       user (dev username): <your username>
       [1] password
       [2] keypair
       [3] sso
       Desired authentication type option (enter a number): 1
       password (dev password): <your snowflake account password>
       role (dev role): 
       role (dev role):         
       role (dev role): accountadmin
       warehouse (warehouse name): FIVETRAN_WAREHOUSE
       database (default database that dbt will build objects in): FIVETRAN_DATABASE
       schema (default schema that dbt will build objects in): MY_FIVETRAN_SCHEMA
       threads (1 or more) [1]: 
       ...
       Your new dbt project "videogamesales" was created!
      
       ...
      
  2. Create Your First dbt Model

    • Navigate to themodels folder within your dbt project.

    • Create a new SQL file namedvideogames_clean.sql.

    • Add the following SQL code to thevideogames_clean.sql file:

        SELECT
            RANK,
            NAME,
            PLATFORM,
            YEAR,
            GENRE,
            PUBLISHER,
            GLOBAL_SALES
        FROM FIVETRAN_DATABASE.MY_FIVETRAN_SCHEMA.MY_VIDEO_GAME_SALES
      

      Note: This simple dbt model selects the core columns from your raw videogame sales data.

8. Executing Your Transformation

1. Run the Model

  • Open your terminal and navigate to your project directory (videogamesales).

  • Execute the following command: dbt run

  • This will trigger dbt to process your model, run the SQL code against your Snowflake database, and generate a new table storing the transformed results.

      dbt run                                                                                                                                                            1 โ†ต
      08:43:31  Running with dbt=1.5.10
      08:43:31  Registered adapter: snowflake=1.5.6
      08:43:31  Found 3 models, 4 tests, 0 snapshots, 0 analyses, 325 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
      08:43:31  
      08:43:33  Concurrency: 1 threads (target='dev')
      08:43:33  
      08:43:33  1 of 3 START sql table model MY_FIVETRAN_SCHEMA.my_first_dbt_model ............. [RUN]
      08:43:33  1 of 3 OK created sql table model MY_FIVETRAN_SCHEMA.my_first_dbt_model ........ [SUCCESS 1 in 0.86s]
      08:43:33  2 of 3 START sql view model MY_FIVETRAN_SCHEMA.videogames_clean ................ [RUN]
      08:43:34  2 of 3 OK created sql view model MY_FIVETRAN_SCHEMA.videogames_clean ........... [SUCCESS 1 in 0.45s]
      08:43:34  3 of 3 START sql view model MY_FIVETRAN_SCHEMA.my_second_dbt_model ............. [RUN]
      08:43:34  3 of 3 OK created sql view model MY_FIVETRAN_SCHEMA.my_second_dbt_model ........ [SUCCESS 1 in 0.47s]
      08:43:34  
      08:43:34  Finished running 1 table model, 2 view models in 0 hours 0 minutes and 3.13 seconds (3.13s).
      08:43:34  
      08:43:34  Completed successfully
      08:43:34  
      08:43:34  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
    
  1. Verify Data in Snowflake

    • Access your Snowflake environment and open a new SQL worksheet.

    • Run the following queries to validate your transformation:

      Query 1: View the newly created view

        SELECT *
        FROM FIVETRAN_DATABASE.MY_FIVETRAN_SCHEMA.VIDEOGAMES_CLEAN;
      

      Expected output:

        RANK    NAME                        PLATFORM    YEAR    GENRE            PUBLISHER    GLOBAL_SALES
        1        Wii Sports                    Wii            2006    Sports            Nintendo    82.74
        2        Super Mario Bros.            NES            1985    Platform        Nintendo    40.24
        3        Mario Kart Wii                Wii            2008    Racing            Nintendo    35.82
        4        Wii Sports Resort            Wii            2009    Sports            Nintendo    33
        5        Pokemon Red/Pokemon Blue    GB            1996    Role-Playing    Nintendo    31.37
      
  • Query 2: Check the original table

      SELECT *
      FROM FIVETRAN_DATABASE.MY_FIVETRAN_SCHEMA.MY_VIDEO_GAME_SALES
      ORDER BY GLOBAL_SALES DESC
      LIMIT 5;
    

    Expected output:

      _FILE            _LINE    _MODIFIED                        _FIVETRAN_SYNCED                RANK    NAME                        PLATFORM    YEAR    GENRE            PUBLISHER    NA_SALES    EU_SALES    JP_SALES    OTHER_SALES    GLOBAL_SALES
      vgsales.csv        0        2024-03-11 17:23:33.679 +0000    2024-03-11 18:58:32.888 +0000    1        Wii Sports                    Wii            2006    Sports            Nintendo    41.49        29.02        3.77        8.46        82.74
      vgsales.csv        1        2024-03-11 17:23:33.679 +0000    2024-03-11 18:58:32.893 +0000    2        Super Mario Bros.            NES            1985    Platform        Nintendo    29.08        3.58        6.81        0.77        40.24
      vgsales.csv        2        2024-03-11 17:23:33.679 +0000    2024-03-11 18:58:32.894 +0000    3        Mario Kart Wii                Wii            2008    Racing            Nintendo    15.85        12.88        3.79        3.31        35.82
      vgsales.csv        3        2024-03-11 17:23:33.679 +0000    2024-03-11 18:58:32.894 +0000    4        Wii Sports Resort            Wii            2009    Sports            Nintendo    15.75        11.01        3.28        2.96        33
      vgsales.csv        4        2024-03-11 17:23:33.679 +0000    2024-03-11 18:58:32.894 +0000    5        Pokemon Red/Pokemon Blue    GB            1996    Role-Playing    Nintendo    11.27        8.89        10.22        1            31.37
    

Journey Recap

In this blog, we embarked on a transformative data engineering expedition. We established our foundation with Snowflake, pulled in raw data from Kaggle through Google Cloud Storage, and streamlined the flow using Fivetran. Finally, dbt gracefully stepped in to model and transform our data, leaving us with analytics-ready insights within Snowflake.

This hands-on journey demonstrates the remarkable synergy between Snowflake, Fivetran, and dbt. You're now equipped to:

  • Workspace Establishment: You started by provisioning a Snowflake trial account, setting up the foundation for your project.

  • Data Source: You sourced the Video Game Sales dataset from Kaggle and moved it to Google Cloud Storage, readying it for the cloud.

  • Fivetran Integration: You seamlessly connected your GCS bucket with Fivetran for automated data ingestion.

  • Snowflake Connection: You linked Fivetran to Snowflake, enabling data flow into your cloud data warehouse.

  • Data Synchronization: You successfully synchronized your video game sales data from GCS to Snowflake.

  • dbt Foundations: You installed the dbt-Snowflake adapter and created a basic dbt project to begin your data transformation journey.

  • Data Transformation: You executed your dbt sync, witnessing the transformation of your data and the creation of a new analytics-ready view in Snowflake.

And That's a Wrap!

This exploration through the realms of data integration and transformation demonstrates the power and synergy of using Fivetran, Snowflake, and dbt in unison. Each tool played a pivotal role in morphing raw data into a refined asset ready for analysis. For data enthusiasts, professionals, or anyone curious about data's journey from raw to refined, this guide serves as a testament to the capabilities of modern data tools in extracting meaningful stories from numbers and figures.

Embarking on this journey not only equips you with the knowledge of using these tools but also instills a deeper appreciation for the art and science of data transformation. As you venture forward, remember that each step, from data preparation to transformation, is a building block in crafting your data narrative, empowering you to uncover insights that can inform decisions, spark innovation, and illuminate the hidden narratives within your data.

๐Ÿš€
May the Forge be with you!
ย