You keep hearing how important data is for your business, but how do you actually get that data where you need it? In this post, we’ll explore how you can collect the data you need and the different approaches you can take.
To illustrate this, we’ll walk you through a project we tackled for a client to set up an automated data ingestion process.
The Challenge: Automating Data Collection for Actionable Insights
A financial services client needed dashboards to understand how marketing investments were driving website conversions. They needed answers to key questions:
What sort of content drives the most engaged website visitors?
What social audiences are most likely to subscribe?
Are there certain mediums that drive uninterested visitors?
The problem? They didn’t have access to the marketing data required to answer these questions. Reports from their agency were too basic, and while they could export CSV reports, no one wants to spend hours every day manually pulling data. What they needed was an automated way to collect and deliver actionable insights.
Designing the Data Ingestion Process
Our goal was to build a system that could automate data collection to enable the insights the client needed. We started by working closely with the client to identify their data sources and specific fields of interest. This allowed us to design a data model that aligned with their goals.
Curious what this sort of "data model" actually looks like? Check out this simple example!
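A minimal sketch of what such a data model might look like for marketing data. The field names here are hypothetical, chosen to match the kinds of questions above, not the client's actual schema:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical data model for marketing-driven website sessions.
# Each record ties a visit back to the source, medium, and campaign
# that drove it, so every reporting question maps to a field.
@dataclass
class WebSession:
    session_id: str
    visit_date: date
    source: str        # e.g. "google", "linkedin"
    medium: str        # e.g. "cpc", "social", "email"
    campaign_id: str   # joins to campaign metadata (budget, audience)
    pages_viewed: int  # a simple engagement signal
    converted: bool    # did the visitor subscribe?

session = WebSession("s-001", date(2024, 3, 1), "linkedin", "social", "c-42", 5, True)
```

Even a sketch this small forces the useful conversations: which fields exist in the source systems, which need to be derived, and which questions can't be answered yet.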
This data model became our project roadmap, helping ensure that what we built stayed aligned with the client's business needs.
Next, we faced the challenge of determining how to ingest the data. This leads to two of the worst acronyms in the world: ETL and ELT.
ETL vs. ELT: Choosing the Right Approach
While the acronyms are confusingly similar, understanding the difference between ETL and ELT is key to choosing the right data ingestion strategy:
ETL (Extract > Transform > Load): Data is extracted, cleaned, and transformed before being loaded into the storage system.
ELT (Extract > Load > Transform): Data is extracted and loaded into the storage system as-is, and transformations occur afterward.
The key distinction lies in when the transformation happens. With ETL, only processed, structured data ever lands in storage. With ELT, raw data lands first and is transformed in place, leveraging the compute power of modern, scalable storage solutions.
While there are many considerations to make when determining the approach that works best for you, at a high level, the choice often comes down to your data’s complexity, volume, and how quickly you need results. ETL might be the better option if you need structured, ready-to-use data immediately. On the other hand, ELT offers more flexibility and is ideal for handling massive amounts of raw data, especially when using modern cloud-based tools.
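The difference is easiest to see side by side. Here's a minimal sketch contrasting the two orderings with an in-memory "warehouse"; all of the function and field names are illustrative, not a real pipeline framework:

```python
# ETL vs. ELT in miniature: the same steps, in a different order.

def extract():
    # Raw rows as they arrive from a source system
    return [{"date": "03/01/2024", "medium": "SOCIAL"}]

def transform(rows):
    # Standardize formats: ISO dates, lowercase mediums
    out = []
    for r in rows:
        m, d, y = r["date"].split("/")
        out.append({"date": f"{y}-{m}-{d}", "medium": r["medium"].lower()})
    return out

def etl(warehouse):
    # ETL: transform *before* loading -- only clean data is stored
    warehouse["sessions"] = transform(extract())

def elt(warehouse):
    # ELT: load the raw data first, then transform inside the warehouse
    warehouse["raw_sessions"] = extract()
    warehouse["sessions"] = transform(warehouse["raw_sessions"])

wh_etl, wh_elt = {}, {}
etl(wh_etl)
elt(wh_elt)
# Both end up with the same clean "sessions" table;
# ELT also keeps the raw copy around for future re-processing.
```

That last comment is the practical payoff of ELT: if reporting needs change, you can re-run new transformations against the raw data you already loaded, instead of re-extracting from every source.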
For our client, we chose ELT as the best approach. They were working with a large amount of data and needed extensive transformations to extract the insights their team required. ELT was a better fit because it allowed us to efficiently extract and store raw data in Snowflake. From there, we could take advantage of Snowflake’s powerful features and other tools to handle the heavy-duty transformations.
Building the Data Pipeline
With ELT selected as our approach, the next step was deciding how to implement it: should we choose an off-the-shelf tool or build a custom solution from scratch?
This decision hinges on a few key factors. Using a pre-built tool can save significant time and effort, especially when you need to integrate with multiple data sources or scale quickly. Many platforms come with built-in connectors, automation, and ongoing support, making them ideal for teams that want to focus more on insights and less on infrastructure.
On the flip side, building a custom solution offers greater control and flexibility. It’s often the preferred route when dealing with highly unique requirements or when the organization has the engineering resources to support and maintain it over time.
In our case, the client's requirements and timeline made a pre-built tool the clear choice. We selected Fivetran, a leading ELT platform, because of its extensive library of pre-built connectors for all the client's data sources. With Fivetran, you simply select the data sources you need to pull from and the fields you want, and Fivetran auto-magically handles extracting all that data and loading it into your storage system. No SQL code needed.
Data Storage and Transformation
For this project, the client was already leveraging Snowflake as their data warehouse, simplifying the decision on where to load the data. With Fivetran’s seamless integration capabilities, ingesting data into Snowflake became straightforward.
The remaining task was to handle the T(ransformations) phase.
In Snowflake, transformations involved writing SQL queries to create new tables, aligning the data structure with reporting needs. Here are a few key examples of these transformations:
Data Cleaning: Standardizing formats (e.g., consistent dates).
Data Enrichment: Adding context by mapping campaign IDs to budgets.
Data Aggregation: Grouping the data into the relevant buckets the team wanted to focus on.
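To make these three transformations concrete, here's the same pattern in miniature, using Python's sqlite3 to stand in for Snowflake. The table and column names are illustrative, not the client's actual schema:

```python
import sqlite3

# In-memory database standing in for the warehouse
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw data as loaded by the ELT tool
    CREATE TABLE raw_sessions (visit_date TEXT, campaign_id TEXT, conversions INTEGER);
    INSERT INTO raw_sessions VALUES
        ('03/01/2024', 'c-42', 1),
        ('03/02/2024', 'c-42', 0),
        ('03/01/2024', 'c-77', 2);
    CREATE TABLE campaigns (campaign_id TEXT, budget REAL);
    INSERT INTO campaigns VALUES ('c-42', 500.0), ('c-77', 1200.0);

    -- Cleaning: standardize MM/DD/YYYY dates to ISO format
    CREATE TABLE clean_sessions AS
    SELECT substr(visit_date, 7, 4) || '-' || substr(visit_date, 1, 2)
               || '-' || substr(visit_date, 4, 2) AS visit_date,
           campaign_id,
           conversions
    FROM raw_sessions;

    -- Enrichment (join in budgets) and aggregation (conversions per campaign)
    CREATE TABLE campaign_report AS
    SELECT s.campaign_id,
           c.budget,
           SUM(s.conversions) AS total_conversions
    FROM clean_sessions s
    JOIN campaigns c ON c.campaign_id = s.campaign_id
    GROUP BY s.campaign_id, c.budget;
""")

rows = conn.execute(
    "SELECT campaign_id, budget, total_conversions "
    "FROM campaign_report ORDER BY campaign_id"
).fetchall()
```

Each `CREATE TABLE ... AS SELECT` here mirrors what we did in Snowflake: raw tables stay untouched, and each transformation writes a new table shaped around the reporting questions.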
With transformations complete, the data was ready to be brought into Mode, our client's visualization platform. Because the data was structured around the questions the team needed to answer, building impactful dashboards was a breeze.
Results and Key Lessons Learned
By automating the data pipeline, the client eliminated time-consuming manual data pulls and gained access to real-time insights, empowering the team to make confident, data-driven decisions.
Key Takeaways:
Understand Your Needs: Why do you need an ingestion process? What questions need to be answered?
Understand Your Data Sources: Clearly define what data is required and its origins.
Choose the Right Approach: Tailor your ETL or ELT strategy to your transformation and volume needs.
Automate Wherever Possible: Allow your team to focus on insights rather than manual data handling.
Conclusion: Unlock the Power of Your Data
By building a streamlined data ingestion pipeline, our client achieved a competitive edge, driving smarter marketing decisions. If you’re encountering similar challenges, a robust data ingestion process could be your solution.
Ready to get started? With the right tools, approach, and strategy, turning your data into actionable insights is within reach.