
the difference between snowflake and the "other" databases

Author
philip mathew hern
philliant
snowflake - This article is part of a series.
when you first step into data engineering, the sheer number of database options can be overwhelming. i spend a lot of my time working in snowflake, but it is definitely not the only tool in the shed.

to build a solid data platform, you have to understand where your data comes from and how different systems handle it. i want to break down how snowflake compares to two other popular systems you will encounter often: relational engines such as amazon rds and nosql patterns such as amazon dynamodb.

this is a basic guide written from a data engineer’s perspective. i will point out the similarities, the differences, and when you should use each one.

the contenders

before we compare them, let us define what we are looking at:

  • snowflake: a cloud-native data warehouse built for analytics (olap - online analytical processing). storage and compute are separate, so you pay for each part independently
  • amazon rds: a managed relational database service that runs engines like postgresql, mysql, sql server, and oracle. it is built for transactional app workloads (oltp - online transactional processing)
  • amazon dynamodb: a fully managed nosql key-value and document store built for very low-latency lookups at high scale

similarities and differences

at a high level, snowflake and rds both use sql, which makes them feel familiar. dynamodb is api-first for key-value access. it also offers partiql support, but it does not behave like a relational sql engine with joins and broad ad hoc querying.
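to make that concrete, here is a minimal sketch using python's built-in sqlite3 as a stand-in for a relational engine, and a plain dict as a stand-in for a key-value store (all table, column, and key names here are made up for illustration):

```python
import sqlite3

# a stand-in relational engine; rds engines like postgresql work the same way
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.execute("INSERT INTO customers VALUES (1, 'ada'), (2, 'grace')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)")

# a relational engine lets you join tables ad hoc -- snowflake and rds both do this
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('ada', 65.0), ('grace', 15.0)]

# a key-value store exposes get/put by key instead; joins are your problem
kv = {"customer#1": {"name": "ada"}, "customer#2": {"name": "grace"}}
print(kv["customer#1"]["name"])  # ada
```

the join runs against whatever tables exist; the dict lookup only answers questions you keyed for in advance. that difference is the whole comparison in miniature.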

the biggest architectural difference is how they handle storage and compute.

| feature | snowflake | amazon RDS | amazon dynamodb |
| --- | --- | --- | --- |
| primary use case | analytics and reporting (olap) | application backends (oltp) | high-scale key-value applications |
| query language | sql (warehouse dialect) | sql (engine-specific) | api-first (partiql support) |
| architecture | storage and compute decoupled | storage and compute coupled | distributed key-value store |
| data structure | tables and semi-structured data | relational tables | key-value and document items |
| indexes | micro-partition pruning metadata | user-managed indexes | partition and sort keys |

snowflake does not require traditional user-managed indexes for most workloads. it uses micro-partitions and pruning metadata. in rds, you usually create and manage indexes manually to keep queries fast. dynamodb pushes you to define access patterns up front using partition and sort keys.
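a quick way to see the rds side of this is with python's sqlite3 standing in for a relational engine (the table and index names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

# without an index, filtering on customer_id has to scan the whole table
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()[0][-1]
print(plan_before)  # SCAN ...

# in rds you create and maintain this yourself; snowflake skips this step
# by pruning micro-partitions from metadata instead
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()[0][-1]
print(plan_after)  # SEARCH ... USING INDEX idx_orders_customer ...
```

the planner only switches from a scan to an index search because someone created the index. that "someone" is the part of the job snowflake mostly removes and dynamodb moves up front into key design.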

scalability: small, big, and the limits

scalability means something completely different depending on which database you are talking about.

snowflake

snowflake scales compute and storage independently. storage is backed by cloud object storage, so it is effectively unbounded for most teams. you can store petabytes of data without thinking about disks.

compute is handled by virtual warehouses, which come in t-shirt sizes from x-small to 6x-large. if a query is too slow, you can use a larger warehouse. if you have too many concurrent users, you can add clusters.

  • small: gigabytes of data running on an x-small warehouse
  • big: petabytes of data running on a 4x-large warehouse
  • limits: practically none for storage; compute is limited only by your budget
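the budget point is easy to reason about because standard warehouse sizes double their per-hour credit rate at each step. a rough cost sketch (the rates follow snowflake's published doubling pattern for standard warehouses; the $3 credit price is an assumed list price, so check your own contract):

```python
# per-hour credit rates for standard snowflake warehouses: each size up
# doubles the burn rate (verify against current snowflake pricing docs)
CREDITS_PER_HOUR = {
    "xsmall": 1, "small": 2, "medium": 4, "large": 8,
    "xlarge": 16, "2xlarge": 32, "3xlarge": 64, "4xlarge": 128,
    "5xlarge": 256, "6xlarge": 512,
}

def query_cost(size: str, runtime_minutes: float, credit_price_usd: float = 3.0) -> float:
    """estimate the cost of one run; credit_price_usd is an assumed list price."""
    credits = CREDITS_PER_HOUR[size] * (runtime_minutes / 60)
    return round(credits * credit_price_usd, 2)

# doubling the warehouse doubles the hourly rate -- it only breaks even
# if the query also finishes in roughly half the time
print(query_cost("medium", 30))  # 6.0
print(query_cost("large", 15))   # 6.0
```

this is why "just use a bigger warehouse" is a reasonable first move: for queries that parallelize well, the bigger size costs about the same and returns sooner.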

amazon RDS

rds usually scales write capacity vertically by moving to a larger instance class. depending on engine and change type, this can trigger a restart or failover window. you can also scale read traffic horizontally with read replicas.

  • small: a few gigabytes on a t3.micro instance
  • big: several terabytes on a massive m6g.16xlarge instance
  • limits: most engines cap storage at 64 tib per instance, and you still hit single-instance ceilings on cpu, memory, and concurrent connections

amazon dynamodb

dynamodb scales horizontally and automatically. it partitions your data across many servers behind the scenes.

  • small: a few megabytes with single-digit read/write capacity units
  • big: hundreds of terabytes handling millions of requests per second
  • limits: very high scale for key-value access, but full-table scans and broad analytical queries are usually expensive and inefficient
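that scan limitation is the key thing to internalize. a toy sketch with a plain python dict standing in for a partitioned key-value table:

```python
# a toy key-value table: items addressed by key, like dynamodb partitions
table = {f"user#{i}": {"user_id": i, "plan": "pro" if i % 10 == 0 else "free"}
         for i in range(100_000)}

# key-value access: one hash lookup, regardless of table size
item = table["user#4242"]
print(item["plan"])  # free

# analytical question: "how many pro users?" -- there is no key for that,
# so you must touch every item; dynamodb bills a scan like this by data read
pro_users = sum(1 for row in table.values() if row["plan"] == "pro")
print(pro_users)  # 10000
```

the single lookup stays cheap at any scale; the scan gets linearly more expensive as the table grows. that is exactly the trade the table above summarizes.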

learning curve and ease of use

as a data engineer, the learning curve dictates how fast you can deliver value.

snowflake (easiest for analysts)

if you know sql, you can use snowflake quickly. the learning curve is low because there is very little infrastructure to manage. you do not worry about vacuuming tables, managing indexes, or tuning memory pools. you mostly load data and query it.

amazon RDS (moderate)

rds is easy to start with because postgresql and mysql are industry standards. however, the learning curve gets steep when you hit scale. you have to read query plans, manage indexes, handle connection pooling, and tune database parameters.

amazon dynamodb (steepest for modeling)

dynamodb has a notoriously steep learning curve for data modeling. because it is not a relational join engine, you usually model around known access patterns (often with single-table design). if those patterns change later, you may need redesign work and backfills.
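a minimal sketch of the single-table idea, using a python dict keyed by (partition key, sort key) tuples; the key shapes and the query helper are illustrative, not the dynamodb api:

```python
# single-table design sketch: every entity lives in one table, keyed by a
# partition key (pk) and sort key (sk) chosen for known access patterns
table = {
    ("customer#1", "profile"): {"name": "ada"},
    ("customer#1", "order#2024-01-15"): {"total": 25.0},
    ("customer#1", "order#2024-02-03"): {"total": 40.0},
    ("customer#2", "profile"): {"name": "grace"},
}

def query(pk: str, sk_prefix: str = "") -> list[dict]:
    """fetch all items in one partition whose sort key starts with a prefix,
    mimicking a key-condition query with a begins_with clause."""
    return [item for (p, s), item in sorted(table.items())
            if p == pk and s.startswith(sk_prefix)]

# the access pattern is baked into the keys:
# "all orders for customer 1, in date order" is one partition read
orders = query("customer#1", "order#")
print([o["total"] for o in orders])  # [25.0, 40.0]
```

notice the pain point: if tomorrow you need "all orders over $30 across all customers", these keys do not serve it, and you are redesigning keys and backfilling data.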

when to use each

choosing the right database is about matching the tool to the workload.

use snowflake when:

  • you need to analyze massive amounts of data
  • you are building dashboards, reports, or a data warehouse
  • you need to join data from many different sources
  • your queries read millions of rows at a time

use amazon RDS when:

  • you are building a standard web application
  • you need strong transactional guarantees (ACID compliance)
  • your queries typically look up or update a single row or a small batch of rows
  • your core workload is transactional and fits on a single relational instance, even as it grows into the multi-terabyte range

use amazon dynamodb when:

  • you are building a shopping cart, session store, or gaming leaderboard
  • you need guaranteed single-digit millisecond response times
  • you have massive, unpredictable spikes in traffic
  • you do not need to run complex analytical queries

as a data engineer, your job is often to extract data from operational systems (like rds and dynamodb) and load it into an analytical system (like snowflake). that is what allows the business to understand what is happening.
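that extract-and-load flow can be sketched end to end with sqlite standing in for both sides (in practice the load step would be a snowflake COPY INTO or a connector, not another sqlite database, and the table names here are invented):

```python
import sqlite3

# operational source (rds stand-in): rows optimized for the application
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, "shipped", 25.0), (2, "shipped", 40.0), (3, "returned", 15.0)])

# analytical target (warehouse stand-in): a copy shaped for reporting
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (id INTEGER, status TEXT, total REAL)")

# extract from the operational system, load into the analytical one
rows = source.execute("SELECT id, status, total FROM orders").fetchall()
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)

# the business question now runs against the warehouse, not the app database
report = warehouse.execute(
    "SELECT status, SUM(total) FROM fact_orders GROUP BY status ORDER BY status"
).fetchall()
print(report)  # [('returned', 15.0), ('shipped', 65.0)]
```

the point of the copy is isolation: heavy aggregate queries hit the warehouse, and the operational database stays fast for the application.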
