most data engineers i know will spend hours getting a model right, then skip the one step that makes it discoverable to everyone else. dbt docs are that step, and they are worth the effort.
quick answer#
dbt docs is a built-in feature that generates a browsable website from your dbt project. it pulls descriptions from your yml files, renders a searchable model catalog, and draws a lineage graph showing how every model connects. running dbt docs generate followed by dbt docs serve gives you a local site instantly. the real payoff is that teammates who never open your sql files can still understand what each model does, what columns it exposes, and where the data comes from. this is especially useful for downstream consumers, who can see the exact shape of every object they query, including column names and data types.
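the whole loop is two commands from the project root:

```shell
dbt docs generate   # compiles the project and queries the warehouse for catalog metadata
dbt docs serve      # serves the target/ directory locally (port 8080 by default)
```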
who this is for#
- data engineers who use dbt but skip writing descriptions
- analysts, product managers, or business users who need to understand available data without reading sql
- team leads looking for a low-effort way to make data structures discoverable
why this matters#
when you build a dbt project, the models represent real business concepts. customers, products, sales, inventory, whatever the domain is. the people who consume those models in dashboards or reports often have no involvement in building them and no reason to read raw sql.
without documentation, those consumers rely on tribal knowledge, slack messages, and guesswork. that does not scale. dbt docs solve this by turning the metadata you already maintain (yml files, project config, source definitions) into a navigable reference that anyone on the team can use. the effort to write a good description is small, and the compound value to the rest of the organization grows with every model you add.
i think of it like this: if i write a model and do not document it, the only person who truly understands it is me, and even that fades after a few months. if i write a two-sentence description and add column-level context, that knowledge lives in the project permanently and serves everyone who touches the data.
what dbt docs generates#
when you run dbt docs generate, dbt produces two main artifacts in your target/ directory:
- manifest.json contains the full project graph, including every model, source, seed, snapshot, and macro, along with their descriptions, tags, and configuration
- catalog.json contains the schema-level metadata pulled from your warehouse, including column names, data types, and row counts
together with a bundled index.html, these files power a static site that you can open locally with dbt docs serve or host anywhere that serves static files.
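because manifest.json is plain json, you can script checks against it. here is a minimal sketch of one useful check — listing models that still have no description — using the standard manifest keys (nodes, resource_type, description); the inline manifest is a made-up stand-in for the real artifact:

```python
import json  # used when loading the real artifact, see the comment below


def undocumented_models(manifest: dict) -> list[str]:
    """return the names of models that have no description."""
    return [
        node["name"]
        for node in manifest["nodes"].values()
        if node["resource_type"] == "model" and not node.get("description")
    ]


# real usage: manifest = json.load(open("target/manifest.json"))
# a tiny inline manifest keeps this sketch self-contained:
manifest = {
    "nodes": {
        "model.proj.order_summary_v": {
            "name": "ORDER_SUMMARY_V",
            "resource_type": "model",
            "description": "aggregated view of customer orders",
        },
        "model.proj.mystery_model": {
            "name": "MYSTERY_MODEL",
            "resource_type": "model",
            "description": "",
        },
    }
}
print(undocumented_models(manifest))  # → ['MYSTERY_MODEL']
```

a check like this makes a nice ci gate: fail the build when someone ships an undocumented model.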
the site includes#
- a searchable list of every model and data source in your project
- model-level and column-level descriptions pulled from your yml files
- the compiled sql for each model, rendered for the target environment
- a lineage graph (a dag, or directed acyclic graph) showing upstream sources, intermediate models, and downstream consumers for any selected node
the lineage graph is especially useful when someone asks “where does this column come from” or “what breaks if i change this source table”. instead of tracing through sql files manually, the graph answers it visually.
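you can get the same answers from the command line with dbt's graph selectors — `+` before a name selects everything upstream, `+` after selects everything downstream (the node names here are illustrative):

```shell
# everything downstream of a raw source table, directly or indirectly
dbt ls --select source:RAW_ORDERS.ORDERS+

# everything a final view depends on
dbt ls --select +ORDER_SUMMARY_V
```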
how to document your models#
dbt reads documentation from yml files that live alongside your models. you are probably already using these for configuration and source definitions, so adding descriptions is a natural extension.
model and column descriptions in yml#
the most common approach is adding description fields directly in your yml files. here is what that looks like for a view in an information delivery layer:
```yaml
version: 2

models:
  - name: ORDER_SUMMARY_V
    description: >
      aggregated view of customer orders with totals and status
      breakdowns, consumed by the reporting dashboard and the
      customer details page in the application.
    columns:
      - name: CUSTOMER_ID
        description: "unique identifier for the customer"
      - name: TOTAL_ORDERS
        description: "count of all orders placed by this customer"
      - name: TOTAL_SPEND
        description: "sum of order amounts across all completed orders"
      - name: LAST_ORDER_DATE
        description: "most recent order date for this customer"
```

every model and every column can have a description. the more specific you are, the more useful the generated docs become. "id" as a column description does not help anyone. "unique identifier for the customer, sourced from the application database" does.
source descriptions#
sources benefit from the same treatment. when your project ingests raw data from an external system, describing those sources in your shared sources file makes the lineage graph meaningful from the very first node:
```yaml
version: 2

sources:
  - name: RAW_ORDERS
    description: "raw order data ingested from the transactional database via kafka"
    database: ANALYTICS_DEV_DB
    schema: RAW_DATA
    tables:
      - name: ORDERS
        description: "one row per order, includes order status and timestamps"
      - name: ORDER_ITEMS
        description: "line items for each order, one row per product per order"
```

doc blocks for longer descriptions#
when a model needs more than a sentence or two of context, dbt supports doc blocks. these are markdown files (.md) that live in your project and can be referenced from yml descriptions:
```jinja
{% docs order_summary_description %}
this view surfaces the aggregated order history for each customer.
it joins the hub, satellite, and link tables from the refined data
layer to produce a single wide row per customer.

**grain:** one row per customer.
**consumers:** reporting dashboard, customer details api endpoint.
{% enddocs %}
```

then in your yml:

```yaml
models:
  - name: ORDER_SUMMARY_V
    description: '{{ doc("order_summary_description") }}'
```

doc blocks are useful when the context is long enough that embedding it inline in yaml becomes awkward. they also let you reuse the same description across multiple references if needed.
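as a sketch of the reuse case, two models can point at the same doc block (the second model name here is hypothetical):

```yaml
models:
  - name: ORDER_SUMMARY_V
    description: '{{ doc("order_summary_description") }}'
  - name: ORDER_SUMMARY_DAILY_V  # hypothetical second model
    description: '{{ doc("order_summary_description") }}'
```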
a few useful options in dbt docs#
persist_docs#
by default, dbt docs only live in the generated static site. if you want the descriptions to also appear in your warehouse catalog (so someone querying snowflake information_schema can see them), you can enable persist_docs:
```yaml
models:
  my_data_product:
    +persist_docs:
      relation: true
      columns: true
```

with this enabled, dbt run pushes your yml descriptions into the COMMENT property on the table or view and on each column in the warehouse. this is valuable because it means the documentation is available even outside the dbt docs site, directly in the database catalog that tools like snowflake and bi platforms already read.
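once persisted, the comments are visible with ordinary warehouse queries. a snowflake sketch, using the view name from the earlier example:

```sql
select column_name, comment
from information_schema.columns
where table_name = 'ORDER_SUMMARY_V';
```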
the lineage graph#
the generated site includes an interactive dag that visualizes every model, source, and their connections. you can click on any node to see its upstream dependencies and downstream consumers. this is one of the most powerful features in dbt docs because it makes the data flow tangible for people who do not read sql, or who do not have access to the codebase but still need to understand it.
when you have a project with dozens or hundreds of models organized into layers (raw, refined, business, information delivery), the lineage graph shows how a raw source table flows through transformations into the final views that analysts query. it replaces the need for manually maintained architecture diagrams that go stale the moment someone adds a new model.
exposures#
exposures let you document where your dbt models are consumed outside of dbt. dashboards, applications, api endpoints, anything downstream. defining them makes the lineage graph extend beyond the dbt project boundary:
```yaml
exposures:
  - name: customer_dashboard
    type: dashboard
    description: "executive dashboard showing customer order trends and retention"
    depends_on:
      - ref('ORDER_SUMMARY_V')
      - ref('CUSTOMER_RETENTION_V')
    owner:
      name: analytics team
      email: analytics@example.com
```

exposures show up in the lineage graph as leaf nodes, making it clear which models are actively consumed and by what. this is helpful when you are deciding whether it is safe to refactor or deprecate a model.
hosting dbt docs on github pages#
running dbt docs serve is great for local browsing, but the real value comes from hosting the site where the whole team can access it without installing dbt or cloning the repo. github pages is a straightforward, free option for this.
how it works#
after dbt docs generate runs, the target/ directory contains everything needed to serve the site: index.html, manifest.json, and catalog.json. you copy those files to a branch or directory that github pages serves, and the docs are live.
a basic github actions workflow#
here is a minimal workflow that generates the docs on every push to main and deploys them to github pages:
```yaml
name: deploy dbt docs

on:
  push:
    branches: [main]

permissions:
  contents: read
  pages: write
  id-token: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - uses: actions/checkout@v4

      - name: set up python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: install dbt
        run: pip install dbt-snowflake

      - name: generate docs
        run: dbt docs generate --profiles-dir .
        working-directory: my_dbt_project
        env:
          DBT_SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          DBT_SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
          DBT_SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}

      - name: prepare pages artifact
        run: |
          mkdir -p pages
          cp my_dbt_project/target/index.html pages/
          cp my_dbt_project/target/manifest.json pages/
          cp my_dbt_project/target/catalog.json pages/

      - name: upload pages artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: pages

      - name: deploy to github pages
        id: deployment
        uses: actions/deploy-pages@v4
```

once this runs, your dbt docs are available at https://&lt;org&gt;.github.io/&lt;repo&gt;/ and automatically update every time someone merges to main. no one needs to install anything or run any commands to browse the documentation.
keep the credentials out of the repo#
the workflow above uses github secrets for warehouse credentials. never commit profiles with real credentials. use environment variables or a ci-specific profiles.yml that references secrets, and make sure your .gitignore excludes any local profiles that contain actual passwords or tokens.
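a ci-safe profiles.yml reads everything sensitive from the environment via dbt's env_var function. a sketch for snowflake — the profile name matches the project in the workflow above, and the warehouse and schema names are made up:

```yaml
my_dbt_project:
  target: ci
  outputs:
    ci:
      type: snowflake
      account: "{{ env_var('DBT_SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('DBT_SNOWFLAKE_USER') }}"
      password: "{{ env_var('DBT_SNOWFLAKE_PASSWORD') }}"
      database: ANALYTICS_DEV_DB
      warehouse: COMPUTE_WH  # hypothetical warehouse name
      schema: DOCS_CI        # hypothetical schema name
```

nothing secret lives in the file itself, so it is safe to commit.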
faq#
do i need to write descriptions for every column?#
you do not need to, but the columns that matter most to consumers deserve it. at minimum, describe the primary key, any business key, and any column whose meaning is not obvious from the name alone. over time, filling in the rest pays off as the team grows.
can i generate docs without connecting to the warehouse?#
dbt docs generate pulls catalog metadata from the warehouse, so it does need a connection. however, if you already have a catalog.json from a previous run, you can serve the site locally with just those files.
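any static file server can host those existing artifacts; python's built-in one is enough for a quick look:

```shell
cd target
python -m http.server 8080
# then browse to http://localhost:8080
```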
how is this different from a wiki or confluence page?#
dbt docs stay in sync with your code automatically. a wiki page about your data model goes stale the moment someone adds a column or renames a table. dbt docs regenerate from the source of truth (your yml files and your warehouse) every time you run the command, so the documentation and the code never drift apart.
references#
- dbt docs overview (dbt documentation)
- dbt docs generate command
- persist_docs config
- exposures (dbt documentation)
- github pages documentation