left join an effective satellite without duplicating rows (use a cte)

data-vault-2.0 - This article is part of a series.

Part : This Article

the dreaded left join to an effective satellite…..

i wanted to write this note, if for no other reason, so the pattern is easier to find when someone searches the same problem. the situation is uncommon enough that public answers are rare, and the fix is small once you see it.

quick answer
#

if you need attributes from an effective satellite for an optional relationship, do not left join the link, hub, and effective satellite as separate tables in one flat from clause next to your driving link. that shape often reintroduces ambiguity where you can match more than one effective row or change the result grain in ways that are hard to reason about. instead, build a cte (or subquery) that already resolves one current effective row per link (for example load_end_ts = '9999-12-31' or your project’s equivalent), then left join that cte to your main link on the hub or link key you already carry.

who this is for
#

anyone modeling optional links in data vault 2.0 who still needs a stable one row per fact output
readers who inherited sql where optional dimensions quietly duplicate keys

why this matters
#

grain is the contract downstream queries trust. when an optional path multiplies rows, dashboards and application reads disagree on what “one entity” means, and the bug shows up late because the sql “looks” like a normal join tree.

step-by-step
#

1) define the starting point
#

you have a driving link (for example link between a case and a cycle) with inner joins to its own hub, effective satellite, and attribute satellite so the core row is current and unique. one foreign key on that link points at an optional hub (for example an optional grouping). not every driving row has that key set.

2) apply the change
#

put the optional side in a cte: start from the optional link, join the effective satellite with an inner predicate that keeps only the open interval row, then attach the hub if you need the business key. in the outer query, left join the cte on the optional hub hash (or whatever stable key you store on the driving link).

3) validate the result
#

count distinct keys on the driving natural or surrogate key before and after the change. the counts should match when the optional side is empty, and optional matches should add columns without adding rows.

a minimal example
#

this is anonymized and trimmed; names stand in for a real link, hub, and satellites. the following sql shows the shape i use.

WITH cte_optional_scope AS (
SELECT
    link_scope.scope_h,
    hub_scope.scope_bk
FROM link_scope link_scope
LEFT JOIN hub_scope hub_scope ON link_scope.scope_h = hub_scope.scope_h
INNER JOIN eff_sat_scope eff_sat_scope ON link_scope.l_scope_h = eff_sat_scope.l_scope_h
                                       AND eff_sat_scope.load_end_ts = '9999-12-31'
)
SELECT
    hub_case.case_bk AS case_id,
    sat_case.status_code,
    cte_optional_scope.scope_bk AS optional_scope_id
FROM link_case link_case
INNER JOIN hub_case hub_case ON link_case.case_h = hub_case.case_h
INNER JOIN eff_sat_case eff_sat_case ON link_case.l_case_h = eff_sat_case.l_case_h
                                     AND eff_sat_case.load_end_ts = '9999-12-31'
INNER JOIN sat_case sat_case ON eff_sat_case.case_h = sat_case.case_h
                             AND sat_case.load_end_ts = '9999-12-31'
LEFT JOIN cte_optional_scope cte_optional_scope ON link_case.scope_h = cte_optional_scope.scope_h

the important part is not the column list, it is the shape: the effective satellite predicate lives inside the cte, and the outer query only left joins the already-resolved optional row set.

faq
#

what is the most important caveat?
#

your project may use a different literal or rule for “current” effective rows. keep that predicate consistent with every other query that reads the same satellite family.

what should i do first?
#

reproduce the duplicate key count on a small filter (one day, one tenant, one test id) before you rewrite the full model so you can prove the cte removed multiplication.

references
#

data vault 2.0 series

left join an effective satellite without duplicating rows (use a cte)

quick answer
#

who this is for
#

why this matters
#

step-by-step
#

1) define the starting point
#

2) apply the change
#

3) validate the result
#

a minimal example
#

faq
#

what is the most important caveat?
#

what should i do first?
#

references
#

related reading
#

Related

quick answer#

who this is for#

why this matters#

step-by-step#

1) define the starting point#

2) apply the change#

3) validate the result#

a minimal example#

faq#

what is the most important caveat?#

what should i do first?#

references#

related reading#

Related

quick answer
#

who this is for
#

why this matters
#

step-by-step
#

1) define the starting point
#

2) apply the change
#

3) validate the result
#

a minimal example
#

faq
#

what is the most important caveat?
#

what should i do first?
#

references
#

related reading
#