Duration: Days 12-14 | 6-8 hours total
Goal: Create business-ready aggregations, KPIs, and analytics models
In Phase 5, you will:
Gold Layer Philosophy: Business-ready, highly aggregated data optimized for reporting and dashboards. No joins required for end users.
Before starting Phase 5:
Silver Tables (dimensional model)
↓
dbt Gold Models
↓
Gold Tables (aggregated)
↓
[customer_metrics]
[product_performance]
[daily_revenue]
[rfm_segments]
[cohort_analysis]
cd dbt
mkdir -p models/gold/metrics
mkdir -p models/gold/aggregations
mkdir -p models/gold/cohorts
dbt_project.yml with Gold configuration:Add this to the models section:
models:
ecommerce_analytics:
# ... existing staging and silver config ...
# Gold layer - business aggregations
gold:
+materialized: table
+schema: gold
metrics:
+tags: ['gold', 'metrics']
aggregations:
+tags: ['gold', 'aggregations']
cohorts:
+tags: ['gold', 'cohorts']
In Databricks SQL Editor or notebook:
CREATE DATABASE IF NOT EXISTS gold;
USE gold;
✅ CHECKPOINT
Build comprehensive customer analytics.
models/gold/metrics/customer_metrics.sql:{{
config(
materialized='table',
schema='gold'
)
}}
with customers as (
select * from {{ ref('dim_customers') }}
),
orders as (
select * from {{ ref('fact_orders') }}
where status = 'completed'
),
-- Customer order metrics
customer_orders as (
select
customer_id,
count(*) as order_count,
sum(total) as total_revenue,
avg(total) as avg_order_value,
min(order_date) as first_order_date,
max(order_date) as last_order_date,
datediff(max(order_date), min(order_date)) as customer_lifespan_days,
sum(num_items) as total_items_purchased,
avg(num_items) as avg_items_per_order
from orders
group by customer_id
),
-- Recent activity
recent_orders as (
select
customer_id,
count(*) as orders_last_30_days
from orders
where order_date >= date_sub(current_date(), 30)
group by customer_id
),
final as (
select
-- Customer info
c.customer_id,
c.full_name,
c.email,
c.country,
c.state,
c.segment,
c.value_segment,
c.lifecycle_stage,
c.marketing_opt_in,
-- Registration
c.registration_date,
c.days_since_registration,
-- Order metrics
coalesce(co.order_count, 0) as lifetime_orders,
coalesce(co.total_revenue, 0) as lifetime_revenue,
coalesce(co.avg_order_value, 0) as avg_order_value,
coalesce(co.total_items_purchased, 0) as total_items_purchased,
coalesce(co.avg_items_per_order, 0) as avg_items_per_order,
-- Dates
co.first_order_date,
co.last_order_date,
datediff(current_date(), co.last_order_date) as days_since_last_order,
co.customer_lifespan_days,
-- Recent activity
coalesce(ro.orders_last_30_days, 0) as orders_last_30_days,
-- Engagement metrics
case
when co.customer_lifespan_days > 0
then round(co.order_count / (co.customer_lifespan_days / 30.0), 2)
else 0
end as avg_orders_per_month,
case
when co.order_count > 0
then round(co.total_revenue / co.order_count, 2)
else 0
end as revenue_per_order,
-- Customer quality score (0-100)
least(100,
(case when co.order_count >= 10 then 25 else co.order_count * 2.5 end) +
(case when co.total_revenue >= 1000 then 25 else co.total_revenue / 40 end) +
(case when datediff(current_date(), co.last_order_date) <= 30 then 25
when datediff(current_date(), co.last_order_date) <= 90 then 15
else 5 end) +
(case when c.marketing_opt_in then 15 else 0 end) +
(case when c.is_complete_profile then 10 else 0 end)
) as customer_quality_score,
-- Timestamps
current_timestamp() as metric_updated_at
from customers c
left join customer_orders co on c.customer_id = co.customer_id
left join recent_orders ro on c.customer_id = ro.customer_id
)
select * from final
models/gold/metrics/customer_metrics.yml:version: 2
models:
- name: customer_metrics
description: |
Comprehensive customer metrics for analysis and segmentation.
One row per customer with all key performance indicators.
columns:
- name: customer_id
description: "Primary key"
tests:
- unique
- not_null
- name: lifetime_revenue
description: "Total revenue from completed orders"
- name: customer_quality_score
description: |
Composite score (0-100) based on:
- Order frequency (25 points)
- Total spend (25 points)
- Recent activity (25 points)
- Marketing opt-in (15 points)
- Complete profile (10 points)
dbt run --models customer_metrics
✅ CHECKPOINT
Implement Recency, Frequency, Monetary segmentation.
models/gold/cohorts/rfm_segments.sql:{{
config(
materialized='table',
schema='gold'
)
}}
with customer_rfm_values as (
select
customer_id,
datediff(current_date(), last_order_date) as recency_days,
lifetime_orders as frequency,
lifetime_revenue as monetary
from {{ ref('customer_metrics') }}
where lifetime_orders > 0
),
-- Calculate quartiles for each RFM component
rfm_quartiles as (
select
percentile_approx(recency_days, 0.25) as recency_q1,
percentile_approx(recency_days, 0.50) as recency_q2,
percentile_approx(recency_days, 0.75) as recency_q3,
percentile_approx(frequency, 0.25) as frequency_q1,
percentile_approx(frequency, 0.50) as frequency_q2,
percentile_approx(frequency, 0.75) as frequency_q3,
percentile_approx(monetary, 0.25) as monetary_q1,
percentile_approx(monetary, 0.50) as monetary_q2,
percentile_approx(monetary, 0.75) as monetary_q3
from customer_rfm_values
),
-- Assign RFM scores (1-4, where 4 is best)
rfm_scores as (
select
cv.customer_id,
cv.recency_days,
cv.frequency,
cv.monetary,
-- Recency score (lower days = better = higher score)
case
when cv.recency_days <= q.recency_q1 then 4
when cv.recency_days <= q.recency_q2 then 3
when cv.recency_days <= q.recency_q3 then 2
else 1
end as recency_score,
-- Frequency score (more orders = better = higher score)
case
when cv.frequency >= q.frequency_q3 then 4
when cv.frequency >= q.frequency_q2 then 3
when cv.frequency >= q.frequency_q1 then 2
else 1
end as frequency_score,
-- Monetary score (more revenue = better = higher score)
case
when cv.monetary >= q.monetary_q3 then 4
when cv.monetary >= q.monetary_q2 then 3
when cv.monetary >= q.monetary_q1 then 2
else 1
end as monetary_score
from customer_rfm_values cv
cross join rfm_quartiles q
),
final as (
select
customer_id,
recency_days,
frequency,
monetary,
recency_score,
frequency_score,
monetary_score,
-- Combined RFM score
concat(
cast(recency_score as string),
cast(frequency_score as string),
cast(monetary_score as string)
) as rfm_score,
-- Overall RFM value (1-12 scale)
recency_score + frequency_score + monetary_score as rfm_total,
-- Segment classification
case
when recency_score >= 3 and frequency_score >= 3 and monetary_score >= 3
then 'Champions'
when recency_score >= 3 and frequency_score <= 2 and monetary_score >= 3
then 'Loyal Customers'
when recency_score >= 3 and frequency_score <= 2 and monetary_score <= 2
then 'Potential Loyalists'
when recency_score >= 3 and frequency_score >= 3 and monetary_score <= 2
then 'Recent Customers'
when recency_score <= 2 and frequency_score >= 3 and monetary_score >= 3
then 'At Risk'
when recency_score <= 2 and frequency_score >= 2 and monetary_score >= 2
then 'Cant Lose Them'
when recency_score <= 2 and frequency_score <= 2 and monetary_score >= 3
then 'Hibernating High Value'
when recency_score <= 1 and frequency_score >= 2 and monetary_score <= 2
then 'About to Sleep'
when recency_score <= 1 and frequency_score <= 1 and monetary_score <= 2
then 'Lost'
else 'Other'
end as rfm_segment,
current_timestamp() as segment_updated_at
from rfm_scores
)
select * from final
models/gold/cohorts/rfm_segment_summary.sql:{{
config(
materialized='table',
schema='gold'
)
}}
with rfm as (
select * from {{ ref('rfm_segments') }}
),
segment_stats as (
select
rfm_segment,
count(*) as customer_count,
round(avg(recency_days), 1) as avg_recency_days,
round(avg(frequency), 1) as avg_frequency,
round(avg(monetary), 2) as avg_monetary,
round(sum(monetary), 2) as total_revenue,
round(avg(rfm_total), 1) as avg_rfm_score
from rfm
group by rfm_segment
)
select
rfm_segment,
customer_count,
round(customer_count * 100.0 / sum(customer_count) over(), 2) as pct_of_customers,
avg_recency_days,
avg_frequency,
avg_monetary,
total_revenue,
round(total_revenue * 100.0 / sum(total_revenue) over(), 2) as pct_of_revenue,
avg_rfm_score
from segment_stats
order by total_revenue desc
dbt run --models rfm_segments rfm_segment_summary
SELECT * FROM gold.rfm_segment_summary ORDER BY total_revenue DESC;
✅ CHECKPOINT
models/gold/metrics/product_performance.sql:{{
config(
materialized='table',
schema='gold'
)
}}
with products as (
select * from {{ ref('dim_products') }}
),
-- Sales metrics by time period
sales_30d as (
select
oi.product_id,
sum(oi.quantity) as qty_sold_30d,
sum(oi.total_price) as revenue_30d,
count(distinct oi.order_id) as orders_30d
from {{ ref('fact_order_items') }} oi
join {{ ref('fact_orders') }} o on oi.order_id = o.order_id
where o.order_date >= date_sub(current_date(), 30)
and o.status = 'completed'
group by oi.product_id
),
sales_90d as (
select
oi.product_id,
sum(oi.quantity) as qty_sold_90d,
sum(oi.total_price) as revenue_90d
from {{ ref('fact_order_items') }} oi
join {{ ref('fact_orders') }} o on oi.order_id = o.order_id
where o.order_date >= date_sub(current_date(), 90)
and o.status = 'completed'
group by oi.product_id
),
-- Web engagement metrics
product_engagement as (
select
product_id,
count(*) as total_views,
count(distinct session_id) as unique_sessions,
count(distinct customer_id) as unique_viewers
from {{ ref('fact_web_events') }}
where event_type = 'product_view'
and product_id is not null
and event_date >= date_sub(current_date(), 30)
group by product_id
),
final as (
select
-- Product info
p.product_id,
p.product_name,
p.category,
p.subcategory,
p.price,
p.cost,
p.profit_margin_pct,
p.stock_quantity,
p.stock_status,
p.is_active,
-- Lifetime metrics
p.total_quantity_sold as lifetime_quantity_sold,
p.total_revenue as lifetime_revenue,
p.times_ordered as lifetime_orders,
-- 30-day metrics
coalesce(s30.qty_sold_30d, 0) as qty_sold_30d,
coalesce(s30.revenue_30d, 0) as revenue_30d,
coalesce(s30.orders_30d, 0) as orders_30d,
-- 90-day metrics
coalesce(s90.qty_sold_90d, 0) as qty_sold_90d,
coalesce(s90.revenue_90d, 0) as revenue_90d,
-- Web metrics (30 days)
coalesce(pe.total_views, 0) as views_30d,
coalesce(pe.unique_sessions, 0) as sessions_30d,
coalesce(pe.unique_viewers, 0) as viewers_30d,
-- Conversion metrics
case
when coalesce(pe.total_views, 0) > 0
then round(coalesce(s30.orders_30d, 0) * 100.0 / pe.total_views, 2)
else 0
end as view_to_purchase_rate_30d,
-- Velocity metrics (units per day)
round(coalesce(s30.qty_sold_30d, 0) / 30.0, 2) as daily_velocity_30d,
round(coalesce(s90.qty_sold_90d, 0) / 90.0, 2) as daily_velocity_90d,
-- Profit metrics
round(coalesce(s30.revenue_30d, 0) * (p.profit_margin_pct / 100), 2) as profit_30d,
round(p.total_revenue * (p.profit_margin_pct / 100), 2) as lifetime_profit,
-- Days of inventory remaining
case
when coalesce(s30.qty_sold_30d, 0) > 0
then round(p.stock_quantity / (s30.qty_sold_30d / 30.0), 0)
else 999
end as days_of_inventory,
-- Product health score (0-100)
least(100,
(case when coalesce(s30.orders_30d, 0) >= 10 then 30 else coalesce(s30.orders_30d, 0) * 3 end) +
(case when p.stock_quantity >= 100 then 20
when p.stock_quantity >= 50 then 15
when p.stock_quantity > 0 then 10
else 0 end) +
(case when p.profit_margin_pct >= 50 then 25
when p.profit_margin_pct >= 30 then 20
when p.profit_margin_pct >= 20 then 15
else 10 end) +
(case when coalesce(pe.total_views, 0) >= 100 then 15 else coalesce(pe.total_views, 0) * 0.15 end) +
(case when p.is_active then 10 else 0 end)
) as product_health_score,
-- Performance tier
case
when coalesce(s30.revenue_30d, 0) >= 1000 then 'Platinum'
when coalesce(s30.revenue_30d, 0) >= 500 then 'Gold'
when coalesce(s30.revenue_30d, 0) >= 100 then 'Silver'
when coalesce(s30.revenue_30d, 0) > 0 then 'Bronze'
else 'Inactive'
end as performance_tier_30d,
current_timestamp() as metric_updated_at
from products p
left join sales_30d s30 on p.product_id = s30.product_id
left join sales_90d s90 on p.product_id = s90.product_id
left join product_engagement pe on p.product_id = pe.product_id
)
select * from final
models/gold/metrics/category_performance.sql:{{
config(
materialized='table',
schema='gold'
)
}}
select
category,
count(*) as product_count,
sum(case when is_active then 1 else 0 end) as active_products,
-- Revenue metrics
round(sum(revenue_30d), 2) as revenue_30d,
round(sum(revenue_90d), 2) as revenue_90d,
round(sum(lifetime_revenue), 2) as lifetime_revenue,
-- Profit metrics
round(sum(profit_30d), 2) as profit_30d,
round(sum(lifetime_profit), 2) as lifetime_profit,
round(avg(profit_margin_pct), 2) as avg_profit_margin,
-- Sales metrics
sum(qty_sold_30d) as units_sold_30d,
sum(qty_sold_90d) as units_sold_90d,
sum(lifetime_quantity_sold) as lifetime_units_sold,
-- Engagement metrics
sum(views_30d) as total_views_30d,
round(avg(view_to_purchase_rate_30d), 2) as avg_conversion_rate,
-- Inventory
sum(stock_quantity) as total_inventory,
round(avg(days_of_inventory), 0) as avg_days_of_inventory,
-- Performance
round(avg(product_health_score), 1) as avg_health_score,
current_timestamp() as metric_updated_at
from {{ ref('product_performance') }}
group by category
order by revenue_30d desc
dbt run --models product_performance category_performance
✅ CHECKPOINT
models/gold/aggregations/daily_revenue.sql:{{
config(
materialized='table',
schema='gold'
)
}}
with orders as (
select * from {{ ref('fact_orders') }}
where status = 'completed'
),
daily_metrics as (
select
order_date,
-- Order metrics
count(distinct order_id) as total_orders,
count(distinct customer_id) as unique_customers,
-- Revenue metrics
round(sum(total), 2) as gross_revenue,
round(sum(subtotal), 2) as subtotal,
round(sum(discount), 2) as total_discounts,
round(sum(shipping_cost), 2) as shipping_revenue,
round(sum(tax), 2) as tax_collected,
-- Average metrics
round(avg(total), 2) as avg_order_value,
round(avg(num_items), 2) as avg_items_per_order,
round(avg(discount_pct), 2) as avg_discount_pct,
-- Order composition
sum(num_items) as total_items_sold,
sum(total_quantity) as total_units_sold,
-- Payment methods
sum(case when payment_method = 'credit_card' then 1 else 0 end) as orders_credit_card,
sum(case when payment_method = 'debit_card' then 1 else 0 end) as orders_debit_card,
sum(case when payment_method = 'paypal' then 1 else 0 end) as orders_paypal,
sum(case when payment_method = 'apple_pay' then 1 else 0 end) as orders_apple_pay,
-- Order sizes
sum(case when order_size = 'Small' then 1 else 0 end) as orders_small,
sum(case when order_size = 'Medium' then 1 else 0 end) as orders_medium,
sum(case when order_size = 'Large' then 1 else 0 end) as orders_large,
sum(case when order_size = 'Extra Large' then 1 else 0 end) as orders_xl,
-- Flags
sum(case when has_discount then 1 else 0 end) as orders_with_discount,
sum(case when free_shipping then 1 else 0 end) as orders_free_shipping,
sum(case when is_weekend then 1 else 0 end) as orders_weekend
from orders
group by order_date
),
-- Add rolling averages
with_rolling as (
select
*,
-- 7-day rolling averages
round(avg(gross_revenue) over (
order by order_date rows between 6 preceding and current row
), 2) as revenue_7d_avg,
round(avg(total_orders) over (
order by order_date rows between 6 preceding and current row
), 1) as orders_7d_avg,
round(avg(avg_order_value) over (
order by order_date rows between 6 preceding and current row
), 2) as aov_7d_avg,
-- 30-day rolling averages
round(avg(gross_revenue) over (
order by order_date rows between 29 preceding and current row
), 2) as revenue_30d_avg,
round(avg(total_orders) over (
order by order_date rows between 29 preceding and current row
), 1) as orders_30d_avg
from daily_metrics
)
select
*,
current_timestamp() as metric_updated_at
from with_rolling
order by order_date desc
models/gold/aggregations/monthly_revenue.sql:{{
config(
materialized='table',
schema='gold'
)
}}
with orders as (
select * from {{ ref('fact_orders') }}
where status = 'completed'
),
monthly_metrics as (
select
date_format(order_date, 'yyyy-MM') as order_month,
year(order_date) as order_year,
month(order_date) as order_month_num,
-- Order metrics
count(distinct order_id) as total_orders,
count(distinct customer_id) as unique_customers,
round(count(distinct order_id) * 1.0 / count(distinct customer_id), 2) as orders_per_customer,
-- Revenue metrics
round(sum(total), 2) as gross_revenue,
round(sum(discount), 2) as total_discounts,
round(avg(total), 2) as avg_order_value,
-- Items
sum(num_items) as total_items,
round(avg(num_items), 2) as avg_items_per_order,
-- Customer segments
count(distinct case when shipping_country = 'USA' then customer_id end) as customers_usa,
count(distinct case when shipping_country != 'USA' then customer_id end) as customers_international
from orders
group by
date_format(order_date, 'yyyy-MM'),
year(order_date),
month(order_date)
),
with_growth as (
select
*,
-- Month-over-month growth
round(
(gross_revenue - lag(gross_revenue) over (order by order_month)) /
lag(gross_revenue) over (order by order_month) * 100,
2
) as revenue_mom_growth_pct,
round(
(total_orders - lag(total_orders) over (order by order_month)) * 1.0 /
lag(total_orders) over (order by order_month) * 100,
2
) as orders_mom_growth_pct,
-- Year-over-year growth
round(
(gross_revenue - lag(gross_revenue, 12) over (order by order_month)) /
lag(gross_revenue, 12) over (order by order_month) * 100,
2
) as revenue_yoy_growth_pct
from monthly_metrics
)
select
*,
current_timestamp() as metric_updated_at
from with_growth
order by order_month desc
models/gold/aggregations/weekly_web_activity.sql:{{
config(
materialized='table',
schema='gold'
)
}}
with events as (
select * from {{ ref('fact_web_events') }}
),
weekly_metrics as (
select
date_trunc('week', event_date) as week_start_date,
-- Event counts
count(*) as total_events,
count(distinct session_id) as unique_sessions,
count(distinct customer_id) as unique_authenticated_users,
-- Event types
sum(case when event_type = 'page_view' then 1 else 0 end) as page_views,
sum(case when event_type = 'product_view' then 1 else 0 end) as product_views,
sum(case when event_type = 'add_to_cart' then 1 else 0 end) as add_to_carts,
sum(case when event_type = 'search' then 1 else 0 end) as searches,
sum(case when event_type = 'wishlist_add' then 1 else 0 end) as wishlist_adds,
-- Devices
sum(case when device_type = 'desktop' then 1 else 0 end) as events_desktop,
sum(case when device_type = 'mobile' then 1 else 0 end) as events_mobile,
sum(case when device_type = 'tablet' then 1 else 0 end) as events_tablet,
-- User types
sum(case when user_type = 'Authenticated' then 1 else 0 end) as events_authenticated,
sum(case when user_type = 'Anonymous' then 1 else 0 end) as events_anonymous,
-- Engagement
round(count(*) * 1.0 / count(distinct session_id), 2) as avg_events_per_session
from events
group by date_trunc('week', event_date)
)
select
*,
-- Conversion funnel metrics
round(add_to_carts * 100.0 / nullif(product_views, 0), 2) as product_to_cart_rate,
round(searches * 100.0 / nullif(page_views, 0), 2) as search_rate,
-- Device mix
round(events_mobile * 100.0 / total_events, 2) as mobile_pct,
round(events_desktop * 100.0 / total_events, 2) as desktop_pct,
current_timestamp() as metric_updated_at
from weekly_metrics
order by week_start_date desc
dbt run --models daily_revenue monthly_revenue weekly_web_activity
✅ CHECKPOINT
Build customer cohort analysis by registration month.
models/gold/cohorts/customer_cohorts.sql:{{
config(
materialized='table',
schema='gold'
)
}}
with customers as (
select
customer_id,
date_format(registration_date, 'yyyy-MM') as cohort_month,
registration_date
from {{ ref('dim_customers') }}
),
orders as (
select
customer_id,
order_date,
total,
date_format(order_date, 'yyyy-MM') as order_month
from {{ ref('fact_orders') }}
where status = 'completed'
),
-- First order per customer
first_orders as (
select
customer_id,
min(order_date) as first_order_date,
date_format(min(order_date), 'yyyy-MM') as first_order_month
from orders
group by customer_id
),
-- Calculate months since cohort for each order
cohort_orders as (
select
c.cohort_month,
c.customer_id,
o.order_month,
o.total,
months_between(
to_date(o.order_month, 'yyyy-MM'),
to_date(c.cohort_month, 'yyyy-MM')
) as months_since_registration
from customers c
join orders o on c.customer_id = o.customer_id
),
-- Aggregate by cohort and month
cohort_metrics as (
select
cohort_month,
months_since_registration,
count(distinct customer_id) as active_customers,
count(*) as total_orders,
round(sum(total), 2) as total_revenue,
round(avg(total), 2) as avg_order_value
from cohort_orders
where months_since_registration >= 0
group by cohort_month, months_since_registration
),
-- Cohort sizes
cohort_sizes as (
select
cohort_month,
count(*) as cohort_size
from customers
group by cohort_month
),
final as (
select
cm.cohort_month,
cs.cohort_size,
cm.months_since_registration,
cm.active_customers,
round(cm.active_customers * 100.0 / cs.cohort_size, 2) as retention_rate,
cm.total_orders,
cm.total_revenue,
cm.avg_order_value,
round(cm.total_revenue / cs.cohort_size, 2) as revenue_per_cohort_customer,
current_timestamp() as metric_updated_at
from cohort_metrics cm
join cohort_sizes cs on cm.cohort_month = cs.cohort_month
)
select * from final
order by cohort_month, months_since_registration
models/gold/cohorts/cohort_retention_summary.sql:{{
config(
materialized='table',
schema='gold'
)
}}
with cohorts as (
select * from {{ ref('customer_cohorts') }}
),
-- Get month 0, 1, 3, 6, 12 retention rates for each cohort
retention_pivoted as (
select
cohort_month,
cohort_size,
max(case when months_since_registration = 0 then retention_rate end) as retention_month_0,
max(case when months_since_registration = 1 then retention_rate end) as retention_month_1,
max(case when months_since_registration = 3 then retention_rate end) as retention_month_3,
max(case when months_since_registration = 6 then retention_rate end) as retention_month_6,
max(case when months_since_registration = 12 then retention_rate end) as retention_month_12,
sum(total_revenue) as cohort_lifetime_revenue,
round(sum(total_revenue) / cohort_size, 2) as revenue_per_customer
from cohorts
group by cohort_month, cohort_size
)
select
cohort_month,
cohort_size,
retention_month_0,
retention_month_1,
retention_month_3,
retention_month_6,
retention_month_12,
cohort_lifetime_revenue,
revenue_per_customer,
current_timestamp() as metric_updated_at
from retention_pivoted
order by cohort_month desc
dbt run --models customer_cohorts cohort_retention_summary
✅ CHECKPOINT
Build a single executive summary table.
models/gold/metrics/executive_summary.sql:{{
config(
materialized='table',
schema='gold'
)
}}
with current_metrics as (
-- Revenue metrics (last 30 days)
select
'Revenue - Last 30 Days' as metric_name,
'Revenue' as metric_category,
round(sum(gross_revenue), 2) as metric_value,
'currency' as metric_type
from {{ ref('daily_revenue') }}
where order_date >= date_sub(current_date(), 30)
union all
-- Revenue growth
select
'Revenue Growth - MoM' as metric_name,
'Revenue' as metric_category,
revenue_mom_growth_pct as metric_value,
'percentage' as metric_type
from {{ ref('monthly_revenue') }}
order by order_month desc
limit 1
union all
-- Active customers
select
'Active Customers - Last 30 Days' as metric_name,
'Customers' as metric_category,
count(distinct customer_id) as metric_value,
'count' as metric_type
from {{ ref('fact_orders') }}
where order_date >= date_sub(current_date(), 30)
and status = 'completed'
union all
-- Total customers
select
'Total Customers' as metric_name,
'Customers' as metric_category,
count(*) as metric_value,
'count' as metric_type
from {{ ref('dim_customers') }}
union all
-- Average order value
select
'Average Order Value - Last 30 Days' as metric_name,
'Orders' as metric_category,
round(avg(total), 2) as metric_value,
'currency' as metric_type
from {{ ref('fact_orders') }}
where order_date >= date_sub(current_date(), 30)
and status = 'completed'
union all
-- Orders last 30 days
select
'Orders - Last 30 Days' as metric_name,
'Orders' as metric_category,
count(*) as metric_value,
'count' as metric_type
from {{ ref('fact_orders') }}
where order_date >= date_sub(current_date(), 30)
and status = 'completed'
union all
-- Conversion rate
select
'Conversion Rate - Last 30 Days' as metric_name,
'Performance' as metric_category,
round(
count(distinct case when status = 'completed' then customer_id end) * 100.0 /
count(distinct customer_id),
2
) as metric_value,
'percentage' as metric_type
from {{ ref('fact_orders') }}
where order_date >= date_sub(current_date(), 30)
union all
-- Average customer lifetime value
select
'Average Customer LTV' as metric_name,
'Customers' as metric_category,
round(avg(lifetime_revenue), 2) as metric_value,
'currency' as metric_type
from {{ ref('customer_metrics') }}
where lifetime_orders > 0
union all
-- Top performing products
select
'Active Products' as metric_name,
'Products' as metric_category,
count(*) as metric_value,
'count' as metric_type
from {{ ref('dim_products') }}
where is_active = true
union all
-- Web traffic
select
'Website Sessions - Last 30 Days' as metric_name,
'Web Traffic' as metric_category,
count(distinct session_id) as metric_value,
'count' as metric_type
from {{ ref('fact_web_events') }}
where event_date >= date_sub(current_date(), 30)
)
select
metric_category,
metric_name,
metric_value,
metric_type,
current_timestamp() as metric_updated_at
from current_metrics
order by metric_category, metric_name
models/gold/metrics/top_performers.sql:{{
config(
materialized='table',
schema='gold'
)
}}
-- Top customers
with top_customers as (
select
'Top Customer' as category,
full_name as name,
lifetime_revenue as value,
1 as rank_order
from {{ ref('customer_metrics') }}
order by lifetime_revenue desc
limit 10
),
-- Top products
top_products as (
select
'Top Product' as category,
product_name as name,
revenue_30d as value,
row_number() over (order by revenue_30d desc) as rank_order
from {{ ref('product_performance') }}
limit 10
),
-- Top categories
top_categories as (
select
'Top Category' as category,
category as name,
revenue_30d as value,
row_number() over (order by revenue_30d desc) as rank_order
from {{ ref('category_performance') }}
limit 5
),
combined as (
select * from top_customers
union all
select * from top_products
union all
select * from top_categories
)
select
category,
rank_order,
name,
round(value, 2) as value,
current_timestamp() as metric_updated_at
from combined
order by category, rank_order
dbt run --models executive_summary top_performers
✅ CHECKPOINT
models/gold/schema.yml:version: 2
models:
# Metrics
- name: customer_metrics
description: "Comprehensive customer KPIs and scores"
columns:
- name: customer_id
tests:
- unique
- not_null
- name: product_performance
description: "Product sales and engagement metrics"
columns:
- name: product_id
tests:
- unique
- not_null
- name: category_performance
description: "Category-level aggregated metrics"
- name: executive_summary
description: "Key business metrics for executive dashboard"
- name: top_performers
description: "Top customers, products, and categories"
# Aggregations
- name: daily_revenue
description: "Daily revenue and order metrics with rolling averages"
columns:
- name: order_date
tests:
- unique
- not_null
- name: monthly_revenue
description: "Monthly revenue with growth calculations"
- name: weekly_web_activity
description: "Weekly web traffic and engagement metrics"
# Cohorts
- name: rfm_segments
description: "Customer RFM segmentation analysis"
columns:
- name: customer_id
tests:
- unique
- not_null
- name: rfm_segment_summary
description: "Aggregated statistics by RFM segment"
- name: customer_cohorts
description: "Customer retention by registration cohort"
- name: cohort_retention_summary
description: "Cohort retention rates over time"
dbt docs generate
dbt docs serve
✅ CHECKPOINT
gold_validation:# Databricks notebook source
# MAGIC %md
# MAGIC # Gold Layer Validation
# COMMAND ----------
print("=" * 70)
print("GOLD LAYER TABLES")
print("=" * 70)
tables = spark.sql("SHOW TABLES IN gold").collect()
for table in tables:
count = spark.sql(f"SELECT COUNT(*) FROM gold.{table.tableName}").collect()[0][0]
print(f"gold.{table.tableName:<30} {count:>12,} records")
print("=" * 70)
# COMMAND ----------
# MAGIC %md
# MAGIC ## Executive Summary
# COMMAND ----------
print("\n📊 EXECUTIVE DASHBOARD METRICS\n")
spark.sql("""
SELECT
metric_category,
metric_name,
metric_value,
metric_type
FROM gold.executive_summary
ORDER BY metric_category, metric_name
""").show(100, truncate=False)
# COMMAND ----------
# MAGIC %md
# MAGIC ## Top Performers
# COMMAND ----------
print("\n🏆 TOP 10 CUSTOMERS BY REVENUE\n")
spark.sql("""
SELECT name, value as lifetime_revenue
FROM gold.top_performers
WHERE category = 'Top Customer'
ORDER BY rank_order
""").show(10, truncate=False)
print("\n🏆 TOP 10 PRODUCTS (30 DAYS)\n")
spark.sql("""
SELECT name, value as revenue_30d
FROM gold.top_performers
WHERE category = 'Top Product'
ORDER BY rank_order
""").show(10, truncate=False)
# COMMAND ----------
# MAGIC %md
# MAGIC ## RFM Segments
# COMMAND ----------
print("\n📈 RFM SEGMENT DISTRIBUTION\n")
spark.sql("""
SELECT
rfm_segment,
customer_count,
pct_of_customers,
avg_monetary,
total_revenue,
pct_of_revenue
FROM gold.rfm_segment_summary
ORDER BY total_revenue DESC
""").show(truncate=False)
# COMMAND ----------
# MAGIC %md
# MAGIC ## Revenue Trends
# COMMAND ----------
print("\n📊 MONTHLY REVENUE TRENDS (LAST 12 MONTHS)\n")
spark.sql("""
SELECT
order_month,
total_orders,
gross_revenue,
avg_order_value,
revenue_mom_growth_pct,
revenue_yoy_growth_pct
FROM gold.monthly_revenue
ORDER BY order_month DESC
LIMIT 12
""").show(truncate=False)
# COMMAND ----------
# MAGIC %md
# MAGIC ## Cohort Analysis
# COMMAND ----------
print("\n👥 COHORT RETENTION SUMMARY\n")
spark.sql("""
SELECT
cohort_month,
cohort_size,
retention_month_1,
retention_month_3,
retention_month_6,
revenue_per_customer
FROM gold.cohort_retention_summary
ORDER BY cohort_month DESC
LIMIT 12
""").show(truncate=False)
# COMMAND ----------
# MAGIC %md
# MAGIC ## Product Performance
# COMMAND ----------
print("\n📦 TOP CATEGORIES BY REVENUE (30 DAYS)\n")
spark.sql("""
SELECT
category,
product_count,
revenue_30d,
profit_30d,
avg_profit_margin,
units_sold_30d
FROM gold.category_performance
ORDER BY revenue_30d DESC
""").show(truncate=False)
# COMMAND ----------
print("\n" + "=" * 70)
print("✅ GOLD LAYER VALIDATION COMPLETE")
print("=" * 70)
print("\nAll aggregations validated and ready for reporting!")
✅ CHECKPOINT
# Check status
git status
# Add all new files
git add dbt/models/gold/
git add databricks/notebook_exports/ # if exported
# Commit
git commit -m "Phase 5 complete: Gold layer analytics
- Created 15+ Gold layer aggregation tables
- Built customer metrics with quality scores
- Implemented RFM segmentation analysis
- Created cohort retention analysis
- Built product performance metrics
- Generated time-series aggregations (daily, weekly, monthly)
- Created executive dashboard summaries
- All metrics validated and documented"
# Push to GitHub
git push origin main
✅ CHECKPOINT
✅ Customer Analytics (3 models)
customer_metrics - 10,000 customers with 25+ KPIsrfm_segments - Behavioral segmentationrfm_segment_summary - Segment performance stats✅ Product Analytics (2 models)
product_performance - 500 products with health scorescategory_performance - Category aggregations✅ Time-Series Aggregations (3 models)
daily_revenue - Daily metrics with rolling averagesmonthly_revenue - Monthly trends with MoM/YoY growthweekly_web_activity - Web traffic patterns✅ Cohort Analysis (2 models)
customer_cohorts - Retention by registration monthcohort_retention_summary - Cohort performance summary✅ Executive Dashboards (2 models)
executive_summary - 10 key business metricstop_performers - Top customers, products, categories✅ Total: 15 Gold layer tables with 360,500+ aggregated records
Customer Metrics:
Product Metrics:
Business KPIs:
In Phase 6, you will:
Estimated Time: 4-6 hours over Days 15-16
Issue: dbt run takes too long
Solution: Increase Databricks cluster size or use incremental models
Issue: Aggregations don't match source data
Solution: Check join conditions and date filters, ensure completed orders only
Issue: RFM scores seem incorrect
Solution: Verify quartile calculations, check for null values in recency/frequency/monetary
Issue: Cohort retention rates > 100%
Solution: Check cohort_size calculation, ensure unique customer counts
Issue: Executive summary shows nulls
Solution: Verify all source tables have data for date ranges used
Query executive summary:
SELECT * FROM gold.executive_summary
ORDER BY metric_category;
Get RFM Champions:
SELECT * FROM gold.rfm_segments
WHERE rfm_segment = 'Champions'
ORDER BY monetary DESC
LIMIT 100;
Monthly revenue trend:
SELECT
order_month,
gross_revenue,
revenue_mom_growth_pct
FROM gold.monthly_revenue
ORDER BY order_month DESC
LIMIT 12;
Top products this month:
SELECT
product_name,
category,
revenue_30d,
qty_sold_30d
FROM gold.product_performance
WHERE revenue_30d > 0
ORDER BY revenue_30d DESC
LIMIT 20;
Phase 5 Manual Version 1.0
Last Updated: 2025-01-01