0004. Multi-tenancy strategy
- Status: Accepted
- Date: 2026-04-25
- Deciders: @sassy-solutions/maintainers
Context
Compendium is SaaS-first: every domain operation must know which tenant it acts on, and a request from tenant A must never read or write data belonging to tenant B. The tenant identifier shows up in several places that a real request can carry simultaneously:
- An explicit header (
X-Tenant-ID) set by an API gateway or SDK. - A subdomain (
acme.example.com) used for branded login and marketing routes. - One or more JWT claims — Zitadel exposes
urn:zitadel:iam:org:id; generic OIDC providers useorg_idortenant_id.
A real request can carry several of these at once (e.g. JWT + subdomain on a tenant-branded portal). If they disagree, the request is either misconfigured or actively malicious — and we want a default that fails closed.
The decision lives at the framework layer because every consumer needs the same guarantees. Doing it per-app would mean each consumer re-invents tenant resolution (and at least one would get it subtly wrong).
Decision
The framework provides Compendium.Multitenancy with these moving parts:
TenantContext— a small immutable record carryingTenantIdplus its provenance (which source produced it). Registered as a scoped service; one per request.TenantResolver/JwtClaimTenantResolver— pluggable resolvers that read the tenant from a specific source (header, subdomain, JWT). Multiple resolvers run per request.TenantConsistencyValidator(ITenantConsistencyValidator) — runs after resolution and rejects the request if two resolvers produced different non-empty tenant IDs. Default policy: fail closed — any disagreement → reject.TenantContextAccessor— the only sanctioned read path for downstream code (handlers, repositories, adapters).TenantPropagatingDelegatingHandler— propagates the resolved tenant on outbound HTTP calls, so adapter calls (Stripe, Listmonk, …) keep tenant context.TenantIsolation— guard rails used by the PostgreSQL adapter to scope every query to the resolved tenant.
The framework does not prescribe a physical isolation model (shared DB vs. DB-per-tenant vs. RLS). It enforces logical tenant scoping at every layer it owns; consumers choose physical isolation when they wire the adapter.
Consequences
Positive
- Cross-tenant access is impossible by default: any code path that forgets to scope a query reads from
TenantContext, andTenantContextis request-scoped and validated. - Multi-source consistency catches misconfigured gateways (e.g. JWT for tenant A but
X-Tenant-IDfor tenant B) and trivial token-replay attempts. - Tenant context is propagated to outbound calls, so audit logs and third-party requests carry the correct identity.
- Consumers stay free to pick the physical isolation model that fits their compliance posture.
Negative / Trade-offs
- Detection-side attack surface. The validator itself is now a security-critical component: a bug that returns "consistent" when sources disagree silently breaks isolation. Mitigated with focused unit tests and contract tests in
tests/Architecture. - Failure mode is loud. A configuration drift between gateway and IdP rejects every request from that path. We consider this correct behaviour, but it means careful rollout for the multi-source policy.
- Resolver order matters subtly. When only one source is present, that source wins; this is by design but documented because it surprises newcomers.
- Header-based resolvers must only be trusted when set by an authenticated gateway — explicit warning in
CONTRIBUTING.md.
Alternatives considered
- Single source only (header). Rejected — fine for trusted-gateway deployments, but doesn't catch the "JWT and gateway disagree" class of bugs, and rules out branded-subdomain login flows.
- Database-per-tenant as the only model. Rejected — strong isolation but operationally heavy (migrations × tenants), and the framework should not force that cost on small consumers.
- Postgres Row-Level Security as the only mechanism. Rejected as the sole mechanism — RLS is an excellent backstop and consumers can use it, but framework-level scoping is needed for non-Postgres adapters (Redis, third-party APIs).
- No framework support; let consumers solve it. Rejected — guarantees inconsistency across consumers and recreates the very class of bugs the framework exists to prevent.