Data quality has long been the non-negotiable starting point for wealth management. It’s still true now, but the bar has moved. The question is no longer “can we produce a clean report?” It’s “can our data foundation stand up under an agent?” Can a Claude-powered copilot, an MCP-connected agent, or an LLM-driven advisor assistant pull the right facts, at the right granularity, with the right governance, in real time?
That question changes everything about how we store, manage, and serve data. In this article, I’ll walk through how I think about data warehouses, data lakes, and data lakehouses today, why the medallion architecture on a lakehouse has become our default recommendation, and how structured and unstructured data together form one of the four strategic pillars of any serious Data and AI program.
The Data Foundation is a Strategic Pillar, not a Project
We build data and AI programs on four pillars: Data Foundation, Governance, AI Architecture and Context Management, and Observability. All four carry equal load — pull any one of them out, and the program falls over. This article focuses on the data foundation because that foundation must handle two things simultaneously:
- Structured business-activity data — custodial positions, transactions, fee schedules, CRM interactions, financial plans, performance reporting.
- Unstructured data — meeting notes, call transcripts, email, IPS and planning PDFs, research, disclosures, and the long tail of documents advisors actually use every day.
Historically, firms stored these in different places, governed them differently, and hoped the insights would stitch themselves together. They never did. A modern lakehouse, combined with vector knowledge bases and a knowledge graph layer, lets us put both under one governed roof — and that is the unlock for every agentic use case downstream.
What is a Data Warehouse?
The data warehouse is still the tool I reach for when the problem is well-defined, structured, and reporting-driven. ETL pipelines move data out of custodial and CRM systems, conform it to a relational schema, and pre-aggregate it for BI, client reporting, and regulatory submissions. It is fast to query, easy to govern, and boards understand it.
Where the warehouse breaks down is everything that doesn't fit a table — meeting notes, PDFs, call recordings, the emails your advisors send every day. It also breaks down when the shape of your questions starts changing weekly, which is exactly what happens once you introduce AI.

What is a Data Lake?
A data lake solves the opposite problem: it stores everything, in whatever shape it arrives. Custodial files, portfolio snapshots, CRM extracts, call transcripts, PDFs, images, Parquet, JSON: all of it, cheap and scalable. That flexibility is valuable for data science exploration and for training or fine-tuning models.
But a raw lake without governance is a swamp. The last five years taught us that "schema-on-read" is not a free lunch — without cataloging, lineage, and data-quality contracts, nobody trusts what they pull out. For an RIA or wealth firm, that is a non-starter. We have fiduciary and supervisory obligations; we cannot run the business on ambiguous data.
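To make "data-quality contract" concrete, here is a minimal sketch in plain Python of a contract applied to a feed at ingest. The field names (`account_id`, `as_of_date`, `market_value`), the `Rule` type, and the `validate_positions` helper are all hypothetical, chosen for illustration rather than taken from any custodian's feed specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """One named check applied to a single record."""
    name: str
    check: Callable[[dict[str, Any]], bool]


# Hypothetical contract for a custodial positions feed.
POSITION_CONTRACT = [
    Rule("account_id present", lambda r: bool(r.get("account_id"))),
    Rule("as_of_date is a date", lambda r: isinstance(r.get("as_of_date"), date)),
    Rule("market_value is numeric",
         lambda r: isinstance(r.get("market_value"), (int, float))),
]


def validate_positions(records, contract=POSITION_CONTRACT):
    """Split records into (accepted, rejected).

    Each rejected entry pairs the record with the names of the rules it failed,
    so the failure is explainable rather than silent.
    """
    accepted, rejected = [], []
    for record in records:
        failures = [rule.name for rule in contract if not rule.check(record)]
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected
```

Records that fail are quarantined with the reason attached rather than dropped, which is what lets a downstream consumer trust what it pulls out.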

What is a Data Lakehouse?
The lakehouse is where my default recommendation sits in 2026. Open table formats like Delta Lake, Apache Iceberg, and Apache Hudi give us ACID transactions, time travel, schema evolution, and governed metadata directly on object storage. We get the economics and flexibility of a lake and the reliability and performance of a warehouse in a single substrate.
More importantly for wealth management, the lakehouse gives us a single, governed home for structured transactional data and unstructured content — which is exactly what an AI agent needs. A Claude-powered advisor copilot doesn't care whether the fact it needs lives in a Delta table or a PDF. It cares that the fact is trustworthy, retrievable, and permissioned correctly.
The Medallion Architecture: Bronze, Silver, Gold
On top of a lakehouse, I run a medallion architecture (Bronze, Silver, Gold) because it gives us a disciplined way to separate raw source-of-record data, curated data, and business-ready data products. Firms must answer, for any number on any dashboard, "which Gold product did this come from, which Silver tables fed it, and which Bronze ingest is its source of record?" Because each layer is built only from the layer below it, medallion makes that trace straightforward.
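The lineage question above can be sketched as a walk over a dependency catalog. This is an illustration only: the dataset names and the `LINEAGE` mapping are hypothetical, and a real lakehouse would read this from its catalog or lineage service rather than a hard-coded dictionary.

```python
# Hypothetical lineage catalog: each dataset maps to the datasets it was built from.
LINEAGE = {
    "gold.household_aum": ["silver.positions", "silver.households"],
    "silver.positions": ["bronze.custodian_positions_raw"],
    "silver.households": ["bronze.crm_extract_raw"],
    "bronze.custodian_positions_raw": [],
    "bronze.crm_extract_raw": [],
}


def trace_lineage(dataset, lineage=LINEAGE):
    """Return every upstream dataset of `dataset`, grouped by medallion layer."""
    layers = {"silver": set(), "bronze": set()}
    stack = list(lineage.get(dataset, []))
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # The layer is the prefix before the first dot, e.g. "silver".
        layer = current.split(".", 1)[0]
        layers.setdefault(layer, set()).add(current)
        stack.extend(lineage.get(current, []))
    return {layer: sorted(names) for layer, names in layers.items()}


upstream = trace_lineage("gold.household_aum")
# upstream["silver"] lists the Silver tables; upstream["bronze"] lists the sources of record.
```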

Unstructured data belongs in the same medallion. Raw documents land in Bronze, are chunked and enriched with metadata in Silver, and are exposed as retrieval-ready vector knowledge bases in Gold. The Gold layer is where we publish data products with owners and SLAs, not just tables. This is where domain-driven data mesh thinking matters: advisor productivity, client experience, risk, and finance each own their Gold products.
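As a rough illustration of the Silver step for documents, the sketch below splits a raw Bronze document into overlapping chunks that keep a pointer back to their source. The `Chunk` fields, the `chunk_document` helper, and the fixed-size word-window strategy are assumptions made for the example; production pipelines often chunk on sentence or section boundaries instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str        # points back to the Bronze source of record
    household_id: str  # hypothetical mastered ID attached during enrichment
    position: int      # order of the chunk within the document
    text: str


def chunk_document(doc_id, household_id, text, size=40, overlap=10):
    """Split text into overlapping word windows of at most `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for position, start in enumerate(range(0, len(words), step)):
        window = words[start:start + size]
        chunks.append(Chunk(doc_id, household_id, position, " ".join(window)))
        # Stop once a window has reached the end of the document.
        if start + size >= len(words):
            break
    return chunks
```

Carrying `doc_id` on every chunk is the design choice that matters here: whatever the Gold knowledge base returns can still be traced to the Bronze document it came from.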
Structured + Unstructured: One Foundation, One Governance
Wealth management firms are sitting on a pile of unstructured gold: years of meeting notes, client correspondence, planning documents, research, disclosures, and call transcripts. That content is where the art of the relationship lives. If it stays on file shares and inboxes, we never capture it.
The pattern that works:
- Land all unstructured content in the lakehouse Bronze layer alongside structured feeds so lineage, retention, and legal hold are governed in one place.
- Chunk, embed, and enrich in Silver — with entity extraction against your mastered CRM and household IDs so a note about "the Thompson household" is linked to the real client record.
- Expose Gold vector knowledge bases behind retrieval and agent endpoints. Access is mediated by your existing entitlements — the same rules that govern whether an advisor can see an account apply to whether an agent can surface a note about it.
- Build a knowledge graph on top — clients, households, accounts, entities, products, advisors, documents. The graph is what I call the "street map" agents use to navigate your firm. Vector search tells you what's similar; the graph tells you what's connected.
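The pattern above can be sketched end to end in a few lines. Everything named here is hypothetical: the `CHUNKS`, `ENTITLEMENTS`, and `GRAPH` structures stand in for a Gold knowledge base, an entitlements service, and a knowledge graph, and the bag-of-words `embed` function is a toy substitute for a real embedding model.

```python
import math
from collections import Counter

# Hypothetical Gold knowledge base: each chunk carries the household it was linked to in Silver.
CHUNKS = [
    {"id": "c1", "household_id": "HH-100",
     "text": "Thompson household wants to fund a 529 plan"},
    {"id": "c2", "household_id": "HH-200",
     "text": "Client asked about funding a 529 plan for a grandchild"},
    {"id": "c3", "household_id": "HH-100",
     "text": "Reviewed municipal bond ladder"},
]

# Hypothetical entitlements: the households each advisor is allowed to see.
ENTITLEMENTS = {"advisor_a": {"HH-100"}, "advisor_b": {"HH-200"}}

# Hypothetical graph edges: household -> accounts.
GRAPH = {"HH-100": ["ACC-1", "ACC-2"], "HH-200": ["ACC-3"]}


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(advisor, query, k=2):
    """Return the top-k chunks the advisor is entitled to, with connected accounts."""
    allowed = ENTITLEMENTS.get(advisor, set())
    # Filter by entitlement BEFORE ranking so unauthorized content is never a candidate.
    candidates = [c for c in CHUNKS if c["household_id"] in allowed]
    q = embed(query)
    ranked = sorted(candidates, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return [
        {
            "chunk_id": c["id"],
            "household_id": c["household_id"],
            # The graph supplies what is connected; the vector score supplied what is similar.
            "accounts": GRAPH.get(c["household_id"], []),
        }
        for c in ranked[:k]
    ]
```

Calling `retrieve("advisor_a", "529 plan")` can only ever surface chunks for HH-100, even though another household's note is just as similar to the query. An identity with no entitlements gets an empty list.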
This is the difference between "we bought a chatbot" and "we have an AI foundation." The chatbot answers questions. The foundation is how the next five agents you build actually work.
Governance, Observability, and Why the Foundation Has to Be Opinionated
A lakehouse plus medallion plus vector knowledge bases is only useful if we can prove what it did, to whom, with what data, at what time. In wealth management, that is not a nice-to-have; it is table stakes.
- Data mesh with federated governance — domain teams own Gold products; central policy-as-code enforces PII, retention, and entitlements.
- Model risk management aligned to SR 11-7 — model inventory, validation, documentation, and ongoing monitoring for every model that touches a financial decision.
- Audit logging aligned to SEC Rule 204-2 — every query, every agent action, every override captured and retrievable.
- Real-time observability across all four pillars — data quality, model drift, hallucination detection, and self-healing pipelines. If we cannot see it, we cannot run it in production.
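To show what "every query, every agent action, every override captured and retrievable" might look like in code, here is a minimal sketch of an audit record and an append-only log. The `AuditEvent` fields and the `AuditLog` class are hypothetical; this illustrates the shape of the data and is not a statement of what any regulation requires. A real system would write to durable, immutable storage rather than memory.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    actor: str       # advisor or agent identity
    action: str      # e.g. "query", "agent_action", "override"
    subject: str     # client or household the action concerned
    datasets: tuple  # Gold products touched
    # UTC timestamp captured when the event is created.
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class AuditLog:
    """Append-only, in-memory log used only for illustration."""

    def __init__(self):
        self._events = []

    def record(self, event):
        self._events.append(event)
        return event

    def for_subject(self, subject):
        """Retrieve every event that concerned a given client or household."""
        return [e for e in self._events if e.subject == subject]

    def export_jsonl(self):
        """Serialize the log as one JSON object per line for downstream review."""
        return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in self._events)


log = AuditLog()
log.record(AuditEvent("agent:copilot", "query", "HH-100", ("gold.household_aum",)))
```

Each event answers the four questions from the start of this section: what was done, to whom, with what data, and at what time.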
Choosing Your Architecture: Launch, Accelerate, and Differentiate
The best architecture is the one your firm can operate today, with a clear path to where you need to be. We use a Launch → Accelerate → Differentiate approach:
- Launch. Stand up a lakehouse-lite — custodial plus CRM aggregation with early vector stores, PII redaction, a model inventory, and basic prompt filtering. Simple AI assistants and chatbots on top. Prove the pattern before you scale it.
- Accelerate. Expand to a full lakehouse hub — custody, accounts, CRM, portfolio management, financial planning, advice, and unstructured knowledge bases — with an AI committee, DQ and lineage, structured monitoring, copilots, and context-engineered RAG.
- Differentiate. Mature the lakehouse, layer in knowledge graphs and real-time integration, deploy Graph RAG, LoRA fine-tuning, and MCP-driven agentic workflows under full model risk management, ethics review, and client disclosure.
Whichever stage you are in, the north star is the same: one governed home for structured and unstructured data, medallion discipline from Bronze through Gold, vector and graph layers for agents, and observability on everything. That is what turns data from a cost center into a compounding strategic asset — and what separates the firms investing in AI to grow from the ones still running trials.