In May 2024 HUD issued guidance on AI use in tenant screening and broader housing decisions, putting AI vendors and operators on notice that the Fair Housing Act applies whether the discrimination is intentional or emerges from a model. The same logic generalizes to advertising AI, lead qualification, and any tool that affects housing access. Disparate impact across protected classes (race, color, national origin, religion, sex, familial status, disability) creates liability whether you wrote the rule or your model learned it.
This is the engineering translation. It covers the regulatory layer that real estate AI products operate under, the disparate-impact testing every build needs, the data licensing layer (MLS rules, increasingly AI-specific), and the reference architecture that holds up to a state real estate commission inquiry or a HUD complaint.
For the wider Real Estate cluster, see the pillar, the hallucination spoke, the agent copilot build walkthrough, and the eval spoke.
The regulatory layer
Five regulatory frameworks apply to real estate AI in the US in 2026.
Fair Housing Act (FHA). Prohibits discrimination in housing transactions on the basis of race, color, religion, sex, disability, familial status, or national origin. AI tools that affect housing access (advertising targeting, lead qualification, tenant screening, mortgage decisioning) fall under FHA. Disparate impact applies. A facially neutral model that produces disproportionate outcomes for protected classes is a violation, even without discriminatory intent.
HUD AI guidance (May 2024). Specifically addressed AI in tenant screening but the framework generalizes. Builders are responsible for testing their models for disparate impact. "We did not intend to discriminate" is not a defense; "we tested for disparate impact and found none" is.
RESPA (Real Estate Settlement Procedures Act). Governs settlement service referrals. Section 8 prohibits kickbacks for referrals of settlement services (lenders, title insurance, inspectors). AI recommendations that route consumers to specific service providers in exchange for compensation fall under RESPA. The structure has to be defensible as a legitimate business arrangement, not as a kickback dressed up in algorithm form.
Equal Credit Opportunity Act (ECOA) for AI in lending. AI in mortgage and lending decisions is subject to ECOA's adverse action notice requirements and fair-lending testing. The CFPB's 2023 guidance on AI in lending decisions lays this out; the 2025-2026 enforcement environment has been active.
State licensing and consumer protection. Real estate is regulated state by state. State real estate commissions have started issuing AI-specific guidance, and several state attorneys general have signaled increased focus on real estate AI in advertising and tenant screening.
Where real estate AI products typically run into trouble
Six failure modes recur across products that end up in complaints or enforcement actions.
Lead scoring that proxies for protected class. A model trained to predict "high-quality leads" from historical conversion data picks up on zip code, name, language, or other features that correlate with race or national origin. The model produces disparate scores across protected classes without any explicit reference to those classes. This is the canonical disparate-impact failure; a proxy probe like the sketch below catches it early.
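A cheap way to surface proxy leakage before a regulator does: if a throwaway classifier can predict the protected class from the features your scoring model consumes, those features are leaking it. A minimal sketch with scikit-learn, assuming a tabular feature set; the column names and group labels are hypothetical.

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def proxy_leakage_auc(df, feature_cols, protected_col, group):
    """AUC for predicting membership in `group` from the model's input features.
    ~0.5 means no leakage; values well above 0.5 mean the features proxy for it."""
    X = pd.get_dummies(df[feature_cols])              # one-hot categorical features
    y = (df[protected_col] == group).astype(int)      # binary: this group vs. rest
    probe = GradientBoostingClassifier()
    return cross_val_score(probe, X, y, cv=5, scoring="roc_auc").mean()

# Hypothetical usage:
# auc = proxy_leakage_auc(leads, ["zip_code", "language", "lead_source"], "race", "black")
# Treat an AUC well above 0.5 as a signal to drop or transform the feature set.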
Advertising audience selection. Targeting models that deliver housing ads disproportionately to certain demographics, even without explicit demographic targeting, are a documented FHA violation pattern. Meta's 2022 settlement with the US Department of Justice over housing ad delivery, which resolved a HUD discrimination charge, set the precedent.
Tenant screening models. Eviction history, criminal history, and credit history models all have well-documented disparate impact across race. Using them as input features without disparate-impact testing creates liability.
Steering through search and recommendations. A search experience that surfaces different listings to different demographics can constitute steering under FHA. Personalization based on inferred demographics is the riskiest pattern; personalization based on stated preferences is safer but still requires testing.
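One concrete test for steering: hold the query constant, vary only the demographic signal in the user profile, and measure how much the returned listing sets diverge. A sketch under stated assumptions; run_search is a hypothetical wrapper around your search API, and listing_id is a hypothetical field.

def steering_overlap(run_search, query, profiles, k=20):
    # Top-k listing IDs per demographic profile for the identical query.
    results = {
        profile["label"]: {listing.listing_id for listing in run_search(query, profile=profile)[:k]}
        for profile in profiles
    }
    labels = list(results)
    baseline = results[labels[0]]
    overlaps = {}
    for label in labels:
        union = baseline | results[label]
        # Jaccard overlap against the first profile; values well below 1.0
        # across profiles mean different users are seeing different housing stock.
        overlaps[label] = len(baseline & results[label]) / len(union) if union else 1.0
    return overlaps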
Disparate language quality. AI-generated property descriptions, agent communications, or chat responses that are systematically lower quality for non-English speakers or for users writing in African American Vernacular English (AAVE) create access disparities: same product, lower-quality service.
RESPA-violating recommendations. AI tools that recommend specific lenders, title insurers, or inspectors in exchange for compensation, without proper disclosure and structure, fall under RESPA Section 8. Re-skinning a kickback as an "AI recommendation" does not change the analysis.
Disparate-impact testing for real estate AI
The engineering practice that catches these failures before they reach complaints.
Step 1: identify the affected populations. For your specific product, what protected classes might be affected? Tenant screening: all FHA-protected classes. Advertising: same. Lead scoring: same plus any class your dataset can proxy. Be specific about what disparities you are testing for.
Step 2: build the test set. Take a population of cases (real or carefully constructed synthetic) where the protected class varies but the underlying business facts are otherwise identical. For real estate, "identical" is hard to construct because geographic and demographic correlations are baked into the data; use careful matching methods (propensity scores, paired designs).
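One workable construction, in the spirit of classic paired-audit studies: start from real cases and generate counterfactual variants that differ only in a protected-class signal. A sketch assuming cases are dicts and that applicant name is one of the signals your pipeline sees; the variant table is illustrative only.

import copy

# Illustrative variant table; your signals may instead be language, zip, or lead source.
NAME_VARIANTS = {
    "group_a": {"applicant_name": "Emily Walsh"},
    "group_b": {"applicant_name": "Lakisha Washington"},
    "group_c": {"applicant_name": "Jose Hernandez"},
}

def make_counterfactual_pairs(base_cases):
    pairs = []
    for base in base_cases:
        variants = {}
        for label, overrides in NAME_VARIANTS.items():
            case = copy.deepcopy(base)   # identical business facts
            case.update(overrides)       # only the protected-class signal varies
            variants[label] = case
        pairs.append(variants)
    return pairs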
Step 3: measure the rate of adverse outcomes per protected class. "Adverse outcome" is product-specific: rejection in tenant screening, low score in lead qualification, missing recommendation in search.
Step 4: compute the disparate impact ratio. The standard threshold is the four-fifths rule: if the favorable-outcome rate for any protected class is less than 80% of the rate for the most-favored class, that is presumptive disparate impact. For example, if one group passes screening at 60% and another at 45%, the ratio is 0.75 and presumptively disparate. The four-fifths rule is a regulatory floor, not a target; production-grade products should aim higher.
Step 5: document the testing. Disparate-impact testing that is not documented does not exist for regulatory purposes. Versioned test results, methodology notes, and remediation actions live in the same observability layer as your regular evals.
import os
from statistics import mean

from respan import Respan

client = Respan(api_key=os.environ["RESPAN_API_KEY"])

# Scores at or above this threshold count as a favorable outcome.
# Calibrate it to match your production decision boundary.
PASS_THRESHOLD = 0.5

@client.eval(name="lead-scoring-disparate-impact")
def lead_scoring_disparate_impact(model, test_set):
    results_by_class = {}
    for protected_class in ["race", "national_origin", "familial_status"]:
        results_by_class[protected_class] = {}
        for value in test_set.classes[protected_class]:
            cases = test_set.filter(**{protected_class: value})
            scores = [model.score(case) for case in cases]
            results_by_class[protected_class][value] = {
                "pass_rate": sum(s >= PASS_THRESHOLD for s in scores) / len(scores),
                "rejection_rate": sum(s < PASS_THRESHOLD for s in scores) / len(scores),
                "mean_score": mean(scores),
                "n": len(scores),
            }
        # Four-fifths rule: every group's favorable-outcome rate must be at
        # least 80% of the most-favored group's rate.
        max_rate = max(c["pass_rate"] for c in results_by_class[protected_class].values())
        for value, c in results_by_class[protected_class].items():
            ratio = c["pass_rate"] / max_rate if max_rate > 0 else 1.0
            c["four_fifths_ratio"] = ratio
            c["four_fifths_violation"] = ratio < 0.80
    return results_by_class

The eval runs on every model or prompt change. A four-fifths violation blocks deploy, not just notifies.
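What "blocks deploy" looks like in practice is a CI step that exits nonzero when the eval above reports any violation. A minimal sketch reusing the results structure from that eval; the wiring into your pipeline is an assumption.

import sys

def ci_gate(results_by_class) -> int:
    violations = [
        (protected_class, value)
        for protected_class, groups in results_by_class.items()
        for value, stats in groups.items()
        if stats.get("four_fifths_violation")
    ]
    for protected_class, value in violations:
        print(f"FOUR-FIFTHS VIOLATION: {protected_class}={value}", file=sys.stderr)
    return 1 if violations else 0  # nonzero exit fails the pipeline step

# In CI: sys.exit(ci_gate(lead_scoring_disparate_impact(model, test_set)))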
MLS data licensing and AI
A practical layer that catches builders by surprise. Multiple Listing Services license data with specific restrictions, and most major MLSes added AI-specific clauses to participation agreements in 2024-2026.
The clauses to read carefully:
- Use restrictions on training data. Most MLSes now prohibit using MLS data for foundation model training without separate licensing. RAG over MLS data (retrieval at inference time) is generally allowed; baking it into model weights is generally not.
- Attribution requirements. AI-generated descriptions on top of IDX data still require attribution and refresh cadences.
- Display and refresh rules. IDX data has to refresh on a specific cadence; AI-generated content layered on stale IDX data can be a rules violation.
- Sub-licensing prohibitions. Most MLS licenses do not allow you to share MLS data with third-party LLM providers without specific approval. This affects your model-routing architecture: if Anthropic processes a prompt containing MLS data, that may or may not be allowed under your license. Check.
The compliance pattern: separate MLS data from your other data sources; tag every MLS-derived field; route MLS-tagged content only to providers with the right contractual coverage.
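A sketch of that pattern, with provenance carried on every field; the provider names and the approved-list contents are stand-ins for whatever your participation agreement actually covers.

from dataclasses import dataclass

# Stand-in for the providers your MLS agreement actually approves. Check it.
MLS_APPROVED_PROVIDERS = {"approved-provider"}

@dataclass
class Field:
    name: str
    value: str
    source: str  # "mls" or "internal", set once at ingestion time

def fields_for_provider(fields, provider):
    # MLS-derived fields never reach a provider outside the approved set.
    if provider in MLS_APPROVED_PROVIDERS:
        return fields
    return [f for f in fields if f.source != "mls"]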
Architectural patterns that hold up
A real estate AI product that holds up to a state commission inquiry shares these patterns.
Tenant access decisions go through human review. AI scores are inputs to a human decision, not the decision itself. The architecture requires explicit human approval before adverse action. This echoes state laws like California's SB 1120, which mandates human oversight of AI-driven utilization decisions in health coverage; similar human-in-the-loop requirements are starting to appear in housing. A minimal gate sketch follows below.
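A minimal sketch of the gate, with hypothetical types: the AI score travels with the decision record, but the code path to an adverse action is closed unless a named human approved it.

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Decision:
    application_id: str
    ai_score: float
    model_version: str
    outcome: str              # "approve" or "deny"
    approved_by: str | None   # named human reviewer; required for any denial
    decided_at: datetime

def finalize(application_id, ai_score, model_version, outcome, approved_by=None):
    # The code path to an adverse action is closed without a human approver.
    if outcome == "deny" and approved_by is None:
        raise PermissionError("adverse action requires explicit human approval")
    return Decision(application_id, ai_score, model_version, outcome,
                    approved_by, datetime.now(timezone.utc))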
Adverse action notices are auto-generated and complete. When an AI score contributes to an adverse action (lease denial, application rejection), the notice generated cites the specific factors and provides the disclosures ECOA and FHA require.
Audit logs include the full decision context. What model version, what prompt, what data sources, what score, what human approval. Six years of retention minimum.
Disparate-impact eval runs in CI. Every model or prompt change runs the disparate-impact suite. Four-fifths rule violations block deploy.
Demographic data, when collected, is carefully bounded. ECOA-required demographic data for fair-lending testing is collected separately from the underwriting decision flow and not exposed to the underwriting model. The two pipelines stay isolated.
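A sketch of the isolation, with hypothetical field names: the underwriting feature builder whitelists what the model may see, and demographic fields cannot pass through it.

# Whitelist what the underwriting model may see; demographic fields live in a
# separate monitoring store keyed by application ID and never enter this path.
UNDERWRITING_FEATURES = {"income", "credit_score", "rent_to_income", "employment_years"}
MONITORING_ONLY_FIELDS = {"race", "ethnicity", "sex", "familial_status"}

def build_underwriting_features(application: dict) -> dict:
    features = {k: v for k, v in application.items() if k in UNDERWRITING_FEATURES}
    # Guards the invariant if someone later edits the whitelist.
    leaked = features.keys() & MONITORING_ONLY_FIELDS
    assert not leaked, f"demographic fields must not reach underwriting: {leaked}"
    return features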
Vendor questions enterprise housing buyers will ask
If you are selling real estate AI into a major brokerage, REIT, or property management group in 2026, the procurement security and compliance review will ask roughly these questions. Have answers ready.
- What is your disparate-impact testing methodology and what classes do you test for?
- Show us the four-fifths analysis for your most recent model release.
- How do you handle FHA-protected class data? Is it ever exposed to the model?
- What is your MLS data licensing posture? Can we see the AI-specific clauses?
- For lending or screening adjacent products: what is your ECOA adverse action workflow?
- How does the human-in-the-loop decision gate work? Show us the audit log.
- What is your breach notification SLA?
- Where is the data stored? Does any of our MLS data flow to providers we have not approved?
Building these in is meaningfully cheaper than retrofitting them. Compliance is the architecture, not a checklist you add at the end.
How Respan fits
Fair housing exposure does not surface in unit tests. It surfaces in the aggregate behavior of your model across protected classes, in the audit trail you can produce six months later, and in the model-routing decisions that govern which provider sees MLS data. Respan gives real estate AI builders the observability and control plane that turns disparate-impact testing from a quarterly project into a continuous deploy gate.
- Tracing: every housing decision captured as one connected trace. Auto-instrumented for LangChain, LlamaIndex, Vercel AI SDK, CrewAI, AutoGen, OpenAI Agents SDK. Lead-scoring runs, tenant-screening flows, ad-targeting calls, and adverse-action notice generation all show up as a single trace with the model version, prompt, retrieved MLS context, score, and human approval step linked together.
- Evals: ten built-in evaluators (faithfulness, citation accuracy, refusal correctness, harmfulness) plus LLM-as-judge and custom Python evaluators. Production traffic flows directly into datasets. CI-aware experiments block regressions on four-fifths-rule violations, lead-score disparities across protected classes, and steering patterns in search rankings before deploys ship.
- Gateway: 500+ models behind an OpenAI-compatible interface, semantic caching, fallback chains, per-customer spending caps. MLS-tagged content routes only to providers your participation agreement covers, and PII redaction strips applicant identifiers before they reach any third-party model.
- Prompt management: versioned registry, dev/staging/prod environments with approval workflows, A/B testing in production with one-click rollback. Tenant-screening prompts, lead-qualification rubrics, and property-description templates each live as versioned artifacts so legal review pins to a specific hash and rollbacks happen without a redeploy.
- Monitors and alerts: rejection-rate disparity by protected class, four-fifths ratio drift, MLS data flowing to unapproved providers, adverse-action notice completeness, ad-delivery skew across demographics. Slack, email, PagerDuty, webhook. The team that owns fair-housing risk gets paged on the same channel as the team that owns latency.
A reasonable starter loop for fair-housing-aware real estate AI builders:
- Instrument every LLM call with Respan tracing including lead-scoring spans, retrieval-of-MLS-context spans, adverse-action-notice spans, and human-approval gate spans.
- Pull 200 to 500 production lead-scoring and tenant-screening records into a dataset and label them for protected-class proxy leakage, four-fifths compliance, and adverse-action notice completeness.
- Wire two or three evaluators that catch the failure modes you most fear (lead-score disparate impact across race, steering in search rankings, MLS data leaking to unapproved providers).
- Put your tenant-screening and lead-qualification prompts behind the registry so you can version, A/B, and roll back without a deploy.
- Route through the gateway so MLS-tagged content only reaches providers covered by your participation agreement and applicant PII gets redacted before it leaves your perimeter.
The point is not to add a compliance layer on top of your product; it is to make fair-housing testing a normal part of how your team ships, the same way latency and cost already are.
CTA
To wire the gateway redaction, audit logging, and disparate-impact testing on Respan, start tracing for free, read the docs, or talk to us. For the rest of the Real Estate cluster: the pillar, the hallucination spoke, the agent copilot build walkthrough, and the eval spoke.
FAQ
Does the FHA apply if I am not making housing decisions, just helping agents market? Yes, in many cases. FHA prohibits discrimination in advertising and marketing of housing, not just in the final transaction decision. AI that targets ads disproportionately or generates descriptions of different quality for different demographics can create FHA exposure.
Can I use a feature like "credit score" in my model? You can, but you have to test for and mitigate the disparate impact. Credit score has well-documented racial disparities. Using it without disparate-impact testing creates liability.
What's the four-fifths rule? The standard regulatory threshold for disparate impact: if the favorable-outcome rate for any protected class is less than 80% of the rate for the most-favored class, that is presumptive disparate impact. It is a floor, not a target.
My MLS license is from 2018. Are AI clauses retroactive? Most MLSes have updated participation agreements with AI-specific clauses since 2024 and require participants to re-execute. Check whether you have signed an updated agreement. Operating under a 2018 agreement that has been superseded creates contractual exposure.
Is RESPA only about lenders? No. RESPA covers a broad set of settlement service providers including title insurance, inspectors, appraisers, attorneys, escrow agents. AI tools that recommend any of these in exchange for compensation fall under RESPA Section 8.
