In the first four months of 2026, the legal environment around AI hiring tools changed in three ways that matter for engineering teams.
On January 20, 2026, two California job applicants filed a class action against Eightfold AI in the Northern District of California. The complaint, filed by Outten & Golden and Towards Justice with former EEOC Chair Jenny Yang as co-counsel, alleges that Eightfold's Match Score is a "consumer report" under the federal Fair Credit Reporting Act, that Eightfold operates as an unregistered consumer reporting agency, and that the company violated FCRA by furnishing reports without the disclosures, certifications, and dispute mechanisms the statute has required since 1970. The complaint targets a platform used by Microsoft, PayPal, Morgan Stanley, Starbucks, Chevron, and Bayer. FCRA provides statutory damages of $100 to $1,000 per willful violation; with class mechanisms over a database the complaint describes as covering more than a billion profiles, the arithmetic gets uncomfortable fast.
In May 2025, Judge Rita Lin of the Northern District of California granted preliminary collective certification in Mobley v. Workday, allowing applicants over 40 nationwide to opt into a disparate-impact age discrimination case against the AI hiring vendor. The opt-in period ran through March 7, 2026. The court had earlier accepted, in a July 2024 ruling, that an AI vendor could be directly liable under federal anti-discrimination law as an "agent" of the employers using its tools, rather than as a mere software provider.
In December 2025, the New York State Comptroller released an audit of the NYC Department of Consumer and Worker Protection's enforcement of Local Law 144, the city's bias audit law for Automated Employment Decision Tools. The audit found that 75% of test calls to the 311 hotline about AEDT issues were misrouted, that DCWP's review of 32 companies surfaced 1 violation while the auditor's review of the same companies surfaced 17, and that the agency's complaint-driven enforcement approach was structurally ineffective. DCWP committed to operational fixes. Enforcement scrutiny is shifting upward.
For engineering teams building AI recruiting and hiring tools, these three developments are not abstract. Mobley extends vendor liability under existing anti-discrimination law. Eightfold extends consumer-protection law to scoring outputs. The LL144 audit signals that bias-audit compliance is moving from posture to enforcement. A product built without the controls these regimes assume gets caught at procurement, in court, or both.
This post is the engineering translation. It covers what consumer-report status means for a Match Score, what disparate-impact liability means for a recommendation system, what LL144 actually requires once enforcement gets serious, and the specific things to ship to be defensible under all three.
What "consumer report" status means in practice
The Eightfold complaint hinges on three FCRA elements. Eightfold qualifies as a consumer reporting agency because it assembles and evaluates consumer information for fees and furnishes the resulting reports to third parties. The Match Scores, talent profiles, and rankings constitute consumer reports because they bear on a candidate's character, reputation, and qualifications and are used to determine eligibility for employment. The reports are used for an "employment purpose" because employers rely on them to evaluate candidates for hiring.
If a court accepts this theory, the engineering implications are concrete. A platform deemed an FCRA consumer reporting agency must:
| FCRA obligation | What it requires technically |
|---|---|
| Permissible purpose verification | Capture and store employer certifications confirming legitimate employment use before furnishing any score |
| Notice and authorization | Surface to candidates that a "consumer report" is being prepared, get written authorization before scoring |
| Maximum possible accuracy | Maintain procedures to ensure information used in scoring is accurate, with documented testing and validation |
| File access rights | On request, provide the candidate a copy of the report including all data points and inferences |
| Dispute mechanism | Reinvestigate disputed information within statutory timeframes (30 days), correct inaccurate data, document the resolution |
| Adverse action notification | Pre-adverse action: provide a copy of the report and a summary of rights before the negative employment decision is final. Post-adverse action: notify the candidate after the decision and provide rights to dispute |
| Recordkeeping | Retain consumer report data for the FCRA-mandated period, with audit-ready logs |
The single hardest implementation point is the dispute mechanism. A platform that cannot show a candidate exactly which data points and which inferences contributed to their Match Score cannot meaningfully comply with the FCRA's dispute right. Engineering this after the fact is much harder than building it in. The score and every input that contributed to it need to be queryable, attributable, and reversible per candidate, with the ability to recompute the score after a successful dispute.
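A minimal sketch of what "queryable, attributable, and reversible" implies at the data layer, assuming a deterministic scoring function and a pinned model version; all names here are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4

@dataclass
class ScoreRecord:
    candidate_id: str
    model_version: str
    inputs: dict                      # frozen snapshot of every field the model saw
    score: float
    record_id: str = field(default_factory=lambda: uuid4().hex)
    superseded_by: str | None = None  # set when a dispute produces a new score
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def resolve_dispute(record: ScoreRecord, corrections: dict, score_fn) -> ScoreRecord:
    """Apply corrected data points, rescore with the *same* model version,
    and mark the old record superseded rather than deleting it."""
    corrected = {**record.inputs, **corrections}
    new_record = ScoreRecord(
        candidate_id=record.candidate_id,
        model_version=record.model_version,  # pinned, so only the data changed
        inputs=corrected,
        score=score_fn(corrected, record.model_version),
    )
    record.superseded_by = new_record.record_id
    return new_record
```

The design choice that matters is keeping the superseded score rather than overwriting it: the dispute resolution itself has to be documentable under FCRA.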
The CFPB's 2024 Circular (2024-06) on background dossiers and algorithmic scores, while rescinded in 2025, established the regulatory framing the Eightfold plaintiffs adopt: an algorithmic score that synthesizes information about a worker can constitute a consumer report under existing law, regardless of whether the underlying technology was anticipated when the FCRA was written in 1970. Rescinding guidance does not change the statute. The plaintiffs' theory survives the rescission.
What Mobley v. Workday does to vendor liability
Mobley shifts the question of who is responsible for AI hiring discrimination. Workday's defense was the standard vendor argument: we provide software that implements employer-defined criteria, the employer makes the hiring decision, the employer is the responsible party under Title VII and the ADEA. Judge Lin rejected that framing on motion to dismiss, finding the complaint plausibly alleged that Workday's recommendation system "is not simply implementing in a rote way the criteria that employers set forth, but is instead participating in the decision-making process by recommending some candidates to move forward and rejecting others."
That framing matters because it brings the vendor inside the scope of federal anti-discrimination law. If your AI recruiting platform's recommendations meaningfully shape who gets advanced and who gets rejected, the vendor and the employer are both potentially liable for any disparate impact. Disclaimers in vendor agreements that "the employer makes the final decision" carry less weight when the system's architecture predictably produces discriminatory outcomes and the employer relies on those outcomes to filter their candidate pool.
The May 2025 collective certification scaled this exposure. The certified collective potentially covers applicants over 40 nationwide who applied through Workday's system. Workday speculated the collective could be in the "hundreds of millions"; Judge Lin responded that "allegedly widespread discrimination is not a basis for denying notice."
The engineering implications:
Disparate impact is testable. Title VII disparate impact under the four-fifths rule (a selection rate for any group below 80% of the highest group's rate creates a presumption of adverse impact) is a numerical test that runs against your output distribution; a minimal version is sketched after this list. Your platform either passes or fails this test on any given dataset. Production data is a relevant dataset.
Architecture choices matter. A recommendation system that ranks candidates and surfaces a top-N to employers is more exposed than a system that surfaces all candidates with confidence labels. A scoring system whose features include proxies for protected class is more exposed than a system whose features are demonstrably job-related. These are engineering decisions made before the lawyers see the system.
Vendor defenses depend on observability. When a vendor argues the disparate impact was caused by the employer's criteria, not the vendor's algorithm, the vendor needs to be able to show that. Without per-employer attribution, version-pinned model behavior, and traceable feature importance, the argument is hand-waving. With them, it has substance.
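The four-fifths check itself is a few lines; the hard part is the plumbing that feeds it. A minimal sketch, with illustrative group labels, assuming at least one group has a nonzero selection rate:

```python
# outcomes maps each demographic group to (selected, total).
def impact_ratios(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    rates = {g: sel / tot for g, (sel, tot) in outcomes.items() if tot > 0}
    top = max(rates.values())  # highest-rate group is the benchmark
    return {g: r / top for g, r in rates.items()}

def four_fifths_violations(outcomes, threshold=0.80):
    """Groups whose impact ratio falls below the four-fifths threshold."""
    return {g: r for g, r in impact_ratios(outcomes).items() if r < threshold}

# Example: group B's selection rate is 60% of group A's -> flagged.
print(four_fifths_violations({"A": (50, 100), "B": (30, 100)}))  # {'B': 0.6}
```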
NYC Local Law 144 with enforcement teeth
Local Law 144 has been on the books since 2021 and enforced since July 2023, but the December 2025 Comptroller audit is the first signal that enforcement is getting serious. The DCWP committed to fixing its complaint-routing failures, training staff on AEDT detection, and drawing more systematically on the technical expertise of the city's Office of Technology and Innovation (OTI). The agency has also signaled increased proactive surveillance of bias audit publications and candidate notices.
The law's core requirements:
| Requirement | Specification |
|---|---|
| Bias audit | Independent third-party audit no more than 1 year before AEDT use, repeated annually |
| Coverage | Disparate impact testing across race, ethnicity, sex, and intersectional categories (e.g., Asian women, White men) for any tool that "substantially assists or replaces" discretionary hiring decisions |
| Audit metrics | Selection rate per demographic group; impact ratio relative to highest-rate group; for continuous-score AEDTs, scoring rate above the median |
| Public disclosure | Audit summary clearly and conspicuously posted on the employer's website, retained for at least 6 months |
| Candidate notice | Notice at least 10 business days before AEDT use, with a description of the job qualifications and characteristics the tool assesses and instructions for requesting an alternative selection process or accommodation |
| Penalties | $500 for a first violation; $500 to $1,500 for each subsequent violation, with each day of noncompliant use counting as a separate violation; bias audit and notice failures are separate violations |
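For continuous-score AEDTs, the rules replace the binary selection rate with the rate at which each group scores above the full sample's median, as the table notes. A minimal sketch of that metric; the data layout is illustrative:

```python
import statistics

# scores is a list of (demographic_group, score) pairs for the audit period.
def scoring_rates(scores: list[tuple[str, float]]) -> dict[str, float]:
    median = statistics.median(s for _, s in scores)  # median of the whole sample
    groups: dict[str, list[float]] = {}
    for group, s in scores:
        groups.setdefault(group, []).append(s)
    # share of each group scoring above the sample median
    return {g: sum(s > median for s in ss) / len(ss) for g, ss in groups.items()}

def impact_ratios(rates: dict[str, float]) -> dict[str, float]:
    """Same ratio-to-highest-group computation as the four-fifths sketch above."""
    top = max(rates.values())
    return {g: r / top for g, r in rates.items()}
```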
For engineering, the requirements imply specific instrumentation:
AEDT inventory. Every tool that scores, ranks, or filters candidates needs to be inventoried. The threshold is whether the tool "substantially assists or replaces" discretionary decision-making. A resume parser that extracts structured fields probably does not qualify; a scoring tool that ranks candidates does.
Demographic data plumbing. Bias audits require demographic information about candidates that the AEDT may not directly collect. The standard pattern is to use the EEO-1 categories collected by the employer's ATS, joined to AEDT input/output records. The data needs to flow into the audit pipeline without leaking into the AEDT's training or inference, since using protected characteristics as model inputs creates disparate-treatment liability. A sketch of this split follows this list.
Selection-rate computation. The audit measures selection rates and impact ratios per group. Your system needs to produce these on demand, stratified by job, role, employer, and tool configuration. Auditors compute these themselves for an annual audit; the platform should be able to compute them continuously for internal monitoring.
Annual audit data extract. Every year, your platform needs to provide an independent auditor with a clean dataset of inputs, outputs, and demographic data covering the audit period. The data extract is itself a deliverable, with documented schema and lineage.
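A sketch of the split described above, using pandas; table and column names are illustrative. The point is structural: demographic columns exist only in the audit frame, never in the inference frame.

```python
import pandas as pd

# Job-related features the model is allowed to see (illustrative).
AEDT_FEATURES = ["years_experience", "skills_match", "education_level"]

def build_audit_frame(aedt_records: pd.DataFrame,
                      ats_demographics: pd.DataFrame) -> pd.DataFrame:
    """Audit-pipeline only: join AEDT inputs/outputs to the employer's
    EEO-1 demographic data by candidate ID."""
    return aedt_records.merge(
        ats_demographics[["candidate_id", "race_ethnicity", "sex"]],
        on="candidate_id", how="left",
    )

def model_inputs(aedt_records: pd.DataFrame) -> pd.DataFrame:
    """Inference path: demographic columns never exist in this frame."""
    return aedt_records[["candidate_id", *AEDT_FEATURES]]
```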
A subtle but important point: the auditor must be independent. The vendor cannot self-audit. Vendor-published "audit summaries" that are not produced by an independent third party do not satisfy the law. As enforcement tightens, employers using AEDTs will increasingly require their vendors to produce third-party audit results before signing contracts. Building for auditability from day one is materially cheaper than retrofitting.
State-by-state pile-up
NYC LL 144 was first. It is no longer alone.
| Jurisdiction | Law | Status | Scope |
|---|---|---|---|
| New York City | Local Law 144 | Enforced since July 2023 | AEDTs used for jobs in or associated with NYC |
| Illinois | AI Video Interview Act | Enforced since 2020 | Video interview AI; consent and disclosure |
| Illinois | HB 3773 (amends IHRA) | Effective January 2026 | Disparate impact discrimination via AI in employment |
| Colorado | Colorado AI Act (SB 24-205) | Effective June 30, 2026 (delayed from February) | High-risk AI including employment decisions; consumer rights and disclosures |
| Maryland | Facial Recognition in Hiring Act | Enforced | Facial recognition in interviews |
| New Jersey | A4909 (proposed) | Pending | Annual disparate impact analysis of AEDTs |
| California | Civil Rights Council ADS regulations | Effective October 2025 | Automated-decision systems in employment under FEHA |
| Texas | TRAIGA (HB 149) | Effective January 2026 | High-risk AI including employment |
| EU | AI Act | Phased through 2026 | AI in HR classified as high-risk |
For products selling nationally, the practical strategy is to build to the strictest interpretation. The intersection of NYC LL 144's bias audit requirement, Colorado's high-risk AI consumer rights, Illinois's expanded disparate impact theory, and the EU AI Act's high-risk classification produces a compliance baseline that satisfies most regimes. Building separate products per jurisdiction is more expensive than a single configurable platform that defaults to strict.
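One way to express "default to strict" in code: a single config schema whose merge rule takes the strictest value per field across every applicable jurisdiction. The fields and values below are illustrative simplifications of the requirements above, not a compliance mapping:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JurisdictionConfig:
    notice_days: int            # business days of advance candidate notice
    bias_audit_required: bool
    opt_out_required: bool

# Unknown jurisdiction -> strictest defaults.
STRICT_DEFAULT = JurisdictionConfig(notice_days=10, bias_audit_required=True,
                                    opt_out_required=True)

CONFIGS = {
    "nyc": JurisdictionConfig(10, True, True),
    "colorado": JurisdictionConfig(0, True, True),
    "illinois": JurisdictionConfig(0, True, False),
}

def config_for(jurisdictions: set[str]) -> JurisdictionConfig:
    """Merge all applicable jurisdictions, strictest value per field."""
    applicable = [CONFIGS.get(j, STRICT_DEFAULT) for j in jurisdictions] or [STRICT_DEFAULT]
    return JurisdictionConfig(
        notice_days=max(c.notice_days for c in applicable),
        bias_audit_required=any(c.bias_audit_required for c in applicable),
        opt_out_required=any(c.opt_out_required for c in applicable),
    )
```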
What to ship: the engineering checklist
This section combines the three regimes into concrete deliverables, listed in priority order by legal exposure and implementation cost.
Tier 1: Defensibility under existing claims
These are the controls that limit exposure under Mobley-style vendor liability and Eightfold-style FCRA claims. Build them whether or not your specific product is currently being sued.
1. Per-decision lineage and audit trail. For every candidate evaluated by your system, you can produce on demand: the original input data, the model and version that scored them, the prompt template and retrieval context (if LLM-based), the raw model output, the post-processing applied, the score or ranking surfaced to the employer, and the employer's resulting action. Persisted with cryptographic integrity, retained for the regulatory period, queryable by candidate ID, employer ID, or date range.
This is the foundation that makes everything else possible. Without it, you cannot produce evidence in litigation, comply with FCRA dispute rights, run a bias audit, or defend a disparate-impact finding. It is also the most often skipped or under-built piece. A minimal sketch of the lineage log follows this list.
2. Feature attribution per score. For every score your system produces, you can show which inputs contributed and how. For interpretable models (gradient-boosted trees, logistic regression), this is feature importance per prediction. For LLM-based systems, it is structured rationale tied to specific input fields and retrieval sources, validated against the underlying data.
This supports FCRA dispute rights (a candidate disputing their score needs to know which data points contributed), Mobley-style "the algorithm caused the disparate impact, not us" defenses, and LL 144 audit explainability.
3. Disparate impact monitoring. Continuous computation of selection rates and impact ratios per protected group, stratified by job, employer, and tool configuration. Alerting when any group's impact ratio falls below 0.80 (the four-fifths rule threshold). This is internal infrastructure; the annual audit is its visible output.
4. Demographic data isolation. Demographic information used for bias monitoring must not be accessible to the model at inference or training time. This is enforced at the infrastructure layer, not the policy layer. Train/inference services do not have read access to demographic fields; only the audit and monitoring layer does.
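A minimal sketch of the per-decision lineage log from item 1, with hash chaining for tamper evidence. A production version would live in an append-only store with more fields (prompt template, retrieval context, post-processing); the shape of the record is the point, and it assumes JSON-serializable inputs:

```python
import hashlib
import json
from datetime import datetime, timezone

class LineageLog:
    def __init__(self):
        self.entries: list[dict] = []
        self._prev_hash = "0" * 64  # genesis hash for the chain

    def record(self, candidate_id: str, employer_id: str, model_version: str,
               inputs: dict, raw_output: str, surfaced_score: float) -> dict:
        entry = {
            "candidate_id": candidate_id,
            "employer_id": employer_id,
            "model_version": model_version,
            "inputs": inputs,                # must be JSON-serializable
            "raw_output": raw_output,
            "surfaced_score": surfaced_score,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self._prev_hash,    # chains entries together
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self._prev_hash = entry["hash"]
        self.entries.append(entry)
        return entry

    def by_candidate(self, candidate_id: str) -> list[dict]:
        """FCRA file access and dispute rights start with this query."""
        return [e for e in self.entries if e["candidate_id"] == candidate_id]
```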
Tier 2: FCRA-defensible architecture (if FCRA applies to you)
If your product produces scores, rankings, or assessments that synthesize multiple data sources about a candidate, the Eightfold theory could apply to you. Whether it ultimately does depends on court interpretation; building these controls is good architecture regardless.
5. Pre-scoring authorization capture. Before any candidate is scored, capture employer certification of permissible purpose and candidate authorization. Store the authorization with the resulting score. A score produced without authorization should not be furnishable.
6. Candidate-facing report access. A candidate can request and receive a copy of any report generated about them, including all data points used and the score produced. The data must be presented in a form the candidate can review, not as raw model logits.
7. Dispute and reinvestigation flow. A candidate can dispute specific data points. The dispute is logged, investigated within the statutory window, and the resolution is documented. If the disputed data is corrected, the score is recomputed using the corrected data and the prior score is logged as superseded. A sketch of the deadline tracking follows this list.
8. Pre-adverse action workflow. When an employer decides to reject a candidate based primarily on the AI score, the candidate receives a copy of the report and a summary of FCRA rights before the rejection is final. This is a workflow change, not just a notification feature; it requires the employer's process to integrate with the platform's pre-adverse step.
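A sketch of the dispute clock from item 7: FCRA's 30-day reinvestigation window becomes a tracked deadline that feeds alerting. Field and status names are illustrative:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

STATUTORY_WINDOW = timedelta(days=30)  # FCRA reinvestigation window

@dataclass
class Dispute:
    candidate_id: str
    disputed_fields: dict                  # field -> candidate's claimed value
    opened_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    resolution: str | None = None          # "corrected" | "verified_accurate"
    resolved_at: datetime | None = None

    @property
    def deadline(self) -> datetime:
        return self.opened_at + STATUTORY_WINDOW

    def resolve(self, corrected: bool) -> None:
        # A "corrected" resolution should trigger a rescore (see Tier 1 lineage).
        self.resolution = "corrected" if corrected else "verified_accurate"
        self.resolved_at = datetime.now(timezone.utc)

def overdue(disputes: list[Dispute]) -> list[Dispute]:
    """Feed this into alerting so no dispute silently blows its window."""
    now = datetime.now(timezone.utc)
    return [d for d in disputes if d.resolution is None and now > d.deadline]
```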
Tier 3: AEDT compliance (NYC and similar)
If your product is an AEDT used in NYC or in any jurisdiction with similar requirements (and the list is growing), you need explicit AEDT compliance infrastructure.
9. AEDT inventory and tagging. Every tool, model, or workflow that meets the AEDT definition is tagged in your system. Production traffic for AEDT tools is recorded with the level of detail an audit requires.
10. Annual audit data export. A documented, repeatable process for exporting the dataset an independent auditor needs: candidate inputs, model outputs, demographic data, and metadata for the audit period. The export is signed and immutable; a signing sketch follows this list.
11. Audit summary publication helper. Tooling that produces the public audit summary in the format LL 144 requires, generated from the auditor's report, ready for employer publication.
12. Candidate notice integration. The 10-business-day notice requirement is enforced in the workflow: candidates cannot be scored within the notice period unless they have received and acknowledged the notice and opted not to use the alternative selection process.
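A sketch of the signed export from item 10: hash every file in the extract, then sign the manifest of hashes (HMAC for brevity here; an asymmetric signature from a KMS works the same way). Paths and file formats are illustrative:

```python
import hashlib
import hmac
import json
from pathlib import Path

def export_manifest(export_dir: Path, signing_key: bytes) -> dict:
    """Digest every file in the extract and sign the digest manifest."""
    digests = {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(export_dir.glob("*.csv"))
    }
    payload = json.dumps(digests, sort_keys=True).encode()
    return {
        "files": digests,
        "signature": hmac.new(signing_key, payload, hashlib.sha256).hexdigest(),
    }

def verify_manifest(manifest: dict, signing_key: bytes) -> bool:
    """The auditor (or a court) re-verifies the extract was not altered."""
    payload = json.dumps(manifest["files"], sort_keys=True).encode()
    expected = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, manifest["signature"])
```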
Tier 4: State-by-state extensions
The strict-state pattern: Colorado AI Act consumer rights, Illinois disparate-impact monitoring, Texas TRAIGA disclosures. Each adds requirements that the strict baseline above mostly already covers. The remaining engineering work is per-jurisdiction configuration: different notice text, opt-out workflows, and disclosure formats.
What to do this quarter
A staged engineering response that gets a hiring AI product to a defensible position in roughly one quarter:
Weeks 1 to 2. AEDT inventory. List every tool in your platform that scores, ranks, classifies, or recommends candidates. Categorize by AEDT status under LL 144 and similar laws, by FCRA-relevance under Eightfold-style theory, and by Mobley-style vendor liability exposure.
Weeks 3 to 6. Tier 1 controls. Per-decision lineage, feature attribution, disparate impact monitoring, demographic data isolation. This is the foundation.
Weeks 7 to 9. Tier 2 controls if FCRA applies. Authorization capture, candidate-facing access, dispute flow, pre-adverse workflow.
Weeks 10 to 12. Tier 3 controls. AEDT inventory tooling, audit data export, candidate notice integration. Engage an independent auditor for an inaugural bias audit; the timing positions you to publish a current audit summary by quarter end.
Quarter 2 onward. Tier 4 state-by-state extensions, ongoing audit cadence, continuous disparate impact monitoring, FCRA dispute response operations.
The teams that ship in this order produce a product that survives examination. The teams that ship the model first and add compliance later spend the next year in remediation that costs more than building it correctly the first time.
A note on disclaimers
A common pattern in AI hiring vendor agreements: "Vendor's tools provide recommendations only. Customer is responsible for all hiring decisions and for compliance with applicable employment laws." This is the standard contractual liability shift.
After Mobley, courts may weigh these disclaimers against operational reality. If your system's architecture predictably yields outcomes that influence hiring decisions, and your customer's process functionally relies on those outcomes, the disclaimer carries less weight. After Eightfold, a vendor argument that "our scores are not consumer reports because we say so" carries even less; the FCRA's definition of consumer report turns on what the report does, not what it is called.
The defensible path is not stronger disclaimers; it is architecture that makes the underlying claims true. A system that genuinely surfaces all qualified candidates with explanations rather than filtering some out, that genuinely supports FCRA-style dispute rights, that genuinely passes disparate impact tests, can defend itself with evidence rather than language.
How Respan fits
The controls Mobley, Eightfold, and LL 144 demand from a hiring AI platform are observability problems first and policy problems second. Respan is the substrate that makes per-decision lineage, feature attribution, disparate impact monitoring, and FCRA-style dispute response cheap to build instead of a multi-quarter retrofit.
- Tracing: every Match Score, ranking, and screening decision captured as one connected trace. Auto-instrumented for LangChain, LlamaIndex, Vercel AI SDK, CrewAI, AutoGen, OpenAI Agents SDK. When a candidate disputes a score under FCRA or an auditor asks why a protected group's selection rate dipped, you can replay the exact inputs, retrieval context, model version, and post-processing that produced the decision.
- Evals: ten built-in evaluators (faithfulness, citation accuracy, refusal correctness, harmfulness) plus LLM-as-judge and custom Python evaluators. Production traffic flows directly into datasets. CI-aware experiments block regressions on disparate impact ratios, four-fifths-rule violations, proxy-feature leakage, and rationale-to-input grounding before deploys ship.
- Gateway: 500+ models behind an OpenAI-compatible interface, semantic caching, fallback chains, per-customer spending caps. Per-employer routing and version pinning let you prove which model and prompt scored which candidate, the attribution Mobley-style vendor defenses depend on.
- Prompt management: versioned registry, dev/staging/prod environments with approval workflows, A/B testing in production with one-click rollback. Screening prompts, rationale-generation prompts, candidate-notice templates, and pre-adverse action explanations all belong in the registry so changes are reviewable and reversible without a redeploy.
- Monitors and alerts: selection rate per protected group, four-fifths impact ratio, score drift per employer, rationale-grounding failure rate, dispute-resolution latency. Slack, email, PagerDuty, webhook. The annual LL 144 audit becomes the visible output of monitoring you already run continuously.
A reasonable starter loop for hiring AI builders:
- Instrument every LLM call with Respan tracing including resume parsing, retrieval, scoring, and rationale spans.
- Pull 200 to 500 production candidate evaluations into a dataset and label them for job-relatedness, rationale grounding, and demographic balance.
- Wire two or three evaluators that catch the failure modes you most fear (sub-0.80 impact ratios on protected groups, rationales that cite data not in the candidate's record, proxies for protected class leaking into features).
- Put your screening prompts, rationale prompts, and pre-adverse notice templates behind the registry so you can version, A/B, and roll back without a deploy.
- Route through the gateway so per-employer attribution, version pinning, and spending caps are enforced at the call site rather than reconstructed from logs.
Without this loop, a single Mobley-style disparate impact finding or Eightfold-style FCRA dispute lands as a forensics project across logs you do not have. With it, the answer is a query.
To wire any of the patterns above on Respan, start tracing for free, read the docs, or talk to us.
Related reading
- Building Bias Audits for AI Recruiting: the methodology and dataset construction details
- Evaluating Recruiting LLMs: match quality, calibration, and adverse impact
- Building an AI Sourcing and Screening Agent: full architecture walkthrough
- How HR Tech Teams Build LLM Apps in 2026: pillar overview
- Building Adverse Action Explainability for LLM-Driven Credit Decisions: adjacent FCRA territory in fintech
