
Interview Intelligence: Removing Bias vs. Structuring Judgment

Jaakko Laine

The marketing language around AI-assisted interviewing has converged on a particular framing: this technology removes bias from hiring decisions. It is a compelling claim, and it is doing real commercial work: "reducing bias" is a purchasing rationale that resonates with HR leaders who are aware of the legal and reputational risks of biased selection processes. The trouble is that the claim conflates two distinct problems, and the conflation has consequences for which products get built, which get bought, and which actually improve hiring outcomes.

The two problems are: removing interviewer bias from the evaluation of a candidate, and structuring interviewer judgment so that evaluations across candidates are consistent and comparable. These are related but not identical, and the tools that work well for one do not automatically work well for the other. This piece is about the distinction and what it means for how interview intelligence products should be designed and evaluated.

What Bias in Interviewing Actually Means

Bias in interviewing is a large and heterogeneous category. The types that have received the most attention — affinity bias (preferring candidates similar to the interviewer), halo/horn effects (early positive or negative impressions that contaminate subsequent evaluation), and attribution bias (interpreting the same behavior differently depending on candidate characteristics) — are real phenomena with experimental evidence behind them. They are also not uniformly reduced by AI-assisted processes.

Some biases operate at the question-asking stage: interviewers ask different questions of different candidates, creating non-comparable data. Structured interview tools that prescribe a consistent question sequence for all candidates directly address this form of bias. If every candidate is asked the same questions in the same order, the evaluation data is at least based on comparable prompts, even if the evaluation of responses may still be influenced by interviewer-level biases.

Other biases operate at the evaluation stage: the same answer is scored differently depending on the candidate's characteristics. This is harder to address with technology. An AI system that scores interview responses against a rubric is potentially less susceptible to affinity bias than a human interviewer, since it has no preference for candidates who went to the same university or share the interviewer's communication style. But it may encode the biases embedded in its training data, which typically includes historical hiring decisions made by humans who had their own bias patterns. A scoring model trained on "successful hires" inherits the historical pattern of who was hired into which roles, and it will learn to score candidates favorably when they resemble that historical population.

Structuring Judgment: The More Tractable Problem

The more tractable problem — and the one where I think the current generation of interview intelligence tools genuinely delivers — is structured judgment. This is about consistency and comparability rather than bias elimination.

A structured interview process defines, in advance, the competencies being assessed, the questions that will elicit evidence for each competency, and the rubric by which responses will be evaluated. When this structure exists, AI-assisted tools can help in several measurable ways: verifying that interviewers are actually asking the prescribed questions and not drifting toward unstructured conversation, capturing and transcribing responses so that evaluation happens on recorded content rather than interviewer memory, surfacing the relevant rubric criteria at the point of evaluation so that interviewers are scoring against defined criteria rather than a gestalt impression, and flagging when a particular competency has not been covered adequately.
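The coverage checks described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: `PlannedQuestion`, `coverage_report`, the question IDs, and the competency names are all hypothetical, and the sketch assumes an upstream step has already matched transcript segments to question IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedQuestion:
    """One prescribed question and the competency it is meant to probe (hypothetical schema)."""
    question_id: str
    competency: str


def coverage_report(plan, asked_ids, min_per_competency=1):
    """Compare the questions actually asked against the interview plan.

    Returns the prescribed questions that were skipped, the competencies with
    fewer than `min_per_competency` prescribed questions asked, and any asked
    question IDs that are not part of the plan (drift off script).
    """
    asked = set(asked_ids)
    planned_ids = {q.question_id for q in plan}

    skipped = [q.question_id for q in plan if q.question_id not in asked]

    # Count how many prescribed questions were asked per competency.
    counts = {}
    for q in plan:
        counts.setdefault(q.competency, 0)
        if q.question_id in asked:
            counts[q.competency] += 1

    under_covered = sorted(c for c, n in counts.items() if n < min_per_competency)
    off_script = sorted(asked - planned_ids)
    return {"skipped": skipped, "under_covered": under_covered, "off_script": off_script}


plan = [
    PlannedQuestion("q1", "problem_solving"),
    PlannedQuestion("q2", "problem_solving"),
    PlannedQuestion("q3", "collaboration"),
    PlannedQuestion("q4", "ownership"),
]

# Assumed input: IDs recovered from the transcript by some matching step.
report = coverage_report(plan, asked_ids=["q1", "q3", "x9"])
print(report)
```

In this toy run the report flags "ownership" as not covered and "x9" as an off-script question. Deciding whether a transcript segment really corresponds to a prescribed question is the hard part in practice, and it is deliberately left out here.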

These are not glamorous functions. "Made sure the interviewer asked all the questions on the rubric" is less compelling as a product story than "reduced bias in hiring decisions." But they produce measurable improvements in inter-rater reliability, the degree to which different interviewers reach the same evaluation of the same candidate. That is a meaningful outcome, because low inter-rater reliability is one of the primary reasons that hiring decisions predict subsequent performance poorly.
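As a rough illustration of how inter-rater reliability might be measured, the sketch below computes a Pearson correlation between two interviewers' rubric scores for the same candidates. The scores are invented for the example and the function name is an assumption. Pearson correlation captures whether raters rank candidates consistently, not whether they give identical scores, so a real study would likely prefer an agreement statistic such as an intraclass correlation or Cohen's kappa.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length score lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a rater gives constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical 1-5 rubric scores from two interviewers for the same five candidates.
rater_a = [4, 2, 5, 3, 1]
rater_b = [5, 2, 4, 3, 1]

r = pearson(rater_a, rater_b)
print(f"inter-rater correlation: {r:.2f}")  # prints 0.90 for these scores
```

Tracking a number like this before and after introducing a structured process is one way to check whether the tooling is delivering the consistency it promises.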

When we backed Hirelogic in 2023, the core of the investment thesis was exactly this: the product is primarily a structured judgment tool, not a bias elimination tool, and it is honest about that distinction. The value proposition is that you can run a structured interview process at scale — across many hiring managers, many roles, many geographies — with consistent coverage of the defined competencies and consistent application of the evaluation rubric, without the training overhead that would otherwise be required to achieve that consistency through hiring manager enablement alone.

The Legal Risk Landscape in Europe

The EU AI Act classifies AI systems used in employment and recruitment contexts, including candidate screening, interview analysis, and hiring decision support, as high-risk AI systems under Annex III. High-risk AI systems are subject to requirements covering risk management, data governance (including examining data for possible biases), technical documentation, transparency, and human oversight, and they must pass a conformity assessment before deployment.

This has significant implications for how interview intelligence products must be designed and marketed in European markets. A product that is positioned as "removing bias" needs to be able to demonstrate, through documented conformity assessment, that it actually does what it claims and that it does not introduce differential outcomes across protected characteristics. This is a high evidentiary bar, and products that have not been designed and tested with this standard in mind are going to face difficult compliance questions as national authorities begin enforcing the Act.
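To make "differential outcomes across protected characteristics" concrete, one simple descriptive check is to compare the rate at which each group advances past a tool-assisted stage. The sketch below is illustrative only: the group labels and records are invented, the function names are assumptions, and this is not the test prescribed by the EU AI Act or by any conformity assessment procedure. Real evidence would also need adequate sample sizes and proper statistical testing.

```python
from collections import defaultdict


def selection_rates(records):
    """Compute the share of candidates who advanced, per group.

    `records` is an iterable of (group_label, advanced) pairs, where
    `advanced` is True if the candidate passed the stage.
    """
    totals = defaultdict(int)
    advanced = defaultdict(int)
    for group, passed in records:
        totals[group] += 1
        if passed:
            advanced[group] += 1
    return {group: advanced[group] / totals[group] for group in totals}


def impact_ratios(rates):
    """Divide each group's selection rate by the highest group's rate."""
    top = max(rates.values())
    if top == 0:
        raise ValueError("no candidate advanced in any group; ratios are undefined")
    return {group: rate / top for group, rate in rates.items()}


# Invented data: 10 candidates per group, 6 advance in group A, 3 in group B.
records = (
    [("A", True)] * 6 + [("A", False)] * 4
    + [("B", True)] * 3 + [("B", False)] * 7
)

rates = selection_rates(records)
ratios = impact_ratios(rates)
print(rates)
print(ratios)
```

Here group B advances at roughly half the rate of group A. A gap of that size would not prove bias on its own, but it is the kind of result a vendor claiming "bias removal" would need to be able to explain with documented evidence.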

The practical consequence for vendors is that the "bias removal" framing, while commercially attractive, creates a legal claim that is hard to support under the EU AI Act's conformity assessment requirements. Vendors who position more carefully, with "structured interview process support" rather than "bias elimination," are making a more defensible claim while describing the same underlying product capability. This is not just a marketing choice; it is a compliance strategy in a market where the legal risk of overclaiming AI performance is increasingly real.

What Good Interview Intelligence Products Do Well

Setting aside the bias framing, the products in this category that generate genuine enterprise value are doing a few specific things well.

First, they make structured interview processes operationally sustainable. Designing a structured interview process is relatively easy. Getting every hiring manager across a large organization to execute that process consistently, over time, without the structure degrading as hiring managers develop their own "improvements," is a hard organizational change-management problem. A tool that enforces the structure at the point of execution, rather than relying on training and cultural norms, makes that consistency achievable at scale.

Second, they create an interview data layer that did not exist before. Most organizations have no reliable record of what was actually discussed in a hiring conversation. Transcription and structured note-taking create a searchable record that supports better debrief conversations, better hiring committee calibration, and — over time — a dataset of interview content that can be used to study what signal actually predicts hiring outcomes.

Third, the best products give interviewers better inputs at the point of evaluation. Presenting the rubric criteria and the candidate's responses in a structured format at the time of scoring reduces the cognitive load on the interviewer and increases the probability that the evaluation captures the intended signal rather than the most memorable moment of the conversation.

What They Do Not Do

We are not saying that interview intelligence products have no effect on bias. Structured processes consistently reduce certain categories of bias relative to unstructured processes; that is supported by the IO psychology research literature, and the tools help achieve structure at scale. But the specific claim that AI analysis of interview content detects or eliminates interviewer bias is not well supported by current product capabilities, and products that lead with this claim are setting themselves up for a credibility problem when the claim does not survive scrutiny in an enterprise procurement process that includes a legal review.

The interview intelligence category is real and growing. The value proposition is structured judgment at scale, consistent coverage of evaluation rubrics, and the creation of an interview data layer. Products that lead with those claims honestly are building on a foundation that will hold. Products that lead with bias elimination as the primary value proposition are building on a foundation that the EU AI Act's conformity assessment requirements are going to make uncomfortable in the next 18 months.