Performance Management's AI Layer: What Founders Keep Getting Wrong

Antti Virtanen

Performance management is one of those HR problems that every enterprise has, everyone is dissatisfied with, and almost no one has solved well. The annual review cycle is widely known to be a poor signal generator. The continuous feedback movement has produced better intentions than outcomes at most companies — feedback forms that nobody fills in, nudges that become noise, manager dashboards that look good in demos and sit unopened in production. The category is genuinely ripe for better tooling.

So we pay close attention to performance management pitches, and we see a lot of them. What follows is an honest account of the three patterns that consistently indicate a founder who has not yet talked to enough HR leaders to understand where the real resistance lives.

Pattern One: Conflating Data Volume With Signal Quality

The pitch typically goes: "We aggregate signals from Slack, Jira, GitHub, calendar data, and peer feedback to build a comprehensive performance profile for every employee. More data means better assessments." The logic sounds plausible. In practice, it contains two problems that HR leaders who have been around long enough will raise immediately.

The first problem is that behavioral signal data — Slack message frequency, calendar meeting participation, Jira ticket close rate — is highly sensitive to role type, team structure, and working style. An engineer who prefers to think deeply and write considered comments is not less performant than one who is vocal in Slack. Aggregating behavioral signals without role-specific normalization produces systematic bias against certain working styles, and HR leaders have seen enough "objective" algorithmic assessments produce obviously wrong outputs to be deeply skeptical of this approach.
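To make "role-specific normalization" concrete, here is a minimal sketch, not a recommended scoring method. It assumes each record carries a hypothetical `role` label and a single numeric `value` for one signal, and it compares people only against others in the same role. The field names and the sample numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_within_role(records):
    """Return a z-score per employee, computed only against peers in the same role."""
    values_by_role = defaultdict(list)
    for record in records:
        values_by_role[record["role"]].append(record["value"])

    # Mean and population standard deviation of the signal for each role.
    stats = {role: (mean(vals), pstdev(vals)) for role, vals in values_by_role.items()}

    scores = {}
    for record in records:
        mu, sigma = stats[record["role"]]
        # A role with no spread (or a single person) carries no relative signal.
        scores[record["employee"]] = 0.0 if sigma == 0 else (record["value"] - mu) / sigma
    return scores


# Made-up weekly message counts: raw volume differs by an order of magnitude across roles.
records = [
    {"employee": "a", "role": "engineer", "value": 10},
    {"employee": "b", "role": "engineer", "value": 30},
    {"employee": "c", "role": "support", "value": 200},
    {"employee": "d", "role": "support", "value": 400},
]
scores = normalize_within_role(records)
```

In this toy data, the quieter engineer and the quieter support agent end up with the same relative score even though their raw counts differ by 20x. Normalization of this kind removes only the role effect; it does nothing to establish that the signal predicts performance in the first place.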

The second problem is employee rights under GDPR and the EU AI Act. Workplace monitoring at the level that behavioral signal aggregation requires is subject to works council consultation requirements in Germany and most of the Nordics, consent and transparency requirements under GDPR, and increasingly specific restrictions under the EU AI Act's provisions on AI systems used in employment contexts. Founders who have not mapped their data collection to these requirements will hit a wall in European enterprise procurement. This is not a minor compliance hurdle; in many markets it is a showstopper.

We are not saying behavioral signal aggregation is useless. We are saying that the founders who are doing it well are the ones who have designed their data model around what is permissible, have built the consent and transparency infrastructure into the product from day one, and have been honest with themselves about which signals are genuinely predictive versus which ones are just available.

Pattern Two: Building for HR Rather Than Managers

Performance management products that are designed to satisfy HR's need for structured documentation are different from performance management products that are designed to help a manager have better conversations with their team. Most of the products we see are the former masquerading as the latter.

The distinction shows up in workflow design. An HR-centric product ensures that every employee has a completed performance review on record in the HRIS by the compliance deadline. A manager-centric product makes it easier for a manager to have a specific, useful conversation with a team member about their development and trajectory. The first product solves HR's problem. The second product solves the manager's problem. Only one of them creates behavior change in the people whose job is to develop their reports.

The signal that a product is genuinely manager-centric is whether managers voluntarily use it outside of formal review cycles. If the only time a manager opens the performance tool is when HR sends a reminder that annual reviews are due in two weeks, the product is not adding value to the manager's job — it is adding compliance overhead. Products where managers are actively using the tool to capture real-time observations, to track commitments from 1:1 conversations, and to build a shared record with their reports have a different adoption profile and a much stronger retention story.
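One way to put a number on this signal, sketched under stated assumptions: take the dates on which a manager opened the tool and measure what share fall outside formal review windows. The function name, the window dates, and the session dates below are all hypothetical. A real product would need its own definition of a "session" and of a review window.

```python
from datetime import date


def voluntary_usage_share(session_dates, review_windows):
    """Fraction of a manager's sessions that fall outside every formal review window."""
    if not session_dates:
        return 0.0

    def in_review_window(day):
        # Windows are inclusive (start, end) date pairs.
        return any(start <= day <= end for start, end in review_windows)

    outside = sum(1 for day in session_dates if not in_review_window(day))
    return outside / len(session_dates)


# Hypothetical annual review window and one manager's session dates.
windows = [(date(2024, 11, 15), date(2024, 12, 15))]
sessions = [
    date(2024, 3, 4),
    date(2024, 6, 18),
    date(2024, 11, 20),
    date(2024, 12, 1),
]
share = voluntary_usage_share(sessions, windows)
```

Here half of the sessions happen outside the review window. A share near zero across the manager population would suggest the tool is opened only when HR sends the reminder.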

Getting to manager-centric design requires talking to a lot of managers, not a lot of HR leaders. Founders who have done their discovery primarily with CHROs and Heads of People tend to build compliance-complete products that managers avoid. The ones who have done deep discovery with people managers tend to build tools that managers actually find useful, even if they require more advocacy to get through HR procurement.

Pattern Three: Treating Performance Data as Neutral

The third pattern is the most subtle and the hardest to address in product design. It is the assumption that AI-generated performance signals are more objective than human-generated ones, and that "removing bias" from performance assessment is primarily a matter of replacing human judgment with algorithmic judgment.

This assumption is wrong in a specific way that matters for product design. Human bias in performance assessment is real: affinity bias, recency bias, and leniency bias toward people managers know personally. Algorithmic assessment does not eliminate these biases; it systematizes and obscures them. An algorithm trained on historical performance data inherits whatever biases were encoded in the decisions that generated that data. An algorithm trained on promotion outcomes in a company where senior roles skew toward a particular demographic will learn to underrate candidates who don't match that demographic pattern, and will do so in a way that is harder to challenge than an identifiable human decision-maker's judgment.
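A simple check makes the inheritance problem visible before any model is trained: look at how the historical outcome that would serve as the training label is distributed across groups. The sketch below is illustrative only. The `group` and `promoted` field names are assumptions and the rows are invented. A gap in these rates does not prove bias on its own, but a model trained to reproduce these labels has every incentive to reproduce the gap.

```python
from collections import Counter


def outcome_rate_by_group(rows, group_key="group", label_key="promoted"):
    """Share of positive historical outcomes per group in the would-be training data."""
    totals, positives = Counter(), Counter()
    for row in rows:
        totals[row[group_key]] += 1
        positives[row[group_key]] += int(bool(row[label_key]))
    return {group: positives[group] / totals[group] for group in totals}


# Invented promotion history; "A" and "B" stand in for any demographic split.
history = [
    {"group": "A", "promoted": True},
    {"group": "A", "promoted": True},
    {"group": "A", "promoted": False},
    {"group": "A", "promoted": True},
    {"group": "B", "promoted": True},
    {"group": "B", "promoted": False},
    {"group": "B", "promoted": False},
    {"group": "B", "promoted": False},
]
rates = outcome_rate_by_group(history)
gap = max(rates.values()) - min(rates.values())
```

Surfacing a table like `rates` to the buyer is one small example of being explicit about the human decisions embedded in the training data.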

Founders who understand this design implication build products that are explicit about the sources of the signals they use, that surface the human decisions embedded in their training data, and that treat the AI layer as augmenting human judgment with additional context rather than replacing it with a score. This is not just an ethical position — it is a product design requirement for enterprise sales in regulated markets. The EU AI Act's provisions on high-risk AI systems explicitly include AI systems used in employment contexts for performance evaluation, and the transparency and explainability requirements they impose are not compatible with black-box scoring approaches.

What Good Performance AI Actually Looks Like

The performance management products that Sammalkko is most interested in are the ones that are honest about what AI can and cannot do in this context. What AI is genuinely useful for: summarizing continuous feedback into a narrative that a manager can review and annotate; surfacing patterns in a manager's feedback that might indicate consistency bias; generating a first draft of a development plan based on feedback themes and role expectations; and flagging when an employee's engagement signals have shifted materially without the manager appearing to have noticed.

What AI is not useful for: generating a performance score that is presented as more objective than a manager's judgment; predicting who should be promoted without human review of the model's reasoning; or assessing performance in roles where behavioral signals are a poor proxy for actual contribution quality.

The product frame that follows from this understanding is AI-as-assistant to a manager's judgment, not AI-as-assessor. It is a narrower claim than "AI performance management," but it is a claim that survives procurement scrutiny, works within European regulatory requirements, and creates genuine value for the managers who use it. We will take that over a more ambitious claim that falls apart in the second sales call with an HR leader who has seen three generations of algorithmic performance tools promise objectivity and deliver different problems.