After making 13 investments and reviewing somewhere north of 300 companies in AI HR tech over four years, I have seen patterns emerge about what separates the companies worth backing from the ones that will end up as features inside a larger product within 18 months. I have been reluctant to write this up as a list (lists flatten the judgment calls that actually matter), but several founders have asked me directly to be more explicit about our criteria, and I think that is a reasonable request.
What follows is as close to a direct account as I can give. These are not universal truths about what makes a great company in this space. They are the criteria we apply at Sammalkko, informed by our thesis, our operating backgrounds, and what we have learned from the investments we have made. Some of it will not apply to every sub-segment of AI HR tech. Where I know the nuance matters, I have tried to name it.
The Founding Team's Relationship to the Problem
The most reliable signal I have found is not the founders' credentials — it is the quality of their problem understanding. When a founding team describes the pain they are solving, I am listening for granularity. Can they tell me exactly which person in an enterprise HR organization experiences this pain, at what point in their workflow, and what they currently do to manage it? Can they explain why the current workaround is inadequate in a specific, mechanistic way rather than "it's too slow" or "it doesn't scale"?
The founding teams that have this granularity almost always have direct operator history in the problem — either as HR practitioners, as builders inside HR software companies, or as engineers who spent years working on a problem adjacent to this one and can transfer the insight. The founding teams that lack this granularity tend to have identified the market opportunity first and built the problem understanding second. That order produces products that are accurate about the category but imprecise about the specific failure mode they are solving.
This does not mean we only back operator founders. We have backed technically exceptional teams with no HR operations background. But in those cases, we want to see evidence of the learning process: early enterprise conversations that produced specific, surprising insights, design partner relationships that shaped product direction in traceable ways, or an advisor with deep HR operations credibility who is genuinely engaged rather than a logo on the website.
AI-Native Architecture, Not AI-Augmented Workflow
This distinction matters more than it sounds. An AI-augmented workflow product automates or accelerates steps in an existing HR process. The HR process is the product; the AI is a feature that makes it faster or less manual. An AI-native product is one where the ML model is the value proposition — where the product cannot be described without describing the model, and where the quality of the model is the primary competitive variable.
We invest in the second category. The first category produces real businesses — often better businesses in the short term, because the value proposition is immediately legible to an enterprise buyer — but they lack the compounding moat we are looking for. An AI-augmented workflow tool can be replicated by a competitor who builds the same workflow with a better UI. An AI-native product where the model quality depends on proprietary training data or a feedback loop that compounds with usage is much harder to displace once it is embedded.
The practical test we apply: if you replaced the AI component with a rules-based system or a human review step, would the product still exist? If the honest answer is yes, it is probably AI-augmented. If the honest answer is that the product becomes impossible or incoherent, it is probably AI-native.
The Data Flywheel Question
The question I spend the most time on in technical diligence — and this is where Jaakko's background as an ML practitioner is invaluable — is: what is the mechanism by which the model improves over time, and does it depend on proprietary data accumulation?
The strongest version of this is cross-customer learning: the model trained on data from customer A becomes more accurate for customer B, in a way that does not expose customer A's data directly but does compound the platform's overall model quality. This creates a network effect at the model layer: the more customers on the platform, the better the model for every customer, and the harder it is for a new entrant to match quality without equivalent data volume.
Not every business model supports cross-customer learning; some enterprise contracts explicitly prohibit it. But within-customer model improvement is usually achievable: the model trained on a customer's historical hires becomes more accurate as more of that customer's hiring decisions feed back into it. We want to understand what that loop looks like, and what model quality looks like for a new customer before the loop has accumulated meaningful signal.
Enterprise Distribution Insight
HR software sells into a complex stakeholder environment. The economic buyer is typically the CHRO or VP People, but the technical evaluator is often an HR operations or HRIS team, and the end users are hiring managers or recruiters whose success criteria differ from those of either group. The procurement process usually involves legal review (GDPR, employment law compliance), IT security review, and, in European contexts, often a works council or employee representative consultation.
We look for founding teams who have navigated this process at least once, either as an enterprise buyer or as a vendor who has completed a full enterprise sales cycle including the legal and IT review stages. The failure mode we see most often is a team that has built a great product, sold it successfully to SMBs or mid-market companies, and is now moving upmarket into enterprise deals whose procurement complexity is qualitatively different from anything it has handled before. That transition is learnable, but it takes longer than most teams budget for, and the resource implications affect everything from engineering priorities to cash runway planning.
The proxy signal I look for is whether the founder can describe, in specific terms, who the six people are in the procurement process for a typical enterprise deal, what each of them cares about, and what question from each of them they are most worried about not being able to answer. Teams who can answer this clearly have typically been through the process. Teams who cannot have typically not.
The Red Flags We Have Learned to Take Seriously
After four years of active deal flow in this space, a few patterns reliably predict problems that are harder to fix post-investment than they looked pre-investment.
The first is over-indexing on the technology demo. The demos for AI HR tech products are often impressive and sometimes misleading — the live demo environment uses clean, carefully prepared data that rarely represents the messy reality of enterprise HR data. Founders who are primarily comfortable in the demo environment and become visibly less comfortable when the conversation turns to enterprise data onboarding, training data composition, or model performance on real customer data are showing you something important.
The second is market sizing that starts with "HR is a $500 billion market." It is, and it is irrelevant. What matters is the specific wedge: which HR buyer, which pain point, which transaction size, which sales motion, and what the plausible market size is for that specific product. Founders who have not done this decomposition are often not ready to sell into enterprise.
The third, which I did not expect to matter as much as it does, is board-readiness. Founders who have not worked with an external board member before — who have not experienced the value and the friction of having an investor in the room for strategic decisions — sometimes find the transition from solo decision-making to shared governance more disruptive than the company can absorb while also trying to grow. This is not a reason not to invest; it is a reason to be explicit about what the governance relationship will look like before the term sheet is signed.
None of these criteria produce certainty. Thirteen investments have produced outcomes across the distribution, and the relationship between the quality of our initial diligence and the outcome has been imperfect. What these criteria do is eliminate the most predictable failure modes and increase the probability that the surprises that come after investment are the good kind — unexpected growth vectors — rather than the bad kind — foundational assumptions that were wrong from the start.