Enterprise Software Case Study - 96% False Positives Proven

Fifty-one weeks of pressure

The product security team at one of the world's largest enterprise software companies works on a monthly release cycle. Every cycle ends the same way: a queue of open CVEs, a release date that will not move, and development teams waiting on a verdict.

For the senior product security engineer who owns a large part of that queue, the math was unforgiving. Black Duck scans across the product portfolio produced a constant stream of findings. He analyzed about 2,000 of them a year himself. A high-severity CVE took roughly an hour to work through, from opening the ticket to making a defensible decision.

“From the moment I start working on the ticket until I'm able to make a decision, it's about an hour when it's a high vulnerability.”
Senior Product Security Engineer

Some mornings disappeared entirely into ticket processing, four hours before any other work started. He called the past twelve months a brutal year for CVE volume. And the pressure never let up.

“There is one week a year when we don't have pressure. That's Christmas week. For fifty-one weeks a year, we're dealing with high CVEs.”
Senior Product Security Engineer

Panic above 7.0, forgotten below

With no way to investigate everything, the team relied on the same shortcut most enterprises use: the CVSS score. Findings above 7.0 got analyzed. Findings below 7.0 were deprioritized automatically. Nobody believed the line was accurate. It was just the only line available.

“When the CVSS score is above 7, everybody panics. Below 7, people forget about it because it's not on their plate.”
Senior Product Security Engineer

The scores did not even agree with each other. EPSS and CVSS regularly pointed in different directions on the same CVE. And neither answered the question the team actually needed answered: can an attacker exploit this in our product, as it is actually built and deployed? That is an exploitability question, and no score answers it.

The back and forth that leaves scars

The hours were only half the cost. Every verdict had to be defended to development teams under release pressure, usually with nothing stronger than a severity score as evidence. The debates were long, repetitive, and personal.

“They cursed my mom on every continent of the planet. We go into this back and forth about whether we're vulnerable or not.”
Senior Product Security Engineer

A security ticket landing weeks before a release felt, in his words, like a hammer. It caused friction, and it caused scars. The team did not need another scanner or another score. They needed evidence strong enough to end the argument.

Three tests

The team did not evaluate Konvu on a slide deck. They ran it through three increasingly hard tests.

Test 1: A working session
Konvu walked through how its agents investigate a finding: whether an attacker can actually exploit it in the product, with the full analysis laid out in a graph.

Test 2: A POC on their own code
The team pointed Konvu at one of their own projects and let the agents work through its backlog of Black Duck findings.

Test 3: A live CVE from his queue
He ran a real Apache Tomcat ticket through Konvu and compared the analysis, step by step, against his own manual process.

The first reaction came in the working session, looking at how Konvu presents its analysis. “I have to say, that's impressive,” he told the team.

“Of all the things you showed me, what I liked the most was the triage evidence. It saves a lot of back and forth.”
Senior Product Security Engineer

The POC ran on one of the company's own projects. Konvu's agents worked through its Black Duck findings and returned a verdict for each one, backed by a written analysis. Only 75 findings were exploitable. The rest were false positives, including an Angular CVE that a score-based process would have escalated: the vulnerable code was present, but the HTTP client calls it depended on were not attacker-controllable. Present in the code, not exploitable in the product. That distinction is exactly what a severity score cannot see.

The last test was his own workflow. He took a live Apache Tomcat CVE from his queue and ran Konvu against it, comparing the step-by-step analysis and dependency graph to what he would produce by hand, and to what a general-purpose AI assistant produced for the same ticket. Konvu's analysis held up. He had recently spent a full Sunday manually working through CVEs. This was the workflow he saw Konvu replacing.

What the evidence showed

Across the findings Konvu assessed, the severity mix looked like any enterprise backlog: 8% critical, 41% high, 36% medium, and 15% low. Read by score alone, half the queue demanded panic.

Figure: The assessment flow: coverage, CVSS severity, Konvu's verdicts, and where each finding ended up. (Sankey chart: 100% of findings assessed; split by CVSS severity into 8% critical, 41% high, 36% medium, and 15% low; assessed by Konvu into 96% false positives and 4% exploitable; ending in 19% open, 23% fixed, and 58% dismissed.)

The verdicts told a different story. 96% of findings were false positives, each dismissed with a written explanation of why no attack path exists in the product. 4% were exploitable and worth engineering time.

The detail that mattered most sat at the bottom of the severity range. Several of the exploitable findings were low and medium severity CVEs. Under the 7.0 rule, nobody would ever have opened those tickets. The score-based process burned hours on noise above the line and hid real risk below it.
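The two failure modes of the 7.0 rule can be made concrete with a minimal Python sketch. Everything in it is illustrative: the findings, identifiers, scores, and verdicts are invented and do not come from the assessment, and `Finding`, `score_based_queue`, and `compare` are hypothetical names. The `exploitable` flag stands in for a verdict reached by investigation, which no score can compute. The sketch also assumes a score of exactly 7.0 counts as above the line.

```python
from dataclasses import dataclass

# The cut-off described in the case study. Treating exactly 7.0 as
# "above the line" is an assumption of this sketch.
CVSS_THRESHOLD = 7.0


@dataclass(frozen=True)
class Finding:
    finding_id: str    # hypothetical identifier, not a real CVE
    cvss: float        # CVSS base score reported by the scanner
    exploitable: bool  # verdict from investigation (invented sample data)


def score_based_queue(findings, threshold=CVSS_THRESHOLD):
    """Return the findings a score-only process would analyze."""
    return [f for f in findings if f.cvss >= threshold]


def compare(findings, threshold=CVSS_THRESHOLD):
    """Count noise above the threshold and real risk hidden below it."""
    analyzed = score_based_queue(findings, threshold)
    noise_above = [f for f in analyzed if not f.exploitable]
    hidden_below = [
        f for f in findings if f.cvss < threshold and f.exploitable
    ]
    return {
        "analyzed": len(analyzed),
        "noise_above": len(noise_above),
        "hidden_below": len(hidden_below),
    }


# Invented sample backlog, for illustration only.
SAMPLE = [
    Finding("F-1", 9.8, False),
    Finding("F-2", 8.1, False),
    Finding("F-3", 7.5, True),
    Finding("F-4", 5.3, True),
    Finding("F-5", 3.1, False),
]

if __name__ == "__main__":
    print(compare(SAMPLE))
```

On this invented sample, the score-only queue contains three findings, two of which are noise, while one exploitable finding sits below the threshold and is never opened.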

“Engineers love the evidence. They see that, and it ends many conversations.”
Senior Product Security Engineer

By the end of the assessment, 58% of the backlog was dismissed with the evidence attached to each ticket, 23% was fixed, and 19% remained open with an owner. A queue that had only ever grown started to shrink.

The target they set for the rollout was direct: cut triage from an hour per CVE to under ten minutes, with every verdict carrying evidence a developer can check instead of a score they can argue with.
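For a rough sense of scale, the target can be turned into annual analyst hours. This is a back-of-the-envelope Python sketch, not a measured result. It assumes every one of the roughly 2,000 findings he analyzes in a year costs the full hour quoted for high-severity CVEs; the source gives no figure for lower severities, so this is an upper bound. It also takes the "under ten minutes" target at exactly ten minutes. `annual_triage_hours` is a hypothetical helper.

```python
def annual_triage_hours(findings_per_year: int, minutes_per_finding: float) -> float:
    """Total analyst hours per year spent on triage."""
    return findings_per_year * minutes_per_finding / 60


FINDINGS_PER_YEAR = 2_000  # "about 2,000 of them a year" in the case study
CURRENT_MINUTES = 60       # assumption: every finding costs the high-severity hour
TARGET_MINUTES = 10        # assumption: the "under ten minutes" target at its limit

current = annual_triage_hours(FINDINGS_PER_YEAR, CURRENT_MINUTES)
target = annual_triage_hours(FINDINGS_PER_YEAR, TARGET_MINUTES)

print(f"current: {current:.0f} h/year")
print(f"target:  {target:.0f} h/year")
print(f"saved:   {current - target:.0f} h/year")
```

Under these assumptions, triage drops from about 2,000 hours a year to about 333.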

After the POC, the team rolled Konvu out across more of the portfolio. The rollout was the easy part: connect a repository, and the agents work through its backlog the same way they worked through the first one. Coverage no longer costs analyst hours.

Why it worked

Konvu did not replace Black Duck, and it did not add another dashboard. The scanner keeps finding. Konvu's agents do the investigation step that used to cost an hour per ticket: they determine whether each finding is exploitable in the product as it is actually built, then write the verdict and the evidence back into the tools the team already uses.

Exploitability is a different question from severity, and a different question from reachability. A reachable function is not necessarily exploitable. A CVE below 7.0 is not necessarily safe. Evidence, not scores, is what ends the back and forth.
