
Episode Summary
AI is starting to change penetration testing, but most people are asking the wrong question. In this episode of Secured, Cole Cornford sits down with Brendan Dolan-Gavitt, AI researcher at XBOW and former NYU professor, to unpack what autonomous pen testing really is, what it can reliably do today, and what still needs humans.
They explore why AI agents are great at scaling the boring parts of testing, like authenticated workflows and broad vulnerability coverage across huge attack surfaces, and why that does not automatically translate to deep, context-aware exploitation. The conversation also gets into the messy parts: AI systems overclaiming “serious” findings, business logic flaws that are hard to verify, audit expectations, and why scope control needs real guardrails, not vibes. From agent traces and validation models to cost curves and creative exfiltration tricks, this episode is a grounded look at where AI helps AppSec and where it can still cause damage if you trust it too much.
Chapters:
00:00 – Intro
03:10 – From academia to building autonomous security tools
05:00 – Human pen testers vs AI agents: what is actually different
06:40 – Where AI helps most: boring tasks and low-hanging fruit
08:30 – Scale: a thousand targets vs hiring a thousand testers
10:20 – Accessibility, economics, and Jevons paradox
12:30 – Accountability: audit evidence, traces, and “who signs off”
14:40 – Scope control: avoiding prod and preventing out-of-scope actions
16:20 – Safety checkers, overseer agents, and persuasion resistance
18:40 – The cost question: VC money, inference pricing, and efficiency
21:20 – When AI wastes money and why prioritisation matters
23:50 – Failure mode: overclaiming business “vulnerabilities”
26:10 – Validation agents and adversarial peer review
28:40 – The scary clever stuff: exfiltrating files as images
31:00 – What AI finds well: XSS, SQLi, file traversal, hard proof bugs
33:10 – What AI struggles with: business logic and contextual judgement
35:20 – Hype vs skepticism and why nobody has a crystal ball





