
The paper explores the capability of Large Language Models (LLMs) to autonomously hack websites. It demonstrates that some models (especially GPT-4) can perform complex cybersecurity attacks, such as blind database schema extraction and SQL injections, without human intervention or prior knowledge of the vulnerabilities.

Method Overview

The method creates an agent from an LLM with function calling and extends its abilities in three ways: it gives the LLM access to a headless web browser, namely Playwright, which allows programmatic access to websites; it adds 7 web-hacking documents, extracted unmodified from online sources, to the LLM's knowledge; and, through prompting, it gives the LLM the ability to plan. The authors intentionally withhold the exact steps and prompts used, since the goal of the paper is to raise awareness about the possibility of such attacks rather than to provide a recipe. Also, in order not to disrupt real-world systems, the authors perform these attacks on sandboxed websites and test for 15 vulnerabilities of varying difficulty.
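Because the concrete prompts and agent scaffolding are withheld, the general setup the paper describes (an LLM with function calling driving a Playwright headless browser inside a plan-and-act loop) can only be sketched roughly, as below. Everything specific here, including the `goto` tool, the system prompt, and the loop structure, is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch only. The paper does not publish its prompts, tool
# definitions, or agent framework; the tool name ("goto"), system prompt, and
# loop below are assumptions used to show the general shape of such an agent.
import json

from openai import OpenAI
from playwright.sync_api import sync_playwright

client = OpenAI()

# A single hypothetical browser tool exposed to the model via function calling.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "goto",
        "description": "Navigate the headless browser to a URL and return the page HTML.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}]

def run_agent(task: str, max_steps: int = 10) -> None:
    with sync_playwright() as p:
        page = p.chromium.launch(headless=True).new_page()
        messages = [
            # Planning is induced purely through prompting; the real prompt is withheld.
            {"role": "system", "content": "You are a web agent. Plan your next steps, then act through the available tools."},
            {"role": "user", "content": task},
        ]
        for _ in range(max_steps):
            resp = client.chat.completions.create(model="gpt-4", messages=messages, tools=TOOLS)
            msg = resp.choices[0].message
            messages.append(msg)
            if not msg.tool_calls:  # no more browser actions requested: report and stop
                print(msg.content)
                return
            for call in msg.tool_calls:  # execute each requested browser action
                args = json.loads(call.function.arguments)
                page.goto(args["url"])
                # Feed (truncated) page HTML back to the model as the tool result.
                messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": page.content()[:5000],
                })
```

The paper's agent would additionally have the 7 hacking documents available as context and a richer set of browser actions; both are omitted here for brevity.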

Results

The study found that GPT-4 successfully hacked 73.3% of the tested vulnerabilities, showcasing the significant offensive capabilities of LLMs. The results underline a scaling law: the success rate drops sharply with smaller models. For example, GPT-3.5 can only correctly execute a single SQL injection and fails on every other task, including simple and widely known attacks such as XSS and CSRF. Also, all open-source models tested by the authors currently fail to perform these attacks.

Figure: Hacking abilities of different agents

Conclusion

The findings highlight the potential risks associated with the deployment of highly capable LLMs, as they can autonomously find and exploit web vulnerabilities. More details in the paper.

Congrats to the authors for their work!

Fang, Richard, et al. "LLM Agents can Autonomously Hack Websites." arXiv preprint arXiv:2402.06664 (2024).
