Show HN: BrokenClaw Part 5: GPT-5.4 Edition (Prompt Injection)

9 points
a day ago
by veganmosfet

Comments


feznyng

This is cool stuff, have you considered submitting any of these exploits to https://hackmyclaw.com/? Email being the only allowed injection vector might be tricky though.

a day ago

veganmosfet

Thanks!

I did try hackmyclaw (not extensively), but had no success. The challenge is a complete black box, and the user intent (e.g., "summarize my emails") is not known, even though it is critical for crafting the prompt injection payload. I also suspect that batch processing of "malicious" emails (every 3 hours) biases the model's behaviour, since many potential and detected prompt injection payloads end up in the same context. That's why I always start my experiments with a fresh context. Moreover, "hacking" the VPS is not allowed.

IMHO the author should disclose more info about the setup (version, user intent, exact config) to make it more realistic. I've read people saying "OpenClaw is secure against prompt injection" because nobody was able to solve the challenge. It's not.

15 hours ago