Thanks! I tried hackmyclaw (though not extensively) without success. The challenge is a complete black box, and the user intent (e.g., "summarize my emails") is not known, which is critical for crafting the prompt injection payload. I also suspect that batch processing of "malicious" emails (every 3 hours) biases the model's behaviour, since many potential and detected prompt injection payloads end up in the context. That's why I always start my experiments with a fresh context. Moreover, "hacking" the VPS is not allowed. IMHO the author should disclose more info about the setup (version, user intent, exact config) to make it more realistic. I have read people saying "OpenClaw is secure against prompt injection" because nobody was able to solve the challenge. It's not.

This is cool stuff, have you considered submitting any of these exploits to https://hackmyclaw.com/? Email being the only allowed injection vector might be tricky though.

Some prompt injection experiments with OpenClaw and GPT-5.4. Last part of the BrokenClaw series.

Show HN: BrokenClaw Part 5: GPT-5.4 Edition (Prompt Injection)

Comments

feznyng

veganmosfet