Safety Guidelines

Hermes is built with multiple layers of defense to keep you and your data safe — even when it's acting autonomously.

The approval system

Before Hermes takes certain actions, it asks for your approval. This is a core safety feature. On Telegram, Slack, or Discord, Hermes sends you a message with the full details of what it wants to do. Reply yes to allow or no to deny. If you do not respond within 10 minutes, the action is denied automatically (fail-safe).
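The fail-safe behaviour can be pictured with a small sketch. This is not Hermes's actual code: the function name wait_for_approval and the use of a queue to deliver your reply are assumptions made for illustration. The point it shows is that only an explicit yes allows the action, while a no, any other reply, or silence past the timeout all deny it.

```python
import queue

APPROVAL_TIMEOUT_SECONDS = 10 * 60  # the 10-minute window described above


def wait_for_approval(replies, timeout=APPROVAL_TIMEOUT_SECONDS):
    """Hypothetical helper: return True only on an explicit "yes".

    `replies` is a queue.Queue that the chat integration would feed
    with the user's reply text (an assumption for this sketch).
    """
    try:
        reply = replies.get(timeout=timeout)
    except queue.Empty:
        return False  # no reply in time: deny (fail-safe)
    # Anything other than "yes" is treated as a denial.
    return reply.strip().lower() == "yes"
```

Defaulting to denial means a missed notification can never result in an action being taken on your behalf.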

The "lethal trifecta" defense

The "lethal trifecta" is the combination of access to your private data, exposure to untrusted content, and the ability to send data externally. If Hermes reads content from the web in the current turn, any action that sends data externally requires your approval, regardless of your normal approval settings. This breaks the prompt injection attack chain: malicious page → agent reads it → agent tries to email your data to attacker → you see an approval request with the full email body and can deny it.
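As a rough illustration of the rule above, the decision can be modelled as a check on per-turn state. The Turn class and requires_approval function are hypothetical names for this sketch, not part of Hermes; the sketch only assumes that each turn records whether untrusted web content was read.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical per-turn state; not Hermes's actual data model."""
    read_untrusted_content: bool = False


def requires_approval(turn, sends_data_externally, normally_requires_approval):
    # Web content was read this turn and the action sends data out:
    # force approval, overriding the normal approval settings.
    if turn.read_untrusted_content and sends_data_externally:
        return True
    # Otherwise fall back to whatever the normal settings say.
    return normally_requires_approval
```

Note that the override only ever adds an approval step. It never removes one that your normal settings would have required.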

Hardline blocks — cannot be overridden

These actions are permanently blocked regardless of any setting, approval, or instruction:

Filesystem root wipes (rm -rf /)
Fork bombs
Formatting mounted disks
Zeroing physical drives
Piping content from untrusted remote URLs into a shell running as root

There is no override flag.
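A minimal sketch of what a hardline check might look like follows. The patterns and the function name is_hardline_blocked are illustrative assumptions, not Hermes's real blocklist, and they are deliberately simplified: for example, they cannot tell whether a disk is mounted or whether a URL is trusted. The design point is that the function takes no settings, approval, or override argument.

```python
import re

# Illustrative patterns only; a real blocklist would be far more thorough.
HARDLINE_PATTERNS = (
    # rm with both -r and -f flags targeting the filesystem root
    re.compile(r"\brm\s+-(?=[a-z]*r)(?=[a-z]*f)[a-z]+\s+/\s*$"),
    # classic bash fork bomb  :(){ :|:& };:
    re.compile(r":\(\)\s*\{\s*:\|:&\s*\}\s*;\s*:"),
    # formatting a disk device (mount state is not checked in this sketch)
    re.compile(r"\bmkfs(?:\.\w+)?\s+/dev/"),
    # zeroing a drive with dd
    re.compile(r"\bdd\b.*\bif=/dev/zero\b.*\bof=/dev/"),
    # piping a download into a shell via sudo (only the sudo form is caught)
    re.compile(r"\b(?:curl|wget)\b.*\|\s*sudo\s+(?:ba)?sh\b"),
)


def is_hardline_blocked(command):
    """Return True if the command matches any hardline pattern.

    There is intentionally no parameter that could switch the check off.
    """
    return any(p.search(command) for p in HARDLINE_PATTERNS)
```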

Prompt injection awareness

When Hermes browses the web or reads documents, that content could contain hidden instructions trying to hijack your agent. Defenses include taint tracking (untrusted content flags the turn), context file scanning (project files scanned before loading), and approval modals that show the exact data about to be sent — not just the tool name.

If Hermes asks to take an unexpected action after browsing the web, read the approval details carefully. If something looks off, deny it.

Full security model

The upstream Hermes security architecture covers nine distinct layers in more detail than is summarised here. Key areas include container isolation (hardened Docker, capability dropping, resource limits), user authorization via allowlists and DM pairing codes, SSRF protection blocking private networks and cloud metadata endpoints, pre-execution scanning for homograph spoofing and pipe-to-interpreter patterns, context file injection protection, and supply-chain advisory checking for known-compromised packages.
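To make one of these layers concrete, here is a sketch of the kind of address check that SSRF protection typically relies on, written with Python's standard ipaddress module. The function name is_blocked_address is an assumption for illustration and does not come from Hermes. A real filter would also need to resolve hostnames and check the resolved address, which this sketch leaves out.

```python
import ipaddress


def is_blocked_address(host_ip):
    """Hypothetical check: True for addresses an SSRF filter would refuse."""
    ip = ipaddress.ip_address(host_ip)
    # Private ranges (e.g. 10.0.0.0/8), loopback, and link-local addresses.
    # Cloud metadata endpoints such as 169.254.169.254 are link-local.
    return ip.is_private or ip.is_loopback or ip.is_link_local
```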

Full reference: Hermes Security Model — all nine layers, configuration options, and production deployment checklist.