How I Hardened a Production Server with Claude Code
Most server hardening guides are checklists. Install a firewall, disable root login, set up fail2ban, done. The problem is not that those steps are wrong. They are necessary. But they represent maybe 30% of what actually matters, and the remaining 70% is the stuff nobody writes about because it requires understanding the specific server in front of you, not a generic Ubuntu box from a tutorial.
I run a VPS that hosts a Mailcow mail server for multiple domains, three static websites including this blog, a Plausible Analytics instance, and two Telegram bots. The basics were already in place: UFW with default-deny, SSH on a non-standard port with key-only authentication, fail2ban, AppArmor, AIDE for integrity monitoring, and automatic security updates. By most checklists, this server was “done.”
Then I looked at the fail2ban logs.
Table of Contents
- 89 Bans and a Question
- Five Agents, One Audit
- What the Checklist Missed
- Docker Does Not Care About Your Firewall
- The Ports Nobody Thinks About
- Nginx Is Not Secure by Default
- SSH Is Harder Than You Think
- The Small Things That Add Up
- The Audit Prompt
- Lessons Learned
- Frequently Asked Questions
89 Bans and a Question
My nginx-exploit-scan jail had banned 89 IP addresses in one week. I had written that filter myself to catch the usual background noise: bots scanning for .env files, WordPress exploits, exposed .git directories. The filter was working. But 89 bans in a week made me wonder what I was not catching.
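For reference, a filter like this boils down to a couple of regexes over the nginx access log. This is a simplified sketch, not my exact filter; the jail name and the matched paths are illustrative:

```ini
# /etc/fail2ban/filter.d/nginx-exploit-scan.conf (simplified sketch)
[Definition]
# Match requests for secrets, VCS metadata, and common WordPress exploit paths
failregex = ^<HOST> -.*"(GET|POST|HEAD) [^"]*(\.env|\.git/|wp-login\.php|wp-content/)[^"]*"
ignoreregex =
```

You can dry-run a filter against a real log with fail2ban-regex /var/log/nginx/access.log /etc/fail2ban/filter.d/nginx-exploit-scan.conf before wiring it into a jail.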
The top requests were exactly what you would expect from automated vulnerability scanners sweeping IP ranges. Over 500 hits on a known WordPress file upload exploit path. Over 300 attempts to grab /.env. Nearly 150 tries at /.git/config. Hundreds of variations like .env.local, .env.bak, .env.production, /api/.env, /backend/.env. The scanners were thorough, trying every path convention from every popular framework.
About 60% of the traffic came from cloud provider IPs, likely compromised VMs or rented scan infrastructure. Another cluster of seven IPs from a single /24 block looked like a dedicated scanning operation. The rest was the usual mix of hosting providers known for abuse traffic.
None of this was getting through. My server does not run WordPress, does not expose .env files, and does not have .git in any webroot. But the volume of automated scanning raised a simple question: if these bots are testing everything, am I sure my server has no blind spots?
I decided to find out.
Five Agents, One Audit
Instead of working through yet another hardening checklist, I used Claude Code to perform a full security audit. Not a single-pass scan, but five parallel agents, each focused on a different attack surface:
- Nginx and web exposure - configuration review, security headers, exposed files, SSL status
- Ports, firewall, and services - open ports, firewall rules, Docker networking, running services
- SSH and user accounts - sshd_config review, key management, user privileges, cron jobs
- System hardening - kernel parameters, file permissions, updates, integrity monitoring
- Docker and Mailcow - container security, mail server configuration, TLS policies, backups
Each agent had a specific brief about what to check and reported findings categorized by severity. They ran simultaneously, reading configuration files, inspecting running services, checking file permissions, and cross-referencing what they found. The whole audit completed in about four minutes.
The result was 22 findings across three severity levels. Some were obvious once pointed out. Others were things I had never considered.
What the Checklist Missed
The most important findings were about backup coverage. My Restic backup ran daily with proper retention policies, but the mail data volume was not included in the backup scope. The server configs were backed up, the databases were dumped and archived, but the Docker volume containing every actual mailbox was simply missing. I had been running this setup for weeks without noticing the gap.
These are the kinds of findings that do not show up on hardening checklists because they are not about hardening. They are about understanding what your server actually does and whether your safety net covers all of it. I fixed the backup scope and added off-site replication the same day.
The remaining findings were more interesting from a hardening perspective. They fell into patterns that I suspect are common on a lot of well-intentioned but incompletely hardened servers.
Docker Does Not Care About Your Firewall
This was the finding that surprised me most, even though the behavior is well-documented. My UFW configuration was clean: default-deny incoming, only the ports I needed explicitly allowed. IPv4 and IPv6 rules consistent. Textbook setup.
But Docker does not use UFW. When you publish a port in a Docker Compose file, Docker writes its own iptables rules that bypass UFW entirely. My mail server ports were all publicly reachable, not because UFW allowed them, but because Docker routed around UFW through its own FORWARD chain.
Docker provides the DOCKER-USER chain specifically for this problem. It is the one place where you can insert rules that Docker will respect. On my server, this chain was empty. No rules, no filtering. Any port that Docker published was wide open to the internet, regardless of what UFW said.
The fix was straightforward: populate the DOCKER-USER chain with a whitelist that only allows the specific ports that need to be public, accepts established connections, and drops everything else. Then make those rules persistent so they survive a reboot.
# Allow only the mail ports that need to be public
iptables -I DOCKER-USER -i eth0 -p tcp -m multiport --dports 25,465,587,993,995 -j ACCEPT
# Allow established connections
iptables -I DOCKER-USER 2 -i eth0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# Drop everything else from outside
iptables -A DOCKER-USER -i eth0 -j DROP
# Make rules persistent (requires the iptables-persistent package)
netfilter-persistent save
The important part is not the specific iptables commands. It is knowing that this problem exists at all. If you are running Docker behind UFW and you have not configured the DOCKER-USER chain, your firewall has a hole in it the size of every published container port.
The Ports Nobody Thinks About
Mailcow, like most mail server stacks, publishes ports for every mail protocol: SMTP, SMTPS, Submission, IMAP, IMAPS, POP3, POP3S. The encrypted variants, IMAPS on 993 and POP3S on 995, are what modern mail clients use. But the unencrypted ports, IMAP on 143 and POP3 on 110, were also bound to all interfaces, meaning they were publicly accessible.
This matters because anyone connecting to those unencrypted ports could potentially authenticate with credentials in plain text. No TLS required. On a server that otherwise enforced TLS 1.2+ everywhere, this was a gap that existed simply because the defaults include legacy ports for maximum compatibility.
No modern mail client needs port 110 or 143. The fix was changing two lines in the Mailcow configuration to bind those ports to localhost instead of all interfaces. After a container restart, the unencrypted ports were only reachable internally, which is all the containers need for inter-service communication, while the encrypted ports remained public.
# In mailcow.conf - bind plaintext ports to localhost only
IMAP_PORT=127.0.0.1:143
POP_PORT=127.0.0.1:110
If you run a mail server, check which ports are actually bound to 0.0.0.0 versus 127.0.0.1. You might find legacy protocols exposed that nobody is using.
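A quick way to see the split, assuming a Linux host with iproute2 installed, is to list listening sockets and keep only the wildcard binds:

```shell
# Show the header plus every listening TCP socket bound to all interfaces
ss -tlnp | awk 'NR==1 || $4 ~ /^(0\.0\.0\.0|\*|\[::\])/'
```

Anything in that output you did not consciously decide to expose deserves a second look.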
Nginx Is Not Secure by Default
Nginx does a lot of things well out of the box, but security headers are not one of them. One of my sites had a complete set of security headers because I had explicitly configured them. The other sites did not, because I had not copied the configuration over.
The audit found multiple sites with no security headers at all. A site without HSTS means a browser will happily load it over HTTP if an attacker downgrades the connection. A web application without X-Frame-Options is vulnerable to clickjacking. These are not theoretical risks. They are the exact things that automated security scanners test for, and they are trivial to fix once you know they are missing.
# The minimum set of security headers every site should have
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Permissions-Policy "camera=(), microphone=(), geolocation=()" always;
The other Nginx finding was more subtle. One reverse proxy configuration had client_max_body_size set to 0, which means no limit. Someone could send a multi-gigabyte request body and Nginx would try to buffer the entire thing. The backend application has its own limits, but Nginx would happily consume memory and disk before the request ever reached it. Setting an appropriate limit closes that gap.
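The directive itself is one line; 25m below is an illustrative value, not a recommendation, so pick whatever your backend actually expects:

```nginx
# Cap request bodies; oversized requests get a 413 instead of being buffered
client_max_body_size 25m;
```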
Rate limiting was also completely absent. No limit_req_zone anywhere in the configuration. Every endpoint, including authentication pages, would accept unlimited requests per second from a single IP. Adding rate limit zones took a few lines in the http block and one directive per location.
# In nginx.conf http block
limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=login:10m rate=3r/s;
# In server blocks
limit_req zone=login burst=10 nodelay;
None of these are advanced techniques. They are baseline configurations that Nginx does not ship with and that most tutorials skip because they are not strictly required for the site to work.
SSH Is Harder Than You Think
SSH on a non-standard port with key-only authentication and AllowUsers is a strong starting point. But the audit found issues I had not considered.
The default MAC algorithms included hmac-sha1 and umac-64, both cryptographically outdated. They are not broken today, but they are weaker than the alternatives that the same SSH server already supports. Restricting MACs to the SHA-2 family and UMAC-128 took one line in sshd_config and removed algorithms that no modern client needs.
# Only allow strong MACs
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256
TCP forwarding was enabled, which means anyone with SSH access could use the server as a tunnel to reach internal services bound to localhost. Things like databases, caching layers, or admin dashboards that are correctly firewalled from the outside become reachable through an SSH tunnel. With only one user and key-only authentication the immediate risk was low, but a compromised SSH key would give an attacker not just shell access but also a network pivot point to every internal service. Disabling it when you do not need it removes that option entirely.
The LoginGraceTime was at the default of 120 seconds. That is two minutes for an unauthenticated connection to sit open before the server drops it. With key authentication, the handshake takes less than a second. Reducing this to 15 seconds limits how many half-open connections an attacker can maintain.
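Both changes are single directives in sshd_config:

```
# /etc/ssh/sshd_config - remove the pivot option, shorten the unauthenticated window
AllowTcpForwarding no
LoginGraceTime 15
```

Run sshd -t to validate before reloading, and keep your current session open until a fresh login succeeds, so a typo cannot lock you out.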
There was also an old authorized_keys file for root with a valid key in it, even though root login was disabled. The key was inert because PermitRootLogin was set to no. But if anyone toggled that setting during maintenance and forgot to revert it, there would be immediate passwordless root access waiting. Cleaning up stale key files is a two-second task that removes a latent risk.
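Auditing for stale keys is one command; this sweeps root's home plus every user home directory:

```shell
# List every authorized_keys file so stale entries can be reviewed
find /root /home -name authorized_keys -type f 2>/dev/null
```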
The Small Things That Add Up
Some findings were individually minor but collectively painted a picture of gaps that accumulate on any server that was not audited systematically.
The kernel had log_martians configured correctly in the sysctl config file, but the runtime value had been silently overwritten. Docker modifies kernel parameters during startup and can undo your hardening without any warning. The fix was a systemd drop-in that re-applies the correct value after Docker starts. Without checking both the configuration and the runtime state, you would never know they had diverged.
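A sketch of such a drop-in, with an illustrative file name (the sysctl binary lives at /sbin/sysctl on some distributions):

```ini
# /etc/systemd/system/docker.service.d/restore-sysctl.conf
[Service]
# Re-apply the hardened value after Docker has finished touching the kernel
ExecStartPost=/usr/sbin/sysctl -w net.ipv4.conf.all.log_martians=1
```

After creating the file, run systemctl daemon-reload and restart Docker, then confirm the runtime value with sysctl net.ipv4.conf.all.log_martians.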
Two desktop services, a disk manager for USB hotplugging and a battery management daemon, were running on a headless server. They are not vulnerabilities in themselves, but every unnecessary service is unnecessary attack surface and unnecessary resource consumption.
The spam filtering stack was mostly well-configured, but outbound rate limiting was disabled. A compromised mailbox could have sent unlimited spam before anyone noticed. The Spamhaus blocklist key was also empty, which meant the server was making unauthenticated queries that could be throttled or blocked at any time. Both are things that work fine during initial testing but become problems under real-world conditions.
Database dumps from the backup script were written to a world-readable temporary directory before being archived. Every process on the system could briefly read a full database dump. Moving the dump location to a restricted directory was a one-minute fix for a data exposure window that recurred on every backup run.
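The mechanics are trivial; the path below is illustrative, use whatever your backup script expects:

```shell
# Create a root-only staging directory for database dumps
install -d -m 700 /var/backups/db-dumps
# Then point the backup script's dump target here instead of /tmp
```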
SOGo, the webmail interface, had a session timeout of eight hours. A forgotten browser tab on an open WiFi network would stay authenticated for an entire workday. Reducing it to two hours was a single configuration change.
HTTP/2 was missing on most sites. Not a security issue, but a free performance improvement that required adding one keyword to each listen directive. DNSSEC and DNS-over-TLS were disabled on the system resolver, leaving DNS queries in plain text. Enabling both in opportunistic mode adds protection without breaking resolution if the upstream server does not support it.
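On the DNS side, assuming the system resolver is systemd-resolved, opportunistic mode for both is two lines:

```ini
# /etc/systemd/resolved.conf
[Resolve]
DNSOverTLS=opportunistic
DNSSEC=allow-downgrade
```

Restart systemd-resolved afterwards; in these modes both settings fall back gracefully if the upstream server does not support them.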
The Audit Prompt
If you want to run a similar audit on your own server, here is the approach. The key is giving each agent a detailed brief about what to check, not just “audit my server.”
The five focus areas that covered everything on my server:
- Web server configuration - reverse proxy configs, security headers, exposed files in webroots, SSL certificates, rate limiting, request size limits
- Network and firewall - open ports with process mapping, firewall rules, Docker networking and port bindings, IPv4 and IPv6 consistency
- SSH and access control - sshd_config review including algorithms and forwarding settings, authorized keys audit across all users, sudo configuration, cron jobs
- System hardening - OS updates, kernel sysctl parameters, SUID binaries, file permissions, AppArmor or SELinux status, integrity monitoring, logging
- Application-specific - whatever you are running. In my case that meant mail server TLS policies, DKIM and DMARC configuration, spam filter tuning, Docker daemon hardening, and backup coverage
Run the agents in parallel and have each one report findings categorized by severity with specific file paths and recommended fixes. The full audit completes in minutes.
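As a starting point, a condensed version of the kind of brief each agent gets might look like this, shown for the network agent; adapt the specifics to your own stack:

```text
You are auditing the network and firewall surface of a production Linux server.
Check: all listening ports mapped to their owning process; UFW rules versus the
actual iptables state, including the DOCKER-USER chain; Docker port bindings
(0.0.0.0 vs 127.0.0.1); IPv4/IPv6 consistency.
Report each finding as: severity (critical/high/medium/low), affected file or
service, evidence, and a concrete recommended fix. Do not change anything.
```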
The value is not in any individual check. Most of these are things you could look up. The value is in the coverage. A systematic audit catches the interaction effects and configuration drift that no single checklist covers.
Lessons Learned
The basics are necessary but not sufficient. A firewall, key-only SSH, and fail2ban are the foundation. They are not the building. This audit found 22 issues on a server that had all three configured correctly.
Docker and your firewall are probably not friends. If you run Docker behind UFW and have not configured the DOCKER-USER chain, you should check your exposed ports right now. This is the single most impactful finding for anyone running Docker on a server with a host-level firewall.
Defaults are not secure defaults. Nginx ships without security headers. SSH includes legacy algorithms. Mail servers publish unencrypted ports. Spam filters disable rate limiting. Each default makes sense for compatibility, but on a production server, compatibility with legacy protocols you do not use is just attack surface you do not need.
Configuration and runtime can diverge. A sysctl value was correct in the config file and wrong at runtime because another service overwrote it during startup. Without checking both, you would never know. Any service that modifies kernel parameters at startup can silently undo your hardening.
Backups are security infrastructure. If your server cannot be fully restored, it can be held hostage. Backup coverage is not a devops concern separate from security. It is a security concern.
Parallel agents are the right model for audits. Five focused agents covering different attack surfaces in parallel found things that a single sequential pass would have missed or deprioritized. The cross-domain coverage, where one agent checks Docker networking while another checks firewall rules, catches the gaps between responsibilities.
Security is not a state, it is a process. I ran this audit on a server I had been actively maintaining for weeks. The findings were not from neglect. They were from the natural accumulation of defaults, assumptions, and blind spots that every server develops over time. The only way to catch them is to look systematically, and then look again.
Frequently Asked Questions
- Q: Can Claude Code perform a real security audit?
- A: Claude Code can read configuration files, inspect running services, check file permissions, and cross-reference findings across multiple system layers. It is not a replacement for professional penetration testing, but it catches configuration drift, missing hardening, and blind spots that manual checklists miss. Running multiple focused agents in parallel gives broad coverage in minutes.
- Q: Is it safe to give an AI tool access to a production server?
- A: The same caution applies as with any administrative tool. Review what the agent proposes before applying changes, test configuration changes before reloading services, and keep backups current. Claude Code shows you every command before it runs, so you maintain control over what actually executes.
- Q: Does Docker really bypass UFW?
- A: Yes. Docker writes its own iptables rules in the FORWARD chain, which UFW does not manage. Published container ports are reachable from the internet even if UFW has no rule allowing them. The DOCKER-USER chain is the intended place to add filtering rules that Docker will respect.
- Q: How long does a full server audit take with Claude Code?
- A: Running five parallel agents covering different attack surfaces, the audit completed in about four minutes. Applying the fixes took longer because each change required testing and service reloads, but the discovery phase is fast.
- Q: What should I audit first on my own server?
- A: Start with Docker networking if you run containers behind a host firewall. Then check which ports are actually listening on 0.0.0.0 versus 127.0.0.1. Finally, review your Nginx or reverse proxy configuration for missing security headers and rate limiting. These three areas account for the majority of findings on most servers.