Deny access from certain user agents on Nginx

When checking your nginx logs, chances are you'll see repeated attempts to access certain URLs. Some of these requests come from people mistyping a URL, some from bots, and some from software that automatically scans your site for vulnerabilities, administrative pages and so on. A small step towards better protecting your site is to deny these automated probes. Often their operators don't even bother to change the malware's default user agent.

There are a couple of measures you can take to make it harder for bots to scan your site for vulnerabilities. First things first: you need to check or monitor your site's logs. This article uses nginx; if you use Apache, the same measures apply, but the actual configuration will differ.

1. Check the logs

Open the nginx access log for your site. Its location depends on your distribution and your nginx configuration (often somewhere under /var/log/nginx/). This is an example request from "Jorgee", which kindly identifies itself:

x.y.z.w - - [24/Oct/2017:14:17:20 +0200] "HEAD https://y.z.w.x:80/PMA2012/ HTTP/1.1" 301 0 "-" "Mozilla/5.0 Jorgee"

Jorgee is a well-known vulnerability scanner.
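Reading the log by eye gets tedious on a busy site. Below is a minimal Python sketch that counts requests per user agent, assuming nginx's default "combined" log format, where the user agent is the last double-quoted field. The function names are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Assumes nginx's default "combined" log format, where the user agent is the
# last double-quoted field. nginx escapes quotes inside fields as \x22, so a
# field never contains a literal double quote.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Count requests per user agent in an iterable of access log lines."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:  # skip lines that don't end in a quoted field
            counts[match.group(1)] += 1
    return counts


def top_user_agents(path, n=10):
    """Return the n most frequent (user_agent, count) pairs in a log file."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return count_user_agents(log).most_common(n)
```

For example, top_user_agents("/var/log/nginx/access.log") lists the ten busiest user agents; the path is an assumption and depends on your setup. Scanners like Jorgee tend to stand out quickly.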

Warning

Blocking the originating IPs is not always a good solution, as it can affect users who have nothing to do with the attack (for example, when many users share one address behind NAT or a proxy).


2. Block the user agent

To block the user agent, edit the server stanza for your site:

vi /etc/nginx/sites-enabled/mysite
...
server {
    ...
    # case insensitive matching
    if ($http_user_agent ~* (netcrawl|npbot|malicious|LWP::Simple|BBBike|wget|jorgee)) {
        return 403;
    }
}

To block more user agents, add a pipe symbol (|) after jorgee followed by the user agent's name. Then check the config and reload nginx:

nginx -t
systemctl reload nginx

Afterwards, check your site to make sure you didn't make a booboo.
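To double-check which user agents the pattern catches without sending test requests, you can replay it offline. This is a minimal Python sketch. It assumes Python's re module behaves like nginx's PCRE for this simple alternation of plain words, which it does here but not for every regex. AGENTS, PATTERN and is_blocked are illustrative names.

```python
import re

# The alternatives from the nginx `if` condition. Each entry is a regex
# fragment, copied exactly as it appears in the nginx config.
AGENTS = ["netcrawl", "npbot", "malicious", "LWP::Simple", "BBBike", "wget", "jorgee"]

# Rebuilds the nginx pattern; paste PATTERN back into the config after editing AGENTS.
PATTERN = "(" + "|".join(AGENTS) + ")"

# re.IGNORECASE mirrors nginx's case-insensitive "~*" operator.
BLOCKED = re.compile(PATTERN, re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if the nginx rule above would answer this user agent with 403."""
    # "~*" is an unanchored match: the pattern may occur anywhere in the string.
    return BLOCKED.search(user_agent) is not None
```

Because the match is unanchored and case-insensitive, wget in the list also blocks anyone fetching pages from your site with the wget tool. That may or may not be what you want.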


There are a few more simple measures you can take.


3. Block access by IP address

If you check the logs, you might find scripts accessing your site by its IP address instead of its host name. Another simple way to exclude those is to add a server stanza that matches requests for a bare IP address and denies access:

vi /etc/nginx/sites-enabled/mysite

server {
    listen 80;
    server_name "~^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$"; # catch ip address
    return 444;
}
...

Note

Mind that 444 is not an official HTTP status code but an nginx-specific one: nginx closes the connection without sending a response. To be more in line with the standards, you could return a more appropriate HTTP status code instead.

The nginx documentation on code 444:

... and a special nginx’s non-standard code 444 is returned that closes the connection.

You could return another 4xx HTTP status code instead; see Wikipedia for more information:

  • 400 Bad Request: The server cannot or will not process the request due to an apparent client error (e.g., malformed request syntax, size too large, invalid request message framing, or deceptive request routing).

  • 401 Unauthorized: Similar to 403 Forbidden, but specifically for use when authentication is required and has failed or has not yet been provided. The response must include a WWW-Authenticate header field containing a challenge applicable to the requested resource. See Basic access authentication and Digest access authentication. 401 semantically means "unauthenticated", i.e. the user does not have the necessary credentials. Note: Some sites issue HTTP 401 when an IP address is banned from the website (usually the website domain) and that specific address is refused permission to access a website.

  • 403 Forbidden: The request was valid, but the server is refusing action. The user might not have the necessary permissions for a resource, or may need an account of some sort.
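To see which Host values the IP-address stanza above catches, you can test its server_name regex offline. This is a minimal Python sketch. It assumes that nginx compares server names against the Host value with any ":port" suffix already removed. is_bare_ipv4_host is an illustrative name.

```python
import re

# The server_name regex from the IP-address stanza, minus nginx's leading "~".
# Assumption: nginx compares server names against the Host value with any
# ":port" suffix already removed, so "203.0.113.7:80" is tested as "203.0.113.7".
IP_HOST = re.compile(r"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$")


def is_bare_ipv4_host(host):
    """Return True if this Host value would be caught by the IP-address stanza."""
    return IP_HOST.search(host) is not None
```

The pattern is deliberately loose. It also accepts impossible addresses such as 999.1.1.1, which is harmless here, but it does not catch IPv6 literals such as [2001:db8::1].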

4. Default site

Lastly, we can add a catch-all server stanza for all other requests, i.e. those that no other server stanza on port 80 matches:

vi /etc/nginx/conf.d/default.conf
server {
    listen 80 default_server;
    server_name _; # This is just an invalid value which will never trigger on a real hostname.
    access_log /var/log/nginx/default.access.log;
    error_log /var/log/nginx/default.error.log;
    server_name_in_redirect off;

    root  /var/www/html;

    location ^~ / {
       return 444;
    }
}

Note

To quote the nginx documentation on the underscore after server_name: There is nothing special about this name, it is just one of a myriad of invalid domain names which never intersect with any real name. Other invalid names like “--” and “!@#” may equally be used.
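How the stanzas from steps 3 and 4 interact follows from the order in which nginx picks a server stanza for a request on a given port. This is a simplified Python sketch of that lookup. It models only exact names and regular-expression names. nginx also supports wildcard names such as *.example.com, and this sketch ignores them. The host names and labels in the example are hypothetical.

```python
import re


def select_server(host, exact_names, regex_names, default_server):
    """Pick a server stanza for a Host value, roughly as nginx does on one port.

    exact_names:  dict mapping lowercase exact server names to a label
    regex_names:  list of (pattern, label) pairs, in config file order
    Simplification: wildcard names such as *.example.com are not modelled.
    """
    host = host.lower()  # server name matching is case-insensitive
    if host in exact_names:  # 1. exact names win
        return exact_names[host]
    for pattern, label in regex_names:  # 2. first matching regex, in order
        if re.search(pattern, host):
            return label
    return default_server  # 3. nothing matched: the default_server handles it
```

For example, with exact names {"example.com": "mysite"} and the IP regex from step 3, a request for 203.0.113.7 lands in the IP stanza, and one for scanner.invalid falls through to the catch-all. This is also why the name _ needs no special meaning: it never matches anything, so the default_server flag does all the work.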

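This is an example of a request logged with the above config. The leading \x16\x03\x01 bytes suggest a TLS handshake sent to the plain-HTTP port. nginx cannot parse it as an HTTP request and answers 400 Bad Request before the return 444 rule is reached: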

x.y.z.w - - [24/Oct/2017:18:09:32 +0200] "\x16\x03\x01\x02\x00\x01\x00\x01\xFC\x03\x03<\xF6#y\x19\xF2\xB4-\xBFc\x0B|+\xE9,\x0FZQ\xBFqhu\xA6\xBEH\xD9\xAC\x01EX$\xDE\x00\x00\xDA\x00\x05\x00\x04\x00\x02\x00\x01\x00\x16\x003\x009\x00:\x00\x18\x005\x00" 400 166 "-" "-"
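To check whether logged requests like this one are TLS handshakes, you can decode the escaped bytes and compare them with the TLS record header. This is a minimal heuristic sketch, not a TLS parser. It assumes nginx's default \xNN escaping of non-printable bytes and that the request field has already been extracted from the log line. The function names are illustrative.

```python
import re

# nginx writes non-printable bytes (and " and \) in log fields as \xNN.
ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def unescape_nginx(field):
    """Convert nginx's \\xNN log escapes back into the raw request bytes."""
    data = field.encode("ascii", errors="replace")
    return ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def looks_like_tls_client_hello(field):
    """Heuristic: does a logged request field start with a TLS ClientHello?"""
    raw = unescape_nginx(field)
    # Byte 0: record type 0x16 (handshake); byte 1: major version 0x03;
    # bytes 3-4: record length; byte 5: handshake type 0x01 (ClientHello).
    return len(raw) >= 6 and raw[0] == 0x16 and raw[1] == 0x03 and raw[5] == 0x01
```

If many such lines show up, a client is probably trying HTTPS on port 80. This is usually a scanner or a misconfigured client rather than a real visitor.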

Some other measures you could take:

  • Patch your system and follow security news. Attackers often go for the easiest target. You can slow down an attack, but often you can't stop it.
  • Make sure you have a good, tested set of backups
  • Install a monitoring solution. Do not rely on periodic manual checks; attackers have more time than you do
  • Do not expose the administrative parts of your site on the internet if possible, and if you have to, only allow access from certain IPs
  • Protect at the OS level. Check out stuff like SELinux, put the files on their own partition, ...
  • Implement kernel security measures
  • Intrusion detection
  • File integrity checker
  • If you don't have time to bother with the above, consider moving your site to a hosting provider
  • Accept that you will never win the fight against a determined attacker. That's why you need to monitor for and detect the break-in. Cleaning up takes time and needs good backups. Yeah, and afterwards it helps if you're able to find the hole in your system to prevent the same exploit.