One of the challenges of operating an organization like Mozilla is dealing with the heterogeneity of the platform. Each group is free to define its own operational practices, as long as it respects strong security rules. We don't centralize much, and when we do, we do it in a way that doesn't slow down devops.

The real challenge on the infosec side is being able to investigate infrastructures that are managed in many different ways. We look for anomalies, and one that recently received our focus is finding bad ~/.ssh/authorized_keys files.

Solving that problem involved adding new functionality to MIG's file investigation module to assert the content of files, as well as writing a little bit of Python. Not only did this method help us find files that needed updating, but it also provided a way to assert the content of authorized_keys files moving forward.

Let's dive in.

LDAP all the things!

We have a really good LDAP database, the result of tons of hard work from the folks in Mozilla IT. We use it for a lot of things, from providing a hierarchical view of Mozilla to showing your personal photo in the organization's phonebook. We also use it to store GPG fingerprints and, what interests us today, SSH public keys.

LDAP users are in charge of their keys. They have an admin panel where they can add and remove keys, to facilitate regular rotations. On the infra side, Puppet pulls the public keys from LDAP and writes them into the users' authorized_keys files. As long as LDAP is up to date and Puppet runs, authorized_keys files contain the proper keys.
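
To make that flow concrete, here is a rough sketch of it in Python rather than Puppet. This is an illustration only, not our actual Puppet code; it reuses the mozlibldap calls from the script shown at the end of this post, and the header line is a simplified assumption:

import mozlibldap

LDAP_URL = 'ldap://someplace.at.mozilla'
LDAP_BIND_DN = 'mail=ldapreadonlyuser,o=com,dc=mozilla'
LDAP_BIND_PASSWD = "readonlyuserpassphrase"

lcli = mozlibldap.MozLDAP(LDAP_URL, LDAP_BIND_DN, LDAP_BIND_PASSWD)
# iterate over enabled users that have a pubkey in LDAP
for dn, attrs in lcli.get_all_enabled_users_attr('sshPublicKey'):
    # the first RDN of the DN is the user's mail address
    user = dn.split(',', 1)[0].split('=', 1)[1]
    # find the user's home directory in LDAP
    res = lcli.query("mail="+user, ['homeDirectory'])
    home = res[0][1]['homeDirectory'][0]
    # rewrite authorized_keys with the current set of LDAP pubkeys
    with open(home+'/.ssh/authorized_keys', 'w') as fd:
        fd.write('# HEADER: This file was autogenerated\n')
        for pubkey in attrs['sshPublicKey']:
            fd.write(pubkey+'\n')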

But bugs happen, and sometimes, for various reasons, configurations don't get updated when they should be, and files go out of date. This is where we need an external mechanism to find the systems where configurations go stale, and fix them.

Asserting the content of a file

The most common way to verify the integrity of a file is by using a checksum, like a sha256sum. Unfortunately, it is very rare that a given file would always be exactly the same across the infrastructure. That is particularly true in our environment, because we often add a header with a generation date to authorized_keys files.

# HEADER: This file was autogenerated at Mon Jul 27 14:24:07 +0000 2015

That header means the checksum will change on every machine, and we cannot use a checksum approach to assert the content of a file. Instead, we need to use a regular expression.
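
A quick Python example makes the problem obvious. The two file contents below are hypothetical, but they show how the dated header breaks checksum comparison while a line-by-line regex still validates both files:

import hashlib
import re

# hypothetical contents: same pubkey, different autogeneration headers
host_a = ("# HEADER: This file was autogenerated at Mon Jul 27 14:24:07 +0000 2015\n"
          "ssh-rsa AAAAB3Nza... jvehent@mozilla.com\n")
host_b = ("# HEADER: This file was autogenerated at Tue Jul 28 09:02:51 +0000 2015\n"
          "ssh-rsa AAAAB3Nza... jvehent@mozilla.com\n")

# the checksums differ because of the header...
print(hashlib.sha256(host_a.encode('utf-8')).hexdigest() ==
      hashlib.sha256(host_b.encode('utf-8')).hexdigest())      # False

# ...but a per-line regex accepts both files
linere = re.compile(r'^((#.+)|(\s+)?|(ssh-rsa\s\S+\s.+))$')
for content in (host_a, host_b):
    print(all(linere.match(l) for l in content.splitlines()))  # True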

Content regexes have been present in MIG for a while now, and are probably the most used feature in investigations. But until recently, content regexes were limited to finding things that exist in a file, such as an IOC. To accelerate investigations, the file module would stop inspecting a file as soon as a regex matched, skipping the rest of the file.

To assert the content of a file, we need a different approach. The regex needs to verify that every line of a file matches our expectations; if even one line does not match, the file has bad content.

Introducing Macroal mode

The first part of the equation is making sure that every line in a file matches a given regex. In the file module, we introduced a new option called "macroal", which stands for Match All Content Regexes On All Lines. When activated, this mode tells the file module to continue reading the file until the end, and to flag the file only if all lines match the content regex.
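
In other words, macroal behaves like the following Python sketch. This is only an illustration of the semantics; the real implementation lives in MIG's file module, which is written in Go:

import re

def macroal(path, content_regex):
    """Flag the file only if every single line matches the regex."""
    cre = re.compile(content_regex)
    with open(path) as fd:
        for line in fd:
            if not cre.match(line.rstrip('\n')):
                # a single non-matching line rejects the whole file
                return False
    return True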

On the MIG command line, this option is exposed in the file module as the flag "-macroal". It's a boolean that is off by default.

$ mig file -t local -path /etc -name "^passwd$" -content "^([A-Za-z0-9]|-|_){1,100}:x:[0-9]{1,5}:[0-9]{1,5}:.+" -macroal

The command above finds /etc/passwd and checks that all the lines in the file match the content regex "^([A-Za-z0-9]|-|_){1,100}:x:[0-9]{1,5}:[0-9]{1,5}:.+". If they do, MIG returns a positive result on the file.

In the JSON of the investigation, macroal is stored in the options of a search:

{
    "searches": {
        "s1": {
            "paths": [
                "/etc"
            ],
            "names": [
                "^passwd$"
            ],
            "contents": [
                "^([A-Za-z0-9]|-|_){1,100}:x:[0-9]{1,5}:[0-9]{1,5}:.+"
            ],
            "options": {
                "macroal": true,
                "matchall": true,
                "maxdepth": 1
            }
        }
    }
}

But finding files whose lines all match is not yet what we want. In fact, we want the exact opposite: finding files that contain lines that do not match the content regex.

Introducing Mismatch mode

Another improvement we added to the file module is the mismatch mode. It's a simple feature that inverts the behavior of one or several parameters in a file search.

For example, if we know that all of our RHEL 6.6 systems should have a /usr/bin/sudo matching a given sha256, we can use the mismatch option to find instances where sudo does not match the expected checksum.

$ mig file -t "environment->>'ident' LIKE 'Red Hat Enterprise Linux Server release 6.6%'" \
> -path /usr/bin -name "^sudo$" \
> -sha256 28d18c50eb23cfd6ac8d39461d5479e19f6f1a5f6b839d34f2eeaf7ce8a3e054 \
> -mismatch sha256
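
Conceptually, mismatch just inverts the verdict of the check it is applied to, as in this Python sketch (again an illustration only, not MIG's actual implementation):

import hashlib

def sha256_mismatch(path, expected):
    """Report the file when its checksum does NOT match the
    expected value, instead of when it does."""
    with open(path, 'rb') as fd:
        digest = hashlib.sha256(fd.read()).hexdigest()
    return digest != expected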

Mismatch allows us to find anomalies: files that don't match our expectations. By combining macroal and mismatch in a file search, we can find files that have unexpected content. But we need one last piece: a content regex that can be used to inspect authorized_keys files.

Building regexes for authorized_keys files

An authorized_keys file should only contain three types of lines:

  1. a comment line that starts with a pound "#" character
  2. an empty line, or a line full of spaces
  3. an SSH public key

Writing a regex for the first two types is easy. A comment line is "^#.+$" and an empty line is "^(\s+)?$".

Writing a regex for SSH public keys isn't too complicated, but we need to take a few precautions. A pubkey entry has three sections separated by whitespace, and we only care about the first two. The third one, the comment, can be discarded entirely with ".+".

Next, a few things need to be escaped in the public key: pubkeys are base64 encoded, and thus include the slash "/" and plus "+" characters, which have special meaning in regexes.
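
For illustration, the escaping looks like this in Python; it is essentially the transformation the generation script at the end of this post performs on each key:

def pubkey_to_regex(entry):
    """Turn one pubkey entry into an escaped regex fragment."""
    # keep the key type and base64 blob, drop the trailing comment
    fields = entry.split(' ', 2)[:2]
    fragment = r'\s'.join(fields)
    # slash and plus appear in base64 and are regex metacharacters
    return fragment.replace('/', r'\/').replace('+', r'\+')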

Awk and Sed can do this very easily:

$ awk '{print $1,$2}' ~/.ssh/authorized_keys | grep -v "^#" | sed "s;\/;\\\/;g" | sed "s;\+;\\\+;g"

The result can be placed into a content regex and given to MIG.

$ mig file -path /home/jvehent/.ssh -name "^authorized_keys$" \
> -content "^((#.+)|(\\s+)|(ssh-rsa\\sAAAAB3NzaC1yc2EAA[...]yFDMZLFlVmQ==\\s.+))$" \
> -macroal -mismatch content

Or in JSON form:

{
    "searches": {
        "jvehent@mozilla.com_ssh_pubkeys": {
            "contents": [
                "^((#.+)|(\\s+)|(ssh-rsa\\sAAAAB3NzaC1yc2EAA[...]yFDMZLFlVmQ==\\s.+))$"
            ],
            "names": [
                "^authorized_keys$"
            ],
            "options": {
                "macroal": true,
                "matchall": true,
                "maxdepth": 1,
                "mismatch": [
                    "content"
                ]
            },
            "paths": [
                "/home/jvehent/.ssh"
            ]
        }
    }
}

Automating the investigation

With several hundred pubkeys in LDAP, automating the generation of the investigation file is a necessity. We can do so with Python and a small LDAP helper library called mozlibldap.

The algorithm is very simple: it iterates over active LDAP users and retrieves their public keys, then finds their home directories in LDAP and creates a MIG file search that asserts the content of their authorized_keys files.

The investigation JSON file gets big very quickly (2.4MB, ~40,000 lines), but still runs decently fast on target systems. A single system runs the whole thing in approximately 15 seconds, and since MIG is completely parallelized, running it across the infrastructure takes less than a minute.

Below is the Python script that generates the investigation in MIG's action v2 format.

#!/usr/bin/env python
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
# Copyright (c) 2015 Mozilla Corporation
# Author: jvehent@mozilla.com

# Requires:
# mozlibldap

from __future__ import print_function
import mozlibldap
import json
import sys

LDAP_URL = 'ldap://someplace.at.mozilla'
LDAP_BIND_DN = 'mail=ldapreadonlyuser,o=com,dc=mozilla'
LDAP_BIND_PASSWD = "readonlyuserpassphrase"


def main():
    lcli = mozlibldap.MozLDAP(LDAP_URL, LDAP_BIND_DN, LDAP_BIND_PASSWD)
    searches = {}

    # get a list of users that have a pubkey in ldap
    users = lcli.get_all_enabled_users_attr('sshPublicKey')
    for user_attr in users:
        search = {}
        user = user_attr[0].split(',', 1)[0].split('=', 1)[1]
        print("current user: "+user, file=sys.stderr)
        keys = user_attr[1]
        if len(keys) == 0:
            continue
        # build the content regex: comments and blank lines are allowed,
        # plus one alternation per authorized pubkey
        contentre = '^((#.+)|(\s+)'
        for pubkey in keys['sshPublicKey']:
            if len(pubkey) < 5 or not (pubkey.startswith("ssh")):
                continue
            pubkey = '\s'.join(pubkey.split(' ', 2)[:2])
            pubkey = pubkey.replace('/', '\/')
            pubkey = pubkey.replace('+', '\+')
            contentre += '|({pubkey}\s.+)'.format(pubkey=pubkey)
        contentre += ')$'
        search["names"] = []
        search["names"].append("^authorized_keys$")
        search["contents"] = []
        search["contents"].append(contentre)
        # look up the user's home directories; skip the user entirely
        # if LDAP has no usable home directory for them
        paths = []
        try:
            paths = get_search_paths(lcli, user)
        except Exception:
            continue
        if not paths or len(paths) < 1:
            continue
        search["paths"] = paths
        search["options"] = {}
        search["options"]["matchall"] = True
        search["options"]["macroal"] = True
        search["options"]["maxdepth"] = 1
        search["options"]["mismatch"] = []
        search["options"]["mismatch"].append("content")
        print(json.dumps(search), file=sys.stderr)
        searches[user+"_ssh_pubkeys"] = search
    # wrap all searches into a single action in MIG's v2 format
    action = {}
    action["name"] = "Investigate the content of authorized_keys for LDAP users"
    action["target"] = "status='online' AND mode='daemon'"
    action["version"] = 2
    action["operations"] = []
    operation = {}
    operation["module"] = "file"
    operation["parameters"] = {}
    operation["parameters"]["searches"] = searches
    action["operations"].append(operation)
    print(json.dumps(action, indent=4, sort_keys=True))


def get_search_paths(lcli, user):
    """Return the list of .ssh directories to search for a given user,
    based on the home directory attributes stored in LDAP."""
    paths = []
    res = lcli.query("mail="+user, ['homeDirectory', 'hgHome',
                                    'stageHome', 'svnHome'])
    for attr in res[0][1]:
        try:
            paths.append(res[0][1][attr][0]+"/.ssh")
        except Exception:
            continue
    return paths


if __name__ == "__main__":
    main()

The script writes the investigation JSON to stdout, so it needs to be redirected to a file. We can then use the MIG command line to run the investigation file.

$ ./make-pubkeys-investigation.py > /tmp/investigate_pubkeys.json
$ mig -i /tmp/investigate_pubkeys.json
[info] launching action from file, all flags are ignored
3124 agents will be targeted. ctrl+c to cancel. launching in 5 4 3 2 1 GO
Following action ID 4898767262251.status=inflight......status=completed
- 100.0% done in 34.848325918s
3124 sent, 3124 done, 3124 succeeded
server.example.net /home/bob/.ssh/authorized_keys [lastmodified:2014-05-30 04:04:45 +0000 UTC, mode:-rw-------, size:968] in search 'bob_ssh_pubkeys'
[...]
17 agents have found results

In conclusion

When maintaining the security of a large infrastructure, it is critical to separate the components that perform the configuration from the components that verify the configuration.

While MIG was written primarily as a security investigation platform, its low-level file investigation capabilities can be used to assert the integrity of configurations organization-wide.

This post shows how checks that verify the integrity of SSH authorized_keys files can be executed using MIG. The checks consume negligible amounts of resources, and as such can be automated to run every few days, an approach that can be reused for a large number of sensitive configuration files.

Test your infra, the same way you would test your applications!