Sunday, September 13 2015

Investigating SSH authorized keys across the infrastructure using MIG

One of the challenges of operating an organization like Mozilla is dealing with the heterogeneity of the platform. Each group is free to define its own operational practices, as long as it respects strong security rules. We don't centralize a lot, and when we do, we do it in a way that doesn't slow down devops.

The real challenge on the infosec side is being able to investigate infrastructures that are managed in many different ways. We look for anomalies, and one that recently received our focus is finding bad ~/.ssh/authorized_keys files.

Solving that problem involved adding some functionality to MIG's file investigation module to assert the content of files, as well as writing a little bit of Python. Not only did this method help us find files that needed updating, but it also provided a way to assert the content of authorized_keys files moving forward.

Let's dive in.

LDAP all the things!

We have a really good LDAP database, the result of tons of hard work from the folks in Mozilla IT. We use it for a lot of things, from providing a hierarchical view of Mozilla to showing your personal photo in the organization's phonebook. We also use it to store GPG fingerprints and, what interests us today, SSH public keys.

LDAP users are in charge of their keys. They have an admin panel where they can add and remove keys, to facilitate regular rotations. On the infra side, Puppet pulls the public keys and writes them into the users' authorized_keys files. As long as LDAP is up to date, and Puppet runs, authorized_keys files contain the proper keys.

But bugs happen, and sometimes, for various reasons, configurations don't get updated when they should be, and files go out of date. This is where we need an external mechanism to find the systems where configurations go stale, and fix them.

Asserting the content of a file

The most common way to verify the integrity of a file is by using a checksum, like a sha256sum. Unfortunately, it is very rare that a given file would always be exactly the same across the infrastructure. That is particularly true in our environment, because we often add a header with a generation date to authorized_keys files.

# HEADER: This file was autogenerated at Mon Jul 27 14:24:07 +0000 2015

That header means the checksum will change on every machine, and we cannot use a checksum approach to assert the content of a file. Instead, we need to use a regular expression.
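To illustrate, take two authorized_keys files that contain the exact same key but were generated a day apart (the key below is a made-up placeholder):

$ printf '# HEADER: This file was autogenerated at Mon Jul 27 14:24:07 +0000 2015\nssh-rsa AAAA[...] bob\n' | sha256sum
$ printf '# HEADER: This file was autogenerated at Tue Jul 28 09:12:33 +0000 2015\nssh-rsa AAAA[...] bob\n' | sha256sum

The two digests differ even though the authorized keys are identical.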

Content regexes have been present in MIG for a while now, and are probably the most used feature in investigations. But until recently, content regexes were limited to finding things that exist in a file, such as an IOC. The file module would stop inspecting as soon as a regex matches, even skipping the rest of the file, to accelerate investigations.

To assert the content of a file, we need a different approach. The regex needs to verify that every line of a file matches our expectations, and if one line does not match, it means the file has bad content.

Introducing Macroal mode

The first part of the equation is making sure that every line in a file matches a given regex. In the file module, we introduced a new option called "macroal" that stands for Match All Content Regexes On All Lines. When activated, this mode tells the file module to continue reading the file until the end, and to flag the file if all lines match the content regex.

On the MIG command line, this option can be used in the file module with the flag "-macroal". It's a boolean that is off by default.

$ mig file -t local -path /etc -name "^passwd$" -content "^([A-Za-z0-9]|-|_){1,100}:x:[0-9]{1,5}:[0-9]{1,5}:.+" -macroal

The command above finds /etc/passwd and checks that all the lines in the file match the content regex "^([A-Za-z0-9]|-|_){1,100}:x:[0-9]{1,5}:[0-9]{1,5}:.+". If they do, MIG returns a positive result on the file.

In the JSON of the investigation, macroal is stored in the options of a search:

{
    "searches": {
        "s1": {
            "paths": [
                "/etc"
            ],
            "names": [
                "^passwd$"
            ],
            "contents": [
                "^([A-Za-z0-9]|-|_){1,100}:x:[0-9]{1,5}:[0-9]{1,5}:.+"
            ],
            "options": {
                "macroal": true,
                "matchall": true,
                "maxdepth": 1
            }
        }
    }
}

But finding files where all lines match is not yet what we want. In fact, we want the exact opposite: finding files that contain lines that do not match the content regex.

Introducing Mismatch mode

Another improvement we added to the file module is the mismatch mode. It's a simple feature that inverts the behavior of one or several parameters in a file search.

For example, if we know that all RHEL 6.6 systems have a /usr/bin/sudo matching a given sha256, we can use the mismatch option to find instances where sudo does not match the expected checksum.

$ mig file -t "environment->>'ident' LIKE 'Red Hat Enterprise Linux Server release 6.6%'" \
> -path /usr/bin -name "^sudo$" \
> -sha256 28d18c50eb23cfd6ac8d39461d5479e19f6f1a5f6b839d34f2eeaf7ce8a3e054 \
> -mismatch sha256

Mismatch allows us to find anomalies: files that don't match our expectations. By combining Macroal and Mismatch in a file search, we can find files that have unexpected content.
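To picture what the combination does, here is a rough local equivalent using grep; this is only a sketch of the logic, not how the file module is implemented. "grep -vE" prints the lines that fail the regex, which is exactly what a macroal search with a content mismatch flags. Reusing the passwd regex from earlier:

$ grep -vE "^([A-Za-z0-9]|-|_){1,100}:x:[0-9]{1,5}:[0-9]{1,5}:.+" /etc/passwd \
> && echo "mismatching lines found" || echo "all lines match"

But we need one last piece: a content regex that can be used to inspect authorized_keys files.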

Building regexes for authorized_keys files

An authorized_keys file should only contain three types of lines:

  1. a comment line that starts with a pound "#" character
  2. an empty line, or a line full of spaces
  3. an SSH public key

Writing a regex for the first two types is easy. A comment line is "^#.+$" and an empty line is "^(\s+)?$".

Writing a regex for SSH public keys isn't too complicated, but we need to take a few precautions. A pubkey entry has three sections separated by whitespace, and we only care about the first two. The third one, the comment, can be discarded entirely with ".+".

Next, a few things need to be escaped in the public key, as pubkeys are base64 encoded and thus include the slash "/" and plus "+" characters, which have special meaning in regexes.

Awk and Sed can do this very easily:

$ awk '{print $1,$2}' ~/.ssh/authorized_keys | grep -v "^#" | sed "s;\/;\\\/;g" | sed "s;\+;\\\+;g"

The result can be placed into a content regex and given to MIG.

$ mig file -path /home/jvehent/.ssh -name "^authorized_keys$" \
> -content "^((#.+)|(\\s+)|(ssh-rsa\\sAAAAB3NzaC1yc2EAA[...]yFDMZLFlVmQ==\\s.+))$" \
> -macroal -mismatch content

Or in JSON form:

{
    "searches": {
        "jvehent@mozilla.com_ssh_pubkeys": {
            "contents": [
                "^((#.+)|(\\s+)|(ssh-rsa\\sAAAAB3NzaC1yc2EAA[...]yFDMZLFlVmQ==\\s.+))$"
            ],
            "names": [
                "^authorized_keys$"
            ],
            "options": {
                "macroal": true,
                "matchall": true,
                "maxdepth": 1,
                "mismatch": [
                    "content"
                ]
            },
            "paths": [
                "/home/jvehent/.ssh"
            ]
        }
    }
}

Automating the investigation

With several hundred pubkeys in LDAP, it is necessary to automate the generation of the investigation file. We can do so with Python and a small LDAP helper library called mozlibldap.

The algorithm is very simple: iterate over active LDAP users and retrieve their public keys, then find each user's home directory in LDAP and create a MIG file search that asserts the content of their authorized_keys file.

The investigation JSON file gets big very quickly (2.4MB, and ~40,000 lines), but still runs decently fast on target systems. A single system runs the whole thing in approximately 15 seconds, and since MIG is completely parallelized, running it across the infrastructure takes less than a minute.

Below is the Python script that generates the investigation in MIG's action v2 format.

#!/usr/bin/env python
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
# Copyright (c) 2015 Mozilla Corporation
# Author: jvehent@mozilla.com

# Requires:
# mozlibldap

from __future__ import print_function
import mozlibldap
import json
import sys

LDAP_URL = 'ldap://someplace.at.mozilla'
LDAP_BIND_DN = 'mail=ldapreadonlyuser,o=com,dc=mozilla'
LDAP_BIND_PASSWD = "readonlyuserpassphrase"


def main():
    lcli = mozlibldap.MozLDAP(LDAP_URL, LDAP_BIND_DN, LDAP_BIND_PASSWD)
    searches = {}

    # get a list of users that have a pubkey in ldap
    users = lcli.get_all_enabled_users_attr('sshPublicKey')
    for user_attr in users:
        search = {}
        user = user_attr[0].split(',', 1)[0].split('=', 1)[1]
        print("current user: "+user, file=sys.stderr)
        keys = user_attr[1]
        if len(keys) == 0:
            continue
        contentre = '^((#.+)|(\s+)'
        for pubkey in keys['sshPublicKey']:
            if len(pubkey) < 5 or not (pubkey.startswith("ssh")):
                continue
            # keep only the key type and the key material, joined with a regex \s
            pubkey = '\s'.join(pubkey.split(' ', 2)[:2])
            pubkey = pubkey.replace('/', '\/')
            pubkey = pubkey.replace('+', '\+')
            contentre += '|({pubkey}\s.+)'.format(pubkey=pubkey)
        contentre += ')$'
        search["names"] = []
        search["names"].append("^authorized_keys$")
        search["contents"] = []
        search["contents"].append(contentre)
        paths = []
        try:
            paths = get_search_paths(lcli, user)
        except Exception:
            continue
        if not paths or len(paths) < 1:
            continue
        search["paths"] = paths
        search["options"] = {}
        search["options"]["matchall"] = True
        search["options"]["macroal"] = True
        search["options"]["maxdepth"] = 1
        search["options"]["mismatch"] = []
        search["options"]["mismatch"].append("content")
        print(json.dumps(search), file=sys.stderr)
        searches[user+"_ssh_pubkeys"] = search
    action = {}
    action["name"] = "Investigate the content of authorized_keys for LDAP users"
    action["target"] = "status='online' AND mode='daemon'"
    action["version"] = 2
    action["operations"] = []
    operation = {}
    operation["module"] = "file"
    operation["parameters"] = {}
    operation["parameters"]["searches"] = searches
    action["operations"].append(operation)
    print(json.dumps(action, indent=4, sort_keys=True))


def get_search_paths(lcli, user):
    paths = []
    res = lcli.query("mail="+user, ['homeDirectory', 'hgHome',
                                    'stageHome', 'svnHome'])
    for attr in res[0][1]:
        try:
            paths.append(res[0][1][attr][0]+"/.ssh")
        except Exception:
            continue
    return paths


if __name__ == "__main__":
    main()

The script writes the investigation JSON to stdout; redirect it to a file, and then use the MIG command line to run the investigation.

$ ./make-pubkeys-investigation.py > /tmp/investigate_pubkeys.json
$ mig -i /tmp/investigate_pubkeys.json
[info] launching action from file, all flags are ignored
3124 agents will be targeted. ctrl+c to cancel. launching in 5 4 3 2 1 GO
Following action ID 4898767262251.status=inflight......status=completed
- 100.0% done in 34.848325918s
3124 sent, 3124 done, 3124 succeeded
server.example.net /home/bob/.ssh/authorized_keys [lastmodified:2014-05-30 04:04:45 +0000 UTC, mode:-rw-------, size:968] in search 'bob_ssh_pubkeys'
[...]
17 agents have found results

In conclusion

When maintaining the security of a large infrastructure, it is critical to separate the components that perform the configuration from the components that verify the configuration.

While MIG was written primarily as a security investigation platform, its low-level file investigation capabilities can be used to assert the integrity of configurations organization-wide.

This post shows how checks that verify the integrity of SSH authorized_keys files can be executed using MIG. The checks are designed to consume negligible amounts of resources, and as such can be automated to run every few days. The same approach can be reused for a large number of sensitive configuration files.
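Automating the run can be as simple as a cron job; here is a sketch, assuming hypothetical installation paths for the generator script and the mig client:

# regenerate the investigation from LDAP and run it every three days at 4am
0 4 */3 * * /usr/local/bin/make-pubkeys-investigation.py > /tmp/investigate_pubkeys.json && /usr/local/bin/mig -i /tmp/investigate_pubkeys.json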

Test your infra, the same way you would test your applications!

Wednesday, August 26 2015

Hosting Go code on Github with a custom import path

We host MIG at https://github.com/mozilla/mig, but while I have tons of respect for the folks at Github, I can't guarantee that we won't use another hosting provider in the future. Telling people to import MIG packages using something of the form "github.com/mozilla/mig/<package>" bothers me, and I've been looking for a better solution for a while.

I bought the domain mig.ninja with the intention of using it as the base import path. I initially tried to use HAProxy to proxy github.com, and somewhat succeeded, but it involved a whole bunch of rewrites that were frankly ugly.

Thankfully, Russ Cox got an even better solution merged into Go 1.4 and I ended up implementing it. Here's how.


Understanding go get

When asked to fetch a package, go get does a number of checks. If the target is on a known hosting site, it fetches the data using a method that is hardcoded (git for github.com, hg for bitbucket, ...). But when using your own domain, go get has no way to know how to fetch the data. To work around that, go lets you specify the vcs method in the import path: import "mig.ninja/mig.git". The .git suffix indicates to go get that git should be used to retrieve the package. It also doesn't interfere with the code: in the code that imports the package, .git is ignored and the package content is accessed using mig.Something.

But that's ugly. No one wants to suffix .git to their import path. There is another, cleaner solution that uses an HTML file to tell go get where the package is located, and which protocol should be used to retrieve it. The file is served from the location of the import path. As an example, let's curl https://mig.ninja/mig:

$ curl https://mig.ninja/mig
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta name="go-import" content="mig.ninja/mig git https://github.com/mozilla/mig">
<meta http-equiv="refresh" content="0; url=http://mig.mozilla.org">
</head>
<body>
Nothing to see here; <a href="http://mig.mozilla.org">move along</a>.
</body>
</html>

The key here is in the <meta> tag named "go-import". When go get requests https://mig.ninja/mig, it hits that HTML file and knows that "mig.ninja/mig" must be retrieved using git from https://github.com/mozilla/mig.

One great aspect of this method, aside from removing the need for .git in the import path, is that "mig.ninja/mig" can now be hosted anywhere as long as the meta tag continues to indicate the authoritative location (in this case: github). It also works nicely with packages under the base repository, such that go get mig.ninja/mig/modules/file works as expected as long as the file is served from that location as well. Note that go get will retrieve the entire repository, not just the target package.
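A detail worth knowing when testing this: go get appends "?go-get=1" to the URL it fetches and follows redirects, so the meta tag can be sanity-checked with curl (assuming the server answers sub-paths, which the HAProxy setup below handles with a redirect):

$ curl -sL 'https://mig.ninja/mig/modules/file?go-get=1' | grep go-import
<meta name="go-import" content="mig.ninja/mig git https://github.com/mozilla/mig">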


Serving the meta tag from HAProxy

Creating a whole web server for the sole purpose of serving 11 lines of HTML isn't very appealing. So I reused an existing server that already hosts various things, including this blog, and is powered by HAProxy.

HAProxy can't serve files, but here's the trick: it can serve a custom response at a monitoring URI. I created a new HTTPS backend for mig.ninja that monitors /mig and serves a custom HTTP 200 response.

frontend https-in
        bind 62.210.76.92:443
        mode tcp
        tcp-request inspect-delay 5s
        tcp-request content accept if { req_ssl_hello_type 1 }
        use_backend jve_https if { req_ssl_sni -i jve.linuxwall.info }
        use_backend mig_https if { req_ssl_sni -i mig.ninja }

backend mig_https
        mode tcp
        server mig_https 127.0.0.1:1666

frontend mig_https
        bind 127.0.0.1:1666 ssl no-sslv3 no-tlsv10 crt /etc/certs/mig.ninja.bundle
        mode http
        monitor-uri /mig
        errorfile 200 /etc/haproxy/mig.ninja.200.http
        acl mig_pkg url /mig
        redirect location https://mig.ninja/mig if !mig_pkg

The configuration above uses SNI to serve multiple HTTPS domains from the same IP. When a new connection enters the https-in frontend, HAProxy inspects the server_name TLS extension and decides which backend should handle the request. If the request is for mig.ninja, it sends it to the mig_https backend, which forwards it to the mig_https frontend. There, the request URI is inspected. If it matches /mig, the file mig.ninja.200.http is returned. Otherwise, an HTTP redirect is returned to send the caller back to https://mig.ninja/mig (in case a longer path, such as mig.ninja/mig/module, was requested).

mig.ninja.200.http is a complete HTTP response, with HTTP headers, body and proper carriage returns. HAProxy doesn't process the file at all, it just reads it and sends it back to the client verbatim. It looks like this:

HTTP/1.1 200 OK
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta name="go-import" content="mig.ninja/mig git https://github.com/mozilla/mig">
<meta http-equiv="refresh" content="0; url=http://mig.mozilla.org">
</head>
<body>
Nothing to see here; <a href="http://mig.mozilla.org">move along</a>.
</body>
</html>


Building MIG

With all that in place, retrieving mig is now as easy as go get mig.ninja/mig. Icing on the cake, the clients can now be retrieved with go get as well:

$ go get -u mig.ninja/mig/client/mig
$ mig help
$ go get -u mig.ninja/mig/client/mig-console
$ mig-console

Telling everyone to use the right path

With just this in place, you still can't guarantee that users of your packages won't directly reference the github import path.

Except that you can! As outlined in rsc's document, and in go help importpath, it is possible to specify the authoritative location of a package directly in the package itself. This is done by adding a comment with the package location right next to the package name:
package mig /* import "mig.ninja/mig" */

Using the import comment, Go will enforce retrieval from the authoritative source. Other tools use it as well; for example, try accessing MIG's doc on Godoc using the github URL, and you will notice the redirection: https://godoc.org/github.com/mozilla/mig


Trust Github, it's a great service, but controlling your import path is the way to Go ;)

Sunday, July 26 2015

Using Mozilla Investigator (MIG) to detect unknown hosts

MIG is a distributed forensics framework we built at Mozilla to keep an eye on our infrastructure. MIG can run investigations on thousands of servers very quickly, and focuses on providing low-level access to remote systems, without giving the investigator access to raw data.

As I was recently presenting MIG at the DFIR Summit in Austin, someone in the audience asked if it could be used to detect unknown or rogue systems inside a network. The best way to perform that kind of detection is to watch the network, particularly for outbound connections rogue hosts or malware would establish to a C&C server. But MIG can also help with the detection, by inspecting the ARP tables of remote systems and cross-referencing the results with the local MAC addresses of known systems. Any MAC address not configured on a known system is potentially a rogue agent.

First, we want to retrieve all the MAC addresses from the ARP tables of known systems. The netstat module can perform this task by looking for neighbor MACs that match regex "^[0-9a-f]", which will match anything hexadecimal.

$ mig netstat -nm "^[0-9a-f]" > /tmp/seenmacs

We store the results in /tmp/seenmacs and pull a list of unique MACs using some bash.

$ awk '{print tolower($5)}' /tmp/seenmacs | sort | uniq
00:08:00:85:0b:c2
00:0a:9c:50:b4:36
00:0a:9c:50:bc:61
00:0c:29:41:90:fb
00:0c:29:a7:41:f7
00:10:db:ff:10:00
00:10:db:ff:30:00
00:10:db:ff:f0:00
00:21:53:12:42:c1

We now want to check that every single one of the seen MAC addresses is configured on a known agent. Again, the netstat module can be used for this task, this time by querying local MAC addresses with the -lm flag.

Now the list of MACs may be quite long, so instead of running one MIG query per MAC, we group them 50 by 50 using the following script:

#!/usr/bin/env bash
# builds MIG netstat search commands, grouping seen MACs 50 at a time
i=50
input=$1
output=$2
count=$(awk '{print tolower($5)}' $input|sort|uniq|wc -l)
while true
do
    echo -n "mig netstat " >> $output
    for mac in $(awk '{print tolower($5)}' $input|sort|uniq|head -$i|tail -50)
    do
        echo -n "-lm $mac " >> $output
    done
    echo >> $output
    # stop once the last batch covered the end of the list
    if [ $i -ge $count ]
    then
        exit 0
    fi
    i=$((i+50))
done

The script builds MIG netstat commands with at most 50 MAC arguments each. Invoke it with /tmp/seenmacs as argument 1, and an output file as argument 2.

$ bash /tmp/makemigmac.sh /tmp/seenmacs /tmp/migsearchmacs

/tmp/migsearchmacs now contains a number of MIG netstat commands that will search seen MAC addresses across the configured interfaces of known hosts. Run the commands and pipe the output to a results file.

$ while read -r migcmd; do $migcmd >> /tmp/migfoundmacs; done < /tmp/migsearchmacs

We now have a file with seen MAC addresses, and another one with MAC addresses configured on known systems. Doing the delta of the two is fairly easy in bash:

$ for seenmac in $(awk '{print tolower($5)}' /tmp/seenmacs|sort|uniq); do
      hasseen=$(grep $seenmac /tmp/migfoundmacs)
      if [ "$hasseen" == "" ]; then
          echo "$seenmac is not accounted for"
      fi
  done
00:21:59:96:75:7f is not accounted for
00:21:59:98:d5:bf is not accounted for
00:21:59:9c:c0:bf is not accounted for
00:21:59:9e:3c:3f is not accounted for
00:22:64:0e:72:71 is not accounted for
00:23:47:ca:f7:40 is not accounted for
00:25:61:d2:1b:c0 is not accounted for
00:25:b4:1c:c8:1d is not accounted for

Automating the detection

It's probably a good idea to run this procedure on a regular basis. The script below will automate the steps and produce a report you can easily email to your favorite security team.

#!/usr/bin/env bash
SEENMACS=$(mktemp)
SEARCHMACS=$(mktemp)
FOUNDMACS=$(mktemp)
echo "seen mac addresses are in $SEENMACS"
echo "search commands are in $SEARCHMACS"
echo "found mac addresses are in $FOUNDMACS"

echo "step 1: obtain all seen MAC addresses"
$(which mig) netstat -nm "^[0-9a-f]" 2>/dev/null | grep 'found neighbor mac' | awk '{print tolower($5)}' | sort | uniq > $SEENMACS

MACCOUNT=$(wc -l $SEENMACS | awk '{print $1}')
echo "$MACCOUNT MAC addresses found"

echo "step 2: build MIG commands to search for seen MAC addresses"
i=50
while true;
do
    echo -n "$i.."
    echo -n "$(which mig) netstat -e 50s " >> $SEARCHMACS
    for mac in $(cat $SEENMACS | head -$i | tail -50)
    do
        echo -n "-lm $mac " >> $SEARCHMACS
    done
    echo -n " >> $FOUNDMACS" >> $SEARCHMACS
    if [ $i -gt $MACCOUNT ]
    then
        break
    fi
    echo " 2>/dev/null &" >> $SEARCHMACS
    i=$((i+50))
done
echo
echo "step 3: search for MAC addresses configured on local interfaces"
bash $SEARCHMACS

sleep 60

echo "step 4: list unknown MAC addresses"
for seenmac in $(cat $SEENMACS)
do
    hasseen=$(grep "found local mac $seenmac" $FOUNDMACS)
    if [ "$hasseen" == "" ]; then
        echo "$seenmac is not accounted for"
    fi
done

The list of unknown MACs can then be used to investigate the endpoints. They could be switches, routers or other network devices that don't run the MIG agent. Or they could be rogue endpoints that you should keep an eye on.

Happy hunting!

Thursday, July 9 2015

You can't trust the infra; Encrypt client side!

Like most of my peers in the infosec community, I learned that good data protection requires strong infrastructure security controls. I practiced the art of network security, learned the arcana of systems hardening and used those concepts in securing web infrastructures.

But it's 2015, and infrastructure security just doesn't cut it anymore. The cost of implementing controls continues to grow, while our capabilities keep being reduced by cloud environments that limit the perfect security world we want to live in. Cloud is good for business, but it makes infrastructure security really difficult. In the cloud, IDS/IPS aren't usable, or come with very limited capabilities. DDoS protection must be done higher up in the stack because you can't access the routing layer. At-rest data encryption isn't useful when the keys are stored next to the data. TLS encryption is not used inside the infrastructure because certificate management is hard, so we end up transferring cleartext userdata on massively shared networks, hoping they're somewhat isolated. The list of security problems we simply cannot solve with reasonable cost/complexity in cloud environments is quite long, and has cost many infosec professionals hours of ranting.

What about datacenters? It certainly is easier to control infrastructure security there, but ultimately the problem is the same: we're just not 100% sure of what hardware we run our systems on. SMM malware is a reality, and we know (Thank you Snowden!) that the NSA and other security agencies have the tools to intercept hardware and install their own little spy packages.

If the ultimate goal is perfect data security, I don't think we can achieve it in the current infrastructure security landscape.

Meanwhile, users have been pushing more and more data into the web. Hackers have been hard at work to break into our services and leak that data out to the world. When it's not hackers, the sheer complexity of web infrastructures themselves has caused many a team to unintentionally press the wrong button, and post data where it shouldn't be (password leak on pastebin, anyone?). Ask around in the incident response community, and they will tell you how busy the last couple of years have been dealing with data leaks.

Heck, Amazon even automated looking into Github, a third party company, for AWS keys that infrastructure operators leak! That's like your bank watching CCTV of public transport to alert you when you forget your wallet in the metro. Those incidents have become very common, and unfortunately cannot be solved by another layer of firewall.

Looking at what we host at Mozilla, it's easy to spot a small number of services that store data we absolutely never want to leak. We focus our infosec efforts on those, and with everyone's help build systems that we hope are safe enough. It's hard, and there is always that fear of missing something that could expose information from our users. In that landscape, there is one category of services that I'm just not too worried about: the ones that store data already encrypted on the client side.

Firefox Sync is a good example of such a service. The data in Sync is strongly encrypted, in Firefox, before being sent to our storage servers. We (Mozilla) don't have the keys. We can't leak the keys. The worst we can do is leak encrypted blobs that probably no one has the ability to decrypt. This is a much better security control than anything else we can ever put on the infrastructure side. It just seems right.

Designing services that encrypt data on the client is the next challenge of information security. It requires that infosec folks work closely with developers, whereas most of their time is currently spent with sysadmins. It also changes the skillset we need to do our job, and focuses more on a strong understanding of cryptography. Not just SSL/TLS, but crypto algorithms themselves. Javascript is getting better, and APIs like WebCrypto or libraries like OpenPGPJS are the way forward to implement client-side encryption. Key management is almost irrelevant if we accept that keys should be derived from user passwords, like Firefox Accounts did.
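The concept fits in a couple of commands; here is a command-line sketch of the idea (not how Sync or Firefox Accounts are implemented), using openssl's password-based encryption and a hypothetical storage endpoint:

# derive a key from a passphrase and encrypt before anything leaves the machine
$ openssl enc -aes-256-cbc -salt -in photos.tar -out photos.tar.enc
# the service only ever receives the encrypted blob
$ curl -T photos.tar.enc https://storage.example.net/blobs/photos.tar.enc
# decryption happens client side, with a key the service never had
$ openssl enc -d -aes-256-cbc -in photos.tar.enc -out photos.tar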

Client-side encryption has the added benefit of empowering the user and making them responsible for the security of their data. It's not realistic to expect every business that operates a web service to run like a bank. But it is realistic to expect individuals to care about the security of their own photos, videos, emails, conversations and browsing history. Most users already do care and would welcome more control over their data. Business people, however, are the ones that are hard to convince, because they love looking inside all that data and building dashboards and graphs, and designing fancy statistical models to boost marketing and conversion rates. Note that those things can still be done, but client side (that's how our Directory Tiles advertising service operates).

We have seen with Lavabit that client-side encryption does not reduce the attack surface to zero. A government can still force a service operator to change the client code to retrieve decrypted data. But the cost of attacking a service that way is immensely larger than simply breaking into a database server.

So, should we get rid of our firewall and encrypt everything with javascript? No! Absolutely not. Infrastructure security remains an important component of any infosec strategy, but it has reached a plateau and we need to look for new techniques to continue improving our posture. Cloud providers help streamline the management of firewall rules and network security policies. DevOps practices with VMs and containers help isolate and rotate services quickly. All those things are important but have not solved the data risk in its entirety. Client-side encryption is the next step.

In the future, I'd like to see web services default to HTTPS and use Javascript (or anything else) to encrypt data before handing it over to services that have grown too large, too complex and too cheap to secure perfectly. How we do this, is very much left as an open question.

Wednesday, March 11 2015

10 years of self-hosting Linuxwall.info

On March 11th, 2005, a small group of nerds studying in the overly boring city of Blois, France, decided to buy a domain to play with self hosting DNS and email.


10 years later, linuxwall.info is still (mostly) self-hosted and we are (mostly) nerds. It has been, and still is, a lot of fun, and I believe the learning process helped me progress two to three times faster than my peers who don't self-host.

(on the left: probably the first server to host linuxwall.info, back in my apartment in Blois)

10 years is probably long enough of an experiment to draw lessons on the state of self-hosting. So let's start with the conclusion: self-hosting is entirely possible, but only if you invest the time and energy to do it right. There is no good way to self-host without spending time on it. Symptoms of poorly done self-hosting include loss of email, website headaches, connectivity issues and an angry spouse.

What follows is a rapid overview of what I learned doing self-hosting for a decade, starting with the most frustrating aspect of it...


Don't count on your ISP for help

Over the last 10 years, the people who own the pipes have made absolutely zero effort to facilitate self-hosting. I've hosted linuxwall.info on Free.fr, Neuf/SFR, Comcast and Verizon. The only one who provided me with a static IP was Free.fr. Comcast went as far as blocking tcp/25 inbound without providing a way to disable the blocking. ISPs are completely uncooperative with self-hosters, for no particular reason other than pushing people toward the "business" class of bandwidth (the same exact thing, but with a static IP and an extra $100 on your monthly bill).

Uplink is probably what self-hosters fear the most. Will it be enough bandwidth? Will it slow down the Internet for the rest of the house? Here is the truth: you need almost no bandwidth to self-host. For many years, linuxwall.info was hosted with 128kbps of uplink, and it worked fine. The handful of inbound email, http connections, xmpp chats or DNS requests will fit into a tiny percentage of your uplink, and you won't even notice it. Just make sure you don't run offsite backups while your spouse is watching netflix... (or use QoS).


Self-hosters need friends

Don't self-host alone, that won't work. The chances of your internet connection going down are too great. Self-hosting must be done the way the internet was built: if one endpoint goes down, there must be another endpoint that takes over. For DNS, that's easy enough with slaves in multiple locations. For web or email, it's more difficult.

(on the right: the linuxwall.info crew: steph, franck, christophe, myself and jerome, circa 2005)

Hosting email servers is definitely the hardest, because building a distributed IMAP cluster with something like dovecot or cyrus-imap is far from trivial. However, it is fairly easy to build a secondary MX with postfix that buffers inbound emails when the primary is down, and just forwards them to the primary when it comes back online.

That's how linuxwall's MX operates. smtp.linuxwall.info is a primary email server (described here), and if it goes down, smtp2.linuxwall.info will receive the inbound mail, cache it, and forward it to the primary when back online. SMTP2 is really just running Postfix with a very basic transport configuration, so we avoid the madness of synchronizing IMAP datastores across servers.

$ dig +short MX linuxwall.info
20 smtp2.linuxwall.info.
10 smtp.linuxwall.info.
root@smtp2:/etc/postfix # cat transport
linuxwall.info  relay:smtp.linuxwall.info:25
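For completeness, the rest of the relay setup on smtp2 is tiny; a sketch of the matching Postfix knobs, assuming everything else is stock:

root@smtp2:/etc/postfix # postconf -e "relay_domains = linuxwall.info"
root@smtp2:/etc/postfix # postconf -e "transport_maps = hash:/etc/postfix/transport"
root@smtp2:/etc/postfix # postmap /etc/postfix/transport && postfix reload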

DNS is another service you shouldn't host alone. Nowadays, linuxwall's DNS servers host more than one domain. Besides the main one, there is 1nw.eu, chatonly.org, frimousse-action.com, insecure.ws, lesroutesduchocolat.{fr,com}, tzib.net, necto.org and a few more. The root DNS is hosted in my house, but on a dynamic IP, so it is only used as an authority for the 3 slaves that have static IPs. DNS outages are rare because of the distributed nature of the platform, but it does require that several people participate in the network. Don't self-host alone!



Do hard things and write about it

DNS, Email, XMPP, distributed storage, VPN, public websites, ... The hardest things to self-host are also the most interesting. After many years of internet presence, I have enough public websites to warrant paying for a hosted server at online.net. Most of the critical parts of my infra (email, dns slave, websites) are now hosted on linux containers on that host. For a while though, everything that touched the linuxwall.info domain was hosted on personal ADSL connections (the French equivalent of cable).

Linuxwall.info was created to experiment with all sorts of cool technologies that we simply didn't have access to as students. One of the goals was to write about these technologies, and logically the first site we created was http://wiki.linuxwall.info. Over the years, we've written about dozens of tools, setups, successes and failures, and some of these articles have become quite popular.

As it turns out, most of the tech we experimented with is identical to what is deployed in "professional" environments. Same Apache conf, same postfix, same VPN, same DKIM, same RAID, etc... Sometimes, self-hosting is even more advanced than what you'd find in your typical LAMP stack at work, because the constraints of hosting stuff at home require a good amount of engineering creativity.

Most certainly, self-hosting helped me become a better engineer. It helped me acquire experience much more quickly than if I had waited for exposure at work. It does come at the cost of many, many weekends spent deploying, fixing, upgrading and operating an infrastructure that could just as well have been hosted elsewhere. For free. But without the satisfaction of making it all work myself.


Control the traffic

QoS is perhaps the most interesting networking challenge a self-hoster can take on. I have spent hours tweaking the QoS of my gateway server, and wrote a lengthy analysis of Linux's QoS stack at Journey to the Center of the Linux Kernel: Traffic Control, Shaping and QoS. Studying QoS not only helps understand the details of network protocols, but also made me appreciate minimalistic designs: almost all network services that run on the internet today can be hosted with just a few kilobytes per second of bandwidth. The myth of gigabit internet connections for residences is built by ISPs as a marketing tool. Even at peak times, self-hosting at home while doing video-conferences does not make my uplink use more than one megabit of traffic. And I pay Verizon for 25 of those!

(on the left: a week of uplink from my house, using up to 720kbps)

I can't remember a time when I filled up an uplink during normal operations. Sure, there are times when a large uplink is pleasant - for example when recovering 20GB of email backups from a friend's server - but 99% of the time a basic QoS policy will ensure that your services have the necessary bandwidth without annoying all the users in the house.
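A basic policy really is basic; below is a minimal HTB sketch that reserves a slice of a 1mbit uplink for the hosted services while the household shares the rest. The interface name and rates are assumptions; see the article linked above for a real setup.

# root qdisc on the WAN interface; unclassified traffic falls into class 1:20
tc qdisc add dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1: classid 1:1 htb rate 1mbit
# guaranteed 256kbit for the services, allowed to borrow up to the full uplink
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 256kbit ceil 1mbit prio 1
# everything else shares the remainder
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 768kbit ceil 1mbit prio 2
# outbound SMTP and HTTPS from the servers get the guaranteed class
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip sport 25 0xffff flowid 1:10
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip sport 443 0xffff flowid 1:10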


The takeaway

Most people get tired of self-hosting after a few years, give up and move everything to "the cloud". We haven't, and it's been 10 years. The secret is to take the time to do things right, and not to do it alone.

For sure, there will be times of reading your emails directly from the cache of Postfix, because the IMAP server is down. Or times when you arrive at a nice vacation spot only to realize that the IP of your home server has changed, and you can't update the DNS for a week. Those times suck. But they also help you think about reliability and high-availability. Stuff that, once you've acquired the knowledge, people will pay a lot of money for.

tl;dr: self-hosting is awesome. 10 years in and going strong, Linuxwall.info is ready for the next 10!

