pxelinux.cfg/* versus RCS

I’m a fan of version control in systems administration. If you don’t have a central VCS for your server configuration files, you can always use RCS. I habitually add #$Id$ at the top of configuration files, so I can easily see who touched this file last and when.

On an unrelated note, I’m upgrading my virtualization cluster to Ubuntu 10.10. The worker nodes run diskless. Each diskless node reads a configuration file over TFTP. Mine looked like the following:

#$Id$

LABEL linux
KERNEL vmlinuz-2.6.35-27-server
APPEND root=/dev/nfs initrd=initrd.img-2.6.35-27-server-pxe nfsroot=192.0.2.2:/data1/imagine,noacl ip=dhcp rw
TIMEOUT 0

This has worked fine for a year or so now, with me changing the kernel and initrd versions as I upgraded. With the Ubuntu 10.10 update, however, some pieces of hardware wouldn’t reboot. Most booted fine, but a few didn’t come back up again.

This is notably annoying because the hardware is in a remote datacenter. Driving out to view the console messages burns an hour and, more annoyingly, requires that I stir my lazy carcass out of my house. I have a serial console on one of the machines, but not on the affected one. Fortunately, I do have remote power, and I can make changes on the diskless filesystem.

Packet sniffing revealed that the machine successfully made a TFTP request, then just… stopped. This exact same configuration and filesystem worked on other machines, however. Except that the affected machines all had #$Id$ on the first line of their pxelinux.cfg file, and machines that booted successfully didn’t.

That shouldn’t matter. Really, it shouldn’t. pxelinux.cfg files accept comments. But I removed the tag, making the first line the LABEL statement, and power cycled the machine. And it came up perfectly.

Apparently this particular rev of Linux PXE is incompatible with version control ID tags. Oh joy, oh rapture!

diskless ubuntu serial console

I’m using Ubuntu servers with qemu-kvm as a virtualization solution. The software in 10.04 LTS includes a variety of annoyances, such as broken PXE, odd bridge behavior, and “general weirdness.” Although 10.10 is not supported in the long term, I decided to try it.

The good news is, the 10.10 virtualization stack works much better. The bad news is, 10.10 didn’t want to run on my diskless hardware. Boot attempts all died with many lines of:

ipconfig: no devices to configure

and a message about killing init. The server was quite explicit that it was dead, and how it was dying, but didn’t leave any clues as to what had killed it. I’m sure that the console showed useful error messages, but they had scrolled off the top of the screen.

The manual says that if you hit shift-PageUp, Ubuntu should page up through the console messages. That should be amended to read “unless init is dead and your keyboard LEDs are blinking slowly but steadily.”

The only way to resolve this problem is to see the error messages that say why the machine crashed. So, a serial console. I want PXE messages, initrd messages, and kernel boot messages sent to serial console. These are all controlled by the /tftpboot/pxelinux.cfg/machine file. The actual file name is the MAC address of the booting NIC.
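To be a little more exact about that file name: PXELINUX looks for the MAC address written in lowercase hex with dashes, prefixed with the ARP hardware type (01 for Ethernet). Here is a minimal Python sketch of that naming rule. The function name and the sample MAC address are made up for illustration.

```python
def pxelinux_cfg_name(mac, hw_type=1):
    """Return the per-host file name PXELINUX looks for under pxelinux.cfg/."""
    # Accept colon- or dash-separated MAC addresses in any case.
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    for octet in octets:
        int(octet, 16)  # raises ValueError on non-hex input
    # ARP hardware type first (1 = Ethernet), then the MAC, all lowercase.
    return "-".join([f"{hw_type:02x}"] + [o.lower() for o in octets])


# Hypothetical NIC address, not one of the machines in this article.
print(pxelinux_cfg_name("88:99:AA:BB:CC:DD"))
```

So a NIC with that address would read /tftpboot/pxelinux.cfg/01-88-99-aa-bb-cc-dd.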

If you want to get messages from the pxe and initrd boot stages, the pxelinux.cfg file’s first line must include the SERIAL statement. If you want to get console messages from the booting kernel and/or log into the running system over the serial console, you must append a serial statement to the kernel boot command. The end result for a serial console looks like this:

SERIAL 0 115200
LABEL linux
KERNEL vmlinuz-2.6.35-27-server
APPEND root=/dev/nfs initrd=initrd.img-2.6.35-27-server-pxe nfsroot=192.0.2.1:/nfsroot ip=dhcp rw console=tty0 console=ttyS0,115200n8
TIMEOUT 0

The Web site will probably wrap the APPEND statement, but everything from APPEND up to (but not including) TIMEOUT is a single line.

If you want a serial login in multiuser mode, you need to create a script to activate the terminal. Here’s the Ubuntu default terminal script:
/etc/init/ttyS0.conf

# ttyS0 - getty
#
# This service maintains a getty on ttyS0 from the point the system is
# started until it is shut down again.

start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]

respawn
exec /sbin/getty -L 115200 ttyS0 vt102

The next time you reboot your diskless box, you should have a full serial console.

Some time soon, more on the actual error and how I fixed it.

DNS DDoS of the Day

My phone got a call recently from a systems administrator whose network was under attack. I was busy getting my twice-weekly dose of humility, but a couple hours later, my phone delivered the message.

The attacker was flooding their primary DNS server with requests for isc.org. This is a not-uncommon attack. As DDoS attacks go, it’s not terribly effective; it can overwhelm the DNS server’s resources, but doesn’t utterly destroy the victim’s network. You can easily defend against this by controlling which hosts can perform recursive lookups on your server.

This particular sysadmin was running a DNS server that didn’t permit access control for recursive lookups. It ran fine for years, until someone wanted to attack it, much as your house doesn’t need a lock on the door until someone tries to break in. We discussed various ways he could blunt the attack, and a strategy for moving to a public-facing DNS server that supported access control lists.

I could start with “here’s a nickel, kid, go buy a better operating system.” But that’s not exactly helpful. A lot of Unix sysadmins are just as guilty of offering insecure services on their networks, thinking that nobody is going to attack their petty little operation. But you never know when you’ll anger some dweeb who cannot express their emotions in any way other than clicking a few buttons and giggling. This particular sysadmin had run his server for years without difficulty. But you only need to lock your car when someone tries to steal it.

If you’re running a DNS server, use one that supports ACLs. I’ve written about unbound as a recursive DNS server. Or, if you’re running BIND, you can use an ACL:

options {
...
allow-recursion {our_stuff; };
};

acl "our_stuff" {
192.0.2.0/24;
};

Poof! Recursion attacks are stopped.
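If the ACL idea is new to you, the decision the server makes for each recursive query boils down to "is the client's address inside one of my networks?" Here is a rough Python model of that check. It is not how BIND implements it; OUR_STUFF and recursion_allowed are names invented for this sketch, and 203.0.113.5 is just a documentation address standing in for an outsider.

```python
import ipaddress

# Stand-in for the "our_stuff" ACL in the named.conf snippet above.
OUR_STUFF = [ipaddress.ip_network("192.0.2.0/24")]


def recursion_allowed(client, acl=OUR_STUFF):
    """Return True if the client address falls inside any network in the ACL."""
    addr = ipaddress.ip_address(client)
    return any(addr in net for net in acl)


print(recursion_allowed("192.0.2.17"))    # a host on our own network
print(recursion_allowed("203.0.113.5"))   # some stranger on the Internet
```

Anyone outside the listed networks gets refused, which is exactly what you want for strangers asking your server to look up isc.org.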

Nobody wants to attack you? Nobody will EVER want to attack you? You are such an awesome human being that you will never accidentally annoy someone? Fine. I believe you. Wholeheartedly. But did you know that open DNS resolvers can be used to amplify DNS-based DDoS attacks? And that these attacks are growing more common? And that a large number of Internet appliances have open resolvers? Do you issue those devices to your clients? Open resolvers are the new open mail relays.

Today is a good day to check your network for open resolvers. Or you can use a free shell account to run dig against your servers. Check your appliances, too.

This principle applies to services other than DNS, of course. Use keys to authenticate via SSH, or at least restrict the IP addresses that can log in via passwords. Apply your patches regularly. Think about what you’d do if you were under attack, and the points on your network where you could defend. You probably already know about some security holes on your network. Quit playing Angry Birds and go fix them.

But if you run an open resolver, you are ruining another sysadmin’s weekend.

OpenLDAP search filters

I use LDAP authentication on several Web servers. For the first time, I have a Web application that I want to open to customers as well as staff. Usually, I just put the users into a group. Apache validates the password against LDAP and checks for group membership, and either accepts or rejects the request. The relevant Apache configuration looks like this:

AuthLDAPURL "ldap://ldap1.domain.com/ou=people,dc=domain,dc=com" STARTTLS
AuthLDAPGroupAttribute memberUid
require ldap-group cn=groupname,ou=groups,dc=domain,dc=com

Apache requires that I specify where to look for accounts; that’s the ou=people,dc=domain,dc=com search base in the AuthLDAPURL above. My customers are in a different OU than my coworkers. (It would have made more sense to name the “people” container “staff,” but I didn’t realize that at the time.) Apache will accept a filter in AuthLDAPURL, letting you check in multiple groups. I’ve never taken the time to understand LDAP filters, so I guess I’d better start now. I’ll write my first filters for ldapsearch(1), and then carry them over to Apache.

Normally, I run ldapsearch like so:

# ldapsearch -WxZD "cn=manager,dc=domain,dc=com"
Enter LDAP Password:

-W tells ldapsearch to ask for a password, -x sets simple auth, -Z toggles startTLS, and -D indicates a bind DN follows. While I have an inherent dislike of typing a password on the command line, I’m going to run many LDAP searches in quick succession on a test machine. My test machine doesn’t use the same password as my production environment, so I’m willing to make an exception for convenience. Drop the -W, and add the password with -w. Specify the password in quotes to escape symbols and such.

# ldapsearch -xZD "cn=manager,dc=domain,dc=com" -w "password"

You should get a dump of your LDAP directory.

Now to build up a filter iteratively, figuring out how they work as we go. ldapsearch expects the filter to be the last item on the command line. Put it in quotes, to escape special characters.

# ldapsearch -xZD "cn=manager,dc=domain,dc=com" -w "password" "(uid=mwlucas)"

This returns only my user account, as I would expect. Now let’s search for one of two accounts, joined by an OR. I’m going to stop including the entire command line, and only list the filter at the end.

"(|(uid=mwlucas)(uid=mwlucas2))"

The OR operator is a pipe symbol (|). It’s followed by the two possible choices, each in parentheses. This filter matches any entry where the uid is either mwlucas or mwlucas2. I get information for two accounts back.

Similarly, I can search for a group by CN as well as a username. I want to see everything with a UID of “mwlucas” or matching the CN “cacti”.

"(|(uid=mwlucas)(cn=cacti))"

Entries for my account and this group appear.

About this time I realize that I can probably fix my Apache problem by removing the ou=people entry in AuthLDAPURL, giving me:

AuthLDAPURL "ldap://ldap1.domain.com/dc=domain,dc=com" STARTTLS

I try it and, yes, users from both OUs can now log in. But I'm going to learn about search filters, anyway.

I can use two additional logical operators, AND (&) and NOT (!). Like OR, each goes at the front of the parenthesized expression it applies to.
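The prefix notation is easier to see when the filters are assembled piece by piece. Here is a small Python sketch that builds filter strings for all three operators. The helper names (f_eq, f_and, f_or, f_not) are invented for this example, and the sketch does no escaping of special characters in values, so treat it as a way to see the syntax rather than something to feed untrusted input.

```python
def f_eq(attr, value):
    """A single attribute test, e.g. (uid=mwlucas)."""
    return f"({attr}={value})"


def f_and(*filters):
    """AND: every sub-filter must match."""
    return "(&" + "".join(filters) + ")"


def f_or(*filters):
    """OR: at least one sub-filter must match."""
    return "(|" + "".join(filters) + ")"


def f_not(sub_filter):
    """NOT: negates exactly one sub-filter."""
    return "(!" + sub_filter + ")"


# The OR filter used earlier in this article.
print(f_or(f_eq("uid", "mwlucas"), f_eq("cn", "cacti")))

# Every uid starting with mwl, except the mwltest account.
print(f_and(f_eq("uid", "mwl*"), f_not(f_eq("uid", "mwltest"))))
```

The resulting strings go at the end of the ldapsearch command line, in quotes, just like the hand-written ones.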

Also, filters support wildcards. For example, here I want to see all accounts that have the initials "mwl" in them. I've created more than one test account, and want to be sure that I remember all of them.

(uid=*mwl*)

That generates a lot of output, though. I'm more interested in a list of UIDs. If you specify an attribute after the filter, ldapsearch will only print that attribute. Here's the whole command string for this search.

# ldapsearch -xZD "cn=manager,dc=domain,dc=com" -w "password" "(uid=*mwl*)" uid
...ldap internal stuff deleted...
# mwlucas, people, domain.com
dn: uid=mwlucas,ou=people,dc=domain,dc=com
uid: mwlucas

# mwltest, people, domain.com
dn: uid=mwltest,ou=people,dc=domain,dc=com
uid: mwltest

# mwlstaff, people, domain.com
dn: uid=mwlstaff,ou=people,dc=domain,dc=com
uid: mwlstaff

# mwlucas2, customers, domain.com
dn: uid=mwlucas2,ou=customers,dc=domain,dc=com
uid: mwlucas2

That's enough filtering to make my day-to-day life easier, so I'll get back to the problem I'm really trying to solve today.

Adding IPv6 to a FreeBSD Mail/Web Server

We’ve run out of IPv4 addresses. If you’re not already on IPv6, start hoarding gasoline and canned potted meat food product. Doomsday is here, film at eleven. Or, failing that, start running IPv6 on something so you can have a little familiarity with the new Internet protocol before you absolutely must. My personal FreeBSD 9 server (which hosts my email, this blog, web sites for my books, and a whole bunch of other equally trivial cruft) is now IPv6-enabled, even though the local site doesn’t have IPv6 connectivity. Here’s how I did it.

Establishing IPv6 connectivity to and from an IPv4-only server breaks down into these steps:

  • Get an IPv6 tunnel from a tunnel provider
  • Configure a generic IPv4 tunnel to the tunnel provider
  • Assign IPv6 addresses to your IPv4 generic tunnel
  • Assign your IPv6 default route over the tunnel
  • Establish IPv6 DNS resolution
  • Configure services to run on IPv6
  • Offer IPv6 DNS records

    If you’re reading this, you probably don’t have IPv6 at your facility. You’ll need an IPv6 tunnel, offered for free by many providers. I used Hurricane Electric, but use any broker you like. Sign up for an account, respond to the verification mail, and request a tunnel. The Web interface will give you a bunch of details about your tunnel.

    The gif interface provides a generic IPv4 tunnel that can be used for many protocols. Configuring an IPv4 tunnel requires only the IP addresses on each end. ifconfig(8) creates a tunnel with just:

    # ifconfig gif0 tunnel 198.22.63.8 209.51.181.2

    You must be able to ping the tunnel’s remote address.

    Now assign IPv6 addresses to your gif0 tunnel.

    # ifconfig gif0 inet6 your-IPv6-address remote-IPv6-address prefixlen 128

    For example, my HE-assigned IPv6 tunnel endpoint is 2001:470:1f10:b9c::2. The he.net IPv6 address is 2001:470:1f10:b9c::1. I assign my IPv6 addresses as:

    # ifconfig gif0 inet6 2001:470:1f10:b9c::2 2001:470:1f10:b9c::1 prefixlen 128

    Verify that your IPv6 addresses are correctly configured by using ping6 to hit the far end. Remember, standard ping will not work — ping is specific to IPv4.

    # ping6 2001:470:1f10:b9c::1
    PING6(56=40+8+8 bytes) 2001:470:1f10:b9c::2 --> 2001:470:1f10:b9c::1
    16 bytes from 2001:470:1f10:b9c::1, icmp_seq=0 hlim=64 time=19.209 ms
    16 bytes from 2001:470:1f10:b9c::1, icmp_seq=1 hlim=64 time=21.661 ms

    At this point, you have IPv6. Now assign the IPv6 default route to the remote end of the tunnel.

    # route -n add -inet6 default 2001:470:1f10:b9c::1

    Your server will now send all IPv6 traffic across your IPv4 tunnel, while still routing IPv4 traffic as usual. Remember, IPv4 and IPv6 are different protocols.

    Some Internet sites, such as Google, have special requirements for accessing their IPv6 DNS. Your tunnel broker provides an IPv6-aware DNS server. Now that you have a default route, see if you can ping6 it. If you can ping the DNS server, edit /etc/resolv.conf. Remove your IPv4 nameservers. Add the IPv6 nameserver. Check DNS for IPv4 (A records) and IPv6 (AAAA records) with dig(1).

    # dig www.google.com A

    ;; ANSWER SECTION:
    www.google.com. 20478 IN CNAME www.l.google.com.
    www.l.google.com. 222 IN A 209.85.225.99
    www.l.google.com. 222 IN A 209.85.225.147
    www.l.google.com. 222 IN A 209.85.225.104
    www.l.google.com. 222 IN A 209.85.225.105
    www.l.google.com. 222 IN A 209.85.225.103
    www.l.google.com. 222 IN A 209.85.225.106

    This looks correct. Let’s try AAAA records.

    # dig www.google.com AAAA

    www.google.com. 20368 IN CNAME www.l.google.com.
    www.l.google.com. 180 IN AAAA 2001:4860:b007::63

    This is an IPv6 answer. Google has fewer IPv6 servers than IPv4 servers, but that’s to be expected these days.

    Now configure services on your server to listen on IPv6 addresses. Daemons included in FreeBSD listen to IPv6 by default. Run sockstat -6 to see what programs are listening to your new IPv6 address. In my case, Apache only listened to IPv4. At some point in the foggy past, I had turned off IPv6 when configuring the port. I rebuilt devel/apr1 and www/apache22 with IPv6 support, restarted Apache, and it listened to my IPv6 address without issue.

    Last, you must publish AAAA records for the hosts you want to offer over IPv6. By adding AAAA records gradually, you can let your IPv6 traffic grow slowly.

    www IN A 198.22.63.8
    www IN AAAA 2001:470:1f10:b9c::2

    Properly-configured hosts will attempt to connect to services on IPv6 first. If those connection attempts fail, they will try IPv4 instead.
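The "IPv6 first, then IPv4" behavior can be sketched in a few lines of Python. This is a simplified illustration, not what any particular operating system or browser actually does; real clients follow more elaborate address-selection rules. The function names are invented for this example.

```python
import socket


def prefer_ipv6(addrinfos):
    """Order getaddrinfo() results so IPv6 entries come first."""
    # sorted() is stable, so the resolver's order is kept within each family.
    return sorted(addrinfos, key=lambda ai: ai[0] != socket.AF_INET6)


def connect_v6_first(host, port, timeout=5.0):
    """Try every address for host, IPv6 before IPv4; return a connected socket."""
    last_error = OSError(f"no addresses found for {host}")
    candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canon, sockaddr in prefer_ipv6(candidates):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as err:
            # This host cannot create sockets for that family; move on.
            last_error = err
            continue
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as err:
            sock.close()
            last_error = err
    raise last_error


# Example use (needs working DNS and a network connection):
#   sock = connect_v6_first("www.example.com", 80)
```

A host with both A and AAAA records gets tried over IPv6 first, and the IPv4 address only comes into play if that attempt fails.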

    To make your FreeBSD changes permanent, use your addresses in the /etc/rc.conf entries below.

    gif_interfaces="gif0"
    gifconfig_gif0="198.22.63.8 209.51.181.2"
    ipv6_network_interfaces="gif0 lo0"
    ifconfig_gif0_ipv6="inet6 2001:470:1f10:b9c::2 2001:470:1f10:b9c::1 prefixlen 128"
    ipv6_defaultrouter="2001:470:1f10:b9c::1"

    Lastly, tell your users that you have IPv6. Otherwise, nobody will notice. It’s that transparent.

    tracking latency, loss, and jitter with SmokePing

    Most network monitoring tools retry failed connections. snmpwalk sends multiple SNMP queries, giving the agent multiple chances to respond. Nagios lets you configure how often you retry queries, and specifically delays alarms to avoid transient issues. You do not want your pager going off at 3AM because something dropped a single packet! Losing a packet or two on occasion is fine, but losing one or two every time you run a check is a problem — and most monitoring tools can’t tell the difference. Don’t just crank up your monitoring software’s loss tolerance. You must know how often your network drops requests. That’s where SmokePing comes in. SmokePing measures loss, latency, and jitter for ICMP and application-level requests.

    SmokePing is in the FreeBSD ports as /usr/ports/net-mgmt/smokeping, OpenBSD ports as /usr/ports/net/smokeping, and NetBSD as /usr/pkgsrc/net/smokeping. My example server is FreeBSD 9, with SmokePing 2.4.2.

    The SmokePing port offers several different probes, or utilities for performing checks. In this example we’ll use the default probe, fping. While other probes, such as measuring DNS response time, are useful, they don’t address today’s day job problem.

    SmokePing is configured in /usr/local/etc/smokeping/config. The config file is a little different than most; it’s neither XML-ish nor C-esque. A hash mark is still a comment. Three asterisks mark off a configuration section. SmokePing uses a hierarchical configuration for monitoring hosts, and an item’s depth in the hierarchy is dictated by the number of plus signs before it. Variables are set with equals signs. It’s easy enough once you work through it a bit.

    Here are the basic settings:


    *** General ***
    owner = mwlucas
    contact = mwlucas@blackhelicopters.org
    mailhost = mail.blackhelicopters.org
    sendmail = /usr/sbin/sendmail

    The Web interface needs some paths. I put my Web sites under /var/www/site/application. On this server, I want any local SmokePing stuff under /var/www/monitor/smoke. I’ll also use Apache aliases to direct part of the site to the directory where the port installed the files.

    imgcache = /var/www/monitor/smoke/images
    imgurl = https://monitor.blackhelicopters.org/smoke-images/
    datadir = /var/db/smoke
    piddir = /usr/local/var/smokeping/
    cgiurl = https://monitor.blackhelicopters.org/smoke/smokeping.cgi
    smokemail = /usr/local/etc/smokeping/smokemail
    tmail = /usr/local/etc/smokeping/tmail
    # specify this to get syslog logging
    syslogfacility = local0

    Create the directories assigned to datadir and imgcache. The user smokeping must own the directory assigned to datadir. The Web server user (www) must own the imgcache directory.

    As a general rule, I don’t permit applications to write to files in the same directory that they’re installed in. It interferes with package management and adds to security problems. Perhaps that’s not such a big concern these days, but I’m kind of old-school.

    Configure /etc/syslog.conf to log local0 to /var/log/smokeping.

    local0.* /var/log/smokeping

    I’m not configuring alarms right now, so you can comment out the line *** Alerts *** and everything beneath it until the next section. Similarly, comment out the entire *** Slaves *** section.

    Leave “Presentation” and “Database” alone, unless you a) understand RRD and want to muck with the innards of how SmokePing stores its data, and b) understand SmokePing. If you’re reading this article to learn about SmokePing, you automatically fail b).

    Under the Probes header, ensure the path to FPing is correct.

    The interesting bit is the Targets section. Here’s where you define which hosts you want to ping. SmokePing uses a hierarchical configuration that both lists the hosts you want to monitor and how you want the results displayed.

    *** Targets ***
    probe = FPing

    menu = Top
    title = Network Latency Grapher
    remark = Welcome to BH.org SmokePing.

    This header tells SmokePing that we’re configuring objects to be checked with FPing. We set a menu section and title, then proceed to the first target.


    + Southfield
    menu = Southfield
    title = Southfield

    ++ router6
    host = router6.blackhelicopters.org
    ++ router8
    host = router8.blackhelicopters.org

    + chi
    menu = Chicago
    title = Chicago
    ++ chi-1
    host=chi-1.blackhelicopters.org

    Here I’ve set up two first-level menus, Southfield (a suburb of Detroit) and Chicago. The Southfield menu has two entries beneath it. Each sub-entry has a title (indicated with ++) and a host. SmokePing will check these routers with FPing, and will create an interactive menu on the Web site arranging them as you have here.
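If the plus-sign hierarchy still seems odd, it may help to see it unrolled. The following Python sketch reads a Targets section like the one above and prints each entry with its full path. It is only a reading aid written for this article, not SmokePing's own parser, and it ignores most of the real config syntax.

```python
SAMPLE = """\
*** Targets ***
probe = FPing

menu = Top
title = Network Latency Grapher

+ Southfield
menu = Southfield
title = Southfield

++ router6
host = router6.blackhelicopters.org
++ router8
host = router8.blackhelicopters.org

+ chi
menu = Chicago
title = Chicago
++ chi-1
host=chi-1.blackhelicopters.org
"""


def parse_targets(text):
    """Return {path: {variable: value}}; the top level has the path ''."""
    nodes = {"": {}}
    stack = []  # names of the enclosing entries, one per '+' level
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("***"):
            continue
        if line.startswith("+"):
            # The number of leading plus signs is the entry's depth.
            depth = len(line) - len(line.lstrip("+"))
            name = line.lstrip("+").strip()
            stack = stack[:depth - 1] + [name]
            nodes["/".join(stack)] = {}
        elif "=" in line:
            key, value = line.split("=", 1)
            nodes["/".join(stack)][key.strip()] = value.strip()
    return nodes


for path, variables in parse_targets(SAMPLE).items():
    if "host" in variables:
        print(path, "->", variables["host"])
```

Each ++ entry lands underneath the most recent + entry, which is also how the menu on the Web site ends up arranged.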

    Set smokeping_enable=YES in /etc/rc.conf, and run /usr/local/etc/rc.d/smokeping start. Check /var/log/smokeping (you did set up syslog, didn’t you?) for any errors.

    Now the Web interface. FreeBSD’s package installed SmokePing’s CGI and related files in /usr/local/smokeping/htdocs. I want to use /var/www/monitor/smoke/images/ as the image cache. My httpd.conf for this is:

    Alias /smoke/ "/usr/local/smokeping/htdocs/"
    <Directory "/usr/local/smokeping/htdocs">
    Options ExecCGI
    AllowOverride None
    Allow from All
    AddHandler cgi-script cgi
    </Directory>
    Alias /smoke-images/ "/var/www/monitor/smoke/images/"

    I control access to my network management Web sites with LDAP. If you want to restrict with Apache’s IP address ACLs instead, change the Allow from All to something more suitable. Don’t open SmokePing to the world. Your customers and/or users will find it and ask a lot of inconvenient questions.

    SmokePing creates graphs indicating the average ping request latency in a green line, with smoky grey/black bars indicating jitter. When SmokePing loses packets, the line color changes.

    I’ll probably write more about SmokePing, as this barely scratches the surface. Tracking things like DNS query latency can help narrow down server-side problems.

    upgrading to OpenBSD-current, the stupid way

    My desktop runs an OpenBSD snapshot from April 2010. It’s well past time I upgraded. OpenBSD’s usual upgrade path works quite well, but I’m simultaneously lazy and willing to reinstall this system from scratch if something ghastly happens. (This might also invalidate any bug report you send.)

    Don’t do this if you have any need or respect for your computer. I treat my desktop with a mix of indifference and contempt, so I’ll proceed.

    Back up your data. I attached my external 1TB USB drive. /var/log/messages shows:

    Jan 21 10:08:17 avarice /bsd: sd0 at scsibus2 targ 1 lun 0: SCSI2 0/direct fixed
    Jan 21 10:08:17 avarice /bsd: sd0: 953869MB, 512 bytes/sec, 1953525168 sec total

    It’s device sd0. What partitions are on it?

    $ sudo disklabel sd0
    ...16 partitions:
    # size offset fstype [fsize bsize cpg]
    c: 1953525168 0 unused
    i: 1953520002 63 MSDOS

    I want to mount sd0i.

    $ sudo mount_msdos /dev/sd0i /mnt/
    $ cd /home
    $ sudo gtar -cvMf /mnt/laptop.tar mwlucas

    One annoyance with using an MSDOS-formatted disk for backup is that you can’t have a file larger than 4GB. My home directory is multiple times that. I must use gtar to back up my home directory, and use the multiple-volumes option (-M). When gtar fills a 4GB file, it asks me to prepare a new volume. Rename the existing backup file, then hit return to have gtar continue.
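If you want a rough idea of how many times you'll be renaming files, the arithmetic is simple enough. This Python sketch assumes the FAT32 limit of 4 GiB minus one byte per file and ignores tar's own header overhead, so the answer is a lower bound. The 20 GiB figure is a made-up example, not the size of my home directory.

```python
FAT32_MAX_FILE = 2**32 - 1  # largest file FAT32 can hold, in bytes


def volumes_needed(total_bytes, volume_bytes=FAT32_MAX_FILE):
    """Return how many volume files a multi-volume archive needs at minimum."""
    if total_bytes <= 0:
        return 1
    # Integer ceiling division; avoids float rounding on large sizes.
    return -(-total_bytes // volume_bytes)


GIB = 2**30
print(volumes_needed(20 * GIB))  # a hypothetical 20 GiB home directory
```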

    While that’s running, let’s get the download files. Go to the OpenBSD mirror list and choose one near you. Use a web browser to verify that the snapshot on the site is current. Open an FTP session to that site, and grab all the bsd* and *.tgz files.

    ftp> cd pub/OpenBSD/snapshots/amd64
    250 Directory successfully changed.
    ftp> prompt
    Interactive mode off.
    ftp> mget bsd*
    wait
    ftp> mget *.tgz
    wait…

    Verify the checksums of the downloaded files against the checksums in the SHA256 file on the FTP site.

    $ cksum -a sha256 *
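Comparing a screenful of hashes by eye gets old quickly. Here is a Python sketch that does the comparison, assuming the SHA256 file uses the BSD-style "SHA256 (filename) = digest" line format and sits in the same directory as the downloads. The function names are invented for this example.

```python
import hashlib
import re
from pathlib import Path

# One line of a BSD-style checksum file: SHA256 (bsd.rd) = <64 hex digits>
CHECKSUM_LINE = re.compile(r"^SHA256 \((?P<name>.+)\) = (?P<digest>[0-9a-f]{64})$")


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large filesets don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(checksum_file, directory="."):
    """Return {filename: True or False} for each listed file that was downloaded."""
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        match = CHECKSUM_LINE.match(line.strip())
        if not match:
            continue
        target = Path(directory) / match["name"]
        if target.exists():
            results[match["name"]] = sha256_of(target) == match["digest"]
    return results


# Example use, run from the download directory:
#   for name, ok in sorted(verify("SHA256").items()):
#       print("OK " if ok else "BAD", name)
```

Files listed in SHA256 that you did not download are skipped rather than reported as failures.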

    I have backups. I have the files, and they aren’t corrupt. We are now at the point of no return. You can still follow the recommended upgrade procedure. I encourage you to do so.

    Shut down all unnecessary processes. If you’re forwarding packets, stop. If you’re in X, exit to a text console. Kill all daemons that aren’t necessary for a minimally-running system.

    Copy your desired kernel to the root directory. I’m using the multiprocessor kernel. Also save a copy of your current reboot command.

    $ rm /obsd ; ln /bsd /obsd && cp bsd.mp /nbsd && mv /nbsd /bsd
    $ cp bsd.rd /
    $ cp bsd /bsd.sp

    Now overwrite the nonessential parts of your userland.

    $ tar -C / -xzvphf xserv49.tgz
    $ tar -C / -xzphf xfont49.tgz
    $ tar -C / -xzphf xshare49.tgz
    $ tar -C / -xzphf xbase49.tgz
    $ tar -C / -xzphf game49.tgz
    $ tar -C / -xzphf comp49.tgz
    $ tar -C / -xzphf man49.tgz

    Do not extract the etc49.tgz distribution, as that will overwrite your core system configuration! You must update /etc separately.

    Update the core programs last. The core system includes programs like tar and reboot. Once you update the core, your system is running a new userland on an old kernel.

    $ tar -C / -xzphf base49.tgz

    Your system is now basically unusable; you have new binaries running on an old kernel. You must reboot now. Afterwards, I’m running:

    OpenBSD 4.9-beta (GENERIC.MP) #777: Tue Jan 18 13:56:34 MST 2011

    Generate the new device nodes.

    $ cd /dev/
    $ sudo ./MAKEDEV all

    I prefer to reboot after recreating device nodes. The new reboot command is now usable. After the next reboot everything looks fine, except for this message:

    Could not load host key: /etc/ssh/ssh_host_ecdsa_key

    So, there’s a new key type. I’ll get that as I upgrade /etc, by running sysmerge(8). Go to the snapshot directory and run:

    $ sudo sysmerge -s etc49.tgz -x xetc49.tgz

    Sysmerge will compare your installed /etc with the snapshot fileset and show you the diffs. You can install the new file, delete the new file, or merge the two together. If you’ve used mergemaster(8), sysmerge(8) will be no surprise.

    Then reboot again. With the new /etc, OpenBSD automatically generates the missing SSH key for the new crypto algorithm.

    My system is now upgraded.

    In the interest of sanity, I need to remove and reinstall all the packages on this system. This isn’t a big deal, except for those few that must be built as ports because I require something unusual. Set PKG_PATH to the packages directory of your closest FTP mirror and run pkg_add -iu.

    $ sudo pkg_add -iu
    quirks-1.32: ok
    ORBit2-2.14.19:libiconv-1.13p0->libiconv-1.13p2: ok
    ORBit2-2.14.19:pcre-7.9->pcre-8.02p1: ok
    ORBit2-2.14.19:libgamin-0.1.10->libgamin-0.1.10p3: ok
    ORBit2-2.14.19:gettext-0.17p0->gettext-0.18.1p0: ok
    ...

    Walk away.

    In this particular case, pkg_add crashed when my chosen FTP mirror limited the number of successive connections from my IP address. I raised this on misc@, and got an answer and a fix almost immediately.

    So, even fools like me can get help. But don’t count on it.

    mod_security2 case sensitive?

    I’ve written previously about using mod_security to block referral spam and hosts on a DNS-based RBL.  I thought it was working pretty well, until I looked at my referrers today and saw lots of hits from “FreePornVideos.bogus” (domain name & suffix altered).  I shouldn’t see this, as my mod_security rules include:

    SecRule REQUEST_HEADERS:REFERER "porn" deny,status:500

    Lots of mod_security documentation claims that matches are case-insensitive.  I should not be seeing this.  What’s going on?  I believe that the problem is that the referral matches are case-sensitive, but let’s verify that.  First, let’s try a simple referral in lower case.

    $ wget http://www.michaelwlucas.com/ --referer=porn
    --2011-01-19 10:17:32--  http://www.michaelwlucas.com/
    Resolving www.michaelwlucas.com (www.michaelwlucas.com)... 198.22.63.8
    Connecting to www.michaelwlucas.com (www.michaelwlucas.com)|198.22.63.8|:80... connected.
    HTTP request sent, awaiting response... 500 Internal Server Error
    2011-01-19 10:17:32 ERROR 500: Internal Server Error.

    That works as expected.  Now try with a capital letter:

    $ wget http://www.michaelwlucas.com/ --referer=Porn
    --2011-01-19 10:17:34--  http://www.michaelwlucas.com/
    Resolving www.michaelwlucas.com (www.michaelwlucas.com)... 198.22.63.8
    Connecting to www.michaelwlucas.com (www.michaelwlucas.com)|198.22.63.8|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 10376 (10K) [text/html]
    Saving to: `index.html'

    Matches are case sensitive, despite what I read in the documentation.  Listing both Porn and porn won’t solve the problem, because that won’t protect me from pORN.

    Lesson of the day: verify you’re reading the correct documentation, and that you read what the author actually wrote.  mod_security2 uses PCRE for regular expressions. Version 1 used POSIX.  If I want case-insensitive matching, I have to declare that in my regex.  I modified the rule to read:

    SecRule REQUEST_HEADERS:REFERER "(?i:(porn))" deny,status:500

    Reload Apache. Test again with wget.  Both porn and Porn are now blocked, as well as pORN.  Petulance of the day remediated. Now back to BGP.
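You can see the same effect without touching a Web server. Python's re module is not PCRE, but it accepts the same scoped (?i:...) syntax, so this short sketch shows the difference between the two patterns. The blocked() helper is invented for this example.

```python
import re

OLD_PATTERN = "porn"          # the original rule's pattern
NEW_PATTERN = "(?i:(porn))"   # the corrected rule's pattern


def blocked(referer, pattern):
    """Return True if the pattern matches anywhere in the referer string."""
    return re.search(pattern, referer) is not None


for referer in ("porn", "Porn", "pORN", "http://FreePornVideos.bogus/"):
    print(referer, blocked(referer, OLD_PATTERN), blocked(referer, NEW_PATTERN))
```

The old pattern only catches the all-lowercase referer, while the new one catches every capitalization.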

    identifying probable intrusion vectors with flow data

    Shortly after Absolute FreeBSD came out, I worked with gpart(8) and thought “I should have put this in the book.”  Just after Cisco Routers for the Desperate went to the printer, I worked with tracking gateway availability and said “Drat!  This should have gone into the book!”  This is a recurring motif in my life.

    Now that Network Flow Analysis is out, I should have marked calendar space for “interesting flow analysis opportunity.”  If you want to know the details behind all of this, look in the book or in the flow-tools documentation.

    Someone recently penetrated a dev server I help support. I want to learn how they got access, using flow data.  I have no idea if this is realistic, but let’s go for it.  I previously made a reasonable guess about the date the host was compromised, so I know the time window to examine. I’ll attack the problem by identifying “known good” traffic, removing it from the data, and examining what remains. (This might not be the best method, but I know that a couple security and intrusion response folks read this blog, and one in particular won’t hesitate to tell me I’m fubar, so check for comments.)

    First, let’s see the traffic this host sends and receives.

    # flow-cat 2010-11-09/ft* | flow-nfilter -F ip-addr -v ADDR=189.22.36.165 | flow-print | less

    srcIP            dstIP            prot  srcPort  dstPort  octets      packets
    189.22.36.165    194.28.157.50    6     7781     80       40          1
    194.28.157.50    189.22.36.165    6     80       7781     40          1
    189.22.36.165    194.28.157.50    6     9008     80       40          1
    189.22.36.165    194.28.157.50    6     9008     80       40          1
    194.28.157.50    189.22.36.165    6     80       9008     80          2
    189.22.36.165    194.28.157.50    6     6625     80       80          2
    194.28.157.50    189.22.36.165    6     80       6625     80          2
    189.22.36.165    82.135.96.18     6     445      59423    80          2
    82.135.96.18     189.22.36.165    6     59423    445      96          2
    189.22.36.165    72.167.161.47    6     80       51428    40          1
    72.167.161.47    189.22.36.165    6     51404    21       84          2
    ...

    This machine is an Ubuntu box.  It regularly contacts random Internet sites to check for updates.  The developer also browses the Web from it.  If I’m to have any luck, I must exclude Web browsing traffic from this host.  (To the best of my knowledge, there is not yet a Web site that will automatically root any Unix-like system.  I might be wrong.)  I normally configure most filtering on the command line, but this is complicated enough that I need to write an actual filter for it.


    filter-primitive port80
    type ip-port
    permit 80

    filter-primitive victim
    type ip-address
    permit 189.22.36.165

    filter-definition victim-browsing
    invert
    match ip-source-address victim
    match ip-destination-port port80
    or
    match ip-destination-address victim
    match ip-source-port port80

    We match all traffic from the victim machine to port 80, and from port 80 to the victim machine, then invert the filter to exclude everything that matches. Add this filter to the command line and we get:

    srcIP            dstIP            prot  srcPort  dstPort  octets      packets
    189.22.36.165    82.135.96.18     6     445      59423    80          2
    82.135.96.18     189.22.36.165    6     59423    445      96          2
    189.22.36.165    72.167.161.47    6     80       51428    40          1
    72.167.161.47    189.22.36.165    6     51404    21       84          2
    72.167.161.47    189.22.36.165    6     49768    21       296         6
    189.22.36.165    72.167.161.47    6     21       49768    262         3
    72.167.161.47    189.22.36.165    6     51428    80       40          1
    ...
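
    The keep/drop logic of the victim-browsing filter can also be tried out on saved flow-print text. What follows is only a rough awk equivalent, not what flow-nfilter does internally. The function name drop_browsing is made up for illustration, and it assumes the seven-column layout shown in the listings above.

```shell
# Hypothetical helper: drop the victim's outbound Web browsing from
# flow-print text on stdin. Assumed columns, as in the listings above:
# srcIP dstIP prot srcPort dstPort octets packets
drop_browsing() {
    awk -v victim="$1" '
        # victim -> remote port 80 (requests)
        $1 == victim && $5 == 80 { next }
        # remote port 80 -> victim (replies)
        $2 == victim && $4 == 80 { next }
        # everything else, including the header line, passes through
        { print }
    '
}
```

    Pipe flow-print output into it, giving the victim address as the only argument: ... | flow-print | drop_browsing 189.22.36.165 | less. Flows where the victim itself answers on port 80 survive, just as they do in the listing above.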

    Some interesting things here. This machine shouldn’t be running an SMB server, but the first two flows show that someone connected to us on port 445 and we answered. The developer who owns the machine probably installed Samba as a dependency of something else, and never even noticed. Nobody in the outside world should be talking to this machine’s Web site, but it’s not that surprising that someone did. There’s a small FTP query next; I suspect it’s one of the innumerable FTP scanners.

    There are still 1,690 lines of this stuff, far too much to assess by eye.  Let’s trim it down by assuming this is the most common sort of intrusion.

    Generally, an intruder attacks a service on a machine, then sends the exploit code or an IRC bouncer to the machine through that service.  Let’s make the (uncertain and unreliable) assumption that at least one of these takes more than 1 packet.  Most DNS transactions, pings, etc., are 1 packet, so by looking for flows larger than 1 packet we exclude this innocuous traffic.  The following primitive and filter pass only flows larger than 1 packet.

    filter-primitive gt1packet
    type counter
    permit gt 1

    filter-definition gt1packet
    match packets gt1packet

    Now add |flow-nfilter -F gt1packet to the command line and see what remains. The following immediately stands out:

    ...
    189.22.36.165    79.115.103.225   6     22       4382     3703        19
    189.22.36.165    79.115.103.225   6     22       4383     3095        11
    189.22.36.165    79.115.103.225   6     6667     4384     120         3
    189.22.36.165    79.115.103.225   6     6667     4385     120         3
    ...
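
    The packet-count cut is just as easy to reproduce on saved flow-print text. This is a sketch under the same assumptions as before: packets is the seventh column, and multi_packet is a made-up name.

```shell
# Hypothetical helper: keep only flows of more than one packet.
# Assumes packets is the seventh column, as in the listings above.
# The header line is dropped because its seventh field is not a number.
multi_packet() {
    awk '$7 ~ /^[0-9]+$/ && $7 + 0 > 1'
}
```

    Used as ... | flow-print | multi_packet | less, it should leave the same flows that the gt1packet filter passes.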

    The first port 6667 connections involve 79.115.103.225, a Romanian system. Port 6667 is the conventional IRC port, which is why these flows stand out. Let’s strip out all of the previous filters and see what traffic these two hosts have exchanged. There’s a lot of SSH traffic, more than we see from the usual brute-force guesser.

    # flow-cat 2010-11-09/ft* | flow-nfilter -F ip-addr -v ADDR=189.22.36.165 | \
       flow-nfilter -F ip-addr -v ADDR=79.115.103.225  | flow-print | less
    srcIP            dstIP            prot  srcPort  dstPort  octets      packets
    79.115.103.225   189.22.36.165    6     4381     22       371         6
    189.22.36.165    79.115.103.225   6     22       4381     394         7
    79.115.103.225   189.22.36.165    6     4383     22       1984        14
    189.22.36.165    79.115.103.225   6     22       4382     3703        19
    189.22.36.165    79.115.103.225   6     22       4383     3095        11
    189.22.36.165    79.115.103.225   6     6667     4384     120         3
    189.22.36.165    79.115.103.225   6     6667     4385     120         3
    79.115.103.225   189.22.36.165    6     4384     6667     192         3
    79.115.103.225   189.22.36.165    6     4382     22       11804       118
    79.115.103.225   189.22.36.165    6     4385     6667     192         3
    189.22.36.165    79.115.103.225   6     22       4382     12688       103
    79.115.103.225   189.22.36.165    6     4382     22       1664        19
    79.115.103.225   189.22.36.165    6     4382     22       5564        64
    189.22.36.165    79.115.103.225   6     22       4382     9708        50
    79.115.103.225   189.22.36.165    6     4382     22       14956       169
    189.22.36.165    79.115.103.225   6     22       4382     16060       129
    79.115.103.225   189.22.36.165    6     4382     22       1040        12
    189.22.36.165    79.115.103.225   6     22       4382     928         8
    189.22.36.165    79.115.103.225   6     8888     4470     120         3
    79.115.103.225   189.22.36.165    6     4470     8888     192         3
    79.115.103.225   189.22.36.165    6     4382     22       4316        49
    189.22.36.165    79.115.103.225   6     22       4382     11344       42
    79.115.103.225   189.22.36.165    6     4382     22       1924        23
    189.22.36.165    79.115.103.225   6     22       4382     8800        20
    ...
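
    To put a number on “a lot of SSH traffic,” one option is to total octets and packets per port on the victim’s side of each flow. This is a sketch and not part of flow-tools: sum_by_local_port is a made-up name, and it assumes the seven-column flow-print layout shown above.

```shell
# Hypothetical helper: total octets and packets per port on the victim's
# side of each flow. Reads flow-print text on stdin; assumed columns:
# srcIP dstIP prot srcPort dstPort octets packets
sum_by_local_port() {
    awk -v victim="$1" '
        # pick whichever port belongs to the victim in this flow
        $1 == victim { port = $4 }
        $2 == victim { port = $5 }
        $1 == victim || $2 == victim {
            octets[port]  += $6
            packets[port] += $7
        }
        # one line per port: port, total octets, total packets
        END { for (p in octets) print p, octets[p], packets[p] }
    ' | sort -n
}
```

    Run as ... | flow-print | sum_by_local_port 189.22.36.165. It prints one line per local port, which makes it easy to compare port 22 against ports 6667 and 8888.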
    

    Using flow-print -f 5, I can view the timestamps and verify that the IRC activity started shortly after the SSH activity started using larger amounts of bandwidth.

    Can I be certain that 79.115.103.225 is my attacker? No. Is this activity suspicious? Absolutely. I can examine the hacked machine, or a disk image thereof, and identify the account used to penetrate the machine.

    This is not proof, but it’s a place to start. In assessing the rest of the data, I can now exclude this host. This will further reduce the pool of data I am assessing.
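
    Excluding the host could reuse the invert pattern from the victim-browsing filter. This is a sketch rather than something I ran against this data, and the names suspect and not-suspect are made up.

```
filter-primitive suspect
type ip-address
permit 79.115.103.225

filter-definition not-suspect
invert
match ip-source-address suspect
or
match ip-destination-address suspect
```

    Adding |flow-nfilter -F not-suspect to the command line should then drop every flow to or from that address.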

    While I can’t use this as grounds for flying to Romania with body armor, a machine gun, and a machete, I can realistically act on this information. I can report the activity to the IP address owner. I can check my network for other connections from this host, and verify the integrity of any machines it’s connected to. I can use this as a part of my business case to firewall off this part of the network. It will support my argument to forbid passwords for SSH connections on dev machines.

    In retrospect, I could have made other assumptions that might have let me find this more quickly, e.g., I could have investigated the first hosts contacted on the questionable ports. But every puzzle is easy once you’ve solved it. After this, I’d have to say that backtracking intrusion vectors through flow data is very practical, even when you don’t have much experience.