Shortly after Absolute FreeBSD came out, I worked with gpart(8) and thought “I should have put this in the book.” Just after Cisco Routers for the Desperate went to the printer, I worked with tracking gateway availability and said “Drat! This should have gone into the book!” This is a recurring motif in my life.
Now that Network Flow Analysis is out, I should have marked calendar space for “interesting flow analysis opportunity.” If you want to know the details behind all of this, look in the book or in the flow-tools documentation.
Someone recently penetrated a dev server I help support. I want to learn how they got access, using flow data. I have no idea if this is realistic, but let’s go for it. I previously made a reasonable guess about the date the host was compromised, so I know the time window to examine. I’ll attack the problem by identifying “known good” traffic, removing it from the data, and examining what remains. (This might not be the best method, but I know that a couple security and intrusion response folks read this blog, and one in particular won’t hesitate to tell me I’m fubar, so check for comments.)
First, let’s see the traffic this host sends and receives.
# flow-cat 2010-11-09/ft* | flow-nfilter -F ip-addr -v ADDR=189.22.36.165 | flow-print | less
srcIP dstIP prot srcPort dstPort octets packets
189.22.36.165 194.28.157.50 6 7781 80 40 1
194.28.157.50 189.22.36.165 6 80 7781 40 1
189.22.36.165 194.28.157.50 6 9008 80 40 1
189.22.36.165 194.28.157.50 6 9008 80 40 1
194.28.157.50 189.22.36.165 6 80 9008 80 2
189.22.36.165 194.28.157.50 6 6625 80 80 2
194.28.157.50 189.22.36.165 6 80 6625 80 2
189.22.36.165 82.135.96.18 6 445 59423 80 2
82.135.96.18 189.22.36.165 6 59423 445 96 2
189.22.36.165 72.167.161.47 6 80 51428 40 1
72.167.161.47 189.22.36.165 6 51404 21 84 2
...
This machine is an Ubuntu box. It regularly contacts random Internet sites to check for updates. The developer also browses the Web from it. If I’m to have any luck, I must exclude Web browsing traffic from this host. (To the best of my knowledge, there is not yet a Web site that will automatically root any Unix-like system. I might be wrong.) I normally configure most filtering on the command line, but this is complicated enough that I need to write an actual filter for it.
filter-primitive port80
  type ip-port
  permit 80

filter-primitive victim
  type ip-address
  permit 189.22.36.165

filter-definition victim-browsing
  invert
  match ip-source-address victim
  match ip-destination-port port80
  or
  match ip-destination-address victim
  match ip-source-port port80
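If the and/or/invert logic of that filter is hard to picture, here's a rough Python sketch of the same idea, run against the sample flows shown earlier. This is only an illustration of the logic, not how flow-nfilter works internally; the names Flow, parse_flows, is_victim_browsing, and exclude_browsing are ones I made up for the sketch.

```python
from typing import NamedTuple

VICTIM = "189.22.36.165"


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int
    octets: int
    packets: int


def parse_flows(text):
    """Parse 7-column flow-print style lines; skip the header and '...' lines."""
    flows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[0] == "srcIP":
            continue
        flows.append(Flow(fields[0], fields[1], *(int(x) for x in fields[2:])))
    return flows


def is_victim_browsing(flow, victim=VICTIM):
    """True for victim -> port 80 traffic, or port 80 -> victim traffic."""
    outbound = flow.src_ip == victim and flow.dst_port == 80
    inbound = flow.dst_ip == victim and flow.src_port == 80
    return outbound or inbound


def exclude_browsing(flows):
    """The 'invert' step: keep only the flows that do NOT match."""
    return [f for f in flows if not is_victim_browsing(f)]


# The sample flows from the first flow-print output in this post.
SAMPLE = """
srcIP dstIP prot srcPort dstPort octets packets
189.22.36.165 194.28.157.50 6 7781 80 40 1
194.28.157.50 189.22.36.165 6 80 7781 40 1
189.22.36.165 194.28.157.50 6 9008 80 40 1
189.22.36.165 194.28.157.50 6 9008 80 40 1
194.28.157.50 189.22.36.165 6 80 9008 80 2
189.22.36.165 194.28.157.50 6 6625 80 80 2
194.28.157.50 189.22.36.165 6 80 6625 80 2
189.22.36.165 82.135.96.18 6 445 59423 80 2
82.135.96.18 189.22.36.165 6 59423 445 96 2
189.22.36.165 72.167.161.47 6 80 51428 40 1
72.167.161.47 189.22.36.165 6 51404 21 84 2
"""

kept = exclude_browsing(parse_flows(SAMPLE))
for flow in kept:
    print(*flow)
```

Note that the flow from the victim's own port 80 survives the filter. The filter only drops traffic where the remote end is on port 80, so connections to a Web server on the victim still show up.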
We match all traffic from the victim machine to port 80, and from port 80 to the victim machine, then invert the filter to exclude everything that matches. Add this filter to the command line and we get:
srcIP dstIP prot srcPort dstPort octets packets
189.22.36.165 82.135.96.18 6 445 59423 80 2
82.135.96.18 189.22.36.165 6 59423 445 96 2
189.22.36.165 72.167.161.47 6 80 51428 40 1
72.167.161.47 189.22.36.165 6 51404 21 84 2
72.167.161.47 189.22.36.165 6 49768 21 296 6
189.22.36.165 72.167.161.47 6 21 49768 262 3
72.167.161.47 189.22.36.165 6 51428 80 40 1
...
Some interesting things here. This machine shouldn’t be running an SMB server, but the first two flows show that someone connected to us on port 445 and we answered, though at 96 and 80 octets not much data moved in either direction. The developer who owns the machine probably installed Samba as a dependency of something else, and never even noticed. Nobody in the outside world should be talking to this machine’s Web site, but it’s not that surprising that someone did. There’s a small FTP query next; I suspect it’s one of the innumerable FTP scanners.
There are still 1,690 lines of this stuff, far too much to assess by eye. Let’s trim it down by assuming this is the most common sort of intrusion.
Generally, an intruder attacks a service on a machine. He would then send the code for the exploit or IRC bouncer to the machine through that service. Let’s make the (uncertain and unreliable) assumption that one or the other of these is larger than 1 packet. Most DNS transactions, pings, etc., are 1 packet, so by looking for flows larger than 1 packet we exclude this innocuous traffic. The following primitive and filter pass only flows larger than 1 packet.
filter-primitive gt1packet
  type counter
  permit gt 1

filter-definition gt1packet
  match packets gt1packet
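As a quick sanity check on what this filter keeps, here's a minimal Python sketch applied to the seven flows from the previous output. The tuple layout and the more_than_one_packet function are my own inventions for illustration; flow-nfilter does the real work.

```python
# Each tuple: (srcIP, dstIP, prot, srcPort, dstPort, octets, packets),
# copied from the filtered output above.
FLOWS = [
    ("189.22.36.165", "82.135.96.18", 6, 445, 59423, 80, 2),
    ("82.135.96.18", "189.22.36.165", 6, 59423, 445, 96, 2),
    ("189.22.36.165", "72.167.161.47", 6, 80, 51428, 40, 1),
    ("72.167.161.47", "189.22.36.165", 6, 51404, 21, 84, 2),
    ("72.167.161.47", "189.22.36.165", 6, 49768, 21, 296, 6),
    ("189.22.36.165", "72.167.161.47", 6, 21, 49768, 262, 3),
    ("72.167.161.47", "189.22.36.165", 6, 51428, 80, 40, 1),
]

PACKETS = 6  # index of the packets column in each tuple


def more_than_one_packet(flows):
    """Keep only flows with a packet count greater than 1."""
    return [f for f in flows if f[PACKETS] > 1]


kept = more_than_one_packet(FLOWS)
for flow in kept:
    print(*flow)
```

The two single-packet flows on the Web port drop out and the rest remain. That also shows the weakness of the assumption: two-packet scanner noise sails right through.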
Now add | flow-nfilter -F gt1packet to the command line and see what remains. The following immediately stands out:
...
189.22.36.165 79.115.103.225 6 22 4382 3703 19
189.22.36.165 79.115.103.225 6 22 4383 3095 11
189.22.36.165 79.115.103.225 6 6667 4384 120 3
189.22.36.165 79.115.103.225 6 6667 4385 120 3
...
The first port 6667 flows (6667 is the usual IRC port) involve 79.115.103.225, a Romanian system. Let’s strip out all of the previous filters and see what traffic these two hosts have exchanged. There’s a lot of SSH traffic, more than we see from the usual brute-force guesser.
# flow-cat 2010-11-09/ft* | flow-nfilter -F ip-addr -v ADDR=189.22.36.165 | \
flow-nfilter -F ip-addr -v ADDR=79.115.103.225 | flow-print | less
srcIP dstIP prot srcPort dstPort octets packets
79.115.103.225 189.22.36.165 6 4381 22 371 6
189.22.36.165 79.115.103.225 6 22 4381 394 7
79.115.103.225 189.22.36.165 6 4383 22 1984 14
189.22.36.165 79.115.103.225 6 22 4382 3703 19
189.22.36.165 79.115.103.225 6 22 4383 3095 11
189.22.36.165 79.115.103.225 6 6667 4384 120 3
189.22.36.165 79.115.103.225 6 6667 4385 120 3
79.115.103.225 189.22.36.165 6 4384 6667 192 3
79.115.103.225 189.22.36.165 6 4382 22 11804 118
79.115.103.225 189.22.36.165 6 4385 6667 192 3
189.22.36.165 79.115.103.225 6 22 4382 12688 103
79.115.103.225 189.22.36.165 6 4382 22 1664 19
79.115.103.225 189.22.36.165 6 4382 22 5564 64
189.22.36.165 79.115.103.225 6 22 4382 9708 50
79.115.103.225 189.22.36.165 6 4382 22 14956 169
189.22.36.165 79.115.103.225 6 22 4382 16060 129
79.115.103.225 189.22.36.165 6 4382 22 1040 12
189.22.36.165 79.115.103.225 6 22 4382 928 8
189.22.36.165 79.115.103.225 6 8888 4470 120 3
79.115.103.225 189.22.36.165 6 4470 8888 192 3
79.115.103.225 189.22.36.165 6 4382 22 4316 49
189.22.36.165 79.115.103.225 6 22 4382 11344 42
79.115.103.225 189.22.36.165 6 4382 22 1924 23
189.22.36.165 79.115.103.225 6 22 4382 8800 20
...
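To put a number on "a lot of SSH traffic," here's a rough Python sketch that totals the octets by the port on the victim's side of each flow. It only covers the flows listed above, not the ones hidden behind the "...", and octets_by_victim_port is a name I made up for this illustration.

```python
from collections import defaultdict

VICTIM = "189.22.36.165"

# The flows between the two hosts, copied from the output above.
FLOWS = """
79.115.103.225 189.22.36.165 6 4381 22 371 6
189.22.36.165 79.115.103.225 6 22 4381 394 7
79.115.103.225 189.22.36.165 6 4383 22 1984 14
189.22.36.165 79.115.103.225 6 22 4382 3703 19
189.22.36.165 79.115.103.225 6 22 4383 3095 11
189.22.36.165 79.115.103.225 6 6667 4384 120 3
189.22.36.165 79.115.103.225 6 6667 4385 120 3
79.115.103.225 189.22.36.165 6 4384 6667 192 3
79.115.103.225 189.22.36.165 6 4382 22 11804 118
79.115.103.225 189.22.36.165 6 4385 6667 192 3
189.22.36.165 79.115.103.225 6 22 4382 12688 103
79.115.103.225 189.22.36.165 6 4382 22 1664 19
79.115.103.225 189.22.36.165 6 4382 22 5564 64
189.22.36.165 79.115.103.225 6 22 4382 9708 50
79.115.103.225 189.22.36.165 6 4382 22 14956 169
189.22.36.165 79.115.103.225 6 22 4382 16060 129
79.115.103.225 189.22.36.165 6 4382 22 1040 12
189.22.36.165 79.115.103.225 6 22 4382 928 8
189.22.36.165 79.115.103.225 6 8888 4470 120 3
79.115.103.225 189.22.36.165 6 4470 8888 192 3
79.115.103.225 189.22.36.165 6 4382 22 4316 49
189.22.36.165 79.115.103.225 6 22 4382 11344 42
79.115.103.225 189.22.36.165 6 4382 22 1924 23
189.22.36.165 79.115.103.225 6 22 4382 8800 20
"""


def octets_by_victim_port(text, victim=VICTIM):
    """Sum octets in both directions, keyed by the victim-side port."""
    totals = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[0] == "srcIP":
            continue  # skip blank lines, headers, and '...' markers
        src, dst = fields[0], fields[1]
        src_port, dst_port, octets = (int(x) for x in fields[3:6])
        if src == victim:
            totals[src_port] += octets
        elif dst == victim:
            totals[dst_port] += octets
    return dict(totals)


totals = octets_by_victim_port(FLOWS)
for port in sorted(totals):
    print(port, totals[port])
```

Even in this short excerpt, port 22 carries the overwhelming majority of the bytes, while ports 6667 and 8888 each carry only a few hundred octets.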
Using flow-print -f 5, I can view the timestamps and verify that the IRC activity started shortly after the SSH activity started using larger amounts of bandwidth.
Can I be certain that 79.115.103.225 is my attacker? No. Is this activity suspicious? Absolutely. I can examine the hacked machine, or a disk image thereof, and identify the account used to penetrate the machine.
This is not proof, but it’s a place to start. In assessing the rest of the data, I can now exclude this host. This will further reduce the pool of data I am assessing.
While I can’t use this as grounds for flying to Romania with body armor, a machine gun, and a machete, I can realistically act on this information. I can report the activity to the IP address owner. I can check my network for other connections from this host, and verify the integrity of any machines it’s connected to. I can use this as a part of my business case to firewall off this part of the network. It will support my argument to forbid passwords for SSH connections on dev machines.
In retrospect, I could have made other assumptions that might have let me find this more quickly, e.g., I could have investigated the first hosts contacted on the questionable ports. But every puzzle is easy once you’ve solved it. After this, I’d have to say that backtracking intrusion vectors through flow data is very practical, even when you don’t have much experience.