identifying probable intrusion vectors with flow data

Shortly after Absolute FreeBSD came out, I worked with gpart(8) and thought “I should have put this in the book.” Just after Cisco Routers for the Desperate went to the printer, I worked with tracking gateway availability and said “Drat! This should have gone into the book!” This is a recurring motif in my life.

Now that Network Flow Analysis is out, I should have marked calendar space for “interesting flow analysis opportunity.” If you want to know the details behind all of this, look in the book or in the flow-tools documentation.

Someone recently penetrated a dev server I help support. I want to learn how they got access, using flow data. I have no idea if this is realistic, but let’s go for it. I previously made a reasonable guess about the date the host was compromised, so I know the time window to examine. I’ll attack the problem by identifying “known good” traffic, removing it from the data, and examining what remains. (This might not be the best method, but I know that a couple security and intrusion response folks read this blog, and one in particular won’t hesitate to tell me I’m fubar, so check for comments.)

First, let’s see the traffic this host sends and receives.

# flow-cat 2010-11-09/ft* | flow-nfilter -F ip-addr -v ADDR=189.22.36.165 | flow-print | less

srcIP            dstIP            prot  srcPort  dstPort  octets      packets
189.22.36.165    194.28.157.50    6     7781     80       40          1
194.28.157.50    189.22.36.165    6     80       7781     40          1
189.22.36.165    194.28.157.50    6     9008     80       40          1
189.22.36.165    194.28.157.50    6     9008     80       40          1
194.28.157.50    189.22.36.165    6     80       9008     80          2
189.22.36.165    194.28.157.50    6     6625     80       80          2
194.28.157.50    189.22.36.165    6     80       6625     80          2
189.22.36.165    82.135.96.18     6     445      59423    80          2
82.135.96.18     189.22.36.165    6     59423    445      96          2
189.22.36.165    72.167.161.47    6     80       51428    40          1
72.167.161.47    189.22.36.165    6     51404    21       84          2
...

This machine is an Ubuntu box. It regularly contacts random Internet sites to check for updates. The developer also browses the Web from it. If I’m to have any luck, I must exclude Web browsing traffic from this host. (To the best of my knowledge, there is not yet a Web site that will automatically root any Unix-like system. I might be wrong.) I normally configure most filtering on the command line, but this is complicated enough that I need to write an actual filter for it.

filter-primitive port80
  type ip-port
  permit 80

filter-primitive victim
  type ip-address
  permit 189.22.36.165

filter-definition victim-browsing
  invert
  match ip-source-address victim
  match ip-destination-port port80
  or
  match ip-destination-address victim
  match ip-source-port port80
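If the flow-nfilter syntax is unfamiliar, the same logic can be sketched in Python. This is only an illustration, not how flow-nfilter works internally: the Flow tuple and the function names are made up for the example, and the sample rows are copied from the flow-print output above.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """One row of flow-print output (hypothetical helper type)."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int
    octets: int
    packets: int


VICTIM = "189.22.36.165"
WEB_PORT = 80


def is_victim_browsing(flow: Flow) -> bool:
    """True for victim -> port 80 requests and port 80 -> victim replies."""
    outbound = flow.src_ip == VICTIM and flow.dst_port == WEB_PORT
    inbound = flow.dst_ip == VICTIM and flow.src_port == WEB_PORT
    return outbound or inbound


def drop_browsing(flows):
    """Mimic 'invert': keep only the flows that do NOT match."""
    return [f for f in flows if not is_victim_browsing(f)]


# A few rows copied from the flow-print output above.
sample = [
    Flow("189.22.36.165", "194.28.157.50", 6, 7781, 80, 40, 1),
    Flow("194.28.157.50", "189.22.36.165", 6, 80, 7781, 40, 1),
    Flow("189.22.36.165", "82.135.96.18", 6, 445, 59423, 80, 2),
    Flow("189.22.36.165", "72.167.161.47", 6, 80, 51428, 40, 1),
    Flow("72.167.161.47", "189.22.36.165", 6, 51404, 21, 84, 2),
]

remaining = drop_browsing(sample)
for f in remaining:
    print(*f)
```

Note that the flow from the victim's own port 80 to 72.167.161.47 survives. The filter removes only the victim's outbound browsing, not outsiders talking to a Web server on the victim.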

We match all traffic from the victim machine to port 80, and from port 80 to the victim machine, then invert the filter to exclude everything that matches. Add this filter to the command line and we get:

srcIP            dstIP            prot  srcPort  dstPort  octets      packets
189.22.36.165    82.135.96.18     6     445      59423    80          2
82.135.96.18     189.22.36.165    6     59423    445      96          2
189.22.36.165    72.167.161.47    6     80       51428    40          1
72.167.161.47    189.22.36.165    6     51404    21       84          2
72.167.161.47    189.22.36.165    6     49768    21       296         6
189.22.36.165    72.167.161.47    6     21       49768    262         3
72.167.161.47    189.22.36.165    6     51428    80       40          1
...

Some interesting things here. This machine shouldn’t be running an SMB server, but the first two flows show that someone connected to us on port 445 and we answered with a couple of packets. The developer who owns the box probably installed Samba as a dependency of something else and never even noticed. Nobody in the outside world should be talking to this machine’s Web site, but it’s not that surprising that someone did. There’s a small FTP query next; I suspect it’s one of the innumerable FTP scanners.

There are still 1,690 lines of this stuff, far too much to assess by eye. Let’s trim it down by assuming this is the most common sort of intrusion.

Generally, an intruder attacks a service on a machine. He would then send the code for the exploit or IRC bouncer to the machine through that service. Let’s make the (uncertain and unreliable) assumption that one or the other of these is larger than 1 packet. Most DNS transactions, pings, etc., are 1 packet, so by looking for flows larger than 1 packet we exclude this innocuous traffic. The following primitive and filter pass only flows larger than 1 packet.

filter-primitive gt1packet
  type counter
  permit gt 1

filter-definition gt1packet
  match packets gt1packet
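The effect of this filter can be sketched in Python as well. Unlike flow-nfilter, which works on the binary flow records, this version operates on the text that flow-print emits, and it assumes the seven-column layout shown in this post. The function name and the threshold parameter are hypothetical.

```python
PACKETS_COLUMN = 6  # zero-based position of "packets" in the seven-column output


def multi_packet_lines(lines, threshold=1):
    """Keep flow-print lines whose packet count is greater than threshold."""
    kept = []
    for line in lines:
        fields = line.split()
        # Skip the header, "..." markers, and blank lines.
        if len(fields) != 7 or not fields[PACKETS_COLUMN].isdigit():
            continue
        if int(fields[PACKETS_COLUMN]) > threshold:
            kept.append(line)
    return kept


# Rows copied from the filtered output earlier in this post.
sample = """\
srcIP            dstIP            prot  srcPort  dstPort  octets      packets
189.22.36.165    82.135.96.18     6     445      59423    80          2
189.22.36.165    72.167.161.47    6     80       51428    40          1
72.167.161.47    189.22.36.165    6     49768    21       296         6
72.167.161.47    189.22.36.165    6     51428    80       40          1
""".splitlines()

for line in multi_packet_lines(sample):
    print(line)
```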

Now add |flow-nfilter -F gt1packet to the command line and see what remains. The following immediately stands out:

...
189.22.36.165    79.115.103.225   6     22       4382     3703        19
189.22.36.165    79.115.103.225   6     22       4383     3095        11
189.22.36.165    79.115.103.225   6     6667     4384     120         3
189.22.36.165    79.115.103.225   6     6667     4385     120         3
...

The first port 6667 (IRC) connections involve 79.115.103.225, a Romanian system. Let’s strip out all of the previous filters and see what traffic these two hosts have exchanged. There’s a lot of SSH traffic, more than we see from the usual brute-force guesser.

# flow-cat 2010-11-09/ft* | flow-nfilter -F ip-addr -v ADDR=189.22.36.165 | \
   flow-nfilter -F ip-addr -v ADDR=79.115.103.225  | flow-print | less
srcIP            dstIP            prot  srcPort  dstPort  octets      packets
79.115.103.225   189.22.36.165    6     4381     22       371         6
189.22.36.165    79.115.103.225   6     22       4381     394         7
79.115.103.225   189.22.36.165    6     4383     22       1984        14
189.22.36.165    79.115.103.225   6     22       4382     3703        19
189.22.36.165    79.115.103.225   6     22       4383     3095        11
189.22.36.165    79.115.103.225   6     6667     4384     120         3
189.22.36.165    79.115.103.225   6     6667     4385     120         3
79.115.103.225   189.22.36.165    6     4384     6667     192         3
79.115.103.225   189.22.36.165    6     4382     22       11804       118
79.115.103.225   189.22.36.165    6     4385     6667     192         3
189.22.36.165    79.115.103.225   6     22       4382     12688       103
79.115.103.225   189.22.36.165    6     4382     22       1664        19
79.115.103.225   189.22.36.165    6     4382     22       5564        64
189.22.36.165    79.115.103.225   6     22       4382     9708        50
79.115.103.225   189.22.36.165    6     4382     22       14956       169
189.22.36.165    79.115.103.225   6     22       4382     16060       129
79.115.103.225   189.22.36.165    6     4382     22       1040        12
189.22.36.165    79.115.103.225   6     22       4382     928         8
189.22.36.165    79.115.103.225   6     8888     4470     120         3
79.115.103.225   189.22.36.165    6     4470     8888     192         3
79.115.103.225   189.22.36.165    6     4382     22       4316        49
189.22.36.165    79.115.103.225   6     22       4382     11344       42
79.115.103.225   189.22.36.165    6     4382     22       1924        23
189.22.36.165    79.115.103.225   6     22       4382     8800        20
...
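To put a rough number on "a lot of SSH traffic," the rows shown above can be totaled per port on the victim's side. This is a minimal Python sketch, assuming the seven-column flow-print layout used in this post. The function name is hypothetical, and the totals cover only the rows printed here, not the elided remainder.

```python
from collections import defaultdict

VICTIM = "189.22.36.165"

# The data rows exactly as flow-print showed them above.
FLOWS = """\
79.115.103.225   189.22.36.165    6     4381     22       371         6
189.22.36.165    79.115.103.225   6     22       4381     394         7
79.115.103.225   189.22.36.165    6     4383     22       1984        14
189.22.36.165    79.115.103.225   6     22       4382     3703        19
189.22.36.165    79.115.103.225   6     22       4383     3095        11
189.22.36.165    79.115.103.225   6     6667     4384     120         3
189.22.36.165    79.115.103.225   6     6667     4385     120         3
79.115.103.225   189.22.36.165    6     4384     6667     192         3
79.115.103.225   189.22.36.165    6     4382     22       11804       118
79.115.103.225   189.22.36.165    6     4385     6667     192         3
189.22.36.165    79.115.103.225   6     22       4382     12688       103
79.115.103.225   189.22.36.165    6     4382     22       1664        19
79.115.103.225   189.22.36.165    6     4382     22       5564        64
189.22.36.165    79.115.103.225   6     22       4382     9708        50
79.115.103.225   189.22.36.165    6     4382     22       14956       169
189.22.36.165    79.115.103.225   6     22       4382     16060       129
79.115.103.225   189.22.36.165    6     4382     22       1040        12
189.22.36.165    79.115.103.225   6     22       4382     928         8
189.22.36.165    79.115.103.225   6     8888     4470     120         3
79.115.103.225   189.22.36.165    6     4470     8888     192         3
79.115.103.225   189.22.36.165    6     4382     22       4316        49
189.22.36.165    79.115.103.225   6     22       4382     11344       42
79.115.103.225   189.22.36.165    6     4382     22       1924        23
189.22.36.165    79.115.103.225   6     22       4382     8800        20
"""


def totals_by_victim_port(text, victim=VICTIM):
    """Sum octets and packets, in both directions, per port on the victim."""
    totals = defaultdict(lambda: [0, 0])  # port -> [octets, packets]
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, "..." markers, and anything else that is not a data row.
        if len(fields) != 7 or not fields[3].isdigit():
            continue
        src, dst = fields[0], fields[1]
        src_port, dst_port, octets, packets = map(int, fields[3:7])
        if src == victim:
            port = src_port
        elif dst == victim:
            port = dst_port
        else:
            continue
        totals[port][0] += octets
        totals[port][1] += packets
    return {port: tuple(counts) for port, counts in totals.items()}


totals = totals_by_victim_port(FLOWS)
for port in sorted(totals):
    octets, packets = totals[port]
    print(f"port {port}: {octets} octets in {packets} packets")
```

Grouping by the victim's port is a design choice for this example. It lumps both directions of a conversation together, which is enough to see how far the SSH traffic outweighs everything else.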

Using flow-print -f 5, I can view the timestamps and verify that the IRC activity started shortly after the SSH activity started using larger amounts of bandwidth.

Can I be certain that 79.115.103.225 is my attacker? No. Is this activity suspicious? Absolutely. I can examine the hacked machine, or a disk image thereof, and identify the account used to penetrate the machine.

This is not proof, but it’s a place to start. In assessing the rest of the data, I can now exclude this host. This will further reduce the pool of data I am assessing.

While I can’t use this as grounds for flying to Romania with body armor, a machine gun, and a machete, I can realistically act on this information. I can report the activity to the IP address owner. I can check my network for other connections from this host, and verify the integrity of any machines it’s connected to. I can use this as a part of my business case to firewall off this part of the network. It will support my argument to forbid passwords for SSH connections on dev machines.

In retrospect, I could have made other assumptions that might have let me find this more quickly, e.g., I could have investigated the first hosts contacted on the questionable ports. But every puzzle is easy once you’ve solved it. After this, I’d have to say that backtracking intrusion vectors through flow data is very practical, even when you don’t have much experience.
