Problem

Given two log files, each containing one log entry per line, generate lists of the IP addresses unique to each file (IPs in file1.log but not in file2.log, and IPs in file2.log but not in file1.log).

Sample Data

Use the following to verify results:

test_log1.txt

38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET / HTTP/1.1" 403 259
38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET /favicon.ico HTTP/1.1" 404 266
38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET /favicon.ico HTTP/1.1" 404 266
38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET / HTTP/1.1" 403 263
38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET /favicon.ico HTTP/1.1" 404 270
38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET /favicon.ico HTTP/1.1" 404 270
38.112.93.68 - - [21/Nov/2006:11:37:39 -0500] "GET /favicon.ico HTTP/1.1" 404 270
74.98.22.196 - - [05/Feb/2007:20:44:57 -0500] "GET / HTTP/1.1" 200 77
74.98.22.196 - - [05/Feb/2007:20:44:58 -0500] "GET /favicon.ico HTTP/1.1" 404 276
74.98.22.196 - - [05/Feb/2007:21:44:08 -0500] "GET /favicon.ico HTTP/1.1" 404 276

test_log2.txt

38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET / HTTP/1.1" 403 259
38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET /favicon.ico HTTP/1.1" 404 266
38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET /favicon.ico HTTP/1.1" 404 266
38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET / HTTP/1.1" 403 263
38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET /favicon.ico HTTP/1.1" 404 270
38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET /favicon.ico HTTP/1.1" 404 270
38.112.93.68 - - [21/Nov/2006:11:37:39 -0500] "GET /favicon.ico HTTP/1.1" 404 270
74.98.22.196 - - [05/Feb/2007:20:44:57 -0500] "GET / HTTP/1.1" 200 77
74.98.22.196 - - [05/Feb/2007:20:44:58 -0500] "GET /favicon.ico HTTP/1.1" 404 276
74.98.22.196 - - [05/Feb/2007:21:44:08 -0500] "GET /favicon.ico HTTP/1.1" 404 276
72.30.161.228 - - [05/Dec/2008:15:35:41 -0500] "GET /blog/2006/05/21/ HTTP/1.0" 404 281
74.6.22.177 - - [05/Dec/2008:18:44:14 -0500] "GET /robots.txt HTTP/1.0" 404 271
74.6.22.177 - - [05/Dec/2008:18:44:14 -0500] "GET /cython-doc/docs/external_C_code.html HTTP/1.0" 404 297
67.195.37.158 - - [05/Dec/2008:20:32:08 -0500] "GET /robots.txt HTTP/1.0" 404 275
67.195.37.158 - - [05/Dec/2008:20:32:09 -0500] "GET / HTTP/1.0" 200 2238
67.195.37.158 - - [05/Dec/2008:20:33:52 -0500] "GET /blog/ggellner HTTP/1.0" 302 280
65.55.105.194 - - [06/Dec/2008:04:38:32 -0500] "GET /robots.txt HTTP/1.1" 404 275
65.55.105.194 - - [06/Dec/2008:04:38:36 -0500] "GET / HTTP/1.1" 304 -
208.80.194.38 - - [06/Dec/2008:05:24:21 -0500] "GET / HTTP/1.0" 200 2238
91.121.106.59 - - [06/Dec/2008:10:02:17 -0500] "HEAD / HTTP/1.1" 200 -

The resulting output should be similar to:

Only in test_log1.txt
set([])

Only in test_log2.txt
set(['67.195.37.158', '74.6.22.177', '65.55.105.194', '91.121.106.59', '72.30.161.228', '208.80.194.38'])

If you would like a large dataset to work with, here are two files containing 25 thousand log entries each.

Solutions
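
As a starting point, here is a minimal sketch of one possible approach, assuming the IP address is always the first whitespace-separated field on a line (as it is in the sample data). The function names ips_in, unique_to_each, and compare_files are illustrative, not taken from any particular solution. The sketch collects the addresses from each file into a set and takes the set difference in both directions.

```python
def ips_in(lines):
    """Collect the first whitespace-separated field of every non-blank line."""
    # Assumption: the IP address is always the first field of a log entry.
    return {line.split()[0] for line in lines if line.strip()}


def unique_to_each(lines1, lines2):
    """Return (IPs only in lines1, IPs only in lines2) as two sets."""
    ips1 = ips_in(lines1)
    ips2 = ips_in(lines2)
    # Set difference keeps the elements of the left set missing from the right.
    return ips1 - ips2, ips2 - ips1


def compare_files(path1, path2):
    """Open both log files and compare the IP addresses they contain."""
    with open(path1) as log1, open(path2) as log2:
        # File objects iterate line by line, so they can be passed directly.
        return unique_to_each(log1, log2)
```

Calling compare_files("test_log1.txt", "test_log2.txt") on the sample data should return an empty set for the first file and the six addresses listed above for the second. Note that Python 3 prints sets as {...} rather than set([...]), and that the order of elements in a set is not guaranteed.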