Given two log files, each containing one log entry per line, generate lists of the IP addresses unique to each file (IPs in file1.log but not in file2.log, and IPs in file2.log but not in file1.log).
Use the following to verify results:
test_log1.txt
38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET / HTTP/1.1" 403 259
38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET /favicon.ico HTTP/1.1" 404 266
38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET /favicon.ico HTTP/1.1" 404 266
38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET / HTTP/1.1" 403 263
38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET /favicon.ico HTTP/1.1" 404 270
38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET /favicon.ico HTTP/1.1" 404 270
38.112.93.68 - - [21/Nov/2006:11:37:39 -0500] "GET /favicon.ico HTTP/1.1" 404 270
74.98.22.196 - - [05/Feb/2007:20:44:57 -0500] "GET / HTTP/1.1" 200 77
74.98.22.196 - - [05/Feb/2007:20:44:58 -0500] "GET /favicon.ico HTTP/1.1" 404 276
74.98.22.196 - - [05/Feb/2007:21:44:08 -0500] "GET /favicon.ico HTTP/1.1" 404 276
test_log2.txt
38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET / HTTP/1.1" 403 259
38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET /favicon.ico HTTP/1.1" 404 266
38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET /favicon.ico HTTP/1.1" 404 266
38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET / HTTP/1.1" 403 263
38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET /favicon.ico HTTP/1.1" 404 270
38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET /favicon.ico HTTP/1.1" 404 270
38.112.93.68 - - [21/Nov/2006:11:37:39 -0500] "GET /favicon.ico HTTP/1.1" 404 270
74.98.22.196 - - [05/Feb/2007:20:44:57 -0500] "GET / HTTP/1.1" 200 77
74.98.22.196 - - [05/Feb/2007:20:44:58 -0500] "GET /favicon.ico HTTP/1.1" 404 276
74.98.22.196 - - [05/Feb/2007:21:44:08 -0500] "GET /favicon.ico HTTP/1.1" 404 276
72.30.161.228 - - [05/Dec/2008:15:35:41 -0500] "GET /blog/2006/05/21/ HTTP/1.0" 404 281
74.6.22.177 - - [05/Dec/2008:18:44:14 -0500] "GET /robots.txt HTTP/1.0" 404 271
74.6.22.177 - - [05/Dec/2008:18:44:14 -0500] "GET /cython-doc/docs/external_C_code.html HTTP/1.0" 404 297
67.195.37.158 - - [05/Dec/2008:20:32:08 -0500] "GET /robots.txt HTTP/1.0" 404 275
67.195.37.158 - - [05/Dec/2008:20:32:09 -0500] "GET / HTTP/1.0" 200 2238
67.195.37.158 - - [05/Dec/2008:20:33:52 -0500] "GET /blog/ggellner HTTP/1.0" 302 280
65.55.105.194 - - [06/Dec/2008:04:38:32 -0500] "GET /robots.txt HTTP/1.1" 404 275
65.55.105.194 - - [06/Dec/2008:04:38:36 -0500] "GET / HTTP/1.1" 304 -
208.80.194.38 - - [06/Dec/2008:05:24:21 -0500] "GET / HTTP/1.0" 200 2238
91.121.106.59 - - [06/Dec/2008:10:02:17 -0500] "HEAD / HTTP/1.1" 200 -
The resulting output should be similar to:
Only in test_log1.txt
set([])
Only in test_log2.txt
set(['67.195.37.158', '74.6.22.177', '65.55.105.194', '91.121.106.59', '72.30.161.228', '208.80.194.38'])
If you would like a large dataset to work with, here are two files containing 25 thousand log entries each.
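One possible approach, offered as a minimal sketch rather than a reference solution: collect the IP addresses of each file into a set and take the set difference in both directions. The sketch assumes the IP address is the first whitespace-separated field on every line, as it is in the sample data above. The function names read_ips and unique_ips are hypothetical, chosen only for this illustration.

```python
def read_ips(path):
    """Return the set of IP addresses found in the log file at `path`.

    Assumption: the IP address is the first whitespace-separated
    field of each line, as in the sample logs. Blank lines are skipped.
    """
    ips = set()
    with open(path) as log:
        for line in log:
            fields = line.split()
            if fields:
                ips.add(fields[0])
    return ips


def unique_ips(path_a, path_b):
    """Return (only_in_a, only_in_b) as two sets of IP addresses."""
    ips_a = read_ips(path_a)
    ips_b = read_ips(path_b)
    # Set difference keeps the elements of the left set
    # that are absent from the right set.
    return ips_a - ips_b, ips_b - ips_a
```

Calling unique_ips("test_log1.txt", "test_log2.txt") should give an empty set for the first file and the six addresses listed above for the second. Note that Python 3 prints sets as set() and {...} rather than set([...]), and that set ordering is arbitrary, so the printed order may differ from the expected output.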