==== Problem ====

Given two log files, each containing one log entry per line, generate the lists of IP addresses unique to each file (IPs in file1.log but not in file2.log, and IPs in file2.log but not in file1.log).

==== Sample Data ====

Use the following to verify results:

test_log1.txt

  38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET / HTTP/1.1" 403 259
  38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET /favicon.ico HTTP/1.1" 404 266
  38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET /favicon.ico HTTP/1.1" 404 266
  38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET / HTTP/1.1" 403 263
  38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET /favicon.ico HTTP/1.1" 404 270
  38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET /favicon.ico HTTP/1.1" 404 270
  38.112.93.68 - - [21/Nov/2006:11:37:39 -0500] "GET /favicon.ico HTTP/1.1" 404 270
  74.98.22.196 - - [05/Feb/2007:20:44:57 -0500] "GET / HTTP/1.1" 200 77
  74.98.22.196 - - [05/Feb/2007:20:44:58 -0500] "GET /favicon.ico HTTP/1.1" 404 276
  74.98.22.196 - - [05/Feb/2007:21:44:08 -0500] "GET /favicon.ico HTTP/1.1" 404 276

test_log2.txt

  38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET / HTTP/1.1" 403 259
  38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET /favicon.ico HTTP/1.1" 404 266
  38.112.93.68 - - [21/Nov/2006:11:15:06 -0500] "GET /favicon.ico HTTP/1.1" 404 266
  38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET / HTTP/1.1" 403 263
  38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET /favicon.ico HTTP/1.1" 404 270
  38.112.93.68 - - [21/Nov/2006:11:15:12 -0500] "GET /favicon.ico HTTP/1.1" 404 270
  38.112.93.68 - - [21/Nov/2006:11:37:39 -0500] "GET /favicon.ico HTTP/1.1" 404 270
  74.98.22.196 - - [05/Feb/2007:20:44:57 -0500] "GET / HTTP/1.1" 200 77
  74.98.22.196 - - [05/Feb/2007:20:44:58 -0500] "GET /favicon.ico HTTP/1.1" 404 276
  74.98.22.196 - - [05/Feb/2007:21:44:08 -0500] "GET /favicon.ico HTTP/1.1" 404 276
  72.30.161.228 - - [05/Dec/2008:15:35:41 -0500] "GET /blog/2006/05/21/ HTTP/1.0" 404 281
  74.6.22.177 - - [05/Dec/2008:18:44:14 -0500] "GET /robots.txt HTTP/1.0" 404 271
  74.6.22.177 - - [05/Dec/2008:18:44:14 -0500] "GET /cython-doc/docs/external_C_code.html HTTP/1.0" 404 297
  67.195.37.158 - - [05/Dec/2008:20:32:08 -0500] "GET /robots.txt HTTP/1.0" 404 275
  67.195.37.158 - - [05/Dec/2008:20:32:09 -0500] "GET / HTTP/1.0" 200 2238
  67.195.37.158 - - [05/Dec/2008:20:33:52 -0500] "GET /blog/ggellner HTTP/1.0" 302 280
  65.55.105.194 - - [06/Dec/2008:04:38:32 -0500] "GET /robots.txt HTTP/1.1" 404 275
  65.55.105.194 - - [06/Dec/2008:04:38:36 -0500] "GET / HTTP/1.1" 304 -
  208.80.194.38 - - [06/Dec/2008:05:24:21 -0500] "GET / HTTP/1.0" 200 2238
  91.121.106.59 - - [06/Dec/2008:10:02:17 -0500] "HEAD / HTTP/1.1" 200 -

The resulting output should be similar to:

  Only in test_log1.txt
  set([])
  Only in test_log2.txt
  set(['67.195.37.158', '74.6.22.177', '65.55.105.194', '91.121.106.59', '72.30.161.228', '208.80.194.38'])

If you would like a larger dataset to work with, here are two files containing 25 thousand log entries each.

  * {{:codeapalooza:log1.txt|log1.txt}}
  * {{:codeapalooza:log2.txt|log2.txt}}

==== Solutions ====

  * Josh - {{:codeapalooza:logdiff.java|LogDiff.java}} (Java)
  * Josh - {{:codeapalooza:logdiff.cpp|logdiff.cpp}} (C++)
  * Josh - {{:codeapalooza:logdiff.m|logdiff.m}} (Obj-C)
  * Gabriel - {{:codeapalooza:ipcompare.py|ipcompare.py}} (Python)