I want to perform a standard diff on two large files. I have something that works, but it is not as fast as diff on the command line.
    A = load 'A' as (line);
    B = load 'B' as (line);
    JOINED = join A by line full outer, B by line;
    DIFF = FILTER JOINED by A::line is null or B::line is null;
    DIFF2 = FOREACH DIFF GENERATE (A::line is null ? B::line : A::line), (A::line is null ? 'REMOVED' : 'ADDED');
    STORE DIFF2 into 'diff';
Does anyone know a better way to do this?