Wed, Mar 16th
tip: hadoop
in pig, count the number of tokens in a chararray with ((text IS NULL) ? 0 : COUNT(TOKENIZE(text).$0)) AS num_tokens
in streaming, use -inputformat WholeFileInputFormat (a custom class, so it has to be on the job's classpath) to process each whole file as a single record http://bit.ly/dHFWBm
access the input file name in streaming using $map_input_file (jobconf keys are exported with dots turned into underscores, case unchanged) http://bit.ly/cRfB6k (more here: http://bit.ly/9EzL1u)
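a minimal python sketch of a streaming mapper helper that tags each line with its source file. it assumes streaming exports the jobconf key map.input.file as the environment variable map_input_file; the fallback mapreduce_map_input_file is an assumption for newer Hadoop releases, and the function names are hypothetical.

```python
import os


def input_file_name(environ=os.environ):
    """Return the path of the file feeding this map task, or None if unset."""
    # Both spellings are assumptions about the Hadoop version in use.
    for key in ("map_input_file", "mapreduce_map_input_file"):
        value = environ.get(key)
        if value:
            return value
    return None


def tag_lines(lines, environ=os.environ):
    """Yield each input line as '<file name><TAB><line>'."""
    name = input_file_name(environ) or "unknown"
    for line in lines:
        yield name + "\t" + line.rstrip("\n")
```

in a real mapper script, loop over tag_lines(sys.stdin) and print each record.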
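to call a url upon job completion, use -Djob.end.notification.url='http://…?jobid=$jobId&jobStatus=$jobStatus' ($jobId and $jobStatus are filled in by hadoop; single quotes keep the shell from expanding them)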
to update a counter in streaming, write reporter:counter:<group>,<counter>,<amount> to stderr
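a small python sketch of the counter update, assuming the streaming convention that a stderr line of the form reporter:counter:<group>,<counter>,<amount> increments the named counter. the helper names and the example group/counter names are hypothetical.

```python
import sys


def counter_line(group, counter, amount=1):
    """Format a counter update in the form streaming reads from stderr."""
    # Commas separate the fields, so they cannot appear in the names.
    if "," in group or "," in counter:
        raise ValueError("group and counter names must not contain commas")
    return "reporter:counter:%s,%s,%d" % (group, counter, amount)


def increment_counter(group, counter, amount=1, stream=sys.stderr):
    """Write the counter update to stderr (the default stream) and flush it."""
    stream.write(counter_line(group, counter, amount) + "\n")
    stream.flush()
```

for example, increment_counter("MyJob", "bad_records") inside a mapper adds 1 to that counter each time a malformed record is skipped.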
use job.end.notification.url to specify a url to be called at job completion http://bit.ly/3PMKd
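on the receiving end, the service behind the notification url has to pull the job id and status out of the query string. a hedged python sketch, assuming the query parameters are named jobid and jobStatus as in the tip above; the function name and the example url are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def parse_notification(url):
    """Extract the job id and status from a job-end notification URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs drops blank values, so missing or empty fields become None.
    return {
        "job_id": query.get("jobid", [None])[0],
        "status": query.get("jobStatus", [None])[0],
    }
```

a handler could call parse_notification on the requested url and, say, trigger a downstream step only when the status reports success.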
streaming jobs can access jobconf vars as environment variables, e.g. mapred.map.tasks->$mapred_map_tasks http://bit.ly/3wEkTZ (via @tlipcon)
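a python sketch of that lookup, assuming streaming derives the environment variable name by replacing every non-alphanumeric character of the jobconf key with an underscore and leaving case alone. env_name and jobconf are hypothetical helper names.

```python
import os
import re


def env_name(jobconf_key):
    """Map a jobconf key such as 'mapred.map.tasks' to its env var name."""
    return re.sub(r"[^0-9A-Za-z]", "_", jobconf_key)


def jobconf(jobconf_key, default=None, environ=os.environ):
    """Look up a jobconf value from the task environment."""
    return environ.get(env_name(jobconf_key), default)
```

values arrive as strings, so a script would convert them itself, e.g. int(jobconf("mapred.map.tasks", "1")).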