jakehofman.com

Archive

Wed, Mar 16th

tip: hadoop

in pig, count the number of tokens in a chararray with ((text is NULL) ? 0 : COUNT(TOKENIZE(text).$0)) AS num_tokens
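
For readers who want to sanity-check the result outside pig, here is a rough Python sketch of the same null-safe count. The delimiter set (whitespace, double quote, comma, parentheses, star) is an assumption about TOKENIZE's defaults, and `num_tokens` is a hypothetical helper name, not part of pig.

```python
import re

# Assumed delimiter set approximating pig's default TOKENIZE behavior:
# whitespace, double quote, comma, parentheses, and star.
_DELIMS = re.compile(r'[\s",()*]+')


def num_tokens(text):
    """Return 0 for a missing value, otherwise the number of non-empty tokens."""
    if text is None:
        return 0
    # Splitting can leave empty strings at the edges; drop them before counting.
    return len([tok for tok in _DELIMS.split(text) if tok])
```

The explicit None branch plays the role of the `is NULL` guard in the pig expression: a missing value yields 0 rather than a null count.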

tips  hadoop 
Fri, Jan 14th

tip: hadoop

use -inputformat WholeFileInputFormat to process full files as records http://bit.ly/dHFWBm

tips  hadoop 
Fri, Jun 4th

tip: hadoop

access input file names in streaming using the environment variable $map_input_file (lowercase) http://bit.ly/cRfB6k (more here: http://bit.ly/9EzL1u)
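
A minimal sketch of a streaming mapper that tags each input line with its source file. It assumes streaming exports the name in lowercase as `map_input_file` (or `mapreduce_map_input_file` on newer releases); the helper names and the "unknown" fallback are hypothetical.

```python
import os
import sys


def input_file_name(env=None):
    """Look up the current input file from the task environment."""
    env = os.environ if env is None else env
    # Assumption: newer releases use the mapreduce_ prefix, older ones do not.
    for key in ("mapreduce_map_input_file", "map_input_file"):
        if key in env:
            return env[key]
    return "unknown"


def tag_lines(lines, name):
    """Yield 'file<TAB>line' records for each input line."""
    for line in lines:
        stripped = line.rstrip("\n")
        yield f"{name}\t{stripped}"


def main():
    """Read records from stdin and emit them tagged with the input file name."""
    name = input_file_name()
    for record in tag_lines(sys.stdin, name):
        print(record)
```

To use it as a mapper, call `main()` at the bottom of the script and pass the script to streaming with -mapper.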

tips  hadoop 
Fri, Mar 26th

tip: hadoop

to call a url upon job completion, use -Djob.end.notification.url='http://…?jobid=$jobId&jobStatus=$jobStatus' (hadoop fills in $jobId and $jobStatus)
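
On the receiving end, the callback is just an HTTP request whose query string carries the job id and final status. A small sketch of pulling those out, assuming the `jobid` and `jobStatus` parameter names from the tip; `parse_notification` and the example URL are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def parse_notification(url):
    """Extract (job id, job status) from a job-end notification URL.

    Either value is None when the parameter is missing or empty.
    """
    query = parse_qs(urlsplit(url).query)
    job_id = query.get("jobid", [None])[0]
    status = query.get("jobStatus", [None])[0]
    return job_id, status
```

For example, `parse_notification("http://example.com/notify?jobid=job_1&jobStatus=SUCCEEDED")` gives back the id and status as a pair, which a handler could then log or use to kick off the next step.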

tips  hadoop 
Tue, Mar 23rd

tip: hadoop

to update a counter in streaming, write reporter:counter:<group>,<counter>,<amount> to stderr http://tinyurl.com/ycwx9en
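
A short sketch of doing this from a Python streaming script, using the `reporter:counter:<group>,<counter>,<amount>` line format from the streaming docs. The helper names and the example group and counter names are made up for illustration.

```python
import sys


def counter_line(group, counter, amount=1):
    """Format a counter update in the form streaming looks for on stderr."""
    return f"reporter:counter:{group},{counter},{amount}"


def increment_counter(group, counter, amount=1, stream=None):
    """Write one counter update, defaulting to stderr."""
    stream = sys.stderr if stream is None else stream
    stream.write(counter_line(group, counter, amount) + "\n")
    # Flush so the update is not held back in a buffer.
    stream.flush()
```

Inside a mapper this might look like `increment_counter("parser", "bad_records")` whenever a malformed line is skipped. Since the fields are comma-separated, group and counter names should not contain commas.
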
tips  hadoop 
Wed, May 13th

tip: hadoop

use job.end.notification.url to specify a url to be called at job completion http://bit.ly/3PMKd

tips  hadoop 
Thu, Apr 30th

tip: hadoop

streaming jobs can access jobconf vars as environment variables with dots replaced by underscores, e.g. mapred.map.tasks->$mapred_map_tasks http://bit.ly/3wEkTZ (via @tlipcon)
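
A sketch of a generic lookup helper for a Python streaming script. It assumes streaming exports each jobconf key with non-alphanumeric characters replaced by underscores and the case left as-is; `env_name` and `jobconf` are hypothetical names.

```python
import os
import re


def env_name(conf_key):
    """Map a jobconf key to its assumed environment variable name."""
    # Assumption: every character outside [A-Za-z0-9] becomes an underscore.
    return re.sub(r"[^A-Za-z0-9]", "_", conf_key)


def jobconf(conf_key, default=None, env=None):
    """Return the jobconf value as a string, or default if it is not set."""
    env = os.environ if env is None else env
    return env.get(env_name(conf_key), default)
```

Values come back as strings, so something like `int(jobconf("mapred.map.tasks", "1"))` is needed before doing arithmetic with them.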

tips  hadoop