Review the following data and Pig code.
M,38,95111
F,29,95060
F,45,95192
M,62,95102
F,56,95102
A = LOAD 'data' USING PigStorage(',') as (gender:chararray, age:int, zip:chararray);
B = FOREACH A GENERATE age;
Which one of the following commands would save the results of B to a folder in HDFS named myoutput?
Answer : C
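The following is a plain-Java sketch of what the LOAD and FOREACH statements compute, with no Pig or Hadoop dependency. It assumes the comma-delimited sample is loaded with PigStorage(','), so each line splits into (gender, age, zip). B then keeps only the age column. The class and method names are hypothetical. In Pig itself, a relation is written to a filesystem directory with the STORE operator (for example, STORE B INTO 'myoutput';), which writes part files inside that directory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java stand-in for the two Pig statements (no Pig dependency).
class PigProjectionSketch {

    // Mimics A = LOAD ... USING PigStorage(','): one tuple per line, fields split on ','.
    // Each tuple is (gender, age, zip) as strings.
    static List<String[]> load(List<String> lines) {
        List<String[]> relation = new ArrayList<>();
        for (String line : lines) {
            relation.add(line.split(","));
        }
        return relation;
    }

    // Mimics B = FOREACH A GENERATE age: project field 1 and cast it to int.
    static List<Integer> projectAge(List<String[]> relation) {
        List<Integer> ages = new ArrayList<>();
        for (String[] tuple : relation) {
            ages.add(Integer.parseInt(tuple[1].trim()));
        }
        return ages;
    }

    public static void main(String[] args) {
        List<String> data = List.of(
            "M,38,95111", "F,29,95060", "F,45,95192", "M,62,95102", "F,56,95102");
        System.out.println(projectAge(load(data))); // [38, 29, 45, 62, 56]
    }
}
```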
You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into individual characters. For each one of these characters, you will emit the character as a key and an IntWritable as the value. As this will produce proportionally more intermediate data than input data, which two resources should you expect to be bottlenecks?
Answer : B
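Here is a plain-Java simulation of the job's logic, without any Hadoop dependency. The class, method, and Pair names are hypothetical. The map step emits one (character, 1) pair per input character, and the reduce step sums the counts per character. The sketch shows why intermediate data outgrows the input: every input character becomes its own key/value record. In a real job, each of those records must be serialized, sorted, spilled to local disk, and shuffled across the network to the reducers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the character-frequency job (no Hadoop dependency).
class CharFrequencySketch {

    // Stand-in for one intermediate record: (character, IntWritable(1)).
    static final class Pair {
        final char key;
        final int value;
        Pair(char key, int value) { this.key = key; this.value = value; }
    }

    // Map step: one input line produces one (char, 1) pair per character.
    static List<Pair> map(String line) {
        List<Pair> out = new ArrayList<>();
        for (char c : line.toCharArray()) {
            out.add(new Pair(c, 1));
        }
        return out;
    }

    // Reduce step: sum the values per character, sorted by character.
    static Map<Character, Integer> reduce(List<Pair> pairs) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (Pair p : pairs) {
            counts.merge(p.key, p.value, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "hadoop";
        List<Pair> intermediate = map(line);
        // Every input character became a separate intermediate record; in a real job
        // these would be serialized, sorted, spilled and shuffled to the reducers.
        System.out.println(line.length() + " chars -> " + intermediate.size() + " pairs"); // 6 chars -> 6 pairs
        System.out.println(reduce(intermediate)); // {a=1, d=1, h=1, o=2, p=1}
    }
}
```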
MapReduce v2 (MRv2/YARN) is designed to address which two issues?
Answer : A,B
Reference: Apache Hadoop YARN – Concepts & Applications
What are the TWO main components of the YARN ResourceManager process? Choose 2 answers.
Answer : C,D
Given a directory of files with the following structure: line number, tab character, string:
Example:
1	abialkjfjkaoasdfjksdlkjhqweroij
2	kadfjhuwqounahagtnbvaswslmnbfgy
3	kjfteiomndscxeqalkzhtopedkfsikj
You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line conf.setInputFormat(____.class); ?
Answer : C
Explanation:
http://stackoverflow.com/questions/9721754/how-to-parse-customwritable-from-text-in-hadoop
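This plain-Java sketch, with no Hadoop dependency, contrasts the (key, value) records a Mapper receives for "line number, tab, string" input under two of Hadoop's line-oriented input formats. Both deliver one line per record:

- **TextInputFormat:** the key is the byte offset of the line and the value is the whole line.
- **KeyValueTextInputFormat:** each line is split at the first tab into key and value. If a line has no tab, the whole line becomes the key and the value is empty.

The method names are hypothetical and are not Hadoop API. The offset calculation assumes "\n" line endings and single-byte characters.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of record shapes for two line-oriented input formats.
class LineRecordSketch {

    // TextInputFormat-style records: key = byte offset of the line start, value = full line.
    // Assumes '\n' line endings and single-byte characters, so chars == bytes.
    static List<Map.Entry<Long, String>> offsetRecords(String file) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : file.split("\n")) {
            records.add(new SimpleEntry<>(offset, line));
            offset += line.length() + 1; // +1 for the '\n' terminator
        }
        return records;
    }

    // KeyValueTextInputFormat-style records: split at the first tab;
    // with no tab, the whole line is the key and the value is empty.
    static List<Map.Entry<String, String>> keyValueRecords(String file) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        for (String line : file.split("\n")) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                records.add(new SimpleEntry<>(line, ""));
            } else {
                records.add(new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1)));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String file = "1\tabialkjfjkaoasdfjksdlkjhqweroij\n2\tkadfjhuwqounahagtnbvaswslmnbfgy\n";
        System.out.println(offsetRecords(file).get(1).getKey());     // 34
        System.out.println(keyValueRecords(file).get(1).getKey());   // 2
        System.out.println(keyValueRecords(file).get(1).getValue()); // kadfjhuwqounahagtnbvaswslmnbfgy
    }
}
```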