How to handle a SequenceFile input file in MapReduce

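I want to implement a MapReduce program that performs a global (total-order) sort.
Background:
The raw data has the format <(Text) name, (IntWritable) score>, and it needs to be sorted globally by score.
My current approach uses two jobs. The first job converts the raw data file into a SequenceFile whose records have the format <(IntWritable) score, (Text) name>, and that file is the input to the second job. The second job does the sampling and the remaining work.
Problem:
When this SequenceFile is the input to the second job's mapper, how do I get the key and value inside map?
This is what I currently do: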

  public static class TotalOrderSortMap
      extends Mapper<IntWritable, Text, IntWritable, Text> {

    private SequenceFile.Reader reader = null;

    public void map(IntWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      IntWritable score = new IntWritable();
      Text name = new Text();
      //IntWritable score = (IntWritable)ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      //Text name = (Text)ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(score, name)) {
        context.write(score, name);
      }
    }
  }

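But it always fails at runtime with java.lang.NullPointerException.
PS: Note that the question is specifically about how to handle a SequenceFile inside map.
PPS: If you have any other related ideas, feel free to share them. Thanks in advance.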

————————————————————————————————————————————————————————————————————————
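PPPS: I just looked at this again and solved it. Near the bottom of page 249 of "Hadoop: The Definitive Guide" (third edition, revised) there is this sentence:
“The keys and values are determined by the sequence file, and you need to make sure that your map input types correspond.” So there is no need to call SequenceFile.Reader.next in map to read the key-value pairs one at a time. For a SequenceFile input, each record is already one key-value pair, and it arrives as the key and value arguments of map.
The map body is then simply (passing the record through unchanged): context.write(key, value);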
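
For anyone who lands here with the same error, here is a minimal sketch of the second job with the fix applied. In the code posted above, the reader field is never assigned, so reader.next(...) is the likely source of the NullPointerException. The sketch assumes Hadoop 2 or later (Job.getInstance) and that the job reads the file through SequenceFileInputFormat, which is what delivers each record as the map key and value. The class name SequenceFileInputSketch and the use of args[0] and args[1] as input and output paths are placeholders of mine, not part of the original program, and the sampling and partitioner setup for the total-order sort is left out.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class name, used only for this sketch.
public class SequenceFileInputSketch {

  // The mapper's input types must match the key/value classes stored in the
  // SequenceFile: <IntWritable score, Text name>.
  public static class TotalOrderSortMap
      extends Mapper<IntWritable, Text, IntWritable, Text> {

    @Override
    protected void map(IntWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // The framework has already read one record from the SequenceFile;
      // no SequenceFile.Reader is needed here. Pass the record through.
      context.write(key, value);
    }
  }

  public static void main(String[] args) throws Exception {
    // Assumption: args[0] is the SequenceFile written by the first job,
    // args[1] is an output directory that does not exist yet.
    Job job = Job.getInstance(new Configuration(), "sequence file input sketch");
    job.setJarByClass(SequenceFileInputSketch.class);

    // Tell the job that its input is a SequenceFile, so each record is
    // handed to map() as a (key, value) pair.
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setMapperClass(TotalOrderSortMap.class);

    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

If the mapper's declared input types do not match the classes stored in the file (for example Text where the file holds IntWritable), the job would be expected to fail with a ClassCastException at runtime, which is the "make sure that your map input types correspond" part of the quote.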

hadoop mapreduce

11 years, 5 months ago

samael
