Problem description

I'm new to Hadoop and Java, so please bear with me. I was able to get MapReduce working with a .tsv file, but I can't seem to get it working with a .csv file.

Here is the code:
package question5;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FreqMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        /*
         * When the file is read in, the first line is the header row, which we do not want.
         * Since the input arrives as key-value pairs keyed by byte offset, we only need to
         * skip key 0, as seen below.
         */
        if (key.get() == 0) {
            return;
        } else {
            /*
             * After skipping the first line, we extract the necessary data to be mapped
             * into our desired key-value structure.
             *
             * In this case, channel_title -> likes
             * channel_title being a Text
             * likes being an IntWritable
             *
             * The data is split at the comma.
             */
            String line = value.toString();
            Text channel_name = new Text(line.split(",")[3]);
            IntWritable likes = new IntWritable(Integer.parseInt(line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")[8]));
            context.write(channel_name, likes);
        }
    }
}
The problem occurs at the IntWritable line, when I access index 8 of the split array: an IndexOutOfBoundsException is thrown. I tested the regex and it works fine, as shown here: https://regex101.com/r/J3P6xQ/1

Any suggestions would be welcome. Thank you for reading.
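One plausible cause, stated as an assumption since the input file is not shown: if quoted CSV fields can contain embedded line breaks, Hadoop's TextInputFormat still hands the mapper one physical line at a time, so a record broken across lines splits into fewer than 9 fields, and indexing [8] then throws. The regex itself is fine on a complete line, which is why it passes on regex101. A minimal plain-Java sketch (no Hadoop needed, with hypothetical sample rows) showing how a truncated line produces a short array, and the length guard that avoids the exception:

```java
public class CsvSplitDemo {
    // The same quote-aware split pattern used in the mapper.
    static final String CSV_SPLIT = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static void main(String[] args) {
        // A complete record: a quoted title containing a comma is kept as one field.
        String good = "id,2017,\"Some, Title\",ChannelX,cat,time,tags,views,1234,0";
        // A record cut off mid-field, e.g. by an embedded newline inside a quoted title.
        String truncated = "id,2017,\"A title that was broken";

        String[] goodFields = good.split(CSV_SPLIT);
        String[] badFields = truncated.split(CSV_SPLIT);

        System.out.println(goodFields.length); // 10 fields: index 8 is safe
        System.out.println(badFields.length);  // far fewer fields: index 8 would throw

        // Defensive extraction mirroring the mapper: skip malformed rows
        // instead of letting [8] throw IndexOutOfBoundsException.
        if (goodFields.length > 8) {
            System.out.println(Integer.parseInt(goodFields[8]));
        }
        if (badFields.length <= 8) {
            System.out.println("skipped malformed row");
        }
    }
}
```

Note the mapper also splits with a plain "," for the channel name at index 3 but with the quote-aware pattern for index 8; using the quote-aware pattern once, plus a length check, would make both extractions consistent.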