Sunday, May 10, 2015

HBase ignore comma while using bulk loading importTSV

HBase simply ignore Text while importing the same with CSV file, and the best part it didn't even inform you.
Entire job will be passed , but your HBase table won't have any data or partial data , like if any column has some values

"this text can be uploaded , but it has more" , then till uploaded it will be there in HBase Table cell , then rest of the contents are gone.
This is because I was importing TSV with seperator comma (,) and that lead to import engine to ignore comma among the csv cell.



It took 32 YARN jobs to figure out the actual issue.

Import CSV command -


create ‘bigdatatwitter’,’main’,’detail’,’other’


hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,other:contributors,main:truncated,main:text,main:in_reply_to_status_id,main:id,main:favorite_count,main:source,detail:retweeted,detail:coordinates,detail:entities,detail:in_reply_to_screen_name,detail:in_reply_to_user_id,detail:retweet_count,detail:id_str,detail:favorited,detail:retweeted_status,other:user,other:geo,other:in_reply_to_user_id_str,other:possibly_sensitive,other:lang,detail:created_at,other:in_reply_to_status_id_str,detail:place,detail:metadata" -Dimporttsv.separator="," bigdatatwitter file:///Users/abhishekchoudhary/PycharmProjects/DeepLearning/AllMix/bigdata3.csv

No comments: