Let us understand how to compress data while importing it using sqoop import.
- We can enable compression by using --compress (or its short form, -z).
- The default compression codec is gzip, which is based on the deflate algorithm and produces files with a .gz extension.
- We can pass a different compression codec using --compression-codec.
- We can review the io.compression.codecs property in core-site.xml to get the list of valid compression codecs that can be used.
- Not every compression codec is compatible with every file format, so it is important to use only codecs that the chosen file format supports.
Here is an example of a sqoop import command that compresses the data using the default compression codec.
sqoop import \
--connect "jdbc:mysql://ms.itversity.com:3306/retail_db" \
--username retail_user \
--password itversity \
--table order_items \
--warehouse-dir /user/training/sqoop_import/retail_db \
--delete-target-dir \
--compress
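Before picking a codec other than the default, it can help to list what the cluster actually has configured in the io.compression.codecs property mentioned earlier. The following is a minimal sketch, not an official tool: list_codecs is a hypothetical helper name, /etc/hadoop/conf/core-site.xml is an assumed location for the configuration file, and the parsing assumes the name and value elements sit on consecutive lines.

```shell
# Hypothetical helper: print each codec listed under io.compression.codecs,
# one per line. The default path below is an assumption; pass the real
# location of core-site.xml as the first argument if it differs.
list_codecs() {
  conf_file="${1:-/etc/hadoop/conf/core-site.xml}"
  # Take the <name> line plus the line after it, keep only the <value> text,
  # then split the comma-separated class names onto separate lines.
  grep -A 1 '<name>io.compression.codecs</name>' "$conf_file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
    | tr ',' '\n'
}

# Usage (on a node that has the Hadoop configuration):
#   list_codecs /etc/hadoop/conf/core-site.xml
```

On a node with the Hadoop client installed, running hdfs getconf -confKey io.compression.codecs should report the same property without parsing the file by hand.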
Here is an example of a sqoop import command that compresses the data using the Snappy compression codec.
sqoop import \
--connect "jdbc:mysql://ms.itversity.com:3306/retail_db" \
--username retail_user \
--password itversity \
--table order_items \
--warehouse-dir /user/training/sqoop_import/retail_db \
--delete-target-dir \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec
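After an import finishes, we can check that the files were actually compressed. The following is a rough sketch, assuming the import wrote text files to the order_items directory under the warehouse directory used above and that the hdfs client is on the PATH. The part-m-* pattern reflects the usual naming of map-only job output and is an assumption here; the file extension (for example .gz or .snappy) should reflect the codec that was used.

```shell
# Directory that sqoop creates for the table under --warehouse-dir
target_dir=/user/training/sqoop_import/retail_db/order_items

if command -v hdfs >/dev/null 2>&1; then
  # List the output files; the extension indicates the codec used
  hdfs dfs -ls "$target_dir"
  # -text decompresses known codecs while reading; preview a few records
  # (the glob is quoted so that hdfs, not the local shell, expands it)
  hdfs dfs -text "$target_dir/part-m-*" | head -n 5
else
  echo "hdfs client not found; run this on a gateway node of the cluster" >&2
fi
```

Reading Snappy files with -text typically requires the native Snappy libraries to be available to the Hadoop client, so an error at that step may point to the client setup rather than to the import itself.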