1、Evolution of the Shuffle Process
- Spark 0.8 and earlier: Hash Based Shuffle only
- Spark 0.8.1: File Consolidation mechanism introduced for Hash Based Shuffle
- Spark 1.1: Sort Based Shuffle introduced, but Hash Based Shuffle remains the default
- Spark 1.2: default shuffle implementation changed to Sort Based Shuffle
- Spark 2.0: Hash Based Shuffle removed
2、Hash Based Shuffle
- Before Consolidation was introduced
- After Consolidation was introduced
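
The practical difference between the two variants is the number of shuffle files written. A minimal sketch of that arithmetic, assuming the commonly described model: without Consolidation every map task writes one file per reducer, and with Consolidation the map tasks that run on the same core reuse one group of files. The function names and the example cluster sizes are hypothetical.

```python
def files_without_consolidation(num_map_tasks: int, num_reducers: int) -> int:
    """Each map task writes one file per reducer."""
    return num_map_tasks * num_reducers


def files_with_consolidation(total_cores: int, num_reducers: int) -> int:
    """Map tasks running on the same core append to a shared group of files,
    so the count depends on the number of cores, not the number of map tasks."""
    return total_cores * num_reducers


if __name__ == "__main__":
    # Hypothetical job: 1000 map tasks, 500 reducers, 40 cores in total.
    print(files_without_consolidation(1000, 500))
    print(files_with_consolidation(40, 500))
```

Under these assumptions Consolidation helps only when there are many more map tasks than cores. The file count still grows with the number of reducers.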
3、Sort Based Shuffle
4、The Three Shuffle Writers
- The Shuffle Writers are BypassMergeSortShuffleWriter, UnsafeShuffleWriter, and SortShuffleWriter
- How one of the three Shuffle Writers is selected:
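
The selection order can be modeled with a small sketch. This is a Python model of the checks that SortShuffleManager is generally understood to perform when a shuffle is registered, not Spark code. `ShuffleDep` and `choose_writer` are hypothetical names, and the exact conditions vary between Spark versions. The threshold of 200 is the default of `spark.shuffle.sort.bypassMergeThreshold`.

```python
from dataclasses import dataclass

# Default value of spark.shuffle.sort.bypassMergeThreshold.
BYPASS_MERGE_THRESHOLD = 200
# Serialized mode stores the partition id in 24 bits.
MAX_SERIALIZED_PARTITIONS = 1 << 24


@dataclass
class ShuffleDep:
    """Hypothetical stand-in for the properties checked on a shuffle dependency."""
    num_partitions: int
    map_side_combine: bool
    serializer_supports_relocation: bool


def choose_writer(dep: ShuffleDep,
                  bypass_threshold: int = BYPASS_MERGE_THRESHOLD) -> str:
    # 1. Few partitions and no map-side combine: write one file per
    #    partition and concatenate them, skipping the sort entirely.
    if not dep.map_side_combine and dep.num_partitions <= bypass_threshold:
        return "BypassMergeSortShuffleWriter"
    # 2. Serialized records can be reordered without deserializing them,
    #    no map-side combine, and the partition id fits in 24 bits.
    if (dep.serializer_supports_relocation
            and not dep.map_side_combine
            and dep.num_partitions <= MAX_SERIALIZED_PARTITIONS):
        return "UnsafeShuffleWriter"
    # 3. Everything else falls back to the general sort-based writer.
    return "SortShuffleWriter"
```

The checks run in this order, so a shuffle with few partitions and no map-side combine takes the bypass path even when its serializer would also qualify for UnsafeShuffleWriter.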