“XGBoost: A Scalable Tree Boosting System”, Tianqi Chen, Carlos Guestrin, 2016-03-09:

Tree boosting is a highly effective and widely used machine learning method.

In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges.

We propose a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression, and sharding to build a scalable tree boosting system.
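To make the sparsity-aware idea concrete, here is a minimal sketch of split finding for a single feature in which missing entries are sent in a learned "default direction". It is an illustration under stated assumptions, not XGBoost's actual implementation: the function name `best_split_sparse`, its arguments, and the use of `None` to mark missing values are all hypothetical, and the gain omits the constant factor and the complexity penalty. The key point is that only the non-missing entries are scanned, once with missing rows assigned to the right child and once with them assigned to the left.

```python
def best_split_sparse(values, grad, hess, reg_lambda=1.0):
    """Return (gain, threshold, default_left) for one feature.

    values: feature values, with None marking a missing entry (assumption).
    grad, hess: per-row first- and second-order loss statistics.
    Rows with value < threshold go left; missing rows follow default_left.
    """
    G, H = sum(grad), sum(hess)  # totals include the missing rows

    def score(g, h):
        # Structure score of a leaf holding gradient sum g and hessian sum h.
        return g * g / (h + reg_lambda)

    parent = score(G, H)
    # Only non-missing entries are sorted and enumerated.
    present = sorted(
        (v, g, h) for v, g, h in zip(values, grad, hess) if v is not None
    )
    best = (0.0, None, None)

    # Pass 1: missing rows go right. Accumulate the left child in ascending order.
    gl = hl = 0.0
    for i in range(len(present) - 1):
        v, g, h = present[i]
        gl += g
        hl += h
        nxt = present[i + 1][0]
        if nxt == v:
            continue  # cannot split between equal values
        gain = score(gl, hl) + score(G - gl, H - hl) - parent
        if gain > best[0]:
            best = (gain, (v + nxt) / 2, False)

    # Pass 2: missing rows go left. Accumulate the right child in descending order.
    gr = hr = 0.0
    for i in range(len(present) - 1, 0, -1):
        v, g, h = present[i]
        gr += g
        hr += h
        prv = present[i - 1][0]
        if prv == v:
            continue
        gain = score(G - gr, H - hr) + score(gr, hr) - parent
        if gain > best[0]:
            best = (gain, (prv + v) / 2, True)

    return best


# Squared-error example with predictions at 0, so grad = -y and hess = 1.
# The two missing rows have targets like the row with value 2.0.
result = best_split_sparse(
    values=[1.0, 2.0, None, None],
    grad=[0.0, -10.0, -10.0, -10.0],
    hess=[1.0, 1.0, 1.0, 1.0],
)
```

Because each pass touches only the present entries, the work per feature grows with the number of non-missing values rather than with the number of rows, which is what makes this approach attractive for sparse data.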

By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.