Mitchell R, Frank E. (2017) Accelerating the XGBoost algorithm using GPU computing. PeerJ Computer Science 3:e127 https://doi.org/10.7717/peerj-cs.127
Zhang H, Si S, Hsieh C-J. (2017) GPU-acceleration for Large-scale Tree Boosting.
With data sets in the range of 1 to 10 million records, I have observed GPU acceleration cutting training time by a factor of 5 to 10 while offering comparable accuracy, which is a big boost to data science work. However, the most significant constraint on using GPUs for machine learning is GPU memory, which limits how large a data set you can model.
Know how much memory your GPU has
Whether you are on AWS, Google Cloud or Azure, chances are you face the same memory constraint, because they all rely on Nvidia GPUs. The most common hardware for machine learning is the Tesla V100 with 16 GB of GPU memory, which can fit a data set of roughly 10 million records. Actual mileage will vary with the data set and how you tune the parameters; more on this later.
It should be noted that although a multi-GPU option exists, it does nothing to alleviate the per-GPU memory constraint, since the data set needs to fit on a single GPU.
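As a rough sanity check before renting a GPU, you can estimate how large the raw feature matrix would be if it were held as dense 32-bit floats. This is only a back-of-the-envelope sketch: the function name and the feature count of 100 are assumptions for illustration, and the real footprint inside a GPU library will differ because of binning, gradients and working buffers.

```python
def dense_float32_gib(n_rows, n_features):
    """Size in GiB of a dense float32 matrix (4 bytes per value)."""
    return n_rows * n_features * 4 / 2**30

# Assumed example: 10 million records with 100 features each.
size = dense_float32_gib(10_000_000, 100)
print(f"raw float32 matrix: {size:.2f} GiB of a 16 GiB card")
```

The point of the estimate is to see how quickly the feature count eats into the memory budget: doubling the number of columns doubles the figure.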
Gradient Boosting parameters affecting GPU memory footprint
With an implementation such as the GPU version of xgboost, certain parameters affect memory allocation and therefore determine whether your data set will fit (otherwise you will get a memory error). You need to budget the available memory and allocate it wisely across these parameters. For example, you don't want to set parameter values unnecessarily high and over-allocate in certain dimensions.
Max Bin
Research on gradient boosting shows that the histogram method, which turns continuous feature values into discrete points for evaluation, can be equally effective, and it leads to a more efficient implementation on GPU. max_bin is the parameter that defines the maximum number of discrete points on which a feature is evaluated. Reducing it from the default value of 256 to 64, 32, or even 16 reduces the GPU memory required. You can run comparison tests on smaller data sets and compare evaluation metrics to determine the impact on model performance.
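To make the idea of discretizing a feature concrete, here is a minimal pure-Python sketch of quantile binning. It is an illustration of the general histogram idea, not xgboost's actual implementation; the helper names quantile_cuts and to_bins are made up for this example.

```python
from bisect import bisect_right

def quantile_cuts(values, max_bin):
    """Return up to max_bin - 1 cut points taken at evenly spaced quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, max_bin):
        cut = ordered[min(n - 1, (i * n) // max_bin)]
        if not cuts or cut > cuts[-1]:  # skip duplicate cut points
            cuts.append(cut)
    return cuts

def to_bins(values, cuts):
    """Map each continuous value to the index of the bin it falls into."""
    return [bisect_right(cuts, v) for v in values]

feature = [0.1, 0.4, 0.35, 2.5, 1.7, 0.9, 3.3, 0.05]
cuts = quantile_cuts(feature, max_bin=4)
bins = to_bins(feature, cuts)
print(cuts)  # cut points separating the bins
print(bins)  # one small integer per record instead of a float
```

With a smaller max_bin there are fewer cut points and fewer histogram buckets per feature to keep track of, which is where the memory saving comes from.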
Max depth, Boost round, Early Stopping
These parameters determine how many trees are built and how deep each one is, which directly affects the amount of memory allocated, so put some thought into adjusting them. One option is an iterative approach: first set the parameters low enough for the model to fit, then combine this with parameter optimization and fine-tune to push toward the maximum memory utilization.
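A quick way to see why depth deserves care is to count nodes. The sketch below computes the upper bound on nodes for a binary tree of a given depth and multiplies it by the number of boosting rounds. The function names are made up for illustration, and this is only an upper bound on tree size, not a statement about how any particular library allocates memory; early stopping can also end training before all rounds are used.

```python
def max_nodes_per_tree(max_depth):
    """Upper bound on nodes in a binary tree whose root is at depth 0."""
    return 2 ** (max_depth + 1) - 1

def max_total_nodes(max_depth, num_boost_round):
    """Upper bound on nodes across all trees in the ensemble."""
    return num_boost_round * max_nodes_per_tree(max_depth)

# Assumed example: 500 boosting rounds at increasing depth.
for depth in (6, 8, 10):
    print(depth, max_nodes_per_tree(depth), max_total_nodes(depth, 500))
```

Each extra level of depth roughly doubles the bound, while extra rounds only grow it linearly, so depth is usually the first knob to turn down when memory is tight.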
CPU Predictor
Even with GPU-based training, you can still have prediction run on the CPU, which reduces the GPU memory required.
'predictor':'cpu_predictor',
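For context, this is roughly how the setting might sit in a full parameter dictionary together with the memory-related parameters discussed above. Treat it as a sketch under assumptions: the values for max_bin, max_depth, the round counts and the objective are placeholders, and the tree_method and predictor names match older xgboost releases, so check the documentation for the version you have installed. The function train_gpu_predict_cpu is a hypothetical helper, and dtrain and dvalid are assumed to be xgboost DMatrix objects you have already built.

```python
# Parameter names as used by older xgboost releases; values are placeholders.
params = {
    "tree_method": "gpu_hist",     # build trees on the GPU
    "predictor": "cpu_predictor",  # run prediction on the CPU
    "max_bin": 64,                 # fewer bins than the default of 256
    "max_depth": 6,
    "objective": "binary:logistic",
}

def train_gpu_predict_cpu(dtrain, dvalid, num_boost_round=200):
    """Hypothetical helper: train with the parameters above."""
    import xgboost as xgb  # imported here so the dict can be inspected without xgboost

    return xgb.train(
        params,
        dtrain,
        num_boost_round=num_boost_round,
        evals=[(dvalid, "valid")],
        early_stopping_rounds=20,  # stop when the validation metric stalls
    )
```

Keeping the parameters in one dictionary makes it easy to lower one value at a time when a run fails with a memory error.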