Abstract:
Heteroscedasticity is often neglected in the analysis of high-dimensional data, yet it can pose a problem
when using modern regression methods such as Bayesian Additive Regression Trees (BART), Random
Forest, and Neural Networks. When the number of predictors exceeds the sample size, the BART model is
computationally expensive and tedious to fit; this prompted the development of Bayesian Additive
Regression Trees using Bayesian Model Averaging (BART-BMA) as an alternative. However, none of
these methods is effective when heteroscedasticity is present in the data. We address this challenge by
modeling the variance component of BART-BMA with multiplicative trees, yielding a model we call Modified
Bayesian Additive Regression Trees (MBART). Heteroscedasticity was introduced into the simulated data
structure used by Friedman (1991). The model's effectiveness was evaluated on simulated out-of-sample
data, through five-fold cross-validation, and on a real-life dataset obtained from the University of California, Irvine (UCI) Machine
Learning Repository, at different sample sizes, numbers of trees, and numbers of predictor variables. The results
indicated that MBART outperformed Random Forest, BART, and BART-BMA on the simulated out-of-sample data,
in five-fold cross-validation, and on the real-life data at sample sizes of 50, 100, and 500 and with
5, 10, 25, and 50 trees.
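The simulation setting builds on Friedman's (1991) benchmark function, y = 10 sin(pi x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 + noise, with predictors drawn uniformly on [0, 1] and any predictors beyond x5 acting as pure noise. The abstract does not say how heteroscedasticity was added, so the sketch below assumes a hypothetical noise standard deviation sigma(x) = 0.5 + 2 x1 purely for illustration; the function name `friedman1_heteroscedastic` is likewise our own.

```python
import numpy as np


def friedman1_heteroscedastic(n, p=10, seed=0):
    """Simulate Friedman (1991) data with input-dependent (heteroscedastic) noise.

    ASSUMPTION: sigma(x) = 0.5 + 2 * x1 is a hypothetical variance function
    chosen for illustration; it is not taken from the MBART study.
    """
    if p < 5:
        raise ValueError("Friedman's function needs at least 5 predictors")
    rng = np.random.default_rng(seed)
    # Predictors are i.i.d. Uniform[0, 1); columns beyond x5 do not affect y.
    X = rng.uniform(0.0, 1.0, size=(n, p))
    # Friedman's mean function uses only the first five predictors.
    mean = (10 * np.sin(np.pi * X[:, 0] * X[:, 1])
            + 20 * (X[:, 2] - 0.5) ** 2
            + 10 * X[:, 3]
            + 5 * X[:, 4])
    # Hypothetical heteroscedastic standard deviation that grows with x1.
    sigma = 0.5 + 2.0 * X[:, 0]
    y = mean + sigma * rng.standard_normal(n)
    return X, y, mean, sigma


# Example: one of the sample sizes mentioned in the abstract.
X, y, mean, sigma = friedman1_heteroscedastic(100, p=10)
```

Under this assumed form, the noise variance ranges from 0.25 to 6.25 across the predictor space, which is the kind of structure a constant-variance model such as BART or BART-BMA cannot represent.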
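The abstract states that MBART models the variance with multiplicative trees but gives no details. A common way to read this is that each variance tree returns a positive multiplier and the variance at x is the product of those multipliers, in contrast to the mean trees, whose outputs are summed. The sketch below assumes that form and uses hypothetical single-split trees (`Stump`) rather than the fitted trees of the actual method.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Stump:
    """Hypothetical single-split tree whose leaves hold positive multipliers."""
    feature: int      # column index used for the split
    threshold: float  # split point
    left: float       # leaf value when x[feature] <= threshold
    right: float      # leaf value when x[feature] > threshold

    def __post_init__(self):
        # Multipliers must be strictly positive so the product is a valid variance.
        if self.left <= 0 or self.right <= 0:
            raise ValueError("leaf multipliers must be positive")

    def predict(self, X):
        return np.where(X[:, self.feature] <= self.threshold, self.left, self.right)


def multiplicative_variance(trees, X):
    """Return sigma^2(x) as the row-wise product of the trees' leaf values."""
    var = np.ones(X.shape[0])
    for tree in trees:
        var *= tree.predict(X)
    return var


trees = [Stump(0, 0.5, 1.0, 2.0), Stump(1, 0.5, 0.5, 3.0)]
X = np.array([[0.2, 0.2], [0.8, 0.2], [0.2, 0.8], [0.8, 0.8]])
print(multiplicative_variance(trees, X))
```

Because log sigma^2(x) equals the sum of the trees' log-multipliers, a multiplicative ensemble is an additive model on the log-variance scale, and every predicted variance stays positive by construction.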