Oct 5, Data Mining Team (Student Seminar Series)
Student Seminar Series
presented by the DATA MINING TEAM
DATE: Wednesday, October 5, 2016, 12:00 pm
LOCATION: MSB 1147 (Colloquium Room). Pizza will be served.
SPEAKERS: Data Mining Team, Department of Statistics, UC Davis
TITLE: “Predicting Return Quantity for a Fashion Distributor via Gradient Boosting and Ensemble Learning”
ABSTRACT: In this talk, we present our approach to the Data Mining Cup (DMC) 2016, where the task was to predict return quantities for a fashion distributor. Order data and the corresponding return data, recorded over a two-year period, were used to build the predictive model. From the raw data, we extracted a number of features with strong predictive power and good interpretability. We transformed the data so that the problem reduces to binary classification, which significantly lowers the computational cost. We fitted several state-of-the-art predictive models to the data, including regularized logistic regression, random forests, gradient boosting, and deep learning; of these, gradient boosting gave the best predictive accuracy. To further improve accuracy and leverage the strengths of each model, we applied three-layer stacked generalization to combine their predictions.
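The abstract does not say how the return-quantity problem was reduced to binary classification. One plausible reading, offered here only as an illustrative sketch and not as the team's actual method, is to split each order line into one row per ordered unit and label each unit as returned (1) or kept (0). The names `OrderLine`, `expand_to_units`, and `predicted_return_quantity`, the field names, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OrderLine:
    # Hypothetical field names; the real DMC 2016 schema may differ.
    article_id: str
    quantity: int          # units ordered
    return_quantity: int   # units returned, between 0 and quantity


def expand_to_units(line: OrderLine) -> List[Tuple[str, int]]:
    """Split one order line into `quantity` unit rows with a 0/1 return label."""
    if not 0 <= line.return_quantity <= line.quantity:
        raise ValueError("return_quantity must lie between 0 and quantity")
    kept = line.quantity - line.return_quantity
    labels = [1] * line.return_quantity + [0] * kept
    return [(line.article_id, label) for label in labels]


def predicted_return_quantity(unit_probabilities: List[float],
                              threshold: float = 0.5) -> int:
    """Map per-unit return probabilities back to an integer return quantity.

    Assumption: a unit counts as returned when its predicted probability
    is at least `threshold`.
    """
    return sum(1 for p in unit_probabilities if p >= threshold)


# Usage: an order of 3 units with 1 returned becomes three binary examples.
rows = expand_to_units(OrderLine("A1", 3, 1))
```

Under this assumption, any binary classifier (for example gradient boosting) can be trained on the unit rows, and its per-unit probabilities are aggregated back into a quantity per order line.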