5. Discussion, limitations, and conclusion
The aim of this research is to develop a predictive model for building-level waste generation in a dense urban environment, using New York City as a case study. We combine a socio-spatial model of weekly per capita waste generation with estimates of the occupant population for each of the more than 750,000 residential buildings in the City. Our best-performing predictive model (GBRT) predicts total weekly waste generation for DSNY sub-sections with an out-of-sample R-squared value of 0.87. Subsequent models predicting refuse, MGP, and paper recycling separately also perform well. The variables with the highest feature importance are weather (temperature, precipitation, wind speed, and snow event), residential building type and density, and demographic variables. Weather-related features capture temporal (i.e., seasonal) variation in the data, as well as anomalous weather in individual weeks. Our building-level prediction model demonstrates high accuracy in two validation processes: in the two collection truck validation cases, the model achieved 99.8% and 93.9% prediction accuracy, respectively.
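The building-level estimate combines two quantities: a predicted weekly per capita generation rate and an estimated occupant population. The sketch below illustrates that arithmetic, the aggregation of building predictions to a collection truck route, and the out-of-sample R-squared statistic. It is a minimal illustration under stated assumptions, not the study's implementation: the `Building` record, its field names, and the input values are hypothetical, and the route-level accuracy metric (100 × (1 − |predicted − observed| / observed)) is an assumed definition, since the text above does not specify how prediction accuracy was computed.

```python
from dataclasses import dataclass


@dataclass
class Building:
    # Hypothetical record; field names are illustrative, not from the study.
    building_id: str
    occupants: float           # estimated occupant population
    per_capita_week: float     # predicted waste per capita per week (arbitrary mass units)


def building_weekly_waste(b: Building) -> float:
    """Building-level weekly waste = per capita weekly rate x occupant population."""
    return b.per_capita_week * b.occupants


def route_total(buildings: list[Building]) -> float:
    """Sum building-level predictions over the buildings served by one truck route."""
    return sum(building_weekly_waste(b) for b in buildings)


def prediction_accuracy(predicted: float, observed: float) -> float:
    """ASSUMED metric: 100 * (1 - |predicted - observed| / observed)."""
    return 100.0 * (1.0 - abs(predicted - observed) / observed)


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination on held-out (out-of-sample) observations."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical buildings on one route and a hypothetical observed truck tonnage.
    route = [
        Building("A", occupants=120, per_capita_week=4.2),
        Building("B", occupants=45, per_capita_week=3.9),
        Building("C", occupants=8, per_capita_week=5.0),
    ]
    predicted = route_total(route)
    print(f"predicted route total: {predicted:.1f}")
    print(f"accuracy vs. observed 750: {prediction_accuracy(predicted, 750.0):.1f}%")
```

Keeping the per capita rate and the occupant estimate as separate inputs mirrors the decomposition described above, so improved occupancy data could replace the population term without retraining the socio-spatial rate model.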
Certain data limitations constrain the predictive power of our model, although we expect iterative improvements as additional validation data are acquired. To reflect spatial heterogeneity in waste generation behavior and in the likelihood that a building's units are occupied, additional features should be considered. The model could be improved with accurate building occupancy information at high temporal resolution, particularly information that accounts for weekly fluctuations in residential population. In addition, specific information on the waste set-out (pick-up/drop-off) point for each building would help match buildings to their true truck route narratives. We are currently working with DSNY to collect these data across selected routes.