How much data are enough? A statistical approach with case study on longitudinal driving behavior

Wenshuo Wang, Chang Liu, Ding Zhao*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

88 Citations (Scopus)

Abstract

Big data has shown a uniquely powerful ability to reveal, model, and understand driver behaviors. The amount of data affects both the cost of an experiment and the conclusions drawn from the analysis: insufficient data may lead to inaccurate models, whereas excessive data waste resources. For projects that cost millions of dollars, it is critical to determine the right amount of data. However, how to decide the appropriate amount has not been fully studied in the realm of driver behaviors. This paper systematically investigates this issue, estimating from a statistical point of view how much naturalistic driving data (NDD) is needed to understand driver behaviors. A general assessment method is proposed that uses Gaussian kernel density estimation to capture the underlying characteristics of driver behaviors. We then apply the Kullback-Leibler divergence to measure the similarity between the density functions obtained with differing amounts of NDD, and use a max-minimum approach to compute the appropriate amount of NDD. To validate the proposed method, we investigated the car-following case using NDD collected from the University of Michigan Safety Pilot Model Deployment program. We demonstrate that, from a statistical perspective, the proposed approach can provide an appropriate amount of NDD capable of capturing most features of normal car-following behavior, which is consistent with the experiment settings in much of the literature.
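The pipeline described in the abstract (a Gaussian kernel density estimate of a driving variable, a Kullback-Leibler divergence between densities built from different amounts of data, then a rule for choosing the amount) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several choices are assumptions not stated in the abstract: the data are one-dimensional, the bandwidth follows Silverman's rule of thumb, the divergence is taken as D_KL(full-data density || subset density) on a shared uniform grid, and the paper's max-minimum criterion is replaced by a hypothetical threshold rule (`smallest_sufficient_size`, with a hypothetical `threshold` in nats) that returns the first subset size whose density is close enough to the full-data density.

```python
import math
import random


def silverman_bandwidth(samples):
    """Silverman's rule-of-thumb bandwidth for a 1-D Gaussian KDE (needs >= 2 distinct values)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return 1.06 * std * n ** (-1 / 5)


def gaussian_kde(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` at every point of `grid`."""
    h = silverman_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in samples) for g in grid]


def kl_divergence(p, q, eps=1e-12):
    """Discrete D_KL(P || Q) for two densities sampled on the same uniform grid.

    Both are floored by `eps` (avoids log(0) where a kernel underflows) and
    renormalised to sum to 1, so the grid spacing cancels out.
    """
    p = [v + eps for v in p]
    q = [v + eps for v in q]
    sp, sq = sum(p), sum(q)
    return sum((pi / sp) * math.log((pi / sp) / (qi / sq)) for pi, qi in zip(p, q))


def smallest_sufficient_size(data, sizes, threshold, grid_points=200, seed=0):
    """Hypothetical stopping rule (NOT the paper's max-minimum criterion):
    return the smallest n in `sizes` whose random-subset KDE is within
    `threshold` nats of the KDE of all `data`, or None if no size qualifies."""
    x_min, x_max = min(data), max(data)
    step = (x_max - x_min) / (grid_points - 1)
    grid = [x_min + i * step for i in range(grid_points)]
    reference = gaussian_kde(data, grid)
    rng = random.Random(seed)  # fixed seed so the sweep is reproducible
    for n in sorted(sizes):
        subset = rng.sample(data, n)  # requires n <= len(data)
        if kl_divergence(reference, gaussian_kde(subset, grid)) < threshold:
            return n
    return None


if __name__ == "__main__":
    # Synthetic stand-in for one car-following variable; NOT naturalistic driving data.
    gen = random.Random(42)
    samples = [gen.gauss(2.0, 0.5) for _ in range(1000)]
    print(smallest_sufficient_size(samples, [25, 50, 100, 250, 500, 1000], threshold=0.01))
```

Because the KL divergence is asymmetric, swapping the arguments of `kl_divergence` changes the values; the direction used here is an assumption. Drawing several random subsets per size and averaging the divergence would make such a sweep less sensitive to a single unlucky sample.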

Original language: English
Article number: 7959200
Pages (from-to): 85-98
Number of pages: 14
Journal: IEEE Transactions on Intelligent Vehicles
Volume: 2
Issue number: 2
Publication status: Published - Jun 2017

Keywords

  • Car-following behaviors
  • Kernel density estimation
  • Kullback-Leibler divergence
  • Modeling driver behaviors
  • Naturalistic driving data
