Machine learning for data-driven disaggregate travel demand analysis

F. Silvestri; L. Barbierato
2022-01-01

Abstract

The quality of data is a paramount factor for the reliability and robustness of travel demand analyses. In recent years, big data sources have been gaining popularity among transport modelers thanks to the spread of smart wearable devices, which make it possible to passively collect data for a very large number of travelers (Anda et al., 2017; Willumsen, 2021). Smartphones, in particular, can be used to retrieve full individual trajectories, either through apps using GPS sensors or by the Mobile Network Operator (MNO) tracing the connections to mobile antennas along travelers’ routes. Several studies have shown that these types of data can be used to infer travelers’ door-to-door trips and to estimate origin-destination (OD) matrices, which are a key input for transport modeling and planning (Tolouei et al., 2017; Wismans et al., 2018). Some MNOs already commercialize such services. However, the demand estimates offered are generally very aggregated: typically, they allow travel density analysis in space and time (e.g. ‘heat maps’), but they do not include trip-related variables such as mode of transport and travel purpose. In a few cases, additional information is obtained by means of heuristic considerations and predetermined rules, which might be affected by prior expectations and biases.

The study presented in this paper explores the potential of machine learning techniques for data-driven disaggregate demand estimation, particularly OD matrices by travel purpose, using different clustering techniques. The methodological approach consists of three stages: a) data provisioning, b) data processing and c) clustering tests. Since no MNO or app dataset is publicly available, the data provisioning stage was emulated with a group of volunteers, who were provided with a dedicated app to trace their location and record the trajectories of their travels over a two-week period. The data processing stage applied well-established techniques to convert trajectories into trips and trip chains, followed by a feature engineering process that decouples trips from the geographical location of their origins and destinations while keeping them linked to the corresponding travelers’ habits (e.g. the frequency of trips to a given destination made by the specific user, the time span between two subsequent trips, etc.). These ‘headless’ trips were the input data points for the clustering tests, particularly hierarchical clustering (distance-based) and DBSCAN (density-based), carried out with different selections of attributes, normalization strategies and learning algorithms.

Despite the limitations due to the small sample of users and the reduced or altered mobility habits caused by the COVID-19 pandemic, the clustering algorithms (both distance- and density-based) produced meaningful results, identifying travel purposes such as trips to home with or without an overnight stay, to occasional destinations, to work, and to holiday stays. These preliminary but promising results suggest that machine learning has great potential in mobility analysis, and that it could likely be employed to estimate OD matrices with a level of disaggregation and quality suitable for transport modeling and analyses.

References:
Anda C., Erath A., and Fourie P. J., “Transport modelling in the age of big data”, Int. J. Urban Sci., vol. 21, no. sup1, pp. 19–42, Aug. 2017, doi: 10.1080/12265934.2017.1281150.
Tolouei R., Psarras S., and Prince R., “Origin-Destination Trip Matrix Development: Conventional Methods versus Mobile Phone Data”, Transp. Res. Procedia, vol. 26, pp. 39–52, 2017, doi: 10.1016/j.trpro.2017.07.007.
Willumsen L., Use of Big Data in Transport Modelling, No. 2021/05. Paris: OECD Publishing, 2021.
Wismans L. J. J., Friso K., Rijsdijk J., de Graaf S. W., and Keij J., “Improving A Priori Demand Estimates Transport Models using Mobile Phone Data: A Rotterdam-Region Case”, J. Urban Technol., vol. 25, no. 2, pp. 63–83, Apr. 2018, doi: 10.1080/10630732.2018.1442075.
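To make the data processing and clustering stages described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes two location-free attributes per trip (the share of a user's trips going to the same destination, and the length of stay before the next departure, which is one possible reading of "time span between two subsequent trips"), min-max normalization, and a small pure-Python DBSCAN. The trip records, attribute choices, function names and the `eps` and `min_pts` values are all hypothetical, and hierarchical clustering is not shown.

```python
from collections import Counter
from math import dist

# Invented trip records: (user, destination_id, departure_hour, arrival_hour),
# with times in hours since the start of the survey. Not data from the study.
TRIPS = [
    ("u1", "work", 8.0, 8.5), ("u1", "home", 17.0, 17.5),
    ("u1", "work", 32.0, 32.5), ("u1", "home", 41.0, 41.5),
    ("u1", "work", 56.0, 56.5), ("u1", "shop", 65.0, 65.3),
    ("u1", "home", 66.0, 66.5), ("u1", "work", 80.0, 80.5),
]


def habit_features(trips):
    """Describe each trip by user habits instead of geographic location."""
    visits = Counter((user, dest) for user, dest, _, _ in trips)
    totals = Counter(user for user, _, _, _ in trips)
    ordered = sorted(trips, key=lambda t: (t[0], t[2]))
    rows = []
    for current, following in zip(ordered, ordered[1:]):
        user, dest, _, arrive = current
        if following[0] != user:
            continue  # last recorded trip of this user: stay length unknown
        share = visits[(user, dest)] / totals[user]  # assumed attribute 1
        stay = following[2] - arrive                 # assumed attribute 2
        rows.append((share, stay))
    return rows  # the final trip of each user is dropped


def min_max(rows):
    """Rescale every attribute to the [0, 1] range."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans))
            for row in rows]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: labels 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Includes point i itself, so min_pts counts the point as well.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


features = habit_features(TRIPS)
labels = dbscan(min_max(features), eps=0.2, min_pts=2)
print(labels)
```

On this toy input the sketch prints `[0, 1, 0, 1, 0, -1, 1]`: the frequent trips with a working-day stay form one cluster, the frequent trips with an overnight stay form a second one, and the single short visit is left as noise. The destination identifiers are used only to count visits and never enter the clustering, which is the sense in which the trips are decoupled from geography.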
Keywords: travel demand, data-driven estimation, unsupervised learning
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1227456