-
Metropolitan Scale and Longitudinal Dataset of Anonymized Human Mobility Trajectories
Authors:
Takahiro Yabe,
Kota Tsubouchi,
Toru Shimizu,
Yoshihide Sekimoto,
Kaoru Sezaki,
Esteban Moro,
Alex Pentland
Abstract:
Modeling and predicting human mobility trajectories in urban areas is an essential task for various applications. The recent availability of large-scale human movement data collected from mobile devices has enabled the development of complex human mobility prediction models. However, human mobility prediction methods are often trained and tested on different datasets, due to the lack of open-source large-scale human mobility datasets amid privacy concerns, posing a challenge towards conducting fair performance comparisons between methods. To this end, we created an open-source, anonymized, metropolitan scale, and longitudinal (90 days) dataset of 100,000 individuals' human mobility trajectories, using mobile phone location data. The location pings are spatially and temporally discretized, and the metropolitan area is undisclosed to protect users' privacy. The 90-day period is composed of 75 days of business-as-usual and 15 days during an emergency. To promote the use of the dataset, we will host a human mobility prediction data challenge ('HuMob Challenge 2023') using the human mobility dataset, which will be held in conjunction with ACM SIGSPATIAL 2023.
Submitted 7 July, 2023;
originally announced July 2023.
-
GEO-BLEU: Similarity Measure for Geospatial Sequences
Authors:
Toru Shimizu,
Kota Tsubouchi,
Takahiro Yabe
Abstract:
In recent geospatial research, the importance of modeling large-scale human mobility data and predicting trajectories is rising, in parallel with progress in text generation using large-scale corpora in natural language processing. Whereas there are already plenty of feasible approaches applicable to geospatial sequence modeling itself, there seems to be room to improve with regard to evaluation, specifically about measuring the similarity between generated and reference trajectories. In this work, we propose a novel similarity measure, GEO-BLEU, which can be especially useful in the context of geospatial sequence modeling and generation. As the name suggests, this work is based on BLEU, one of the most popular measures used in machine translation research, while introducing spatial proximity to the idea of n-gram. We compare this measure with an established baseline, dynamic time warping, applying it to actual generated geospatial sequences. Using crowdsourced annotated data on the similarity between geospatial sequences collected from over 12,000 cases, we quantitatively and qualitatively show the proposed method's superiority.
Submitted 31 October, 2022; v1 submitted 13 December, 2021;
originally announced December 2021.
-
Nation-wide Mood: Large-scale Estimation of People's Mood from Web Search Query and Mobile Sensor Data
Authors:
Wataru Sasaki,
Hiroshi Kawane,
Satoko Miyahara,
Kota Tsubouchi,
Tadashi Okoshi
Abstract:
The ability to estimate the current affective statuses of web users has considerable potential for the realization of user-centric services in society. However, in real-world web services, it is difficult to determine the type of data to be used for such estimation, as well as to collect the ground truths of such affective statuses. We propose a novel method of such estimation based on the combined use of user web search queries and mobile sensor data. The system was deployed in our product server stack, and a large-scale data analysis with more than 11,000,000 users was conducted. Interestingly, our proposed "Nation-wide Mood Score," which bundles the mood values of users across the country, (1) shows the daily and weekly rhythm of people's moods, (2) explains the ups and downs of people's moods in the COVID-19 pandemic, which is inversely synchronized to the number of new COVID-19 cases, and (3) detects the linkage with big news, which may affect many users' mood states simultaneously, even in a fine-grained time resolution, such as the order of hours.
Submitted 10 November, 2021; v1 submitted 10 November, 2021;
originally announced November 2021.
-
Multiwave COVID-19 Prediction from Social Awareness using Web Search and Mobility Data
Authors:
J. Xue,
T. Yabe,
K. Tsubouchi,
J. Ma,
S. V. Ukkusuri
Abstract:
Recurring outbreaks of COVID-19 have posed enduring effects on global society, which calls for a predictor of pandemic waves using various data with early availability. Existing prediction models that forecast the first outbreak wave using mobility data may not be applicable to the multiwave prediction, because the evidence in the USA and Japan has shown that mobility patterns across different waves exhibit varying relationships with fluctuations in infection cases. Therefore, to predict the multiwave pandemic, we propose a Social Awareness-Based Graph Neural Network (SAB-GNN) that considers the decay of symptom-related web search frequency to capture the changes in public awareness across multiple waves. Our model combines GNN and LSTM to model the complex relationships among urban districts, inter-district mobility patterns, web search history, and future COVID-19 infections. We train our model to predict future pandemic outbreaks in the Tokyo area using its mobility and web search data from April 2020 to May 2021 across four pandemic waves collected by Yahoo Japan Corporation under strict privacy protection rules. Results demonstrate our model outperforms state-of-the-art baselines such as ST-GNN, MPNN, and GraphLSTM. Though our model is not computationally expensive (only 3 layers and 10 hidden neurons), the proposed model enables public agencies to anticipate and prepare for future pandemic outbreaks.
Submitted 9 June, 2022; v1 submitted 22 October, 2021;
originally announced October 2021.
-
NationalMood: Large-scale Estimation of People's Mood from Web Search Query and Mobile Sensor Data
Authors:
Tadashi Okoshi,
Wataru Sasaki,
Hiroshi Kawane,
Kota Tsubouchi
Abstract:
The ability to estimate current affective statuses of web users has considerable potential towards the realization of user-centric opportune services. However, determining the type of data to be used for such estimation as well as collecting the ground truth of such affective statuses are difficult in real-world situations. We propose a novel way of such estimation based on a combinational use of users' web search queries and mobile sensor data. Our large-scale data analysis with about 11,000,000 users and 100 recent advertisement logs revealed (1) the existence of a certain class of advertisement to which mood-status-based delivery would be significantly effective, (2) that our "National Mood Score" shows the ups and downs of people's moods in the COVID-19 pandemic that are inversely correlated to the number of patients, as well as the weekly mood rhythm of people.
Submitted 2 November, 2020; v1 submitted 1 November, 2020;
originally announced November 2020.
-
Early Warning of COVID-19 Hotspots using Mobility of High Risk Users from Web Search Queries
Authors:
Takahiro Yabe,
Kota Tsubouchi,
Satish V Ukkusuri
Abstract:
COVID-19 has disrupted the global economy and well-being of people at an unprecedented scale and magnitude. To contain the disease, an effective early warning system that predicts the locations of outbreaks is of crucial importance. Studies have shown the effectiveness of using large-scale mobility data to monitor the impacts of non-pharmaceutical interventions (e.g., lockdowns) through population density analysis. However, predicting the locations of potential outbreak occurrence is difficult using mobility data alone. Meanwhile, web search queries have been shown to be good predictors of the disease spread. In this study, we utilize a unique dataset of human mobility trajectories (GPS traces) and web search queries with common user identifiers (> 450K users), to predict COVID-19 hotspot locations beforehand. More specifically, web search query analysis is conducted to identify users with high risk of COVID-19 contraction, and social contact analysis is further performed on the mobility patterns of these users to quantify the risk of an outbreak. Our approach is empirically tested using data collected from users in Tokyo, Japan. We show that by integrating COVID-19 related web search query analytics with social contact networks, we are able to predict COVID-19 hotspot locations 1-2 weeks beforehand, compared to just using social contact indexes or web search data analysis. This study proposes a novel method that can be used in early warning systems for disease outbreak hotspots, which can assist government agencies to prepare effective strategies to prevent further disease spread.
Submitted 25 October, 2020;
originally announced October 2020.
-
Syndromic surveillance using search query logs and user location information from smartphones against COVID-19 clusters in Japan
Authors:
Shohei Hisada,
Taichi Murayama,
Kota Tsubouchi,
Sumio Fujita,
Shuntaro Yada,
Shoko Wakamiya,
Eiji Aramaki
Abstract:
[Background] Two clusters of coronavirus disease 2019 (COVID-19) were confirmed in Hokkaido, Japan in February 2020. To capture the clusters, this study employs Web search query logs and user location information from smartphones. [Material and Methods] First, we anonymously identified smartphone users who used a Web search engine (Yahoo! JAPAN Search) for COVID-19 or its symptoms via its companion application for smartphones (Yahoo Japan App). We regard these searchers as Web searchers who are suspicious of their own COVID-19 infection (WSSCI). Second, we extracted the location of the WSSCI via the smartphone application. The spatio-temporal distribution of the number of WSSCI is compared with the actual location of the known two clusters. [Result and Discussion] Before the early stage of the cluster development, we could confirm several WSSCI, which demonstrated the basic feasibility of our WSSCI-based approach. However, it is accurate only in the early stage, and it was biased after the public announcement of the cluster development. For the case where the other cluster-related resources, such as fine-grained population statistics, are not available, the proposed metric would be helpful to catch the hint of emerging clusters.
Submitted 21 April, 2020;
originally announced April 2020.
-
Learning Fine Grained Place Embeddings with Spatial Hierarchy from Human Mobility Trajectories
Authors:
Toru Shimizu,
Takahiro Yabe,
Kota Tsubouchi
Abstract:
Place embeddings generated from human mobility trajectories have become a popular method to understand the functionality of places. Place embeddings with high spatial resolution are desirable for many applications; however, downscaling the spatial resolution deteriorates the quality of embeddings due to data sparsity, especially in less populated areas. We address this issue by proposing a method that generates fine grained place embeddings, which leverages spatial hierarchical information according to the local density of observed data points. The effectiveness of our fine grained place embeddings is compared to baseline methods via next place prediction tasks using real world trajectory data from 3 cities in Japan. In addition, we demonstrate the value of our fine grained place embeddings for land use classification applications. We believe that our technique of incorporating spatial hierarchical information can complement and reinforce various place embedding generating methods.
Submitted 5 February, 2020;
originally announced February 2020.
-
City2City: Translating Place Representations across Cities
Authors:
Takahiro Yabe,
Kota Tsubouchi,
Toru Shimizu,
Yoshihide Sekimoto,
Satish V. Ukkusuri
Abstract:
Large mobility datasets collected from various sources have allowed us to observe, analyze, predict and solve a wide range of important urban challenges. In particular, studies have generated place representations (or embeddings) from mobility patterns in a similar manner to word embeddings to better understand the functionality of different places within a city. However, studies have been limited to generating such representations of cities in an individual manner and have lacked an inter-city perspective, which has made it difficult to transfer the insights gained from the place representations across different cities. In this study, we attempt to bridge this research gap by treating cities and languages analogously. We apply methods developed for unsupervised machine language translation tasks to translate place representations across different cities. Real world mobility data collected from mobile phone users in 2 cities in Japan are used to test our place representation translation methods. Translated place representations are validated using landuse data, and results show that our methods were able to accurately translate place representations from one city to another.
Submitted 26 November, 2019;
originally announced November 2019.
-
VLUC: An Empirical Benchmark for Video-Like Urban Computing on Citywide Crowd and Traffic Prediction
Authors:
Renhe Jiang,
Zekun Cai,
Zhaonan Wang,
Chuang Yang,
Zipei Fan,
Xuan Song,
Kota Tsubouchi,
Ryosuke Shibasaki
Abstract:
Nowadays, massive urban human mobility data are being generated from mobile phones, car navigation systems, and traffic sensors. Predicting the density and flow of the crowd or traffic at a citywide level becomes possible by using the big data and cutting-edge AI technologies. It has been a very significant research topic with high social impact, which can be widely applied to emergency management, traffic regulation, and urban planning. In particular, by meshing a large urban area to a number of fine-grained mesh-grids, citywide crowd and traffic information in a continuous time period can be represented like a video, where each timestamp can be seen as one video frame. Based on this idea, a series of methods have been proposed to address video-like prediction for citywide crowd and traffic. In this study, we publish a new aggregated human mobility dataset generated from a real-world smartphone application and build a standard benchmark for such kind of video-like urban computing with this new dataset and the existing open datasets. We first comprehensively review the state-of-the-art works of literature and formulate the density and in-out flow prediction problem, then conduct a thorough performance assessment for those methods. With this benchmark, we hope researchers can easily follow up and quickly launch a new solution on this topic.
Submitted 16 November, 2019;
originally announced November 2019.
-
Predicting Evacuation Decisions using Representations of Individuals' Pre-Disaster Web Search Behavior
Authors:
Takahiro Yabe,
Kota Tsubouchi,
Toru Shimizu,
Yoshihide Sekimoto,
Satish V. Ukkusuri
Abstract:
Predicting the evacuation decisions of individuals before the disaster strikes is crucial for planning first response strategies. In addition to the studies on post-disaster analysis of evacuation behavior, there are various works that attempt to predict the evacuation decisions beforehand. Most of these predictive methods, however, require real time location data for calibration, which are becoming much harder to obtain due to the rising privacy concerns. Meanwhile, web search queries of anonymous users have been collected by web companies. Although such data raise fewer privacy concerns, they have been under-utilized for various applications. In this study, we investigate whether web search data observed prior to the disaster can be used to predict the evacuation decisions. More specifically, we utilize a "session-based query encoder" that learns the representations of each user's web search behavior prior to evacuation. Our proposed approach is empirically tested using web search data collected from users affected by a major flood in Japan. Results are validated using location data collected from mobile phones of the same set of users as ground truth. We show that evacuation decisions can be accurately predicted (84%) using only the users' pre-disaster web search data as input. This study proposes an alternative method for evacuation prediction that does not require highly sensitive location data, which can assist local governments to prepare effective first response strategies.
Submitted 18 June, 2019;
originally announced June 2019.
-
Universality of population recovery patterns after disasters
Authors:
Takahiro Yabe,
Kota Tsubouchi,
Naoya Fujiwara,
Yoshihide Sekimoto,
Satish V. Ukkusuri
Abstract:
Despite the rising importance of enhancing community resilience to disasters, our understanding of how communities recover from catastrophic events is limited. Here we study the population recovery dynamics of disaster affected regions by observing the movements of over 2.5 million mobile phone users across three countries before, during and after five major disasters. We find that, although the regions affected by the five disasters have significant differences in socio-economic characteristics, we observe a universal recovery pattern where displaced populations return in an exponential manner after all disasters. Moreover, the heterogeneity in initial and long-term displacement rates across communities across the three countries was explained by a set of key universal factors including the community's median income level, population size, housing damage rate, and the connectedness to other cities. These universal properties of recovery dynamics extracted from large scale evidence could impact efforts on urban resilience and sustainability across various disciplines.
Submitted 5 May, 2019;
originally announced May 2019.
-
Cross-comparative analysis of evacuation behavior after earthquakes using mobile phone data
Authors:
Takahiro Yabe,
Yoshihide Sekimoto,
Kota Tsubouchi,
Satoshi Ikemoto
Abstract:
Despite the importance of predicting evacuation mobility dynamics after large scale disasters for effective first response and disaster relief, our general understanding of evacuation behavior remains limited because of the lack of empirical evidence on the evacuation movement of individuals across multiple disaster instances. Here we investigate the GPS trajectories of a total of more than 1 million anonymized mobile phone users whose positions are tracked for a period of 2 months before and after four of the major earthquakes that occurred in Japan. Through a cross comparative analysis between the four disaster instances, we find that in contrast with the assumed complexity of evacuation decision making mechanisms in crisis situations, the individuals' evacuation probability is strongly dependent on the seismic intensity that they experience. In fact, we show that the evacuation probabilities in all earthquakes collapse into a similar pattern, with a critical threshold at around seismic intensity 5.5. This indicates that despite the diversity in the earthquake profiles and urban characteristics, evacuation behavior is similarly dependent on seismic intensity. Moreover, we found that probability density functions of the distances that individuals evacuate are not dependent on seismic intensities that individuals experience. These insights from empirical analysis on evacuation from multiple earthquake instances using large scale mobility data contribute to a deeper understanding of how people react to earthquakes, and can potentially assist decision makers to simulate and predict the number of evacuees in urban areas with little computational time and cost, by using population density information and seismic intensity which can be observed instantaneously after the shock.
Submitted 8 November, 2018;
originally announced November 2018.