Geolife Dataset
This GPS trajectory dataset was collected in the Microsoft Research Asia Geolife project by 182 users over a period of more than five years (from April 2007 to August 2012). Each GPS trajectory is represented by a sequence of time-stamped points, each of which contains latitude, longitude, and altitude. The original dataset contains 17,784 trajectories and about 25 million points, with a total distance of 1,292,951 kilometers and a total duration of 50,176 hours. The trajectories were recorded by different GPS loggers and GPS phones and have a variety of sampling rates. 91.5 percent of the trajectories are logged in a dense representation, e.g. every 1 to 5 seconds or every 5 to 10 meters per point. The original dataset was filtered and preprocessed; further details are available in the benchmark publication.
Attributes:
- latitude: Latitude in decimal degrees.
- longitude: Longitude in decimal degrees.
- altitude: Altitude in feet (-777 if not valid).
- time: Date and time as a string.
- trajectory_id: ID of the trajectory that the point belongs to.
- user_id: ID of the user who reported the point.
- crs: WGS 84
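The altitude attribute uses feet and a sentinel value for invalid readings, so it usually needs cleaning before analysis. The following is a minimal sketch of that step, not part of the srai API: the helper name `altitude_to_meters` and the sample values are hypothetical, and only the -777 sentinel and the unit come from the attribute list above.

```python
from typing import Optional

INVALID_ALTITUDE_FT = -777  # sentinel for "not valid", as documented above
FEET_TO_METERS = 0.3048     # exact, by definition of the international foot


def altitude_to_meters(altitude_ft: float) -> Optional[float]:
    """Convert an altitude in feet to meters; return None for the invalid sentinel.

    Hypothetical helper for illustration only.
    """
    if altitude_ft == INVALID_ALTITUDE_FT:
        return None
    return altitude_ft * FEET_TO_METERS


# Hypothetical sample values: two valid readings and one invalid one.
raw_altitudes_ft = [492, -777, 100]
cleaned = [altitude_to_meters(a) for a in raw_altitudes_ft]
```

Returning `None` rather than 0 keeps invalid readings from being mistaken for sea level in later averages.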
import folium
import h3
from IPython.display import display
from srai.datasets import GeolifeDataset
geolife = GeolifeDataset()
type(geolife.train_gdf), type(geolife.test_gdf)
(NoneType, NoneType)
Get the data using the .load() method. The default config is Human Mobility Classification (HMC).
ds = geolife.load()
ds.keys()
dict_keys(['train', 'test'])
type(ds["train"]), type(ds["test"])
(geopandas.geodataframe.GeoDataFrame, geopandas.geodataframe.GeoDataFrame)
ds["train"].head()
| | user_id | trajectory_id | avg_altitude_per_hex | timestamp | geometry | h3_sequence_x | h3_sequence_y |
---|---|---|---|---|---|---|---|
0 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 00011 | [492, 492, 490, 490, 491, 491, 491, 195, 92, 6... | [2008-10-28T08:38:26.000000, 2008-10-28T08:38:... | LINESTRING (116.29707 40.01229, 116.29727 40.0... | [8931aa5051bffff, 8931aa505c7ffff, 8931aa5051b... | [8931aa501a7ffff, 8931aa50cd3ffff, 8931aa52a6b... |
1 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 000122 | [-628, -628, 127, 133, 147, 165, 168, 187, 204... | [2009-05-30T11:44:45.000000, 2009-05-30T11:44:... | LINESTRING (116.31498 40.0087, 116.31508 40.00... | [8931aa52a53ffff, 8931aa52acbffff, 8931aa52a53... | [8931aa50b4fffff, 8931aa50b43ffff, 8931aa50b53... |
2 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 000123 | [79, 79, 94, 102, 98, 166, 138, 163, 112, 119,... | [2009-05-27T16:01:05.000000, 2009-05-27T16:01:... | LINESTRING (116.31975 40.00765, 116.31974 40.0... | [8931aa52a47ffff, 8931aa52a57ffff, 8931aa52a47... | [8931aa4724fffff, 8931aa409b7ffff, 8931aa4724f... |
3 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 000124 | [166, 166, 174, 285, 304, 199, 183, 131, 176, ... | [2009-06-20T10:45:27.000000, 2009-06-20T10:45:... | LINESTRING (116.32737 39.99988, 116.32732 39.9... | [8931aa50cd7ffff, 8931aa50cd3ffff, 8931aa501a7... | [8931aa52a0bffff, 8931aa52a57ffff, 8931aa52a47... |
4 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 000126 | [118, 118, 141, 58, 32, 111, 87, 11] | [2009-04-23T04:36:07.000000, 2009-04-23T04:36:... | LINESTRING (116.3272 39.99999, 116.32737 39.99... | [8931aa50cd7ffff, 8931aa52a6fffff, 8931aa52a63... | [8931aa52a47ffff, 8931aa52a47ffff] |
You can also create your own train/test split, based on trajectory duration (TTE version) or on trajectory length (HMC version).
Downloading version "all" without passing resolution returns the trajectories as linestring geometries.
ds = geolife.load(version="all")
ds.keys()
dict_keys(['train'])
ds["train"].head()
| | trajectory_id | user_id | - | altitude | dayNo | datetime | timestamp | geometry |
---|---|---|---|---|---|---|---|---|
0 | 000103 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [71, 64, 119, 179, 184, 174, 190, 193, 195, 19... | [39996.1010416667, 39996.101099537, 39996.1011... | [2009-07-02T10:25:30.000000, 2009-07-02T10:25:... | [2009-07-02T10:25:30.000000, 2009-07-02T10:25:... | LINESTRING (116.30824 39.9966, 116.30829 39.99... |
1 | 00011 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [492, 490, 490, 490, 490, 490, 490, 491, 491, ... | [39749.0266898148, 39749.0267476852, 39749.026... | [2008-10-28T08:38:26.000000, 2008-10-28T08:38:... | [2008-10-28T08:38:26.000000, 2008-10-28T08:38:... | LINESTRING (116.29707 40.01229, 116.29727 40.0... |
2 | 000119 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [492, 489, 492, 492, 491, 490, 490, 488, 487, ... | [39978.2868402778, 39978.2869097222, 39978.286... | [2009-06-14T14:53:03.000000, 2009-06-14T14:53:... | [2009-06-14T14:53:03.000000, 2009-06-14T14:53:... | LINESTRING (116.18539 40.12348, 116.18543 40.1... |
3 | 000122 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [-628, -199, -41, 106, 97, 100, 105, 117, 118,... | [39963.1560763889, 39963.1561342593, 39963.156... | [2009-05-30T11:44:45.000000, 2009-05-30T11:44:... | [2009-05-30T11:44:45.000000, 2009-05-30T11:44:... | LINESTRING (116.31498 40.0087, 116.31508 40.00... |
4 | 000123 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [79, 94, 102, 104, 98, 102, 101, 106, 107, 107... | [39960.3340856481, 39960.3341435185, 39960.334... | [2009-05-27T16:01:05.000000, 2009-05-27T16:01:... | [2009-05-27T16:01:05.000000, 2009-05-27T16:01:... | LINESTRING (116.31975 40.00765, 116.31974 40.0... |
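Because a duration-based split depends on how long each trajectory lasts, it can help to see how a duration falls out of the timestamp column. The sketch below assumes the timestamps are ISO 8601 strings in chronological order, as they appear in the table above; the helper names are hypothetical, and only the first timestamp in the example is taken from the output (the other two are made up to show a 5-second sampling interval).

```python
from datetime import datetime


def trajectory_duration_seconds(timestamps):
    """Seconds elapsed between the first and last timestamp of one trajectory."""
    if len(timestamps) < 2:
        return 0.0
    start = datetime.fromisoformat(timestamps[0])
    end = datetime.fromisoformat(timestamps[-1])
    return (end - start).total_seconds()


def sampling_intervals_seconds(timestamps):
    """Seconds between each pair of consecutive timestamps."""
    parsed = [datetime.fromisoformat(t) for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


# First value copied from the table above; the rest are hypothetical.
example_timestamps = [
    "2008-10-28T08:38:26.000000",
    "2008-10-28T08:38:31.000000",
    "2008-10-28T08:38:36.000000",
]
duration = trajectory_duration_seconds(example_timestamps)
intervals = sampling_intervals_seconds(example_timestamps)
```

If the column holds datetime objects rather than strings, the parsing step can be dropped and the subtraction applied directly.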
Passing the resolution parameter is necessary to generate H3 sequences from the linestring geometry.
ds = geolife.load(version="all", resolution=10)
ds.keys()
dict_keys(['train'])
ds["train"].head()
| | user_id | trajectory_id | h3_sequence | avg_altitude_per_hex | timestamp | geometry |
---|---|---|---|---|---|---|
0 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 000103 | [8a31aa501107fff, 8a31aa50111ffff, 8a31aa50102... | [71, 71, 211, 201, 201, 187, 191, 172, 168, 15... | [2009-07-02T10:25:30.000000, 2009-07-02T10:25:... | LINESTRING (116.30824 39.9966, 116.30829 39.99... |
1 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 00011 | [8a31aa5051b7fff, 8a31aa505c5ffff, 8a31aa5051b... | [492, 492, 490, 195, 100, 97, 98, 107, 107, 10... | [2008-10-28T08:38:26.000000, 2008-10-28T08:38:... | LINESTRING (116.29707 40.01229, 116.29727 40.0... |
2 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 000119 | [8a31aacb386ffff, 8a31aacb1497fff, 8a31aacb14b... | [492, 492, 490, 453, 174, 141, 140, 147, 145, ... | [2009-06-14T14:53:03.000000, 2009-06-14T14:53:... | LINESTRING (116.18539 40.12348, 116.18543 40.1... |
3 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 000122 | [8a31aa52a517fff, 8a31aa52a537fff, 8a31aa52a52... | [-628, -628, -199, -41, 106, 136, 140, 146, 14... | [2009-05-30T11:44:45.000000, 2009-05-30T11:44:... | LINESTRING (116.31498 40.0087, 116.31508 40.00... |
4 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 000123 | [8a31aa52a467fff, 8a31aa52a46ffff, 8a31aa52a44... | [79, 79, 107, 97, 97, 114, 137, 169, 188, 176,... | [2009-05-27T16:01:05.000000, 2009-05-27T16:01:... | LINESTRING (116.31975 40.00765, 116.31974 40.0... |
train, test = geolife.train_test_split(
    target_column="trajectory_id", task="TTE", test_size=0.2, n_bins=3
)
Created new train_gdf and test_gdf. Train len: 10968, test len: 2743
len(train), len(test)
(10968, 2743)
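The n_bins and test_size arguments suggest a split that is stratified over binned values of the target quantity (duration for TTE). The following sketch illustrates that general idea with equal-frequency bins in plain Python. It is an assumption about the concept, not a description of how train_test_split is implemented in srai; the function name, the seed, and the sample durations are hypothetical.

```python
import math
import random


def stratified_split_by_duration(durations, test_size=0.2, n_bins=3, seed=0):
    """Return (train_idx, test_idx), sampling the test share from each duration bin."""
    if not durations:
        return [], []
    # Sort indices by duration, then cut them into n_bins equal-frequency bins.
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    bin_size = math.ceil(len(order) / n_bins)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        rng.shuffle(members)
        n_test = round(len(members) * test_size)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical durations in seconds: 30 trajectories from 1 to 30 minutes.
durations = [60.0 * (i + 1) for i in range(30)]
train_idx, test_idx = stratified_split_by_duration(durations)
```

Sampling within each bin keeps short, medium, and long trajectories represented in both subsets, which a purely random split does not guarantee on small or skewed data.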
geolife.resolution
10
geolife.test_gdf.head()
| | user_id | trajectory_id | h3_sequence | avg_altitude_per_hex | timestamp | geometry |
---|---|---|---|---|---|---|
0 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 000103 | [8a31aa501107fff, 8a31aa50111ffff, 8a31aa50102... | [71, 71, 211, 201, 201, 187, 191, 172, 168, 15... | [2009-07-02T10:25:30.000000, 2009-07-02T10:25:... | LINESTRING (116.30824 39.9966, 116.30829 39.99... |
2 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 000119 | [8a31aacb386ffff, 8a31aacb1497fff, 8a31aacb14b... | [492, 492, 490, 453, 174, 141, 140, 147, 145, ... | [2009-06-14T14:53:03.000000, 2009-06-14T14:53:... | LINESTRING (116.18539 40.12348, 116.18543 40.1... |
10 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 000138 | [8a31aa52a467fff, 8a31aa52a477fff, 8a31aa52a40... | [106, 106, 128, 182, 193, 207, 159, 172, 216, ... | [2009-06-23T12:36:35.000000, 2009-06-23T12:36:... | LINESTRING (116.3197 40.0077, 116.3197 40.0077... |
16 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 000169 | [8a31aa50cd67fff, 8a31aa50cd47fff, 8a31aa50cd6... | [153, 153, 172, 168, 167, 159, 131, 80, 164, 1... | [2009-05-13T03:55:58.000000, 2009-05-13T03:55:... | LINESTRING (116.32736 39.99976, 116.32737 39.9... |
17 | [000, 000, 000, 000, 000, 000, 000, 000, 000, ... | 0002 | [8a31aa52a18ffff, 8a31aa52a0a7fff, 8a31aa52a0a... | [492, 492, 495, 487, 454, 427, 401, 167, 162, ... | [2009-04-12T08:49:05.000000, 2009-04-12T08:49:... | LINESTRING (116.32177 40.01124, 116.32187 40.0... |
def visualize_h3_trajectories(
    h3_sequences, map_center=(39.98899, 116.32702), zoom_start=12
):
    """
    Visualize H3 sequences on a Folium map.
    Args:
        h3_sequences (List[List[str]]): A list of H3 sequences (trajectories).
        map_center (Tuple[float, float]): Center of the map (lat, lon).
        zoom_start (int): Initial zoom level.
    """
    m = folium.Map(location=map_center, zoom_start=zoom_start, tiles="cartodbpositron")
    colors = ["red", "blue", "green", "purple", "orange", "darkred", "lightblue"]
    for i, sequence in enumerate(h3_sequences):
        color = colors[i % len(colors)]
        for h3_id in sequence:
            boundary = h3.cell_to_boundary(
                h3_id,
            )
            folium.Polygon(
                locations=boundary, color=color, weight=2, fill=True, fill_opacity=0.3
            ).add_to(m)
    return m
h3_sequences = train["h3_sequence"].tolist()
map_ = visualize_h3_trajectories(h3_sequences[10:20])  # visualize only 10 trajectories (indices 10-19) for speed
display(map_)
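The map center above is hardcoded, and Folium expects it in (lat, lon) order, while the linestring geometries list coordinates as (lon, lat). A small helper can derive a center from the data and handle the axis swap. This is a minimal sketch: the function name is hypothetical, and the two sample points are the first coordinates of two trajectories shown in the tables above.

```python
def map_center_from_lonlat(coords):
    """Mean position of (lon, lat) pairs, returned as (lat, lon) for Folium."""
    if not coords:
        raise ValueError("coords must contain at least one point")
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    # A plain arithmetic mean is adequate at city scale; it is not a
    # geodesic centroid and would be a poor choice for global data.
    return (sum(lats) / len(lats), sum(lons) / len(lons))


# (lon, lat) pairs, as they appear in the LINESTRING output.
sample_coords = [(116.29707, 40.01229), (116.31498, 40.0087)]
center = map_center_from_lonlat(sample_coords)
```

The result can be passed as `map_center` to `visualize_h3_trajectories`. For a shapely linestring, `list(geometry.coords)` should give the coordinate pairs in the (lon, lat) order assumed here.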