Palmto_gen package

Module contents

class Palmto_gen.ConvertToToken(df, area, cell_size)[source]

Bases: object

assign_ids(grid, n_rows)[source]

Assign each cell a unique ID.

Assigns each grid cell a unique identifier based on its position in the grid. IDs are tuples of (column_index, row_index) starting from 0. The assignment follows column-major order.

Parameters:

grid (gpd.GeoDataFrame) – area grid returned from create_grid()
n_rows – number of rows in the grid

Returns:

The input object with an additional “ID” column where each row contains a tuple of (col_index, row_index) for each cell.

Return type:

grid(gpd.GeoDataFrame)
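The column-major ordering described above can be illustrated without any geospatial dependencies. This is a minimal sketch, not the package's implementation: the helper name assign_ids_sketch is hypothetical, and it assumes the cells are listed one full column at a time.

```python
def assign_ids_sketch(n_cells, n_rows):
    """Return (col_index, row_index) IDs for cells listed in column-major order.

    Hypothetical helper for illustration; assumes cell i belongs to
    column i // n_rows and row i % n_rows.
    """
    # divmod yields (quotient, remainder) = (column, row), so the row index
    # advances fastest and the column index advances every n_rows cells.
    return [divmod(i, n_rows) for i in range(n_cells)]


ids = assign_ids_sketch(n_cells=6, n_rows=3)
print(ids)
```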

create_grid()[source]

Creates a grid of cell size ‘n’ over a given area.

Generates a regular grid of cells with the specified cell size (in meters) covering the entire bounding box of the study area. The grid cells are created as Shapely box geometries and stored in a GeoDataFrame.

This method converts the cell size from meters to degrees based on the geographic location, accounting for the Earth’s curvature.

Returns:

A tuple containing:

cell(gpd.GeoDataFrame): object with grid cells as box geometries in the ‘geometry’ column. CRS is EPSG:4326.
n_rows(int): number of rows in the grid.
cell.shape[0]: total number of cells created.

Return type:

tuple
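One common way to convert a cell size in meters to degrees is sketched below. The constant of roughly 111,320 meters per degree of latitude, the use of the bounding box's mid-latitude, and the function names are all assumptions made for illustration; the conversion actually used by create_grid() may differ. Plain tuples stand in for Shapely boxes so the example has no dependencies.

```python
import math

# Approximate length of one degree of latitude in meters (assumed constant).
METERS_PER_DEGREE_LAT = 111_320.0


def cell_size_in_degrees(cell_size_m, latitude_deg):
    """Convert a cell size in meters to (dlon, dlat) in degrees at a latitude."""
    dlat = cell_size_m / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks by cos(latitude) away from the equator.
    dlon = cell_size_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(latitude_deg)))
    return dlon, dlat


def grid_boxes(min_lon, min_lat, max_lon, max_lat, cell_size_m):
    """Cover a bounding box with (minx, miny, maxx, maxy) cells, column by column."""
    dlon, dlat = cell_size_in_degrees(cell_size_m, (min_lat + max_lat) / 2)
    n_cols = math.ceil((max_lon - min_lon) / dlon)
    n_rows = math.ceil((max_lat - min_lat) / dlat)
    boxes = [
        (min_lon + c * dlon, min_lat + r * dlat,
         min_lon + (c + 1) * dlon, min_lat + (r + 1) * dlat)
        for c in range(n_cols)
        for r in range(n_rows)
    ]
    return boxes, n_rows, len(boxes)


boxes, n_rows, n_cells = grid_boxes(0.0, 0.0, 0.01, 0.01, cell_size_m=500)
```

Rounding the cell counts up means the grid may extend slightly past the bounding box, which keeps every point inside the area covered by some cell.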

create_tokens()[source]

Convert raw coordinate pairs into tokens of (col_id, row_id).

Creates a grid over the area from which trajectories are sourced, assigns unique IDs to the cells in the grid, computes cell centers, and merges the original coordinates with the ID of the cell each one falls into.

Returns:

A tuple containing:

grid_center(gpd.GeoDataFrame): object containing a “geometry” and “ID” column, with the former representing a cell by its centroid.
grouped_df(pd.DataFrame): object containing three columns – “trip_id”, “geometry” and “ID”. “geometry” represents a trajectory with a sequence of Point objects.

Return type:

tuple

find_grid_center(grid)[source]

Finds the centroid of each cell in the grid.

Calculates the geometric center point of each grid cell. It first projects the grid to a flat plane using EPSG:3857 reference system for accurate geometric calculation, then converts the result back to EPSG:4326 system to maintain consistency with the original reference system.

Parameters:

grid (gpd.GeoDataFrame) – an object with cell geometry and ID columns.

Returns:

a new object with “geometry” and “ID” columns. The former now represents each cell by its centroid.

Return type:

grid_center(gpd.GeoDataFrame)

merge_with_polygon(grid)[source]

Performs spatial joins between trajectory points and grid cells.

Assigns each trajectory point to its corresponding grid cell using a spatial join operation. Points are matched to grid cells based on which cell polygon they fall within. Points that don’t fall within any grid cell are removed from the result.

Parameters:

grid (gpd.GeoDataFrame) – an object with cell geometry and ID columns.

Returns:

the trajectory points GeoDataFrame with an additional “ID” column containing the ID of the grid cell where each point is located.

Return type:

merged_df(gpd.GeoDataFrame)
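For a regular grid, the effect of this join can be reproduced with plain arithmetic. The sketch below is an illustrative stand-in for the spatial join, not the method the package uses: the function name and its parameters (grid origin, cell size in degrees, and grid dimensions) are hypothetical.

```python
import math


def cell_id_for_point(lon, lat, min_lon, min_lat, dlon, dlat, n_cols, n_rows):
    """Return the (col, row) ID of the cell containing a point, or None if outside."""
    col = math.floor((lon - min_lon) / dlon)
    row = math.floor((lat - min_lat) / dlat)
    if 0 <= col < n_cols and 0 <= row < n_rows:
        return (col, row)
    return None


# A 2 x 2 grid of 1-degree cells anchored at (0, 0); the last point lies outside it.
points = [(0.5, 0.5), (1.5, 0.2), (5.0, 5.0)]
ids = [cell_id_for_point(lon, lat, 0.0, 0.0, 1.0, 1.0, 2, 2) for lon, lat in points]

# Mirror the documented behavior: points outside every cell are dropped.
kept = [cell_id for cell_id in ids if cell_id is not None]
```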

class Palmto_gen.DisplayTrajs(original_trajs, generated_trajs)[source]

Bases: object

display_maps()[source]

Display original and generated trajectories side-by-side in interactive Folium maps.

Creates two interactive maps showing original trajectories (left) and generated trajectories (right) for visual comparison. Each trajectory is rendered as a blue polyline on its respective map. The maps are displayed in a responsive HTML layout within Jupyter notebooks or similar environments that support HTML rendering.

merge_grid_with_points(grid, df, num_cells)[source]

Merges trajectory points with grid cells to determine which region each point belongs to.

Performs a spatial join between trajectory points and grid cells, assigning each point to its corresponding grid region. The method explodes the trajectory DataFrame to individual points, converts them to a GeoDataFrame, and then performs a spatial join with the grid to identify which grid cell contains each point.

Parameters:

grid (gpd.GeoDataFrame) – A GeoDataFrame containing the grid cells with their geometries. Each cell represents a spatial region.
df (pd.DataFrame) – A DataFrame containing trajectory data with a ‘geometry’ column that contains lists of coordinate points for each trajectory.
num_cells (int) – The total number of cells in the grid. Used to assign sequential region IDs from 0 to num_cells-1.

Returns:

A merged GeoDataFrame where each row represents a single trajectory point with the following additional columns:

’Region’: The ID of the grid cell containing the point

’point_region’: The geometry (polygon) of the grid cell containing the point, or ‘nan’ if the point doesn’t fall within any grid cell.

Return type:

gpd.GeoDataFrame

plot_heat_map(df, area, ax, cell_size)[source]

Creates a heatmap visualization showing the density of trajectory points across grid cells.

Generates a grid over the specified area, counts the number of trajectory points falling within each grid cell, and visualizes this density as a heatmap using a color gradient. The heatmap helps identify areas of high and low trajectory activity.

Parameters:

df (pd.DataFrame) – A DataFrame containing trajectory data with a ‘geometry’ column that contains lists of coordinate points for each trajectory.
area (gpd.GeoDataFrame) – A GeoDataFrame defining the geographical area to be analyzed. Used to determine grid boundaries and overlay the area outline on the plot.
ax (matplotlib.axes.Axes) – The matplotlib axes object on which to draw the heatmap. Allows integration with existing figure layouts.
cell_size (int) – The side length of each grid cell in meters. Determines the spatial resolution of the heatmap - smaller values create finer grids with more detail.

plot_map(trajs)[source]

Creates an interactive Folium map displaying trajectory paths as polylines.

Generates an interactive web map centered on the first trajectory point and renders all trajectories as blue polylines. The map allows users to zoom, pan, and explore the trajectory patterns interactively.

Parameters:

trajs (list) – A list of trajectories where each trajectory is a list of shapely Point objects or similar geometry objects with x (longitude) and y (latitude) attributes.

Returns:

A Folium map object containing all trajectories visualized as blue polylines. The map can be displayed in Jupyter notebooks or saved as HTML.

Return type:

folium.Map

class Palmto_gen.NgramGenerator(sentence_gdf)[source]

Bases: object

create_ngrams()[source]

Extract bigrams and trigrams from the original and reversed trajectory sequences.

Sentences are taken as lists from the “ID” column of the input dataframe and reversed; bigrams and trigrams are then extracted from both the original and the reversed sentences. Each n-gram dictionary also keeps a count of how often each unique bigram or trigram occurs.

Returns:

ngrams(dict): a dictionary of four dictionaries. Each inner dictionary maps a tuple of cell IDs to its number of occurrences.
start_end_points(list): a list of lists, as returned by find_start_end_points().

Return type:

ngrams(dict)
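The counting step can be sketched with the standard library alone. The helper count_ngrams and the four key names in the outer dictionary are hypothetical, chosen only to show the shape of the result described above; the keys used by the package are not documented here.

```python
from collections import Counter


def count_ngrams(sentences, n):
    """Count every run of n consecutive tokens across all sentences."""
    counts = Counter()
    for sentence in sentences:
        for i in range(len(sentence) - n + 1):
            counts[tuple(sentence[i:i + n])] += 1
    return dict(counts)


sentences = [[(0, 0), (0, 1), (1, 1)], [(0, 0), (0, 1), (0, 2)]]
reversed_sentences = [sentence[::-1] for sentence in sentences]

# Key names below are assumptions made for this example.
ngrams = {
    "bigrams": count_ngrams(sentences, 2),
    "trigrams": count_ngrams(sentences, 3),
    "reversed_bigrams": count_ngrams(reversed_sentences, 2),
    "reversed_trigrams": count_ngrams(reversed_sentences, 3),
}
```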

find_start_end_points()[source]

Extract start and end bigrams from trajectory sequences.

Identifies the starting and ending positions of every trajectory by extracting the first and last two grid cells. Duplicate consecutive cells are first removed to ensure meaningful start/end points.

Returns:

a list of lists. Each inner list contains two tuples: the first represents the start bigram of a trip and the second the end bigram of a trip. Only trips with more than three unique consecutive cells are included in the result.

Return type:

start_end_points(list)
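A minimal sketch of this extraction follows, assuming that "unique consecutive cells" means the sequence left after collapsing runs of repeated cells. The function names are hypothetical.

```python
from itertools import groupby


def dedupe_consecutive(sentence):
    """Collapse runs of identical consecutive tokens into a single token."""
    return [token for token, _ in groupby(sentence)]


def start_end_bigrams(sentences):
    """Return [start_bigram, end_bigram] for each sufficiently long trip."""
    result = []
    for sentence in sentences:
        cells = dedupe_consecutive(sentence)
        # Keep only trips with more than three cells after de-duplication.
        if len(cells) > 3:
            result.append([tuple(cells[:2]), tuple(cells[-2:])])
    return result


trips = [
    [(0, 0), (0, 0), (0, 1), (1, 1), (1, 2)],  # 4 cells after de-duplication
    [(0, 0), (0, 1), (0, 1)],                  # 2 cells, so it is excluded
]
start_end_points = start_end_bigrams(trips)
```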

reverse_sentences(sentences)[source]

Reverse trajectory sequences.

Parameters:

sentences (list) – a list of lists, with the inner list consisting of a sequence of cell IDs.

Returns:

a list of lists, with each inner list now containing a reversed version of the original sequence.

Return type:

reversed_sentences(list)

class Palmto_gen.TrajGenerator(ngrams, start_end_points, n, grid)[source]

Bases: object

static calculate_distance(point1, point2)[source]

Calculate the Euclidean distance between two points in a 2D plane.

Parameters:

point1 (tuple) – first point as a tuple of (x, y) coordinates
point2 (tuple) – second point as a tuple of (x, y) coordinates

Returns:

The Euclidean distance between the two points. The value is always non-negative.

Return type:

float
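For reference, the distance is sqrt((x2 - x1)^2 + (y2 - y1)^2). A standalone equivalent, written here under the hypothetical name euclidean_distance, could look like this:

```python
import math


def euclidean_distance(point1, point2):
    """Euclidean distance between two (x, y) tuples in a 2D plane."""
    return math.hypot(point2[0] - point1[0], point2[1] - point1[1])


d = euclidean_distance((0, 0), (3, 4))
```

Note that applying this formula directly to (lon, lat) pairs measures distance in degrees, not meters.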

convert_sentence_to_traj(generated_sentences)[source]

Convert tokenized trajectory sentences into geographic coordinate sequences.

Transforms grid-based token representations (ID tuples) into actual geographic trajectories by mapping each token to its corresponding grid cell centroid. This creates smooth paths through the geographic space using the pre-computed cell center points stored in the grid_center GeoDataFrame.

Parameters:

generated_sentences (list) – A list of trajectory sentences, where each sentence is a list of tokens. Each token is a tuple (column, row) representing a grid cell ID, e.g., [[(0,1), (1,1), (2,1)], [(3,3), (3,4), (4,4)]].

Returns:

A list of trajectories, where each trajectory is a list of Shapely Point objects representing the geographic coordinates. Each Point corresponds to the centroid of the grid cell identified by the token. Invalid tokens are silently skipped.

Return type:

all_points(list)
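The token-to-centroid mapping amounts to a dictionary lookup. In this sketch a plain dict of (lon, lat) tuples stands in for the grid_center GeoDataFrame and for Shapely Point objects; the function name and the sample centroid values are made up for illustration.

```python
def sentences_to_coords(generated_sentences, centroids):
    """Map each token to its cell centroid, skipping tokens with no known cell."""
    return [
        [centroids[token] for token in sentence if token in centroids]
        for sentence in generated_sentences
    ]


# Hypothetical centroid lookup: cell ID -> (lon, lat).
centroids = {(0, 1): (10.0, 50.5), (1, 1): (10.5, 50.5), (2, 1): (11.0, 50.5)}
generated = [[(0, 1), (1, 1), (2, 1)], [(0, 1), (9, 9)]]
trajectories = sentences_to_coords(generated, centroids)
```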

find_next_tokens(left, right, path_sentence)[source]

Find the next token pairs to extend a trajectory by analysing trigram frequency and spatial distance.

This method identifies potential next tokens for both left and right sides of a growing trajectory. It uses trigram frequency data to find the next probable points, then selects token pairs based on their spatial proximity to maintain coherence. This method ensures no repeated tokens in the path.

Parameters:

left (list) – a list of two tokens representing the left edge of the current path
right (list) – a list of two tokens representing the right edge of the current path
path_sentence (list) – current path as a list of tokens. Used to prevent selecting tokens that would create loops in a trajectory

Returns:

a list of 3 token pairs, where each element is a tuple of (left_point, right_point) that represents an extension of the current path.

Return type:

points(list)

generate_sentences_using_origin(length, seed=None)[source]

Generate a trajectory of specified length starting from a random origin point using trigram language model.

Creates a trajectory by starting with an origin point pair and extending it token by token using weighted random selection based on trigram frequencies. This method follows a traditional n-gram language model approach where the next token is probabilistically chosen based on the frequency distribution of observed trigrams in the training data.

Parameters:

length (int) – Target length of the trajectory in number of tokens/points. The actual length may be shorter if no valid continuations exist.
seed (int, optional) – Random seed for reproducible trajectory generation. If provided, ensures deterministic origin selection from available start points. Defaults to None for random selection.

Returns:

A trajectory as a list of tokens (coordinate tuples), starting from the selected origin. Length will be min(length, available_path_length). May be shorter than requested if the trajectory reaches a dead end.

Return type:

text(list)
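The weighted selection step can be sketched as follows. This assumes a lookup shaped like the output of process_trigrams(), i.e. {(token_1, token_2): [(token_3, count), …]}; the function name is hypothetical, and here the seed drives the continuation choices, which may differ from how the package applies it.

```python
import random


def extend_from_origin(origin, trigram_lookup, length, seed=None):
    """Grow a path from an origin bigram by frequency-weighted sampling."""
    rng = random.Random(seed)
    path = list(origin)
    while len(path) < length:
        candidates = trigram_lookup.get((path[-2], path[-1]))
        if not candidates:
            break  # dead end: no trigram starts with the last two tokens
        tokens, counts = zip(*candidates)
        # Sample the next token with probability proportional to its count.
        path.append(rng.choices(tokens, weights=counts, k=1)[0])
    return path


lookup = {
    ((0, 0), (0, 1)): [((0, 2), 3)],
    ((0, 1), (0, 2)): [((1, 2), 1)],
}
path = extend_from_origin(((0, 0), (0, 1)), lookup, length=10)
```

In this toy lookup each bigram has a single continuation, so the path is deterministic and stops early once no further trigram matches.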

generate_sentences_using_origin_destination()[source]

Generate a complete trajectory by connecting origin and destination points through spatial proximity.

This method creates a trajectory by starting with randomly selected origin-destination pairs and iteratively filling in the path between them. It uses a bidirectional growth approach, extending from both ends simultaneously while maintaining spatial coherence through trigram frequencies and distance minimization. The process continues until the growing ends meet close enough that they can be connected by a single intermediate token.

Returns:

A complete trajectory as a list of tokens (coordinate tuples) representing a path from origin to destination. Returns empty list if unable to generate a valid path after 3 attempts.

Return type:

path_sentence(list)

generate_trajs_using_origin(sentence_length, seed=None)[source]

Generate synthetic trajectories of specified length from origin points and return in multiple formats.

Creates a specified number of trajectories by repeatedly generating paths from randomly selected origin points using the trigram language model approach. Each trajectory extends from its origin for approximately the target length. The method filters out trajectories that are significantly shorter than requested (more than 5 tokens short) to ensure quality. Results are returned in two formats for different use cases.

Parameters:

sentence_length (int) – Target length for each trajectory in number of tokens/points. Trajectories shorter than (sentence_length - 5) are rejected and regenerated.
seed (int, optional) – Random seed for reproducible batch generation. If provided, generates deterministic set of trajectories. Defaults to None for random generation.

Returns:

A pair of DataFrames containing the same trajectories in different formats:

df (DataFrame): Trajectories as coordinate lists with columns:
- ’trip_id’: Unique identifier (1 to n)
- ’geometry’: List of [x, y] coordinate pairs
gdf (DataFrame): Trajectories as Shapely geometries with columns:
- ’trip_id’: Unique identifier (1 to n)
- ’geometry’: List of Shapely Point objects

Return type:

tuple

generate_trajs_using_origin_destination()[source]

Generate synthetic trajectories using origin-destination pairs and return in multiple formats.

Creates a specified number of synthetic trajectories by repeatedly calling the origin-destination generation algorithm. Each trajectory connects randomly selected start and end points through spatially coherent paths. The method ensures all generated trajectories are valid (non-empty) and converts them from token sequences to geographic coordinates. Results are returned in two formats for different use cases.

Returns:

A pair of DataFrames containing the same trajectories in different formats:

df (DataFrame): Trajectories as coordinate lists with columns:
- ’trip_id’: Unique identifier (1 to n)
- ’geometry’: List of [x, y] coordinate pairs
gdf (GeoDataFrame): Trajectories as Shapely geometries with columns:
- ’trip_id’: Unique identifier (1 to n)
- ’geometry’: List of Shapely Point objects

Return type:

tuple

static start_path(start, end)[source]

Create an initial 4-point trajectory by inserting closest points in the middle.

This method finds the two points (one from start and one from end) that are closest to each other in Euclidean space, then arranges all four points to form a smooth initial segment.

Parameters:

start (tuple) – a tuple of two points representing the start of a trip:

First point: starting point as (x, y) coordinates
Second point: second point as (x, y) coordinates

end (tuple) – a tuple of two points representing the end of a trip:

First point: second-to-last point as (x, y) coordinates
Second point: last point as (x, y) coordinates

Returns:

an initial trip segment of (outer_point1, close_point1, close_point2, outer_point2).

Return type:

path_start(list)
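One way to realize this arrangement is to test all four start/end pairings and place the closest pair in the middle. The sketch below uses the hypothetical name start_path_sketch and is an interpretation of the description above, not the package's code.

```python
import math
from itertools import product


def start_path_sketch(start, end):
    """Arrange two start points and two end points with the closest pair inside."""
    # Pick indices (i, j) so that start[i] and end[j] are the closest pair.
    i, j = min(
        product(range(2), range(2)),
        key=lambda ij: math.dist(start[ij[0]], end[ij[1]]),
    )
    # The remaining point of each pair becomes the outer point on its side.
    return [start[1 - i], start[i], end[j], end[1 - j]]


path_start = start_path_sketch(((0, 0), (1, 0)), ((2, 0), (3, 0)))
```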

Palmto_gen.convert_to_points(coord_list)[source]

Convert coordinate pairs into Shapely Point objects.

Parameters:

coord_list (list) – coordinate pairs in (lon, lat) format.

Returns:

the coordinates converted to Shapely Point objects.

Return type:

list

Palmto_gen.process_data(df)[source]

Convert list-formatted trajectories to individual Shapely Point objects.

Parameters:

df (pd.DataFrame) – an object that contains at least a “geometry” column.

Returns:

an object compliant with the WGS84 reference system, i.e. (lon, lat) pairs.

Return type:

gpd.GeoDataFrame

Palmto_gen.process_trigrams(trigrams)[source]

Arrange trigram tuples and their occurrence counts in a different format.

Creates a dictionary that has the first two tokens of a trigram tuple as its key and a list of (last token, occurrence count) pairs as its value. This arrangement facilitates next-point prediction through a statistical approach.

Parameters:

trigrams (dict) – a dictionary of trigram tuples and their occurrence counts in the format of {(token_1, token_2, token_3): count}.

Returns:

a rearranged trigram dictionary, formatted as {(token_1, token_2): [(token_3, count), …]}

Return type:

trigrams_dict(dict)
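The regrouping from {(token_1, token_2, token_3): count} to {(token_1, token_2): [(token_3, count), …]} can be sketched as below. The function name is hypothetical and short strings stand in for cell-ID tuples.

```python
from collections import defaultdict


def process_trigrams_sketch(trigrams):
    """Group trigram counts by their leading bigram."""
    lookup = defaultdict(list)
    for (token_1, token_2, token_3), count in trigrams.items():
        lookup[(token_1, token_2)].append((token_3, count))
    return dict(lookup)


trigrams = {("a", "b", "c"): 2, ("a", "b", "d"): 1, ("b", "c", "e"): 4}
trigrams_dict = process_trigrams_sketch(trigrams)
```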

Palmto_gen.process_trigrams_2(trigrams)[source]

Reorganizes trigrams in an alternative format.

Transforms a trigram dictionary into a lookup structure where pairs of (first_token, third_token) are mapped to a list of second_tokens. This is useful for finding “bridge” points between two non-adjacent grid cells.

Parameters:

trigrams (dict) – a dictionary of trigram tuples and their occurrence counts in the format of {(token_1, token_2, token_3): count}.

Returns:

a dictionary mapping (first_token, third_token) tuples to a list of middle tokens.

Return type:

trigram_dict_2(dict)
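A minimal sketch of the “bridge” lookup, under the assumption that counts are discarded and only middle tokens are kept, as the description suggests. The function name is hypothetical and short strings stand in for cell-ID tuples.

```python
from collections import defaultdict


def process_trigrams_2_sketch(trigrams):
    """Map (first_token, third_token) to the middle tokens seen between them."""
    bridges = defaultdict(list)
    for first, middle, third in trigrams:
        bridges[(first, third)].append(middle)
    return dict(bridges)


trigrams = {("a", "b", "c"): 2, ("a", "x", "c"): 1, ("b", "c", "d"): 4}
trigram_dict_2 = process_trigrams_2_sketch(trigrams)

# Candidate cells that can connect "a" to "c" through a single step.
bridge_candidates = trigram_dict_2.get(("a", "c"), [])
```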