Analyzing Employee Arrival Patterns and Delays Using Geospatial Data

Introduction

In this analysis, we explore employee work arrival patterns using geospatial data to understand delays and their relationship with distance from the workplace. The dataset includes employee IDs, arrival times, expected arrival times, and geographic locations.

Key Findings

1. Data Preparation and Merging

We started by merging two datasets:

  • Work arrival times (including employee ID, date, actual/expected arrival times)
  • Employee locations (employee ID and geographic coordinates)

import geopandas as gpd
import pandas as pd

# Load and merge datasets
work_arrival_times = pd.read_csv('work_arrival_times.csv')
locations = pd.read_csv('locations.csv')
employees = pd.merge(work_arrival_times, locations, on='employee_id')
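
Because the default inner merge silently drops arrival records whose employee_id has no matching location row, it can be worth checking the match before going further. The following is a minimal sketch on made-up frames (the dates, times, and coordinates are illustrative, not taken from the dataset), using the indicator and validate options of pd.merge:

```python
import pandas as pd

# Toy stand-ins for the two CSV files (values are illustrative only)
work_arrival_times = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "date": ["2024-01-08", "2024-01-08", "2024-01-08"],
    "work_arrival_time": ["09:10:00", "08:55:00", "09:40:00"],
    "expected_arrival": ["09:00:00", "09:00:00", "09:00:00"],
})
locations = pd.DataFrame({
    "employee_id": [1, 2],
    "location": ["POINT (51.40 35.70)", "POINT (51.45 35.72)"],
})

# how="left" keeps every arrival record, indicator=True adds a "_merge" column,
# and validate raises if an employee has more than one location row
checked = pd.merge(
    work_arrival_times, locations,
    on="employee_id", how="left", indicator=True, validate="many_to_one",
)

# Employees with arrival records but no known location
unmatched = checked.loc[checked["_merge"] == "left_only", "employee_id"].tolist()
print(unmatched)
```

Any IDs listed in unmatched would have been dropped by the inner merge, which would bias the averages if those employees differ systematically from the rest.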

2. Calculating Delays

We converted time columns to datetime format and calculated delay in minutes:

employees['work_arrival_datetime'] = pd.to_datetime(employees['date'] + ' ' + employees['work_arrival_time'].astype(str))
employees['expected_arrival_datetime'] = pd.to_datetime(employees['date'] + ' ' + employees['expected_arrival'].astype(str))
employees['delay_minutes'] = (employees['work_arrival_datetime'] - employees['expected_arrival_datetime']).dt.total_seconds() / 60
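
One consequence of this subtraction is that early arrivals produce negative delays, which pull the average down. The sketch below, on illustrative rows only, shows the sign convention and one possible way to count early arrivals as zero delay; the late_minutes column is a hypothetical addition, not part of the original analysis:

```python
import pandas as pd

# Illustrative rows only; column names follow the analysis above
sample = pd.DataFrame({
    "date": ["2024-01-08", "2024-01-08"],
    "work_arrival_time": ["09:45:00", "08:50:00"],
    "expected_arrival": ["09:00:00", "09:00:00"],
})

arrived = pd.to_datetime(sample["date"] + " " + sample["work_arrival_time"])
expected = pd.to_datetime(sample["date"] + " " + sample["expected_arrival"])

# Positive values are late arrivals, negative values are early arrivals
sample["delay_minutes"] = (arrived - expected).dt.total_seconds() / 60

# Optional: treat early arrivals as zero delay instead of negative delay
sample["late_minutes"] = sample["delay_minutes"].clip(lower=0)
print(sample[["delay_minutes", "late_minutes"]])
```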

3. Geospatial Analysis

We converted location data to geometric points and estimated each employee's straight-line distance from the workplace:

from shapely.wkt import loads
employees['geometry'] = employees['location'].apply(loads)
employees_gdf = gpd.GeoDataFrame(employees, geometry='geometry')

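# Approximate distance to the work location (WKT order is longitude, latitude)
work_location = "POINT (51.439152 35.715128)"
work_point = loads(work_location)
# distance() returns planar degrees; 111139 is a rough meters-per-degree factor
employees_gdf['distance_to_work_meters'] = employees_gdf['geometry'].apply(lambda geom: geom.distance(work_point) * 111139)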
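
Multiplying a planar distance in degrees by a single factor treats a degree of longitude as being as long as a degree of latitude, which overestimates east-west distances at this latitude. As a hedged alternative, the sketch below computes a great-circle (haversine) distance using only the standard library. The function name haversine_m and the mean Earth radius constant are assumptions of this sketch, not part of the original analysis:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an assumption of this sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Workplace coordinates from the analysis (x = longitude, y = latitude)
work_lon, work_lat = 51.439152, 35.715128

# Hypothetical employee 0.1 degrees of longitude east of the workplace
d = haversine_m(work_lon + 0.1, work_lat, work_lon, work_lat)
flat = 0.1 * 111139  # what the planar approximation would report
print(round(d), round(flat))
```

Applied to the GeoDataFrame, this could replace the lambda with one that passes geom.x and geom.y to haversine_m. Reprojecting to a metric coordinate reference system with GeoPandas is another common option.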

4. Key Statistics

  • Average delay: 51.96 minutes
  • Average distance: 15,695.38 meters
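
Summary figures like these can be produced in one step with a pandas aggregation. The sketch below runs on illustrative values only (not the figures reported above) and also reports the median and maximum, which are less affected by a few extreme delays than the mean alone:

```python
import pandas as pd

# Illustrative values only; column names follow the analysis above
stats_df = pd.DataFrame({
    "delay_minutes": [30.0, 60.0, 90.0],
    "distance_to_work_meters": [5000.0, 15000.0, 25000.0],
})

# One row per statistic, one column per measure
summary = stats_df[["delay_minutes", "distance_to_work_meters"]].agg(["mean", "median", "max"])
print(summary.round(2))
```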

5. Visualizations

Delay Distribution

We created a histogram showing the distribution of employee delays:

[Figure: Delay Distribution Histogram]

The histogram reveals most delays cluster around 50-80 minutes, with some extreme cases over 100 minutes.
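
The plotting code is not shown in this write-up, so the following is only a sketch of how such a histogram might be drawn with Matplotlib. The delay values, the 10-minute bin width, and the dashed mean line are assumptions made for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative delays in minutes; in the analysis this would be employees['delay_minutes']
delays = pd.Series([35, 52, 55, 61, 64, 70, 72, 78, 95, 110], name="delay_minutes")

fig, ax = plt.subplots(figsize=(8, 4))
# Fixed 10-minute bins from 30 to 120 minutes
counts, bin_edges, _ = ax.hist(delays, bins=list(range(30, 121, 10)), edgecolor="black")
ax.axvline(delays.mean(), color="red", linestyle="--", label="mean delay")
ax.set_xlabel("Delay (minutes)")
ax.set_ylabel("Number of arrivals")
ax.set_title("Delay Distribution")
ax.legend()
```

In a notebook the figure renders inline; in a script, fig.savefig or plt.show (with an interactive backend) would be needed to view it.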

Geospatial Mapping

We visualized employee locations relative to the workplace using Folium:

import folium

# Create map centered on work location
m = folium.Map(location=[35.715128, 51.439152], zoom_start=12)

# Add work location marker
folium.Marker(
    [35.715128, 51.439152],
    popup="Work Location",
    icon=folium.Icon(color="red", icon="info-sign")
).add_to(m)

# Add employee locations
for idx, row in employees_gdf.iterrows():
    folium.Marker(
        [row['geometry'].y, row['geometry'].x],
        popup=f"Employee ID: {row['employee_id']}",
        icon=folium.Icon(color="blue", icon="info-sign")
    ).add_to(m)

m

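The map shows where employees are located relative to the workplace. Because the markers carry only the employee ID and are not styled by delay, the map illustrates commute geography; assessing whether distance correlates with delay would require encoding delay in the markers or a separate plot.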

Insights and Recommendations

  1. Distance-Delay Relationship: Employees travel an average of 15.7 km to work. We did not calculate a correlation between distance and delay, so whether longer commutes lead to longer delays remains untested; plotting delay against distance would be a natural first check.
  2. Delay Patterns: The consistent delays (mostly 50-80 minutes) suggest systemic issues rather than random occurrences. Possible factors include:
     • Traffic patterns at arrival times
     • Public transportation schedules
     • Workplace parking availability
  3. Recommendations:
     • Implement flexible start times for employees with longer commutes
     • Provide transportation subsidies or shuttle services
     • Analyze traffic patterns to suggest optimal routes
     • Consider remote work options for roles that permit it

Technical Notes

The analysis used:

  • Pandas for data manipulation
  • Geopandas for geospatial operations
  • Shapely for geometric calculations
  • Matplotlib for visualizations
  • Folium for interactive mapping

This approach demonstrates how combining temporal and geospatial data can provide valuable insights into workforce patterns and potential operational improvements.

Next Steps

Future analysis could:

  1. Calculate correlation between distance and delay times
  2. Incorporate traffic data for more precise commute time estimates
  3. Analyze delays by day of week to identify patterns
  4. Survey employees about their commute experiences
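
The first and third steps above can be sketched with pandas alone. The records below are illustrative, not drawn from the dataset, and the rank-based correlation is computed as a Pearson correlation on ranks so that no extra dependency is needed:

```python
import pandas as pd

# Illustrative records only; column names follow the analysis above
df = pd.DataFrame({
    "date": ["2024-01-06", "2024-01-06", "2024-01-07", "2024-01-07", "2024-01-08", "2024-01-08"],
    "distance_to_work_meters": [4000, 18000, 4000, 18000, 4000, 18000],
    "delay_minutes": [20, 70, 30, 90, 25, 65],
})

# Pearson measures linear association between distance and delay
pearson = df["distance_to_work_meters"].corr(df["delay_minutes"])

# Spearman (Pearson on ranks) is less sensitive to a few extreme delays
spearman = df["distance_to_work_meters"].rank().corr(df["delay_minutes"].rank())

# Mean delay per weekday
df["weekday"] = pd.to_datetime(df["date"]).dt.day_name()
delay_by_weekday = df.groupby("weekday")["delay_minutes"].mean()

print(pearson, spearman)
print(delay_by_weekday)
```

A coefficient near zero on the real data would suggest that distance alone does not explain the delays, which would shift attention to the other factors listed under Insights and Recommendations.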

This type of analysis can help organizations make data-driven decisions about workplace policies and employee support systems.