This repository contains an analysis of train delays and incidents data from Infrabel, with a focus on the impact of the European Train Control System (ETCS). The project provides both a comprehensive analysis of train delays and the impact of incidents, and it demonstrates the effects of ETCS on punctuality and recovery times.
The analysis includes:
- Exploratory Data Analysis (EDA) with visualizations.
- Predictive modeling to assess the influence of ETCS on delays.
- A Streamlit dashboard that provides an interactive report.
All CSV datasets used in the analysis are provided in the repository, and all visualizations can be reproduced using the code in main.py
. Additionally, pre-generated visualizations are included for easy setup of the Streamlit app.
Infrabel-Data-Analysis/
│
├── __pycache__/ # Python cache files
├── myvenv/ # Virtual environment
├── Data # All required datasets
├── analysis.py # Python file for analysis functions
├── preprocess.py # Python file for data preprocessing functions
├── visualization.py # Python file for visualizations
├── main.py # Main analysis script
├── streamlit.py # Streamlit app script
├── Infrabel_Data_Analysis.ipynb # Jupyter Notebook with analysis
├── requirements.txt # Python dependencies
├── Images/*.png # Pre-generated visualizations (for Streamlit app)
└── *.html # Pre-generated HTML maps for the Streamlit app
To get started with the project, you need to set up a Python virtual environment. Follow the steps below:
-
Clone the repository:
git clone https://github.com/shreyab375/Infrabel-Data-Analysis.git cd Infrabel-Data-Analysis
-
Create a virtual environment:
python3 -m venv myvenv
-
Activate the virtual environment:
- On Windows:
myvenv\Scripts\activate
- On MacOS/Linux:
source myvenv/bin/activate
- On Windows:
After activating the virtual environment, install the required dependencies from the requirements.txt
file:
pip install -r requirements.txt
You can execute the main analysis by running main.py
. This will preprocess the datasets, perform analysis, generate visualizations, and build a predictive model.
python main.py
The interactive Streamlit app provides a report summarizing the results of the analysis with visualizations. To launch the app, execute:
streamlit run streamlit.py
This will open a web browser with the dashboard that includes:
- Visualizations showing delays, incidents, and ETCS impact.
- A summary of the key findings.
- Interactive maps and charts.
For a step-by-step walkthrough of the analysis, you can also check the Jupyter notebook:
jupyter notebook Infrabel_Data_Analysis.ipynb
- Delays Analysis: Provides insights into the relationship between delays, stopping places, and train lines.
- ETCS Impact: Demonstrates the benefits of ETCS in improving recovery times after incidents.
- Incident Hotspots: Identifies areas prone to incidents and their impact on delays.
- Predictive Modeling: Assesses the influence of different factors on train delays, including ETCS.
Regular data.csv
: Contains regular train delay data.Incidents.csv
: Incident data affecting train operations.ETCS.csv
: Data regarding the deployment of ETCS across different lines.
The repository includes several pre-generated visualizations (*.png
and *.html
files) to streamline the deployment of the Streamlit app. These visualizations include maps, charts, and figures illustrating the key insights from the analysis.