This package helps distribute demand data from the São Paulo metro system inside R. The data comes from multiple operators and spans 2012-2025. Most of it is already public, but it is scattered across multiple poorly structured CSV and PDF files.

Information on lines 1, 2, 3, 5, and 15 is sourced from METRÔ's open data portal, while information on lines 4 and 5 (post-2018) is sourced from Dataverse (Insper).

All datasets are returned as tibble objects and are “lazy” datasets, meaning they are bundled with the package and don’t need to be downloaded. The data is also cleaned and standardized to make it easier to work with.

Line Coverage

The package currently covers all metro lines in São Paulo. In the future it may be expanded to include trains as well.

Line  Name              Operator       Period     Status
1     Azul (Blue)       METRÔ          2017–2026  Available
2     Verde (Green)     METRÔ          2017–2026  Available
3     Vermelha (Red)    METRÔ          2017–2026  Available
4     Amarela (Yellow)  ViaQuatro      2012–2025  Available
5     Lilás (Lilac)     ViaMobilidade  2017–2025  Available
15    Prata (Silver)    METRÔ          2017–2026  Available

Installation

The package will be available on CRAN. Once released, install with:

install.packages("metrosp")

To install the development version from GitHub, use:

# install.packages("remotes")
remotes::install_github("viniciusoike/metrosp")

Datasets

The table below describes all datasets that are shipped with the package. The main datasets are: passengers_entrance, passengers_transported, station_averages, and station_daily. Other datasets are auxiliary tables aimed at facilitating analysis and visualization.

Dataset                 Description                                            Frequency  Spatial
passengers_entrance     Average passenger entries by line                      Monthly    No
passengers_transported  Average passengers transported by line                 Monthly    No
station_averages        Average weekday passenger entries by station           Monthly    No
station_daily           Daily passenger entries by station                     Daily      No
metro_lines             Metro line reference table (names, colors, operators)             No
metro_colors            Named vector of official metro line colors                        No
lines                   Metro and train line routes (current + planned)                   Yes
stations                Metro and train station locations (current + planned)             Yes
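
The datasets in the table above can also be listed from within R. The sketch below uses base R's utils::data(), which reports the datasets an installed package ships. The helper name list_bundled_datasets is hypothetical and not part of metrosp. The metrosp call only runs if the package is installed.

```r
# Hypothetical helper (not part of metrosp): list the datasets a package ships.
list_bundled_datasets <- function(pkg) {
  # utils::data() returns an object whose $results matrix has the columns
  # "Package", "LibPath", "Item" and "Title".
  res <- utils::data(package = pkg)$results
  as.data.frame(res[, c("Item", "Title"), drop = FALSE])
}

# Works for any installed package; "datasets" ships with base R.
print(head(list_bundled_datasets("datasets")))

# With metrosp installed, the same call should list the tables described above.
if (requireNamespace("metrosp", quietly = TRUE)) {
  print(list_bundled_datasets("metrosp"))
}
```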

Usage

library(metrosp)
# To work with spatial datasets (lines, stations)
library(sf)
# For nicer table printing, load dplyr or tibble
library(dplyr)

# Passenger entries by line
passengers_entrance

# Station-level weekday averages
station_averages

# Spatial line routes
lines
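
This page does not list the column names of each dataset, so it may help to inspect a table's shape before filtering or plotting. The sketch below assumes nothing about the columns. The helper describe_dataset is hypothetical and not part of metrosp, and the metrosp calls only run if the package is installed.

```r
# Hypothetical helper (not part of metrosp): summarise a dataset's shape
# without assuming any column names.
describe_dataset <- function(x) {
  list(
    class   = class(x),
    rows    = nrow(x),
    columns = names(x)
  )
}

# Runs anywhere, using a dataset that ships with base R.
str(describe_dataset(mtcars))

# With metrosp installed, apply it to the bundled tables. `metrosp::lines` is
# written with the namespace prefix because base R also has a function
# called lines().
if (requireNamespace("metrosp", quietly = TRUE)) {
  str(describe_dataset(metrosp::passengers_entrance))
  str(describe_dataset(metrosp::station_averages))
  str(describe_dataset(metrosp::lines))
}
```

Writing metrosp::lines with the namespace prefix keeps it clear that the dataset is meant, not the base graphics function of the same name.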

Coming soon

  • Better data documentation.
  • Updated information for 2026.
  • Improved data quality checks.

Data sources