This repository contains code that processes MNO Data to generate population and mobility insights indicators using the Spark framework.
- Description
- Repository Structure
- Documentation
- Licensing
- Pipeline
- Mandatory Requirements
- Synthetic data
- Quickstart
This repository contains a Python application that uses the PySpark library to process Big Data pipelines of MNO Data and generate multiple statistical products related to mobility and sociodemographic analysis.
The code stored in this repository is intended to be executed in a PySpark-compatible cluster and to be deployed in cloud environments such as AWS, GCP, or Azure. Nevertheless, the code can be launched in local environments using a single-node Spark configuration once all the required libraries have been correctly set up.
For easy deployment in local environments, this repository provides configuration for creating a Docker container with all the setup done.
The repository contains the following directories:
| Directory | Description |
|---|---|
| `.devcontainer` | Config files for setting up a dev environment using Dev Containers. |
| `.vscode` | Config files for developers using VS Code. |
| `docs` | Documentation source files used for the documentation site. Mainly markdown files. |
| `license` | Software Bill of Materials (SBOM) and associated licensing documentation for software dependencies. |
| `multimno` | Main directory of the repository. It contains the Python source code of the application. |
| `pipe_configs` | Examples of configuration files for the execution of the pipeline. |
| `sample_data` | Synthetic MNO Data to be used to test the software. |
| `resources` | Requirements files and development-related configuration and script files. |
| `tests` | Test code and test files for the testing execution. |
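For orientation, the sketch below shows what a pipeline configuration file in `pipe_configs` might look like. The keys and file names are purely illustrative assumptions, not the actual schema; refer to the Configuration section of the user manual for the real format.

```json
{
  "pipeline": [
    {
      "component_id": "EventCleaning",
      "config_path": "pipe_configs/components/event_cleaning.ini"
    },
    {
      "component_id": "CellFootprintEstimation",
      "config_path": "pipe_configs/components/cell_footprint.ini"
    }
  ]
}
```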
The multimno documentation is divided into two main documents.
- Technical documentation: PDF file detailing the software requirements, design and data objects.
- User/Developer manual: Webpage containing the user and developer manuals, including the contribute guide.
A user manual is provided composed of three sections:
- Configuration: Section containing the explanation of all the configuration files used by the software.
- Setup Guide: How to prepare the system for the software execution.
- Execution Guide: How to execute the software.
Please follow the contribute guide to see the rules and guidelines on how to contribute to the multimno repository.
Please follow the development guidelines to set up a dev environment and see the recommended best practices for development, testing and documentation.
Multimno software is licensed under the European Union Public License (EUPL) 1.2 as declared by its LICENSE file. To ensure transparency in its dependencies, a Software Bill of Materials (SBOM) is provided at license/sbom.json. This SBOM was generated on March 18, 2025, using CycloneDX.
The SBOM is generated using CycloneDX, and an accompanying Python script is provided to facilitate further analysis. This script processes the SBOM file and produces the `licenses.csv` and `unique_licenses.txt` files. All license-related files are stored in the `license/` directory, containing:

- `sbom.json`: A comprehensive SBOM detailing all dependencies and their respective licenses.
- `licenses.csv`: A concise CSV file containing the list of dependencies, including their versions and associated licenses.
- `unique_licenses.txt`: A file enumerating the distinct licenses used within the software.
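The repository's actual analysis script is not reproduced here; the following is a minimal stand-in sketch of the same idea, assuming a CycloneDX-style JSON document with a `components` array. The inline SBOM sample and the `extract_rows` helper are illustrative, not part of the repository.

```python
import csv
from pathlib import Path

# Tiny inline sample mimicking the CycloneDX "components" structure
# (illustrative data only, not the real license/sbom.json).
sbom = {
    "components": [
        {"name": "pyspark", "version": "3.4.1",
         "licenses": [{"license": {"id": "Apache-2.0"}}]},
        {"name": "pandas", "version": "2.0.3",
         "licenses": [{"license": {"id": "BSD-3-Clause"}}]},
    ]
}

def extract_rows(sbom: dict) -> list[tuple[str, str, str]]:
    """Return (name, version, license) rows from a CycloneDX-style dict."""
    rows = []
    for comp in sbom.get("components", []):
        ids = [entry.get("license", {}).get("id", "UNKNOWN")
               for entry in comp.get("licenses", [])]
        rows.append((comp["name"], comp.get("version", ""),
                     ";".join(ids) or "UNKNOWN"))
    return rows

rows = extract_rows(sbom)

# licenses.csv: one row per dependency with its version and license.
with open("licenses.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "version", "license"])
    writer.writerows(rows)

# unique_licenses.txt: the distinct licenses found across all dependencies.
unique = sorted({lic for _, _, lic in rows})
Path("unique_licenses.txt").write_text("\n".join(unique) + "\n")
```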
As of March 18, 2025, the licensing data has been included in the repository and validated against the EUPL matrix of compatible open-source licenses.
The pipeline of Big Data processing performed by the software can be found at the following document: MultiMNO Pipeline
Please verify that your system fulfils the System Requirements in order to ensure that it can execute the code.
MNO synthetic data is provided in the repository under the `sample_data/lakehouse/bronze` directory. This data has been generated synthetically and has the following specs:
- Spatial scope: All data has been generated within a bounding box that covers the metropolitan area of Madrid. The bounding box parameters are as follows:
    - latitude_min = 40.352
    - latitude_max = 40.486
    - longitude_min = -3.751
    - longitude_max = -3.579
- Temporal scope: Data has been generated for 9 days, from 2023-01-01 to 2023-01-09, both included.
- Users: 100 different users.
- Network: 500 different cells.
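As a quick illustration of the spatial scope above, the snippet below checks whether a coordinate pair falls inside the synthetic-data bounding box. This is an illustrative helper, not part of the repository's code.

```python
# Bounding box of the synthetic data (metropolitan area of Madrid),
# taken from the specs listed above.
LAT_MIN, LAT_MAX = 40.352, 40.486
LON_MIN, LON_MAX = -3.751, -3.579

def in_bounding_box(lat: float, lon: float) -> bool:
    """Return True if the point lies inside the synthetic-data extent."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX

# A point in central Madrid is inside; a point in Barcelona is not.
print(in_bounding_box(40.4168, -3.7038))  # True
print(in_bounding_box(41.3874, 2.1686))   # False
```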
Use the following commands for a fast setup of an execution environment using docker.
Please check the Setup Guide for a more in-depth description of the system setup needed to execute the code.
Build the docker image:

```bash
docker build -t multimno:1.0-prod --target=multimno-prod .
```
Run an example pipeline within a container:

```bash
docker run --rm --name=multimno-container -v "${PWD}/sample_data:/opt/data" -v "${PWD}/pipe_configs:/opt/app/pipe_configs" multimno:1.0-prod pipe_configs/pipelines/pipeline.json
```
This command will:

- Create a docker container.
- Mount the `sample_data` directory at `/opt/data` within the container.
- Mount the `pipe_configs` directory at `/opt/app/pipe_configs` within the container.
- Execute the pipeline stored at `/opt/app/pipe_configs/pipelines/pipeline.json` within the container. This is the same file as the one in the repository.
- Delete the container once the execution finishes.
NOTE: It is necessary to adjust the paths in the `pipeline.json` and `general_configuration.ini` files if the destination paths are altered.