EuCanImage FHIR ETL Implementation
This repository contains the ETL implementation for EuCanImage. It supports semantic interoperability of the clinical data obtained in the studies by transforming that data into a machine-readable format that follows FHIR standards. The parser uses the FHIR Resources library to create dictionaries with a FHIR-compliant structure.
- The code is written in Python 3.11.
- The outputs are JSON files compliant with FHIR 4.3 schemas.
- This script was created specifically for the Extract, Transform and Load (ETL) implementation for EuCanImage, and follows the structures obtained from the REDCap databases within the study. To create your own implementation for a different study, you may use the previously mentioned FHIR Resources.
Data conversion process:
The code goes through the following steps:
- Importing and transforming CSV with patient data
- Defining dictionaries for ontologies and functions to populate FHIR dictionaries
- Transforming dictionaries into FHIR resources
- Grouping FHIR resources into a defined bundle/envelope of resources
- Exporting as a JSON file
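The steps above can be sketched in a few lines. This is a minimal illustration only, not the repository's code: it uses the standard library and plain dictionaries rather than pandas and FHIR Resources, so nothing is validated against the FHIR schemas. The column names (`patient_id`, `sex`), the coded values in `GENDER_CODES`, and the helper names are hypothetical.

```python
import csv
import io
import json

# Hypothetical mapping from coded CSV values to FHIR administrative-gender codes.
GENDER_CODES = {"1": "male", "2": "female"}


def row_to_patient(row):
    """Build a FHIR-style Patient dictionary from one CSV row."""
    return {
        "resourceType": "Patient",
        "id": row["patient_id"],
        "gender": GENDER_CODES.get(row["sex"], "unknown"),
    }


def build_bundle(resources):
    """Group resources into a FHIR-style collection Bundle."""
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [{"resource": resource} for resource in resources],
    }


def csv_to_bundle(csv_text):
    """Read CSV text, map each row to a Patient and wrap them in a Bundle."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return build_bundle([row_to_patient(row) for row in rows])


# Inline sample data standing in for a use-case CSV file.
sample_csv = "patient_id,sex\nP001,1\nP002,2\n"
bundle = csv_to_bundle(sample_csv)
bundle_json = json.dumps(bundle, indent=2)
print(bundle_json)
```

In the actual scripts, the dictionaries are turned into FHIR Resources objects before export, which is where schema compliance is enforced.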
Input & Output
- Input: a CSV file for each use case (CSV folder)
- Output: a JSON file following FHIR standards (OUTPUT folder)
Installation and Guide
The first step is to clone or download the repository to your computer:
git clone https://github.com/EGA-archive/EuCanImage-FHIR.git
Requirements
- Python 3.11.2
- FHIR Resources 6.5.0
- pandas 2.1.3
- numpy 1.26.2
To use these scripts, you will need access to Python 3.11 on your system.
The libraries used for this study can be installed with pip install. The latest versions of each library should not cause any incompatibility.
pip install fhir.resources
pip install pandas
pip install numpy
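After installing, one way to compare what is installed against the versions listed under Requirements is the short check below. It is a sketch, not part of the repository: the `REQUIRED` dictionary and the `installed_versions` helper are hypothetical names, and the check only reports versions without enforcing them.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under Requirements, keyed by PyPI distribution name.
REQUIRED = {"fhir.resources": "6.5.0", "pandas": "2.1.3", "numpy": "1.26.2"}


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(REQUIRED)
for name, wanted in REQUIRED.items():
    print(f"{name}: installed={report[name]}, listed={wanted}")
```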
Instructions
The steps are the same for each use case, so Use Case 1 is used here as the example.
First, you will need to provide a CSV file that follows the structure of the eCRF of the study. Each use case has its own eCRF. Save the CSV file in the CSV folder of the use case you are working with.
Next, at the beginning of each Python file (for example, UC1-ETL.py for Use Case 1), change the variable relative_path_csv so that the file name matches that of your input file.
relative_path_csv = "/UC1_Hepatocellular_Carcinoma/CSV/UseCase1_testdata.csv"
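Before running the parser, it can help to confirm that the path points at a real file. The sketch below assumes that relative_path_csv is joined onto a base directory such as the repository root; how the scripts actually resolve the path is not stated here, and `resolve_csv_path` and `check_csv` are hypothetical helpers.

```python
from pathlib import Path
import tempfile


def resolve_csv_path(base_dir, relative_path):
    """Join a base directory with a path such as "/UC1_.../CSV/file.csv"."""
    # Strip the leading slash so the path is treated as relative to base_dir.
    return Path(base_dir) / relative_path.lstrip("/")


def check_csv(base_dir, relative_path):
    """Return the resolved path, or raise FileNotFoundError with a clear message."""
    csv_path = resolve_csv_path(base_dir, relative_path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Input CSV not found: {csv_path}")
    return csv_path


# Demonstration in a temporary directory standing in for the repository root.
relative_path_csv = "/UC1_Hepatocellular_Carcinoma/CSV/UseCase1_testdata.csv"
with tempfile.TemporaryDirectory() as repo_root:
    target = resolve_csv_path(repo_root, relative_path_csv)
    target.parent.mkdir(parents=True)
    target.write_text("patient_id\nP001\n")
    found = check_csv(repo_root, relative_path_csv)
    found_name = found.name
    print(found_name)
```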
Then, run the parser in the terminal, replacing PATH-TO-FOLDER with the folder that contains the parser, unless the terminal is already open in that folder.
python PATH-TO-FOLDER/UC1-ETL.py
Once it has finished, all of the parsed JSON files will be in the OUTPUT folder.
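A quick way to inspect an exported file is to count the resources it contains by type. The sketch below assumes each output file is a single FHIR Bundle whose `entry` items each carry a `resource`; the inline JSON and the `count_resource_types` helper are illustrative and not taken from the repository.

```python
import json
from collections import Counter


def count_resource_types(bundle):
    """Count the resources in a FHIR Bundle dictionary by resourceType."""
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


# Small inline Bundle standing in for one exported file.
example_json = """
{"resourceType": "Bundle", "type": "collection", "entry": [
  {"resource": {"resourceType": "Patient", "id": "P001"}},
  {"resource": {"resourceType": "Condition", "id": "C001"}},
  {"resource": {"resourceType": "Condition", "id": "C002"}}
]}
"""
counts = count_resource_types(json.loads(example_json))
print(dict(counts))
```

To apply this to a real export, load a file from the OUTPUT folder with `json.load` and pass the resulting dictionary to the same function.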
Version History
main @ 294527d (earliest) Created 2nd Sep 2024 at 14:42 by Aldar Cabrelles
Update UC7-ETL.py
Ontology use fix for BRCA1 and BRCA2
Created: 2nd Sep 2024 at 14:42
Last updated: 9th Sep 2024 at 14:41