EuCanImage FHIR ETL Implementation I: Hepatocellular Carcinoma
main @ 294527d

Workflow Type: Python

EuCanImage FHIR ETL Implementation

This repository contains the ETL implementation for EuCanImage. It promotes semantic interoperability of the clinical data obtained in the studies by transforming it into a machine-readable format that follows FHIR standards. The parser uses the FHIR Resources library (fhir.resources) to build dictionaries with a FHIR-compliant structure.

  • The code is written in Python 3.11.
  • The outputs are JSON files compliant with FHIR 4.3 schemas.
  • This script was created specifically for the Extract, Transform and Load (ETL) implementation for EuCanImage and follows the structures obtained from the REDCap databases within the study. To create your own implementation for a different study, you can use the FHIR Resources library mentioned above.

Data conversion process:

The code goes through the following steps:

  • Importing and transforming CSV with patient data
  • Defining dictionaries for ontologies and functions to populate FHIR dictionaries
  • Transforming dictionaries into FHIR resources
  • Grouping FHIR resources into a defined bundle/envelope of resources
  • Exporting the bundle as a JSON file
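
The steps above can be sketched with the Python standard library alone. This is an illustrative outline under stated assumptions, not the repository's code: the column names (patient_id, sex), the SEX_TO_FHIR mapping and all function names are hypothetical, and the actual scripts rely on pandas and fhir.resources rather than plain dictionaries.

```python
import csv
import io
import json

# Hypothetical mapping from eCRF codes to FHIR administrative-gender codes.
SEX_TO_FHIR = {"1": "male", "2": "female"}


def row_to_patient(row):
    """Build a dictionary shaped like a FHIR Patient resource from one CSV row."""
    return {
        "resourceType": "Patient",
        "id": row["patient_id"],
        "gender": SEX_TO_FHIR.get(row["sex"], "unknown"),
    }


def build_bundle(resources):
    """Group resources into a FHIR Bundle of type 'collection'."""
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [{"resource": r} for r in resources],
    }


def csv_to_bundle_json(csv_text):
    """Run the whole pipeline: parse CSV, map rows, bundle, serialise."""
    rows = csv.DictReader(io.StringIO(csv_text))
    patients = [row_to_patient(row) for row in rows]
    return json.dumps(build_bundle(patients), indent=2)


# Made-up sample input with the hypothetical columns used above.
sample_csv = "patient_id,sex\nP001,1\nP002,2\n"
bundle_json = csv_to_bundle_json(sample_csv)
print(bundle_json)
```

Keeping each step in its own function mirrors the list above and makes it possible to check the mapping for a single row before a whole file is converted.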

Input & Output

  • CSV file for each use case (CSV folder)
  • JSON file following FHIR standards (OUTPUT folder)

Installation and Guide

The first step is to clone or download the repository to your computer:

git clone https://github.com/EGA-archive/EuCanImage-FHIR.git

Requirements

To use these scripts, you will need Python 3.11 on your system.

The libraries used in this study can be installed with pip install. The latest version of each library should not cause any incompatibilities.

pip install fhir.resources
pip install pandas
pip install numpy
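
After installation, a quick check along the following lines can confirm that Python is able to locate the three libraries. It is only a convenience sketch and not part of the repository; the helper name is_installed is hypothetical.

```python
import importlib.util

REQUIRED = ["fhir.resources", "pandas", "numpy"]


def is_installed(module_name):
    """Return True if Python can locate the module.

    For dotted names, importlib imports the parent package in order to
    resolve the submodule.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is missing.
        return False


status = {name: is_installed(name) for name in REQUIRED}
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```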

Instructions

The steps are the same for each use case, so we will use Use Case 1 as the example.

First, provide a CSV file that follows the structure of the study's eCRF. Each use case has its own eCRF. Save the CSV file in the CSV folder of the use case you are working with.

Next, at the beginning of each Python file (for example, UC1-ETL.py for Use Case 1), change the variable relative_path_csv so that it matches the name of your input file.

relative_path_csv = "/UC1_Hepatocellular_Carcinoma/CSV/UseCase1_testdata.csv"
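
How the script combines this value with a base directory is not shown here. One possible approach, assuming the script is launched from the repository root, is sketched below. The helper resolve_csv_path and its base_dir parameter are hypothetical. Checking that the file exists before parsing gives a clearer message than a failed read later on.

```python
from pathlib import Path

relative_path_csv = "/UC1_Hepatocellular_Carcinoma/CSV/UseCase1_testdata.csv"


def resolve_csv_path(base_dir, relative_path):
    """Join a base directory and a path that may start with a slash."""
    # Strip the leading slash so pathlib does not treat the path as absolute.
    return Path(base_dir) / relative_path.lstrip("/")


# Assumption: the current working directory is the repository root.
csv_path = resolve_csv_path(Path.cwd(), relative_path_csv)
if not csv_path.is_file():
    print(f"Input CSV not found: {csv_path}")
```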

Then run the parser in the terminal, replacing PATH-TO-FOLDER with the folder that contains the parser. If the terminal is already open in that folder, the folder prefix can be omitted.

python PATH-TO-FOLDER/UC1-ETL.py

Once it has finished, all of the parsed JSON files will be in the OUTPUT folder.
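
To get a quick overview of what was written, the generated files can be loaded and their top-level resourceType listed. This is a minimal sketch rather than a FHIR validator. The function name summarise_outputs and the folder path are assumptions, so adjust the path to the OUTPUT folder of your use case.

```python
import json
from pathlib import Path


def summarise_outputs(output_dir):
    """Map each JSON file name in output_dir to its top-level resourceType."""
    summary = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if isinstance(data, dict):
            summary[path.name] = data.get("resourceType", "<missing>")
        else:
            summary[path.name] = "<not a JSON object>"
    return summary


# Hypothetical location of the output folder, relative to the repository root.
output_dir = Path("UC1_Hepatocellular_Carcinoma/OUTPUT")
if output_dir.is_dir():
    for name, resource_type in summarise_outputs(output_dir).items():
        print(f"{name}: {resource_type}")
```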

Version History

main @ 294527d (earliest) Created 2nd Sep 2024 at 14:42 by Aldar Cabrelles

Update UC7-ETL.py

Ontology use fix for BRCA1 and BRCA2


Citation
Cabrelles, A. (2024). EuCanImage FHIR ETL Implementation I: Hepatocellular Carcinoma. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1112.1
Activity

Created: 2nd Sep 2024 at 14:42

Last updated: 9th Sep 2024 at 14:41
