University of Oxford

File(s) under embargo: 1 month and 26 days until the file(s) become available.

Code for “Current water contact and Schistosoma mansoni infection have distinct determinants: a data-driven population-based study in rural Uganda”

Software, posted on 2024-09-10, 16:29, authored by Fabian Reitzug

Details on how to rerun the analysis pipeline described in:

“Current water contact and Schistosoma mansoni infection have distinct determinants: a data-driven population-based study in rural Uganda” by Fabian Reitzug, Narcis B. Kabatereine, Anatol M. Byaruhanga, Fred Besigye, Betty Nabatte, Goylette F. Chami

System requirements

This code was run on the University of Oxford's Biomedical Research Computing (BMRC) high-performance computing cluster on August 24, 2024, using 1 CPU core and 24 GB of RAM (approximate run time: 24 hours).

The following software modules on the BMRC cluster were required:

  • R/4.1.0
  • SQLite/3.38.3-GCCcore-11.3.0
  • PROJ/9.0.0-GCCcore-11.3.0
  • GEOS/3.10.3-GCC-11.3.0
  • GDAL/3.3.0-foss-2021a
  • rgdal/1.5-23-foss-2021a-R-4.1.0
  • MPFR/4.1.0-GCCcore-11.3.0

Installation guide

Running this code requires R >= 4.1.0.

All required R packages are loaded in /code/prep/01_paths_pkgs.R. Any packages that are not already installed can be installed with the install.packages() function.

Typical installation time on a standard desktop computer should be less than 30 minutes.
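The following is a minimal sketch of the installation step. The helper checks the R version and installs only the packages that are not yet available. The function name `install_missing()` and the CRAN mirror URL are assumptions and are not part of the repository. Pass it the package names that /code/prep/01_paths_pkgs.R actually loads.

```r
# Hypothetical helper (not part of the repository): install only the
# packages from `pkgs` that are not already available.
install_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  # The pipeline requires R >= 4.1.0
  if (getRversion() < "4.1.0") {
    stop("R >= 4.1.0 is required; found ", getRversion(), call. = FALSE)
  }
  # requireNamespace() quietly returns FALSE for packages that are missing
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  invisible(missing)
}

# Illustrative usage; replace with the packages listed in 01_paths_pkgs.R:
# install_missing(c("dplyr", "ggplot2"))
```

The helper returns the names of the packages it had to install. An empty result therefore means everything was already present.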

Demo

Instructions to run on data

To run the scripts on a local computer, the following changes may be needed:

  • Set the working directory to the code directory using the setwd() function in R.
  • Set the directory paths so that they point to the directory containing the demo data.
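Both directory settings can be checked before the pipeline is launched. This sketch assumes hypothetical path variables named `dir_code` and `dir_data`. The real names are whichever variables the scripts use to locate the data, presumably in /code/prep/01_paths_pkgs.R, given its name.

```r
# Hypothetical check (not part of the repository): confirm that the code and
# demo-data directories exist before changing the working directory.
check_paths <- function(dir_code, dir_data) {
  paths <- c(code = dir_code, data = dir_data)
  missing <- paths[!dir.exists(paths)]
  if (length(missing) > 0) {
    stop("Directory not found: ", paste(missing, collapse = ", "), call. = FALSE)
  }
  # Return absolute paths, keeping the names "code" and "data"
  vapply(paths, normalizePath, character(1))
}

# Placeholder paths; adjust to the local copy of the code and data:
# dir_code <- "~/schisto/code"
# dir_data <- "~/schisto/demo_data"
# check_paths(dir_code, dir_data)
# setwd(dir_code)
```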

The entire analysis pipeline can be run by executing the /code/RUN.R script, which runs all scripts required to reproduce the results.
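For unattended runs, it can help to source RUN.R from the code directory and record how long it takes. The wrapper below, `run_pipeline()`, is a hypothetical convenience and not part of the repository. It sources the script into the global environment, as running it by hand would, and restores the previous working directory when it finishes.

```r
# Hypothetical wrapper (not part of the repository): run RUN.R from the code
# directory, report the elapsed time, and restore the working directory.
run_pipeline <- function(code_dir = ".", script = "RUN.R") {
  old <- setwd(code_dir)
  on.exit(setwd(old), add = TRUE)
  if (!file.exists(script)) {
    stop("Cannot find ", script, " in ", getwd(), call. = FALSE)
  }
  timing <- system.time(source(script))
  message(sprintf("Pipeline finished in %.1f minutes", timing[["elapsed"]] / 60))
  invisible(timing)
}

# Usage (placeholder path):
# run_pipeline("~/schisto/code")
```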

Expected output

  • Variable selection output: Outputs from the variable selection process (likelihood ratio tests and Bayesian variable selection) are saved in the /code/out/var_sel/ directory. The variable selection is run on the confidential raw data, so only its outputs are publicly available.
  • Main figures: Figs. 3-9 are written to the /code/out/main/ directory. Fig. 1 is not created programmatically. Fig. 2 requires latitude and longitude columns as well as an external waterbody dataset that is not included with the demo data.
  • Supplementary tables: All supplementary tables are written to the /code/out/main/s_tabs/ directory.
  • Supplementary figures: All supplementary figures are written to the /code/out/main/s_figs/ directory.
  • Supplementary file: All supplementary figures and tables are compiled into a single LaTeX document by the /code/out/s_file/s_file.Rnw script, which generates a PDF in the same folder.
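The supplementary PDF can also be rebuilt on its own once the supplementary tables and figures exist. The sketch below assumes that s_file.Rnw is written for knitr and that a LaTeX distribution is installed. `build_supplement()` is a hypothetical name. If the file instead targets Sweave, the base-R route is `utils::Sweave()` followed by `tools::texi2pdf()`.

```r
# Hypothetical helper (not part of the repository). ASSUMPTION: s_file.Rnw
# is a knitr document and a LaTeX distribution is available.
build_supplement <- function(rnw = file.path("out", "s_file", "s_file.Rnw")) {
  if (!file.exists(rnw)) {
    stop("Cannot find ", rnw, call. = FALSE)
  }
  if (!requireNamespace("knitr", quietly = TRUE)) {
    stop("The knitr package is required.", call. = FALSE)
  }
  # Expected PDF path, computed before changing directory
  pdf <- sub("\\.[Rr]nw$", ".pdf", normalizePath(rnw))
  # knit2pdf() writes the .tex and .pdf into the working directory, so switch
  # to the folder of the .Rnw file to keep the PDF next to its source
  old <- setwd(dirname(rnw))
  on.exit(setwd(old), add = TRUE)
  knitr::knit2pdf(basename(rnw))
  invisible(pdf)
}

# Usage, from the code directory:
# build_supplement()
```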

Expected run time for demo on a "normal" desktop computer

  • The demo should take less than two hours to run.

Instructions for use

To run the code on a different dataset with a similar structure, the following modifications would be required:

  • The /code/prep/03_read.R file would need to be modified to load the desired datasets.
  • All preprocessing steps that follow loading should be implemented as scripts in the /code/prep/ directory, which is intended to hold all data preparation scripts.
  • A data dictionary (.csv) in the same format as the one in /code/dict/ would be required. It labels the main datasets and specifies which variables are included in the candidate variable set. This is done by the /code/prep/12_dict.R and /code/prep/13_applylabs.R scripts.
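The sketch below illustrates what the dictionary step involves. It attaches labels from a dictionary data frame and returns the candidate variables. The column names `variable`, `label` and `candidate` are hypothetical, as is the function name `apply_dict()`. The actual layout is whatever the .csv in /code/dict/ uses, and the real logic lives in 12_dict.R and 13_applylabs.R.

```r
# Hypothetical sketch (not the repository's implementation). ASSUMPTION: the
# dictionary has columns `variable`, `label` and `candidate` (TRUE/FALSE).
apply_dict <- function(data, dict) {
  needed <- c("variable", "label", "candidate")
  if (!all(needed %in% names(dict))) {
    stop("Dictionary must contain columns: ", paste(needed, collapse = ", "), call. = FALSE)
  }
  dict$variable <- as.character(dict$variable)
  # Ignore dictionary rows for variables that are absent from this dataset
  dict <- dict[dict$variable %in% names(data), , drop = FALSE]
  for (i in seq_len(nrow(dict))) {
    # Store the human-readable label as a "label" attribute on the column
    attr(data[[dict$variable[i]]], "label") <- as.character(dict$label[i])
  }
  list(data = data, candidates = dict$variable[as.logical(dict$candidate)])
}

# Usage (file name and data object are placeholders):
# dict <- read.csv("dict/data_dictionary.csv")
# res <- apply_dict(my_data, dict)
```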

Funding

A DPhil scholarship was awarded from the Nuffield Department of Population Health (NDPH) to Fabian Reitzug. Grants from the Wellcome Trust Institutional Strategic Support Fund (204826/Z/16/Z), NDPH Pump Priming Fund, John Fell Fund, Robertson Foundation Fellowship, and UKRI EPSRC Award (EP/X021793/1) were awarded to Goylette F. Chami.
