Import Data • RITHMS

library(RITHMS)

This vignette presents the steps required to import your own data into the package. This mainly involves creating the founder-object combining genetic and microbiotic information, thanks to the MoBPS package.

library(data.table)
library(dplyr)
library(glue)
library(magrittr)
library(tibble)
library(MoBPS)

About accepted formats

The MoBPS package natively accepts two file formats to generate the starting population with the creating.diploid() function: PED/MAP format and VCF. Thus, it is possible to start a simulation from this kind of format, just by specify the type with the file_type argument in the read_input_data() function.

# Create founder object from PED/MAP set
founder_object <- read_input_data("path/to/microbiome.txt", "path/to/pedmap/'prefix'", biome_id_column = "sample_id", file_type = "pedmap")

#Create founder object from VCF
founder_object <- read_input_data("path/to/microbiome.txt", "path/to/vcf/'prefix'", biome_id_column = "sample_id", file_type = "vcf")

Microbiotic data is expected in the form of count table, with OTUs in columns and individuals in rows. (A column dedicated to individual identifiers is required). This data is filtered according to a prevalence threshold thanks to the threshold argument in the read_input_data().

Import simple genotype tables

It is possible to provide RITHMS with a simple genotype matrix encoded in the 0,1,2 format, with a prior file transformation, which we detail below. The VCF file can be written by specifying the output_type and output_path arguments.

n_ind <- 50
n_snp <- 50

# Create a little genotype matrix
set.seed(123)
geno_matrix <- matrix(sample(0:2, size = n_ind*n_snp, replace = TRUE),
                      nrow = n_snp, ncol = n_ind)

# Transform genotype matrix into VCF
vcf <- transform_geno_into_vcf(geno_matrix, output_type = "dataframe", output_path = "./output/path/file.vcf")
head(vcf[1:3,1:7])

Then, you can continue with the classical import.

#Create founder object from VCF
founder_object <- read_input_data("path/to/microbiome.txt", "path/to/vcf/'prefix'", biome_id_column = "sample_id", file_type = "vcf")