Installation

Two things are needed to use the program:

Environment setup
Input file preparation

Setting up your environment

In a terminal, go to the directory of your choice and clone this repo:

git clone https://github.com/acolorado1/DietMicrobeNet.git       # clone repo
cd DietMicrobeNet                                                # move into this project directory

Create the environment from the provided YAML file:

conda env create -f DMnet_env.yaml                              # create environment
conda activate DietMicrobeNet                                   # activate environment
pip install -e .                                                # install this package in editable mode

Download FooDB and HMDB database information

The scripts that follow need five files hosted on a public drive: four taken from FooDB and HMDB, and one covering the compounds of all food items found in FooDB.

To download them, run:

wget 'https://olucdenver-my.sharepoint.com/:x:/g/personal/angelasofia_burkhartcolorado_cuanschutz_edu/ESXx7vpypQFOt4iVv6x-ErkBykpAVS1fppQjYZkrxkDnAA?download=1' -O Data/CompoundExternalDescriptor.csv
wget 'https://olucdenver-my.sharepoint.com/:x:/g/personal/angelasofia_burkhartcolorado_cuanschutz_edu/EYJUYQWmY9VDlYZIAXpzpvEBzhrnViFZQjrikXIla_aPPg?download=1' -O Data/Content.csv
wget 'https://olucdenver-my.sharepoint.com/:x:/g/personal/angelasofia_burkhartcolorado_cuanschutz_edu/EXyRAlYs1htNlcwz5T67BxQBGO7HfOjmfIBlkOydM0BIAw?download=1' -O Data/Food.csv
wget 'https://olucdenver-my.sharepoint.com/:x:/g/personal/angelasofia_burkhartcolorado_cuanschutz_edu/EbY2fD3JTcNLomKFqQhY5jABAXN-60A80PmkngRynazocg?download=1' -O Data/hmdb.csv
wget 'https://olucdenver-my.sharepoint.com/:x:/g/personal/angelasofia_burkhartcolorado_cuanschutz_edu/EZ1pyHd616RFkR9zG6kenuoBhZDroHYTbaGmEfwpxFOHLg?download=1' -O Data/AllFood/food_meta.csv
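
Since the commands above write to fixed paths under Data/, a quick check can confirm that all five files arrived. The sketch below is not part of DietMicrobeNet: the name missing_data_files is hypothetical, and it assumes it is run from the repository root. It also counts zero-byte files as missing, on the assumption that a failed wget -O can leave an empty file behind.

```python
from pathlib import Path

# Paths taken from the -O arguments of the wget commands above.
EXPECTED_FILES = (
    "Data/CompoundExternalDescriptor.csv",
    "Data/Content.csv",
    "Data/Food.csv",
    "Data/hmdb.csv",
    "Data/AllFood/food_meta.csv",
)


def missing_data_files(repo_root="."):
    """Return the expected files that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_FILES:
        path = root / rel
        # A file counts as missing if it does not exist or has no content.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_data_files("."):
        print(f"missing or empty: {rel}")
```

If the script prints nothing, every file is in place; otherwise, rerun the matching wget command for each path it lists.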

Needed Files for Input

Three types of files are needed to run the program:

File containing a list of KOs that were found in a sample
File containing KO metadata
File containing a list of food items that together represent the diet

List of KOs

Must be named noquote_ko.txt
Must not contain quotes or commas
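
The no-quotes, no-commas rule is easy to check before running the workflow. The following is a minimal sketch rather than anything shipped with DietMicrobeNet: the function name lines_with_forbidden_chars is hypothetical, and it assumes "quotes" covers both single and double quote characters. It makes no assumption about how the KOs are laid out in the file.

```python
# Characters the KO list must not contain (assumption: both quote styles count).
FORBIDDEN = set("\"',")


def lines_with_forbidden_chars(text):
    """Return the 1-based numbers of lines that contain a quote or a comma."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if FORBIDDEN & set(line)
    ]


if __name__ == "__main__":
    from pathlib import Path

    bad_lines = lines_with_forbidden_chars(Path("noquote_ko.txt").read_text())
    if bad_lines:
        print("quotes or commas found on lines:", bad_lines)
```

An empty result means the file passes this particular check; it does not verify that the entries are valid KO identifiers.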

Example of the file:

KO Metadata

Must be named ko_taxonomy_abundance.csv
Should have three columns: "KO", "taxonomy", and a column representing read abundance (in this example, "Abundance_RPKs")
Only the name of the read abundance column may vary; the other two columns must be named exactly "KO" and "taxonomy"
If you do not have taxonomy or abundance information, leave that column blank; downstream processing will drop empty values

Example of the file:

"KO","taxonomy","Abundance_RPKs"
"K00001","g__Bifidobacterium.s__Bifidobacterium_bifidum",30.025907407
"K00001","g__Bifidobacterium.s__Bifidobacterium_longum",0
"K00001","unclassified",0
"K00002","g__Blautia.s__Blautia_obeum",41.8831170812
"K00002","g__Blautia.s__Blautia_sp",0
"K00002","g__Blautia.s__Blautia_sp_AF19_10LB",0
"K00002","unclassified",0

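The column rules above can be verified from the header row alone. The sketch below is illustrative and not part of DietMicrobeNet: the name check_ko_metadata is hypothetical, and it assumes the columns are identified by name rather than by position, so whichever column is neither "KO" nor "taxonomy" is reported as the abundance column.

```python
import csv
import io

REQUIRED_COLUMNS = ("KO", "taxonomy")


def check_ko_metadata(handle):
    """Inspect the header of an open CSV; return (abundance_column, problems)."""
    header = next(csv.reader(handle), None)
    if header is None:
        return None, ["file is empty"]

    problems = []
    if len(header) != 3:
        problems.append(f"expected 3 columns, found {len(header)}")
    for name in REQUIRED_COLUMNS:
        if name not in header:
            problems.append(f'missing required column "{name}"')

    # The abundance column is the single column that is not KO or taxonomy.
    others = [column for column in header if column not in REQUIRED_COLUMNS]
    abundance = others[0] if len(others) == 1 else None
    return abundance, problems


if __name__ == "__main__":
    sample = '"KO","taxonomy","Abundance_RPKs"\n"K00001","unclassified",0\n'
    print(check_ko_metadata(io.StringIO(sample)))
```

To check a real file, pass an open handle instead, for example open("ko_taxonomy_abundance.csv", newline=""). The sketch only looks at the header; it does not validate the rows.
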
Food items

These are typically downloaded in the second step of this workflow (see 2. Find Food Items)
Named either foodb_foods_dataframe.csv or kegg_organisms_dataframe.csv
If you want to include all possible food items, you can use Data/AllFood/food_meta.csv (see 2. Find Food Items)