Inter-Sample Comparison
Comparing N Graphs/Pattern Outputs
After running all previous steps you will end up with a file /graph/graph_results.csv which contains the results of three different queries to find instances of microbial metabolism of dietary compounds.
In order to find similarities and differences between the graphs/patterns we looked at the genes invovled in this metabolism and compared them using Jaccard Similarity. This is vizualized in two ways, a heatmap of similarity scores, and a dendrogram to identify clusters using SciPy's higherarchical clustering algorithm. Additionally, a summary text file is written to show common genes between all graph for each pattern type and unique genes to each graph for each pattern type. The statistical test included in the summary is a PERMANOVA which performs 5,000 permutations and a seed of 5.
Running Comparison
To get a list of optional and required arguments run python GraphComparison.py -h:
python src/GraphComparison.py -h
usage: GraphComparison.py [-h] -m METADATA -p PATHS -n NAMES [-s] [-g GROUPS] -o OUTPUT [--ko_column KO_COLUMN]
Compare graph results across samples using KOs and Jaccard similarity.
options:
-h, --help show this help message and exit
-m, --metadata METADATA
Metadata CSV containing file paths and names
-p, --paths PATHS Name of column containing file paths
-n, --names NAMES Name of column containing names of graphs (e.g., sampleID)
-s, --stat_test If statistical test for group comparison wanted include this parameter
-g, --groups GROUPS Names of columns for use in PERMANOVA, if multiple separate by a comma e.g., cohort,diet,location
-o, --output OUTPUT Output directory for plots and summary files
--ko_column KO_COLUMN
Name of KOs column in graph CSVs (default: 'KOs')
Tip
All graph_results.csv should be formatted the same unless there has been a change done by the user. Thus all the ko_columns column should use the default column name unless they have been changed.
Note
Example metadata can be located in the Data file called Example_GraphComparison_Metadata.csv
Example usage w/o stats:
python src/GraphComparison.py \
-m "path/to/metadata.csv" \
-p "paths_column_name" \
-n "names_column_name" \
-o "path/to/output/dir/" \
Example usage w/ stats:
python src/GraphComparison.py \
-m "path/to/metadata.csv" \
-p "paths_column_name" \
-n "names_column_name" \
-s \
-g "groups_column_name" \
--o "path/to/output/dir/" \