Custom Host-finding Results Manifest

Custom host-finding includes the following deliverables:

  • filtered_master_table.tsv
  • intra_contig_linkages.tsv
  • inter_contig_linkages.tsv
  • min_copy_count_roc_plot.png
  • unfiltered_master_table.tsv
  • reports/: a directory containing the following tables and figures for both the filtered and unfiltered datasets:
    • contigs_histogram_adjusted_inter_vs_intra_ratio.png
    • contigs_histogram_mobile_element_copies_per_cell.png
    • contigs_host_count_histogram.png
    • contigs_matrix_adjusted_inter_vs_intra_ratio.tsv
    • contigs_matrix_mobile_element_copies_per_cell.tsv
    • contigs_heatmap_adjusted_inter_vs_intra_ratio.png (see note 1)
    • contigs_heatmap_mobile_element_copies_per_cell.png (see note 1)
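The per-contig matrices and the host-count histogram in reports/ summarize the master tables. The sketch below is a minimal Python illustration of how such summaries could be derived. It assumes three things. First, a master table is a tab-separated file with mobile_contig_name, cluster_name and adjusted_inter_vs_intra_ratio columns (described under File descriptions). Second, the matrices are laid out as mobile contig by host MAG. Third, the host count is the number of distinct MAGs linked to each mobile contig. The function names and sample rows are hypothetical. This is not the pipeline's actual code.

```python
import csv
import io
from collections import Counter, defaultdict


def build_contig_matrix(rows, value_column):
    """Pivot master-table rows into {mobile contig: {host MAG: value}}.

    Assumed layout (hypothetical): one row per contig-MAG pair; pairs that
    are absent from the table are simply missing from the inner dict.
    """
    matrix = defaultdict(dict)
    clusters = set()
    for row in rows:
        matrix[row["mobile_contig_name"]][row["cluster_name"]] = float(row[value_column])
        clusters.add(row["cluster_name"])
    return dict(matrix), sorted(clusters)


def host_count_histogram(rows):
    """Map number of host MAGs -> number of mobile contigs with that many hosts."""
    hosts = defaultdict(set)
    for row in rows:
        hosts[row["mobile_contig_name"]].add(row["cluster_name"])
    return Counter(len(mags) for mags in hosts.values())


# Hypothetical three-row excerpt of a master table (illustrative values only).
example_tsv = (
    "mobile_contig_name\tcluster_name\tadjusted_inter_vs_intra_ratio\n"
    "contig_1\tMAG_A\t0.8\n"
    "contig_1\tMAG_B\t0.1\n"
    "contig_2\tMAG_A\t1.2\n"
)
rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
matrix, clusters = build_contig_matrix(rows, "adjusted_inter_vs_intra_ratio")
print(clusters)                                    # ['MAG_A', 'MAG_B']
print(matrix["contig_2"].get("MAG_B"))             # None: no linkage for this pair
print(sorted(host_count_histogram(rows).items()))  # [(1, 1), (2, 1)]
```

Empty cells in such a matrix mean "no reported linkage" rather than zero. For that reason, the sketch leaves missing pairs out instead of filling them with 0.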

File descriptions

  • intra_contig_linkages.tsv: Counts of Hi-C links on the same contig.
  • inter_contig_linkages.tsv: Counts of Hi-C links between two different contigs.
  • min_copy_count_roc_plot.png: Receiver operating characteristic (ROC) curve used to determine the optimal copy-count cut-off value.
  • unfiltered_master_table.tsv: This table shows metadata for all mobile-element to host associations, where each row is an association between a mobile element (mobile_contig_name) and a host (cluster_name).
  • filtered_master_table.tsv: This table shows metadata for the mobile element-host associations that remain after filtering, where each row is an association between a mobile element (mobile_contig_name) and a host (cluster_name). Mobile element-host linkages are filtered as described in Uritskiy et al. 2021, “Accurate viral genome reconstruction and host assignment with proximity-ligation sequencing” (see note 2). Adapted from Uritskiy et al. 2021:
    “Mobile element-host linkages are filtered to keep only connections with at least 2 Hi-C read links between the mobile element and host MAG, a connectivity ratio of 0.1, and intra-MAG connectivity of 10 links to remove false positives. For the final threshold value, a receiver operating characteristic (ROC) curve is used to determine the optimal copy-count cut-off value. The optimal cut-off was determined from the ROC curve as the value that produces the point closest to the top left of the plot, or the cut-off that removed the maximum number of mobile element-host links while still finding at least one host for the maximum number of mobile elements.”
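The filtering steps quoted above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation. The "connectivity ratio" is assumed to be raw_inter_vs_intra_ratio. Links whose mobile_element_copies_per_cell falls below the cut-off are assumed to be dropped. The two ROC axes are assumed to be the fraction of links kept and the fraction of mobile elements that keep at least one host. The cut-off is chosen by the second reading of the rule: keep a host for as many mobile elements as possible, then remove as many links as possible. All function names and sample links are hypothetical.

```python
def passes_fixed_thresholds(link, min_inter_reads=2, min_ratio=0.1, min_intra_reads=10):
    """Apply the three fixed filters (assumption: ratio = raw_inter_vs_intra_ratio)."""
    return (
        link["inter_read_count"] >= min_inter_reads
        and link["raw_inter_vs_intra_ratio"] >= min_ratio
        and link["intra_read_count"] >= min_intra_reads
    )


def roc_point(links, cutoff):
    """Return (fraction of links kept, fraction of mobile elements with >= 1 host).

    Assumption: links with mobile_element_copies_per_cell below `cutoff` are
    dropped. `links` must be non-empty.
    """
    kept = [link for link in links if link["mobile_element_copies_per_cell"] >= cutoff]
    elements = {link["mobile_contig_name"] for link in links}
    hosted = {link["mobile_contig_name"] for link in kept}
    return len(kept) / len(links), len(hosted) / len(elements)


def choose_cutoff(links, candidates):
    """Maximize hosted elements first, then minimize the fraction of links kept."""
    def score(cutoff):
        kept_fraction, hosted_fraction = roc_point(links, cutoff)
        return (hosted_fraction, -kept_fraction)
    return max(candidates, key=score)  # ties resolve to the first candidate listed


# Hypothetical links; only the columns used above are included.
links = [
    {"mobile_contig_name": "v1", "cluster_name": "MAG_A", "inter_read_count": 12,
     "intra_read_count": 500, "raw_inter_vs_intra_ratio": 0.9, "mobile_element_copies_per_cell": 3.0},
    {"mobile_contig_name": "v1", "cluster_name": "MAG_B", "inter_read_count": 3,
     "intra_read_count": 200, "raw_inter_vs_intra_ratio": 0.2, "mobile_element_copies_per_cell": 0.05},
    {"mobile_contig_name": "v2", "cluster_name": "MAG_A", "inter_read_count": 1,
     "intra_read_count": 500, "raw_inter_vs_intra_ratio": 0.4, "mobile_element_copies_per_cell": 1.0},
    {"mobile_contig_name": "v3", "cluster_name": "MAG_C", "inter_read_count": 8,
     "intra_read_count": 50, "raw_inter_vs_intra_ratio": 0.5, "mobile_element_copies_per_cell": 1.5},
]
prefiltered = [link for link in links if passes_fixed_thresholds(link)]  # drops v2 (1 read link)
cutoff = choose_cutoff(prefiltered, [0.0, 0.1, 1.0, 2.0, 5.0])
filtered = [link for link in prefiltered if link["mobile_element_copies_per_cell"] >= cutoff]
print(cutoff, [(link["mobile_contig_name"], link["cluster_name"]) for link in filtered])
# 0.1 [('v1', 'MAG_A'), ('v3', 'MAG_C')]
```

An alternative reading of "closest to the top left of the plot" picks the candidate whose point lies nearest the corner (0, 1) in Euclidean distance. On these sample links that reading would pick 2.0 and lose the only host for v3. This shows the two readings can disagree, and neither is confirmed here as the pipeline's exact rule.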

Master table columns

Column | Units | Description
--- | --- | ---
mobile_contig_name |  | Name of the mobile element contig
mobile_contig_length | bp | Length of the mobile element contig
mobile_contig_read_count | reads | Number of Hi-C reads aligning to the mobile element contig
mobile_contig_read_depth | reads/kbp | Hi-C read coverage depth of the mobile element contig in the entire sample
mobile_contig_read_depth_in_this_cluster | reads/kbp | Hi-C read coverage depth of the mobile element contig in this MAG (relevant when the contig is linked to multiple MAGs)
cluster_name |  | Host MAG name
cluster_length | bp | Host MAG length
cluster_read_count | reads | Number of Hi-C reads aligning to the host MAG
cluster_read_depth | reads/kbp | Hi-C read coverage depth of the host MAG
intra_read_count | reads | Number of Hi-C reads linking contigs within the MAG
intra_linkage_density | reads/kbp^2 | Density of Hi-C links between contigs within the MAG (the “expected” linkage)
inter_read_count | reads | Number of Hi-C reads linking the mobile element to the potential host MAG
raw_inter_linkage_density | reads/kbp^2 | Density of Hi-C links between the mobile contig and the MAG (the “actual” linkage)
raw_inter_vs_intra_ratio |  | Ratio of the “actual” to the “expected” Hi-C linkage
mobile_element_copies_per_cell |  | Estimated number of copies of the mobile element in this MAG
adjusted_inter_connective_linkage_density | reads/kbp^2 | Density of Hi-C links between the mobile contig and the MAG, adjusted for the copy count
adjusted_inter_vs_intra_ratio |  | Ratio of the “actual” to the “expected” Hi-C linkage, adjusted for copy count (a value even moderately close to 1 indicates a real host)
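The ratio columns above can be related to the raw columns as in the following sketch. The sketch is hedged: only "ratio = actual / expected" comes from the table. Three formulas are assumptions for illustration. Link density in reads/kbp^2 is taken to be reads divided by the product of the two lengths in kbp. Copies per cell is taken to be the mobile contig's depth in this MAG divided by the MAG's depth. "Adjusted for the copy count" is taken to mean dividing by copies per cell. intra_linkage_density is taken as given rather than recomputed. The helper names are hypothetical.

```python
def linkage_density(read_count, length_a_bp, length_b_bp):
    """Reads per kbp^2, assuming normalisation by the product of both lengths in kbp."""
    return read_count / ((length_a_bp / 1000) * (length_b_bp / 1000))


def derived_columns(row):
    """Recompute the ratio columns from raw master-table fields (assumed formulas)."""
    # Assumption: copies per cell = mobile contig depth in this MAG / MAG depth.
    copies = row["mobile_contig_read_depth_in_this_cluster"] / row["cluster_read_depth"]
    # "Actual" linkage: inter-MAG reads normalised by both lengths (assumed formula).
    raw_density = linkage_density(
        row["inter_read_count"], row["mobile_contig_length"], row["cluster_length"]
    )
    expected = row["intra_linkage_density"]  # "expected" linkage, taken as given
    # Assumption: "adjusted for the copy count" means dividing by copies per cell.
    adjusted_density = raw_density / copies
    return {
        "mobile_element_copies_per_cell": copies,
        "raw_inter_linkage_density": raw_density,
        "raw_inter_vs_intra_ratio": raw_density / expected,
        "adjusted_inter_connective_linkage_density": adjusted_density,
        "adjusted_inter_vs_intra_ratio": adjusted_density / expected,
    }


# Illustrative values: a 20 kbp mobile contig linked to a 2 Mbp MAG.
example_row = {
    "mobile_contig_length": 20_000,
    "cluster_length": 2_000_000,
    "inter_read_count": 400,
    "intra_linkage_density": 0.005,
    "mobile_contig_read_depth_in_this_cluster": 30.0,
    "cluster_read_depth": 10.0,
}
for name, value in derived_columns(example_row).items():
    print(f"{name}\t{value:.4g}")
```

With these inputs the raw ratio is about 2, but the element is estimated at about 3 copies per cell. The copy-adjusted ratio is therefore about 0.67. Under these assumptions, the adjustment keeps multi-copy elements from looking more tightly linked than a single-copy element in the same host.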
  1. Please note that samples with many mobile element-host pairs will not have the heatmap figures due to visualization limitations.

  2. Uritskiy, G. et al. Accurate viral genome reconstruction and host assignment with proximity-ligation sequencing. Preprint 2021.06.14.448389 (2021).