Overview

HiCtool is a bioinformatic tool that integrates several software packages to perform standardized yet flexible Hi-C data analysis, from downloading the raw data to normalizing and visualizing intrachromosomal heatmaps and identifying topological domains (Dixon et al., 2012). HiCtool aims to enable users without programming or bioinformatics expertise to work with Hi-C data and to compare different datasets in a consistent way.

HiCtool is a pipeline divided into three main sections:

    1. Data preprocessing
    2. Data analysis and visualization
    3. Topological domain analysis

HiCtool provides a complete pipeline that leads users, including beginners, quickly to the results. For each step of the analysis, the software used, the inputs and the outputs are specified. Each section also contains a brief explanation of what each step does, so that the whole process is clear. As a result, you do not need to read any other software documentation: you only follow the few steps listed in the related section and provide your own input data.

The data preprocessing section provides a complete pipeline from downloading the raw data to producing the final BAM files used in the subsequent analysis steps. It also includes instructions on how to generate a fragment end BED file, which is used to correct biases.

The data analysis and visualization section provides the pipeline to normalize the data and plot the heatmaps. Normalization is performed with the Python package HiFive (Sauria et al., 2015), and plotting is done with Matplotlib, with the option of adding a histogram of the data distribution. Observed, expected and normalized contact counts can be plotted. In addition, “observed over expected” contact heatmaps can be plotted, where the expected counts are calculated from both the learned correction parameters and the distance between read pairs. This relies on the property that the average intrachromosomal contact probability for pairs of loci decreases monotonically as their linear genomic distance increases (Lieberman-Aiden et al., 2009).
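The distance-dependent part of the “observed over expected” idea can be illustrated with a minimal sketch. This is not HiCtool's or HiFive's implementation: it ignores the learned correction parameters and simply takes, as the expected count at a given bin distance, the mean of the observed counts on the corresponding diagonal. The function name `observed_over_expected` and the toy matrix are hypothetical, and the contact matrix is assumed to be square and symmetric.

```python
def observed_over_expected(observed):
    """Divide each contact count by the mean count at the same bin distance.

    `observed` is a square, symmetric list of lists (hypothetical input format).
    The expected value here depends on distance only (simplifying assumption).
    """
    n = len(observed)

    # Mean observed count for each bin distance d, i.e. along the d-th diagonal.
    expected_by_distance = []
    for d in range(n):
        diagonal = [observed[i][i + d] for i in range(n - d)]
        expected_by_distance.append(sum(diagonal) / len(diagonal))

    # Ratio of observed to expected; left at 0.0 where the expected count is 0.
    ratio = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            expected = expected_by_distance[abs(i - j)]
            if expected > 0:
                ratio[i][j] = observed[i][j] / expected
    return ratio


# Toy 3-bin matrix whose counts depend only on distance.
toy = [[4, 2, 1],
       [2, 4, 2],
       [1, 2, 4]]
print(observed_over_expected(toy))
```

In this toy case every contact equals the average for its distance, so all ratios are 1. Values above 1 would mark pairs of bins that interact more often than their genomic distance alone would suggest.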

Finally, the topological domain analysis section provides the code to calculate both the observed directionality index (DI) and the “true DI” using a Hidden Markov Model. Code to calculate topological domain coordinates is also provided, so the user can systematically infer the location of topological domains and boundaries across the genome.
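As a rough illustration of the observed DI, the sketch below follows the definition in Dixon et al. (2012): for each bin, A is the number of contacts with an upstream window, B the number with a downstream window of the same size, E = (A + B) / 2, and DI = sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E). It is a sketch under stated assumptions, not HiCtool's code: the function name `directionality_index`, the `window` parameter (expressed in bins) and the toy matrix are hypothetical, windows are simply truncated at the chromosome ends, and the DI is set to 0 when A equals B.

```python
def directionality_index(contacts, window):
    """Observed DI per bin from a square intrachromosomal contact matrix.

    `contacts` is a list of lists; `window` is the number of bins summed
    upstream and downstream of each bin (both names are illustrative).
    """
    n = len(contacts)
    di = []
    for i in range(n):
        # A: contacts between bin i and the upstream window.
        a = sum(contacts[i][j] for j in range(max(0, i - window), i))
        # B: contacts between bin i and the downstream window.
        b = sum(contacts[i][j] for j in range(i + 1, min(n, i + window + 1)))
        if a == b:
            # No directional bias (also covers bins with no contacts at all).
            di.append(0.0)
            continue
        e = (a + b) / 2  # expected count under no bias
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


# Toy matrix with two blocks of strongly interacting bins: (0, 1) and (2, 3).
toy = [[0, 5, 1, 0],
       [5, 0, 1, 0],
       [1, 1, 0, 6],
       [0, 0, 6, 0]]
print(directionality_index(toy, window=2))
```

A positive DI indicates a downstream bias and a negative DI an upstream bias. In the toy matrix the sign changes from negative to positive between bins 1 and 2, which is the kind of transition used to place a domain boundary.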

Below is an illustration of Hi-C, a method that probes the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing (Lieberman-Aiden et al., 2009).

_images/hic_data_process.png

1. DNA is cross-linked with formaldehyde, resulting in covalent links between spatially adjacent chromatin segments (DNA fragments shown in dark blue and red; proteins which can mediate such interactions are shown in light blue and cyan).

2. DNA is digested with a restriction enzyme (HindIII) that leaves a 5’ overhang (restriction site marked by dashed lines in the first picture).

3. The 5’ overhangs are filled with nucleotides, one of which is biotinylated (purple dot).

4. The resulting fragments are ligated under dilute conditions that favor ligation between cross-linked DNA fragments. The DNA sample now contains ligation products made of fragments that were originally in close spatial proximity in the nucleus, marked with biotin at the junction.

5. A Hi-C library is created by shearing the DNA and selecting the biotin-containing fragments with streptavidin beads (gray dots).

6. The library is analyzed by massively parallel DNA sequencing, producing a catalog of interacting fragments.

Installation

HiCtool is organized as a pipeline to keep it flexible and easy to use. You do not need to install anything besides the following Python libraries, packages and software, all of which are open source.

1. Python libraries [for Python >2.7]:

2. Python packages:

3. Other software needed:

Reference

If you use HiCtool in your research, please cite the manuscript on bioRxiv.

Support

For issues related to the use of HiCtool or if you want to report a bug, please contact Riccardo Calandrelli <rcalandrelli@eng.ucsd.edu>.