Overview

HiCtool is a bioinformatics tool that integrates several software packages to perform a standardized Hi-C data analysis, from downloading the raw data to visualizing intrachromosomal heatmaps (Lieberman-Aiden et al., 2009) and identifying topological domains (Dixon et al., 2012). The aim of HiCtool is to let users analyze and compare different datasets in a consistent way.

HiCtool implements three analysis steps: data preprocessing, data analysis and visualization, and topological domain analysis based on the Directionality Index (DI) (Dixon et al., 2012).

HiCtool provides a complete pipeline that guides users, including beginners, quickly and easily to the results. For each step of the analysis, the software used, the inputs, and the outputs are specified. In addition, each section contains a brief explanation of what the step does, to make the whole process clear and user-friendly. As a result, you do not need to read any other software documentation: simply follow the steps listed in the related section, providing your specific input data.

For data preprocessing, HiCtool provides a complete pipeline from downloading the raw data to producing the final BAM files used in the subsequent analysis steps.
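Once preprocessing is complete, each valid read pair is essentially reduced to two mapped positions, and downstream analysis works on counts of such pairs per pair of genomic bins. The sketch below illustrates that binning idea only; it is not HiCtool's code (HiCtool delegates this to HiFive). The input format, a list of `(pos1, pos2)` tuples of 0-based positions on a single chromosome, and the function name `bin_contacts` are hypothetical.

```python
from typing import Iterable, List, Tuple


def bin_contacts(
    pairs: Iterable[Tuple[int, int]], chrom_length: int, bin_size: int
) -> List[List[int]]:
    """Count read pairs into a symmetric intrachromosomal contact matrix.

    Assumption (illustrative only): each pair holds the 0-based positions of
    the two mapped reads on the same chromosome.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        if not (0 <= pos1 < chrom_length and 0 <= pos2 < chrom_length):
            continue  # skip pairs with a read outside the chromosome
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts to keep symmetry
    return matrix


if __name__ == "__main__":
    pairs = [(100, 2500), (1200, 1300), (2900, 50)]
    print(bin_contacts(pairs, chrom_length=3000, bin_size=1000))
```

With 1 kb bins on a 3 kb chromosome, the two long-range pairs both land in bins 0 and 2, and the short-range pair lands on the diagonal in bin 1.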

The data analysis and visualization section provides the pipeline to normalize the data and plot the heatmaps. Normalization is performed with the HiFive Python package (Sauria et al., 2015), and plotting uses the Python Imaging Library (PIL). This makes the heatmaps easier to read and interpret, with the option to add a colorbar and a histogram of the output data.
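To give a feel for what "normalize, then plot" involves, the sketch below uses a deliberately simple swapped-in technique, observed/expected normalization by genomic distance, followed by a log-scaled mapping of values to 8-bit grayscale intensities. This is not HiFive's normalization model and not HiCtool's plotting code; the function names `observed_over_expected` and `to_grayscale` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def observed_over_expected(matrix: Matrix) -> Matrix:
    """Divide each entry by the mean count at its diagonal distance |i - j|.

    A simple distance-based normalization used here for illustration only.
    """
    n = len(matrix)
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [
        [
            matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def to_grayscale(matrix: Matrix) -> List[int]:
    """Map non-negative values to 0-255 on a log scale, in row-major order.

    The highest value maps to 255; an all-zero matrix maps to all zeros.
    """
    logged = [[math.log1p(v) for v in row] for row in matrix]
    top = max(max(row) for row in logged) or 1.0  # avoid dividing by zero
    return [round(255 * v / top) for row in logged for v in row]


if __name__ == "__main__":
    oe = observed_over_expected([[4, 2], [2, 4]])
    print(oe, to_grayscale(oe))
```

The flat list returned by `to_grayscale` could then be written to an image with Pillow, for example `img = Image.new("L", (n, n)); img.putdata(pixels); img.save("heatmap.png")`. The log scale is a common choice because Hi-C counts near the diagonal are orders of magnitude larger than long-range counts.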

Finally, the Topological Domains analysis section provides the code to calculate and visualize the Directionality Index (DI). It allows the user to calculate both the observed DI and the “true DI”, estimated with a Hidden Markov Model (HMM). Code to identify the coordinates of topological domains is also provided, so the user can systematically infer the location of topological domains and boundaries across the genome.
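The observed DI of Dixon et al. (2012) compares, for each bin, the contacts A with the upstream window and B with the downstream window: DI = ((B - A) / |B - A|) * ((A - E)^2 / E + (B - E)^2 / E), where E = (A + B) / 2. The sketch below computes it from a binned contact matrix, setting DI to 0 when A = B. The boundary function is a naive sign-change heuristic (a flip from upstream bias to downstream bias), swapped in for illustration; it is not the HMM-based calling described above. Both function names and the `window` parameter (window size in bins) are hypothetical.

```python
from typing import List


def directionality_index(matrix: List[List[float]], window: int) -> List[float]:
    """Observed DI per bin from a symmetric contact matrix.

    window: number of bins summed on each side of the bin, excluding the bin
    itself (e.g. 2 Mb / 40 kb = 50 bins).
    """
    n = len(matrix)
    di = []
    for i in range(n):
        a = sum(matrix[i][max(0, i - window):i])          # upstream contacts (A)
        b = sum(matrix[i][i + 1:min(n, i + window + 1)])  # downstream contacts (B)
        if a == b:
            di.append(0.0)  # no directional bias (also covers A = B = 0)
            continue
        e = (a + b) / 2  # expected count under no bias (E)
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


def naive_boundaries(di: List[float]) -> List[int]:
    """Bins where DI flips from negative (upstream bias) to positive (downstream bias).

    A crude stand-in for HMM-based domain calling; zero-valued bins never trigger it.
    """
    return [i for i in range(1, len(di)) if di[i - 1] < 0 < di[i]]


if __name__ == "__main__":
    # Two toy "domains": bins 0-1 and bins 2-3 interact mostly within themselves.
    m = [[0, 5, 1, 0], [5, 0, 1, 1], [1, 1, 0, 5], [0, 1, 5, 0]]
    di = directionality_index(m, window=2)
    print(di, naive_boundaries(di))
```

In the toy matrix, DI is positive at the start of each domain and negative at its end, so the heuristic places a boundary at bin 2, where the second domain begins.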

Below is an illustration of Hi-C, a method that probes the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing (Lieberman-Aiden et al., 2009).

[Figure: the Hi-C protocol (hic_data_process.png)]

1. DNA is cross-linked with formaldehyde, resulting in covalent links between spatially adjacent chromatin segments (DNA fragments shown in dark blue and red; proteins which can mediate such interactions are shown in light blue and cyan).

2. DNA is digested with a restriction enzyme (HindIII) that leaves a 5’ overhang (restriction site marked by dashed lines in the first picture).

3. The 5’ overhangs are filled in with nucleotides, one of which is biotinylated (purple dot).

4. The resulting blunt-end fragments are ligated under dilute conditions that favor ligation events between cross-linked DNA fragments. The DNA sample now contains ligation products consisting of fragments that were originally in close spatial proximity in the nucleus, marked with biotin at the junction.

5. A Hi-C library is created by shearing the DNA and selecting the biotin-containing fragments with streptavidin beads (gray dots).

6. The library is analyzed by massively parallel DNA sequencing, producing a catalog of interacting fragments.
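Steps 2 to 4 can be mimicked in silico. HindIII recognizes AAGCTT and cuts between the two As (A^AGCTT), leaving a 5’ AGCT overhang; filling in both ends and ligating them blunt duplicates the overhang at the junction. The sketch below, an illustration rather than part of HiCtool, digests a top-strand sequence and derives that junction; the function names are hypothetical and only the top strand is modeled.

```python
import re
from typing import List

HINDIII_SITE = "AAGCTT"
CUT_OFFSET = 1  # HindIII cuts A^AGCTT: after the first base of the site


def hindiii_fragments(sequence: str) -> List[str]:
    """Split a top-strand sequence at every HindIII cut position (upper-cased)."""
    seq = sequence.upper()
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(HINDIII_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


def filled_in_junction(site: str = HINDIII_SITE, cut: int = CUT_OFFSET) -> str:
    """Junction formed by blunt ligation of two filled-in ends of a palindromic site.

    The 5' overhang is site[cut:len(site) - cut] (AGCT for HindIII). Fill-in
    extends the left end up to len(site) - cut, and the right end keeps
    site[cut:], so the overhang appears twice in the joined sequence.
    """
    return site[: len(site) - cut] + site[cut:]


if __name__ == "__main__":
    print(hindiii_fragments("ccAAGCTTggAAGCTTa"))
    print(filled_in_junction())
```

For HindIII the junction is AAGCTAGCTT, which contains an NheI site (GCTAGC); searching sequenced reads for this motif is one way to spot reads that span a ligation junction.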

Installation

1. Python libraries [for python >2.7]:

2. Python packages:

3. Other software needed:

Reference

If you use HiCtool in your research, please cite the manuscript on bioRxiv.

Support

For issues related to the use of HiCtool, or to report a bug, please contact Riccardo Calandrelli <rcalandrelli@eng.ucsd.edu> and Qiuyang Wu <qiw034@eng.ucsd.edu>.