-
Notifications
You must be signed in to change notification settings - Fork 22
mult ex1
We'll use a ChIP-seq bigWig from ENCODE, in this case precipitated DNA from the chromatin IP using the antibody to H4K20me1, in human embryonic stem cells (h1-ESC) (link).
We want to adjust this signal using the mappability, a signal from 0 to 1, with 1 being perfectly-mappable regions of the genome. (link). Multiplying the ChIP signal by the mappability improves the signal by removing some artifacts introduced by badly-mappable regions. For this example, we'll use the 36 bp mappability track because it'
First we need to download the data. For this, paraFetch (linux x86_64 binary), is particularly useful:
$ paraFetch 30 10 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHistoneH1hescH4k20me1StdSig.bigWig esc-H4k20me1.bw
$ paraFetch 30 10 http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeMapability/wgEncodeCrgMapabilityAlign36mer.bw.gz 36mer.bw.gz
The mappability bigWig is zipped for some odd reason, so needs to be unzipped:
$ gunzip 36mer.bw.gz
It's good to get a feel for these bigWigs with some general information. This can be retrieved with the bigWigInfo tool:
$ bigWigInfo esc-H4k20me1.bw
version: 4
isCompressed: yes
isSwapped: 0
primaryDataSize: 413,125,101
primaryIndexSize: 2,852,828
zoomLevels: 10
chromCount: 24
basesCovered: 2,262,532,085
mean: 2.733687
min: 0.040000
max: 17036.900391
std: 8.028273
$ bigWigInfo 36mer.bw
version: 4
isCompressed: yes
isSwapped: 0
primaryDataSize: 1,173,927,845
primaryIndexSize: 9,280,784
zoomLevels: 10
chromCount: 25
basesCovered: 2,858,018,176
mean: 0.789624
min: 0.000002
max: 1.000000
std: 0.385430
To multiply these two bigWigs together, we will use the bwtool paste command, combined with a python script that performs the calculation and outputs a WIG file, which we will then convert to bigWig. The python program bwtool_paste_mult.py is very simple:
# multiplies two columns of a bwtool paste -wigtype=fix
import sys
import numpy
for line in sys.stdin:
line = line.rstrip()
if line.startswith("fixedStep"):
# no multiplying on WIG header lines
print line
else:
# multiply
words = line.split('\t')
first = float(words[0])
second = float(words[1])
product = first * second
# decimal precision to .01 (might want more)
print "%0.2f" % (product)To perform the multiplication:
$ time bwtool paste esc-H4k20me1.bw 36mer.bw -skip-NA -wigtype=fix | python bwtool_paste_mult.py > mappable-esc-H4k20me1.wig
then a new bigWig must be constructed:
$ time wigToBigWig mappable-esc-H4k20me1.wig