Completed on 10 Oct 2016 by Anshul Kundaje. Sourced from http://biorxiv.org/content/early/2016/08/15/069682.
Very nice paper. A few questions and clarifications.
1. What's the negative set you used in the TFBS prediction evaluation (Supp. Fig. 2)? It's not clear from reading the methods.
2. Also, was evaluation of each method done on chromosomes held out for that specific method, i.e. chromosomes not used in its training? E.g. DeepSEA holds out chr8 and chr9 and trains on data from all other chromosomes for all data types across a range of cell types. So if you evaluate DeepSEA's performance on sites in its training chromosomes, the numbers reflect training performance rather than test performance. The same goes for all the other methods, unless you retrained them all with a common train/test split.
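To make the leakage concern concrete, here is a minimal sketch of a chromosome hold-out split; the region tuples and the `TEST_CHROMS` set are illustrative assumptions, not taken from the paper, but the held-out chromosomes match the DeepSEA example above:

```python
# Hypothetical sketch (illustrative data): regions from held-out test
# chromosomes must never appear in training, otherwise reported "test"
# metrics partly measure memorization of training chromosomes.
regions = [
    ("chr1", 100, 200, 1),  # (chrom, start, end, label)
    ("chr8",  50, 150, 0),
    ("chr9", 300, 400, 1),
    ("chr2", 700, 800, 0),
]

TEST_CHROMS = {"chr8", "chr9"}  # e.g. DeepSEA's held-out chromosomes

train = [r for r in regions if r[0] not in TEST_CHROMS]
test = [r for r in regions if r[0] in TEST_CHROMS]

# Sanity check: no held-out chromosome leaks into the training set.
assert not ({r[0] for r in train} & TEST_CHROMS)
print(len(train), len(test))
```

Splitting by chromosome rather than by random region avoids leakage from overlapping or homologous sequences, which is why comparing methods fairly requires evaluating each only on chromosomes it never saw in training.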
3. Also, please avoid reporting auROCs for TFBS prediction evaluation, or for that matter any unbalanced prediction problem on the genome; they can be very misleading. auROCs of >0.9 can translate to terrible auPRCs (<0.2) and very poor recall at reasonable FDRs (e.g. <1% recall at 50% FDR). Could you please report auPRCs and recall at reasonable FDR thresholds?
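The auROC/auPRC gap is easy to reproduce on a synthetic imbalanced task. Below is a self-contained sketch (the class ratio and score distributions are my assumptions, chosen only to mimic genome-scale imbalance, e.g. ~1% of candidate sites bound); both metrics are computed from first principles so no external ML library is needed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced classifier output: 100 positives vs 10,000
# negatives, with positive scores shifted by 2 standard deviations.
n_pos, n_neg = 100, 10_000
scores = np.concatenate([rng.normal(2.0, 1.0, n_pos),   # "bound" sites
                         rng.normal(0.0, 1.0, n_neg)])  # "unbound" sites
labels = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])

def auroc(labels, scores):
    # Mann-Whitney U formulation: probability that a random positive
    # outranks a random negative (no ties for continuous scores).
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_p, n_n = int(labels.sum()), int(len(labels) - labels.sum())
    return (ranks[labels == 1].sum() - n_p * (n_p + 1) / 2) / (n_p * n_n)

def auprc(labels, scores):
    # Average precision: mean precision at the rank of each positive.
    hits = labels[np.argsort(-scores)]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits == 1].mean()

print(f"auROC = {auroc(labels, scores):.3f}")  # looks excellent
print(f"auPRC = {auprc(labels, scores):.3f}")  # far less flattering
```

With 1% positives, the abundance of true negatives inflates auROC (which rewards ranking negatives below positives in bulk), while auPRC directly penalizes the false positives that dominate the top of the ranked list; this is why auPRC and recall at fixed FDR are the more honest summaries here.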