Predicting the DNA binding specificity of transcription factor variants using family-level biophysically interpretable machine learning
Authors: Shaoxun Liu, Pilar Gomez-Alcala, Christ Leemans, William J. Glassford, Lucas A.N. Melo, Richard S. Mann, Harmen J. Bussemaker
📄 Publication: Nucleic Acids Research (2025)
🧬 Lab Website: The Bussemaker Lab
To reproduce the analysis, you must install the FamilyCode package from GitHub.
```r
# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")

# Install FamilyCode
devtools::install_github('BussemakerLab/FCpackage')

# Load the library
library(FamilyCode)
```

Before running the processing scripts, you must acquire the raw data.
- Download: Visit CisBP (v3.00).
- Select Family: Choose "bHLH" or "Homeodomain" (download one family at a time).
- Select Options: Check the boxes for "Z-score" and "TF info".
- Download: Click the "Download Family Archive" button.
- Setup: Uncompress the ZIP files.
- Place `Zscores.txt` into the corresponding `rawData/HD/` or `rawData/bHLH/` directory.
- Place
Note: In case CisBP updates cause parsing issues, we have provided working versions of `Zscores.txt` and `TF_Information.txt` (bHLH: Jan 2019, HD: Jun 2024).
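
Before running the processing scripts, it can help to confirm that the downloaded files sit where they are expected. The following is a minimal sketch, not part of FamilyCode: the helper name `check_raw_data` is hypothetical, and it assumes the `rawData/HD/` and `rawData/bHLH/` directories are relative to the current working directory. By default it checks only `Zscores.txt`; pass additional file names through `files` if you store them in the same directories.

```r
# Hypothetical helper (not part of FamilyCode): report which expected raw
# data files are present under a given root directory.
check_raw_data <- function(root = "rawData",
                           families = c("HD", "bHLH"),
                           files = "Zscores.txt") {
  # Build every family/file combination
  grid <- expand.grid(family = families, file = files,
                      stringsAsFactors = FALSE)
  grid$path <- file.path(root, grid$family, grid$file)
  grid$exists <- file.exists(grid$path)
  grid
}

status <- check_raw_data()
print(status)

# List anything that still needs to be downloaded or moved
if (!all(status$exists)) {
  message("Missing: ", paste(status$path[!status$exists], collapse = ", "))
}
```
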
To reproduce the figures from the paper using pre-computed data:
- Script: `FamilyCodeFigures.Rmd`
- Description: This script uses pre-computed files located in the `intermediateData` folder.
If you wish to regenerate the intermediate data from scratch, please run the processing scripts below.
Run the following R Markdown files to process raw input data (SELEX-seq or PBM) and perform FamilyCode predictions.
| Family | Input Data | Script Name | Output / Notes |
|---|---|---|---|
| bHLH | SELEX-seq | `FamilyCodeOnSELEX_bHLH.Rmd` | Generates Supplemental Data S2-S5 |
| bHLH | PBM | `FamilyCodeOnPBM_bHLH.Rmd` | |
| bHLH | SELEX + PBM | `FamilyCodeOnSELEX+PBM_bHLH.Rmd` | |
| HD | SELEX-seq | `FamilyCodeOnSELEX_HD.Rmd` | |
| HD | PBM | `FamilyCodeOnPBM_HD.Rmd` | |
| HD | SELEX + PBM | `FamilyCodeOnSELEX+PBM_HD.Rmd` | Generates Supplemental Data S6-S7 |

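
The scripts in the table above can be opened and knit in RStudio, or rendered from the R console. The sketch below is one possible way to do the latter. It assumes the `rmarkdown` package is installed and that the `.Rmd` files are in the current working directory; the helper names `processing_script` and `run_processing` are hypothetical and not part of FamilyCode.

```r
# Hypothetical helper: build a script file name from the table above.
processing_script <- function(family = c("bHLH", "HD"),
                              input = c("SELEX", "PBM", "SELEX+PBM")) {
  family <- match.arg(family)
  input <- match.arg(input)
  paste0("FamilyCodeOn", input, "_", family, ".Rmd")
}

# Hypothetical helper: render one processing script with rmarkdown.
run_processing <- function(family, input) {
  script <- processing_script(family, input)
  if (!file.exists(script)) {
    stop("Script not found in working directory: ", script)
  }
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("Please install the 'rmarkdown' package first.")
  }
  rmarkdown::render(script)
}

# Example (uncomment to run): process the bHLH SELEX-seq data
# run_processing("bHLH", "SELEX")
```

Rendering one script at a time keeps failures easy to trace, since each script reads its own raw input.
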
| File | Description | Related Figure |
|---|---|---|
| S1 | JSON configuration file used for all ProBound analyses | — |
| S2 | DNA recognition models for the 52 bHLH factors analyzed | — |
| S3 | Interactive 3D representations of tetrahedrons (Open HTML in browser) | Fig 1A-C |
| S4 | Empirical cumulative distribution of tetrahedral position along PC direction | Fig 3B |
| S5 | Statistical significance of ANOVA test of PCs at DNA position –2/+2 and –3/+3 | Fig 4C |
| S6 | DNA recognition models for the 414 HD factors analyzed | — |
| S7 | Interactive 3D representations of tetrahedrons for HD examples (Open HTML in browser) | Fig 5B |