
BussemakerLab/FamilyCode


Predicting the DNA binding specificity of transcription factor variants using family-level biophysically interpretable machine learning

Authors: Shaoxun Liu, Pilar Gomez-Alcala, Christ Leemans, William J. Glassford, Lucas A.N. Melo, Richard S. Mann, Harmen J. Bussemaker

📄 Publication: Nucleic Acids Research (2025)
🧬 Lab Website: The Bussemaker Lab


📦 Installation

To reproduce the analysis, install the FamilyCode R package from GitHub. The package is hosted in the BussemakerLab/FCpackage repository:

# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

# Install FamilyCode
devtools::install_github('BussemakerLab/FCpackage')

# Load the library
library(FamilyCode)

⬇️ Data Setup (CisBP)

Before running the processing scripts, you must acquire the raw data.

  1. Visit: Go to CisBP (v3.00).
  2. Select Family: Choose "bHLH" or "Homeodomain" (download one family at a time).
  3. Select Options: Check the boxes for "Z-score" and "TF info".
  4. Download: Click the "Download Family Archive" button.
  5. Setup:
    • Uncompress the ZIP file(s).
    • Place Zscores.txt into the corresponding directory: rawData/bHLH/ for bHLH or rawData/HD/ for Homeodomain.

Note: If CisBP updates cause parsing issues, use the working versions of Zscores.txt and TF_Information.txt that we provide (bHLH: Jan 2019, HD: Jun 2024).
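A minimal R sketch of the setup step, assuming the processing scripts are run from the repository root and read the CisBP files from rawData/bHLH/ and rawData/HD/ relative to it. It only creates the two directories and reports whether Zscores.txt is already in place; it does not download anything from CisBP.

```r
# Family subdirectories named in the setup steps; the paths are assumed to be
# relative to the repository root (the current working directory).
families <- c("bHLH", "HD")
raw_dirs <- file.path("rawData", families)

# Create rawData/bHLH/ and rawData/HD/ if they do not exist yet.
for (d in raw_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Report, per family, whether Zscores.txt has been placed in its directory.
zscores_found <- setNames(file.exists(file.path(raw_dirs, "Zscores.txt")), families)
print(zscores_found)
```

A FALSE entry means the uncompressed Zscores.txt for that family still needs to be copied in.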

📊 Figure Generation

To reproduce the figures from the paper using pre-computed data:

  • Script: FamilyCodeFigures.Rmd
  • Description: This script uses pre-computed files located in the intermediateData folder.

If you wish to regenerate the intermediate data from scratch, please run the processing scripts below.
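The figure notebook can be knitted from RStudio or from an R session. The helper below is a hypothetical sketch (render_figures is not part of the FamilyCode package). It assumes the working directory is the repository root, with FamilyCodeFigures.Rmd and the intermediateData folder located there, and it uses rmarkdown::render to knit the notebook.

```r
# Hypothetical helper (not part of FamilyCode): knit the figure notebook,
# assuming the working directory is the repository root.
render_figures <- function(rmd = "FamilyCodeFigures.Rmd",
                           data_dir = "intermediateData") {
  # The notebook relies on pre-computed files, so fail early if they are absent.
  if (!dir.exists(data_dir))
    stop("Pre-computed data folder not found: ", data_dir)
  if (!file.exists(rmd))
    stop("Notebook not found: ", rmd)
  if (!requireNamespace("rmarkdown", quietly = TRUE))
    stop("Please install the 'rmarkdown' package first")
  # Render in a fresh environment so objects from the current session
  # cannot leak into the notebook.
  rmarkdown::render(rmd, envir = new.env())
}

# Usage, from the repository root:
# render_figures()
```

Rendering in a new environment is a design choice that helps catch notebooks that accidentally depend on objects left over in the interactive session.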

⚙️ Data Processing Scripts

Run the following R Markdown files to process raw input data (SELEX-seq or PBM) and perform FamilyCode predictions.

| Family | Input Data | Script Name | Output / Notes |
|---|---|---|---|
| bHLH | SELEX-seq | FamilyCodeOnSELEX_bHLH.Rmd | Generates Supplemental Data S2-S5 |
| bHLH | PBM | FamilyCodeOnPBM_bHLH.Rmd | |
| bHLH | SELEX + PBM | FamilyCodeOnSELEX+PBM_bHLH.Rmd | |
| HD | SELEX-seq | FamilyCodeOnSELEX_HD.Rmd | |
| HD | PBM | FamilyCodeOnPBM_HD.Rmd | |
| HD | SELEX + PBM | FamilyCodeOnSELEX+PBM_HD.Rmd | Generates Supplemental Data S6-S7 |
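The table above maps each family to three notebooks. As a hedged sketch, assuming all notebooks sit in the working directory and that rendering them in the listed order (SELEX-seq, PBM, then SELEX + PBM) is acceptable (dependencies between the notebooks are not stated here), one family's notebooks could be rendered in one go. The names processing_scripts and run_family are hypothetical and not part of the FamilyCode package.

```r
# Processing notebooks from the table above, grouped by family.
processing_scripts <- list(
  bHLH = c("FamilyCodeOnSELEX_bHLH.Rmd",
           "FamilyCodeOnPBM_bHLH.Rmd",
           "FamilyCodeOnSELEX+PBM_bHLH.Rmd"),
  HD   = c("FamilyCodeOnSELEX_HD.Rmd",
           "FamilyCodeOnPBM_HD.Rmd",
           "FamilyCodeOnSELEX+PBM_HD.Rmd")
)

# Hypothetical helper: render every notebook for one family in the listed
# order, checking up front that all of them exist.
run_family <- function(family, scripts = processing_scripts) {
  rmds <- scripts[[family]]
  if (is.null(rmds))
    stop("Unknown family: ", family)
  missing <- rmds[!file.exists(rmds)]
  if (length(missing) > 0)
    stop("Missing notebook(s): ", paste(missing, collapse = ", "))
  for (rmd in rmds) {
    message("Rendering ", rmd)
    # A fresh environment per notebook keeps the runs independent.
    rmarkdown::render(rmd, envir = new.env())
  }
  invisible(rmds)
}

# Usage, from the repository root:
# run_family("bHLH")
```

Checking for all notebooks before rendering any of them avoids a half-finished run when one file is misplaced.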

📂 Supplemental Data Files

| File | Description | Related Figure |
|---|---|---|
| S1 | JSON configuration file used for all ProBound analyses | |
| S2 | DNA recognition models for the 52 bHLH factors analyzed | |
| S3 | Interactive 3D representations of tetrahedrons (open the HTML file in a browser) | Fig 1A-C |
| S4 | Empirical cumulative distribution of tetrahedral position along the PC direction | Fig 3B |
| S5 | Statistical significance of ANOVA tests of PCs at DNA positions –2/+2 and –3/+3 | Fig 4C |
| S6 | DNA recognition models for the 414 HD factors analyzed | |
| S7 | Interactive 3D representations of tetrahedrons for HD examples (open the HTML file in a browser) | Fig 5B |