This repository is the official implementation of "Self-Supervised Music-Motion Synchronization Learning for Music-Driven Conducting Motion Generation", by Fan Liu, Delong Chen, Ruizhi Zhou, Sai Yang, and Feng Xu. This repository also provides access to the ConductorMotion100 dataset, which consists of 100 hours of orchestral conductor motions and the aligned music Mel spectrograms.
The figure above gives a high-level illustration of the proposed two-stage approach. The contrastive learning stage and the generative learning stage are bridged by transferring the learned music and motion encoders, as indicated by the dotted lines. Our approach can generate plausible, diverse, and music-synchronized conducting motion.
Updates
- Mar 2021. Demo Video (preliminary version) released at bilibili.
- Apr 2021. ICME 2021 Demo Video released at bilibili.
- Apr 2021. Demo Video (with Dynamic Frequency Domain Decomposition) released.
- Jun 2021. The recording of the graduation thesis defense released. The graduation thesis was awarded Outstanding Graduation Thesis of Hohai University and First-class Outstanding Graduation Thesis of Jiangsu Province!
- Jul 2021. The VirtualConductor project was awarded Best Demo at the IEEE International Conference on Multimedia and Expo (ICME) 2021!
- Mar 2022. ConductorMotion100 is made publicly available, as a track in the "Prospective Cup" competition held by JSCS (Jiangsu Computer Society). Please see here for details.
- May 2022. Our paper is published in the Journal of Computer Science and Technology (JCST). Check our paper!
- Nov 2022. Code for the JCST paper is released.
- Clone this repo:

      git clone https://github.com/ChenDelong1999/VirtualConductor.git
      cd VirtualConductor

- Create a conda virtual environment and activate it:

      conda create -n VirtualConductor python=3.6 -y
      conda activate VirtualConductor

- Install CUDA Toolkit 11.3 and cudnn==8.2.1, then install PyTorch==1.10.1:

      conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y
      # if you prefer other cuda versions, please choose suitable pytorch versions
      # see: https://pytorch.org/get-started/locally/

- Install other requirements:

      conda install ffmpeg -c conda-forge -y
      pip install librosa matplotlib scipy tqdm moviepy opencv-python tensorboard
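After installation, it can be handy to confirm that the packages are importable before running anything heavier. The snippet below is a minimal sketch that is not part of this repository; the function name `find_missing_modules` is hypothetical, and the module list is simply the import names of the packages installed above (`opencv-python` is imported as `cv2`, PyTorch as `torch`).

```python
import importlib.util

# Import names of the packages installed in the steps above.
# "opencv-python" is imported as "cv2"; PyTorch is imported as "torch".
REQUIRED_MODULES = [
    "torch", "librosa", "matplotlib", "scipy",
    "tqdm", "moviepy", "cv2", "tensorboard",
]


def find_missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules in `modules` that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that each module can be located; it does not verify versions or CUDA support.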
- Copy your music files to the /test/test_samples/ folder. We have prepared some samples for you.
- You need the pretrained weights of an M2S-GAN to generate motions. We have prepared a pretrained checkpoint, which is placed at checkpoints/M2SGAN/M2SGAN_official_pretrained.pt.
- Now, run the following command. The test_unseen.py script will:
    - enumerate all samples in the /test/test_samples/ folder,
    - extract Mel spectrograms from the music,
    - generate conducting motions, and
    - save the result videos to /test/result/.

      python test_unseen.py --model 'checkpoints/M2SGAN/M2SGAN_official_pretrained.pt'
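Before launching the test script, you may want to check which music files it will find. The following is a small sketch under stated assumptions and is not code from this repository: the helper names `list_music_files` and `build_test_command` are hypothetical, and the set of audio extensions is only illustrative, since this README does not state which formats test_unseen.py accepts.

```python
from pathlib import Path

# Assumption: the accepted audio formats are not documented here;
# this set is illustrative only.
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def list_music_files(sample_dir):
    """Return audio files found directly inside `sample_dir`, sorted by path."""
    return sorted(
        p for p in Path(sample_dir).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def build_test_command(model_path):
    """Build the argument list for the test command shown in this README."""
    return ["python", "test_unseen.py", "--model", str(model_path)]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root, which is equivalent to typing the command by hand.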
The ConductorMotion100 dataset can be downloaded in the following ways:
- The training set: https://pan.baidu.com/s/1Pmtr7V7-9ChJqQp04NOyZg?pwd=3209
- The validation set: https://pan.baidu.com/s/1B5JrZnFCFvI9ABkuJeWoFQ?pwd=3209
- The test set: https://pan.baidu.com/s/18ecHYk9b4YM5YTcBNn37qQ?pwd=3209
You can also access the dataset via Google Drive.
There are 3 splits of ConductorMotion100: train, val, and test. Each split corresponds to one .rar file. After extracting them to the <Your Dataset Dir> folder, the file structure will be:
tree <Your Dataset Dir>
<Your Dataset Dir>
    ├───train
    │   ├───0
    │   │       mel.npy
    │   │       motion.npy
    │  ...
    │   └───5268
    │           mel.npy
    │           motion.npy
    ├───val
    │   ├───0
    │   │       mel.npy
    │   │       motion.npy
    │  ...
    │   └───290
    │           mel.npy
    │           motion.npy
    └───test
        ├───0
        │       mel.npy
        │       motion.npy
       ...
        └───293
                mel.npy
                motion.npy
Each pair of mel.npy and motion.npy files corresponds to 60 seconds of Mel spectrogram and motion data. Their sampling rates are 90 Hz and 30 Hz, respectively. The Mel spectrogram has 128 frequency bins, so mel.shape = (5400, 128). The motion data contains 13 2D keypoints, so motion.shape = (1800, 13, 2).
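As an illustration of these shapes, the sketch below loads one sample folder with NumPy, checks that the arrays match the stated dimensions, and picks out the Mel frames that overlap a given motion frame (90 Hz / 30 Hz = 3 Mel frames per motion frame). It is a minimal example, not the loader in utils/dataset.py; the function names `load_sample` and `mel_frames_for_motion_frame` are hypothetical.

```python
import os

import numpy as np

MEL_RATE_HZ = 90       # Mel spectrogram frames per second
MOTION_RATE_HZ = 30    # motion frames per second
CLIP_SECONDS = 60      # duration of each sample
N_MEL_BINS = 128       # Mel frequency bins
N_KEYPOINTS = 13       # 2D keypoints per motion frame

MEL_SHAPE = (MEL_RATE_HZ * CLIP_SECONDS, N_MEL_BINS)            # (5400, 128)
MOTION_SHAPE = (MOTION_RATE_HZ * CLIP_SECONDS, N_KEYPOINTS, 2)  # (1800, 13, 2)


def load_sample(sample_dir):
    """Load mel.npy and motion.npy from one sample folder and check shapes."""
    mel = np.load(os.path.join(sample_dir, "mel.npy"))
    motion = np.load(os.path.join(sample_dir, "motion.npy"))
    if mel.shape != MEL_SHAPE:
        raise ValueError(f"unexpected mel shape {mel.shape}")
    if motion.shape != MOTION_SHAPE:
        raise ValueError(f"unexpected motion shape {motion.shape}")
    return mel, motion


def mel_frames_for_motion_frame(mel, motion_index):
    """Return the Mel frames covering the same time span as one motion frame."""
    ratio = MEL_RATE_HZ // MOTION_RATE_HZ  # 3 Mel frames per motion frame
    start = motion_index * ratio
    return mel[start:start + ratio]
```

For example, `load_sample("<Your Dataset Dir>/train/0")` returns the two arrays for the first training sample.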
We provide code to load and visualize the dataset in utils/dataset.py. You can run this file with:

    python utils/dataset.py --dataset_dir <Your Dataset Dir>

The script will then enumerate all the samples in the dataset.
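If you want to walk the extracted folders yourself, the directory layout described earlier can be traversed with the standard library alone. The sketch below is an illustration based only on that layout; it is not the implementation in utils/dataset.py, and the function name `enumerate_samples` is hypothetical.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")


def enumerate_samples(dataset_dir):
    """Yield (split, sample_id, sample_dir) for every complete sample folder.

    A folder counts as a sample if its name is an integer and it contains
    both mel.npy and motion.npy. Samples are yielded in numeric order.
    """
    root = Path(dataset_dir)
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            continue  # tolerate a partially downloaded dataset
        sample_dirs = [
            d for d in split_dir.iterdir() if d.is_dir() and d.name.isdigit()
        ]
        for d in sorted(sample_dirs, key=lambda p: int(p.name)):
            if (d / "mel.npy").is_file() and (d / "motion.npy").is_file():
                yield split, int(d.name), d
```

Sorting by the integer value of the folder name avoids the lexicographic order in which "10" would come before "2".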
During training, run tensorboard --logdir runs to monitor the logs with TensorBoard. Model checkpoints will be saved to the /checkpoints/ folder.
- Step 1

  Start the contrastive learning stage and train the M2S-Net:

      python M2SNet_train.py --dataset_dir <Your Dataset Dir>

  It takes ~36 hours on a Titan Xp GPU. With TensorBoard (tensorboard --logdir runs), you can visualize the training procedure. We also provide a visualization of the features extracted by M2S-Net.

- Step 2 (optional)

- Step 3
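To give a feel for what "music-motion synchronization" can mean at the data level in the contrastive stage (Step 1), the sketch below cuts synchronized and deliberately misaligned clip pairs from one 60-second sample. This is a generic illustration of self-supervised pair construction, not the procedure in M2SNet_train.py: the function name `sample_pair`, the clip length, and the minimum shift are all assumptions made for this example. Only the 3:1 ratio of Mel frames to motion frames comes from the dataset description.

```python
import numpy as np

MEL_PER_MOTION = 3  # 90 Hz Mel frames / 30 Hz motion frames


def sample_pair(mel, motion, clip_len, rng, synchronized=True, min_shift=30):
    """Cut one (music, motion) clip pair from a single sample.

    `clip_len` and `min_shift` are measured in motion frames and are
    illustrative choices, not values taken from the paper or the code.
    Returns (mel_clip, motion_clip, offset), where offset is the music
    start minus the motion start, in motion frames.
    """
    n_starts = motion.shape[0] - clip_len + 1
    motion_start = int(rng.integers(0, n_starts))
    if synchronized:
        music_start = motion_start
    else:
        # Pick a music window that is at least `min_shift` frames away.
        candidates = [
            s for s in range(n_starts) if abs(s - motion_start) >= min_shift
        ]
        if not candidates:
            raise ValueError("clip_len/min_shift leave no misaligned window")
        music_start = int(rng.choice(candidates))
    mel_clip = mel[music_start * MEL_PER_MOTION:(music_start + clip_len) * MEL_PER_MOTION]
    motion_clip = motion[motion_start:motion_start + clip_len]
    return mel_clip, motion_clip, music_start - motion_start
```

With `rng = np.random.default_rng(0)`, calling `sample_pair(mel, motion, 300, rng)` gives an aligned pair, and passing `synchronized=False` gives a shifted one that a synchronization model could be trained to tell apart.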
For more details of the "Prospective Cup" competition, please see here.
Copyright (c) 2022 Delong Chen. Contact me for commercial use (or rather any use that is not academic research) (email: [email protected]). Free for research use, as long as proper attribution is given and this copyright notice is retained.
- Delong Chen, Fan Liu*, Zewen Li, Feng Xu. VirtualConductor: Music-driven Conducting Video Generation System. IEEE International Conference on Multimedia and Expo (ICME) 2021, Demo Track (Best Demo).

      @article{chen2021virtualconductor,
        author     = {Delong Chen and Fan Liu and Zewen Li and Feng Xu},
        title      = {VirtualConductor: Music-driven Conducting Video Generation System},
        journal    = {CoRR},
        volume     = {abs/2108.04350},
        year       = {2021},
        url        = {https://arxiv.org/abs/2108.04350},
        eprinttype = {arXiv},
        eprint     = {2108.04350}
      }

- Fan Liu, Delong Chen*, Ruizhi Zhou, Sai Yang, and Feng Xu. Self-Supervised Music-Motion Synchronization Learning for Music-Driven Conducting Motion Generation. Journal of Computer Science and Technology.

      @article{liu2022self,
        author  = {Fan Liu and Delong Chen and Ruizhi Zhou and Sai Yang and Feng Xu},
        title   = {Self-Supervised Music Motion Synchronization Learning for Music-Driven Conducting Motion Generation},
        journal = {Journal of Computer Science and Technology},
        volume  = {37},
        number  = {3},
        pages   = {539--558},
        year    = {2022},
        url     = {https://doi.org/10.1007/s11390-022-2030-z},
        doi     = {10.1007/s11390-022-2030-z}
      }





