Installation relies on conda environments. Run conda env create -f environment.yml from the project root directory, and conda will create an environment called lecture2notes containing all the required packages listed in environment.yml.

.. note:: Read the paper for more in-depth explanations regarding the background, methodology, and results of this project.

Info About Optional Components

Certain functions in the End-To-End file require additional downloads, which may not be necessary depending on your configuration. If you are not using the transcribe feature of the End-To-End approach, this notice can safely be ignored. To use the similarity function to compare two transcripts, a spacy model is needed; you can learn more in the spacy starter models and core models documentation.
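
Because spacy models downloaded with python -m spacy download are installed as regular Python packages, one way to check whether a model is already present is to look for an importable package with that name. The following is a minimal sketch and not part of lecture2notes; the helper name spacy_model_installed is an assumption made for illustration.

```python
import importlib.util


def spacy_model_installed(name: str = "en_core_web_sm") -> bool:
    """Return True if a top-level package called `name` can be imported.

    Hypothetical helper (not part of lecture2notes): it only checks that
    the package is importable, not that the model loads correctly.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if not spacy_model_installed():
        print("Model missing. Run: python -m spacy download en_core_web_sm")
```

This check avoids loading the model itself, so it is cheap to run before starting a long End-To-End job.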

The default transcription method in the End-To-End process is vosk. To use this method, download a vosk model from the models page (Google Drive Mirror). Alternatively, specify a different method with the --transcription_method flag, for example --transcription_method wav2vec.

The End-To-End process contains a function called detect_figures(). By default, this function requires the EAST (Efficient and Accurate Scene Text Detector) model because the do_text_check argument defaults to True. See the docstring for more information. You can download the model from Dropbox (this link was extracted from the official code) or Google Drive (my mirror). Then extract the file by running tar -xzvf frozen_east_text_detection.tar.gz.
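
On systems without the tar command (for example, some Windows setups), the same extraction can be done from Python. This is a minimal sketch using only the standard library; the function name extract_archive is hypothetical and not part of lecture2notes.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: str, dest: str = ".") -> list:
    """Extract a gzip-compressed tar archive into `dest`.

    Roughly equivalent to `tar -xzvf <archive> -C <dest>`.
    Returns the names of the archive members.
    """
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter when this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return names


if __name__ == "__main__":
    archive = "frozen_east_text_detection.tar.gz"
    if Path(archive).exists():
        print(extract_archive(archive))
```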

Quick-Install (Copy & Paste)

git clone
cd lecture2notes
conda env create
conda activate lecture2notes
python -m spacy download en_core_web_sm
gdown "" -O lecture2notes/end_to_end/model_best.ckpt

Extras (Linux Only):

Install extras only after the above commands have been run.

sudo apt install curl
sudo curl -L -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
sudo apt install ffmpeg sox wget poppler-utils

Commands to download a Vosk model (needed for speech-to-text) are available on the 4. Vosk transcription method page.
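
After installing the extras, a quick way to confirm that the command-line tools are reachable is to look them up on the PATH. The sketch below is an illustration only; the tool list mirrors the commands above, and the helper name missing_tools is an assumption. poppler-utils is left out because it is a package name rather than a single executable.

```python
import shutil

# Executables installed by the commands above.
REQUIRED_TOOLS = ("ffmpeg", "sox", "wget", "youtube-dl")


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All extra tools were found.")
```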

Step-by-Step Instructions

  1. Clone this repository: git clone

  2. Change to project directory: cd lecture2notes.

  3. Run installation command: conda env create.

  4. Activate newly created conda environment: conda activate lecture2notes.

  5. Run gdown "" -O lecture2notes/end_to_end/model_best.ckpt from the project root to download the slide classification model and put it in the default expected location.

  6. Other Binary Packages: On Linux, install ffmpeg, sox, wget, and poppler-utils with sudo apt install ffmpeg sox wget poppler-utils. On other platforms, download sox from the sox homepage and youtube-dl from the youtube-dl homepage (GitHub), and follow the directions in this StackOverflow answer (Windows) to install poppler-utils. ffmpeg is needed for frame extraction in Dataset and End-To-End. sox is needed for automatic audio conversion during the transcription phase of End-To-End. [1] wget is used to download videos that are not on YouTube as part of the video_downloader scraper script in Dataset.

  7. End-To-End Process Requirements (Optional)
    1. Spacy: Download the small spacy model by running python -m spacy download en_core_web_sm in the project root. This is required to use certain summarization and similarity features (as discussed above). A spacy model is also required when using spacy as a feature extractor in end_to_end/ [2]

    2. DeepSpeech/Vosk: Download the DeepSpeech model (the .pbmm acoustic model and the scorer) from the releases page. To reduce complexity, save them to deepspeech-models in the project root. [3] Alternatively, and as the recommended option, download the small vosk model using the commands on the 4. Vosk transcription method page.

    3. EAST: Download the EAST model from Dropbox or from the Google Drive mirror with gdown. Extract it to the End-To-End directory by running tar -xzvf frozen_east_text_detection.tar.gz -C end_to_end/

  8. Dataset Collection Requirements (Optional): YouTube API
    1. Run cp .env.example .env to create a copy of the example .env file.

    2. Add your YouTube API key to your .env file.

    3. You can now use the scraper scripts to scrape YouTube and create the dataset needed to train the slide classifier.

  9. Transcript Download w/YouTube API (Not Recommended): If you want to download video transcripts with the YouTube API [4], place your client_secret.json in the dataset/scraper-scripts folder (to download transcripts with the scraper-scripts) or in End-To-End (to download transcripts during the entire end-to-end process that converts a lecture video to notes).
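
Once the steps above are done, it can help to confirm that the downloaded files ended up where the instructions place them. The following is a minimal sketch, not a script shipped with lecture2notes: the two paths come from steps 5 and 7.2, the DeepSpeech folder is optional, and the function name check_setup is hypothetical.

```python
from pathlib import Path

# Locations named in the steps above, relative to the project root.
EXPECTED_PATHS = {
    "slide classification model": "lecture2notes/end_to_end/model_best.ckpt",
    "DeepSpeech models folder (optional)": "deepspeech-models",
}


def check_setup(root: str = ".") -> dict:
    """Map each expected item to True/False depending on whether it exists."""
    base = Path(root)
    return {label: (base / rel).exists() for label, rel in EXPECTED_PATHS.items()}


if __name__ == "__main__":
    for label, found in check_setup().items():
        print(f"{'ok' if found else 'missing':8} {label}")
```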



[1] If your audio is 16000 Hz, 1 channel, and in .wav format, then sox is not needed.
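
To see whether a given file already meets these requirements, the WAV header can be inspected with Python's standard wave module. This is an illustrative sketch only; the function name needs_sox_conversion is an assumption and is not how lecture2notes itself decides whether to convert.

```python
import wave


def needs_sox_conversion(path: str) -> bool:
    """Return True unless `path` is a 16000 Hz, single-channel WAV file."""
    try:
        with wave.open(path, "rb") as wav:
            is_16khz = wav.getframerate() == 16000
            is_mono = wav.getnchannels() == 1
            return not (is_16khz and is_mono)
    except (wave.Error, EOFError):
        # Not a readable PCM WAV file, so it would have to be converted.
        return True
```

For example, needs_sox_conversion("lecture.wav") returns False only when the file is a WAV that is already 16000 Hz and mono.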


[2] By default, spacy is not used for feature extraction. If spacy is chosen manually, the large model (which can be downloaded with python -m spacy download en_core_web_lg) is the default, so make sure to download the large model if you want to use spacy for feature extraction.


[3] Folder name and location do not matter. Just make sure the scorer and model are in the same directory. The scripts will automatically detect each when given the path to the folder containing them.
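
One plausible way such detection could work is to search the folder by file extension. The sketch below is a guess for illustration, not the project's actual implementation: the function name find_deepspeech_files and the .scorer extension are assumptions (the .pbmm extension is the one mentioned in the steps above).

```python
from pathlib import Path


def find_deepspeech_files(folder: str):
    """Return (model_path, scorer_path) found in `folder`; either may be None.

    Assumes the acoustic model ends in .pbmm and the scorer in .scorer.
    """
    base = Path(folder)
    models = sorted(base.glob("*.pbmm"))
    scorers = sorted(base.glob("*.scorer"))
    model = models[0] if models else None
    scorer = scorers[0] if scorers else None
    return model, scorer
```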


[4] The default is to use youtube-dl, which needs no API key.