Installation

Overview

Installation is straightforward with conda. Run conda env create -f environment.yml from the root project directory, and conda will create an environment called lecture2notes containing all the packages listed in environment.yml.

Note: Read the paper for more in-depth explanations regarding the background, methodology, and results of this project.

Info About Optional Components

Certain functions in the End-To-End transcribe.py file require additional downloads. If you are not using the transcribe feature of the End-To-End approach, you can safely ignore this notice. Even if you are, some of these files may be unnecessary depending on your configuration. To use the similarity function to compare two transcripts, a spacy model is needed; you can learn more in the spacy starter models and core models documentation.

The default transcription method in the End-To-End process is vosk. To use it, download a vosk model from the models page (Hugging Face Mirror). Alternatively, specify a different method with the --transcription_method flag, such as --transcription_method wav2vec.

The End-To-End figure_detection.py contains a function called detect_figures(). By default, this function requires the EAST (Efficient and Accurate Scene Text Detector) model because the do_text_check argument defaults to True. See the docstring for more information. You can download the model from Dropbox (this link was extracted from the official code) or Hugging Face. If you download the archive from Dropbox, extract it by running tar -xzvf frozen_east_text_detection.tar.gz.

Quick-Install (Copy & Paste)

git clone https://github.com/HHousen/lecture2notes.git
cd lecture2notes
conda env create
conda activate lecture2notes
python -m spacy download en_core_web_sm
wget "https://huggingface.co/HHousen/lecture2notes/resolve/main/Slide%20Classifier%20Median%20Weights/three-category/epoch%3D8.ckpt" -O lecture2notes/end_to_end/model_best.ckpt

Extras (Linux Only):

Install extras only after the above commands have been run.

sudo apt install curl
sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
sudo apt install ffmpeg sox wget poppler-utils

Commands to download a Vosk model (needed for speech-to-text) are available on the 4. Vosk transcription method page.
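As a rough sanity check after installing the extras, a short shell function can list any tools that are still missing from PATH. This is only a sketch and not part of the project: missing_tools is a hypothetical helper name, and pdftoppm is used as a stand-in for poppler-utils because it is one of the binaries that package installs.

```shell
# Print the name of every given tool that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not something the project ships.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# pdftoppm is one of the binaries provided by poppler-utils.
missing_tools youtube-dl ffmpeg sox wget pdftoppm
```

If the last command prints nothing, every listed tool was found. Any name it prints still needs to be installed.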

Step-by-Step Instructions

1. Clone this repository: git clone https://github.com/HHousen/lecture2notes.git
2. Change to the project directory: cd lecture2notes
3. Run the installation command: conda env create
4. Activate the newly created conda environment: conda activate lecture2notes
5. Download the slide classification model and put it in the default expected location by running the following from the project root: wget "https://huggingface.co/HHousen/lecture2notes/resolve/main/Slide%20Classifier%20Median%20Weights/three-category/epoch%3D8.ckpt" -O lecture2notes/end_to_end/model_best.ckpt
6. Other Binary Packages: On Linux, install ffmpeg, sox, wget, and poppler-utils with sudo apt install ffmpeg sox wget poppler-utils. On other platforms, navigate to the sox homepage to download sox, the youtube-dl homepage (GitHub) to download youtube-dl, and follow the directions in this StackOverflow answer (Windows) to install poppler-utils for your platform. ffmpeg is needed for frame extraction in Dataset and End-To-End. sox is needed for automatic audio conversion during the transcription phase of End-To-End. [1] wget is used to download videos that are not on YouTube as part of the video_downloader scraper script in Dataset.
7. End-To-End Process Requirements (Optional)
   1. Spacy: Download the small spacy model by running python -m spacy download en_core_web_sm in the project root. This is required to use certain summarization and similarity features (as discussed above). A spacy model is also required when using spacy as a feature extractor in end_to_end/summarization_approaches.py. [2]
   2. DeepSpeech/Vosk: Download the DeepSpeech model (the .pbmm acoustic model and the scorer) from the releases page. To reduce complexity, save them to deepspeech-models in the project root. [3] Alternatively, and as the recommended option, download the small vosk model using the commands on the 4. Vosk transcription method page.
   3. EAST: Download the EAST model from Dropbox or by running wget https://huggingface.co/HHousen/lecture2notes/resolve/main/frozen_east_text_detection.pb -O end_to_end/frozen_east_text_detection.pb. If downloading from Dropbox, extract it to the End-To-End directory by running tar -xzvf frozen_east_text_detection.tar.gz -C end_to_end/
8. Dataset Collection Requirements (Optional): YouTube API
   1. Run cp .env.example .env to create a copy of the example .env file.
   2. Add your YouTube API key to your .env file.
   3. You can now use the scraper scripts to scrape YouTube and create the dataset needed to train the slide classifier.
9. Transcript Download with the YouTube API (Not Recommended): If you want to download video transcripts with the YouTube API [4], place your client_secret.json in the dataset/scraper-scripts folder (if you want to download transcripts with the scraper-scripts) or in End-To-End (if you want to download transcripts in the entire end-to-end process that converts a lecture video to notes).
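For the YouTube API key step above, one way to confirm that the key actually made it into .env is a small grep-based check run from the project root. This is a minimal sketch under stated assumptions: env_has_value is a hypothetical helper, and YT_API_KEY is a placeholder variable name, so substitute whichever name appears in .env.example.

```shell
# Succeed only when variable NAME has a non-empty value in env file FILE.
# Usage: env_has_value FILE NAME   (hypothetical helper, not part of the project)
env_has_value() {
    grep -Eq "^$2=.+" "$1"
}

# YT_API_KEY is a placeholder; use the variable name listed in .env.example.
if [ -f .env ] && env_has_value .env YT_API_KEY; then
    echo "API key found in .env"
else
    echo "API key missing: edit .env and add your key"
fi
```

The check only looks for a non-empty value on a line of the form NAME=value; it does not verify that the key is valid.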

Footnotes