supp_01_main is the main video showing our results. supp_02_audio_segmentation_demo is a video recording of how we annotate the audio in our database. supp_03_audio_segmentation_results is a sample of the final audio annotations.