
Baran Babur

How to get YouTube video transcripts with Python

Why not just watch the video?

One night, while deciding what to build for my final year project, I went to YouTube and watched a video. It was a long one, and by the time I’d finished, I couldn’t even remember the beginning. Ever felt the same? I looked at the transcript menu YouTube provides, but it was clunky to navigate because every line comes with its own timestamp.

This is where my solution comes in. I wanted a way to get the transcript of a video without having to be on YouTube. That way I could take notes more easily, and I wouldn’t have to keep going back to the video and skipping around to find what I was looking for.

Creating the solution

I’d heard of Selenium and headless browsers in the past but didn’t really know what they were. I had a play around with Selenium using Python and was surprised at how easy it was to automate simple things in a browser, such as clicking buttons.

Within an hour, I could accept cookies and open the transcript menu on YouTube. This had me pondering whether I could solve CAPTCHAs too, maybe with some machine learning to work out what they’re asking you to do. Never mind. I guess that’s for another day.
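A minimal sketch of those two clicks might look like the following. It is not the script from the gist: the function name, the three CSS selectors, and the choice to run Firefox headless are all assumptions made for illustration, and YouTube's real markup will differ.

```python
# Placeholder selectors (assumptions): inspect the live page to find the real ones.
CONSENT_BUTTON = "button[aria-label*='Accept']"
TRANSCRIPT_BUTTON = "button[aria-label='Show transcript']"
TRANSCRIPT_PANEL = "#transcript-panel"


def fetch_transcript_html(url: str, timeout: float = 10.0) -> str:
    """Open a video page, open its transcript panel, and return the page HTML."""
    # Imported inside the function so this file can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")  # assumption: run without a visible window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)

        # The cookie dialog is not always shown, so a timeout here is not fatal.
        try:
            wait.until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, CONSENT_BUTTON))
            ).click()
        except TimeoutException:
            pass

        # Open the transcript menu, then wait until the panel is in the DOM.
        wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, TRANSCRIPT_BUTTON))
        ).click()
        wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, TRANSCRIPT_PANEL))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if a wait timed out.
        driver.quit()
```

Waiting for each element to become clickable, rather than sleeping for a fixed time, keeps the sketch tolerant of slow page loads. If the transcript button never appears, the second wait raises Selenium's TimeoutException.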

Once I had the transcript menu open, I extracted the information from the page. I’d used BeautifulSoup4 before, during my time at university. It’s a Python library for parsing HTML and pulling data out of it. I used it here to extract the transcript from the page’s HTML. After getting the transcript, I saved it to a text file.
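As a rough illustration of the parse-and-save step, here is a small sketch that runs on a made-up HTML snippet. The markup, class names, and function names are hypothetical stand-ins rather than YouTube's real structure or the code from the gist.

```python
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical markup: YouTube's real transcript panel uses different tags
# and class names, and they change over time.
SAMPLE_HTML = """
<div id="transcript">
  <div class="segment"><span class="timestamp">0:00</span><span class="text">Hello and welcome.</span></div>
  <div class="segment"><span class="timestamp">0:04</span><span class="text">Today we look at   Selenium.</span></div>
</div>
"""


def extract_transcript(html: str) -> list[str]:
    """Return the text of each transcript segment, leaving the timestamps out."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for span in soup.find_all("span", class_="text"):
        # Collapse any runs of whitespace inside a segment into single spaces.
        lines.append(" ".join(span.get_text(" ", strip=True).split()))
    return lines


def save_transcript(lines: list[str], path: str) -> None:
    """Write one transcript segment per line to a UTF-8 text file."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Selecting only the text spans drops the per-line timestamps, which were the clunky part of reading the transcript on YouTube itself. Calling save_transcript(extract_transcript(SAMPLE_HTML), "transcript.txt") would write the two sample lines to a file.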

As I’m writing this, I’ve just had the idea of using the Notion API to save it to Notion. I’ll probably do that after finishing this write-up; it seems like a good idea.

Using the script

I put the script up on GitHub as a gist. It’s available here: https://gist.github.com/baranbbr/7bbefe30cf7783fd392043d3e7e98fc5.

To get it running yourself, you’ll need to install the following packages:

  • Selenium
  • A Selenium browser driver (I used Firefox, which needs geckodriver; if you use a different browser, you’ll need to change the driver in the code)
  • BeautifulSoup4

Then, of course, you’ll need to edit the script to point to the YouTube video of your choice.

Important side note: this only works with YouTube videos that have a transcript.

Conclusions

This is a very crude way of doing things, and I realise that. There’s a lot of scope for improvement. Considering it took me only a couple of hours to get working, I think it’s a decent starting point and, ultimately, it works for now. I have a feeling it’ll stop working as soon as YouTube changes the UI even a little, since the script depends on the page’s current structure, but I guess we’ll see with time.


Thank you for reading. If you liked this, we should connect on Twitter.

Baran Babur © 2022, Built with GatsbyJS