In my last post, I talked about preferring text content to videos. Coincidentally, my Hacker News feed surfaced a tool that takes a video link and creates a webpage containing a transcript generated from the video’s closed captions, paired with screenshots of the video. The program is called Youtube-to-Webpage. It is a Perl script that uses yt-dlp and ffmpeg to do the processing.
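To give a rough idea of what those two tools contribute, here is a minimal Python sketch that only builds the kind of command lines involved: one yt-dlp call to fetch captions without downloading the video, and one ffmpeg call to grab a single frame at a given timestamp. This is an illustration, not the Perl script's actual invocation. The function names, the `en` language choice, and the output file names are all assumptions.

```python
# Sketch only: builds command lines, does not run them.
# The real Youtube-to-Webpage script may call these tools differently.

def caption_command(url, lang="en"):
    """Build a yt-dlp command that fetches captions (uploaded or
    auto-generated) as WebVTT without downloading the video itself."""
    return [
        "yt-dlp",
        "--skip-download",      # captions only, no video file
        "--write-subs",         # uploader-provided captions, if any
        "--write-auto-subs",    # fall back to auto-generated captions
        "--sub-langs", lang,    # assumed language code
        "--sub-format", "vtt",
        "-o", "video",          # hypothetical output name template
        url,
    ]


def screenshot_command(video_path, seconds, out_path):
    """Build an ffmpeg command that saves one frame at `seconds`."""
    return [
        "ffmpeg",
        "-ss", f"{seconds:.3f}",  # seek to the timestamp before decoding
        "-i", video_path,
        "-frames:v", "1",         # write exactly one video frame
        out_path,
    ]


if __name__ == "__main__":
    print(" ".join(caption_command("https://example.com/watch?v=ID")))
    print(" ".join(screenshot_command("video.mp4", 42.5, "shot_0042.jpg")))
```

Passing the resulting lists to `subprocess.run` would execute them, assuming both tools are installed and on the PATH.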
I tried it out using the curl video I talked about in the previous post as the input, and the software did a decent job of capturing the details. The output is very plain and looks like the following:

Transcription of Curl Training video
Since the program uses the built-in YouTube captions to get the text from the video, the transcription is only as good as the captions. One enhancement that could improve it is to run a Speech-to-Text engine on the audio and use that text in the output. The slightly tricky part would be matching the screenshots with the audio/transcription timestamps.
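The matching step could be sketched roughly as follows. This is an illustration and not part of Youtube-to-Webpage. It assumes the Speech-to-Text engine returns segments as `(start_seconds, text)` pairs and that the time at which each screenshot was taken is known. Each segment is attached to the latest screenshot taken at or before the segment starts. The function name and data shapes are hypothetical.

```python
from bisect import bisect_right


def match_segments_to_screenshots(segments, screenshot_times):
    """Group transcript segments under the screenshot they belong to.

    segments: iterable of (start_seconds, text) pairs (assumed format).
    screenshot_times: sorted, non-empty list of screenshot timestamps
    in seconds.
    Returns a dict mapping each screenshot time to a list of texts.
    """
    if not screenshot_times:
        raise ValueError("need at least one screenshot timestamp")

    grouped = {t: [] for t in screenshot_times}
    for start, text in segments:
        # Index of the last screenshot taken at or before `start`.
        idx = bisect_right(screenshot_times, start) - 1
        # Speech before the first screenshot goes with the first one.
        idx = max(idx, 0)
        grouped[screenshot_times[idx]].append(text)
    return grouped


if __name__ == "__main__":
    shots = [0.0, 30.0, 60.0]
    speech = [
        (2.5, "Welcome to the training."),
        (31.0, "Let's start with the basics."),
        (59.9, "A few more options."),
        (60.0, "Wrapping up."),
    ]
    for shot_time, texts in match_segments_to_screenshots(speech, shots).items():
        print(shot_time, texts)
```

A binary search keeps the lookup cheap even for long videos. A fancier version might split a segment that straddles two screenshots, which this sketch does not attempt.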
Check it out if you prefer to read text instead of watching videos. I wonder how the output would look if we fed it to an LLM and asked it to rewrite it as an article. That is something we can explore in the next post 🙂
– Suramya