Two lessons learned to end the day

# Lesson 1: Data quality is most important.

devops_toolkit_one.txt

It's **VERY** important to make sure that the data you're using is 'good' data. See the attached example, 'devops_toolkit_one.txt': the entire transcript is littered with *Ian*. The speaker was trying to say 'Aiven'. There is a lot of content on YouTube that people could benefit from just by reading transcripts instead of listening or watching, which highlights the need for data quality. Even a state-of-the-art model won't be able to do much with this kind of data.
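A quick sanity check along these lines can catch this kind of problem before the transcript goes anywhere near a model. The sketch below is only an illustration: the function names, the list of expected terms, and the sample sentence are all made up for the example. The idea is that if you know a term should appear in the video and it shows up zero times in the transcript, that's a red flag worth a manual look.

```python
import re
from collections import Counter


def count_expected_terms(transcript: str, expected_terms: list[str]) -> dict[str, int]:
    """Count case-insensitive whole-word occurrences of each expected term."""
    words = Counter(re.findall(r"[A-Za-z0-9']+", transcript.lower()))
    return {term: words[term.lower()] for term in expected_terms}


def missing_terms(transcript: str, expected_terms: list[str]) -> list[str]:
    """Return the expected terms that never appear in the transcript."""
    counts = count_expected_terms(transcript, expected_terms)
    return [term for term, n in counts.items() if n == 0]


# Made-up sample text, standing in for a real transcript file.
sample = "Today we look at Ivan. Ivan lets you manage databases across clouds."

# Hypothetical list of terms you expect the video to mention.
print(missing_terms(sample, ["Aiven", "databases"]))  # ['Aiven']
```

This only handles single-word terms, and it can't tell you *what* the term was replaced with, only that something is off.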

*Update*: Upon further examination, the transcript was actually substituting *Ivan* for 'Aiven'. Which leads to Lesson 2.

# Lesson 2: You probably want that Metadata and... Watch the damn video.

devops_toolkit 2.json

I watched part of the video and assumed that *Ian* stood for 'Aiven' when actually *Ivan* stood for it. This leads me to a critical point: especially for speakers who aren't speaking clearly, the title, subtitles, and description matter as metadata. I mention this because YouTube video transcription seems to be getting more and more popular now that the AI craze has been refueled. Also, for the scrapers... collect and store that metadata with your scrapes, and also... be mindful of others' servers; we're all experimenting at some point. Don't overload someone else's servers, and be mindful of how you're using content. Anyway, just a thought after working with...

https://aws.amazon.com/transcribe/

https://ffmpeg.org/

https://github.com/yt-dlp/yt-dlp

https://aws.amazon.com/bedrock/

https://python.langchain.com/
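To make the metadata point a bit more concrete, here is a minimal sketch of keeping the transcript and its metadata together in one JSON record, plus a reviewed corrections map applied afterwards. Everything in it is an assumption for illustration: the record layout, the function names, the placeholder title, and the sample sentence are mine, not the format of the attached files. If you use yt-dlp, its `--write-info-json` option is one way to get the metadata in the first place.

```python
import json
import re


def build_record(transcript: str, metadata: dict) -> dict:
    """Bundle a transcript with the metadata scraped alongside it."""
    return {"metadata": metadata, "transcript": transcript}


def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace whole-word, case-sensitive matches of each misheard term."""
    for wrong, right in corrections.items():
        pattern = re.compile(rf"\b{re.escape(wrong)}\b")
        # A function replacement avoids any backslash handling in `right`.
        text = pattern.sub(lambda _match, r=right: r, text)
    return text


# Hypothetical metadata; the title is a placeholder, not the real one.
metadata = {
    "url": "https://www.youtube.com/watch?v=VhlY7kkAw7w",
    "title": "Placeholder title that mentions Aiven",
}

# Made-up transcript line with the misheard name.
record = build_record("Ivan makes multi-cloud databases easier.", metadata)

# Corrections map built by a human after comparing transcript and title.
record["transcript"] = apply_corrections(record["transcript"], {"Ivan": "Aiven"})

print(json.dumps(record, indent=2))
```

A blind replacement like this would also rewrite a real person named Ivan, so the corrections map should be something you review per video, not something you apply everywhere.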

Credit to the video creator: an interesting video on managing your databases at scale, multi-cloud, using a nice GUI.

https://www.youtube.com/watch?v=VhlY7kkAw7w
