Thirsty Hafo, link to the photo, Author: Rafael Ishkhanyan
ToDo
1. Theory 2025
2. Theory 2023 (ToDo)
3. Practical 2025
4. Practical 2023 (ToDo)
5. Solutions to some homework assignments (ToDo)
Google Forms ToDo
A survey summarizing the first part of the course. Thanks for filling it out. Here are some notes of thanks.
Use all 4 pillars of OOP and data classes:
1. Extract metadata for a given video from YouTube
2. Extract the transcript
3. Translate the transcript
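As a rough orientation before the real classes, here is a minimal sketch of the four pillars next to a data class. All names in it (Track, Player, AudioPlayer, VideoPlayer) are hypothetical and are not part of the project code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Track:  # data class: a plain container for fields
    title: str
    seconds: int


class Player(ABC):  # abstraction: callers rely only on this interface
    def __init__(self) -> None:
        self._queue = []  # encapsulation: internal state, reached through methods

    def add(self, track: Track) -> None:
        self._queue.append(track)

    def total_seconds(self) -> int:
        return sum(t.seconds for t in self._queue)

    @abstractmethod
    def describe(self) -> str: ...


class AudioPlayer(Player):  # inheritance: reuses add() and total_seconds()
    def describe(self) -> str:
        return f"audio, {self.total_seconds()} s"


class VideoPlayer(Player):
    def describe(self) -> str:  # polymorphism: same call, different behaviour
        return f"video, {self.total_seconds()} s"


players = [AudioPlayer(), VideoPlayer()]
for player in players:
    player.add(Track("Barfuß Am Klavier", 201))
descriptions = [player.describe() for player in players]
print(descriptions)  # ['audio, 201 s', 'video, 201 s']
```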
First of all, you may want to create a virtual environment (here, a conda one):
conda create -n youtube
conda activate youtube
!pip install pytubefix
We use pytubefix because the original pytube no longer works.
from typing import List
from datetime import datetime
from pytubefix import YouTube
from dataclasses import dataclass, asdict
sample_video = "https://www.youtube.com/watch?v=tERRFWuYG48"
yt = YouTube(sample_video)
dir(yt)
yt.__dir__()[15:20]
['fallback_clients',
'_signature_timestamp',
'_visitor_data',
'stream_monostate',
'_author']
yt.title, yt.video_id, yt.keywords, yt.views, yt.length
('Barfuß Am Klavier - AnnenMayKantereit',
'tERRFWuYG48',
['AnnenMayKantereit',
'Barfuß Am Klavier',
'oft gefragt',
'henning may',
'klavier'],
76843686,
201)
@dataclass
class Video:
    video_id: str
    title: str
    keywords: List[str]
    views: int
    length: int
    # published: datetime
@dataclass()
class VideoInfo:
    video_id: str
    title: str
    keywords: List[str]
    publish_date: datetime
    length_seconds: int
VideoInfo(video_id=1, title="Sample Video", keywords=["sample", "video"], publish_date="2023-10-01", length_seconds=300)
VideoInfo(video_id=1, title='Sample Video', keywords=['sample', 'video'], publish_date='2023-10-01', length_seconds=300)
This does not raise an error even though video_id is not a str and publish_date is not a datetime: dataclasses do not check type annotations at runtime. We'll use pydantic to validate the data in the future.
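To make the point concrete, here is a small sketch with a hypothetical Clip class (a simplified stand-in for VideoInfo). It shows that wrong types are accepted silently, and how the annotations can be compared with the stored values by hand. The check relies on the annotations being plain classes, so it is only an illustration and would not work for generics such as List[str].

```python
from dataclasses import dataclass, fields


@dataclass
class Clip:  # hypothetical stand-in for VideoInfo, with simple field types only
    clip_id: str
    length_seconds: int


clip = Clip(clip_id=1, length_seconds="300")  # wrong types, yet no error is raised

# Compare each stored value against its annotation. isinstance() only
# accepts plain classes here, which is why generics are left out.
mismatched = [f.name for f in fields(clip) if not isinstance(getattr(clip, f.name), f.type)]
print(mismatched)  # ['clip_id', 'length_seconds']
```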
class YouTubeVideo:
    def __init__(self, url: str):
        self.video = YouTube(url)

    def __str__(self):  # polymorphism
        return f"{self.video.title} ({self.video.video_id})"

    def get_metadata(self) -> VideoInfo:
        return VideoInfo(
            video_id=self.video.video_id,
            title=self.video.title,
            keywords=self.video.keywords,
            publish_date=self.video.publish_date,
            length_seconds=self.video.length
        )
video = YouTubeVideo(sample_video)
print(video)
Barfuß Am Klavier - AnnenMayKantereit (tERRFWuYG48)
vid_info = video.get_metadata()
asdict(vid_info)
{'video_id': 'tERRFWuYG48',
'title': 'Barfuß Am Klavier - AnnenMayKantereit',
'keywords': ['AnnenMayKantereit',
'Barfuß Am Klavier',
'oft gefragt',
'henning may',
'klavier'],
'publish_date': datetime.datetime(2014, 10, 27, 4, 6, 51, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200))),
'length_seconds': 201}
@dataclass()
class VideoInfo:
    video_id: str
    title: str
    keywords: List[str]
    publish_date: datetime  # a "YYYY-MM-DD" string is also accepted by get_days_since_publish
    length_seconds: int

    @staticmethod
    def get_days_since_publish(publish_date) -> int:
        # Round-trip through a date string to drop the time and timezone,
        # so the value can be subtracted from the naive datetime.now().
        if isinstance(publish_date, datetime):
            publish_date = publish_date.strftime("%Y-%m-%d")
        publish_date = datetime.strptime(publish_date, "%Y-%m-%d")
        current_date = datetime.now()
        return (current_date - publish_date).days

    def __post_init__(self):
        if not isinstance(self.video_id, str):
            raise ValueError("video_id must be a string")
        if not isinstance(self.length_seconds, int):
            raise ValueError("length_seconds must be an integer")
        self.days_since_publish = self.get_days_since_publish(self.publish_date)
vid_info = video.get_metadata()
vid_info.days_since_publish
3894
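An alternative worth knowing, sketched here under the assumption that both datetimes carry timezone information, is to subtract them directly instead of going through a date string. The function name days_since and the fixed "now" value are made up for the example. Because it keeps the time of day, the result can differ by one day from the date-only approach used in the class.

```python
from datetime import datetime, timedelta, timezone


def days_since(published: datetime, now: datetime) -> int:
    """Whole days between two timezone-aware datetimes."""
    if published.tzinfo is None or now.tzinfo is None:
        raise ValueError("both datetimes must be timezone-aware")
    return (now - published).days


# The publish date shown in the metadata output, with its UTC-07:00 offset.
published = datetime(2014, 10, 27, 4, 6, 51, tzinfo=timezone(timedelta(hours=-7)))
# An arbitrary fixed instant, so that the example is reproducible.
fixed_now = datetime(2014, 11, 6, 12, 0, 0, tzinfo=timezone.utc)
print(days_since(published, fixed_now))  # 10
```

In real use, `datetime.now(timezone.utc)` would take the place of `fixed_now`.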
yt = YouTube(sample_video)
yt.captions
{'de': <Caption lang="German" code="de">}
captions = yt.captions.get("de")
# captions.generate_txt_captions()
# dir(captions)
captions.generate_srt_captions()
'1\n00:00:11,139 --> 00:00:13,330\nIch sitz schon wieder barfuß am Klavier.\n\n2\n00:00:17,300 --> 00:00:24,340\nIch träume Liebeslieder und sing dabei von dir.\n\n3\n00:00:24,340 --> 00:00:33,650\nDu und ich, wir waren wunderlich.\n\n4\n00:00:33,650 --> 00:00:35,950\nNicht für mich.\n\n5\n00:00:35,950 --> 00:00:41,449\nFür die, die es störte, wenn man uns nachts hörte.\n\n6\n00:00:41,449 --> 00:00:47,520\nIch hab mit dir gemeinsam einsam rumgesessen und geschwiegen.\n\n7\n00:00:47,520 --> 00:00:52,300\nIch erinnere mich am Besten ans gemeinsam einsam Liegen.\n\n8\n00:00:52,300 --> 00:01:01,960\nJeden Morgen danach bei dir; du nackt im Bett – und ich barfuß am Klavier.\n\n9\n00:01:02,960 --> 00:01:08,920\nUnd ich sitz schon wieder barfuß am Klavier.\n\n10\n00:01:08,920 --> 00:01:19,369\nIch träume Liebeslieder und sing dabei von dir.\n\n11\n00:01:19,369 --> 00:01:26,990\nDu und ich, das ging so nicht.\n\n12\n00:01:33,610 --> 00:01:38,180\nDu wolltest alles wissen und das hat mich vertrieben.\n\n13\n00:01:38,180 --> 00:01:43,630\nEigentlich dich, du bist nicht länger geblieben; bei mir.\n\n14\n00:01:43,630 --> 00:01:49,759\nAlso sitz ich, um zu lieben, lieber barfuß am Klavier.\n\n15\n00:01:49,759 --> 00:01:58,909\nUnd ich sitz schon wieder barfuß am Klavier.\n\n16\n00:01:58,909 --> 00:02:07,340\nIch träume Liebeslieder und sing dabei von dir.\n\n17\n00:02:07,340 --> 00:02:38,420\nDu und ich, wir waren zu wenig.\n\n18\n00:02:38,420 --> 00:02:46,400\nIch sitz schon wieder barfuß am Klavier.\n\n19\n00:02:46,500 --> 00:02:49,230\nUnd träum dabei von dir.\n\n20\n00:02:49,230 --> 00:03:21,500\nIch träum dabei von dir.'
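The SRT string above has a simple layout: a cue number, a "start --> end" line, then the caption text, with blank lines between cues. The following is a hand-rolled sketch of turning such a string into plain text. It is not pytubefix's own implementation, and srt_to_text is a hypothetical helper name.

```python
import re


def srt_to_text(srt: str) -> str:
    """Join the caption lines of an SRT string, skipping cue numbers and timestamps."""
    lines = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        parts = block.splitlines()
        # parts[0] is the cue number, parts[1] the "start --> end" line.
        if len(parts) >= 3 and "-->" in parts[1]:
            lines.extend(part.strip() for part in parts[2:])
    return " ".join(lines)


sample = (
    "1\n00:00:11,139 --> 00:00:13,330\nIch sitz schon wieder barfuß am Klavier.\n\n"
    "2\n00:00:17,300 --> 00:00:24,340\nIch träume Liebeslieder und sing dabei von dir."
)
text = srt_to_text(sample)
print(text)
```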
captions.download(title=yt.title, output_path=yt.video_id)
'c:\\Users\\hayk_\\OneDrive\\Desktop\\python_math_ml_course\\python\\tERRFWuYG48\\Barfuß Am Klavier - AnnenMayKantereit (de).srt'
class YouTubeVideo:
    def __init__(self, url: str):
        self.video = YouTube(url)

    def get_metadata(self) -> VideoInfo:
        return VideoInfo(
            video_id=self.video.video_id,
            title=self.video.title,
            keywords=self.video.keywords,
            publish_date=self.video.publish_date,
            length_seconds=self.video.length
        )

    def get_transcript(self, language: str = "de") -> str:
        captions = self.video.captions.get(language)
        if not captions:
            raise ValueError(f"No captions available for language: {language}")
        self.text = captions.generate_txt_captions()
        return self.text

    def download_transcript(self, language: str = "de", title: str = "transcript", output_path: str = "transcript.txt") -> None:
        captions = self.video.captions.get(language)
        if not captions:
            raise ValueError(f"No captions available for language: {language}")
        # Note: captions.download() treats output_path as the target directory.
        captions.download(title=title, output_path=output_path)
video = YouTubeVideo(sample_video)
transcript = video.get_transcript(language="hy")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [62], in <cell line: 3>()
      1 video = YouTubeVideo(sample_video)
----> 3 transcript = video.get_transcript(language="hy")
Input In [61], in YouTubeVideo.get_transcript(self, language)
     15 captions = self.video.captions.get(language)
     16 if not captions:
---> 17     raise ValueError(f"No captions available for language: {language}")
     18 self.text = captions.generate_txt_captions()
     19 return self.text
ValueError: No captions available for language: hy
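Since a requested language may be missing, as the error above shows, one option is to try several language codes in order of preference. The sketch below uses a plain dict as a stand-in for the captions mapping, and pick_language is a hypothetical helper, so treat it as an illustration of the idea only.

```python
from typing import Mapping, Optional, Sequence


def pick_language(available: Mapping[str, object], preferred: Sequence[str]) -> Optional[str]:
    """Return the first preferred language code that has captions, else None."""
    for code in preferred:
        if code in available:
            return code
    return None


# Stand-in for yt.captions: the sample video only had German captions.
available = {"de": "<Caption de>"}
first_choice = pick_language(available, ["hy", "en", "de"])
no_match = pick_language(available, ["hy", "en"])
print(first_choice)  # de
print(no_match)      # None
```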
transcript = video.get_transcript(language="de")
yt = YouTube(sample_video)
# yt.streams.filter(file_extension="mp4").desc().first().download()
<Stream: itag="18" mime_type="video/mp4" res="360p" fps="25fps" vcodec="avc1.42001E" acodec="mp4a.40.2" progressive="True" sabr="False" type="video">
yt.streams.get_audio_only()
<Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2" progressive="False" sabr="False" type="audio">
output_path = f"{yt.video_id}.mp3"
# Note: this only names the file .mp3; the stream itself is audio/mp4 and is not transcoded.
yt.streams.get_audio_only().download(filename=output_path)
'c:\\Users\\hayk_\\OneDrive\\Desktop\\python_math_ml_course\\python\\tERRFWuYG48.mp3'
output_path = f"{yt.video_id}.mp4"
yt.streams.get_lowest_resolution().download(filename=output_path)
'c:\\Users\\hayk_\\OneDrive\\Desktop\\python_math_ml_course\\python\\tERRFWuYG48.mp4'
class YouTubeVideo:
    def __init__(self, url: str):
        self.video = YouTube(url)

    def get_metadata(self) -> VideoInfo:
        return VideoInfo(
            video_id=self.video.video_id,
            title=self.video.title,
            keywords=self.video.keywords,
            publish_date=self.video.publish_date,
            length_seconds=self.video.length
        )

    def get_transcript(self, language: str = "de") -> str:
        captions = self.video.captions.get(language)
        if not captions:
            raise ValueError(f"No captions available for language: {language}")
        self.text = captions.generate_txt_captions()
        return self.text

    def download_transcript(self, language: str = "de", title: str = "transcript", output_path: str = "transcript.txt") -> None:
        captions = self.video.captions.get(language)
        if not captions:
            raise ValueError(f"No captions available for language: {language}")
        captions.download(title=title, output_path=output_path)

    def download_audio(self, output_path=None) -> None:
        """Download the audio stream of the YouTube video.

        Args:
            output_path (str, optional): Directory to download into. Defaults to the video ID.

        Raises:
            ValueError: If no audio stream is available.
        """
        if output_path is None:
            output_path = self.video.video_id
        audio_stream = self.video.streams.get_audio_only()
        if not audio_stream:
            raise ValueError("No audio stream available")
        audio_stream.download(output_path=output_path)

    def download_video(self, output_path=None) -> None:
        if output_path is None:
            output_path = self.video.video_id
        video_stream = self.video.streams.get_lowest_resolution()
        if not video_stream:
            raise ValueError("No video stream available")
        video_stream.download(output_path=output_path)
yt = YouTubeVideo(sample_video)
yt.download_audio("panir")
yt.download_video()
We’re gonna use googletrans and DeepL. Maybe more stuff later.
!pip install googletrans
import googletrans
from googletrans import Translator
# https://github.com/ssut/py-googletrans/tree/main
# https://stackoverflow.com/questions/55409641/asyncio-run-cannot-be-called-from-a-running-event-loop-when-using-jupyter-no
async def translate_text(text):
    async with Translator() as translator:
        result = await translator.translate(text, dest='hy')
        print(result)  # Translated(src=en, dest=hy, text=պանիր, pronunciation=panir, extra_data="{'translat...")
await translate_text(text="cheese")
Translated(src=en, dest=hy, text=պանիր, pronunciation=panir, extra_data="{'translat...")
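The top-level await above works because a notebook already runs an event loop. In a plain script there is none, and asyncio.run is the usual entry point. The sketch below shows that pattern with a hypothetical fake_translate coroutine standing in for the real network call, so it runs offline.

```python
import asyncio


async def fake_translate(text: str, dest: str) -> str:
    """Hypothetical stand-in for a network call such as translator.translate()."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"[{dest}] {text}"


def translate_sync(text: str, dest: str) -> str:
    """Run the coroutine from ordinary synchronous code, for example a script."""
    return asyncio.run(fake_translate(text, dest))


translated = translate_sync("cheese", "hy")
print(translated)  # [hy] cheese
```

Calling `translate_sync` inside a notebook cell would fail with the "asyncio.run() cannot be called from a running event loop" error discussed in the Stack Overflow link above. There, `await fake_translate(...)` is the right form.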
# googletrans.LANGUAGES
SETX DEEPL_API_KEY some_string
SUCCESS: Specified value was saved.
echo %DEEPL_API_KEY%
%DEEPL_API_KEY%
import os
os.environ
os.environ.get("DEEPL_API_KEY")
import deepl
auth_key = os.getenv('DEEPL_API_KEY')
assert auth_key is not None, "Please set the DEEPL_API_KEY environment variable."
deepl_client = deepl.Translator(auth_key)
deepl_client.get_source_languages()
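The assert above is fine in a notebook, but assert statements are skipped when Python runs with the -O flag. A small helper that raises explicitly is one alternative. The name require_env and its env parameter are assumptions made for this sketch; the parameter only exists so the example can run without touching the real environment.

```python
import os
from typing import Mapping


def require_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Please set the {name} environment variable.")
    return value


# A dict stands in for os.environ here.
key = require_env("DEEPL_API_KEY", env={"DEEPL_API_KEY": "some_string"})
print(key)  # some_string
```

Note that a variable set with SETX is only visible to processes started afterwards, which is why the echo above printed the literal `%DEEPL_API_KEY%` in the same session.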
print("Source languages:")
for language in deepl_client.get_source_languages():
    print(f"{language.name} ({language.code})")  # Example: "German (DE)"
print("Target languages:")
for language in deepl_client.get_target_languages():
    if language.supports_formality:
        print(f"{language.name} ({language.code}) supports formality")
        # Example: "Italian (IT) supports formality"
    else:
        print(f"{language.name} ({language.code})")
        # Example: "Lithuanian (LT)"
Oops, Armenian isn't there :)
result = deepl_client.translate_text("Ich liebe dich", target_lang="EN-US")
print(result.text)
I love you
from abc import ABC, abstractmethod
class BaseTranslator(ABC):
    @abstractmethod
    def translate(self, text: str, target_lang: str) -> str:
        pass

    @abstractmethod
    def detect_language(self, text: str) -> str:
        pass

    @abstractmethod
    def get_supported_languages(self) -> List[str]:
        pass
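What the abstract base class buys us is a guarantee: a subclass that leaves any abstract method unimplemented cannot be instantiated. The sketch below demonstrates this with hypothetical Greeter classes rather than the translators, so it runs without any API access.

```python
from abc import ABC, abstractmethod


class Greeter(ABC):  # hypothetical interface with two abstract methods
    @abstractmethod
    def greet(self, name: str) -> str: ...

    @abstractmethod
    def language(self) -> str: ...


class HalfGreeter(Greeter):  # implements only one of the two abstract methods
    def greet(self, name: str) -> str:
        return f"Hallo, {name}"


class GermanGreeter(HalfGreeter):  # completes the interface
    def language(self) -> str:
        return "de"


try:
    HalfGreeter()  # still abstract, so this raises TypeError
    error = None
except TypeError as exc:
    error = str(exc)

greeting = GermanGreeter().greet("Henning")
print(error is not None)  # True
print(greeting)           # Hallo, Henning
```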
class GoogleTranslator(BaseTranslator):
    """Google Translate API implementation of BaseTranslator."""

    @staticmethod
    async def translate(text, target_lang):
        # Unlike the base signature, this is a coroutine and must be awaited.
        async with Translator() as translator:
            result = await translator.translate(text, dest=target_lang)
            return result.text

    @staticmethod
    def detect_language(text: str) -> str:
        pass

    @staticmethod
    def get_supported_languages() -> List[str]:
        return list(googletrans.LANGUAGES)  # language codes (the dict keys)
gt = GoogleTranslator()
gt.get_supported_languages()
gt = GoogleTranslator()
# gt.translate(text="Ich liebe dich", target="EN-US")
await gt.translate(text="Ich liebe dich", target_lang="en")
'I love you'
class DeepLTranslator(BaseTranslator):
    def __init__(self, api_key) -> None:
        try:
            deepl_client = deepl.DeepLClient(api_key)
        except Exception as exc:
            raise ValueError("Invalid DeepL API key provided.") from exc
        self._client = deepl_client  # protected

    def translate(self, text: str, target_lang: str) -> str:
        result = self._client.translate_text(text, target_lang=target_lang)
        return result.text

    def detect_language(self, text: str) -> str:
        pass

    def get_supported_languages(self) -> List[str]:
        pass
dl = DeepLTranslator(api_key=os.getenv('DEEPL_API_KEY'))
dl.translate(text="Ich liebe dich", target_lang="EN-US")
'I love you'
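One wrinkle in the two implementations is that the Google translate method is a coroutine while the DeepL one is an ordinary method. A caller that wants to treat them uniformly can check whether the returned value is awaitable. The sketch below shows this with two hypothetical stand-in classes (SyncUpper and AsyncReverse) in place of the real translators; translate_with is likewise an illustrative name.

```python
import asyncio
import inspect


class SyncUpper:  # hypothetical stand-in for a synchronous translator
    def translate(self, text: str, target_lang: str) -> str:
        return text.upper()


class AsyncReverse:  # hypothetical stand-in for an asynchronous translator
    async def translate(self, text: str, target_lang: str) -> str:
        await asyncio.sleep(0)
        return text[::-1]


async def translate_with(translator, text: str, target_lang: str) -> str:
    """Call translate() and await the result only if it is awaitable."""
    result = translator.translate(text, target_lang)
    if inspect.isawaitable(result):
        result = await result
    return result


async def main() -> list:
    return [await translate_with(t, "abc", "en") for t in (SyncUpper(), AsyncReverse())]


results = asyncio.run(main())
print(results)  # ['ABC', 'cba']
```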
import json
class Pipeline(YouTubeVideo):
    def __init__(self, url: str, deepl_api_key: str):
        YouTubeVideo.__init__(self, url)  # super().__init__(url)
        self.dl_translator = DeepLTranslator(api_key=deepl_api_key)
        self.google_translator = GoogleTranslator()
        self.result = None

    async def get_translated_transcript(self) -> dict:
        self.transcript = self.get_transcript()
        if not self.transcript:
            raise ValueError("No transcript available for this video.")
        text_google = await self.google_translator.translate(text=self.transcript, target_lang="en")
        text_deepl = self.dl_translator.translate(text=self.transcript, target_lang="EN-US")

        self.result = {
            "original": self.transcript,
            "google": text_google,
            "deepl": text_deepl
        }
        return self.result

    def save_transcript_json(self, output_path: str = "transcript.json") -> None:
        if self.result is None:
            # get_translated_transcript() is a coroutine and cannot be awaited from
            # this synchronous method, so the caller has to run it first.
            raise ValueError("No result yet: run 'await get_translated_transcript()' first.")
        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(self.result, f, ensure_ascii=False, indent=4)
p = Pipeline(sample_video, deepl_api_key=os.getenv('DEEPL_API_KEY'))
p.get_transcript()
'Ich sitz schon wieder barfuß am Klavier. Ich träume Liebeslieder und sing dabei von dir. Du und ich, wir waren wunderlich. Nicht für mich. Für die, die es störte, wenn man uns nachts hörte. Ich hab mit dir gemeinsam einsam rumgesessen und geschwiegen. Ich erinnere mich am Besten ans gemeinsam einsam Liegen. Jeden Morgen danach bei dir; du nackt im Bett – und ich barfuß am Klavier. Und ich sitz schon wieder barfuß am Klavier. Ich träume Liebeslieder und sing dabei von dir. Du und ich, das ging so nicht. Du wolltest alles wissen und das hat mich vertrieben. Eigentlich dich, du bist nicht länger geblieben; bei mir. Also sitz ich, um zu lieben, lieber barfuß am Klavier. Und ich sitz schon wieder barfuß am Klavier. Ich träume Liebeslieder und sing dabei von dir. Du und ich, wir waren zu wenig. Ich sitz schon wieder barfuß am Klavier. Und träum dabei von dir. Ich träum dabei von dir.'
res = await p.get_translated_transcript()
res
{'original': 'Ich sitz schon wieder barfuß am Klavier. Ich träume Liebeslieder und sing dabei von dir. Du und ich, wir waren wunderlich. Nicht für mich. Für die, die es störte, wenn man uns nachts hörte. Ich hab mit dir gemeinsam einsam rumgesessen und geschwiegen. Ich erinnere mich am Besten ans gemeinsam einsam Liegen. Jeden Morgen danach bei dir; du nackt im Bett – und ich barfuß am Klavier. Und ich sitz schon wieder barfuß am Klavier. Ich träume Liebeslieder und sing dabei von dir. Du und ich, das ging so nicht. Du wolltest alles wissen und das hat mich vertrieben. Eigentlich dich, du bist nicht länger geblieben; bei mir. Also sitz ich, um zu lieben, lieber barfuß am Klavier. Und ich sitz schon wieder barfuß am Klavier. Ich träume Liebeslieder und sing dabei von dir. Du und ich, wir waren zu wenig. Ich sitz schon wieder barfuß am Klavier. Und träum dabei von dir. Ich träum dabei von dir.',
'google': "I am already sitting barefoot on the piano again. I dream love songs and sing from you. You and me, we were wonderful. Not for me. For those who bothered when we were heard at night. I lonely lonely with you and kept silent. The best way to remember lonely remember. Every morning with you; You naked in bed - and I barefoot on the piano. And I'm already sitting barefoot on the piano again. I dream love songs and sing from you. You and me, it didn't work that way. You wanted to know everything and that drove me out. Actually you, you no longer stayed; with me. So to love, I prefer barefoot on the piano. And I'm already sitting barefoot on the piano again. I dream love songs and sing from you. You and me, we were too little. I am already sitting barefoot on the piano again. And dream of you. I dream of you.",
'deepl': "I'm sitting barefoot at the piano again. I'm dreaming love songs and singing about you. You and I, we were strange. Not for me. For those who were disturbed by hearing us at night. I sat around with you, lonely and silent. I remember lying alone together best. Every morning afterwards with you; you naked in bed - and me barefoot at the piano. And I'm sitting barefoot at the piano again. I dream love songs and sing about you. You and I, that wasn't possible. You wanted to know everything and that drove me away. Actually you, you didn't stay with me any longer. So in order to love, I prefer to sit barefoot at the piano. And I'm sitting barefoot at the piano again. I dream love songs and sing about you. You and I, we weren't enough. I'm sitting barefoot at the piano again. And dreaming of you. I'm dreaming of you."}
p.save_transcript_json()
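To check what ends up on disk, the file can be read back. The sketch below uses a small sample dict and a temporary directory instead of the real transcript, and shows the effect of ensure_ascii=False: non-ASCII characters such as ß are stored as-is rather than as \u escapes.

```python
import json
import os
import tempfile

# Sample data standing in for the pipeline result.
result = {"original": "barfuß am Klavier", "deepl": "barefoot at the piano"}

path = os.path.join(tempfile.mkdtemp(), "transcript.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(result, f, ensure_ascii=False, indent=4)

with open(path, encoding="utf-8") as f:
    raw = f.read()
loaded = json.loads(raw)

print("ß" in raw)        # True: the character is written unescaped
print(loaded == result)  # True: the data survives the round trip
```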