Compare commits

...

22 Commits

Author SHA1 Message Date
bashonly
56af1477f7
Merge branch 'master' into abema-title-login 2024-01-19 03:45:44 -06:00
sefidel
c51316f8a6
[ie/abematv] Fix extraction with cache (#8895)
Closes #6532
Authored by: sefidel
2024-01-19 09:43:13 +00:00
sepro
a281beba8d
[ie/naver] Fix extractors (#8883)
Closes #8850, Closes #8692
Authored by: seproDev
2024-01-19 05:41:10 +01:00
DmitryScaletta
ba6b0c8261
[ie/chzzk] Add extractors (#8887)
Closes #8804
Authored by: DmitryScaletta
2024-01-19 04:16:21 +01:00
Karavellas
6171b050d7
[ie/ElementorEmbed] Add extractor (#8948)
Authored by: pompos02, seproDev

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
2024-01-19 04:00:49 +01:00
Giulio Muscarello
aa5dcc4ee6
[ie/IlPost] Add extractor (#9001)
Authored by: CapacitorSet
2024-01-19 03:51:53 +01:00
Philipp Waldhauer
5e2e24b2c5
[ie/MagentaMusik] Add extractor (#7790)
Authored by: pwaldhauer, seproDev

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
2024-01-19 00:52:13 +01:00
gmes78
fee2d8d9c3
[ie/Rule34Video] Extract more metadata (#7416)
Closes #7233
Authored by: gmes78
2024-01-19 00:41:28 +01:00
Akmal
cf9af2c7f1
[ie/Facebook] Add new ID format (#3824)
Closes #3496
Authored by: Wikidepia, kclauhk

Co-authored-by: kclauhk <78251477+kclauhk@users.noreply.github.com>
2024-01-19 00:40:08 +01:00
HobbyistDev
cf6413e840
[ie/BiliIntl] Fix and improve subtitles extraction (#7077)
Closes #7075, Closes #6664
Authored by: HobbyistDev, itachi-19, dirkf, seproDev

Co-authored-by: itachi-19 <16500619+itachi-19@users.noreply.github.com>
Co-authored-by: dirkf <fieldhouse@gmx.net>
Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
2024-01-19 00:27:25 +01:00
jazz1611
5498729c59
[ie/GoogleDrive] Fix source file extraction (#8990)
Closes #8976
Authored by: jazz1611
2024-01-19 00:24:34 +01:00
Nicolas Appriou
393b487a4e
[ie/ArteTV] Separate closed captions (#8231)
Authored by: Nicals, seproDev

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
2024-01-19 00:23:29 +01:00
Bibhav48
4d9dc0abe2
[ie/cloudflarestream] Extract subtitles (#9007)
Closes #8830
Authored by: Bibhav48
2024-01-18 21:20:04 +00:00
Andrew Gibson
014cb5774d
[ie/aenetworks] Rating should be optional for AP extraction (#9005)
Authored by: agibson-fl
2024-01-18 21:18:04 +00:00
Finn R. Gärtner
8e6e365172
[ie/Piapro] Improve _VALID_URL (#8999)
Authored by: FinnRG
2024-01-14 18:28:03 +00:00
Max
95e82347b3
[ie/Viously] Add extractor (#8927)
Replaces Turbo extractor

Authored by: nbr23, seproDev

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
2024-01-09 04:11:52 +01:00
DmitryScaletta
5b8c69ae04
[ie/twitch] Fix m3u8 extraction (#8960)
Closes #8958
Authored by: DmitryScaletta
2024-01-09 02:47:13 +00:00
garret
5af1f19787
[ie/NhkRadiruLive] Make metadata extraction non-fatal (#8956)
Authored by: garret1317
2024-01-08 17:59:44 +00:00
Simon Sawicki
b6951271ac
[ie/ard:mediathek] Revert to using old id (#8916)
Authored by: Grub4K
2024-01-05 21:34:38 +01:00
Simon Sawicki
ffbd4f2a02
[utils] traverse_obj: Support xml.etree.ElementTree.Element (#8911)
Authored by: Grub4K
2024-01-05 21:26:17 +01:00
mara004
292d60b1ed
[cleanup] Fix typo in README.md (#8894)
Authored by: antonkesy
2024-01-05 18:13:46 +01:00
Ralph Drake
85b33f5c16
[cookies] Fix --cookies-from-browser with macOS Firefox profiles (#8909)
Ref: https://support.mozilla.org/en-US/kb/profile-manager-create-remove-switch-firefox-profiles#firefox:mac

Closes #8898
Authored by: RalphORama
2024-01-02 00:58:36 +00:00
25 changed files with 806 additions and 294 deletions

View File

@ -280,7 +280,7 @@ While all the other dependencies are optional, `ffmpeg` and `ffprobe` are highly
* [**ffmpeg** and **ffprobe**](https://www.ffmpeg.org) - Required for [merging separate video and audio files](#format-selection) as well as for various [post-processing](#post-processing-options) tasks. License [depends on the build](https://www.ffmpeg.org/legal.html)
There are bugs in ffmpeg that causes various issues when used alongside yt-dlp. Since ffmpeg is such an important dependency, we provide [custom builds](https://github.com/yt-dlp/FFmpeg-Builds#ffmpeg-static-auto-builds) with patches for some of these issues at [yt-dlp/FFmpeg-Builds](https://github.com/yt-dlp/FFmpeg-Builds). See [the readme](https://github.com/yt-dlp/FFmpeg-Builds#patches-applied) for details on the specific issues solved by these builds
There are bugs in ffmpeg that cause various issues when used alongside yt-dlp. Since ffmpeg is such an important dependency, we provide [custom builds](https://github.com/yt-dlp/FFmpeg-Builds#ffmpeg-static-auto-builds) with patches for some of these issues at [yt-dlp/FFmpeg-Builds](https://github.com/yt-dlp/FFmpeg-Builds). See [the readme](https://github.com/yt-dlp/FFmpeg-Builds#patches-applied) for details on the specific issues solved by these builds
**Important**: What you need is ffmpeg *binary*, **NOT** [the python package of the same name](https://pypi.org/project/ffmpeg)

View File

@ -2340,6 +2340,58 @@ Line 1
self.assertEqual(traverse_obj(mobj, lambda k, _: k in (0, 'group')), ['0123', '3'],
msg='function on a `re.Match` should give group name as well')
# Test xml.etree.ElementTree.Element as input obj
etree = xml.etree.ElementTree.fromstring('''<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>''')
self.assertEqual(traverse_obj(etree, ''), etree,
msg='empty str key should return the element itself')
self.assertEqual(traverse_obj(etree, 'country'), list(etree),
msg='str key should lead all children with that tag name')
self.assertEqual(traverse_obj(etree, ...), list(etree),
msg='`...` as key should return all children')
self.assertEqual(traverse_obj(etree, lambda _, x: x[0].text == '4'), [etree[1]],
msg='function as key should get element as value')
self.assertEqual(traverse_obj(etree, lambda i, _: i == 1), [etree[1]],
msg='function as key should get index as key')
self.assertEqual(traverse_obj(etree, 0), etree[0],
msg='int key should return the nth child')
self.assertEqual(traverse_obj(etree, './/neighbor/@name'),
['Austria', 'Switzerland', 'Malaysia', 'Costa Rica', 'Colombia'],
msg='`@<attribute>` at end of path should give that attribute')
self.assertEqual(traverse_obj(etree, '//neighbor/@fail'), [None, None, None, None, None],
msg='`@<nonexistant>` at end of path should give `None`')
self.assertEqual(traverse_obj(etree, ('//neighbor/@', 2)), {'name': 'Malaysia', 'direction': 'N'},
msg='`@` should give the full attribute dict')
self.assertEqual(traverse_obj(etree, '//year/text()'), ['2008', '2011', '2011'],
msg='`text()` at end of path should give the inner text')
self.assertEqual(traverse_obj(etree, '//*[@direction]/@direction'), ['E', 'W', 'N', 'W', 'E'],
msg='full python xpath features should be supported')
self.assertEqual(traverse_obj(etree, (0, '@name')), 'Liechtenstein',
msg='special transformations should act on current element')
self.assertEqual(traverse_obj(etree, ('country', 0, ..., 'text()', {int_or_none})), [1, 2008, 141100],
msg='special transformations should act on current element')
def test_http_header_dict(self):
headers = HTTPHeaderDict()
headers['ytdl-test'] = b'0'

View File

@ -186,7 +186,7 @@ def _firefox_browser_dir():
if sys.platform in ('cygwin', 'win32'):
return os.path.expandvars(R'%APPDATA%\Mozilla\Firefox\Profiles')
elif sys.platform == 'darwin':
return os.path.expanduser('~/Library/Application Support/Firefox')
return os.path.expanduser('~/Library/Application Support/Firefox/Profiles')
return os.path.expanduser('~/.mozilla/firefox')

View File

@ -345,6 +345,10 @@ from .chingari import (
ChingariIE,
ChingariUserIE,
)
from .chzzk import (
CHZZKLiveIE,
CHZZKVideoIE,
)
from .cinemax import CinemaxIE
from .cinetecamilano import CinetecaMilanoIE
from .cineverse import (
@ -540,6 +544,7 @@ from .egghead import (
from .eighttracks import EightTracksIE
from .einthusan import EinthusanIE
from .eitb import EitbIE
from .elementorembed import ElementorEmbedIE
from .elonet import ElonetIE
from .elpais import ElPaisIE
from .eltrecetv import ElTreceTVIE
@ -787,6 +792,7 @@ from .iheart import (
IHeartRadioIE,
IHeartRadioPodcastIE,
)
from .ilpost import IlPostIE
from .iltalehti import IltalehtiIE
from .imdb import (
ImdbIE,
@ -996,7 +1002,7 @@ from .lynda import (
)
from .maariv import MaarivIE
from .magellantv import MagellanTVIE
from .magentamusik360 import MagentaMusik360IE
from .magentamusik import MagentaMusikIE
from .mailru import (
MailRuIE,
MailRuMusicIE,
@ -2019,7 +2025,6 @@ from .tunein import (
TuneInPodcastEpisodeIE,
TuneInShortenerIE,
)
from .turbo import TurboIE
from .tv2 import (
TV2IE,
TV2ArticleIE,
@ -2223,6 +2228,7 @@ from .viki import (
VikiIE,
VikiChannelIE,
)
from .viously import ViouslyIE
from .viqeo import ViqeoIE
from .viu import (
ViuIE,

View File

@ -138,11 +138,15 @@ class AbemaTVBaseIE(InfoExtractor):
if self._USERTOKEN:
return self._USERTOKEN
add_opener(self._downloader, AbemaLicenseHandler(self))
username, _ = self._get_login_info()
AbemaTVBaseIE._USERTOKEN = username and self.cache.load(self._NETRC_MACHINE, username)
auth_cache = username and self.cache.load(self._NETRC_MACHINE, username, min_ver='2024.01.19')
AbemaTVBaseIE._USERTOKEN = auth_cache and auth_cache.get('usertoken')
if AbemaTVBaseIE._USERTOKEN:
# try authentication with locally stored token
try:
AbemaTVBaseIE._DEVICE_ID = auth_cache.get('device_id')
self._get_media_token(True)
return
except ExtractorError as e:
@ -161,7 +165,6 @@ class AbemaTVBaseIE(InfoExtractor):
})
AbemaTVBaseIE._USERTOKEN = user_data['token']
add_opener(self._downloader, AbemaLicenseHandler(self))
return self._USERTOKEN
def _get_media_token(self, invalidate=False, to_show=True):
@ -185,7 +188,7 @@ class AbemaTVBaseIE(InfoExtractor):
def _perform_login(self, username, password):
self._get_device_token()
if self.cache.load(self._NETRC_MACHINE, username) and self._get_media_token():
if self.cache.load(self._NETRC_MACHINE, username, min_ver='2024.01.19') and self._get_media_token():
self.write_debug('Skipping logging in')
return
@ -208,7 +211,11 @@ class AbemaTVBaseIE(InfoExtractor):
AbemaTVBaseIE._USERTOKEN = login_response['token']
self._get_media_token(True)
self.cache.store(self._NETRC_MACHINE, username, AbemaTVBaseIE._USERTOKEN)
auth_cache = {
'device_id': AbemaTVBaseIE._DEVICE_ID,
'usertoken': AbemaTVBaseIE._USERTOKEN,
}
self.cache.store(self._NETRC_MACHINE, username, auth_cache)
def _call_api(self, endpoint, video_id, query=None, note='Downloading JSON metadata'):
return self._download_json(

View File

@ -93,7 +93,7 @@ class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'],
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
theplatform_metadata['ratings'][0]['rating'])
traverse_obj(theplatform_metadata, ('ratings', 0, 'rating')))
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
info.update(self._extract_aen_smil(media_url, video_id, auth))

View File

@ -4,6 +4,7 @@ from functools import partial
from .common import InfoExtractor
from ..utils import (
OnDemandPagedList,
bug_reports_message,
determine_ext,
int_or_none,
join_nonempty,
@ -233,7 +234,7 @@ class ARDBetaMediathekIE(InfoExtractor):
(?:(?:beta|www)\.)?ardmediathek\.de/
(?:[^/]+/)?
(?:player|live|video)/
(?:(?P<display_id>[^?#]+)/)?
(?:[^?#]+/)?
(?P<id>[a-zA-Z0-9]+)
/?(?:[?#]|$)'''
_GEO_COUNTRIES = ['DE']
@ -242,8 +243,8 @@ class ARDBetaMediathekIE(InfoExtractor):
'url': 'https://www.ardmediathek.de/video/filme-im-mdr/liebe-auf-vier-pfoten/mdr-fernsehen/Y3JpZDovL21kci5kZS9zZW5kdW5nLzI4MjA0MC80MjIwOTEtNDAyNTM0',
'md5': 'b6e8ab03f2bcc6e1f9e6cef25fcc03c4',
'info_dict': {
'display_id': 'filme-im-mdr/liebe-auf-vier-pfoten/mdr-fernsehen',
'id': 'Y3JpZDovL21kci5kZS9zZW5kdW5nLzI4MjA0MC80MjIwOTEtNDAyNTM0',
'display_id': 'Y3JpZDovL21kci5kZS9zZW5kdW5nLzI4MjA0MC80MjIwOTEtNDAyNTM0',
'id': '12939099',
'title': 'Liebe auf vier Pfoten',
'description': r're:^Claudia Schmitt, Anwältin in Salzburg',
'duration': 5222,
@ -255,7 +256,7 @@ class ARDBetaMediathekIE(InfoExtractor):
'series': 'Filme im MDR',
'age_limit': 0,
'channel': 'MDR',
'_old_archive_ids': ['ardbetamediathek 12939099'],
'_old_archive_ids': ['ardbetamediathek Y3JpZDovL21kci5kZS9zZW5kdW5nLzI4MjA0MC80MjIwOTEtNDAyNTM0'],
},
}, {
'url': 'https://www.ardmediathek.de/mdr/video/die-robuste-roswita/Y3JpZDovL21kci5kZS9iZWl0cmFnL2Ntcy84MWMxN2MzZC0wMjkxLTRmMzUtODk4ZS0wYzhlOWQxODE2NGI/',
@ -276,37 +277,37 @@ class ARDBetaMediathekIE(InfoExtractor):
'url': 'https://www.ardmediathek.de/video/tagesschau-oder-tagesschau-20-00-uhr/das-erste/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhZ2Vzc2NoYXUvZmM4ZDUxMjgtOTE0ZC00Y2MzLTgzNzAtNDZkNGNiZWJkOTll',
'md5': '1e73ded21cb79bac065117e80c81dc88',
'info_dict': {
'id': 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhZ2Vzc2NoYXUvZmM4ZDUxMjgtOTE0ZC00Y2MzLTgzNzAtNDZkNGNiZWJkOTll',
'id': '10049223',
'ext': 'mp4',
'title': 'tagesschau, 20:00 Uhr',
'timestamp': 1636398000,
'description': 'md5:39578c7b96c9fe50afdf5674ad985e6b',
'upload_date': '20211108',
'display_id': 'tagesschau-oder-tagesschau-20-00-uhr/das-erste',
'display_id': 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhZ2Vzc2NoYXUvZmM4ZDUxMjgtOTE0ZC00Y2MzLTgzNzAtNDZkNGNiZWJkOTll',
'duration': 915,
'episode': 'tagesschau, 20:00 Uhr',
'series': 'tagesschau',
'thumbnail': 'https://api.ardmediathek.de/image-service/images/urn:ard:image:fbb21142783b0a49?w=960&ch=ee69108ae344f678',
'channel': 'ARD-Aktuell',
'_old_archive_ids': ['ardbetamediathek 10049223'],
'_old_archive_ids': ['ardbetamediathek Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhZ2Vzc2NoYXUvZmM4ZDUxMjgtOTE0ZC00Y2MzLTgzNzAtNDZkNGNiZWJkOTll'],
},
}, {
'url': 'https://www.ardmediathek.de/video/7-tage/7-tage-unter-harten-jungs/hr-fernsehen/N2I2YmM5MzgtNWFlOS00ZGFlLTg2NzMtYzNjM2JlNjk4MDg3',
'md5': 'c428b9effff18ff624d4f903bda26315',
'info_dict': {
'id': 'N2I2YmM5MzgtNWFlOS00ZGFlLTg2NzMtYzNjM2JlNjk4MDg3',
'id': '94834686',
'ext': 'mp4',
'duration': 2700,
'episode': '7 Tage ... unter harten Jungs',
'description': 'md5:0f215470dcd2b02f59f4bd10c963f072',
'upload_date': '20231005',
'timestamp': 1696491171,
'display_id': '7-tage/7-tage-unter-harten-jungs/hr-fernsehen',
'display_id': 'N2I2YmM5MzgtNWFlOS00ZGFlLTg2NzMtYzNjM2JlNjk4MDg3',
'series': '7 Tage ...',
'channel': 'HR',
'thumbnail': 'https://api.ardmediathek.de/image-service/images/urn:ard:image:f6e6d5ffac41925c?w=960&ch=fa32ba69bc87989a',
'title': '7 Tage ... unter harten Jungs',
'_old_archive_ids': ['ardbetamediathek 94834686'],
'_old_archive_ids': ['ardbetamediathek N2I2YmM5MzgtNWFlOS00ZGFlLTg2NzMtYzNjM2JlNjk4MDg3'],
},
}, {
'url': 'https://beta.ardmediathek.de/ard/video/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
@ -357,14 +358,25 @@ class ARDBetaMediathekIE(InfoExtractor):
}), get_all=False)
def _real_extract(self, url):
video_id, display_id = self._match_valid_url(url).group('id', 'display_id')
display_id = self._match_id(url)
page_data = self._download_json(
f'https://api.ardmediathek.de/page-gateway/pages/ard/item/{video_id}', video_id, query={
f'https://api.ardmediathek.de/page-gateway/pages/ard/item/{display_id}', display_id, query={
'embedded': 'false',
'mcV6': 'true',
})
# For user convenience we use the old contentId instead of the longer crid
# Ref: https://github.com/yt-dlp/yt-dlp/issues/8731#issuecomment-1874398283
old_id = traverse_obj(page_data, ('tracking', 'atiCustomVars', 'contentId', {int}))
if old_id is not None:
video_id = str(old_id)
archive_ids = [make_archive_id(ARDBetaMediathekIE, display_id)]
else:
self.report_warning(f'Could not extract contentId{bug_reports_message()}')
video_id = display_id
archive_ids = None
player_data = traverse_obj(
page_data, ('widgets', lambda _, v: v['type'] in ('player_ondemand', 'player_live'), {dict}), get_all=False)
is_live = player_data.get('type') == 'player_live'
@ -419,8 +431,6 @@ class ARDBetaMediathekIE(InfoExtractor):
})
age_limit = traverse_obj(page_data, ('fskRating', {lambda x: remove_start(x, 'FSK')}, {int_or_none}))
old_id = traverse_obj(page_data, ('tracking', 'atiCustomVars', 'contentId'))
return {
'id': video_id,
'display_id': display_id,
@ -438,7 +448,7 @@ class ARDBetaMediathekIE(InfoExtractor):
'channel': 'clipSourceName',
})),
**self._extract_episode_info(page_data.get('title')),
'_old_archive_ids': [make_archive_id(ARDBetaMediathekIE, old_id)],
'_old_archive_ids': archive_ids,
}

View File

@ -70,7 +70,24 @@ class ArteTVIE(ArteTVBaseIE):
'thumbnail': 'https://api-cdn.arte.tv/img/v2/image/q82dTTfyuCXupPsGxXsd7B/940x530',
'upload_date': '20230930',
'ext': 'mp4',
}
},
}, {
'url': 'https://www.arte.tv/de/videos/085374-003-A/im-hohen-norden-geboren/',
'info_dict': {
'id': '085374-003-A',
'ext': 'mp4',
'description': 'md5:ab79ec7cc472a93164415b4e4916abf9',
'timestamp': 1702872000,
'thumbnail': 'https://api-cdn.arte.tv/img/v2/image/TnyHBfPxv3v2GEY3suXGZP/940x530',
'duration': 2594,
'title': 'Die kurze Zeit der Jugend',
'alt_title': 'Im hohen Norden geboren',
'upload_date': '20231218',
'subtitles': {
'fr': 'mincount:1',
'fr-acc': 'mincount:1',
},
},
}]
_GEO_BYPASS = True
@ -121,6 +138,16 @@ class ArteTVIE(ArteTVBaseIE):
),
}
@staticmethod
def _fix_accessible_subs_locale(subs):
updated_subs = {}
for lang, sub_formats in subs.items():
for format in sub_formats:
if format.get('url', '').endswith('-MAL.m3u8'):
lang += '-acc'
updated_subs.setdefault(lang, []).append(format)
return updated_subs
def _real_extract(self, url):
mobj = self._match_valid_url(url)
video_id = mobj.group('id')
@ -174,6 +201,7 @@ class ArteTVIE(ArteTVBaseIE):
secondary_formats.extend(fmts)
else:
formats.extend(fmts)
subs = self._fix_accessible_subs_locale(subs)
self._merge_subtitles(subs, target=subtitles)
elif stream['protocol'] in ('HTTPS', 'RTMP'):

View File

@ -18,6 +18,7 @@ from ..utils import (
OnDemandPagedList,
bool_or_none,
clean_html,
determine_ext,
filter_dict,
float_or_none,
format_field,
@ -1658,19 +1659,34 @@ class BiliIntlBaseIE(InfoExtractor):
'aid': aid,
})) or {}
subtitles = {}
for sub in sub_json.get('subtitles') or []:
sub_url = sub.get('url')
if not sub_url:
continue
sub_data = self._download_json(
sub_url, ep_id or aid, errnote='Unable to download subtitles', fatal=False,
note='Downloading subtitles%s' % f' for {sub["lang"]}' if sub.get('lang') else '')
if not sub_data:
continue
subtitles.setdefault(sub.get('lang_key', 'en'), []).append({
'ext': 'srt',
'data': self.json2srt(sub_data)
})
fetched_urls = set()
for sub in traverse_obj(sub_json, (('subtitles', 'video_subtitle'), ..., {dict})):
for url in traverse_obj(sub, ((None, 'ass', 'srt'), 'url', {url_or_none})):
if url in fetched_urls:
continue
fetched_urls.add(url)
sub_ext = determine_ext(url)
sub_lang = sub.get('lang_key') or 'en'
if sub_ext == 'ass':
subtitles.setdefault(sub_lang, []).append({
'ext': 'ass',
'url': url,
})
elif sub_ext == 'json':
sub_data = self._download_json(
url, ep_id or aid, fatal=False,
note=f'Downloading subtitles{format_field(sub, "lang", " for %s")} ({sub_lang})',
errnote='Unable to download subtitles')
if sub_data:
subtitles.setdefault(sub_lang, []).append({
'ext': 'srt',
'data': self.json2srt(sub_data),
})
else:
self.report_warning('Unexpected subtitle extension', ep_id or aid)
return subtitles
def _get_formats(self, *, ep_id=None, aid=None):

139
yt_dlp/extractor/chzzk.py Normal file
View File

@ -0,0 +1,139 @@
import functools
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_iso8601,
url_or_none,
)
from ..utils.traversal import traverse_obj
class CHZZKLiveIE(InfoExtractor):
IE_NAME = 'chzzk:live'
_VALID_URL = r'https?://chzzk\.naver\.com/live/(?P<id>[\da-f]+)'
_TESTS = [{
'url': 'https://chzzk.naver.com/live/c68b8ef525fb3d2fa146344d84991753',
'info_dict': {
'id': 'c68b8ef525fb3d2fa146344d84991753',
'ext': 'mp4',
'title': str,
'channel': '진짜도현',
'channel_id': 'c68b8ef525fb3d2fa146344d84991753',
'channel_is_verified': False,
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1705510344,
'upload_date': '20240117',
'live_status': 'is_live',
'view_count': int,
'concurrent_view_count': int,
},
'skip': 'The channel is not currently live',
}]
def _real_extract(self, url):
channel_id = self._match_id(url)
live_detail = self._download_json(
f'https://api.chzzk.naver.com/service/v2/channels/{channel_id}/live-detail', channel_id,
note='Downloading channel info', errnote='Unable to download channel info')['content']
if live_detail.get('status') == 'CLOSE':
raise ExtractorError('The channel is not currently live', expected=True)
live_playback = self._parse_json(live_detail['livePlaybackJson'], channel_id)
thumbnails = []
thumbnail_template = traverse_obj(
live_playback, ('thumbnail', 'snapshotThumbnailTemplate', {url_or_none}))
if thumbnail_template and '{type}' in thumbnail_template:
for width in traverse_obj(live_playback, ('thumbnail', 'types', ..., {str})):
thumbnails.append({
'id': width,
'url': thumbnail_template.replace('{type}', width),
'width': int_or_none(width),
})
formats, subtitles = [], {}
for media in traverse_obj(live_playback, ('media', lambda _, v: url_or_none(v['path']))):
is_low_latency = media.get('mediaId') == 'LLHLS'
fmts, subs = self._extract_m3u8_formats_and_subtitles(
media['path'], channel_id, 'mp4', fatal=False, live=True,
m3u8_id='hls-ll' if is_low_latency else 'hls')
for f in fmts:
if is_low_latency:
f['source_preference'] = -2
if '-afragalow.stream-audio.stream' in f['format_id']:
f['quality'] = -2
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
return {
'id': channel_id,
'is_live': True,
'formats': formats,
'subtitles': subtitles,
'thumbnails': thumbnails,
**traverse_obj(live_detail, {
'title': ('liveTitle', {str}),
'timestamp': ('openDate', {functools.partial(parse_iso8601, delimiter=' ')}),
'concurrent_view_count': ('concurrentUserCount', {int_or_none}),
'view_count': ('accumulateCount', {int_or_none}),
'channel': ('channel', 'channelName', {str}),
'channel_id': ('channel', 'channelId', {str}),
'channel_is_verified': ('channel', 'verifiedMark', {bool}),
}),
}
class CHZZKVideoIE(InfoExtractor):
IE_NAME = 'chzzk:video'
_VALID_URL = r'https?://chzzk\.naver\.com/video/(?P<id>\d+)'
_TESTS = [{
'url': 'https://chzzk.naver.com/video/1754',
'md5': 'b0c0c1bb888d913b93d702b1512c7f06',
'info_dict': {
'id': '1754',
'ext': 'mp4',
'title': '치지직 테스트 방송',
'channel': '침착맨',
'channel_id': 'bb382c2c0cc9fa7c86ab3b037fb5799c',
'channel_is_verified': False,
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 15577,
'timestamp': 1702970505.417,
'upload_date': '20231219',
'view_count': int,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_meta = self._download_json(
f'https://api.chzzk.naver.com/service/v2/videos/{video_id}', video_id,
note='Downloading video info', errnote='Unable to download video info')['content']
formats, subtitles = self._extract_mpd_formats_and_subtitles(
f'https://apis.naver.com/neonplayer/vodplay/v1/playback/{video_meta["videoId"]}', video_id,
query={
'key': video_meta['inKey'],
'env': 'real',
'lc': 'en_US',
'cpl': 'en_US',
}, note='Downloading video playback', errnote='Unable to download video playback')
return {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
**traverse_obj(video_meta, {
'title': ('videoTitle', {str}),
'thumbnail': ('thumbnailImageUrl', {url_or_none}),
'timestamp': ('publishDateAt', {functools.partial(float_or_none, scale=1000)}),
'view_count': ('readCount', {int_or_none}),
'duration': ('duration', {int_or_none}),
'channel': ('channel', 'channelName', {str}),
'channel_id': ('channel', 'channelId', {str}),
'channel_is_verified': ('channel', 'verifiedMark', {bool}),
}),
}

View File

@ -46,15 +46,18 @@ class CloudflareStreamIE(InfoExtractor):
video_id.split('.')[1] + '==='), video_id)['sub']
manifest_base_url = base_url + 'manifest/video.'
formats = self._extract_m3u8_formats(
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
manifest_base_url + 'm3u8', video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
formats.extend(self._extract_mpd_formats(
manifest_base_url + 'mpd', video_id, mpd_id='dash', fatal=False))
fmts, subs = self._extract_mpd_formats_and_subtitles(
manifest_base_url + 'mpd', video_id, mpd_id='dash', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
return {
'id': video_id,
'title': video_id,
'thumbnail': base_url + 'thumbnails/thumbnail.jpg',
'formats': formats,
'subtitles': subtitles,
}

View File

@ -0,0 +1,72 @@
import re
from .common import InfoExtractor
from .vimeo import VimeoIE
from .youtube import YoutubeIE
from ..utils import unescapeHTML, url_or_none
from ..utils.traversal import traverse_obj
class ElementorEmbedIE(InfoExtractor):
_VALID_URL = False
_WEBPAGE_TESTS = [{
'url': 'https://capitaltv.cy/2023/12/14/υγεια-και-ζωη-14-12-2023-δρ-ξενια-κωσταντινιδο/',
'info_dict': {
'id': 'KgzuxwuQwM4',
'ext': 'mp4',
'title': 'ΥΓΕΙΑ ΚΑΙ ΖΩΗ 14 12 2023 ΔΡ ΞΕΝΙΑ ΚΩΣΤΑΝΤΙΝΙΔΟΥ',
'thumbnail': 'https://i.ytimg.com/vi/KgzuxwuQwM4/maxresdefault.jpg',
'playable_in_embed': True,
'tags': 'count:16',
'like_count': int,
'channel': 'Capital TV Cyprus',
'channel_id': 'UCR8LwVKTLGEXt4ZAErpCMrg',
'availability': 'public',
'description': 'md5:7a3308a22881aea4612358c4ba121f77',
'duration': 2891,
'upload_date': '20231214',
'uploader_id': '@capitaltvcyprus6389',
'live_status': 'not_live',
'channel_url': 'https://www.youtube.com/channel/UCR8LwVKTLGEXt4ZAErpCMrg',
'uploader_url': 'https://www.youtube.com/@capitaltvcyprus6389',
'uploader': 'Capital TV Cyprus',
'age_limit': 0,
'categories': ['News & Politics'],
'view_count': int,
'channel_follower_count': int,
},
}, {
'url': 'https://elementor.com/academy/theme-builder-collection/?playlist=76011151&video=9e59909',
'info_dict': {
'id': '?playlist=76011151&video=9e59909',
'title': 'Theme Builder Collection - Academy',
'age_limit': 0,
'timestamp': 1702196984.0,
'upload_date': '20231210',
'description': 'md5:7f52c52715ee9e54fd7f82210511673d',
'thumbnail': 'https://elementor.com/academy/wp-content/uploads/2021/07/Theme-Builder-1.png',
},
'playlist_count': 11,
'params': {
'skip_download': True,
},
}]
_WIDGET_REGEX = r'<div[^>]+class="[^"]*elementor-widget-video(?:-playlist)?[^"]*"[^>]*data-settings="([^"]*)"'
def _extract_from_webpage(self, url, webpage):
for data_settings in re.findall(self._WIDGET_REGEX, webpage):
data = self._parse_json(data_settings, None, fatal=False, transform_source=unescapeHTML)
if youtube_url := traverse_obj(data, ('youtube_url', {url_or_none})):
yield self.url_result(youtube_url, ie=YoutubeIE)
for video in traverse_obj(data, ('tabs', lambda _, v: v['_id'], {dict})):
if youtube_url := traverse_obj(video, ('youtube_url', {url_or_none})):
yield self.url_result(youtube_url, ie=YoutubeIE)
if vimeo_url := traverse_obj(video, ('vimeo_url', {url_or_none})):
yield self.url_result(vimeo_url, ie=VimeoIE)
for direct_url in traverse_obj(video, (('hosted_url', 'external_url'), 'url', {url_or_none})):
yield {
'id': video['_id'],
'url': direct_url,
'title': video.get('title'),
}

View File

@ -57,7 +57,7 @@ class FacebookIE(InfoExtractor):
)|
facebook:
)
(?P<id>[0-9]+)
(?P<id>pfbid[A-Za-z0-9]+|\d+)
'''
_EMBED_REGEX = [
r'<iframe[^>]+?src=(["\'])(?P<url>https?://www\.facebook\.com/(?:video/embed|plugins/video\.php).+?)\1',
@ -247,6 +247,24 @@ class FacebookIE(InfoExtractor):
'thumbnail': r're:^https?://.*',
'duration': 148.435,
},
}, {
'url': 'https://www.facebook.com/attn/posts/pfbid0j1Czf2gGDVqeQ8KiMLFm3pWN8GxsQmeRrVhimWDzMuKQoR8r4b1knNsejELmUgyhl',
'info_dict': {
'id': '6968553779868435',
'ext': 'mp4',
'description': 'md5:2f2fcf93e97ac00244fe64521bbdb0cb',
'uploader': 'ATTN:',
'upload_date': '20231207',
'title': 'ATTN:',
'duration': 132.675,
'uploader_id': '100064451419378',
'view_count': int,
'thumbnail': r're:^https?://.*',
'timestamp': 1701975646,
},
}, {
'url': 'https://www.facebook.com/story.php?story_fbid=pfbid0Fnzhm8UuzjBYpPMNFzaSpFE9UmLdU4fJN8qTANi1Dmtj5q7DNrL5NERXfsAzDEV7l&id=100073071055552',
'only_matching': True,
}, {
'url': 'https://www.facebook.com/video.php?v=10204634152394104',
'only_matching': True,

View File

@ -19,9 +19,9 @@ class GoogleDriveIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:
(?:docs|drive)\.google\.com/
(?:docs|drive|drive\.usercontent)\.google\.com/
(?:
(?:uc|open)\?.*?id=|
(?:uc|open|download)\?.*?id=|
file/d/
)|
video\.google\.com/get_player\?.*?docid=
@ -53,6 +53,9 @@ class GoogleDriveIE(InfoExtractor):
}, {
'url': 'https://drive.google.com/uc?id=0B2fjwgkl1A_CX083Tkowdmt6d28',
'only_matching': True,
}, {
'url': 'https://drive.usercontent.google.com/download?id=0ByeS4oOUV-49Zzh4R1J6R09zazQ',
'only_matching': True,
}]
_FORMATS_EXT = {
'5': 'flv',
@ -205,9 +208,10 @@ class GoogleDriveIE(InfoExtractor):
formats.append(f)
source_url = update_url_query(
'https://drive.google.com/uc', {
'https://drive.usercontent.google.com/download', {
'id': video_id,
'export': 'download',
'confirm': 't',
})
def request_source_file(source_url, kind, data=None):

View File

@ -0,0 +1,69 @@
import functools
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
url_or_none,
urlencode_postdata,
)
from ..utils.traversal import traverse_obj
class IlPostIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ilpost\.it/episodes/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.ilpost.it/episodes/1-avis-akvasas-ka/',
'md5': '43649f002d85e1c2f319bb478d479c40',
'info_dict': {
'id': '2972047',
'ext': 'mp3',
'display_id': '1-avis-akvasas-ka',
'title': '1. Avis akvasas ka',
'url': 'https://www.ilpost.it/wp-content/uploads/2023/12/28/1703781217-l-invasione-pt1-v6.mp3',
'timestamp': 1703835014,
'upload_date': '20231229',
'duration': 2495.0,
'availability': 'public',
'series_id': '235598',
'description': '',
}
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
endpoint_metadata = self._search_json(
r'var\s+ilpostpodcast\s*=', webpage, 'metadata', display_id)
episode_id = endpoint_metadata['post_id']
podcast_id = endpoint_metadata['podcast_id']
podcast_metadata = self._download_json(
endpoint_metadata['ajax_url'], display_id, data=urlencode_postdata({
'action': 'checkpodcast',
'cookie': endpoint_metadata['cookie'],
'post_id': episode_id,
'podcast_id': podcast_id,
}))
episode = traverse_obj(podcast_metadata, (
'data', 'postcastList', lambda _, v: str(v['id']) == episode_id, {dict}), get_all=False)
if not episode:
raise ExtractorError('Episode could not be extracted')
return {
'id': episode_id,
'display_id': display_id,
'series_id': podcast_id,
'vcodec': 'none',
**traverse_obj(episode, {
'title': ('title', {str}),
'description': ('description', {str}),
'url': ('podcast_raw_url', {url_or_none}),
'thumbnail': ('image', {url_or_none}),
'timestamp': ('timestamp', {int_or_none}),
'duration': ('milliseconds', {functools.partial(float_or_none, scale=1000)}),
'availability': ('free', {lambda v: 'public' if v else 'subscriber_only'}),
}),
}

View File

@ -0,0 +1,62 @@
from .common import InfoExtractor
from ..utils import ExtractorError, int_or_none, join_nonempty, url_or_none
from ..utils.traversal import traverse_obj
class MagentaMusikIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?magentamusik\.de/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.magentamusik.de/marty-friedman-woa-2023-9208205928595409235',
'md5': 'd82dd4748f55fc91957094546aaf8584',
'info_dict': {
'id': '9208205928595409235',
'display_id': 'marty-friedman-woa-2023-9208205928595409235',
'ext': 'mp4',
'title': 'Marty Friedman: W:O:A 2023',
'alt_title': 'Konzert vom: 05.08.2023 13:00',
'duration': 2760,
'categories': ['Musikkonzert'],
'release_year': 2023,
'location': 'Deutschland',
}
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
player_config = self._search_json(
r'data-js-element="o-video-player__config">', webpage, 'player config', display_id, fatal=False)
if not player_config:
raise ExtractorError('No video found', expected=True)
asset_id = player_config['assetId']
asset_details = self._download_json(
f'https://wcps.t-online.de/cvss/magentamusic/vodclient/v2/assetdetails/58938/{asset_id}',
display_id, note='Downloading asset details')
video_id = traverse_obj(
asset_details, ('content', 'partnerInformation', ..., 'reference', {str}), get_all=False)
if not video_id:
raise ExtractorError('Unable to extract video id')
vod_data = self._download_json(
f'https://wcps.t-online.de/cvss/magentamusic/vodclient/v2/player/58935/{video_id}/Main%20Movie', video_id)
smil_url = traverse_obj(
vod_data, ('content', 'feature', 'representations', ...,
'contentPackages', ..., 'media', 'href', {url_or_none}), get_all=False)
return {
'id': video_id,
'display_id': display_id,
'formats': self._extract_smil_formats(smil_url, video_id),
**traverse_obj(vod_data, ('content', 'feature', 'metadata', {
'title': 'title',
'alt_title': 'originalTitle',
'description': 'longDescription',
'duration': ('runtimeInSeconds', {int_or_none}),
'location': ('countriesOfProduction', {list}, {lambda x: join_nonempty(*x, delim=', ')}),
'release_year': ('yearOfProduction', {int_or_none}),
'categories': ('mainGenre', {str}, {lambda x: x and [x]}),
})),
}

View File

@ -1,58 +0,0 @@
from .common import InfoExtractor
class MagentaMusik360IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?magenta-musik-360\.de/([a-z0-9-]+-(?P<id>[0-9]+)|festivals/.+)'
_TESTS = [{
'url': 'https://www.magenta-musik-360.de/within-temptation-wacken-2019-1-9208205928595185932',
'md5': '65b6f060b40d90276ec6fb9b992c1216',
'info_dict': {
'id': '9208205928595185932',
'ext': 'm3u8',
'title': 'WITHIN TEMPTATION',
'description': 'Robert Westerholt und Sharon Janny den Adel gründeten die Symphonic Metal-Band. Privat sind die Niederländer ein Paar und haben zwei Kinder. Die Single Ice Queen brachte ihnen Platin und Gold und verhalf 2002 zum internationalen Durchbruch. Charakteristisch für die Band war Anfangs der hohe Gesang von Frontfrau Sharon. Stilistisch fing die Band im Gothic Metal an. Mit neuem Sound, schnellen Gitarrenriffs und Gitarrensoli, avancierte Within Temptation zur erfolgreichen Rockband. Auch dieses Jahr wird die Band ihre Fangemeinde wieder mitreißen.',
}
}, {
'url': 'https://www.magenta-musik-360.de/festivals/wacken-world-wide-2020-body-count-feat-ice-t',
'md5': '81010d27d7cab3f7da0b0f681b983b7e',
'info_dict': {
'id': '9208205928595231363',
'ext': 'm3u8',
'title': 'Body Count feat. Ice-T',
'description': 'Body Count feat. Ice-T konnten bereits im vergangenen Jahr auf dem „Holy Ground“ in Wacken überzeugen. 2020 gehen die Crossover-Metaller aus einem Club in Los Angeles auf Sendung und bringen mit ihrer Mischung aus Metal und Hip-Hop Abwechslung und ordentlich Alarm zum WWW. Bereits seit 1990 stehen die beiden Gründer Ice-T (Gesang) und Ernie C (Gitarre) auf der Bühne. Sieben Studioalben hat die Gruppe bis jetzt veröffentlicht, darunter das Debüt „Body Count“ (1992) mit dem kontroversen Track „Cop Killer“.',
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
# _match_id casts to string, but since "None" is not a valid video_id for magenta
# there is no risk for confusion
if video_id == "None":
webpage = self._download_webpage(url, video_id)
video_id = self._html_search_regex(r'data-asset-id="([^"]+)"', webpage, 'video_id')
json = self._download_json("https://wcps.t-online.de/cvss/magentamusic/vodplayer/v3/player/58935/%s/Main%%20Movie" % video_id, video_id)
xml_url = json['content']['feature']['representations'][0]['contentPackages'][0]['media']['href']
metadata = json['content']['feature'].get('metadata')
title = None
description = None
duration = None
thumbnails = []
if metadata:
title = metadata.get('title')
description = metadata.get('fullDescription')
duration = metadata.get('runtimeInSeconds')
for img_key in ('teaserImageWide', 'smallCoverImage'):
if img_key in metadata:
thumbnails.append({'url': metadata[img_key].get('href')})
xml = self._download_xml(xml_url, video_id)
final_url = xml[0][0][0].attrib['src']
return {
'id': video_id,
'title': title,
'description': description,
'url': final_url,
'duration': duration,
'thumbnails': thumbnails
}

View File

@ -1,20 +1,25 @@
import base64
import hashlib
import hmac
import itertools
import json
import re
from urllib.parse import urlparse, parse_qs
import time
from urllib.parse import parse_qs, urlparse
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
dict_get,
int_or_none,
join_nonempty,
merge_dicts,
parse_duration,
parse_iso8601,
traverse_obj,
try_get,
unified_timestamp,
update_url_query,
url_or_none,
)
@ -110,6 +115,18 @@ class NaverBaseIE(InfoExtractor):
**self.process_subtitles(video_data, get_subs),
}
def _call_api(self, path, video_id):
api_endpoint = f'https://apis.naver.com/now_web2/now_web_api/v1{path}'
key = b'nbxvs5nwNG9QKEWK0ADjYA4JZoujF4gHcIwvoCxFTPAeamq5eemvt5IWAYXxrbYM'
msgpad = int(time.time() * 1000)
md = base64.b64encode(hmac.HMAC(
key, f'{api_endpoint[:255]}{msgpad}'.encode(), digestmod=hashlib.sha1).digest()).decode()
return self._download_json(api_endpoint, video_id=video_id, headers=self.geo_verification_headers(), query={
'msgpad': msgpad,
'md': md,
})['result']
class NaverIE(NaverBaseIE):
_VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/(?:v|embed)/(?P<id>\d+)'
@ -125,21 +142,32 @@ class NaverIE(NaverBaseIE):
'upload_date': '20130903',
'uploader': '메가스터디, 합격불변의 법칙',
'uploader_id': 'megastudy',
'uploader_url': 'https://tv.naver.com/megastudy',
'view_count': int,
'like_count': int,
'comment_count': int,
'duration': 2118,
'thumbnail': r're:^https?://.*\.jpg',
},
}, {
'url': 'http://tv.naver.com/v/395837',
'md5': '8a38e35354d26a17f73f4e90094febd3',
'md5': '7791205fa89dbed2f5e3eb16d287ff05',
'info_dict': {
'id': '395837',
'ext': 'mp4',
'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
'description': 'md5:eb6aca9d457b922e43860a2a2b1984d3',
'description': 'md5:c76be23e21403a6473d8119678cdb5cb',
'timestamp': 1432030253,
'upload_date': '20150519',
'uploader': '4가지쇼 시즌2',
'uploader_id': 'wrappinguser29',
'uploader': '4가지쇼',
'uploader_id': '4show',
'uploader_url': 'https://tv.naver.com/4show',
'view_count': int,
'like_count': int,
'comment_count': int,
'duration': 277,
'thumbnail': r're:^https?://.*\.jpg',
},
'skip': 'Georestricted',
}, {
'url': 'http://tvcast.naver.com/v/81652',
'only_matching': True,
@ -147,56 +175,63 @@ class NaverIE(NaverBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
content = self._download_json(
'https://tv.naver.com/api/json/v/' + video_id,
video_id, headers=self.geo_verification_headers())
player_info_json = content.get('playerInfoJson') or {}
current_clip = player_info_json.get('currentClip') or {}
data = self._call_api(f'/clips/{video_id}/play-info', video_id)
vid = current_clip.get('videoId')
in_key = current_clip.get('inKey')
vid = traverse_obj(data, ('clip', 'videoId', {str}))
in_key = traverse_obj(data, ('play', 'inKey', {str}))
if not vid or not in_key:
player_auth = try_get(player_info_json, lambda x: x['playerOption']['auth'])
if player_auth == 'notCountry':
self.raise_geo_restricted(countries=['KR'])
elif player_auth == 'notLogin':
self.raise_login_required()
raise ExtractorError('couldn\'t extract vid and key')
raise ExtractorError('Unable to extract video info')
info = self._extract_video_info(video_id, vid, in_key)
info.update({
'description': clean_html(current_clip.get('description')),
'timestamp': int_or_none(current_clip.get('firstExposureTime'), 1000),
'duration': parse_duration(current_clip.get('displayPlayTime')),
'like_count': int_or_none(current_clip.get('recommendPoint')),
'age_limit': 19 if current_clip.get('adult') else None,
})
info.update(traverse_obj(data, ('clip', {
'title': 'title',
'description': 'description',
'timestamp': ('firstExposureDatetime', {parse_iso8601}),
'duration': ('playTime', {int_or_none}),
'like_count': ('likeItCount', {int_or_none}),
'view_count': ('playCount', {int_or_none}),
'comment_count': ('commentCount', {int_or_none}),
'thumbnail': ('thumbnailImageUrl', {url_or_none}),
'uploader': 'channelName',
'uploader_id': 'channelId',
'uploader_url': ('channelUrl', {url_or_none}),
'age_limit': ('adultVideo', {lambda x: 19 if x else None}),
})))
return info
class NaverLiveIE(InfoExtractor):
class NaverLiveIE(NaverBaseIE):
IE_NAME = 'Naver:live'
_VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/l/(?P<id>\d+)'
_GEO_BYPASS = False
_TESTS = [{
'url': 'https://tv.naver.com/l/52010',
'url': 'https://tv.naver.com/l/127062',
'info_dict': {
'id': '52010',
'id': '127062',
'ext': 'mp4',
'title': '[LIVE] 뉴스특보 : "수도권 거리두기, 2주간 2단계로 조정"',
'description': 'md5:df7f0c237a5ed5e786ce5c91efbeaab3',
'channel_id': 'NTV-ytnnews24-0',
'start_time': 1597026780000,
'live_status': 'is_live',
'channel': '뉴스는 YTN',
'channel_id': 'ytnnews24',
'title': 're:^대한민국 24시간 뉴스 채널 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:f938b5956711beab6f882314ffadf4d5',
'start_time': 1677752280,
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)',
'like_count': int,
},
}, {
'url': 'https://tv.naver.com/l/51549',
'url': 'https://tv.naver.com/l/140535',
'info_dict': {
'id': '51549',
'id': '140535',
'ext': 'mp4',
'title': '연합뉴스TV - 코로나19 뉴스특보',
'description': 'md5:c655e82091bc21e413f549c0eaccc481',
'channel_id': 'NTV-yonhapnewstv-0',
'start_time': 1596406380000,
'live_status': 'is_live',
'channel': 'KBS뉴스',
'channel_id': 'kbsnews',
'start_time': 1696867320,
'title': 're:^언제 어디서나! KBS 뉴스 24 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:6ad419c0bf2f332829bda3f79c295284',
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)',
'like_count': int,
},
}, {
'url': 'https://tv.naver.com/l/54887',
@ -205,55 +240,27 @@ class NaverLiveIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
page = self._download_webpage(url, video_id, 'Downloading Page', 'Unable to download Page')
secure_url = self._search_regex(r'sApiF:\s+(?:"|\')([^"\']+)', page, 'secureurl')
info = self._extract_video_info(video_id, secure_url)
info.update({
'description': self._og_search_description(page)
})
return info
def _extract_video_info(self, video_id, url):
video_data = self._download_json(url, video_id, headers=self.geo_verification_headers())
meta = video_data.get('meta')
status = meta.get('status')
data = self._call_api(f'/live-end/normal/{video_id}/play-info?renewLastPlayDate=true', video_id)
status = traverse_obj(data, ('live', 'liveStatus'))
if status == 'CLOSED':
raise ExtractorError('Stream is offline.', expected=True)
elif status != 'OPENED':
raise ExtractorError('Unknown status %s' % status)
title = meta.get('title')
stream_list = video_data.get('streams')
if stream_list is None:
raise ExtractorError('Could not get stream data.', expected=True)
formats = []
for quality in stream_list:
if not quality.get('url'):
continue
prop = quality.get('property')
if prop.get('abr'): # This abr doesn't mean Average audio bitrate.
continue
formats.extend(self._extract_m3u8_formats(
quality.get('url'), video_id, 'mp4',
m3u8_id=quality.get('qualityId'), live=True
))
raise ExtractorError(f'Unknown status {status!r}')
return {
'id': video_id,
'title': title,
'formats': formats,
'channel_id': meta.get('channelId'),
'channel_url': meta.get('channelUrl'),
'thumbnail': meta.get('imgUrl'),
'start_time': meta.get('startTime'),
'categories': [meta.get('categoryId')],
'formats': self._extract_m3u8_formats(
traverse_obj(data, ('playbackBody', {json.loads}, 'media', 0, 'path')), video_id, live=True),
**traverse_obj(data, ('live', {
'title': 'title',
'channel': 'channelName',
'channel_id': 'channelId',
'description': 'description',
'like_count': (('likeCount', 'likeItCount'), {int_or_none}),
'thumbnail': ('thumbnailImageUrl', {url_or_none}),
'start_time': (('startTime', 'startDateTime', 'startYmdt'), {parse_iso8601}),
}), get_all=False),
'is_live': True
}

View File

@ -665,7 +665,7 @@ class NhkRadiruLiveIE(InfoExtractor):
noa_info = self._download_json(
f'https:{config.find(".//url_program_noa").text}'.format(area=data.find('areakey').text),
station, note=f'Downloading {area} station metadata')
station, note=f'Downloading {area} station metadata', fatal=False)
present_info = traverse_obj(noa_info, ('nowonair_list', self._NOA_STATION_IDS.get(station), 'present'))
return {

View File

@ -12,7 +12,7 @@ from ..utils import (
class PiaproIE(InfoExtractor):
_NETRC_MACHINE = 'piapro'
_VALID_URL = r'https?://piapro\.jp/(?:t|content)/(?P<id>\w+)/?'
_VALID_URL = r'https?://piapro\.jp/(?:t|content)/(?P<id>[\w-]+)/?'
_TESTS = [{
'url': 'https://piapro.jp/t/NXYR',
'md5': 'f7c0f760913fb1d44a1c45a4af793909',
@ -49,6 +49,9 @@ class PiaproIE(InfoExtractor):
}, {
'url': 'https://piapro.jp/content/hcw0z3a169wtemz6',
'only_matching': True
}, {
'url': 'https://piapro.jp/t/-SO-',
'only_matching': True
}]
_login_status = False

View File

@ -1,7 +1,20 @@
import re
from ..utils import parse_duration, unescapeHTML
from .common import InfoExtractor
from ..utils import (
clean_html,
extract_attributes,
get_element_by_attribute,
get_element_by_class,
get_element_html_by_class,
get_elements_by_class,
int_or_none,
join_nonempty,
parse_count,
parse_duration,
unescapeHTML,
)
from ..utils.traversal import traverse_obj
class Rule34VideoIE(InfoExtractor):
@ -17,7 +30,16 @@ class Rule34VideoIE(InfoExtractor):
'thumbnail': 'https://rule34video.com/contents/videos_screenshots/3065000/3065157/preview.jpg',
'duration': 347.0,
'age_limit': 18,
'tags': 'count:14'
'view_count': int,
'like_count': int,
'comment_count': int,
'timestamp': 1639872000,
'description': 'https://discord.gg/aBqPrHSHvv',
'upload_date': '20211219',
'uploader': 'Sweet HMV',
'uploader_url': 'https://rule34video.com/members/22119/',
'categories': ['3D', 'MMD', 'iwara'],
'tags': 'mincount:10'
}
},
{
@ -30,7 +52,17 @@ class Rule34VideoIE(InfoExtractor):
'thumbnail': 'https://rule34video.com/contents/videos_screenshots/3065000/3065296/preview.jpg',
'duration': 938.0,
'age_limit': 18,
'tags': 'count:50'
'view_count': int,
'like_count': int,
'comment_count': int,
'timestamp': 1640131200,
'description': '',
'creator': 'WildeerStudio',
'upload_date': '20211222',
'uploader': 'CerZule',
'uploader_url': 'https://rule34video.com/members/36281/',
'categories': ['3D', 'Tomb Raider'],
'tags': 'mincount:40'
}
},
]
@ -49,17 +81,44 @@ class Rule34VideoIE(InfoExtractor):
'quality': quality,
})
title = self._html_extract_title(webpage)
thumbnail = self._html_search_regex(r'preview_url:\s+\'([^\']+)\'', webpage, 'thumbnail', default=None)
duration = self._html_search_regex(r'"icon-clock"></i>\s+<span>((?:\d+:?)+)', webpage, 'duration', default=None)
categories, creator, uploader, uploader_url = [None] * 4
for col in get_elements_by_class('col', webpage):
label = clean_html(get_element_by_class('label', col))
if label == 'Categories:':
categories = list(map(clean_html, get_elements_by_class('item', col)))
elif label == 'Artist:':
creator = join_nonempty(*map(clean_html, get_elements_by_class('item', col)), delim=', ')
elif label == 'Uploaded By:':
uploader = clean_html(get_element_by_class('name', col))
uploader_url = extract_attributes(get_element_html_by_class('name', col) or '').get('href')
return {
**traverse_obj(self._search_json_ld(webpage, video_id, default={}), ({
'title': 'title',
'view_count': 'view_count',
'like_count': 'like_count',
'duration': 'duration',
'timestamp': 'timestamp',
'description': 'description',
'thumbnail': ('thumbnails', 0, 'url'),
})),
'id': video_id,
'formats': formats,
'title': title,
'thumbnail': thumbnail,
'duration': parse_duration(duration),
'title': self._html_extract_title(webpage),
'thumbnail': self._html_search_regex(
r'preview_url:\s+\'([^\']+)\'', webpage, 'thumbnail', default=None),
'duration': parse_duration(self._html_search_regex(
r'"icon-clock"></i>\s+<span>((?:\d+:?)+)', webpage, 'duration', default=None)),
'view_count': int_or_none(self._html_search_regex(
r'"icon-eye"></i>\s+<span>([ \d]+)', webpage, 'views', default='').replace(' ', '')),
'like_count': parse_count(get_element_by_class('voters count', webpage)),
'comment_count': int_or_none(self._search_regex(
r'[^(]+\((\d+)\)', get_element_by_attribute('href', '#tab_comments', webpage), 'comment count', fatal=False)),
'age_limit': 18,
'creator': creator,
'uploader': uploader,
'uploader_url': uploader_url,
'categories': categories,
'tags': list(map(unescapeHTML, re.findall(
r'<a class="tag_item"[^>]+\bhref="https://rule34video\.com/tags/\d+/"[^>]*>(?P<tag>[^>]*)</a>', webpage))),
}

View File

@ -1,64 +0,0 @@
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
qualities,
xpath_text,
)
class TurboIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?turbo\.fr/videos-voiture/(?P<id>[0-9]+)-'
_API_URL = 'http://www.turbo.fr/api/tv/xml.php?player_generique=player_generique&id={0:}'
_TEST = {
'url': 'http://www.turbo.fr/videos-voiture/454443-turbo-du-07-09-2014-renault-twingo-3-bentley-continental-gt-speed-ces-guide-achat-dacia.html',
'md5': '33f4b91099b36b5d5a91f84b5bcba600',
'info_dict': {
'id': '454443',
'ext': 'mp4',
'duration': 3715,
'title': 'Turbo du 07/09/2014 : Renault Twingo 3, Bentley Continental GT Speed, CES, Guide Achat Dacia... ',
'description': 'Turbo du 07/09/2014 : Renault Twingo 3, Bentley Continental GT Speed, CES, Guide Achat Dacia...',
'thumbnail': r're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
mobj = self._match_valid_url(url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
playlist = self._download_xml(self._API_URL.format(video_id), video_id)
item = playlist.find('./channel/item')
if item is None:
raise ExtractorError('Playlist item was not found', expected=True)
title = xpath_text(item, './title', 'title')
duration = int_or_none(xpath_text(item, './durate', 'duration'))
thumbnail = xpath_text(item, './visuel_clip', 'thumbnail')
description = self._html_search_meta('description', webpage)
formats = []
get_quality = qualities(['3g', 'sd', 'hq'])
for child in item:
m = re.search(r'url_video_(?P<quality>.+)', child.tag)
if m:
quality = compat_str(m.group('quality'))
formats.append({
'format_id': quality,
'url': child.text,
'quality': get_quality(quality),
})
return {
'id': video_id,
'title': title,
'duration': duration,
'thumbnail': thumbnail,
'description': description,
'formats': formats,
}

View File

@ -8,7 +8,6 @@ from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
)
from ..utils import (
@ -191,6 +190,20 @@ class TwitchBaseIE(InfoExtractor):
'url': thumbnail,
}] if thumbnail else None
def _extract_twitch_m3u8_formats(self, video_id, token, signature):
"""Subclasses must define _M3U8_PATH"""
return self._extract_m3u8_formats(
f'{self._USHER_BASE}/{self._M3U8_PATH}/{video_id}.m3u8', video_id, 'mp4', query={
'allow_source': 'true',
'allow_audio_only': 'true',
'allow_spectre': 'true',
'p': random.randint(1000000, 10000000),
'player': 'twitchweb',
'playlist_include_framerate': 'true',
'sig': signature,
'token': token,
})
class TwitchVodIE(TwitchBaseIE):
IE_NAME = 'twitch:vod'
@ -203,6 +216,7 @@ class TwitchVodIE(TwitchBaseIE):
)
(?P<id>\d+)
'''
_M3U8_PATH = 'vod'
_TESTS = [{
'url': 'http://www.twitch.tv/riotgames/v/6528877?t=5m10s',
@ -532,20 +546,8 @@ class TwitchVodIE(TwitchBaseIE):
info = self._extract_info_gql(video, vod_id)
access_token = self._download_access_token(vod_id, 'video', 'id')
formats = self._extract_m3u8_formats(
'%s/vod/%s.m3u8?%s' % (
self._USHER_BASE, vod_id,
compat_urllib_parse_urlencode({
'allow_source': 'true',
'allow_audio_only': 'true',
'allow_spectre': 'true',
'player': 'twitchweb',
'playlist_include_framerate': 'true',
'nauth': access_token['value'],
'nauthsig': access_token['signature'],
})),
vod_id, 'mp4', entry_protocol='m3u8_native')
formats = self._extract_twitch_m3u8_formats(
vod_id, access_token['value'], access_token['signature'])
formats.extend(self._extract_storyboard(vod_id, video.get('storyboard'), info.get('duration')))
self._prefer_source(formats)
@ -924,6 +926,7 @@ class TwitchStreamIE(TwitchBaseIE):
)
(?P<id>[^/#?]+)
'''
_M3U8_PATH = 'api/channel/hls'
_TESTS = [{
'url': 'http://www.twitch.tv/shroomztv',
@ -1026,23 +1029,10 @@ class TwitchStreamIE(TwitchBaseIE):
access_token = self._download_access_token(
channel_name, 'stream', 'channelName')
token = access_token['value']
stream_id = stream.get('id') or channel_name
query = {
'allow_source': 'true',
'allow_audio_only': 'true',
'allow_spectre': 'true',
'p': random.randint(1000000, 10000000),
'player': 'twitchweb',
'playlist_include_framerate': 'true',
'segment_preference': '4',
'sig': access_token['signature'].encode('utf-8'),
'token': token.encode('utf-8'),
}
formats = self._extract_m3u8_formats(
'%s/api/channel/hls/%s.m3u8' % (self._USHER_BASE, channel_name),
stream_id, 'mp4', query=query)
formats = self._extract_twitch_m3u8_formats(
channel_name, access_token['value'], access_token['signature'])
self._prefer_source(formats)
view_count = stream.get('viewers')

View File

@ -0,0 +1,60 @@
import base64
import re
from .common import InfoExtractor
from ..utils import (
extract_attributes,
int_or_none,
parse_iso8601,
)
from ..utils.traversal import traverse_obj
class ViouslyIE(InfoExtractor):
_VALID_URL = False
_WEBPAGE_TESTS = [{
'url': 'http://www.turbo.fr/videos-voiture/454443-turbo-du-07-09-2014-renault-twingo-3-bentley-continental-gt-speed-ces-guide-achat-dacia.html',
'md5': '37a6c3381599381ff53a7e1e0575c0bc',
'info_dict': {
'id': 'F_xQzS2jwb3',
'ext': 'mp4',
'title': 'Turbo du 07/09/2014\xa0: Renault Twingo 3, Bentley Continental GT Speed, CES, Guide Achat Dacia...',
'description': 'Turbo du 07/09/2014\xa0: Renault Twingo 3, Bentley Continental GT Speed, CES, Guide Achat Dacia...',
'age_limit': 0,
'upload_date': '20230328',
'timestamp': 1680037507,
'duration': 3716,
'categories': ['motors'],
}
}]
def _extract_from_webpage(self, url, webpage):
viously_players = re.findall(r'<div[^>]*class="(?:[^"]*\s)?v(?:iou)?sly-player(?:\s[^"]*)?"[^>]*>', webpage)
if not viously_players:
return
def custom_decode(text):
STANDARD_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='
CUSTOM_ALPHABET = 'VIOUSLYABCDEFGHJKMNPQRTWXZviouslyabcdefghjkmnpqrtwxz9876543210+/='
data = base64.b64decode(text.translate(str.maketrans(CUSTOM_ALPHABET, STANDARD_ALPHABET)))
return data.decode('utf-8').strip('\x00')
for video_id in traverse_obj(viously_players, (..., {extract_attributes}, 'id')):
formats = self._extract_m3u8_formats(
f'https://www.viously.com/video/hls/{video_id}/index.m3u8', video_id, fatal=False)
if not formats:
continue
data = self._download_json(
f'https://www.viously.com/export/json/{video_id}', video_id,
transform_source=custom_decode, fatal=False)
yield {
'id': video_id,
'formats': formats,
**traverse_obj(data, ('video', {
'title': ('title', {str}),
'description': ('description', {str}),
'duration': ('duration', {int_or_none}),
'timestamp': ('iso_date', {parse_iso8601}),
'categories': ('category', 'name', {str}, {lambda x: [x] if x else None}),
})),
}

View File

@ -3,6 +3,7 @@ import contextlib
import inspect
import itertools
import re
import xml.etree.ElementTree
from ._utils import (
IDENTITY,
@ -118,7 +119,7 @@ def traverse_obj(
branching = True
if isinstance(obj, collections.abc.Mapping):
result = obj.values()
elif is_iterable_like(obj):
elif is_iterable_like(obj) or isinstance(obj, xml.etree.ElementTree.Element):
result = obj
elif isinstance(obj, re.Match):
result = obj.groups()
@ -132,7 +133,7 @@ def traverse_obj(
branching = True
if isinstance(obj, collections.abc.Mapping):
iter_obj = obj.items()
elif is_iterable_like(obj):
elif is_iterable_like(obj) or isinstance(obj, xml.etree.ElementTree.Element):
iter_obj = enumerate(obj)
elif isinstance(obj, re.Match):
iter_obj = itertools.chain(
@ -168,7 +169,7 @@ def traverse_obj(
result = next((v for k, v in obj.groupdict().items() if casefold(k) == key), None)
elif isinstance(key, (int, slice)):
if is_iterable_like(obj, collections.abc.Sequence):
if is_iterable_like(obj, (collections.abc.Sequence, xml.etree.ElementTree.Element)):
branching = isinstance(key, slice)
with contextlib.suppress(IndexError):
result = obj[key]
@ -176,6 +177,34 @@ def traverse_obj(
with contextlib.suppress(IndexError):
result = str(obj)[key]
elif isinstance(obj, xml.etree.ElementTree.Element) and isinstance(key, str):
xpath, _, special = key.rpartition('/')
if not special.startswith('@') and special != 'text()':
xpath = key
special = None
# Allow abbreviations of relative paths, absolute paths error
if xpath.startswith('/'):
xpath = f'.{xpath}'
elif xpath and not xpath.startswith('./'):
xpath = f'./{xpath}'
def apply_specials(element):
if special is None:
return element
if special == '@':
return element.attrib
if special.startswith('@'):
return try_call(element.attrib.get, args=(special[1:],))
if special == 'text()':
return element.text
assert False, f'apply_specials is missing case for {special!r}'
if xpath:
result = list(map(apply_specials, obj.iterfind(xpath)))
else:
result = apply_specials(obj)
return branching, result if branching else (result,)
def lazy_last(iterable):