Compare commits

...

71 Commits

Author SHA1 Message Date
Elyse
9feef0b976
Merge 9dd8574b68 into 0b7ec08816 2024-10-21 17:22:00 -06:00
DarkZeros
0b7ec08816
[ie/telecinco] Fix extractors (#11142)
Closes #10986, Closes #11106
Authored by: DarkZeros, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2024-10-21 21:18:12 +00:00
David Skrundz
40054cb4a7
[ie/gem.cbc.ca] Fix formats extraction (#11196)
Also extracts `timestamp` and `release_timestamp` as seconds instead of milliseconds

Authored by: DavidSkrundz
2024-10-21 18:56:43 +00:00
bashonly
fed53d70bd [ie/youtube] Remove broken android_producer client (#11297)
Authored by: bashonly
2024-10-21 18:39:58 +00:00
bashonly
ec2f4bf082 [ie/youtube] Remove broken age-restriction workaround (#11297)
Closes #11296
Authored by: bashonly
2024-10-21 18:39:58 +00:00
bashonly
9dd8574b68
Merge branch 'master' into yt-live-from-start-range 2024-10-09 11:23:14 -05:00
bashonly
160d973aee
Merge branch 'master' into yt-live-from-start-range 2024-08-17 04:40:45 -05:00
bashonly
c0be43d4d7
Merge branch 'yt-dlp:master' into pr/live-sections 2024-07-24 22:58:09 -05:00
bashonly
4f1af12b70
Merge branch 'master' into pr/live-sections 2024-07-21 17:25:36 -05:00
bashonly
724a6cb2cb
Merge branch 'yt-dlp:master' into pr/live-sections 2024-07-10 19:08:37 -05:00
bashonly
66a6e0a686
Merge branch 'yt-dlp:master' into pr/live-sections 2024-07-08 00:18:09 -05:00
bashonly
6208f7be9c
Merge branch 'master' into yt-live-from-start-range 2024-06-12 01:29:53 -05:00
bashonly
6a84199473
Merge branch 'yt-dlp:master' into pr/live-sections 2024-05-28 13:22:13 -05:00
bashonly
54ad67d785
Merge branch 'yt-dlp:master' into pr/live-sections 2024-05-23 09:48:06 -05:00
bashonly
172dfbeaed
Merge branch 'yt-dlp:master' into pr/live-sections 2024-05-10 13:52:35 -05:00
bashonly
cf96b24de6
Merge branch 'master' into yt-live-from-start-range 2024-04-16 11:01:17 -05:00
bashonly
50c943e8a0
Merge branch 'yt-dlp:master' into pr/yt-live-from-start-range 2024-03-19 15:18:22 -05:00
bashonly
6fc6349ef0
Merge branch 'master' into yt-live-from-start-range 2024-02-29 04:58:30 -06:00
bashonly
5156a16cf9
Merge branch 'master' into yt-live-from-start-range 2024-01-19 17:05:19 -06:00
Elyse
fb2b57a773 Merge remote-tracking branch 'github/yt-live-from-start-range' into yt-live-from-start-range 2023-10-08 01:01:31 -06:00
Elyse
2741b5827d Merge remote-tracking branch 'origin' into yt-live-from-start-range 2023-10-08 00:24:29 -06:00
bashonly
bd730470f2
Cleanup 2023-07-22 13:32:10 -05:00
bashonly
194bc49c55
Merge branch 'yt-dlp:master' into pr/6498 2023-07-22 13:23:54 -05:00
bashonly
1416cee726
Update yt_dlp/options.py 2023-07-22 17:59:48 +00:00
Elyse
622c555356 Fix bug after merge 2023-06-24 14:43:50 -06:00
Elyse
99e6074c5d Merge remote-tracking branch 'origin' into yt-live-from-start-range 2023-06-24 14:30:12 -06:00
Elyse
1f7974690e Merge remote-tracking branch 'origin' into yt-live-from-start-range 2023-06-03 14:39:32 -06:00
Elyse
8ee942a9c8 Add warning about --download-sections without --live-from-start 2023-05-13 13:29:28 -06:00
Elyse
444e02ef3b Merge remote-tracking branch 'origin/master' into yt-live-from-start-range 2023-05-07 00:33:18 -06:00
Elyse
4e93198ae6 Restore README.md
I think this is auto-generated by some script
2023-05-06 23:29:40 -06:00
Elyse
78285eea86 Update options docs 2023-05-06 23:24:58 -06:00
Elyse
7f93eb7a28 Support for epoch timestamps 2023-05-06 23:05:38 -06:00
Elyse
128d30492b Always compute last_seq 2023-04-18 23:17:39 -06:00
Elyse
129555b19a Fix return values of _extract_sequence_from_mpd 2023-03-17 22:39:21 -06:00
Elyse
01f672fe27 Lock less aggressively
This gives a performance improvement of about 30%
2023-03-17 22:37:31 -06:00
Elyse
2fbe18557b Add some documentation 2023-03-12 01:42:45 -06:00
Elyse
b131f3d1f1 Improve option documentation 2023-03-12 01:37:33 -06:00
Elyse
544836de83 Allow days in parse_duration 2023-03-12 01:37:21 -06:00
pukkandan
6cea8cbe2d
Merge remote-tracking branch 'origin/master' into pr/6498 2023-03-12 11:57:41 +05:30
Elyse
5e4699a623 Fix linter 2023-03-11 20:02:52 -06:00
Elyse
79ae58a5c4 Fix linter 2023-03-11 20:00:34 -06:00
Elyse
3faa1e33ed Add initial documentation 2023-03-11 19:51:14 -06:00
Elyse
fbae888c65 Add debug for selected section 2023-03-11 19:51:14 -06:00
Elyse
cdac7641d6 Remove tz_aware date code 2023-03-11 19:51:14 -06:00
Elyse
a43ba2eff6 Fix unified_timestamp 2023-03-11 19:51:14 -06:00
Elyse
0ed9a73a73 Add fragment count 2023-03-11 19:51:14 -06:00
Elyse
e40132da09 Revert "[utils] Allow using local timezone for 'now' timestamps"
This reverts commit 1799a6ae36.
2023-03-11 19:51:14 -06:00
Elyse
e6e2eb00f1 Support negative durations 2023-03-11 19:51:14 -06:00
pukkandan
9fc70f3f6d [extractor/youtube] Construct fragment list lazily
Building the fragment list for all formats takes significant time for large videos
2023-03-11 19:51:14 -06:00
pukkandan
5ef1a928a7 [extractor/youtube] Add extractor-arg include_duplicate_formats 2023-03-11 19:51:14 -06:00
Lesmiscore
db62ffdafe [extractor/youtube] Add client name to format_note when -v (#6254)
Authored by: Lesmiscore, pukkandan
2023-03-11 19:51:14 -06:00
vampirefrog
f137666451 [extractor/rokfin] Re-construct manifest url (#6507)
Authored by: vampirefrog
2023-03-11 19:51:14 -06:00
Daniel Vogt
e3ffdf76aa [extractor/opencast] Fix format bug (#6512)
Authored by: C0D3D3V
2023-03-11 19:51:14 -06:00
pukkandan
9f717b69b4 [extractor/hidive] Fix login
Fixes https://github.com/yt-dlp/yt-dlp/issues/6493#issuecomment-1462906556
2023-03-11 19:51:14 -06:00
pukkandan
34d3df72e9 Support loading info.json with a list at its root 2023-03-11 19:51:14 -06:00
makeworld
96f5d29db0 [extractor/cbc:gem] Update _VALID_URL (#6499)
Authored by: makeworld-the-better-one
Closes #6395
2023-03-11 19:51:13 -06:00
Elyse
c222f6cbfc [extractor/twitch] Fix is_live (#6500)
Closes #6494
Authored by: elyse0
2023-03-11 19:51:13 -06:00
pukkandan
2d1655493f [extractor/youtube] Bypass throttling for -f17
and related cleanup

Thanks @AudricV for the finding
2023-03-11 19:51:13 -06:00
pukkandan
c376b95f95 [downloader/curl] Fix progress reporting
Bug in 8c53322cda
Closes #6490
2023-03-11 19:51:13 -06:00
Daniel Vogt
8df470761e [extractor/opencast] Add ltitools to _VALID_URL (#6371)
Authored by: C0D3D3V
2023-03-11 19:51:13 -06:00
D0LLYNH0
e3b08bac9c [extractor/iq] Set more language codes (#6476)
Authored by: D0LLYNH0
2023-03-11 19:51:13 -06:00
Elyse
932758707f Fix linter 2023-03-09 18:51:10 -06:00
Elyse
317ba03fdf Improve parse_chapters comments 2023-03-09 18:35:20 -06:00
Elyse
e42e25619f Create last_segment_url only if necessary 2023-03-09 18:24:39 -06:00
Elyse
fba1c397b1 [youtube] Support --download-sections for YT Livestream from start 2023-03-09 17:32:19 -06:00
Elyse
b83d7526f2 Add fixme in modified parse_chapters function
A range like '*(now-1hour)-(now-30minutes)' doesn't work
2023-03-09 17:21:02 -06:00
Elyse
fdb9aaf416 Use local timezone for download sections 2023-03-09 17:19:39 -06:00
Elyse
1799a6ae36 [utils] Allow using local timezone for 'now' timestamps 2023-03-09 17:18:44 -06:00
Elyse
367429e238 [common] Extract start and end keys for Dash fragments 2023-03-09 17:17:16 -06:00
Sophire
439be2b4a4 [utils] Add microseconds to unified_timestamp 2023-03-09 12:07:08 -06:00
Elyse
2fbd6de957 [utils] Add hackish 'now' support for --download-sections 2023-03-09 11:30:40 -06:00
11 changed files with 254 additions and 184 deletions

View File

@@ -452,10 +452,15 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(unified_timestamp('2018-03-14T08:32:43.1493874+00:00'), 1521016363)
         self.assertEqual(unified_timestamp('Sunday, 26 Nov 2006, 19:00'), 1164567600)
         self.assertEqual(unified_timestamp('wed, aug 16, 2008, 12:00pm'), 1218931200)
+        self.assertEqual(unified_timestamp('2022-10-13T02:37:47.831Z'), 1665628667)
         self.assertEqual(unified_timestamp('December 31 1969 20:00:01 EDT'), 1)
         self.assertEqual(unified_timestamp('Wednesday 31 December 1969 18:01:26 MDT'), 86)
         self.assertEqual(unified_timestamp('12/31/1969 20:01:18 EDT', False), 78)
+        self.assertEqual(unified_timestamp('2023-03-09T18:01:33.646Z', with_milliseconds=True), 1678384893.646)
+        # ISO8601 spec says that if no timezone is specified, we should use local timezone;
+        # but yt-dlp uses UTC to keep things consistent
+        self.assertEqual(unified_timestamp('2023-03-11T06:48:34.008'), 1678517314)

     def test_determine_ext(self):
         self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
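A minimal sketch of what the new `with_milliseconds` flag changes at the call site, assuming the patched `yt_dlp.utils.unified_timestamp` from this branch:

```python
from yt_dlp.utils import unified_timestamp

# Default behavior is unchanged: the result is truncated to whole seconds
unified_timestamp('2023-03-09T18:01:33.646Z')  # -> 1678384893

# The new flag keeps the fractional part, which the DASH changes below need
# for millisecond-precision availabilityStartTime values
unified_timestamp('2023-03-09T18:01:33.646Z', with_milliseconds=True)  # -> 1678384893.646
```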

View File

@@ -28,7 +28,12 @@ from .cache import Cache
 from .compat import urllib  # isort: split
 from .compat import compat_os_name, urllib_req_to_req
 from .cookies import CookieLoadError, LenientSimpleCookie, load_cookies
-from .downloader import FFmpegFD, get_suitable_downloader, shorten_protocol_name
+from .downloader import (
+    DashSegmentsFD,
+    FFmpegFD,
+    get_suitable_downloader,
+    shorten_protocol_name,
+)
 from .downloader.rtmp import rtmpdump_version
 from .extractor import gen_extractor_classes, get_info_extractor
 from .extractor.common import UnsupportedURLIE
@@ -3377,7 +3382,7 @@ class YoutubeDL:
         fd, success = None, True
         if info_dict.get('protocol') or info_dict.get('url'):
             fd = get_suitable_downloader(info_dict, self.params, to_stdout=temp_filename == '-')
-            if fd != FFmpegFD and 'no-direct-merge' not in self.params['compat_opts'] and (
+            if fd not in [FFmpegFD, DashSegmentsFD] and 'no-direct-merge' not in self.params['compat_opts'] and (
                     info_dict.get('section_start') or info_dict.get('section_end')):
                 msg = ('This format cannot be partially downloaded' if FFmpegFD.available()
                        else 'You have requested downloading the video partially, but ffmpeg is not installed')

View File

@@ -12,6 +12,7 @@ import itertools
 import optparse
 import os
 import re
+import time
 import traceback

 from .compat import compat_os_name
@@ -339,12 +340,13 @@ def validate_options(opts):
                 (?P<end_sign>-?)(?P<end>[^-]+)
             )?'''

+        current_time = time.time()
         chapters, ranges, from_url = [], [], False
         for regex in value or []:
             if advanced and regex == '*from-url':
                 from_url = True
                 continue
-            elif not regex.startswith('*'):
+            elif not regex.startswith('*') and not regex.startswith('#'):
                 try:
                     chapters.append(re.compile(regex))
                 except re.error as err:
@@ -361,11 +363,16 @@ def validate_options(opts):
                     err = 'Must be of the form "*start-end"'
                 elif not advanced and any(signs):
                     err = 'Negative timestamps are not allowed'
-                else:
+                elif regex.startswith('*'):
                     dur[0] *= -1 if signs[0] else 1
                     dur[1] *= -1 if signs[1] else 1
                     if dur[1] == float('-inf'):
                         err = '"-inf" is not a valid end'
+                elif regex.startswith('#'):
+                    dur[0] = dur[0] * (-1 if signs[0] else 1) + current_time
+                    dur[1] = dur[1] * (-1 if signs[1] else 1) + current_time
+                    if dur[1] == float('-inf'):
+                        err = '"-inf" is not a valid end'
             if err:
                 raise ValueError(f'invalid {name} time range "{regex}". {err}')
             ranges.append(dur)
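A sketch of what the new `#` branch computes: the parsed durations are signed and then offset by `current_time`, turning a relative range into an absolute unix-timestamp window. The CLI splits the sign off via the `start_sign`/`end_sign` groups; the sketch below folds it into the patched `parse_duration` (signed values, `days` support), which yields the same arithmetic:

```python
import time

from yt_dlp.utils import parse_duration  # patched on this branch to accept signs and 'days'

now = time.time()  # current_time in validate_options

# '#-24hours - 0' -> absolute window [now - 86400, now]
start, end = now + parse_duration('-24hours'), now + parse_duration('0')

# '#-3days - -2days' -> absolute window [now - 259200, now - 172800]
start, end = now + parse_duration('-3days'), now + parse_duration('-2days')
```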

View File

@@ -36,6 +36,8 @@ class DashSegmentsFD(FragmentFD):
                 'filename': fmt.get('filepath') or filename,
                 'live': 'is_from_start' if fmt.get('is_from_start') else fmt.get('is_live'),
                 'total_frags': fragment_count,
+                'section_start': info_dict.get('section_start'),
+                'section_end': info_dict.get('section_end'),
             }

             if real_downloader:

View File

@@ -4,7 +4,6 @@ import json
 import re
 import time
 import urllib.parse
-import xml.etree.ElementTree

 from .common import InfoExtractor
 from ..networking import HEADRequest
@@ -12,7 +11,6 @@ from ..utils import (
     ExtractorError,
     float_or_none,
     int_or_none,
-    join_nonempty,
     js_to_json,
     mimetype2ext,
     orderedSet,
@@ -524,14 +522,13 @@ class CBCGemIE(InfoExtractor):
     _TESTS = [{
         # This is a normal, public, TV show video
         'url': 'https://gem.cbc.ca/media/schitts-creek/s06e01',
-        'md5': '93dbb31c74a8e45b378cf13bd3f6f11e',
         'info_dict': {
             'id': 'schitts-creek/s06e01',
             'ext': 'mp4',
             'title': 'Smoke Signals',
             'description': 'md5:929868d20021c924020641769eb3e7f1',
-            'thumbnail': 'https://images.radio-canada.ca/v1/synps-cbc/episode/perso/cbc_schitts_creek_season_06e01_thumbnail_v01.jpg?im=Resize=(Size)',
-            'duration': 1314,
+            'thumbnail': r're:https://images\.radio-canada\.ca/[^#?]+/cbc_schitts_creek_season_06e01_thumbnail_v01\.jpg',
+            'duration': 1324,
             'categories': ['comedy'],
             'series': 'Schitt\'s Creek',
             'season': 'Season 6',
@@ -539,19 +536,21 @@ class CBCGemIE(InfoExtractor):
             'episode': 'Smoke Signals',
             'episode_number': 1,
             'episode_id': 'schitts-creek/s06e01',
+            'upload_date': '20210618',
+            'timestamp': 1623988800,
+            'release_date': '20200107',
+            'release_timestamp': 1578427200,
         },
         'params': {'format': 'bv'},
+        'skip': 'Geo-restricted to Canada',
     }, {
         # This video requires an account in the browser, but works fine in yt-dlp
         'url': 'https://gem.cbc.ca/media/schitts-creek/s01e01',
-        'md5': '297a9600f554f2258aed01514226a697',
         'info_dict': {
             'id': 'schitts-creek/s01e01',
             'ext': 'mp4',
             'title': 'The Cup Runneth Over',
             'description': 'md5:9bca14ea49ab808097530eb05a29e797',
-            'thumbnail': 'https://images.radio-canada.ca/v1/synps-cbc/episode/perso/cbc_schitts_creek_season_01e01_thumbnail_v01.jpg?im=Resize=(Size)',
+            'thumbnail': r're:https://images\.radio-canada\.ca/[^#?]+/cbc_schitts_creek_season_01e01_thumbnail_v01\.jpg',
             'series': 'Schitt\'s Creek',
             'season_number': 1,
             'season': 'Season 1',
@@ -560,9 +559,12 @@ class CBCGemIE(InfoExtractor):
             'episode_id': 'schitts-creek/s01e01',
             'duration': 1309,
             'categories': ['comedy'],
+            'upload_date': '20210617',
+            'timestamp': 1623902400,
+            'release_date': '20151124',
+            'release_timestamp': 1448323200,
         },
         'params': {'format': 'bv'},
+        'skip': 'Geo-restricted to Canada',
     }, {
         'url': 'https://gem.cbc.ca/nadiyas-family-favourites/s01e01',
         'only_matching': True,
@@ -631,38 +633,6 @@ class CBCGemIE(InfoExtractor):
             return
         self._claims_token = self.cache.load(self._NETRC_MACHINE, 'claims_token')

-    def _find_secret_formats(self, formats, video_id):
-        """ Find a valid video url and convert it to the secret variant """
-        base_format = next((f for f in formats if f.get('vcodec') != 'none'), None)
-        if not base_format:
-            return
-
-        base_url = re.sub(r'(Manifest\(.*?),filter=[\w-]+(.*?\))', r'\1\2', base_format['url'])
-        url = re.sub(r'(Manifest\(.*?),format=[\w-]+(.*?\))', r'\1\2', base_url)
-
-        secret_xml = self._download_xml(url, video_id, note='Downloading secret XML', fatal=False)
-        if not isinstance(secret_xml, xml.etree.ElementTree.Element):
-            return
-
-        for child in secret_xml:
-            if child.attrib.get('Type') != 'video':
-                continue
-            for video_quality in child:
-                bitrate = int_or_none(video_quality.attrib.get('Bitrate'))
-                if not bitrate or 'Index' not in video_quality.attrib:
-                    continue
-                height = int_or_none(video_quality.attrib.get('MaxHeight'))
-                yield {
-                    **base_format,
-                    'format_id': join_nonempty('sec', height),
-                    # Note: \g<1> is necessary instead of \1 since bitrate is a number
-                    'url': re.sub(r'(QualityLevels\()\d+(\))', fr'\g<1>{bitrate}\2', base_url),
-                    'width': int_or_none(video_quality.attrib.get('MaxWidth')),
-                    'tbr': bitrate / 1000.0,
-                    'height': height,
-                }
-
     def _real_extract(self, url):
         video_id = self._match_id(url)
         video_info = self._download_json(
@@ -676,7 +646,6 @@ class CBCGemIE(InfoExtractor):
         else:
             headers = {}
         m3u8_info = self._download_json(video_info['playSession']['url'], video_id, headers=headers)
-        m3u8_url = m3u8_info.get('url')

         if m3u8_info.get('errorCode') == 1:
             self.raise_geo_restricted(countries=['CA'])
@@ -685,9 +654,9 @@ class CBCGemIE(InfoExtractor):
         elif m3u8_info.get('errorCode') != 0:
             raise ExtractorError(f'{self.IE_NAME} said: {m3u8_info.get("errorCode")} - {m3u8_info.get("message")}')

-        formats = self._extract_m3u8_formats(m3u8_url, video_id, m3u8_id='hls')
+        formats = self._extract_m3u8_formats(
+            m3u8_info['url'], video_id, 'mp4', m3u8_id='hls', query={'manifestType': ''})
         self._remove_duplicate_formats(formats)
-        formats.extend(self._find_secret_formats(formats, video_id))

         for fmt in formats:
             if fmt.get('vcodec') == 'none':
@@ -703,20 +672,21 @@ class CBCGemIE(InfoExtractor):
         return {
             'id': video_id,
-            'title': video_info['title'],
-            'description': video_info.get('description'),
-            'thumbnail': video_info.get('image'),
-            'series': video_info.get('series'),
-            'season_number': video_info.get('season'),
-            'season': f'Season {video_info.get("season")}',
-            'episode_number': video_info.get('episode'),
-            'episode': video_info.get('title'),
             'episode_id': video_id,
-            'duration': video_info.get('duration'),
-            'categories': [video_info.get('category')],
             'formats': formats,
-            'release_timestamp': video_info.get('airDate'),
-            'timestamp': video_info.get('availableDate'),
+            **traverse_obj(video_info, {
+                'title': ('title', {str}),
+                'episode': ('title', {str}),
+                'description': ('description', {str}),
+                'thumbnail': ('image', {url_or_none}),
+                'series': ('series', {str}),
+                'season_number': ('season', {int_or_none}),
+                'episode_number': ('episode', {int_or_none}),
+                'duration': ('duration', {int_or_none}),
+                'categories': ('category', {str}, all),
+                'release_timestamp': ('airDate', {int_or_none(scale=1000)}),
+                'timestamp': ('availableDate', {int_or_none(scale=1000)}),
+            }),
         }
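The seconds-instead-of-milliseconds fix from the commit message is the `{int_or_none(scale=1000)}` transform above; a hedged sketch with made-up API values, assuming the partial-application behavior the diff relies on:

```python
from yt_dlp.utils import int_or_none, traverse_obj

video_info = {'airDate': 1578427200000, 'availableDate': 1623988800000}  # made-up values

meta = traverse_obj(video_info, {
    # int_or_none(scale=1000) is partially applied; it divides the value by 1000
    'release_timestamp': ('airDate', {int_or_none(scale=1000)}),
    'timestamp': ('availableDate', {int_or_none(scale=1000)}),
})
assert meta == {'release_timestamp': 1578427200, 'timestamp': 1623988800}
```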

View File

@@ -2723,7 +2723,7 @@ class InfoExtractor:
                     r = int(s.get('r', 0))
                     ms_info['total_number'] += 1 + r
                     ms_info['s'].append({
-                        't': int(s.get('t', 0)),
+                        't': int_or_none(s.get('t')),
                         # @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
                         'd': int(s.attrib['d']),
                         'r': r,
@@ -2765,8 +2765,14 @@ class InfoExtractor:
             return ms_info

         mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
+        availability_start_time = unified_timestamp(
+            mpd_doc.get('availabilityStartTime'), with_milliseconds=True) or 0
         stream_numbers = collections.defaultdict(int)
         for period_idx, period in enumerate(mpd_doc.findall(_add_ns('Period'))):
+            # segmentIngestTime is completely out of spec, but YT Livestream do this
+            segment_ingest_time = period.get('{http://youtube.com/yt/2012/10/10}segmentIngestTime')
+            if segment_ingest_time:
+                availability_start_time = unified_timestamp(segment_ingest_time, with_milliseconds=True)
             period_entry = {
                 'id': period.get('id', f'period-{period_idx}'),
                 'formats': [],
@@ -2945,13 +2951,17 @@ class InfoExtractor:
                                 'Bandwidth': bandwidth,
                                 'Number': segment_number,
                             }
+                            duration = float_or_none(segment_d, representation_ms_info['timescale'])
+                            start = float_or_none(segment_time, representation_ms_info['timescale'])
                             representation_ms_info['fragments'].append({
                                 media_location_key: segment_url,
-                                'duration': float_or_none(segment_d, representation_ms_info['timescale']),
+                                'duration': duration,
+                                'start': availability_start_time + start,
+                                'end': availability_start_time + start + duration,
                             })

                         for s in representation_ms_info['s']:
-                            segment_time = s.get('t') or segment_time
+                            segment_time = s['t'] if s.get('t') is not None else segment_time
                             segment_d = s['d']
                             add_segment_url()
                             segment_number += 1
@@ -2967,6 +2977,7 @@ class InfoExtractor:
                     fragments = []
                     segment_index = 0
                     timescale = representation_ms_info['timescale']
+                    start = 0
                     for s in representation_ms_info['s']:
                         duration = float_or_none(s['d'], timescale)
                         for _ in range(s.get('r', 0) + 1):
@@ -2974,8 +2985,11 @@ class InfoExtractor:
                             fragments.append({
                                 location_key(segment_uri): segment_uri,
                                 'duration': duration,
+                                'start': availability_start_time + start,
+                                'end': availability_start_time + start + duration,
                             })
                             segment_index += 1
+                            start += duration
                     representation_ms_info['fragments'] = fragments
                 elif 'segment_urls' in representation_ms_info:
                     # Segment URLs with no SegmentTimeline
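A standalone sketch of the fragment timing math added above: each fragment's wall-clock position is `availabilityStartTime` (or YouTube's out-of-spec `segmentIngestTime`) plus its running offset in the timeline. All values below are illustrative:

```python
availability_start_time = 1678384893.646  # unified_timestamp(..., with_milliseconds=True)
timescale = 1000
segments = [{'d': 5000}, {'d': 5000}, {'d': 5000}]  # SegmentTimeline 'S' nodes

start = 0
fragments = []
for s in segments:
    duration = s['d'] / timescale
    fragments.append({
        'duration': duration,
        'start': availability_start_time + start,  # absolute wall-clock start
        'end': availability_start_time + start + duration,
    })
    start += duration
```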

View File

@@ -1,14 +1,13 @@
-from .telecinco import TelecincoIE
+from .telecinco import TelecincoBaseIE
 from ..utils import (
     int_or_none,
     parse_iso8601,
 )


-class MiTeleIE(TelecincoIE):  # XXX: Do not subclass from concrete IE
+class MiTeleIE(TelecincoBaseIE):
     IE_DESC = 'mitele.es'
     _VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/)+(?P<id>[^/]+)/player'
-
     _TESTS = [{
         'url': 'http://www.mitele.es/programas-tv/diario-de/57b0dfb9c715da65618b4afa/player',
         'info_dict': {
@@ -27,6 +26,7 @@ class MiTeleIE(TelecincoBaseIE):
             'timestamp': 1471209401,
             'upload_date': '20160814',
         },
+        'skip': 'HTTP Error 404 Not Found',
     }, {
         # no explicit title
         'url': 'http://www.mitele.es/programas-tv/cuarto-milenio/57b0de3dc915da14058b4876/player',
@@ -49,6 +49,26 @@ class MiTeleIE(TelecincoBaseIE):
         'params': {
             'skip_download': True,
         },
+        'skip': 'HTTP Error 404 Not Found',
+    }, {
+        'url': 'https://www.mitele.es/programas-tv/horizonte/temporada-5/programa-171-40_013480051/player/',
+        'info_dict': {
+            'id': '7adbe22e-cd41-4787-afa4-36f3da7c2c6f',
+            'ext': 'mp4',
+            'title': 'Horizonte Temporada 5 Programa 171',
+            'description': 'md5:97f1fb712c5ac27e5693a8b3c5c0c6e3',
+            'episode': 'Las Zonas de Bajas Emisiones, a debate',
+            'episode_number': 171,
+            'season': 'Season 5',
+            'season_number': 5,
+            'series': 'Horizonte',
+            'duration': 7012,
+            'upload_date': '20240927',
+            'timestamp': 1727416450,
+            'thumbnail': 'https://album.mediaset.es/eimg/2024/09/27/horizonte-171_9f02.jpg',
+            'age_limit': 12,
+        },
+        'params': {'geo_bypass_country': 'ES'},
     }, {
         'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player',
         'only_matching': True,

View File

@@ -2,15 +2,69 @@ import json
 import re

 from .common import InfoExtractor
+from ..networking.exceptions import HTTPError
 from ..utils import (
+    ExtractorError,
     clean_html,
     int_or_none,
+    join_nonempty,
     str_or_none,
-    try_get,
+    traverse_obj,
+    update_url,
+    url_or_none,
 )


-class TelecincoIE(InfoExtractor):
+class TelecincoBaseIE(InfoExtractor):
+    def _parse_content(self, content, url):
+        video_id = content['dataMediaId']
+        config = self._download_json(
+            content['dataConfig'], video_id, 'Downloading config JSON')
+        services = config['services']
+        caronte = self._download_json(services['caronte'], video_id)
+        if traverse_obj(caronte, ('dls', 0, 'drm', {bool})):
+            self.report_drm(video_id)
+        stream = caronte['dls'][0]['stream']
+        headers = {
+            'Referer': url,
+            'Origin': re.match(r'https?://[^/]+', url).group(0),
+        }
+        geo_headers = {**headers, **self.geo_verification_headers()}
+
+        try:
+            cdn = self._download_json(
+                caronte['cerbero'], video_id, data=json.dumps({
+                    'bbx': caronte['bbx'],
+                    'gbx': self._download_json(services['gbx'], video_id)['gbx'],
+                }).encode(), headers={
+                    'Content-Type': 'application/json',
+                    **geo_headers,
+                })['tokens']['1']['cdn']
+        except ExtractorError as error:
+            if isinstance(error.cause, HTTPError) and error.cause.status == 403:
+                error_code = traverse_obj(
+                    self._webpage_read_content(error.cause.response, caronte['cerbero'], video_id, fatal=False),
+                    ({json.loads}, 'code', {int}))
+                if error_code == 4038:
+                    self.raise_geo_restricted(countries=['ES'])
+            raise
+
+        formats = self._extract_m3u8_formats(
+            update_url(stream, query=cdn), video_id, 'mp4', m3u8_id='hls', headers=geo_headers)
+
+        return {
+            'id': video_id,
+            'title': traverse_obj(config, ('info', 'title', {str})),
+            'formats': formats,
+            'thumbnail': (traverse_obj(content, ('dataPoster', {url_or_none}))
+                          or traverse_obj(config, 'poster', 'imageUrl', expected_type=url_or_none)),
+            'duration': traverse_obj(content, ('dataDuration', {int_or_none})),
+            'http_headers': headers,
+        }
+
+
+class TelecincoIE(TelecincoBaseIE):
     IE_DESC = 'telecinco.es, cuatro.com and mediaset.es'
     _VALID_URL = r'https?://(?:www\.)?(?:telecinco\.es|cuatro\.com|mediaset\.es)/(?:[^/]+/)+(?P<id>.+?)\.html'
@@ -30,6 +84,7 @@ class TelecincoIE(TelecincoBaseIE):
                 'duration': 662,
             },
         }],
+        'skip': 'HTTP Error 410 Gone',
     }, {
         'url': 'http://www.cuatro.com/deportes/futbol/barcelona/Leo_Messi-Champions-Roma_2_2052780128.html',
         'md5': 'c86fe0d99e3bdb46b7950d38bf6ef12a',
@@ -40,23 +95,24 @@ class TelecincoIE(TelecincoBaseIE):
             'description': 'md5:a62ecb5f1934fc787107d7b9a2262805',
             'duration': 79,
         },
+        'skip': 'Redirects to main page',
     }, {
         'url': 'http://www.mediaset.es/12meses/campanas/doylacara/conlatratanohaytrato/Ayudame-dar-cara-trata-trato_2_1986630220.html',
-        'md5': 'eddb50291df704ce23c74821b995bcac',
+        'md5': '5ce057f43f30b634fbaf0f18c71a140a',
         'info_dict': {
             'id': 'aywerkD2Sv1vGNqq9b85Q2',
             'ext': 'mp4',
             'title': '#DOYLACARA. Con la trata no hay trato',
-            'description': 'md5:2771356ff7bfad9179c5f5cd954f1477',
             'duration': 50,
+            'thumbnail': 'https://album.mediaset.es/eimg/2017/11/02/1tlQLO5Q3mtKT24f3EaC24.jpg',
         },
     }, {
         # video in opening's content
         'url': 'https://www.telecinco.es/vivalavida/fiorella-sobrina-edmundo-arrocet-entrevista_18_2907195140.html',
         'info_dict': {
-            'id': '2907195140',
+            'id': '1691427',
             'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
-            'description': 'md5:73f340a7320143d37ab895375b2bf13a',
+            'description': r're:Fiorella, la sobrina de Edmundo Arrocet, concedió .{727}',
         },
         'playlist': [{
             'md5': 'adb28c37238b675dad0f042292f209a7',
@@ -65,6 +121,7 @@ class TelecincoIE(TelecincoBaseIE):
                 'ext': 'mp4',
                 'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
                 'duration': 1015,
+                'thumbnail': 'https://album.mediaset.es/eimg/2020/02/29/5opaC37lUhKlZ7FoDhiVC.jpg',
             },
         }],
         'params': {
@@ -81,66 +138,29 @@ class TelecincoIE(TelecincoBaseIE):
         'only_matching': True,
     }]

-    def _parse_content(self, content, url):
-        video_id = content['dataMediaId']
-        config = self._download_json(
-            content['dataConfig'], video_id, 'Downloading config JSON')
-        title = config['info']['title']
-        services = config['services']
-        caronte = self._download_json(services['caronte'], video_id)
-        stream = caronte['dls'][0]['stream']
-        headers = self.geo_verification_headers()
-        headers.update({
-            'Content-Type': 'application/json;charset=UTF-8',
-            'Origin': re.match(r'https?://[^/]+', url).group(0),
-        })
-        cdn = self._download_json(
-            caronte['cerbero'], video_id, data=json.dumps({
-                'bbx': caronte['bbx'],
-                'gbx': self._download_json(services['gbx'], video_id)['gbx'],
-            }).encode(), headers=headers)['tokens']['1']['cdn']
-        formats = self._extract_m3u8_formats(
-            stream + '?' + cdn, video_id, 'mp4', 'm3u8_native', m3u8_id='hls')
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': formats,
-            'thumbnail': content.get('dataPoster') or config.get('poster', {}).get('imageUrl'),
-            'duration': int_or_none(content.get('dataDuration')),
-        }
-
     def _real_extract(self, url):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
-        article = self._parse_json(self._search_regex(
-            r'window\.\$REACTBASE_STATE\.article(?:_multisite)?\s*=\s*({.+})',
-            webpage, 'article'), display_id)['article']
-        title = article.get('title')
-        description = clean_html(article.get('leadParagraph')) or ''
+        article = self._search_json(
+            r'window\.\$REACTBASE_STATE\.article(?:_multisite)?\s*=',
+            webpage, 'article', display_id)['article']
+        description = traverse_obj(article, ('leadParagraph', {clean_html}, filter))
+
         if article.get('editorialType') != 'VID':
             entries = []
-            body = [article.get('opening')]
-            body.extend(try_get(article, lambda x: x['body'], list) or [])
-            for p in body:
-                if not isinstance(p, dict):
-                    continue
-                content = p.get('content')
-                if not content:
-                    continue
+
+            for p in traverse_obj(article, ((('opening', all), 'body'), lambda _, v: v['content'])):
+                content = p['content']
                 type_ = p.get('type')
-                if type_ == 'paragraph':
-                    content_str = str_or_none(content)
-                    if content_str:
-                        description += content_str
-                    continue
-                if type_ == 'video' and isinstance(content, dict):
+                if type_ == 'paragraph' and isinstance(content, str):
+                    description = join_nonempty(description, content, delim='')
+                elif type_ == 'video' and isinstance(content, dict):
                     entries.append(self._parse_content(content, url))
+
             return self.playlist_result(
-                entries, str_or_none(article.get('id')), title, description)
-
-        content = article['opening']['content']
-        info = self._parse_content(content, url)
-        info.update({
-            'description': description,
-        })
+                entries, str_or_none(article.get('id')),
+                traverse_obj(article, ('title', {str})), clean_html(description))
+
+        info = self._parse_content(article['opening']['content'], url)
+        info['description'] = description
         return info
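One detail worth calling out: the old code appended the CDN token with `stream + '?' + cdn`, which produces a second `?` if the stream URL already carries a query string; `update_url(stream, query=cdn)` replaces the query component instead. A small sketch with placeholder values:

```python
from yt_dlp.utils import update_url

stream = 'https://example.com/master.m3u8?foo=1'  # placeholder URL
cdn = 'token=abc'                                 # placeholder token value

stream + '?' + cdn             # 'https://example.com/master.m3u8?foo=1?token=abc' (malformed)
update_url(stream, query=cdn)  # 'https://example.com/master.m3u8?token=abc'
```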

View File

@@ -114,6 +114,7 @@ INNERTUBE_CLIENTS = {
         },
         'INNERTUBE_CONTEXT_CLIENT_NAME': 67,
     },
+    # This client now requires sign-in for every video
     'web_creator': {
         'INNERTUBE_CONTEXT': {
             'client': {
@@ -153,6 +154,7 @@ INNERTUBE_CLIENTS = {
         'REQUIRE_JS_PLAYER': False,
         'REQUIRE_PO_TOKEN': True,
     },
+    # This client now requires sign-in for every video
     'android_creator': {
         'INNERTUBE_CONTEXT': {
             'client': {
@@ -200,21 +202,6 @@ INNERTUBE_CLIENTS = {
         'REQUIRE_JS_PLAYER': False,
         'PLAYER_PARAMS': '2AMB',
     },
-    # This client only has legacy formats and storyboards
-    'android_producer': {
-        'INNERTUBE_CONTEXT': {
-            'client': {
-                'clientName': 'ANDROID_PRODUCER',
-                'clientVersion': '0.111.1',
-                'androidSdkVersion': 30,
-                'userAgent': 'com.google.android.apps.youtube.producer/0.111.1 (Linux; U; Android 11) gzip',
-                'osName': 'Android',
-                'osVersion': '11',
-            },
-        },
-        'INNERTUBE_CONTEXT_CLIENT_NAME': 91,
-        'REQUIRE_JS_PLAYER': False,
-    },
     # iOS clients have HLS live streams. Setting device model to get 60fps formats.
     # See: https://github.com/TeamNewPipe/NewPipeExtractor/issues/680#issuecomment-1002724558
     'ios': {
@@ -247,6 +234,7 @@ INNERTUBE_CLIENTS = {
         'INNERTUBE_CONTEXT_CLIENT_NAME': 26,
         'REQUIRE_JS_PLAYER': False,
     },
+    # This client now requires sign-in for every video
     'ios_creator': {
         'INNERTUBE_CONTEXT': {
             'client': {
@@ -282,8 +270,9 @@ INNERTUBE_CLIENTS = {
         },
         'INNERTUBE_CONTEXT_CLIENT_NAME': 7,
     },
-    # This client can access age restricted videos (unless the uploader has disabled the 'allow embedding' option)
-    # See: https://github.com/zerodytrash/YouTube-Internal-Clients
+    # This client now requires sign-in for every video
+    # It was previously an age-gate workaround for videos that were `playable_in_embed`
+    # It may still be useful if signed into an EU account that is not age-verified
     'tv_embedded': {
         'INNERTUBE_CONTEXT': {
             'client': {
@@ -1525,6 +1514,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'heatmap': 'count:100',
                 'timestamp': 1401991663,
             },
+            'skip': 'Age-restricted; requires authentication',
         },
         {
             'note': 'Age-gate video with embed allowed in public site',
@@ -1555,6 +1545,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'comment_count': int,
                 'channel_is_verified': True,
             },
+            'skip': 'Age-restricted; requires authentication',
         },
         {
             'note': 'Age-gate video embedable only with clientScreen=EMBED',
@@ -1585,6 +1576,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'uploader_id': '@ProjektMelody',
                 'timestamp': 1577508724,
             },
+            'skip': 'Age-restricted; requires authentication',
         },
         {
             'note': 'Non-Agegated non-embeddable video',
@@ -2356,6 +2348,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'channel_is_verified': True,
                 'timestamp': 1405513526,
             },
+            'skip': 'Age-restricted; requires authentication',
         },
         {
             # restricted location, https://github.com/ytdl-org/youtube-dl/issues/28685
@@ -2726,6 +2719,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'timestamp': 1577508724,
             },
             'params': {'extractor_args': {'youtube': {'player_client': ['tv_embedded']}}, 'format': '251-drc'},
+            'skip': 'Age-restricted; requires authentication',
         },
         {
             'url': 'https://www.youtube.com/live/qVv6vCqciTM',
@@ -2865,6 +2859,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             microformats = traverse_obj(
                 prs, (..., 'microformat', 'playerMicroformatRenderer'),
                 expected_type=dict)
-            _, live_status, _, formats, _ = self._list_formats(video_id, microformats, video_details, prs, player_url)
-            is_live = live_status == 'is_live'
-            start_time = time.time()
+            with lock:
+                _, live_status, _, formats, _ = self._list_formats(video_id, microformats, video_details, prs, player_url)
+                is_live = live_status == 'is_live'
+                start_time = time.time()
@@ -2874,7 +2869,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             @returns (manifest_url, manifest_stream_number, is_live) or None
             """
             for retry in self.RetryManager(fatal=False):
-                with lock:
-                    refetch_manifest(format_id, delay)
+                refetch_manifest(format_id, delay)

                 f = next((f for f in formats if f['format_id'] == format_id), None)
@@ -2906,6 +2900,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         begin_index = 0
         download_start_time = ctx.get('start') or time.time()

+        section_start = ctx.get('section_start') or 0
+        section_end = ctx.get('section_end') or math.inf
+
+        self.write_debug(f'Selected section: {section_start} -> {section_end}')
+
         lack_early_segments = download_start_time - (live_start_time or download_start_time) > MAX_DURATION
         if lack_early_segments:
             self.report_warning(bug_reports_message(
@@ -2926,9 +2925,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                        or (mpd_url, stream_number, False))
                 if not refresh_sequence:
                     if expire_fast and not is_live:
-                        return False, last_seq
+                        return False
                     elif old_mpd_url == mpd_url:
-                        return True, last_seq
+                        return True
+
                 if manifestless_orig_fmt:
                     fmt_info = manifestless_orig_fmt
                 else:
@@ -2939,14 +2939,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                         fmts = None
                     if not fmts:
                         no_fragment_score += 2
-                        return False, last_seq
+                        return False
                     fmt_info = next(x for x in fmts if x['manifest_stream_number'] == stream_number)
                     fragments = fmt_info['fragments']
                     fragment_base_url = fmt_info['fragment_base_url']
                     assert fragment_base_url

-                    _last_seq = int(re.search(r'(?:/|^)sq/(\d+)', fragments[-1]['path']).group(1))
-                    return True, _last_seq
+                    return True

         self.write_debug(f'[{video_id}] Generating fragments for format {format_id}')
         while is_live:
@@ -2966,11 +2965,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                     last_segment_url = None
                     continue
                 else:
-                    should_continue, last_seq = _extract_sequence_from_mpd(True, no_fragment_score > 15)
+                    should_continue = _extract_sequence_from_mpd(True, no_fragment_score > 15)
                     no_fragment_score += 2
                     if not should_continue:
                         continue
+
+                last_fragment = fragments[-1]
+                last_seq = int(re.search(r'(?:/|^)sq/(\d+)', fragments[-1]['path']).group(1))
+
+                known_fragment = next(
+                    (fragment for fragment in fragments if f'sq/{known_idx}' in fragment['path']), None)
+                if known_fragment and known_fragment['end'] > section_end:
+                    break
+
                 if known_idx > last_seq:
                     last_segment_url = None
                     continue
@@ -2980,20 +2987,36 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 if begin_index < 0 and known_idx < 0:
                     # skip from the start when it's negative value
                     known_idx = last_seq + begin_index
                 if lack_early_segments:
-                    known_idx = max(known_idx, last_seq - int(MAX_DURATION // fragments[-1]['duration']))
+                    known_idx = max(known_idx, last_seq - int(MAX_DURATION // last_fragment['duration']))
+
+                fragment_count = last_seq - known_idx if section_end == math.inf else int(
+                    (section_end - section_start) // last_fragment['duration'])
+
                 try:
                     for idx in range(known_idx, last_seq):
                         # do not update sequence here or you'll get skipped some part of it
-                        should_continue, _ = _extract_sequence_from_mpd(False, False)
+                        should_continue = _extract_sequence_from_mpd(False, False)
                         if not should_continue:
                             known_idx = idx - 1
                             raise ExtractorError('breaking out of outer loop')
-                        last_segment_url = urljoin(fragment_base_url, f'sq/{idx}')
-                        yield {
-                            'url': last_segment_url,
-                            'fragment_count': last_seq,
-                        }
+
+                        frag_duration = last_fragment['duration']
+                        frag_start = last_fragment['start'] - (last_seq - idx) * frag_duration
+                        frag_end = frag_start + frag_duration
+                        if frag_start >= section_start and frag_end <= section_end:
+                            last_segment_url = urljoin(fragment_base_url, f'sq/{idx}')
+                            yield {
+                                'url': last_segment_url,
+                                'fragment_count': fragment_count,
+                                'duration': frag_duration,
+                                'start': frag_start,
+                                'end': frag_end,
+                            }
                     if known_idx == last_seq:
                         no_fragment_score += 5
                     else:
@@ -3953,26 +3976,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             else:
                 prs.append(pr)

-            # tv_embedded can work around age-gate and age-verification IF the video is embeddable
-            if self._is_agegated(pr) and variant != 'tv_embedded':
-                append_client(f'tv_embedded.{base_client}')
-
-            # Unauthenticated users will only get tv_embedded client formats if age-gated
-            if self._is_agegated(pr) and not self.is_authenticated:
-                self.to_screen(
-                    f'{video_id}: This video is age-restricted; some formats may be missing '
-                    f'without authentication. {self._login_hint()}', only_once=True)
-
             # EU countries require age-verification for accounts to access age-restricted videos
             # If account is not age-verified, _is_agegated() will be truthy for non-embedded clients
-            # If embedding is disabled for the video, _is_unplayable() will be truthy for tv_embedded
-            embedding_is_disabled = variant == 'tv_embedded' and self._is_unplayable(pr)
-            if self.is_authenticated and (self._is_agegated(pr) or embedding_is_disabled):
+            if self.is_authenticated and self._is_agegated(pr):
                 self.to_screen(
                     f'{video_id}: This video is age-restricted and YouTube is requiring '
                     'account age-verification; some formats may be missing', only_once=True)
                 # web_creator and mediaconnect can work around the age-verification requirement
-                # _producer, _testsuite, & _vr variants can also work around age-verification
-                # tv_embedded may(?) still work around age-verification if the video is embeddable
+                # _testsuite & _vr variants can also work around age-verification
                 append_client('web_creator', 'mediaconnect')

         prs.extend(deprioritized_prs)
@@ -4176,6 +4188,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 dct['downloader_options'] = {'http_chunk_size': CHUNK_SIZE}
             yield dct

+        if live_status == 'is_live' and self.get_param('download_ranges') and not self.get_param('live_from_start'):
+            self.report_warning('For YT livestreams, --download-sections is only supported with --live-from-start')
+
         needs_live_processing = self._needs_live_processing(live_status, duration)
         skip_bad_formats = 'incomplete' not in format_types
         if self._configuration_arg('include_incomplete_formats'):
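The section filtering above hinges on back-dating older sequence numbers from the newest fragment's absolute `start`; a standalone sketch of that arithmetic with illustrative numbers:

```python
import math

# Illustrative values, shaped like the fragment entries produced above
last_seq = 1000
last_fragment = {'start': 1_700_000_000.0, 'duration': 5.0}
section_start, section_end = 1_699_999_975.0, math.inf

for idx in range(995, last_seq):
    frag_duration = last_fragment['duration']
    # place fragment idx on the wall clock, counting back from the newest fragment
    frag_start = last_fragment['start'] - (last_seq - idx) * frag_duration
    frag_end = frag_start + frag_duration
    if frag_start >= section_start and frag_end <= section_end:
        print(f'sq/{idx}: {frag_start} -> {frag_end}')  # fragment falls inside the section
```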

View File

@@ -427,7 +427,14 @@ def create_parser():
     general.add_option(
         '--live-from-start',
         action='store_true', dest='live_from_start',
-        help='Download livestreams from the start. Currently only supported for YouTube (Experimental)')
+        help=('Download livestreams from the start. Currently only supported for YouTube (Experimental). '
+              'Time ranges can be specified using --download-sections to download only a part of the stream. '
+              'Negative values are allowed for specifying a relative previous time, using the # syntax '
+              'e.g. --download-sections "#-24hours - 0" (download last 24 hours), '
+              'e.g. --download-sections "#-1h - 30m" (download from 1 hour ago until the next 30 minutes), '
+              'e.g. --download-sections "#-3days - -2days" (download from 3 days ago until 2 days ago). '
+              'It is also possible to specify an exact unix timestamp range, using the * syntax, '
+              'e.g. --download-sections "*1672531200 - 1672549200" (download between those two timestamps)'))
     general.add_option(
         '--no-live-from-start',
         action='store_false', dest='live_from_start',
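For embedders, a hedged sketch of the same behavior through the Python API; the `#` syntax is resolved by the CLI's `validate_options`, so API users pass absolute timestamps to `download_range_func` themselves (the URL is a placeholder):

```python
import time

import yt_dlp
from yt_dlp.utils import download_range_func

now = time.time()
ydl_opts = {
    # Rough equivalent of: yt-dlp --live-from-start --download-sections "#-1hours - 0" URL
    'live_from_start': True,
    'download_ranges': download_range_func(None, [(now - 3600, now)]),
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=placeholder'])
```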

View File

@@ -1236,7 +1236,7 @@ def unified_strdate(date_str, day_first=True):
     return str(upload_date)


-def unified_timestamp(date_str, day_first=True):
+def unified_timestamp(date_str, day_first=True, with_milliseconds=False):
     if not isinstance(date_str, str):
         return None

@@ -1262,7 +1262,7 @@ def unified_timestamp(date_str, day_first=True):
     for expression in date_formats(day_first):
         with contextlib.suppress(ValueError):
             dt_ = dt.datetime.strptime(date_str, expression) - timezone + dt.timedelta(hours=pm_delta)
-            return calendar.timegm(dt_.timetuple())
+            return calendar.timegm(dt_.timetuple()) + (dt_.microsecond / 1e6 if with_milliseconds else 0)

     timetuple = email.utils.parsedate_tz(date_str)
     if timetuple:
@@ -2085,16 +2085,19 @@ def parse_duration(s):
     days, hours, mins, secs, ms = [None] * 5

     m = re.match(r'''(?x)
+            (?P<sign>[+-])?
             (?P<before_secs>
                 (?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?
             (?P<secs>(?(before_secs)[0-9]{1,2}|[0-9]+))
             (?P<ms>[.:][0-9]+)?Z?$
         ''', s)
     if m:
-        days, hours, mins, secs, ms = m.group('days', 'hours', 'mins', 'secs', 'ms')
+        sign, days, hours, mins, secs, ms = m.group('sign', 'days', 'hours', 'mins', 'secs', 'ms')
     else:
         m = re.match(
-            r'''(?ix)(?:P?
+            r'''(?ix)(?:
+                (?P<sign>[+-])?
+                P?
                 (?:
                     [0-9]+\s*y(?:ears?)?,?\s*
                 )?
@@ -2118,17 +2121,19 @@ def parse_duration(s):
                 (?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*s(?:ec(?:ond)?s?)?\s*
             )?Z?$''', s)
         if m:
-            days, hours, mins, secs, ms = m.groups()
+            sign, days, hours, mins, secs, ms = m.groups()
         else:
-            m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)Z?$', s)
+            m = re.match(r'(?i)(?P<sign>[+-])?(?:(?P<days>[0-9.]+)\s*(?:days?)|(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)Z?$', s)
             if m:
-                hours, mins = m.groups()
+                sign, days, hours, mins = m.groups()
             else:
                 return None

+    sign = -1 if sign == '-' else 1
+
     if ms:
         ms = ms.replace(':', '.')
-    return sum(float(part or 0) * mult for part, mult in (
+    return sign * sum(float(part or 0) * mult for part, mult in (
         (days, 86400), (hours, 3600), (mins, 60), (secs, 1), (ms, 1)))
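Taken together, the regex changes above give `parse_duration` signed results and a bare days form; a quick sketch assuming the patched function:

```python
from yt_dlp.utils import parse_duration

parse_duration('3days')     # 259200.0 (new: 'days' accepted by the fallback regex)
parse_duration('-24hours')  # -86400.0 (new: leading sign)
parse_duration('-2days')    # -172800.0
parse_duration('1:30')      # 90.0 (unchanged)
```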