Compare commits

No commits in common. "5ca095cbcde3e32642a4fe5b2d69e8e3c785a021" and "c1d71d0d9f41db5e4306c86af232f5f6220a130b" have entirely different histories.

21 changed files with 35 additions and 23 deletions

@ -217,7 +217,7 @@ After you have ensured this site is distributing its content legally, you can fo
1. Add an import in [`yt_dlp/extractor/_extractors.py`](yt_dlp/extractor/_extractors.py). Note that the class name must end with `IE`. 1. Add an import in [`yt_dlp/extractor/_extractors.py`](yt_dlp/extractor/_extractors.py). Note that the class name must end with `IE`.
1. Run `python test/test_download.py TestDownload.test_YourExtractor` (note that `YourExtractor` doesn't end with `IE`). This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, the tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in. You can also run all the tests in one go with `TestDownload.test_YourExtractor_all` 1. Run `python test/test_download.py TestDownload.test_YourExtractor` (note that `YourExtractor` doesn't end with `IE`). This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, the tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in. You can also run all the tests in one go with `TestDownload.test_YourExtractor_all`
1. Make sure you have atleast one test for your extractor. Even if all videos covered by the extractor are expected to be inaccessible for automated testing, tests should still be added with a `skip` parameter indicating why the particular test is disabled from running. 1. Make sure you have atleast one test for your extractor. Even if all videos covered by the extractor are expected to be inaccessible for automated testing, tests should still be added with a `skip` parameter indicating why the particular test is disabled from running.
1. Have a look at [`yt_dlp/extractor/common.py`](yt_dlp/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](yt_dlp/extractor/common.py#L119-L440). Add tests and code for as many as you want. 1. Have a look at [`yt_dlp/extractor/common.py`](yt_dlp/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](yt_dlp/extractor/common.py#L91-L426). Add tests and code for as many as you want.
1. Make sure your code follows [yt-dlp coding conventions](#yt-dlp-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart): 1. Make sure your code follows [yt-dlp coding conventions](#yt-dlp-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
$ flake8 yt_dlp/extractor/yourextractor.py $ flake8 yt_dlp/extractor/yourextractor.py
@ -251,7 +251,7 @@ Extractors are very fragile by nature since they depend on the layout of the sou
### Mandatory and optional metafields ### Mandatory and optional metafields
For extraction to work yt-dlp relies on metadata your extractor extracts and provides to yt-dlp expressed by an [information dictionary](yt_dlp/extractor/common.py#L119-L440) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by yt-dlp: For extraction to work yt-dlp relies on metadata your extractor extracts and provides to yt-dlp expressed by an [information dictionary](yt_dlp/extractor/common.py#L91-L426) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by yt-dlp:
- `id` (media identifier) - `id` (media identifier)
- `title` (media title) - `title` (media title)
@ -696,7 +696,7 @@ formats = [
### Use convenience conversion and parsing functions ### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`yt_dlp/utils/`](yt_dlp/utils/): `int_or_none`, `float_or_none`. Use them for string to number conversions as well. Wrap all extracted numeric data into safe functions from [`yt_dlp/utils.py`](yt_dlp/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
Use `url_or_none` for safe URL processing. Use `url_or_none` for safe URL processing.
@ -704,7 +704,7 @@ Use `traverse_obj` and `try_call` (superseeds `dict_get` and `try_get`) for safe
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction. Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
Explore [`yt_dlp/utils/`](yt_dlp/utils/) for more useful convenience functions. Explore [`yt_dlp/utils.py`](yt_dlp/utils.py) for more useful convenience functions.
#### Examples #### Examples

@ -1800,7 +1800,7 @@ The following extractors use this feature:
#### youtube #### youtube
* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes * `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively * `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
* `player_client`: Clients to extract video data from. The main clients are `web`, `android` and `ios` with variants `_music`, `_embedded`, `_embedscreen`, `_creator` (e.g. `web_embedded`); and `mweb`, `mweb_embedscreen` and `tv_embedded` (agegate bypass) with no variants. By default, `ios,android,web` is used, but `tv_embedded` and `creator` variants are added as required for age-gated videos. Similarly, the music variants are added for `music.youtube.com` urls. You can use `all` to use all the clients, and `default` for the default clients. * `player_client`: Clients to extract video data from. The main clients are `web`, `android` and `ios` with variants `_music`, `_embedded`, `_embedscreen`, `_creator` (e.g. `web_embedded`); and `mweb` and `tv_embedded` (agegate bypass) with no variants. By default, `ios,android,web` is used, but `tv_embedded` and `creator` variants are added as required for age-gated videos. Similarly, the music variants are added for `music.youtube.com` urls. You can use `all` to use all the clients, and `default` for the default clients.
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details * `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp. * `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side) * `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)

@ -260,7 +260,7 @@ class CommitRange:
AUTHOR_INDICATOR_RE = re.compile(r'Authored by:? ', re.IGNORECASE) AUTHOR_INDICATOR_RE = re.compile(r'Authored by:? ', re.IGNORECASE)
MESSAGE_RE = re.compile(r''' MESSAGE_RE = re.compile(r'''
(?:\[(?P<prefix>[^\]]+)\]\ )? (?:\[(?P<prefix>[^\]]+)\]\ )?
(?:(?P<sub_details>`?[\w.-]+`?): )? (?:(?P<sub_details>`?[^:`]+`?): )?
(?P<message>.+?) (?P<message>.+?)
(?:\ \((?P<issues>\#\d+(?:,\ \#\d+)*)\))? (?:\ \((?P<issues>\#\d+(?:,\ \#\d+)*)\))?
''', re.VERBOSE | re.DOTALL) ''', re.VERBOSE | re.DOTALL)
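
The hunk above changes only the character class inside the `sub_details` group. As a rough sketch of what that difference means in practice, the two sub-patterns can be compared in isolation with the standard `re` module. The names `LEFT_SUB_DETAILS`, `RIGHT_SUB_DETAILS` and `accepted_by`, as well as the sample strings, are made up for illustration; the rest of `MESSAGE_RE` is deliberately left out, so this does not reproduce how the full expression behaves.

```python
import re

# Sub-patterns copied from the two columns of the hunk.
# The constant names are illustrative and do not appear in the source.
LEFT_SUB_DETAILS = re.compile(r'`?[\w.-]+`?')
RIGHT_SUB_DETAILS = re.compile(r'`?[^:`]+`?')


def accepted_by(pattern, text):
    """Return True if the whole text would be accepted as a sub_details value."""
    return pattern.fullmatch(text) is not None


# Hypothetical sample values, chosen to show where the two classes differ.
samples = ['`traverse_obj`', 'web client', 'a:b']
results = {
    sample: (accepted_by(LEFT_SUB_DETAILS, sample), accepted_by(RIGHT_SUB_DETAILS, sample))
    for sample in samples
}

for sample, (left, right) in results.items():
    print(f'{sample!r}: left={left}, right={right}')
```

The `[\w.-]+` class rejects anything containing a space, while `[^:`]+` accepts any run of characters that contains neither a colon nor a backtick.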

@ -631,6 +631,7 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(test_dict['playlist'], 'funny videos') self.assertEqual(test_dict['playlist'], 'funny videos')
outtmpl_info = { outtmpl_info = {
'id': '1234',
'id': '1234', 'id': '1234',
'ext': 'mp4', 'ext': 'mp4',
'width': None, 'width': None,

@ -269,14 +269,14 @@ class TestNetworkingExceptions:
assert not response.closed assert not response.closed
def test_incomplete_read_error(self): def test_incomplete_read_error(self):
error = IncompleteRead(4, 3, cause='test') error = IncompleteRead(b'test', 3, cause='test')
assert isinstance(error, IncompleteRead) assert isinstance(error, IncompleteRead)
assert repr(error) == '<IncompleteRead: 4 bytes read, 3 more expected>' assert repr(error) == '<IncompleteRead: 4 bytes read, 3 more expected>'
assert str(error) == error.msg == '4 bytes read, 3 more expected' assert str(error) == error.msg == '4 bytes read, 3 more expected'
assert error.partial == 4 assert error.partial == b'test'
assert error.expected == 3 assert error.expected == 3
assert error.cause == 'test' assert error.cause == 'test'
error = IncompleteRead(3) error = IncompleteRead(b'aaa')
assert repr(error) == '<IncompleteRead: 3 bytes read>' assert repr(error) == '<IncompleteRead: 3 bytes read>'
assert str(error) == '3 bytes read' assert str(error) == '3 bytes read'

@ -239,9 +239,9 @@ class YoutubeDL:
'selected' (check selected formats), 'selected' (check selected formats),
or None (check only if requested by extractor) or None (check only if requested by extractor)
paths: Dictionary of output paths. The allowed keys are 'home' paths: Dictionary of output paths. The allowed keys are 'home'
'temp' and the keys of OUTTMPL_TYPES (in utils/_utils.py) 'temp' and the keys of OUTTMPL_TYPES (in utils.py)
outtmpl: Dictionary of templates for output names. Allowed keys outtmpl: Dictionary of templates for output names. Allowed keys
are 'default' and the keys of OUTTMPL_TYPES (in utils/_utils.py). are 'default' and the keys of OUTTMPL_TYPES (in utils.py).
For compatibility with youtube-dl, a single string can also be used For compatibility with youtube-dl, a single string can also be used
outtmpl_na_placeholder: Placeholder for unavailable meta fields. outtmpl_na_placeholder: Placeholder for unavailable meta fields.
restrictfilenames: Do not allow "&" and spaces in file names restrictfilenames: Do not allow "&" and spaces in file names
@ -422,7 +422,7 @@ class YoutubeDL:
asked whether to download the video. asked whether to download the video.
- Raise utils.DownloadCancelled(msg) to abort remaining - Raise utils.DownloadCancelled(msg) to abort remaining
downloads when a video is rejected. downloads when a video is rejected.
match_filter_func in utils/_utils.py is one example for this. match_filter_func in utils.py is one example for this.
color: A Dictionary with output stream names as keys color: A Dictionary with output stream names as keys
and their respective color policy as values. and their respective color policy as values.
Can also just be a single color policy, Can also just be a single color policy,

@ -1,7 +1,7 @@
# flake8: noqa: F405 # flake8: noqa: F405
from urllib import * # noqa: F403 from urllib import * # noqa: F403
del request # noqa: F821 del request
from . import request # noqa: F401 from . import request # noqa: F401
from ..compat_utils import passthrough_module from ..compat_utils import passthrough_module

@ -180,6 +180,7 @@ class ABCIViewIE(InfoExtractor):
_VALID_URL = r'https?://iview\.abc\.net\.au/(?:[^/]+/)*video/(?P<id>[^/?#]+)' _VALID_URL = r'https?://iview\.abc\.net\.au/(?:[^/]+/)*video/(?P<id>[^/?#]+)'
_GEO_COUNTRIES = ['AU'] _GEO_COUNTRIES = ['AU']
# ABC iview programs are normally available for 14 days only.
_TESTS = [{ _TESTS = [{
'url': 'https://iview.abc.net.au/show/gruen/series/11/video/LE1927H001S00', 'url': 'https://iview.abc.net.au/show/gruen/series/11/video/LE1927H001S00',
'md5': '67715ce3c78426b11ba167d875ac6abf', 'md5': '67715ce3c78426b11ba167d875ac6abf',

@ -169,7 +169,7 @@ class ArteTVIE(ArteTVBaseIE):
))) )))
short_label = traverse_obj(stream_version, 'shortLabel', expected_type=str, default='?') short_label = traverse_obj(stream_version, 'shortLabel', expected_type=str, default='?')
if 'HLS' in stream['protocol']: if stream['protocol'].startswith('HLS'):
fmts, subs = self._extract_m3u8_formats_and_subtitles( fmts, subs = self._extract_m3u8_formats_and_subtitles(
stream['url'], video_id=video_id, ext='mp4', m3u8_id=stream_version_code, fatal=False) stream['url'], video_id=video_id, ext='mp4', m3u8_id=stream_version_code, fatal=False)
for fmt in fmts: for fmt in fmts:
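
The two conditions in this hunk, `'HLS' in stream['protocol']` and `stream['protocol'].startswith('HLS')`, agree whenever the protocol string begins with `HLS` and differ only when `HLS` appears later in the string. A minimal sketch of that difference follows; the helper names and the protocol values are hypothetical and are not taken from the extractor or from any real API response.

```python
def is_hls_by_substring(protocol):
    """Condition from one column of the hunk: 'HLS' anywhere in the string."""
    return 'HLS' in protocol


def is_hls_by_prefix(protocol):
    """Condition from the other column: the string must begin with 'HLS'."""
    return protocol.startswith('HLS')


# Hypothetical protocol labels, used only to illustrate the two checks.
protocols = ['HLS', 'HLS_NG', 'API_HLS_NG', 'HTTPS']
comparison = {p: (is_hls_by_substring(p), is_hls_by_prefix(p)) for p in protocols}

for protocol, (by_substring, by_prefix) in comparison.items():
    print(f'{protocol}: substring={by_substring}, prefix={by_prefix}')
```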

@ -197,6 +197,10 @@ class IGNVideoIE(IGNBaseIE):
'thumbnail': 'https://sm.ign.com/ign_me/video/h/how-hitman/how-hitman-aims-to-be-different-than-every-other-s_8z14.jpg', 'thumbnail': 'https://sm.ign.com/ign_me/video/h/how-hitman/how-hitman-aims-to-be-different-than-every-other-s_8z14.jpg',
'duration': 298, 'duration': 298,
'tags': 'count:13', 'tags': 'count:13',
'display_id': '112203',
'thumbnail': 'https://sm.ign.com/ign_me/video/h/how-hitman/how-hitman-aims-to-be-different-than-every-other-s_8z14.jpg',
'duration': 298,
'tags': 'count:13',
}, },
'expected_warnings': ['HTTP Error 400: Bad Request'], 'expected_warnings': ['HTTP Error 400: Bad Request'],
}, { }, {

@ -127,6 +127,7 @@ class NebulaIE(NebulaBaseIE):
'channel_id': 'lindsayellis', 'channel_id': 'lindsayellis',
'uploader': 'Lindsay Ellis', 'uploader': 'Lindsay Ellis',
'uploader_id': 'lindsayellis', 'uploader_id': 'lindsayellis',
'timestamp': 1533009600,
'uploader_url': 'https://nebula.tv/lindsayellis', 'uploader_url': 'https://nebula.tv/lindsayellis',
'series': 'Lindsay Ellis', 'series': 'Lindsay Ellis',
'display_id': 'that-time-disney-remade-beauty-and-the-beast', 'display_id': 'that-time-disney-remade-beauty-and-the-beast',

@ -146,6 +146,7 @@ class PlayVidsIE(PeekVidsBaseIE):
'uploader': 'Brazzers', 'uploader': 'Brazzers',
'age_limit': 18, 'age_limit': 18,
'view_count': int, 'view_count': int,
'age_limit': 18,
'categories': list, 'categories': list,
'tags': list, 'tags': list,
}, },

@ -82,7 +82,7 @@ class RadioFranceBaseIE(InfoExtractor):
def _extract_data_from_webpage(self, webpage, display_id, key): def _extract_data_from_webpage(self, webpage, display_id, key):
return traverse_obj(self._search_json( return traverse_obj(self._search_json(
r'\bconst\s+data\s*=', webpage, key, display_id, r'\bconst\s+data\s*=', webpage, key, display_id,
contains_pattern=r'\[\{(?s:.+)\}\]', transform_source=js_to_json), contains_pattern=r'(\[\{.*?\}\]);', transform_source=js_to_json),
(..., 'data', key, {dict}), get_all=False) or {} (..., 'data', key, {dict}), get_all=False) or {}
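
The two `contains_pattern` values in this hunk differ in two ways: one uses a greedy `.+` with the inline `(?s:...)` flag so that it can span newlines, the other uses a lazy `.*?` without that flag and requires a trailing `;`. The sketch below applies both patterns directly with `re.search` to made-up input strings. It does not go through `_search_json`, whose handling of `contains_pattern` is not shown in the diff, so it illustrates only the regular expressions themselves.

```python
import re

# Patterns copied from the two columns of the hunk; the names are illustrative.
LEFT_PATTERN = re.compile(r'\[\{(?s:.+)\}\]')
RIGHT_PATTERN = re.compile(r'(\[\{.*?\}\]);')

# Hypothetical page fragments.
multi_line = 'const data = [{"a":\n1}];'
two_arrays = 'x = [{"a": 1}]; y = [{"b": 2}];'

# A newline inside the array: only the pattern with the (?s:...) group matches.
left_multi = LEFT_PATTERN.search(multi_line)
right_multi = RIGHT_PATTERN.search(multi_line)

# Two arrays on one line: the greedy pattern runs to the last "}]",
# the lazy one stops at the first "}];".
left_two = LEFT_PATTERN.search(two_arrays).group(0)
right_two = RIGHT_PATTERN.search(two_arrays).group(1)

print(left_multi is not None, right_multi is not None)
print(left_two)
print(right_two)
```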

@ -239,10 +239,10 @@ class RCSEmbedsIE(RCSBaseIE):
} }
}, { }, {
'url': 'https://video.gazzanet.gazzetta.it/video-embed/gazzanet-mo05-0000260789', 'url': 'https://video.gazzanet.gazzetta.it/video-embed/gazzanet-mo05-0000260789',
'only_matching': True 'match_only': True
}, { }, {
'url': 'https://video.gazzetta.it/video-embed/49612410-00ca-11eb-bcd8-30d4253e0140', 'url': 'https://video.gazzetta.it/video-embed/49612410-00ca-11eb-bcd8-30d4253e0140',
'only_matching': True 'match_only': True
}] }]
_WEBPAGE_TESTS = [{ _WEBPAGE_TESTS = [{
'url': 'https://www.iodonna.it/video-iodonna/personaggi-video/monica-bellucci-piu-del-lavoro-oggi-per-me-sono-importanti-lamicizia-e-la-famiglia/', 'url': 'https://www.iodonna.it/video-iodonna/personaggi-video/monica-bellucci-piu-del-lavoro-oggi-per-me-sono-importanti-lamicizia-e-la-famiglia/',
@ -325,7 +325,7 @@ class RCSIE(RCSBaseIE):
} }
}, { }, {
'url': 'https://video.corriere.it/video-360/metro-copenaghen-tutta-italiana/a248a7f0-e2db-11e9-9830-af2de6b1f945', 'url': 'https://video.corriere.it/video-360/metro-copenaghen-tutta-italiana/a248a7f0-e2db-11e9-9830-af2de6b1f945',
'only_matching': True 'match_only': True
}] }]

@ -40,6 +40,7 @@ class RokfinIE(InfoExtractor):
'channel': 'Jimmy Dore', 'channel': 'Jimmy Dore',
'channel_id': 65429, 'channel_id': 65429,
'channel_url': 'https://rokfin.com/TheJimmyDoreShow', 'channel_url': 'https://rokfin.com/TheJimmyDoreShow',
'duration': 213.0,
'availability': 'public', 'availability': 'public',
'live_status': 'not_live', 'live_status': 'not_live',
'dislike_count': int, 'dislike_count': int,

@ -78,6 +78,7 @@ class S4CSeriesIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '864982911', 'id': '864982911',
'title': 'Iaith ar Daith', 'title': 'Iaith ar Daith',
'description': 'md5:e878ebf660dce89bd2ef521d7ce06397'
}, },
}, { }, {
'url': 'https://www.s4c.cymru/clic/series/866852587', 'url': 'https://www.s4c.cymru/clic/series/866852587',
@ -85,6 +86,7 @@ class S4CSeriesIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '866852587', 'id': '866852587',
'title': 'FFIT Cymru', 'title': 'FFIT Cymru',
'description': 'md5:abcb3c129cb68dbb6cd304fd33b07e96'
}, },
}] }]

@ -76,6 +76,7 @@ class SovietsClosetIE(SovietsClosetBaseIE):
'title': 'Arma 3 - Zeus Games #5', 'title': 'Arma 3 - Zeus Games #5',
'uploader': 'SovietWomble', 'uploader': 'SovietWomble',
'thumbnail': r're:^https?://.*\.b-cdn\.net/c0e5e76f-3a93-40b4-bf01-12343c2eec5d/thumbnail\.jpg$', 'thumbnail': r're:^https?://.*\.b-cdn\.net/c0e5e76f-3a93-40b4-bf01-12343c2eec5d/thumbnail\.jpg$',
'uploader': 'SovietWomble',
'creator': 'SovietWomble', 'creator': 'SovietWomble',
'release_timestamp': 1461157200, 'release_timestamp': 1461157200,
'release_date': '20160420', 'release_date': '20160420',

@ -902,7 +902,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
e.g. 'streamed 6 days ago', '5 seconds ago (edited)', 'updated today', '8 yr ago' e.g. 'streamed 6 days ago', '5 seconds ago (edited)', 'updated today', '8 yr ago'
""" """
# XXX: this could be moved to a general function in utils/_utils.py # XXX: this could be moved to a general function in utils.py
# The relative time text strings are roughly the same as what # The relative time text strings are roughly the same as what
# Javascript's Intl.RelativeTimeFormat function generates. # Javascript's Intl.RelativeTimeFormat function generates.
# See: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/RelativeTimeFormat # See: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/RelativeTimeFormat

@ -1,4 +1,4 @@
# flake8: noqa: F401 # flake8: noqa: 401
from .common import ( from .common import (
HEADRequest, HEADRequest,
PUTRequest, PUTRequest,

@ -337,7 +337,7 @@ def handle_sslerror(e: ssl.SSLError):
def handle_response_read_exceptions(e): def handle_response_read_exceptions(e):
if isinstance(e, http.client.IncompleteRead): if isinstance(e, http.client.IncompleteRead):
raise IncompleteRead(partial=len(e.partial), cause=e, expected=e.expected) from e raise IncompleteRead(partial=e.partial, cause=e, expected=e.expected) from e
elif isinstance(e, ssl.SSLError): elif isinstance(e, ssl.SSLError):
handle_sslerror(e) handle_sslerror(e)
elif isinstance(e, (OSError, EOFError, http.client.HTTPException, *CONTENT_DECODE_ERRORS)): elif isinstance(e, (OSError, EOFError, http.client.HTTPException, *CONTENT_DECODE_ERRORS)):

@ -75,10 +75,10 @@ class HTTPError(RequestError):
class IncompleteRead(TransportError): class IncompleteRead(TransportError):
def __init__(self, partial: int, expected: int = None, **kwargs): def __init__(self, partial, expected=None, **kwargs):
self.partial = partial self.partial = partial
self.expected = expected self.expected = expected
msg = f'{partial} bytes read' msg = f'{len(partial)} bytes read'
if expected is not None: if expected is not None:
msg += f', {expected} more expected' msg += f', {expected} more expected'
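
The hunk is cut off here, so the following is only a sketch of the variant in which `partial` holds the bytes that were read and the message reports `len(partial)`. The `TransportError` base class below is a simplified stand-in written for this example, and the final `super().__init__(...)` call is an assumption about how the truncated method ends; neither is taken from the diff.

```python
class TransportError(Exception):
    """Simplified stand-in for the real base class, which the diff does not show."""

    def __init__(self, msg=None, cause=None):
        super().__init__(msg)
        self.msg = msg
        self.cause = cause


class IncompleteRead(TransportError):
    def __init__(self, partial, expected=None, **kwargs):
        self.partial = partial      # the bytes received so far
        self.expected = expected    # number of bytes still missing, if known
        msg = f'{len(partial)} bytes read'
        if expected is not None:
            msg += f', {expected} more expected'
        # Assumed ending: hand the message and remaining keyword arguments to the base class.
        super().__init__(msg=msg, **kwargs)


error = IncompleteRead(b'test', 3, cause='test')
print(str(error))
print(error.partial)

short_error = IncompleteRead(b'aaa')
print(str(short_error))
```

Under these assumptions the constructor calls mirror the ones in the `test_incomplete_read_error` hunk: the caller passes the partial payload itself rather than its length, and the length is computed when the message is built.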