Compare commits

...

6 Commits

Author SHA1 Message Date
Simon Sawicki
e2320b8877
Remove -bb from yt-dlp.cmd and yt-dlp.sh 2023-12-29 18:47:45 +01:00
Simon Sawicki
1688392137
Update collaborators 2023-12-29 17:07:01 +01:00
Simon Sawicki
b410bdfb5b
Merge branch 'master' into cleanup/2023-12 2023-12-29 15:17:14 +01:00
Simon Sawicki
225cf2b830
Fix 2d1d683a54
Authored by: Grub4K
2023-12-26 20:07:09 +01:00
Simon Sawicki
2d1d683a54
[devscripts] run_tests: Create Python script (#8720)
Authored by: Grub4K
2023-12-26 18:30:04 +01:00
Simon Sawicki
65de7d204c
Update to ytdl-commit-be008e6 (#8836)
- [utils] Make restricted filenames ignore some Unicode categories (by dirkf)
- [ie/telewebion] Fix extraction (by Grub4K)
- [ie/imgur] Overhaul extractor (by bashonly, Grub4K)
- [ie/EpidemicSound] Add extractor (by Grub4K)

Authored by: bashonly, dirkf, Grub4K

Co-authored-by: bashonly <bashonly@protonmail.com>
2023-12-26 01:40:24 +01:00
15 changed files with 626 additions and 175 deletions

View File

@ -38,18 +38,14 @@ jobs:
os: [ubuntu-latest]
# CPython 3.8 is in quick-test
python-version: ['3.9', '3.10', '3.11', '3.12', pypy-3.8, pypy-3.10]
run-tests-ext: [sh]
include:
# atleast one of each CPython/PyPy tests must be in windows
- os: windows-latest
python-version: '3.8'
run-tests-ext: bat
- os: windows-latest
python-version: '3.12'
run-tests-ext: bat
- os: windows-latest
python-version: pypy-3.9
run-tests-ext: bat
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
@ -62,4 +58,4 @@ jobs:
continue-on-error: False
run: |
python3 -m yt_dlp -v || true # Print debug head
./devscripts/run_tests.${{ matrix.run-tests-ext }} core
python3 ./devscripts/run_tests.py core

View File

@ -18,7 +18,7 @@ jobs:
run: pip install pytest -r requirements.txt
- name: Run tests
continue-on-error: true
run: ./devscripts/run_tests.sh download
run: python3 ./devscripts/run_tests.py download
full:
name: Full Download Tests
@ -29,15 +29,12 @@ jobs:
matrix:
os: [ubuntu-latest]
python-version: ['3.10', '3.11', '3.12', pypy-3.8, pypy-3.10]
run-tests-ext: [sh]
include:
# atleast one of each CPython/PyPy tests must be in windows
- os: windows-latest
python-version: '3.8'
run-tests-ext: bat
- os: windows-latest
python-version: pypy-3.9
run-tests-ext: bat
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
@ -48,4 +45,4 @@ jobs:
run: pip install pytest -r requirements.txt
- name: Run tests
continue-on-error: true
run: ./devscripts/run_tests.${{ matrix.run-tests-ext }} download
run: python3 ./devscripts/run_tests.py download

View File

@ -19,7 +19,7 @@ jobs:
- name: Run tests
run: |
python3 -m yt_dlp -v || true
./devscripts/run_tests.sh core
python3 ./devscripts/run_tests.py core
flake8:
name: Linter
if: "!contains(github.event.head_commit.message, 'ci skip all')"

View File

@ -140,12 +140,9 @@ To run yt-dlp as a developer, you don't need to build anything either. Simply ex
python -m yt_dlp
To run the test, simply invoke your favorite test runner, or execute a test file directly; any of the following work:
To run all the available core tests, use:
python -m unittest discover
python test/test_download.py
nosetests
pytest
python devscripts/run_tests.py
See item 6 of [new extractor tutorial](#adding-support-for-a-new-site) for how to run extractor specific test cases.
@ -187,15 +184,21 @@ After you have ensured this site is distributing its content legally, you can fo
'url': 'https://yourextractor.com/watch/42',
'md5': 'TODO: md5 sum of the first 10241 bytes of the video file (use --test)',
'info_dict': {
# For videos, only the 'id' and 'ext' fields are required to RUN the test:
'id': '42',
'ext': 'mp4',
'title': 'Video title goes here',
'thumbnail': r're:^https?://.*\.jpg$',
# TODO more properties, either as:
# * A value
# * MD5 checksum; start the string with md5:
# * A regular expression; start the string with re:
# * Any Python type, e.g. int or float
# Then if the test run fails, it will output the missing/incorrect fields.
# Properties can be added as:
# * A value, e.g.
# 'title': 'Video title goes here',
# * MD5 checksum; start the string with 'md5:', e.g.
# 'description': 'md5:098f6bcd4621d373cade4e832627b4f6',
# * A regular expression; start the string with 're:', e.g.
# 'thumbnail': r're:^https?://.*\.jpg$',
# * A count of elements in a list; start the string with 'count:', e.g.
# 'tags': 'count:10',
# * Any Python type, e.g.
# 'view_count': int,
}
}]
@ -215,8 +218,8 @@ After you have ensured this site is distributing its content legally, you can fo
}
```
1. Add an import in [`yt_dlp/extractor/_extractors.py`](yt_dlp/extractor/_extractors.py). Note that the class name must end with `IE`.
1. Run `python test/test_download.py TestDownload.test_YourExtractor` (note that `YourExtractor` doesn't end with `IE`). This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, the tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in. You can also run all the tests in one go with `TestDownload.test_YourExtractor_all`
1. Make sure you have atleast one test for your extractor. Even if all videos covered by the extractor are expected to be inaccessible for automated testing, tests should still be added with a `skip` parameter indicating why the particular test is disabled from running.
1. Run `python devscripts/run_tests.py YourExtractor`. This *may fail* at first, but you can continually re-run it until you're done. Upon failure, it will output the missing fields and/or correct values which you can copy. If you decide to add more than one test, the tests will then be named `YourExtractor`, `YourExtractor_1`, `YourExtractor_2`, etc. Note that tests with an `only_matching` key in the test's dict are not included in the count. You can also run all the tests in one go with `YourExtractor_all`
1. Make sure you have at least one test for your extractor. Even if all videos covered by the extractor are expected to be inaccessible for automated testing, tests should still be added with a `skip` parameter indicating why the particular test is disabled from running.
1. Have a look at [`yt_dlp/extractor/common.py`](yt_dlp/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](yt_dlp/extractor/common.py#L119-L440). Add tests and code for as many as you want.
1. Make sure your code follows [yt-dlp coding conventions](#yt-dlp-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):

View File

@ -29,6 +29,7 @@ You can also find lists of all [contributors of yt-dlp](CONTRIBUTORS) and [autho
[![gh-sponsor](https://img.shields.io/badge/_-Github-white.svg?logo=github&labelColor=555555&style=for-the-badge)](https://github.com/sponsors/coletdjnz)
* Improved plugin architecture
* Rewrote the networking infrastructure, implemented support for `requests`
* YouTube improvements including: age-gate bypass, private playlists, multiple-clients (to avoid throttling) and a lot of under-the-hood improvements
* Added support for new websites YoutubeWebArchive, MainStreaming, PRX, nzherald, Mediaklikk, StarTV etc
* Improved/fixed support for Patreon, panopto, gfycat, itv, pbs, SouthParkDE etc
@ -46,16 +47,17 @@ You can also find lists of all [contributors of yt-dlp](CONTRIBUTORS) and [autho
## [bashonly](https://github.com/bashonly)
* `--update-to`, automated release, nightly builds
* `--cookies-from-browser` support for Firefox containers
* Added support for new websites Genius, Kick, NBCStations, Triller, VideoKen etc
* Improved/fixed support for Anvato, Brightcove, Instagram, ParamountPlus, Reddit, SlidesLive, TikTok, Twitter, Vimeo etc
* `--update-to`, self-updater rewrite, automated/nightly/master releases
* `--cookies-from-browser` support for Firefox containers, external downloader cookie handling overhaul
* Added support for new websites like Dacast, Kick, NBCStations, Triller, VideoKen, Weverse, WrestleUniverse etc
* Improved/fixed support for Anvato, Brightcove, Reddit, SlidesLive, TikTok, Twitter, Vimeo etc
## [Grub4K](https://github.com/Grub4K)
[![ko-fi](https://img.shields.io/badge/_-Ko--fi-red.svg?logo=kofi&labelColor=555555&style=for-the-badge)](https://ko-fi.com/Grub4K) [![gh-sponsor](https://img.shields.io/badge/_-Github-white.svg?logo=github&labelColor=555555&style=for-the-badge)](https://github.com/sponsors/Grub4K)
[![gh-sponsor](https://img.shields.io/badge/_-Github-white.svg?logo=github&labelColor=555555&style=for-the-badge)](https://github.com/sponsors/Grub4K) [![ko-fi](https://img.shields.io/badge/_-Ko--fi-red.svg?logo=kofi&labelColor=555555&style=for-the-badge)](https://ko-fi.com/Grub4K)
* `--update-to`, automated release, nightly builds
* Rework internals like `traverse_obj`, various core refactors and bugs fixes
* Helped fix crunchyroll, Twitter, wrestleuniverse, wistia, slideslive etc
* `--update-to`, self-updater rewrite, automated/nightly/master releases
* Reworked internals like `traverse_obj`, various core refactors and bugs fixes
* Implemented proper progress reporting for parallel downloads
* Improved/fixed/added Bundestag, crunchyroll, pr0gramm, Twitter, WrestleUniverse etc

View File

@ -1,17 +1,4 @@
@setlocal
@echo off
cd /d %~dp0..
if ["%~1"]==[""] (
set "test_set="test""
) else if ["%~1"]==["core"] (
set "test_set="-m not download""
) else if ["%~1"]==["download"] (
set "test_set="-m "download""
) else (
echo.Invalid test type "%~1". Use "core" ^| "download"
exit /b 1
)
set PYTHONWARNINGS=error
pytest %test_set%
>&2 echo run_tests.bat is deprecated. Please use `devscripts/run_tests.py` instead
python %~dp0run_tests.py %~1

71
devscripts/run_tests.py Executable file
View File

@ -0,0 +1,71 @@
#!/usr/bin/env python3
import argparse
import functools
import os
import re
import subprocess
import sys
from pathlib import Path
fix_test_name = functools.partial(re.compile(r'IE(_all|_\d+)?$').sub, r'\1')
def parse_args():
parser = argparse.ArgumentParser(description='Run selected yt-dlp tests')
parser.add_argument(
'test', help='a extractor tests, or one of "core" or "download"', nargs='*')
parser.add_argument(
'-k', help='run a test matching EXPRESSION. Same as "pytest -k"', metavar='EXPRESSION')
return parser.parse_args()
def run_tests(*tests, pattern=None, ci=False):
run_core = 'core' in tests or (not pattern and not tests)
run_download = 'download' in tests
tests = list(map(fix_test_name, tests))
arguments = ['pytest', '-Werror', '--tb=short']
if ci:
arguments.append('--color=yes')
if run_core:
arguments.extend(['-m', 'not download'])
elif run_download:
arguments.extend(['-m', 'download'])
elif pattern:
arguments.extend(['-k', pattern])
else:
arguments.extend(
f'test/test_download.py::TestDownload::test_{test}' for test in tests)
print(f'Running {arguments}', flush=True)
try:
return subprocess.call(arguments)
except FileNotFoundError:
pass
arguments = [sys.executable, '-Werror', '-m', 'unittest']
if run_core:
print('"pytest" needs to be installed to run core tests', file=sys.stderr, flush=True)
return 1
elif run_download:
arguments.append('test.test_download')
elif pattern:
arguments.extend(['-k', pattern])
else:
arguments.extend(
f'test.test_download.TestDownload.test_{test}' for test in tests)
print(f'Running {arguments}', flush=True)
return subprocess.call(arguments)
if __name__ == '__main__':
try:
args = parse_args()
os.chdir(Path(__file__).parent.parent)
sys.exit(run_tests(*args.test, pattern=args.k, ci=bool(os.getenv('CI'))))
except KeyboardInterrupt:
pass

View File

@ -1,14 +1,4 @@
#!/usr/bin/env sh
if [ -z "$1" ]; then
test_set='test'
elif [ "$1" = 'core' ]; then
test_set="-m not download"
elif [ "$1" = 'download' ]; then
test_set="-m download"
else
echo 'Invalid test type "'"$1"'". Use "core" | "download"'
exit 1
fi
python3 -bb -Werror -m pytest "$test_set"
>&2 echo 'run_tests.sh is deprecated. Please use `devscripts/run_tests.py` instead'
python3 devscripts/run_tests.py "$1"

View File

@ -1 +1 @@
@py -bb -Werror -Xdev "%~dp0yt_dlp\__main__.py" %*
@py -Werror -Xdev "%~dp0yt_dlp\__main__.py" %*

View File

@ -1,2 +1,2 @@
#!/usr/bin/env sh
exec "${PYTHON:-python3}" -bb -Werror -Xdev "$(dirname "$(realpath "$0")")/yt_dlp/__main__.py" "$@"
exec "${PYTHON:-python3}" -Werror -Xdev "$(dirname "$(realpath "$0")")/yt_dlp/__main__.py" "$@"

View File

@ -548,6 +548,7 @@ from .epicon import (
EpiconIE,
EpiconSeriesIE,
)
from .epidemicsound import EpidemicSoundIE
from .eplus import EplusIbIE
from .epoch import EpochIE
from .eporner import EpornerIE

View File

@ -0,0 +1,107 @@
from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
orderedSet,
parse_iso8601,
parse_qs,
parse_resolution,
str_or_none,
traverse_obj,
url_or_none,
)
class EpidemicSoundIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?epidemicsound\.com/track/(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://www.epidemicsound.com/track/yFfQVRpSPz/',
'md5': 'd98ff2ddb49e8acab9716541cbc9dfac',
'info_dict': {
'id': '45014',
'display_id': 'yFfQVRpSPz',
'ext': 'mp3',
'title': 'Door Knock Door 1',
'alt_title': 'Door Knock Door 1',
'tags': ['foley', 'door', 'knock', 'glass', 'window', 'glass door knock'],
'categories': ['Misc. Door'],
'duration': 1,
'thumbnail': 'https://cdn.epidemicsound.com/curation-assets/commercial-release-cover-images/default-sfx/3000x3000.jpg',
'timestamp': 1415320353,
'upload_date': '20141107',
},
}, {
'url': 'https://www.epidemicsound.com/track/mj8GTTwsZd/',
'md5': 'c82b745890f9baf18dc2f8d568ee3830',
'info_dict': {
'id': '148700',
'display_id': 'mj8GTTwsZd',
'ext': 'mp3',
'title': 'Noplace',
'tags': ['liquid drum n bass', 'energetic'],
'categories': ['drum and bass'],
'duration': 237,
'timestamp': 1694426482,
'thumbnail': 'https://cdn.epidemicsound.com/curation-assets/commercial-release-cover-images/11138/3000x3000.jpg',
'upload_date': '20230911',
'release_timestamp': 1700535606,
'release_date': '20231121',
},
}]
@staticmethod
def _epidemic_parse_thumbnail(url: str):
if not url_or_none(url):
return None
return {
'url': url,
**(traverse_obj(url, ({parse_qs}, {
'width': ('width', 0, {int_or_none}),
'height': ('height', 0, {int_or_none}),
})) or parse_resolution(url)),
}
@staticmethod
def _epidemic_fmt_or_none(f):
if not f.get('format'):
f['format'] = f.get('format_id')
elif not f.get('format_id'):
f['format_id'] = f['format']
if not f['url'] or not f['format']:
return None
if f.get('format_note'):
f['format_note'] = f'track ID {f["format_note"]}'
if f['format'] != 'full':
f['preference'] = -2
return f
def _real_extract(self, url):
video_id = self._match_id(url)
json_data = self._download_json(f'https://www.epidemicsound.com/json/track/{video_id}', video_id)
thumbnails = traverse_obj(json_data, [('imageUrl', 'cover')])
thumb_base_url = traverse_obj(json_data, ('coverArt', 'baseUrl', {url_or_none}))
if thumb_base_url:
thumbnails.extend(traverse_obj(json_data, (
'coverArt', 'sizes', ..., {thumb_base_url.__add__})))
return traverse_obj(json_data, {
'id': ('id', {str_or_none}),
'display_id': ('publicSlug', {str}),
'title': ('title', {str}),
'alt_title': ('oldTitle', {str}),
'duration': ('length', {float_or_none}),
'timestamp': ('added', {parse_iso8601}),
'release_timestamp': ('releaseDate', {parse_iso8601}),
'categories': ('genres', ..., 'tag', {str}),
'tags': ('metadataTags', ..., {str}),
'age_limit': ('isExplicit', {lambda b: 18 if b else None}),
'thumbnails': ({lambda _: thumbnails}, {orderedSet}, ..., {self._epidemic_parse_thumbnail}),
'formats': ('stems', {dict.items}, ..., {
'format': (0, {str_or_none}),
'format_note': (1, 's3TrackId', {str_or_none}),
'format_id': (1, 'stemType', {str}),
'url': (1, 'lqMp3Url', {url_or_none}),
}, {self._epidemic_fmt_or_none}),
})

View File

@ -1,99 +1,243 @@
import functools
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
determine_ext,
float_or_none,
int_or_none,
js_to_json,
mimetype2ext,
ExtractorError,
parse_iso8601,
str_or_none,
strip_or_none,
traverse_obj,
url_or_none,
)
class ImgurIE(InfoExtractor):
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?!(?:a|gallery|(?:t(?:opic)?|r)/[^/]+)/)(?P<id>[a-zA-Z0-9]+)'
class ImgurBaseIE(InfoExtractor):
_CLIENT_ID = '546c25a59c58ad7'
@classmethod
def _imgur_result(cls, item_id):
return cls.url_result(f'https://imgur.com/{item_id}', ImgurIE, item_id)
def _call_api(self, endpoint, video_id, **kwargs):
return self._download_json(
f'https://api.imgur.com/post/v1/{endpoint}/{video_id}?client_id={self._CLIENT_ID}&include=media,account',
video_id, **kwargs)
@staticmethod
def get_description(s):
if 'Discover the magic of the internet at Imgur' in s:
return None
return s or None
class ImgurIE(ImgurBaseIE):
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?!(?:a|gallery|t|topic|r)/)(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'https://i.imgur.com/A61SaA1.gifv',
'url': 'https://imgur.com/A61SaA1',
'info_dict': {
'id': 'A61SaA1',
'ext': 'mp4',
'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
'title': 'MRW gifv is up and running without any bugs',
'timestamp': 1416446068,
'upload_date': '20141120',
'dislike_count': int,
'comment_count': int,
'release_timestamp': 1416446068,
'release_date': '20141120',
'like_count': int,
'thumbnail': 'https://i.imgur.com/A61SaA1h.jpg',
},
}, {
'url': 'https://imgur.com/A61SaA1',
'url': 'https://i.imgur.com/A61SaA1.gifv',
'only_matching': True,
}, {
'url': 'https://i.imgur.com/crGpqCV.mp4',
'only_matching': True,
}, {
# no title
'url': 'https://i.imgur.com/jxBXAMC.gifv',
'only_matching': True,
'info_dict': {
'id': 'jxBXAMC',
'ext': 'mp4',
'title': 'Fahaka puffer feeding',
'timestamp': 1533835503,
'upload_date': '20180809',
'release_date': '20180809',
'like_count': int,
'duration': 30.0,
'comment_count': int,
'release_timestamp': 1533835503,
'thumbnail': 'https://i.imgur.com/jxBXAMCh.jpg',
'dislike_count': int,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._call_api('media', video_id)
if not traverse_obj(data, ('media', 0, (
('type', {lambda t: t == 'video' or None}),
('metadata', 'is_animated'))), get_all=False):
raise ExtractorError(f'{video_id} is not a video or animated image', expected=True)
webpage = self._download_webpage(
'https://i.imgur.com/{id}.gifv'.format(id=video_id), video_id)
f'https://i.imgur.com/{video_id}.gifv', video_id, fatal=False) or ''
formats = []
width = int_or_none(self._og_search_property(
'video:width', webpage, default=None))
height = int_or_none(self._og_search_property(
'video:height', webpage, default=None))
media_fmt = traverse_obj(data, ('media', 0, {
'url': ('url', {url_or_none}),
'ext': ('ext', {str}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
'filesize': ('size', {int_or_none}),
'acodec': ('metadata', 'has_sound', {lambda b: None if b else 'none'}),
}))
media_url = media_fmt.get('url')
if media_url:
if not media_fmt.get('ext'):
media_fmt['ext'] = mimetype2ext(traverse_obj(
data, ('media', 0, 'mime_type'))) or determine_ext(media_url)
if traverse_obj(data, ('media', 0, 'type')) == 'image':
media_fmt['acodec'] = 'none'
media_fmt.setdefault('preference', -10)
formats.append(media_fmt)
video_elements = self._search_regex(
r'(?s)<div class="video-elements">(.*?)</div>',
webpage, 'video elements', default=None)
if not video_elements:
raise ExtractorError(
'No sources found for video %s. Maybe an image?' % video_id,
expected=True)
formats = []
for m in re.finditer(r'<source\s+src="(?P<src>[^"]+)"\s+type="(?P<type>[^"]+)"', video_elements):
formats.append({
'format_id': m.group('type').partition('/')[2],
'url': self._proto_relative_url(m.group('src')),
'ext': mimetype2ext(m.group('type')),
'width': width,
'height': height,
'http_headers': {
'User-Agent': 'yt-dlp (like wget)',
},
})
if video_elements:
def og_get_size(media_type):
return {
p: int_or_none(self._og_search_property(f'{media_type}:{p}', webpage, default=None))
for p in ('width', 'height')
}
gif_json = self._search_regex(
r'(?s)var\s+videoItem\s*=\s*(\{.*?\})',
webpage, 'GIF code', fatal=False)
if gif_json:
gifd = self._parse_json(
gif_json, video_id, transform_source=js_to_json)
formats.append({
'format_id': 'gif',
'preference': -10, # gifs are worse than videos
'width': width,
'height': height,
'ext': 'gif',
'acodec': 'none',
'vcodec': 'gif',
'container': 'gif',
'url': self._proto_relative_url(gifd['gifUrl']),
'filesize': gifd.get('size'),
'http_headers': {
'User-Agent': 'yt-dlp (like wget)',
},
size = og_get_size('video')
if not any(size.values()):
size = og_get_size('image')
formats = traverse_obj(
re.finditer(r'<source\s+src="(?P<src>[^"]+)"\s+type="(?P<type>[^"]+)"', video_elements),
(..., {
'format_id': ('type', {lambda s: s.partition('/')[2]}),
'url': ('src', {self._proto_relative_url}),
'ext': ('type', {mimetype2ext}),
}))
for f in formats:
f.update(size)
# We can get the original gif format from the webpage as well
gif_json = traverse_obj(self._search_json(
r'var\s+videoItem\s*=', webpage, 'GIF info', video_id,
transform_source=js_to_json, fatal=False), {
'url': ('gifUrl', {self._proto_relative_url}),
'filesize': ('size', {int_or_none}),
})
if gif_json:
gif_json.update(size)
gif_json.update({
'format_id': 'gif',
'preference': -10, # gifs < videos
'ext': 'gif',
'acodec': 'none',
'vcodec': 'gif',
'container': 'gif',
})
formats.append(gif_json)
search = functools.partial(self._html_search_meta, html=webpage, default=None)
twitter_fmt = {
'format_id': 'twitter',
'url': url_or_none(search('twitter:player:stream')),
'ext': mimetype2ext(search('twitter:player:stream:content_type')),
'width': int_or_none(search('twitter:width')),
'height': int_or_none(search('twitter:height')),
}
if twitter_fmt['url']:
formats.append(twitter_fmt)
if not formats:
self.raise_no_formats(
f'No sources found for video {video_id}. Maybe a plain image?', expected=True)
self._remove_duplicate_formats(formats)
return {
'title': self._og_search_title(webpage, default=None),
'description': self.get_description(self._og_search_description(webpage, default='')),
**traverse_obj(data, {
'uploader_id': ('account_id', {lambda a: str(a) if int_or_none(a) else None}),
'uploader': ('account', 'username', {lambda x: strip_or_none(x) or None}),
'uploader_url': ('account', 'avatar_url', {url_or_none}),
'like_count': ('upvote_count', {int_or_none}),
'dislike_count': ('downvote_count', {int_or_none}),
'comment_count': ('comment_count', {int_or_none}),
'age_limit': ('is_mature', {lambda x: 18 if x else None}),
'timestamp': (('updated_at', 'created_at'), {parse_iso8601}),
'release_timestamp': ('created_at', {parse_iso8601}),
}, get_all=False),
**traverse_obj(data, ('media', 0, 'metadata', {
'title': ('title', {lambda x: strip_or_none(x) or None}),
'description': ('description', {self.get_description}),
'duration': ('duration', {float_or_none}),
'timestamp': (('updated_at', 'created_at'), {parse_iso8601}),
'release_timestamp': ('created_at', {parse_iso8601}),
}), get_all=False),
'id': video_id,
'formats': formats,
'title': self._og_search_title(webpage, default=video_id),
'thumbnail': url_or_none(search('thumbnailUrl')),
}
class ImgurGalleryIE(InfoExtractor):
class ImgurGalleryBaseIE(ImgurBaseIE):
_GALLERY = True
def _real_extract(self, url):
gallery_id = self._match_id(url)
data = self._call_api('albums', gallery_id, fatal=False, expected_status=404)
info = traverse_obj(data, {
'title': ('title', {lambda x: strip_or_none(x) or None}),
'description': ('description', {self.get_description}),
})
if traverse_obj(data, 'is_album'):
def yield_media_ids():
for m_id in traverse_obj(data, (
'media', lambda _, v: v.get('type') == 'video' or v['metadata']['is_animated'],
'id', {lambda x: str_or_none(x) or None})):
yield m_id
# if a gallery with exactly one video, apply album metadata to video
media_id = (
self._GALLERY
and traverse_obj(data, ('image_count', {lambda c: c == 1}))
and next(yield_media_ids(), None))
if not media_id:
result = self.playlist_result(
map(self._imgur_result, yield_media_ids()), gallery_id)
result.update(info)
return result
gallery_id = media_id
result = self._imgur_result(gallery_id)
info['_type'] = 'url_transparent'
result.update(info)
return result
class ImgurGalleryIE(ImgurGalleryBaseIE):
IE_NAME = 'imgur:gallery'
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:gallery|(?:t(?:opic)?|r)/[^/]+)/(?P<id>[a-zA-Z0-9]+)'
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:gallery|(?:t(?:opic)?|r)/[^/?#]+)/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'http://imgur.com/gallery/Q95ko',
@ -102,49 +246,121 @@ class ImgurGalleryIE(InfoExtractor):
'title': 'Adding faces make every GIF better',
},
'playlist_count': 25,
'skip': 'Zoinks! You\'ve taken a wrong turn.',
}, {
# TODO: static images - replace with animated/video gallery
'url': 'http://imgur.com/topic/Aww/ll5Vk',
'only_matching': True,
}, {
'url': 'https://imgur.com/gallery/YcAQlkx',
'add_ies': ['Imgur'],
'info_dict': {
'id': 'YcAQlkx',
'ext': 'mp4',
'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....',
}
'timestamp': 1358554297,
'upload_date': '20130119',
'uploader_id': '1648642',
'uploader': 'wittyusernamehere',
'release_timestamp': 1358554297,
'thumbnail': 'https://i.imgur.com/YcAQlkxh.jpg',
'release_date': '20130119',
'uploader_url': 'https://i.imgur.com/u3R4I2S_d.png?maxwidth=290&fidelity=grand',
'comment_count': int,
'dislike_count': int,
'like_count': int,
},
}, {
# TODO: static image - replace with animated/video gallery
'url': 'http://imgur.com/topic/Funny/N8rOudd',
'only_matching': True,
}, {
'url': 'http://imgur.com/r/aww/VQcQPhM',
'only_matching': True,
'add_ies': ['Imgur'],
'info_dict': {
'id': 'VQcQPhM',
'ext': 'mp4',
'title': 'The boss is here',
'timestamp': 1476494751,
'upload_date': '20161015',
'uploader_id': '19138530',
'uploader': 'thematrixcam',
'comment_count': int,
'dislike_count': int,
'uploader_url': 'https://i.imgur.com/qCjr5Pi_d.png?maxwidth=290&fidelity=grand',
'release_timestamp': 1476494751,
'like_count': int,
'release_date': '20161015',
'thumbnail': 'https://i.imgur.com/VQcQPhMh.jpg',
},
},
# from https://github.com/ytdl-org/youtube-dl/pull/16674
{
'url': 'https://imgur.com/t/unmuted/6lAn9VQ',
'info_dict': {
'id': '6lAn9VQ',
'title': 'Penguins !',
},
'playlist_count': 3,
}, {
'url': 'https://imgur.com/t/unmuted/kx2uD3C',
'add_ies': ['Imgur'],
'info_dict': {
'id': 'ZVMv45i',
'ext': 'mp4',
'title': 'Intruder',
'timestamp': 1528129683,
'upload_date': '20180604',
'release_timestamp': 1528129683,
'release_date': '20180604',
'like_count': int,
'dislike_count': int,
'comment_count': int,
'duration': 30.03,
'thumbnail': 'https://i.imgur.com/ZVMv45ih.jpg',
},
}, {
'url': 'https://imgur.com/t/unmuted/wXSK0YH',
'add_ies': ['Imgur'],
'info_dict': {
'id': 'JCAP4io',
'ext': 'mp4',
'title': 're:I got the blues$',
'description': 'Lukas vocal stylings.\n\nFP edit: dont encourage me. Ill never stop posting Luka and friends.',
'timestamp': 1527809525,
'upload_date': '20180531',
'like_count': int,
'dislike_count': int,
'duration': 30.03,
'comment_count': int,
'release_timestamp': 1527809525,
'thumbnail': 'https://i.imgur.com/JCAP4ioh.jpg',
'release_date': '20180531',
},
}]
def _real_extract(self, url):
gallery_id = self._match_id(url)
data = self._download_json(
'https://imgur.com/gallery/%s.json' % gallery_id,
gallery_id)['data']['image']
if data.get('is_album'):
entries = [
self.url_result('http://imgur.com/%s' % image['hash'], ImgurIE.ie_key(), image['hash'])
for image in data['album_images']['images'] if image.get('hash')]
return self.playlist_result(entries, gallery_id, data.get('title'), data.get('description'))
return self.url_result('http://imgur.com/%s' % gallery_id, ImgurIE.ie_key(), gallery_id)
class ImgurAlbumIE(ImgurGalleryIE): # XXX: Do not subclass from concrete IE
class ImgurAlbumIE(ImgurGalleryBaseIE):
IE_NAME = 'imgur:album'
_VALID_URL = r'https?://(?:i\.)?imgur\.com/a/(?P<id>[a-zA-Z0-9]+)'
_GALLERY = False
_TESTS = [{
# TODO: only static images - replace with animated/video gallery
'url': 'http://imgur.com/a/j6Orj',
'only_matching': True,
},
# from https://github.com/ytdl-org/youtube-dl/pull/21693
{
'url': 'https://imgur.com/a/iX265HX',
'info_dict': {
'id': 'j6Orj',
'title': 'A Literary Analysis of "Star Wars: The Force Awakens"',
'id': 'iX265HX',
'title': 'enen-no-shouboutai'
},
'playlist_count': 12,
'playlist_count': 2,
}, {
'url': 'https://imgur.com/a/8pih2Ed',
'info_dict': {
'id': '8pih2Ed'
},
'playlist_mincount': 1,
}]

View File

@ -1,52 +1,133 @@
from __future__ import annotations
import json
from functools import partial
from textwrap import dedent
from .common import InfoExtractor
from ..utils import ExtractorError, format_field, int_or_none, parse_iso8601
from ..utils.traversal import traverse_obj
def _fmt_url(url):
return partial(format_field, template=url, default=None)
class TelewebionIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?telewebion\.com/#!/episode/(?P<id>\d+)'
_TEST = {
'url': 'http://www.telewebion.com/#!/episode/1263668/',
_VALID_URL = r'https?://(?:www\.)?telewebion\.com/episode/(?P<id>(?:0x[a-fA-F\d]+|\d+))'
_TESTS = [{
'url': 'http://www.telewebion.com/episode/0x1b3139c/',
'info_dict': {
'id': '1263668',
'id': '0x1b3139c',
'ext': 'mp4',
'title': 'قرعه\u200cکشی لیگ قهرمانان اروپا',
'thumbnail': r're:^https?://.*\.jpg',
'title': 'قرعه‌کشی لیگ قهرمانان اروپا',
'series': '+ فوتبال',
'series_id': '0x1b2505c',
'channel': 'شبکه 3',
'channel_id': '0x1b1a761',
'channel_url': 'https://telewebion.com/live/tv3',
'timestamp': 1425522414,
'upload_date': '20150305',
'release_timestamp': 1425517020,
'release_date': '20150305',
'duration': 420,
'view_count': int,
'tags': ['ورزشی', 'لیگ اروپا', 'اروپا'],
'thumbnail': 'https://static.telewebion.com/episodeImages/YjFhM2MxMDBkMDNiZTU0MjE5YjQ3ZDY0Mjk1ZDE0ZmUwZWU3OTE3OWRmMDAyODNhNzNkNjdmMWMzMWIyM2NmMA/default',
},
'params': {
# m3u8 download
'skip_download': True,
'skip_download': 'm3u8',
}, {
'url': 'https://telewebion.com/episode/162175536',
'info_dict': {
'id': '0x9aa9a30',
'ext': 'mp4',
'title': 'کارما یعنی این !',
'series': 'پاورقی',
'series_id': '0x29a7426',
'channel': 'شبکه 2',
'channel_id': '0x1b1a719',
'channel_url': 'https://telewebion.com/live/tv2',
'timestamp': 1699979968,
'upload_date': '20231114',
'release_timestamp': 1699991638,
'release_date': '20231114',
'duration': 78,
'view_count': int,
'tags': ['کلیپ های منتخب', ' کلیپ طنز ', ' کلیپ سیاست ', 'پاورقی', 'ویژه فلسطین'],
'thumbnail': 'https://static.telewebion.com/episodeImages/871e9455-7567-49a5-9648-34c22c197f5f/default',
},
}
'skip_download': 'm3u8',
}]
def _call_graphql_api(
self, operation, video_id, query,
variables: dict[str, tuple[str, str]] | None = None,
note='Downloading GraphQL JSON metadata',
):
parameters = ''
if variables:
parameters = ', '.join(f'${name}: {type_}' for name, (type_, _) in variables.items())
parameters = f'({parameters})'
result = self._download_json('https://graph.telewebion.com/graphql', video_id, note, data=json.dumps({
'operationName': operation,
'query': f'query {operation}{parameters} @cacheControl(maxAge: 60) {{{query}\n}}\n',
'variables': {name: value for name, (_, value) in (variables or {}).items()}
}, separators=(',', ':')).encode(), headers={
'Content-Type': 'application/json',
'Accept': 'application/json',
})
if not result or traverse_obj(result, 'errors'):
message = ', '.join(traverse_obj(result, ('errors', ..., 'message', {str})))
raise ExtractorError(message or 'Unknown GraphQL API error')
return result['data']
def _real_extract(self, url):
video_id = self._match_id(url)
if not video_id.startswith('0x'):
video_id = hex(int(video_id))
secure_token = self._download_webpage(
'http://m.s2.telewebion.com/op/op?action=getSecurityToken', video_id)
episode_details = self._download_json(
'http://m.s2.telewebion.com/op/op', video_id,
query={'action': 'getEpisodeDetails', 'episode_id': video_id})
episode_data = self._call_graphql_api('getEpisodeDetail', video_id, dedent('''
queryEpisode(filter: {EpisodeID: $EpisodeId}, first: 1) {
title
program {
ProgramID
title
}
image
view_count
duration
started_at
created_at
channel {
ChannelID
name
descriptor
}
tags {
name
}
}
'''), {'EpisodeId': ('[ID!]', video_id)})
m3u8_url = 'http://m.s1.telewebion.com/smil/%s.m3u8?filepath=%s&m3u8=1&secure_token=%s' % (
video_id, episode_details['file_path'], secure_token)
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', m3u8_id='hls')
picture_paths = [
episode_details.get('picture_path'),
episode_details.get('large_picture_path'),
]
thumbnails = [{
'url': picture_path,
'preference': idx,
} for idx, picture_path in enumerate(picture_paths) if picture_path is not None]
return {
'id': video_id,
'title': episode_details['title'],
'formats': formats,
'thumbnails': thumbnails,
'view_count': episode_details.get('view_count'),
}
info_dict = traverse_obj(episode_data, ('queryEpisode', 0, {
'title': ('title', {str}),
'view_count': ('view_count', {int_or_none}),
'duration': ('duration', {int_or_none}),
'tags': ('tags', ..., 'name', {str}),
'release_timestamp': ('started_at', {parse_iso8601}),
'timestamp': ('created_at', {parse_iso8601}),
'series': ('program', 'title', {str}),
'series_id': ('program', 'ProgramID', {str}),
'channel': ('channel', 'name', {str}),
'channel_id': ('channel', 'ChannelID', {str}),
'channel_url': ('channel', 'descriptor', {_fmt_url('https://telewebion.com/live/%s')}),
'thumbnail': ('image', {_fmt_url('https://static.telewebion.com/episodeImages/%s/default')}),
'formats': (
'channel', 'descriptor', {str},
{_fmt_url(f'https://cdna.telewebion.com/%s/episode/{video_id}/playlist.m3u8')},
{partial(self._extract_m3u8_formats, video_id=video_id, ext='mp4', m3u8_id='hls')}),
}))
info_dict['id'] = video_id
return info_dict

View File

@ -636,7 +636,7 @@ def sanitize_filename(s, restricted=False, is_id=NO_DEFAULT):
elif char in '\\/|*<>':
return '\0_'
if restricted and (char in '!&\'()[]{}$;`^,#' or char.isspace() or ord(char) > 127):
return '\0_'
return '' if unicodedata.category(char)[0] in 'CM' else '\0_'
return char
# Replace look-alike Unicode glyphs