Compare commits

...

19 Commits

Author SHA1 Message Date
pukkandan
e9ce4e9250
[extractor/foxnews] Add FoxNewsVideo extractor
Closes #5133
2022-11-07 03:00:01 +05:30
pukkandan
5da08bde9e
[extractor/vlive] Extract release_timestamp
Closes #5424
2022-11-07 02:49:03 +05:30
pukkandan
ff48fc04d0
[update] Use error code 100 for update errors
This error code was previously used for
"Exiting to finish update", but is no longer used

Closes #5198
2022-11-07 02:40:36 +05:30
pukkandan
46d09f8707
[cleanup] Lint and misc cleanup
2022-11-07 02:32:36 +05:30
pukkandan
db4678e448
Update to ytdl-commit-de39d128
[extractor/ceskatelevize] Back-port extractor from yt-dlp
de39d1281c

Closes #5361, Closes #4634, Closes #5210
2022-11-07 02:18:30 +05:30
zulaport
a349d4d641
[extractor/stripchat] Fix hostname for HLS stream (#5445)
Closes #5227 
Authored by: zulaport
2022-11-07 02:09:09 +05:30
Matthew
ac8e69dd32
Do not backport Python 3.10 SSL configuration for LibreSSL (#5464)
Until further investigation.

Fixes regression in 5b9f253fa0

Authored by: coletdjnz
2022-11-06 20:30:55 +00:00
bashonly
96b9e9cf62
[extractor/telegram] Add playlist support and more metadata (#5358)
Authored by: bashonly, bsun0000
2022-11-06 19:05:09 +00:00
Jeff Huffman
cb1553e966
[extractor/crunchyroll] Beta is now the only layout (#5294)
Closes #5292
Authored by: tejing1
2022-11-07 00:18:55 +05:30
Alex Karabanov
0d2a0ecac3
[extractor/listennotes] Add extractor (#5310)
Closes #5262
Authored by: lksj, pukkandan
2022-11-07 00:00:59 +05:30
changren-wcr
c94df4d19d
[extractor/qingting] Add extractor (#5329)
Closes #5323
Authored by: changren-wcr, bashonly
2022-11-06 23:41:53 +05:30
lauren
728f4b5c2e
[extractor/tvp] Update extractors (#5346)
Closes #5328
Authored by: selfisekai
2022-11-06 23:40:06 +05:30
Kevin Wood
8c188d5d09
[extractor/redgifs] Refresh auth token for 401 (#5352)
Closes #5351
Authored by: endotronic, pukkandan
2022-11-06 23:15:45 +05:30
Bruno Guerreiro
e14ea7fbd9
[extractor/youtube] Update piped instances (#5441)
Closes #5286
Authored by: Generator
2022-11-06 23:12:23 +05:30
Richard Gibson
7053aa3a48
[extractor/epoch] Support videos without data-trailer (#5387)
Closes #5359
Authored by: gibson042, pukkandan
2022-11-06 22:53:16 +05:30
HobbyistDev
049565df2e
[extractor/swearnet] Add extractor (#5371)
Authored by: HobbyistDev
2022-11-06 22:41:33 +05:30
CrankDatSouljaBoy
cc1d3bf96b
[extractor/deuxm] Add extractors (#5388)
Authored by: CrankDatSouljaBoy
2022-11-06 22:21:15 +05:30
Matthew
5b9f253fa0
Backport SSL configuration from Python 3.10 (#5437)
Partial fix for https://github.com/yt-dlp/yt-dlp/pull/5294#issuecomment-1289363572, https://github.com/yt-dlp/yt-dlp/issues/4627

Authored by: coletdjnz
2022-11-06 22:07:23 +05:30
nixxo
d715b0e413
[extractor/skyit] Fix extractors (#5442)
Closes #5392
Authored by: nixxo
2022-11-06 21:51:12 +05:30
37 changed files with 1184 additions and 1116 deletions

View File

@@ -12,7 +12,7 @@
[![License: Unlicense](https://img.shields.io/badge/-Unlicense-blue.svg?style=for-the-badge)](LICENSE "License")
[![CI Status](https://img.shields.io/github/workflow/status/yt-dlp/yt-dlp/Core%20Tests/master?label=Tests&style=for-the-badge)](https://github.com/yt-dlp/yt-dlp/actions "CI Status")
[![Commits](https://img.shields.io/github/commit-activity/m/yt-dlp/yt-dlp?label=commits&style=for-the-badge)](https://github.com/yt-dlp/yt-dlp/commits "Commit History")
[![Last Commit](https://img.shields.io/github/last-commit/yt-dlp/yt-dlp/master?label=&style=for-the-badge)](https://github.com/yt-dlp/yt-dlp/commits "Commit History")
[![Last Commit](https://img.shields.io/github/last-commit/yt-dlp/yt-dlp/master?label=&style=for-the-badge&display_timestamp=committer)](https://github.com/yt-dlp/yt-dlp/commits "Commit History")
</div>
<!-- MANPAGE: END EXCLUDED SECTION -->
@@ -1642,9 +1642,9 @@ # MODIFYING METADATA
`--replace-in-metadata FIELDS REGEX REPLACE` is used to replace text in any metadata field using [python regular expression](https://docs.python.org/3/library/re.html#regular-expression-syntax). [Backreferences](https://docs.python.org/3/library/re.html?highlight=backreferences#re.sub) can be used in the replace string for advanced use.
The general syntax of `--parse-metadata FROM:TO` is to give the name of a field or an [output template](#output-template) to extract data from, and the format to interpret it as, separated by a colon `:`. Either a [python regular expression](https://docs.python.org/3/library/re.html#regular-expression-syntax) with named capture groups or a similar syntax to the [output template](#output-template) (only `%(field)s` formatting is supported) can be used for `TO`. The option can be used multiple times to parse and modify various fields.
The general syntax of `--parse-metadata FROM:TO` is to give the name of a field or an [output template](#output-template) to extract data from, and the format to interpret it as, separated by a colon `:`. Either a [python regular expression](https://docs.python.org/3/library/re.html#regular-expression-syntax) with named capture groups, a single field name, or a similar syntax to the [output template](#output-template) (only `%(field)s` formatting is supported) can be used for `TO`. The option can be used multiple times to parse and modify various fields.
Note that any field created by this can be used in the [output template](#output-template) and will also affect the media file's metadata added when using `--embed-metadata`.
Note that these options preserve their relative order, allowing replacements to be made in parsed fields and vice versa. Also, any field thus created can be used in the [output template](#output-template) and will also affect the media file's metadata added when using `--embed-metadata`.
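
The same parsing is available when embedding yt-dlp; a minimal sketch, assuming the `MetadataParser` postprocessor key and its `Actions` enum behave as in the current codebase:

```python
import yt_dlp
from yt_dlp.postprocessor.metadataparser import MetadataParserPP

ydl_opts = {
    'postprocessors': [{
        'key': 'MetadataParser',
        'when': 'pre_process',  # run before the download, like --parse-metadata
        'actions': [
            # Equivalent of --parse-metadata "title:%(artist)s - %(track)s"
            (MetadataParserPP.Actions.INTERPRET, 'title', '%(artist)s - %(track)s'),
        ],
    }],
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```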
This option also has a few special uses:
@@ -1733,11 +1733,7 @@ #### funimation
* `language`: Audio languages to extract, e.g. `funimation:language=english,japanese`
* `version`: The video version to extract - `uncut` or `simulcast`
#### crunchyroll
* `language`: Audio languages to extract, e.g. `crunchyroll:language=jaJp`
* `hardsub`: Which hard-sub versions to extract, e.g. `crunchyroll:hardsub=None,enUS`
#### crunchyrollbeta
#### crunchyrollbeta (Crunchyroll)
* `format`: Which stream type(s) to extract (default: `adaptive_hls`). Potentially useful values include `adaptive_hls`, `adaptive_dash`, `vo_adaptive_hls`, `vo_adaptive_dash`, `download_hls`, `download_dash`, `multitrack_adaptive_hls_v2`
* `hardsub`: Preference order for which hardsub versions to extract, or `all` (default: `None` = no hardsubs), e.g. `crunchyrollbeta:hardsub=en-US,None`
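
For embedders, the same extractor arguments can be passed programmatically; a sketch, assuming the `extractor_args` option keeps its usual dict-of-lists layout:

```python
import yt_dlp

# Equivalent of --extractor-args "crunchyrollbeta:hardsub=en-US"
ydl_opts = {'extractor_args': {'crunchyrollbeta': {'hardsub': ['en-US']}}}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.crunchyroll.com/watch/GY2P1Q98Y/to-the-future'])
```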

View File

@@ -23,7 +23,7 @@ # Supported sites
- **9now.com.au**
- **abc.net.au**
- **abc.net.au:iview**
- **abc.net.au:iview:showseries**
- **abcnews**
- **abcnews:video**
- **abcotvs**: ABC Owned Television Stations
@@ -124,8 +124,8 @@ # Supported sites
- **bbc**: [<abbr title="netrc machine"><em>bbc</em></abbr>] BBC
- **bbc.co.uk**: [<abbr title="netrc machine"><em>bbc</em></abbr>] BBC iPlayer
- **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:episodes**
- **bbc.co.uk:iplayer:group**
- **bbc.co.uk:playlist**
- **BBVTV**: [<abbr title="netrc machine"><em>bbvtv</em></abbr>]
- **BBVTVLive**: [<abbr title="netrc machine"><em>bbvtv</em></abbr>]
@@ -274,7 +274,7 @@ # Supported sites
- **crunchyroll**: [<abbr title="netrc machine"><em>crunchyroll</em></abbr>]
- **crunchyroll:beta**: [<abbr title="netrc machine"><em>crunchyroll</em></abbr>]
- **crunchyroll:playlist**: [<abbr title="netrc machine"><em>crunchyroll</em></abbr>]
- **crunchyroll:playlist:beta**: [<abbr title="netrc machine"><em>crunchyroll</em></abbr>]
- **CSpan**: C-SPAN
- **CSpanCongress**
- **CtsNews**: 華視新聞
@@ -483,7 +483,7 @@ # Supported sites
- **Golem**
- **goodgame:stream**
- **google:podcasts**
- **google:podcasts:feed**
- **GoogleDrive**
- **GoogleDrive:Folder**
- **GoPlay**: [<abbr title="netrc machine"><em>goplay</em></abbr>]
@@ -618,7 +618,7 @@ # Supported sites
- **kuwo:singer**: 酷我音乐 - 歌手
- **kuwo:song**: 酷我音乐
- **la7.it**
- **la7.it:pod:episode**
- **la7.it:podcast**
- **laola1tv**
- **laola1tv:embed**
@@ -652,7 +652,7 @@ # Supported sites
- **LineLiveChannel**
- **LinkedIn**: [<abbr title="netrc machine"><em>linkedin</em></abbr>]
- **linkedin:learning**: [<abbr title="netrc machine"><em>linkedin</em></abbr>]
- **linkedin:learning:course**: [<abbr title="netrc machine"><em>linkedin</em></abbr>]
- **LinuxAcademy**: [<abbr title="netrc machine"><em>linuxacademy</em></abbr>]
- **Liputan6**
- **LiTV**
@@ -673,7 +673,7 @@ # Supported sites
- **MagentaMusik360**
- **mailru**: Видео@Mail.Ru
- **mailru:music**: Музыка@Mail.Ru
- **mailru:music:search**: Музыка@Mail.Ru
- **MainStreaming**: MainStreaming Player
- **MallTV**
- **mangomolo:live**
@@ -718,7 +718,7 @@ # Supported sites
- **microsoftstream**: Microsoft Stream
- **mildom**: Record ongoing live by specific user in Mildom
- **mildom:clip**: Clip in Mildom
- **mildom:user:vod**: Download all VODs from specific user in Mildom
- **mildom:vod**: VOD in Mildom
- **minds**
- **minds:channel**
@@ -803,7 +803,7 @@ # Supported sites
- **navernow**
- **NBA**
- **nba:watch**
- **nba:watch:collection**
- **NBAChannel**
- **NBAEmbed**
- **NBAWatchEmbed**
@@ -817,7 +817,7 @@ # Supported sites
- **NBCStations**
- **ndr**: NDR.de - Norddeutscher Rundfunk
- **ndr:embed**
- **ndr:embed:base**
- **NDTV**
- **Nebula**: [<abbr title="netrc machine"><em>watchnebula</em></abbr>]
- **nebula:channel**: [<abbr title="netrc machine"><em>watchnebula</em></abbr>]
@@ -869,7 +869,7 @@ # Supported sites
- **niconico:tag**: NicoNico video tag URLs
- **NiconicoUser**
- **nicovideo:search**: Nico video search; "nicosearch:" prefix
- **nicovideo:search:date**: Nico video search, newest first; "nicosearchdate:" prefix
- **nicovideo:search_url**: Nico video search URLs
- **Nintendo**
- **Nitter**
@@ -892,7 +892,7 @@ # Supported sites
- **npo**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **npo.nl:live**
- **npo.nl:radio**
- **npo.nl:radio:fragment**
- **Npr**
- **NRK**
- **NRKPlaylist**
@@ -933,7 +933,7 @@ # Supported sites
- **openrec:capture**
- **openrec:movie**
- **OraTV**
- **orf:fm4:story**: fm4.orf.at stories
- **orf:iptv**: iptv.ORF.at
- **orf:radio**
- **orf:tvthek**: ORF TVthek
@@ -981,7 +981,7 @@ # Supported sites
- **Pinterest**
- **PinterestCollection**
- **pixiv:sketch**
- **pixiv:sketch:user**
- **Pladform**
- **PlanetMarathi**
- **Platzi**: [<abbr title="netrc machine"><em>platzi</em></abbr>]
@@ -1010,7 +1010,7 @@ # Supported sites
- **polskieradio:kierowcow**
- **polskieradio:player**
- **polskieradio:podcast**
- **polskieradio:podcast:list**
- **PolskieRadioCategory**
- **Popcorntimes**
- **PopcornTV**
@@ -1122,7 +1122,7 @@ # Supported sites
- **rtl.nl**: rtl.nl and rtlxl.nl
- **rtl2**
- **rtl2:you**
- **rtl2:you:series**
- **RTLLuLive**
- **RTLLuRadio**
- **RTNews**
@@ -1198,9 +1198,9 @@ # Supported sites
- **Skeb**
- **sky.it**
- **sky:news**
- **sky:news:story**
- **sky:sports**
- **sky:sports:news**
- **skyacademy.it**
- **SkylineWebcams**
- **skynewsarabia:article**
@@ -1289,7 +1289,7 @@ # Supported sites
- **Teachable**: [<abbr title="netrc machine"><em>teachable</em></abbr>]
- **TeachableCourse**: [<abbr title="netrc machine"><em>teachable</em></abbr>]
- **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos
- **TeachingChannel**
- **Teamcoco**
- **TeamTreeHouse**: [<abbr title="netrc machine"><em>teamtreehouse</em></abbr>]
@@ -1614,12 +1614,12 @@ # Supported sites
- **XXXYMovies**
- **Yahoo**: Yahoo screen and movies
- **yahoo:gyao**
- **yahoo:gyao:player**
- **yahoo:japannews**: Yahoo! Japan News
- **YandexDisk**
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:artist:albums**: Яндекс.Музыка - Артист - Альбомы
- **yandexmusic:artist:tracks**: Яндекс.Музыка - Артист - Треки
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
- **yandexmusic:track**: Яндекс.Музыка - Трек
- **YandexVideo**
@@ -1641,14 +1641,14 @@ # Supported sites
- **youtube:clip**
- **youtube:favorites**: YouTube liked videos; ":ytfav" keyword (requires cookies)
- **youtube:history**: Youtube watch history; ":ythis" keyword (requires cookies)
- **youtube:music:search_url**: YouTube music search URLs with selectable sections, e.g. #songs
- **youtube:notif**: YouTube notifications; ":ytnotif" keyword (requires cookies)
- **youtube:playlist**: YouTube playlists
- **youtube:recommended**: YouTube recommended videos; ":ytrec" keyword
- **youtube:search**: YouTube search; "ytsearch:" prefix
- **youtube:search:date**: YouTube search, newest videos first; "ytsearchdate:" prefix
- **youtube:search_url**: YouTube search URLs with sorting and filter support
- **youtube:shorts:pivot:audio**: YouTube Shorts audio pivot (Shorts using audio of a given video)
- **youtube:stories**: YouTube channel stories; "ytstories:" prefix
- **youtube:subscriptions**: YouTube subscriptions feed; ":ytsubs" keyword (requires cookies)
- **youtube:tab**: YouTube Tabs

View File

@@ -260,8 +260,8 @@ def _repr(v):
info_dict_str += ''.join(
f' {_repr(k)}: {_repr(test_info_dict[k])},\n'
for k in missing_keys)
write_string(
'\n\'info_dict\': {\n' + info_dict_str + '},\n', out=sys.stderr)
info_dict_str = '\n\'info_dict\': {\n' + info_dict_str + '},\n'
write_string(info_dict_str.replace('\n', '\n '), out=sys.stderr)
self.assertFalse(
missing_keys,
'Missing keys in test definition: %s' % (

View File

@@ -11,7 +11,6 @@
import base64
from yt_dlp.aes import (
BLOCK_SIZE_BYTES,
aes_cbc_decrypt,
aes_cbc_decrypt_bytes,
aes_cbc_encrypt,
@@ -103,8 +102,7 @@ def test_decrypt_text(self):
def test_ecb_encrypt(self):
data = bytes_to_intlist(self.secret_msg)
data += [0x08] * (BLOCK_SIZE_BYTES - len(data) % BLOCK_SIZE_BYTES)
encrypted = intlist_to_bytes(aes_ecb_encrypt(data, self.key, self.iv))
encrypted = intlist_to_bytes(aes_ecb_encrypt(data, self.key))
self.assertEqual(
encrypted,
b'\xaa\x86]\x81\x97>\x02\x92\x9d\x1bR[[L/u\xd3&\xd1(h\xde{\x81\x94\xba\x02\xae\xbd\xa6\xd0:')

View File

@@ -28,11 +28,23 @@ def aes_cbc_encrypt_bytes(data, key, iv, **kwargs):
return intlist_to_bytes(aes_cbc_encrypt(*map(bytes_to_intlist, (data, key, iv)), **kwargs))
BLOCK_SIZE_BYTES = 16
def unpad_pkcs7(data):
return data[:-compat_ord(data[-1])]
BLOCK_SIZE_BYTES = 16
def pkcs7_padding(data):
"""
PKCS#7 padding
@param {int[]} data cleartext
@returns {int[]} padding data
"""
remaining_length = BLOCK_SIZE_BYTES - len(data) % BLOCK_SIZE_BYTES
return data + [remaining_length] * remaining_length
def pad_block(block, padding_mode):
@@ -64,7 +76,7 @@ def pad_block(block, padding_mode):
def aes_ecb_encrypt(data, key, iv=None):
"""
Encrypt with aes in ECB mode
Encrypt with aes in ECB mode. Using PKCS#7 padding
@param {int[]} data cleartext
@param {int[]} key 16/24/32-Byte cipher key
@@ -77,8 +89,7 @@ def aes_ecb_encrypt(data, key, iv=None):
encrypted_data = []
for i in range(block_count):
block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]
encrypted_data += aes_encrypt(block, expanded_key)
encrypted_data = encrypted_data[:len(data)]
encrypted_data += aes_encrypt(pkcs7_padding(block), expanded_key)
return encrypted_data
@@ -551,5 +562,6 @@ def ghash(subkey, data):
'key_expansion',
'pad_block',
'pkcs7_padding',
'unpad_pkcs7',
]
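
A quick sanity check of the new padding helpers; a sketch assuming the patched `yt_dlp.aes` is importable:

```python
from yt_dlp.aes import BLOCK_SIZE_BYTES, aes_ecb_encrypt, pkcs7_padding
from yt_dlp.utils import bytes_to_intlist, intlist_to_bytes

key = bytes_to_intlist(b'0123456789abcdef')  # 16-byte AES key
data = bytes_to_intlist(b'Secret message!')  # 15 bytes -> 1 padding byte

# pkcs7_padding fills the final block up to BLOCK_SIZE_BYTES (16)
assert pkcs7_padding(data)[-1] == 0x01

# aes_ecb_encrypt now pads internally, so callers no longer pre-pad
# (and the unused iv argument can be dropped, as in the test above)
ciphertext = intlist_to_bytes(aes_ecb_encrypt(data, key))
assert len(ciphertext) == BLOCK_SIZE_BYTES
```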

View File

@@ -14,7 +14,7 @@
# HTMLParseError has been deprecated in Python 3.3 and removed in
# Python 3.5. Introducing dummy exception for Python >3.5 for compatible
# and uniform cross-version exception handling
class compat_HTMLParseError(Exception):
class compat_HTMLParseError(ValueError):
pass
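
Deriving from `ValueError` lets callers handle a failed HTML parse like any other invalid-input error; a small sketch:

```python
from yt_dlp.compat import compat_HTMLParseError

try:
    raise compat_HTMLParseError('malformed tag')
except ValueError as e:  # now also catches compat_HTMLParseError
    print(f'parse failed: {e}')
```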

View File

@@ -48,6 +48,7 @@ def compat_setenv(key, value, env=os.environ):
compat_basestring = str
compat_casefold = str.casefold
compat_chr = chr
compat_collections_abc = collections.abc
compat_cookiejar = http.cookiejar

View File

@@ -372,8 +372,6 @@
CrowdBunkerChannelIE,
)
from .crunchyroll import (
CrunchyrollIE,
CrunchyrollShowPlaylistIE,
CrunchyrollBetaIE,
CrunchyrollBetaShowIE,
)
@@ -470,6 +468,10 @@
)
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
from .deuxm import (
DeuxMIE,
DeuxMNewsIE
)
from .digitalconcerthall import DigitalConcertHallIE
from .discovery import DiscoveryIE
from .disney import DisneyIE
@@ -586,6 +588,7 @@
from .foxnews import (
FoxNewsIE,
FoxNewsArticleIE,
FoxNewsVideoIE,
)
from .foxsports import FoxSportsIE
from .fptplay import FptplayIE
@@ -908,6 +911,7 @@
)
from .linuxacademy import LinuxAcademyIE
from .liputan6 import Liputan6IE
from .listennotes import ListenNotesIE
from .litv import LiTVIE
from .livejournal import LiveJournalIE
from .livestream import (
@@ -1427,6 +1431,7 @@
)
from .puls4 import Puls4IE
from .pyvideo import PyvideoIE
from .qingting import QingTingIE
from .qqmusic import (
QQMusicIE,
QQMusicSingerIE,
@@ -1640,7 +1645,6 @@
SkyItVideoIE,
SkyItVideoLiveIE,
SkyItIE,
SkyItAcademyIE,
SkyItArteIE,
CieloTVItIE,
TV8ItIE,
@@ -1760,6 +1764,7 @@
SVTPlayIE,
SVTSeriesIE,
)
from .swearnet import SwearnetEpisodeIE
from .swrmediathek import SWRMediathekIE
from .syvdk import SYVDKIE
from .syfy import SyfyIE
@@ -1960,7 +1965,8 @@
TVPEmbedIE,
TVPIE,
TVPStreamIE,
TVPWebsiteIE,
TVPVODSeriesIE,
TVPVODVideoIE,
)
from .tvplay import (
TVPlayIE,

View File

@@ -161,7 +161,7 @@ class AcFunBangumiIE(AcFunVideoBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
ac_idx = parse_qs(url).get('ac', [None])[-1]
video_id = f'{video_id}{format_field(ac_idx, template="__%s")}'
video_id = f'{video_id}{format_field(ac_idx, None, "__%s")}'
webpage = self._download_webpage(url, video_id)
json_bangumi_data = self._search_json(r'window.bangumiData\s*=', webpage, 'bangumiData', video_id)
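
The positional call above relies on `format_field` returning a default for missing values; a sketch, assuming the current `yt_dlp.utils.format_field(variable, field=None, template='%s', ...)` signature:

```python
from yt_dlp.utils import format_field

print(format_field('233', None, '__%s'))  # '__233'
print(format_field(None, None, '__%s'))   # ''  (ac_idx may be absent)
```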

View File

@@ -28,30 +28,34 @@
class ADNIE(InfoExtractor):
IE_DESC = 'Anime Digital Network'
_VALID_URL = r'https?://(?:www\.)?animedigitalnetwork\.fr/video/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites',
'md5': '0319c99885ff5547565cacb4f3f9348d',
IE_DESC = 'Animation Digital Network'
_VALID_URL = r'https?://(?:www\.)?(?:animation|anime)digitalnetwork\.fr/video/[^/]+/(?P<id>\d+)'
_TESTS = [{
'url': 'https://animationdigitalnetwork.fr/video/fruits-basket/9841-episode-1-a-ce-soir',
'md5': '1c9ef066ceb302c86f80c2b371615261',
'info_dict': {
'id': '7778',
'id': '9841',
'ext': 'mp4',
'title': 'Blue Exorcist - Kyôto Saga - Episode 1',
'description': 'md5:2f7b5aa76edbc1a7a92cedcda8a528d5',
'series': 'Blue Exorcist - Kyôto Saga',
'duration': 1467,
'release_date': '20170106',
'title': 'Fruits Basket - Episode 1',
'description': 'md5:14be2f72c3c96809b0ca424b0097d336',
'series': 'Fruits Basket',
'duration': 1437,
'release_date': '20190405',
'comment_count': int,
'average_rating': float,
'season_number': 2,
'episode': 'Début des hostilités',
'season_number': 1,
'episode': 'À ce soir !',
'episode_number': 1,
}
}
},
'skip': 'Only available in region (FR, ...)',
}, {
'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites',
'only_matching': True,
}]
_NETRC_MACHINE = 'animedigitalnetwork'
_BASE_URL = 'http://animedigitalnetwork.fr'
_API_BASE_URL = 'https://gw.api.animedigitalnetwork.fr/'
_NETRC_MACHINE = 'animationdigitalnetwork'
_BASE = 'animationdigitalnetwork.fr'
_API_BASE_URL = 'https://gw.api.' + _BASE + '/'
_PLAYER_BASE_URL = _API_BASE_URL + 'player/'
_HEADERS = {}
_LOGIN_ERR_MESSAGE = 'Unable to log in'
@@ -75,11 +79,11 @@ def _get_subtitles(self, sub_url, video_id):
if subtitle_location:
enc_subtitles = self._download_webpage(
subtitle_location, video_id, 'Downloading subtitles data',
fatal=False, headers={'Origin': 'https://animedigitalnetwork.fr'})
fatal=False, headers={'Origin': 'https://' + self._BASE})
if not enc_subtitles:
return None
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
# http://animationdigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = unpad_pkcs7(aes_cbc_decrypt_bytes(
compat_b64decode(enc_subtitles[24:]),
binascii.unhexlify(self._K + '7fac1178830cfe0c'),

View File

@@ -368,7 +368,7 @@ def _real_extract(self, url):
or '正在观看预览,大会员免费看全片' in webpage):
self.raise_login_required('This video is for premium members only')
play_info = self._search_json(r'window\.__playinfo__\s*=\s*', webpage, 'play info', video_id)['data']
play_info = self._search_json(r'window\.__playinfo__\s*=', webpage, 'play info', video_id)['data']
formats = self.extract_formats(play_info)
if (not formats and '成为大会员抢先看' in webpage
and play_info.get('durl') and not play_info.get('dash')):

View File

@@ -9,6 +9,7 @@
ExtractorError,
float_or_none,
sanitized_Request,
str_or_none,
traverse_obj,
urlencode_postdata,
USER_AGENTS,
@@ -16,13 +17,13 @@
class CeskaTelevizeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/(?:ivysilani|porady)/(?:[^/?#&]+/)*(?P<id>[^/#?]+)'
_VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/(?:ivysilani|porady|zive)/(?:[^/?#&]+/)*(?P<id>[^/#?]+)'
_TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/10441294653-hyde-park-civilizace/215411058090502/bonus/20641-bonus-01-en',
'info_dict': {
'id': '61924494877028507',
'ext': 'mp4',
'title': 'Hyde Park Civilizace: Bonus 01 - En',
'title': 'Bonus 01 - En - Hyde Park Civilizace',
'description': 'English Subtittles',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 81.3,
@@ -33,18 +34,29 @@ class CeskaTelevizeIE(InfoExtractor):
},
}, {
# live stream
'url': 'http://www.ceskatelevize.cz/ivysilani/zive/ct4/',
'url': 'http://www.ceskatelevize.cz/zive/ct1/',
'info_dict': {
'id': 402,
'id': '102',
'ext': 'mp4',
'title': r're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'title': r'ČT1 - živé vysílání online',
'description': 'Sledujte živé vysílání kanálu ČT1 online. Vybírat si můžete i z dalších kanálů České televize na kterémkoli z vašich zařízení.',
'is_live': True,
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Georestricted to Czech Republic',
}, {
# another
'url': 'http://www.ceskatelevize.cz/ivysilani/zive/ct4/',
'only_matching': True,
'info_dict': {
'id': 402,
'ext': 'mp4',
'title': r're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'is_live': True,
},
# 'skip': 'Georestricted to Czech Republic',
}, {
'url': 'http://www.ceskatelevize.cz/ivysilani/embed/iFramePlayer.php?hash=d6a3e1370d2e4fa76296b90bad4dfc19673b641e&IDEC=217 562 22150/0004&channelID=1&width=100%25',
'only_matching': True,
@@ -53,21 +65,21 @@ class CeskaTelevizeIE(InfoExtractor):
'url': 'http://www.ceskatelevize.cz/porady/10520528904-queer/215562210900007-bogotart/',
'info_dict': {
'id': '215562210900007-bogotart',
'title': 'Queer: Bogotart',
'description': 'Hlavní město Kolumbie v doprovodu queer umělců. Vroucí svět plný vášně, sebevědomí, ale i násilí a bolesti. Připravil Peter Serge Butko',
'title': 'Bogotart - Queer',
'description': 'Hlavní město Kolumbie v doprovodu queer umělců. Vroucí svět plný vášně, sebevědomí, ale i násilí a bolesti',
},
'playlist': [{
'info_dict': {
'id': '61924494877311053',
'ext': 'mp4',
'title': 'Queer: Bogotart (Varování 18+)',
'title': 'Bogotart - Queer (Varování 18+)',
'duration': 11.9,
},
}, {
'info_dict': {
'id': '61924494877068022',
'ext': 'mp4',
'title': 'Queer: Bogotart (Queer)',
'title': 'Bogotart - Queer (Queer)',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 1558.3,
},
@@ -84,28 +96,42 @@ class CeskaTelevizeIE(InfoExtractor):
def _real_extract(self, url):
playlist_id = self._match_id(url)
parsed_url = compat_urllib_parse_urlparse(url)
webpage = self._download_webpage(url, playlist_id)
site_name = self._og_search_property('site_name', webpage, fatal=False, default=None)
webpage, urlh = self._download_webpage_handle(url, playlist_id)
parsed_url = compat_urllib_parse_urlparse(urlh.geturl())
site_name = self._og_search_property('site_name', webpage, fatal=False, default='Česká televize')
playlist_title = self._og_search_title(webpage, default=None)
if site_name and playlist_title:
playlist_title = playlist_title.replace(f'{site_name}', '', 1)
playlist_title = re.split(r'\s*[—|]\s*%s' % (site_name, ), playlist_title, 1)[0]
playlist_description = self._og_search_description(webpage, default=None)
if playlist_description:
playlist_description = playlist_description.replace('\xa0', ' ')
if parsed_url.path.startswith('/porady/'):
type_ = 'IDEC'
if re.search(r'(^/porady|/zive)/', parsed_url.path):
next_data = self._search_nextjs_data(webpage, playlist_id)
idec = traverse_obj(next_data, ('props', 'pageProps', 'data', ('show', 'mediaMeta'), 'idec'), get_all=False)
if '/zive/' in parsed_url.path:
idec = traverse_obj(next_data, ('props', 'pageProps', 'data', 'liveBroadcast', 'current', 'idec'), get_all=False)
else:
idec = traverse_obj(next_data, ('props', 'pageProps', 'data', ('show', 'mediaMeta'), 'idec'), get_all=False)
if not idec:
idec = traverse_obj(next_data, ('props', 'pageProps', 'data', 'videobonusDetail', 'bonusId'), get_all=False)
if idec:
type_ = 'bonus'
if not idec:
raise ExtractorError('Failed to find IDEC id')
iframe_hash = self._download_webpage('https://www.ceskatelevize.cz/v-api/iframe-hash/', playlist_id)
webpage = self._download_webpage('https://www.ceskatelevize.cz/ivysilani/embed/iFramePlayer.php', playlist_id,
query={'hash': iframe_hash, 'origin': 'iVysilani', 'autoStart': 'true', 'IDEC': idec})
iframe_hash = self._download_webpage(
'https://www.ceskatelevize.cz/v-api/iframe-hash/',
playlist_id, note='Getting IFRAME hash')
query = {'hash': iframe_hash, 'origin': 'iVysilani', 'autoStart': 'true', type_: idec, }
webpage = self._download_webpage(
'https://www.ceskatelevize.cz/ivysilani/embed/iFramePlayer.php',
playlist_id, note='Downloading player', query=query)
NOT_AVAILABLE_STRING = 'This content is not available at your territory due to limited copyright.'
if '%s</p>' % NOT_AVAILABLE_STRING in webpage:
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
self.raise_geo_restricted(NOT_AVAILABLE_STRING)
if any(not_found in webpage for not_found in ('Neplatný parametr pro videopřehrávač', 'IDEC nebyl nalezen', )):
raise ExtractorError('no video with IDEC available', video_id=idec, expected=True)
type_ = None
episode_id = None
@@ -174,7 +200,6 @@ def _real_extract(self, url):
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item.get('streamUrls', {}).items():
stream_url = stream_url.replace('https://', 'http://')
if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4', 'm3u8_native',
@@ -196,7 +221,7 @@ def _real_extract(self, url):
entries[num]['formats'].extend(formats)
continue
item_id = item.get('id') or item['assetId']
item_id = str_or_none(item.get('id') or item['assetId'])
title = item['title']
duration = float_or_none(item.get('duration'))
@@ -227,6 +252,8 @@ def _real_extract(self, url):
for e in entries:
self._sort_formats(e['formats'])
if len(entries) == 1:
return entries[0]
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
def _get_subtitles(self, episode_id, subs):
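
The IDEC lookup above leans on `traverse_obj` branching; a sketch of the semantics with a hypothetical `next_data` payload (`get_all=False` returns only the first non-`None` hit):

```python
from yt_dlp.utils import traverse_obj

next_data = {'props': {'pageProps': {'data': {
    'show': {'idec': None},  # hypothetical: no IDEC on the show object
    'mediaMeta': {'idec': '217 562 22150/0004'},
}}}}

idec = traverse_obj(
    next_data, ('props', 'pageProps', 'data', ('show', 'mediaMeta'), 'idec'),
    get_all=False)
print(idec)  # '217 562 22150/0004'
```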

View File

@@ -3725,7 +3725,8 @@ def description(cls, *, markdown=True, search_examples=None):
if not cls.working():
desc += ' (**Currently broken**)' if markdown else ' (Currently broken)'
name = f' - **{cls.IE_NAME}**' if markdown else cls.IE_NAME
# Escape emojis. Ref: https://github.com/github/markup/issues/1153
name = (' - **%s**' % re.sub(r':(\w+:)', ':\u200B\\g<1>', cls.IE_NAME)) if markdown else cls.IE_NAME
return f'{name}:{desc}' if desc else name
def extract_subtitles(self, *args, **kwargs):
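
The zero-width space stops GitHub from rendering `:name:` segments of an extractor name as emoji while leaving the visible text unchanged; a quick illustration:

```python
import re

name = 'youtube:search:date'
escaped = re.sub(r':(\w+:)', ':\u200B\\g<1>', name)
assert escaped != name
assert escaped.replace('\u200B', '') == name  # only invisible characters added
```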

View File

@@ -1,40 +1,16 @@
import base64
import json
import re
import urllib.request
import xml.etree.ElementTree
import zlib
from hashlib import sha1
from math import floor, pow, sqrt
import urllib.parse
from .common import InfoExtractor
from .vrv import VRVBaseIE
from ..aes import aes_cbc_decrypt
from ..compat import (
compat_b64decode,
compat_etree_fromstring,
compat_str,
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
ExtractorError,
bytes_to_intlist,
extract_attributes,
float_or_none,
format_field,
int_or_none,
intlist_to_bytes,
join_nonempty,
lowercase_escape,
merge_dicts,
parse_iso8601,
qualities,
remove_end,
sanitized_Request,
traverse_obj,
try_get,
xpath_text,
)
@@ -42,16 +18,7 @@ class CrunchyrollBaseIE(InfoExtractor):
_LOGIN_URL = 'https://www.crunchyroll.com/welcome/login'
_API_BASE = 'https://api.crunchyroll.com'
_NETRC_MACHINE = 'crunchyroll'
def _call_rpc_api(self, method, video_id, note=None, data=None):
data = data or {}
data['req'] = 'RpcApi' + method
data = compat_urllib_parse_urlencode(data).encode('utf-8')
return self._download_xml(
'https://www.crunchyroll.com/xml/',
video_id, note, fatal=False, data=data, headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
params = None
def _perform_login(self, username, password):
if self._get_cookies(self._LOGIN_URL).get('etp_rt'):
@@ -72,7 +39,7 @@ def _perform_login(self, username, password):
login_response = self._download_json(
f'{self._API_BASE}/login.1.json', None, 'Logging in',
data=compat_urllib_parse_urlencode({
data=urllib.parse.urlencode({
'account': username,
'password': password,
'session_id': session_id
@@ -82,652 +49,23 @@ def _perform_login(self, username, password):
if not self._get_cookies(self._LOGIN_URL).get('etp_rt'):
raise ExtractorError('Login succeeded but did not set etp_rt cookie')
# Beta-specific, but needed for redirects
def _get_beta_embedded_json(self, webpage, display_id):
def _get_embedded_json(self, webpage, display_id):
initial_state = self._parse_json(self._search_regex(
r'__INITIAL_STATE__\s*=\s*({.+?})\s*;', webpage, 'initial state'), display_id)
app_config = self._parse_json(self._search_regex(
r'__APP_CONFIG__\s*=\s*({.+?})\s*;', webpage, 'app config'), display_id)
return initial_state, app_config
def _redirect_to_beta(self, webpage, iekey, video_id):
if not self._get_cookies(self._LOGIN_URL).get('etp_rt'):
raise ExtractorError('Received a beta page from non-beta url when not logged in.')
initial_state, app_config = self._get_beta_embedded_json(webpage, video_id)
url = app_config['baseSiteUrl'] + initial_state['router']['locations']['current']['pathname']
self.to_screen(f'{video_id}: Redirected to beta site - {url}')
return self.url_result(f'{url}', iekey, video_id)
@staticmethod
def _add_skip_wall(url):
parsed_url = compat_urlparse.urlparse(url)
qs = compat_urlparse.parse_qs(parsed_url.query)
# Always force skip_wall to bypass maturity wall, namely 18+ confirmation message:
# > This content may be inappropriate for some people.
# > Are you sure you want to continue?
# since it's not disabled by default in crunchyroll account's settings.
# See https://github.com/ytdl-org/youtube-dl/issues/7202.
qs['skip_wall'] = ['1']
return compat_urlparse.urlunparse(
parsed_url._replace(query=compat_urllib_parse_urlencode(qs, True)))
class CrunchyrollIE(CrunchyrollBaseIE, VRVBaseIE):
IE_NAME = 'crunchyroll'
_VALID_URL = r'''(?x)
https?://(?:(?P<prefix>www|m)\.)?(?P<url>
crunchyroll\.(?:com|fr)/(?:
media(?:-|/\?id=)|
(?!series/|watch/)(?:[^/]+/){1,2}[^/?&#]*?
)(?P<id>[0-9]+)
)(?:[/?&#]|$)'''
_TESTS = [{
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
'info_dict': {
'id': '645513',
'ext': 'mp4',
'title': 'Wanna be the Strongest in the World Episode 1 An Idol-Wrestler is Born!',
'description': 'md5:2d17137920c64f2f49981a7797d275ef',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Yomiuri Telecasting Corporation (YTV)',
'upload_date': '20131013',
'url': 're:(?!.*&amp)',
},
'params': {
# rtmp
'skip_download': True,
},
'skip': 'Video gone',
}, {
'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1',
'info_dict': {
'id': '589804',
'ext': 'flv',
'title': 'Culture Japan Episode 1 Rebuilding Japan after the 3.11',
'description': 'md5:2fbc01f90b87e8e9137296f37b461c12',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Danny Choo Network',
'upload_date': '20120213',
},
'params': {
# rtmp
'skip_download': True,
},
'skip': 'Video gone',
}, {
'url': 'http://www.crunchyroll.com/rezero-starting-life-in-another-world-/episode-5-the-morning-of-our-promise-is-still-distant-702409',
'info_dict': {
'id': '702409',
'ext': 'mp4',
'title': compat_str,
'description': compat_str,
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Re:Zero Partners',
'timestamp': 1462098900,
'upload_date': '20160501',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://www.crunchyroll.com/konosuba-gods-blessing-on-this-wonderful-world/episode-1-give-me-deliverance-from-this-judicial-injustice-727589',
'info_dict': {
'id': '727589',
'ext': 'mp4',
'title': compat_str,
'description': compat_str,
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Kadokawa Pictures Inc.',
'timestamp': 1484130900,
'upload_date': '20170111',
'series': compat_str,
'season': "KONOSUBA -God's blessing on this wonderful world! 2",
'season_number': 2,
'episode': 'Give Me Deliverance From This Judicial Injustice!',
'episode_number': 1,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://www.crunchyroll.fr/girl-friend-beta/episode-11-goodbye-la-mode-661697',
'only_matching': True,
}, {
# geo-restricted (US), 18+ maturity wall, non-premium available
'url': 'http://www.crunchyroll.com/cosplay-complex-ova/episode-1-the-birth-of-the-cosplay-club-565617',
'only_matching': True,
}, {
# A description with double quotes
'url': 'http://www.crunchyroll.com/11eyes/episode-1-piros-jszaka-red-night-535080',
'info_dict': {
'id': '535080',
'ext': 'mp4',
'title': compat_str,
'description': compat_str,
'uploader': 'Marvelous AQL Inc.',
'timestamp': 1255512600,
'upload_date': '20091014',
},
'params': {
# Just test metadata extraction
'skip_download': True,
},
}, {
# make sure we can extract an uploader name that's not a link
'url': 'http://www.crunchyroll.com/hakuoki-reimeiroku/episode-1-dawn-of-the-divine-warriors-606899',
'info_dict': {
'id': '606899',
'ext': 'mp4',
'title': 'Hakuoki Reimeiroku Episode 1 Dawn of the Divine Warriors',
'description': 'Ryunosuke was left to die, but Serizawa-san asked him a simple question "Do you want to live?"',
'uploader': 'Geneon Entertainment',
'upload_date': '20120717',
},
'params': {
# just test metadata extraction
'skip_download': True,
},
'skip': 'Video gone',
}, {
# A video with a vastly different season name compared to the series name
'url': 'http://www.crunchyroll.com/nyarko-san-another-crawling-chaos/episode-1-test-590532',
'info_dict': {
'id': '590532',
'ext': 'mp4',
'title': compat_str,
'description': compat_str,
'uploader': 'TV TOKYO',
'timestamp': 1330956000,
'upload_date': '20120305',
'series': 'Nyarko-san: Another Crawling Chaos',
'season': 'Haiyoru! Nyaruani (ONA)',
},
'params': {
# Just test metadata extraction
'skip_download': True,
},
}, {
'url': 'http://www.crunchyroll.com/media-723735',
'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/en-gb/mob-psycho-100/episode-2-urban-legends-encountering-rumors-780921',
'only_matching': True,
}]
_FORMAT_IDS = {
'360': ('60', '106'),
'480': ('61', '106'),
'720': ('62', '106'),
'1080': ('80', '108'),
}
def _download_webpage(self, url_or_request, *args, **kwargs):
request = (url_or_request if isinstance(url_or_request, urllib.request.Request)
else sanitized_Request(url_or_request))
# Accept-Language must be set explicitly to accept any language to avoid issues
# similar to https://github.com/ytdl-org/youtube-dl/issues/6797.
# Along with IP address Crunchyroll uses Accept-Language to guess whether georestriction
# should be imposed or not (from what I can see it just takes the first language
# ignoring the priority and requires it to correspond the IP). By the way this causes
# Crunchyroll to not work in georestriction cases in some browsers that don't place
# the locale lang first in header. However allowing any language seems to workaround the issue.
request.add_header('Accept-Language', '*')
return super(CrunchyrollBaseIE, self)._download_webpage(request, *args, **kwargs)
def _decrypt_subtitles(self, data, iv, id):
data = bytes_to_intlist(compat_b64decode(data))
iv = bytes_to_intlist(compat_b64decode(iv))
id = int(id)
def obfuscate_key_aux(count, modulo, start):
output = list(start)
for _ in range(count):
output.append(output[-1] + output[-2])
# cut off start values
output = output[2:]
output = list(map(lambda x: x % modulo + 33, output))
return output
def obfuscate_key(key):
num1 = int(floor(pow(2, 25) * sqrt(6.9)))
num2 = (num1 ^ key) << 5
num3 = key ^ num1
num4 = num3 ^ (num3 >> 3) ^ num2
prefix = intlist_to_bytes(obfuscate_key_aux(20, 97, (1, 2)))
shaHash = bytes_to_intlist(sha1(prefix + str(num4).encode('ascii')).digest())
# Extend 160 Bit hash to 256 Bit
return shaHash + [0] * 12
key = obfuscate_key(id)
decrypted_data = intlist_to_bytes(aes_cbc_decrypt(data, key, iv))
return zlib.decompress(decrypted_data)
def _convert_subtitles_to_srt(self, sub_root):
output = ''
for i, event in enumerate(sub_root.findall('./events/event'), 1):
start = event.attrib['start'].replace('.', ',')
end = event.attrib['end'].replace('.', ',')
text = event.attrib['text'].replace('\\N', '\n')
output += '%d\n%s --> %s\n%s\n\n' % (i, start, end, text)
return output
def _convert_subtitles_to_ass(self, sub_root):
output = ''
def ass_bool(strvalue):
assvalue = '0'
if strvalue == '1':
assvalue = '-1'
return assvalue
output = '[Script Info]\n'
output += 'Title: %s\n' % sub_root.attrib['title']
output += 'ScriptType: v4.00+\n'
output += 'WrapStyle: %s\n' % sub_root.attrib['wrap_style']
output += 'PlayResX: %s\n' % sub_root.attrib['play_res_x']
output += 'PlayResY: %s\n' % sub_root.attrib['play_res_y']
output += """
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
"""
for style in sub_root.findall('./styles/style'):
output += 'Style: ' + style.attrib['name']
output += ',' + style.attrib['font_name']
output += ',' + style.attrib['font_size']
output += ',' + style.attrib['primary_colour']
output += ',' + style.attrib['secondary_colour']
output += ',' + style.attrib['outline_colour']
output += ',' + style.attrib['back_colour']
output += ',' + ass_bool(style.attrib['bold'])
output += ',' + ass_bool(style.attrib['italic'])
output += ',' + ass_bool(style.attrib['underline'])
output += ',' + ass_bool(style.attrib['strikeout'])
output += ',' + style.attrib['scale_x']
output += ',' + style.attrib['scale_y']
output += ',' + style.attrib['spacing']
output += ',' + style.attrib['angle']
output += ',' + style.attrib['border_style']
output += ',' + style.attrib['outline']
output += ',' + style.attrib['shadow']
output += ',' + style.attrib['alignment']
output += ',' + style.attrib['margin_l']
output += ',' + style.attrib['margin_r']
output += ',' + style.attrib['margin_v']
output += ',' + style.attrib['encoding']
output += '\n'
output += """
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""
for event in sub_root.findall('./events/event'):
output += 'Dialogue: 0'
output += ',' + event.attrib['start']
output += ',' + event.attrib['end']
output += ',' + event.attrib['style']
output += ',' + event.attrib['name']
output += ',' + event.attrib['margin_l']
output += ',' + event.attrib['margin_r']
output += ',' + event.attrib['margin_v']
output += ',' + event.attrib['effect']
output += ',' + event.attrib['text']
output += '\n'
return output
def _extract_subtitles(self, subtitle):
sub_root = compat_etree_fromstring(subtitle)
return [{
'ext': 'srt',
'data': self._convert_subtitles_to_srt(sub_root),
}, {
'ext': 'ass',
'data': self._convert_subtitles_to_ass(sub_root),
}]
def _get_subtitles(self, video_id, webpage):
subtitles = {}
for sub_id, sub_name in re.findall(r'\bssid=([0-9]+)"[^>]+?\btitle="([^"]+)', webpage):
sub_doc = self._call_rpc_api(
'Subtitle_GetXml', video_id,
'Downloading subtitles for ' + sub_name, data={
'subtitle_script_id': sub_id,
})
if not isinstance(sub_doc, xml.etree.ElementTree.Element):
continue
sid = sub_doc.get('id')
iv = xpath_text(sub_doc, 'iv', 'subtitle iv')
data = xpath_text(sub_doc, 'data', 'subtitle data')
if not sid or not iv or not data:
continue
subtitle = self._decrypt_subtitles(data, iv, sid).decode('utf-8')
lang_code = self._search_regex(r'lang_code=["\']([^"\']+)', subtitle, 'subtitle_lang_code', fatal=False)
if not lang_code:
continue
subtitles[lang_code] = self._extract_subtitles(subtitle)
return subtitles
def _real_extract(self, url):
mobj = self._match_valid_url(url)
video_id = mobj.group('id')
if mobj.group('prefix') == 'm':
mobile_webpage = self._download_webpage(url, video_id, 'Downloading mobile webpage')
webpage_url = self._search_regex(r'<link rel="canonical" href="([^"]+)" />', mobile_webpage, 'webpage_url')
else:
webpage_url = 'http://www.' + mobj.group('url')
webpage = self._download_webpage(
self._add_skip_wall(webpage_url), video_id,
headers=self.geo_verification_headers())
if re.search(r'<div id="preload-data">', webpage):
return self._redirect_to_beta(webpage, CrunchyrollBetaIE.ie_key(), video_id)
note_m = self._html_search_regex(
r'<div class="showmedia-trailer-notice">(.+?)</div>',
webpage, 'trailer-notice', default='')
if note_m:
raise ExtractorError(note_m, expected=True)
mobj = re.search(r'Page\.messaging_box_controller\.addItems\(\[(?P<msg>{.+?})\]\)', webpage)
if mobj:
msg = json.loads(mobj.group('msg'))
if msg.get('type') == 'error':
raise ExtractorError('crunchyroll returned error: %s' % msg['message_body'], expected=True)
if 'To view this, please log in to verify you are 18 or older.' in webpage:
self.raise_login_required()
media = self._parse_json(self._search_regex(
r'vilos\.config\.media\s*=\s*({.+?});',
webpage, 'vilos media', default='{}'), video_id)
media_metadata = media.get('metadata') or {}
language = self._search_regex(
r'(?:vilos\.config\.player\.language|LOCALE)\s*=\s*(["\'])(?P<lang>(?:(?!\1).)+)\1',
webpage, 'language', default=None, group='lang')
video_title = self._html_search_regex(
(r'(?s)<h1[^>]*>((?:(?!<h1).)*?<(?:span[^>]+itemprop=["\']title["\']|meta[^>]+itemprop=["\']position["\'])[^>]*>(?:(?!<h1).)+?)</h1>',
r'<title>(.+?),\s+-\s+.+? Crunchyroll'),
webpage, 'video_title', default=None)
if not video_title:
video_title = re.sub(r'^Watch\s+', '', self._og_search_description(webpage))
video_title = re.sub(r' {2,}', ' ', video_title)
video_description = (self._parse_json(self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
webpage, 'description', default='{}'), video_id) or media_metadata).get('description')
thumbnails = []
thumbnail_url = (self._parse_json(self._html_search_regex(
r'<script type="application\/ld\+json">\n\s*(.+?)<\/script>',
webpage, 'thumbnail_url', default='{}'), video_id)).get('image')
if thumbnail_url:
thumbnails.append({
'url': thumbnail_url,
'width': 1920,
'height': 1080
})
if video_description:
video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
video_uploader = self._html_search_regex(
# try looking for both an uploader that's a link and one that's not
[r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
webpage, 'video_uploader', default=False)
requested_languages = self._configuration_arg('language')
requested_hardsubs = [('' if val == 'none' else val) for val in self._configuration_arg('hardsub')]
language_preference = qualities((requested_languages or [language or ''])[::-1])
hardsub_preference = qualities((requested_hardsubs or ['', language or ''])[::-1])
formats = []
for stream in media.get('streams', []):
audio_lang = stream.get('audio_lang') or ''
hardsub_lang = stream.get('hardsub_lang') or ''
if (requested_languages and audio_lang.lower() not in requested_languages
or requested_hardsubs and hardsub_lang.lower() not in requested_hardsubs):
continue
vrv_formats = self._extract_vrv_formats(
stream.get('url'), video_id, stream.get('format'),
audio_lang, hardsub_lang)
for f in vrv_formats:
f['language_preference'] = language_preference(audio_lang)
f['quality'] = hardsub_preference(hardsub_lang)
formats.extend(vrv_formats)
if not formats:
available_fmts = []
for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
attrs = extract_attributes(a)
href = attrs.get('href')
if href and '/freetrial' in href:
continue
available_fmts.append(fmt)
if not available_fmts:
for p in (r'token=["\']showmedia\.([0-9]{3,4})p"', r'showmedia\.([0-9]{3,4})p'):
available_fmts = re.findall(p, webpage)
if available_fmts:
break
if not available_fmts:
available_fmts = self._FORMAT_IDS.keys()
video_encode_ids = []
for fmt in available_fmts:
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p'
stream_infos = []
streamdata = self._call_rpc_api(
'VideoPlayer_GetStandardConfig', video_id,
'Downloading media info for %s' % video_format, data={
'media_id': video_id,
'video_format': stream_format,
'video_quality': stream_quality,
'current_page': url,
})
if isinstance(streamdata, xml.etree.ElementTree.Element):
stream_info = streamdata.find('./{default}preload/stream_info')
if stream_info is not None:
stream_infos.append(stream_info)
stream_info = self._call_rpc_api(
'VideoEncode_GetStreamInfo', video_id,
'Downloading stream info for %s' % video_format, data={
'media_id': video_id,
'video_format': stream_format,
'video_encode_quality': stream_quality,
})
if isinstance(stream_info, xml.etree.ElementTree.Element):
stream_infos.append(stream_info)
for stream_info in stream_infos:
video_encode_id = xpath_text(stream_info, './video_encode_id')
if video_encode_id in video_encode_ids:
continue
video_encode_ids.append(video_encode_id)
video_file = xpath_text(stream_info, './file')
if not video_file:
continue
if video_file.startswith('http'):
formats.extend(self._extract_m3u8_formats(
video_file, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
video_url = xpath_text(stream_info, './host')
if not video_url:
continue
metadata = stream_info.find('./metadata')
format_info = {
'format': video_format,
'height': int_or_none(xpath_text(metadata, './height')),
'width': int_or_none(xpath_text(metadata, './width')),
}
if '.fplive.net/' in video_url:
video_url = re.sub(r'^rtmpe?://', 'http://', video_url.strip())
parsed_video_url = compat_urlparse.urlparse(video_url)
direct_video_url = compat_urlparse.urlunparse(parsed_video_url._replace(
netloc='v.lvlt.crcdn.net',
path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_file.split(':')[-1])))
if self._is_valid_url(direct_video_url, video_id, video_format):
format_info.update({
'format_id': 'http-' + video_format,
'url': direct_video_url,
})
formats.append(format_info)
continue
format_info.update({
'format_id': 'rtmp-' + video_format,
'url': video_url,
'play_path': video_file,
'ext': 'flv',
})
formats.append(format_info)
self._sort_formats(formats)
metadata = self._call_rpc_api(
'VideoPlayer_GetMediaMetadata', video_id,
note='Downloading media info', data={
'media_id': video_id,
})
subtitles = {}
for subtitle in media.get('subtitles', []):
subtitle_url = subtitle.get('url')
if not subtitle_url:
continue
subtitles.setdefault(subtitle.get('language', 'enUS'), []).append({
'url': subtitle_url,
'ext': subtitle.get('format', 'ass'),
})
if not subtitles:
subtitles = self.extract_subtitles(video_id, webpage)
# webpage provide more accurate data than series_title from XML
series = self._html_search_regex(
r'(?s)<h\d[^>]+\bid=["\']showmedia_about_episode_num[^>]+>(.+?)</h\d',
webpage, 'series', fatal=False)
season = episode = episode_number = duration = None
if isinstance(metadata, xml.etree.ElementTree.Element):
season = xpath_text(metadata, 'series_title')
episode = xpath_text(metadata, 'episode_title')
episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
duration = float_or_none(media_metadata.get('duration'), 1000)
if not episode:
episode = media_metadata.get('title')
if not episode_number:
episode_number = int_or_none(media_metadata.get('episode_number'))
thumbnail_url = try_get(media, lambda x: x['thumbnail']['url'])
if thumbnail_url:
thumbnails.append({
'url': thumbnail_url,
'width': 640,
'height': 360
})
season_number = int_or_none(self._search_regex(
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
webpage, 'season number', default=None))
info = self._search_json_ld(webpage, video_id, default={})
return merge_dicts({
'id': video_id,
'title': video_title,
'description': video_description,
'duration': duration,
'thumbnails': thumbnails,
'uploader': video_uploader,
'series': series,
'season': season,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
'subtitles': subtitles,
'formats': formats,
}, info)
class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
IE_NAME = 'crunchyroll:playlist'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?:\w{2}(?:-\w{2})?/)?(?!(?:news|anime-news|library|forum|launchcalendar|lineup|store|comics|freetrial|login|media-\d+))(?P<id>[\w\-]+))/?(?:\?|$)'
_TESTS = [{
'url': 'https://www.crunchyroll.com/a-bridge-to-the-starry-skies-hoshizora-e-kakaru-hashi',
'info_dict': {
'id': 'a-bridge-to-the-starry-skies-hoshizora-e-kakaru-hashi',
'title': 'A Bridge to the Starry Skies - Hoshizora e Kakaru Hashi'
},
'playlist_count': 13,
}, {
# geo-restricted (US), 18+ maturity wall, non-premium available
'url': 'http://www.crunchyroll.com/cosplay-complex-ova',
'info_dict': {
'id': 'cosplay-complex-ova',
'title': 'Cosplay Complex OVA'
},
'playlist_count': 3,
'skip': 'Georestricted',
}, {
# geo-restricted (US), 18+ maturity wall, non-premium will be available since 2015.11.14
'url': 'http://www.crunchyroll.com/ladies-versus-butlers?skip_wall=1',
'only_matching': True,
}, {
'url': 'http://www.crunchyroll.com/fr/ladies-versus-butlers',
'only_matching': True,
}]
def _real_extract(self, url):
show_id = self._match_id(url)
webpage = self._download_webpage(
# https:// gives a 403, but http:// does not
self._add_skip_wall(url).replace('https://', 'http://'), show_id,
headers=self.geo_verification_headers())
if re.search(r'<div id="preload-data">', webpage):
return self._redirect_to_beta(webpage, CrunchyrollBetaShowIE.ie_key(), show_id)
title = self._html_search_meta('name', webpage, default=None)
episode_re = r'<li id="showview_videos_media_(\d+)"[^>]+>.*?<a href="([^"]+)"'
season_re = r'<a [^>]+season-dropdown[^>]+>([^<]+)'
paths = re.findall(f'(?s){episode_re}|{season_re}', webpage)
entries, current_season = [], None
for ep_id, ep, season in paths:
if season:
current_season = season
continue
entries.append(self.url_result(
f'http://www.crunchyroll.com{ep}', CrunchyrollIE.ie_key(), ep_id, season=current_season))
return {
'_type': 'playlist',
'id': show_id,
'title': title,
'entries': reversed(entries),
}
class CrunchyrollBetaBaseIE(CrunchyrollBaseIE):
params = None
def _get_params(self, lang):
if not CrunchyrollBetaBaseIE.params:
if self._get_cookies(f'https://beta.crunchyroll.com/{lang}').get('etp_rt'):
if not CrunchyrollBaseIE.params:
if self._get_cookies(f'https://www.crunchyroll.com/{lang}').get('etp_rt'):
grant_type, key = 'etp_rt_cookie', 'accountAuthClientId'
else:
grant_type, key = 'client_id', 'anonClientId'
initial_state, app_config = self._get_beta_embedded_json(self._download_webpage(
f'https://beta.crunchyroll.com/{lang}', None, note='Retrieving main page'), None)
api_domain = app_config['cxApiParams']['apiDomain']
initial_state, app_config = self._get_embedded_json(self._download_webpage(
f'https://www.crunchyroll.com/{lang}', None, note='Retrieving main page'), None)
api_domain = app_config['cxApiParams']['apiDomain'].replace('beta.crunchyroll.com', 'www.crunchyroll.com')
auth_response = self._download_json(
f'{api_domain}/auth/v1/token', None, note=f'Authenticating with grant_type={grant_type}',
@@ -739,7 +77,7 @@ def _get_params(self, lang):
headers={
'Authorization': auth_response['token_type'] + ' ' + auth_response['access_token']
})
cms = traverse_obj(policy_response, 'cms_beta', 'cms')
cms = policy_response.get('cms_web')
bucket = cms['bucket']
params = {
'Policy': cms['policy'],
@@ -749,19 +87,19 @@ def _get_params(self, lang):
locale = traverse_obj(initial_state, ('localization', 'locale'))
if locale:
params['locale'] = locale
CrunchyrollBetaBaseIE.params = (api_domain, bucket, params)
return CrunchyrollBetaBaseIE.params
CrunchyrollBaseIE.params = (api_domain, bucket, params)
return CrunchyrollBaseIE.params
class CrunchyrollBetaIE(CrunchyrollBetaBaseIE):
IE_NAME = 'crunchyroll:beta'
class CrunchyrollBetaIE(CrunchyrollBaseIE):
IE_NAME = 'crunchyroll'
_VALID_URL = r'''(?x)
https?://beta\.crunchyroll\.com/
https?://(?:beta|www)\.crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
watch/(?P<id>\w+)
(?:/(?P<display_id>[\w-]+))?/?(?:[?#]|$)'''
_TESTS = [{
'url': 'https://beta.crunchyroll.com/watch/GY2P1Q98Y/to-the-future',
'url': 'https://www.crunchyroll.com/watch/GY2P1Q98Y/to-the-future',
'info_dict': {
'id': 'GY2P1Q98Y',
'ext': 'mp4',
@@ -777,11 +115,11 @@ class CrunchyrollBetaIE(CrunchyrollBetaBaseIE):
'season_number': 1,
'episode': 'To the Future',
'episode_number': 73,
'thumbnail': r're:^https://beta.crunchyroll.com/imgsrv/.*\.jpeg$',
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg$',
},
'params': {'skip_download': 'm3u8', 'format': 'all[format_id~=hardsub]'},
}, {
'url': 'https://beta.crunchyroll.com/watch/GYE5WKQGR',
'url': 'https://www.crunchyroll.com/watch/GYE5WKQGR',
'info_dict': {
'id': 'GYE5WKQGR',
'ext': 'mp4',
@@ -797,12 +135,12 @@ class CrunchyrollBetaIE(CrunchyrollBetaBaseIE):
'season_number': 1,
'episode': 'Porter Robinson presents Shelter the Animation',
'episode_number': 0,
'thumbnail': r're:^https://beta.crunchyroll.com/imgsrv/.*\.jpeg$',
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg$',
},
'params': {'skip_download': True},
'skip': 'Video is Premium only',
}, {
'url': 'https://beta.crunchyroll.com/watch/GY2P1Q98Y',
'url': 'https://www.crunchyroll.com/watch/GY2P1Q98Y',
'only_matching': True,
}, {
'url': 'https://beta.crunchyroll.com/pt-br/watch/G8WUN8VKP/the-ruler-of-conspiracy',
@@ -901,15 +239,15 @@ def _real_extract(self, url):
}
class CrunchyrollBetaShowIE(CrunchyrollBetaBaseIE):
IE_NAME = 'crunchyroll:playlist:beta'
class CrunchyrollBetaShowIE(CrunchyrollBaseIE):
IE_NAME = 'crunchyroll:playlist'
_VALID_URL = r'''(?x)
https?://beta\.crunchyroll\.com/
https?://(?:beta|www)\.crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
series/(?P<id>\w+)
(?:/(?P<display_id>[\w-]+))?/?(?:[?#]|$)'''
_TESTS = [{
'url': 'https://beta.crunchyroll.com/series/GY19NQ2QR/Girl-Friend-BETA',
'url': 'https://www.crunchyroll.com/series/GY19NQ2QR/Girl-Friend-BETA',
'info_dict': {
'id': 'GY19NQ2QR',
'title': 'Girl Friend BETA',
@@ -942,7 +280,7 @@ def entries():
episode_display_id = episode['slug_title']
yield {
'_type': 'url',
'url': f'https://beta.crunchyroll.com/{lang}watch/{episode_id}/{episode_display_id}',
'url': f'https://www.crunchyroll.com/{lang}watch/{episode_id}/{episode_display_id}',
'ie_key': CrunchyrollBetaIE.ie_key(),
'id': episode_id,
'title': '%s Episode %s %s' % (episode.get('season_title'), episode.get('episode'), episode.get('title')),
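
With the `_VALID_URL` changes, the beta extractors now claim both hostnames; a quick routing check, assuming this patched build:

```python
import yt_dlp

CrunchyrollBetaIE = yt_dlp.extractor.get_info_extractor('CrunchyrollBeta')
for url in ('https://www.crunchyroll.com/watch/GY2P1Q98Y/to-the-future',
            'https://beta.crunchyroll.com/watch/GY2P1Q98Y/to-the-future'):
    print(url, '->', CrunchyrollBetaIE.suitable(url))  # True for both
```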

yt_dlp/extractor/deuxm.py (new file, 76 lines)
View File

@@ -0,0 +1,76 @@
from .common import InfoExtractor
from ..utils import url_or_none
class DeuxMIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?2m\.ma/[^/]+/replay/single/(?P<id>([\w.]{1,24})+)'
_TESTS = [{
'url': 'https://2m.ma/fr/replay/single/6351d439b15e1a613b3debe8',
'md5': '5f761f04c9d686e553b685134dca5d32',
'info_dict': {
'id': '6351d439b15e1a613b3debe8',
'ext': 'mp4',
'title': 'Grand Angle : Jeudi 20 Octobre 2022',
'thumbnail': r're:^https?://2msoread-ww.amagi.tv/mediasfiles/videos/images/.*\.png$'
}
}, {
'url': 'https://2m.ma/fr/replay/single/635c0aeab4eec832622356da',
'md5': 'ad6af2f5e4d5b2ad2194a84b6e890b4c',
'info_dict': {
'id': '635c0aeab4eec832622356da',
'ext': 'mp4',
'title': 'Journal Amazigh : Vendredi 28 Octobre 2022',
'thumbnail': r're:^https?://2msoread-ww.amagi.tv/mediasfiles/videos/images/.*\.png$'
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
f'https://2m.ma/api/watchDetail/{video_id}', video_id)['response']['News']
return {
'id': video_id,
'title': video.get('titre'),
'url': video['url'],
'description': video.get('description'),
'thumbnail': url_or_none(video.get('image')),
}
class DeuxMNewsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?2m\.ma/(?P<lang>\w+)/news/(?P<id>[^/#?]+)'
_TESTS = [{
'url': 'https://2m.ma/fr/news/Kan-Ya-Mkan-d%C3%A9poussi%C3%A8re-l-histoire-du-phare-du-Cap-Beddouza-20221028',
'md5': '43d5e693a53fa0b71e8a5204c7d4542a',
'info_dict': {
'id': '635c5d1233b83834e35b282e',
'ext': 'mp4',
'title': 'Kan Ya Mkan d\u00e9poussi\u00e8re l\u2019histoire du phare du Cap Beddouza',
'description': 'md5:99dcf29b82f1d7f2a4acafed1d487527',
'thumbnail': r're:^https?://2msoread-ww.amagi.tv/mediasfiles/videos/images/.*\.png$'
}
}, {
'url': 'https://2m.ma/fr/news/Interview-Casablanca-hors-des-sentiers-battus-avec-Abderrahim-KASSOU-Replay--20221017',
'md5': '7aca29f02230945ef635eb8290283c0c',
'info_dict': {
'id': '634d9e108b70d40bc51a844b',
'ext': 'mp4',
'title': 'Interview: Casablanca hors des sentiers battus avec Abderrahim KASSOU (Replay) ',
'description': 'md5:3b8e78111de9fcc6ef7f7dd6cff2430c',
'thumbnail': r're:^https?://2msoread-ww.amagi.tv/mediasfiles/videos/images/.*\.png$'
}
}]
def _real_extract(self, url):
article_name, lang = self._match_valid_url(url).group('id', 'lang')
video = self._download_json(
f'https://2m.ma/api/articlesByUrl?lang={lang}&url=/news/{article_name}', article_name)['response']['article'][0]
return {
'id': video['id'],
'title': video.get('title'),
'url': video['image'][0],
'description': video.get('content'),
'thumbnail': url_or_none(video.get('cover')),
}
yt_dlp/extractor/epoch.py
@ -1,4 +1,5 @@
from .common import InfoExtractor
from ..utils import extract_attributes, get_element_html_by_id
class EpochIE(InfoExtractor):
@ -28,13 +29,21 @@ class EpochIE(InfoExtractor):
'title': 'Kash Patel: A 6-Year-Saga of Government Corruption, From Russiagate to Mar-a-Lago',
}
},
{
'url': 'https://www.theepochtimes.com/dick-morris-discusses-his-book-the-return-trumps-big-2024-comeback_4819205.html',
'info_dict': {
'id': '9489f994-2a20-4812-b233-ac0e5c345632',
'ext': 'mp4',
'title': 'Dick Morris Discusses His Book The Return: Trumps Big 2024 Comeback',
}
},
]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
youmaker_video_id = self._search_regex(r'data-trailer="[\w-]+" data-id="([\w-]+)"', webpage, 'url')
youmaker_video_id = extract_attributes(get_element_html_by_id('videobox', webpage))['data-id']
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
f'http://vs1.youmaker.com/assets/{youmaker_video_id}/playlist.m3u8', video_id, 'mp4', m3u8_id='hls')
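The new lookup is sturdier than the old attribute-order regex: get_element_html_by_id returns the element's outer HTML and extract_attributes turns that into a dict, so a missing data-trailer attribute no longer matters. A minimal check of the pattern (the HTML snippet is made up):
from yt_dlp.utils import extract_attributes, get_element_html_by_id
html = '<div id="videobox" data-id="9489f994-2a20-4812-b233-ac0e5c345632"></div>'
# outer HTML of the element with id="videobox" -> attribute dict -> data-id
attrs = extract_attributes(get_element_html_by_id('videobox', html))
assert attrs['data-id'] == '9489f994-2a20-4812-b233-ac0e5c345632'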
yt_dlp/extractor/foxnews.py
@ -75,6 +75,29 @@ def _real_extract(self, url):
return info
class FoxNewsVideoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?foxnews\.com/video/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.foxnews.com/video/6313058664112',
'info_dict': {
'id': '6313058664112',
'ext': 'mp4',
'thumbnail': r're:https://.+/1280x720/match/image\.jpg',
'upload_date': '20220930',
'description': 'New York City, Kids Therapy, Biden',
'duration': 2415,
'title': 'Gutfeld! - Thursday, September 29',
'timestamp': 1664527538,
},
'expected_warnings': ['Ignoring subtitle tracks'],
'params': {'skip_download': 'm3u8'},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(f'https://video.foxnews.com/v/{video_id}', FoxNewsIE, video_id)
class FoxNewsArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:insider\.)?foxnews\.com/(?!v)([^/]+/)+(?P<id>[a-z-]+)'
IE_NAME = 'foxnews:article'
yt_dlp/extractor/listennotes.py
@ -0,0 +1,86 @@
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
extract_attributes,
get_element_by_class,
get_element_html_by_id,
get_element_text_and_html_by_tag,
parse_duration,
strip_or_none,
traverse_obj,
try_call,
)
class ListenNotesIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?listennotes\.com/podcasts/[^/]+/[^/]+-(?P<id>.+)/'
_TESTS = [{
'url': 'https://www.listennotes.com/podcasts/thriving-on-overload/tim-oreilly-on-noticing-KrDgvNb_u1n/',
'md5': '5b91a32f841e5788fb82b72a1a8af7f7',
'info_dict': {
'id': 'KrDgvNb_u1n',
'ext': 'mp3',
'title': 'md5:32236591a921adf17bbdbf0441b6c0e9',
'description': 'md5:c581ed197eeddcee55a67cdb547c8cbd',
'duration': 2148.0,
'channel': 'Thriving on Overload',
'channel_id': 'ed84wITivxF',
'episode_id': 'e1312583fa7b4e24acfbb5131050be00',
'thumbnail': 'https://production.listennotes.com/podcasts/thriving-on-overload-ross-dawson-1wb_KospA3P-ed84wITivxF.300x300.jpg',
'channel_url': 'https://www.listennotes.com/podcasts/thriving-on-overload-ross-dawson-ed84wITivxF/',
'cast': ['Tim OReilly', 'Cookie Monster', 'Lao Tzu', 'Wallace Steven', 'Eric Raymond', 'Christine Peterson', 'John Maynard Keyne', 'Ross Dawson'],
}
}, {
'url': 'https://www.listennotes.com/podcasts/ask-noah-show/episode-177-wireguard-with-lwEA3154JzG/',
'md5': '62fb4ffe7fc525632a1138bf72a5ce53',
'info_dict': {
'id': 'lwEA3154JzG',
'ext': 'mp3',
'title': 'Episode 177: WireGuard with Jason Donenfeld',
'description': 'md5:24744f36456a3e95f83c1193a3458594',
'duration': 3861.0,
'channel': 'Ask Noah Show',
'channel_id': '4DQTzdS5-j7',
'episode_id': '8c8954b95e0b4859ad1eecec8bf6d3a4',
'channel_url': 'https://www.listennotes.com/podcasts/ask-noah-show-noah-j-chelliah-4DQTzdS5-j7/',
'thumbnail': 'https://production.listennotes.com/podcasts/ask-noah-show-noah-j-chelliah-cfbRUw9Gs3F-4DQTzdS5-j7.300x300.jpg',
'cast': ['noah showlink', 'noah show', 'noah dashboard', 'jason donenfeld'],
}
}]
def _clean_description(self, description):
return clean_html(re.sub(r'(</?(div|p)>\s*)+', '<br/><br/>', description or ''))
def _real_extract(self, url):
audio_id = self._match_id(url)
webpage = self._download_webpage(url, audio_id)
data = self._search_json(
r'<script id="original-content"[^>]+\btype="application/json">', webpage, 'content', audio_id)
data.update(extract_attributes(get_element_html_by_id(
r'episode-play-button-toolbar|episode-no-play-button-toolbar', webpage, escape_value=False)))
duration, description = self._search_regex(
r'(?P<duration>[\d:]+)\s*-\s*(?P<description>.+)',
self._html_search_meta(['og:description', 'description', 'twitter:description'], webpage),
'description', fatal=False, group=('duration', 'description')) or (None, None)
return {
'id': audio_id,
'url': data['audio'],
'title': (data.get('data-title')
or try_call(lambda: get_element_text_and_html_by_tag('h1', webpage)[0])
or self._html_search_meta(('og:title', 'title', 'twitter:title'), webpage, 'title')),
'description': (self._clean_description(get_element_by_class('ln-text-p', webpage))
or strip_or_none(description)),
'duration': parse_duration(traverse_obj(data, 'audio_length', 'data-duration') or duration),
'episode_id': traverse_obj(data, 'uuid', 'data-episode-uuid'),
**traverse_obj(data, {
'thumbnail': 'data-image',
'channel': 'data-channel-title',
'cast': ('nlp_entities', ..., 'name'),
'channel_url': 'channel_url',
'channel_id': 'channel_short_uuid',
})
}
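The mapping at the end of the return dict uses traverse_obj with a dict as the path: every value is traversed as its own path, and keys whose lookup yields nothing are dropped. A toy illustration with made-up data:
from yt_dlp.utils import traverse_obj
data = {'data-image': 'thumb.jpg', 'nlp_entities': [{'name': 'Ask Noah Show'}, {'name': 'WireGuard'}]}
assert traverse_obj(data, {
    'thumbnail': 'data-image',
    'cast': ('nlp_entities', ..., 'name'),
}) == {'thumbnail': 'thumb.jpg', 'cast': ['Ask Noah Show', 'WireGuard']}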
yt_dlp/extractor/manyvids.py
@ -1,8 +1,12 @@
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
extract_attributes,
int_or_none,
str_to_int,
url_or_none,
urlencode_postdata,
)
@ -17,17 +21,20 @@ class ManyVidsIE(InfoExtractor):
'id': '133957',
'ext': 'mp4',
'title': 'everthing about me (Preview)',
'uploader': 'ellyxxix',
'view_count': int,
'like_count': int,
},
}, {
# full video
'url': 'https://www.manyvids.com/Video/935718/MY-FACE-REVEAL/',
'md5': 'f3e8f7086409e9b470e2643edb96bdcc',
'md5': 'bb47bab0e0802c2a60c24ef079dfe60f',
'info_dict': {
'id': '935718',
'ext': 'mp4',
'title': 'MY FACE REVEAL',
'description': 'md5:ec5901d41808b3746fed90face161612',
'uploader': 'Sarah Calanthe',
'view_count': int,
'like_count': int,
},
@ -36,17 +43,50 @@ class ManyVidsIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
real_url = 'https://www.manyvids.com/video/%s/gtm.js' % (video_id, )
try:
webpage = self._download_webpage(real_url, video_id)
except Exception:
# probably useless fallback
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'data-(?:video-filepath|meta-video)\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'video URL', group='url')
info = self._search_regex(
r'''(<div\b[^>]*\bid\s*=\s*(['"])pageMetaDetails\2[^>]*>)''',
webpage, 'meta details', default='')
info = extract_attributes(info)
title = self._html_search_regex(
(r'<span[^>]+class=["\']item-title[^>]+>([^<]+)',
r'<h2[^>]+class=["\']h2 m-0["\'][^>]*>([^<]+)'),
webpage, 'title', default=None) or self._html_search_meta(
'twitter:title', webpage, 'title', fatal=True)
player = self._search_regex(
r'''(<div\b[^>]*\bid\s*=\s*(['"])rmpPlayerStream\2[^>]*>)''',
webpage, 'player details', default='')
player = extract_attributes(player)
video_urls_and_ids = (
(info.get('data-meta-video'), 'video'),
(player.get('data-video-transcoded'), 'transcoded'),
(player.get('data-video-filepath'), 'filepath'),
(self._og_search_video_url(webpage, secure=False, default=None), 'og_video'),
)
def txt_or_none(s, default=None):
return (s.strip() or default) if isinstance(s, str) else default
uploader = txt_or_none(info.get('data-meta-author'))
def mung_title(s):
if uploader:
s = re.sub(r'^\s*%s\s+[|-]' % (re.escape(uploader), ), '', s)
return txt_or_none(s)
title = (
mung_title(info.get('data-meta-title'))
or self._html_search_regex(
(r'<span[^>]+class=["\']item-title[^>]+>([^<]+)',
r'<h2[^>]+class=["\']h2 m-0["\'][^>]*>([^<]+)'),
webpage, 'title', default=None)
or self._html_search_meta(
'twitter:title', webpage, 'title', fatal=True))
title = re.sub(r'\s*[|-]\s+ManyVids\s*$', '', title) or title
if any(p in webpage for p in ('preview_videos', '_preview.mp4')):
title += ' (Preview)'
@ -59,7 +99,8 @@ def _real_extract(self, url):
# Sets some cookies
self._download_webpage(
'https://www.manyvids.com/includes/ajax_repository/you_had_me_at_hello.php',
video_id, fatal=False, data=urlencode_postdata({
video_id, note='Setting format cookies', fatal=False,
data=urlencode_postdata({
'mvtoken': mv_token,
'vid': video_id,
}), headers={
@ -67,24 +108,56 @@ def _real_extract(self, url):
'X-Requested-With': 'XMLHttpRequest'
})
if determine_ext(video_url) == 'm3u8':
formats = self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
else:
formats = [{'url': video_url}]
formats = []
for v_url, fmt in video_urls_and_ids:
v_url = url_or_none(v_url)
if not v_url:
continue
if determine_ext(v_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
v_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls'))
else:
formats.append({
'url': v_url,
'format_id': fmt,
})
like_count = int_or_none(self._search_regex(
r'data-likes=["\'](\d+)', webpage, 'like count', default=None))
view_count = str_to_int(self._html_search_regex(
r'(?s)<span[^>]+class="views-wrapper"[^>]*>(.+?)</span', webpage,
'view count', default=None))
self._remove_duplicate_formats(formats)
for f in formats:
if f.get('height') is None:
f['height'] = int_or_none(
self._search_regex(r'_(\d{2,3}[02468])_', f['url'], 'video height', default=None))
if '/preview/' in f['url']:
f['format_id'] = '_'.join(filter(None, (f.get('format_id'), 'preview')))
f['preference'] = -10
if 'transcoded' in f['format_id']:
f['preference'] = f.get('preference', -1) - 1
self._sort_formats(formats)
def get_likes():
likes = self._search_regex(
r'''(<a\b[^>]*\bdata-id\s*=\s*(['"])%s\2[^>]*>)''' % (video_id, ),
webpage, 'likes', default='')
likes = extract_attributes(likes)
return int_or_none(likes.get('data-likes'))
def get_views():
return str_to_int(self._html_search_regex(
r'''(?s)<span\b[^>]*\bclass\s*=["']views-wrapper\b[^>]+>.+?<span\b[^>]+>\s*(\d[\d,.]*)\s*</span>''',
webpage, 'view count', default=None))
return {
'id': video_id,
'title': title,
'view_count': view_count,
'like_count': like_count,
'formats': formats,
'uploader': self._html_search_regex(r'<meta[^>]+name="author"[^>]*>([^<]+)', webpage, 'uploader'),
'description': txt_or_none(info.get('data-meta-description')),
'uploader': txt_or_none(info.get('data-meta-author')),
'thumbnail': (
url_or_none(info.get('data-meta-image'))
or url_or_none(player.get('data-video-screenshot'))),
'view_count': get_views(),
'like_count': get_likes(),
}
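The height fallback added above assumes transcoded ManyVids URLs embed an even vertical resolution between underscores; a quick sanity check of that regex against a made-up URL:
import re
url = 'https://video.example/12345/video_720_preview.mp4'
match = re.search(r'_(\d{2,3}[02468])_', url)
assert match and int(match.group(1)) == 720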
yt_dlp/extractor/motherless.py
@ -69,7 +69,7 @@ class MotherlessIE(InfoExtractor):
'title': 'a/ Hot Teens',
'categories': list,
'upload_date': '20210104',
'uploader_id': 'yonbiw',
'uploader_id': 'anonymous',
'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18,
},
@ -123,11 +123,12 @@ def _real_extract(self, url):
kwargs = {_AGO_UNITS.get(uploaded_ago[-1]): delta}
upload_date = (datetime.datetime.utcnow() - datetime.timedelta(**kwargs)).strftime('%Y%m%d')
comment_count = webpage.count('class="media-comment-contents"')
comment_count = len(re.findall(r'''class\s*=\s*['"]media-comment-contents\b''', webpage))
uploader_id = self._html_search_regex(
(r'"media-meta-member">\s+<a href="/m/([^"]+)"',
r'<span\b[^>]+\bclass="username">([^<]+)</span>'),
(r'''<span\b[^>]+\bclass\s*=\s*["']username\b[^>]*>([^<]+)</span>''',
r'''(?s)['"](?:media-meta-member|thumb-member-username)\b[^>]+>\s*<a\b[^>]+\bhref\s*=\s*['"]/m/([^"']+)'''),
webpage, 'uploader_id', fatal=False)
categories = self._html_search_meta('keywords', webpage, default=None)
if categories:
categories = [cat.strip() for cat in categories.split(',')]
@ -217,23 +218,23 @@ def _real_extract(self, url):
r'<title>([\w\s]+\w)\s+-', webpage, 'title', fatal=False)
description = self._html_search_meta(
'description', webpage, fatal=False)
page_count = self._int(self._search_regex(
r'(\d+)</(?:a|span)><(?:a|span)[^>]+rel="next">',
webpage, 'page_count', default=0), 'page_count')
page_count = str_to_int(self._search_regex(
r'(\d+)\s*</(?:a|span)>\s*<(?:a|span)[^>]+(?:>\s*NEXT|\brel\s*=\s*["\']?next)\b',
webpage, 'page_count', default=0))
if not page_count:
message = self._search_regex(
r'class="error-page"[^>]*>\s*<p[^>]*>\s*(?P<error_msg>[^<]+)(?<=\S)\s*',
r'''class\s*=\s*['"]error-page\b[^>]*>\s*<p[^>]*>\s*(?P<error_msg>[^<]+)(?<=\S)\s*''',
webpage, 'error_msg', default=None) or 'This group has no videos.'
self.report_warning(message, group_id)
page_count = 1
PAGE_SIZE = 80
def _get_page(idx):
if not page_count:
return
webpage = self._download_webpage(
page_url, group_id, query={'page': idx + 1},
note='Downloading page %d/%d' % (idx + 1, page_count)
)
if idx > 0:
webpage = self._download_webpage(
page_url, group_id, query={'page': idx + 1},
note='Downloading page %d/%d' % (idx + 1, page_count)
)
for entry in self._extract_entries(webpage, url):
yield entry
yt_dlp/extractor/neteasemusic.py
@ -1,12 +1,26 @@
import itertools
import json
import re
import time
from base64 import b64encode
from binascii import hexlify
from datetime import datetime
from hashlib import md5
from random import randint
from .common import InfoExtractor
from ..compat import compat_str, compat_urllib_parse_urlencode
from ..utils import float_or_none, sanitized_Request
from ..aes import aes_ecb_encrypt, pkcs7_padding
from ..compat import compat_urllib_parse_urlencode
from ..utils import (
ExtractorError,
bytes_to_intlist,
error_to_compat_str,
float_or_none,
int_or_none,
intlist_to_bytes,
sanitized_Request,
try_get,
)
class NetEaseMusicBaseIE(InfoExtractor):
@ -17,7 +31,7 @@ class NetEaseMusicBaseIE(InfoExtractor):
@classmethod
def _encrypt(cls, dfsid):
salt_bytes = bytearray(cls._NETEASE_SALT.encode('utf-8'))
string_bytes = bytearray(compat_str(dfsid).encode('ascii'))
string_bytes = bytearray(str(dfsid).encode('ascii'))
salt_len = len(salt_bytes)
for i in range(len(string_bytes)):
string_bytes[i] = string_bytes[i] ^ salt_bytes[i % salt_len]
@ -26,32 +40,105 @@ def _encrypt(cls, dfsid):
result = b64encode(m.digest()).decode('ascii')
return result.replace('/', '_').replace('+', '-')
def make_player_api_request_data_and_headers(self, song_id, bitrate):
KEY = b'e82ckenh8dichen8'
URL = '/api/song/enhance/player/url'
now = int(time.time() * 1000)
rand = randint(0, 1000)
cookie = {
'osver': None,
'deviceId': None,
'appver': '8.0.0',
'versioncode': '140',
'mobilename': None,
'buildver': '1623435496',
'resolution': '1920x1080',
'__csrf': '',
'os': 'pc',
'channel': None,
'requestId': '{0}_{1:04}'.format(now, rand),
}
request_text = json.dumps(
{'ids': '[{0}]'.format(song_id), 'br': bitrate, 'header': cookie},
separators=(',', ':'))
message = 'nobody{0}use{1}md5forencrypt'.format(
URL, request_text).encode('latin1')
msg_digest = md5(message).hexdigest()
data = '{0}-36cd479b6b5-{1}-36cd479b6b5-{2}'.format(
URL, request_text, msg_digest)
data = pkcs7_padding(bytes_to_intlist(data))
encrypted = intlist_to_bytes(aes_ecb_encrypt(data, bytes_to_intlist(KEY)))
encrypted_params = hexlify(encrypted).decode('ascii').upper()
cookie = '; '.join(
['{0}={1}'.format(k, v if v is not None else 'undefined')
for [k, v] in cookie.items()])
headers = {
'User-Agent': self.extractor.get_param('http_headers')['User-Agent'],
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': 'https://music.163.com',
'Cookie': cookie,
}
return ('params={0}'.format(encrypted_params), headers)
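The helper above implements NetEase's "eapi" request signing: an md5 digest is computed over the API path and the compact JSON body wrapped in fixed filler strings, the pieces are joined with the constant '-36cd479b6b5-' separator, and the result is AES-ECB-encrypted and upper-case hex-encoded. A condensed standalone sketch of the same scheme, reusing the AES helpers this file already imports:
import json
from binascii import hexlify
from hashlib import md5
from yt_dlp.aes import aes_ecb_encrypt, pkcs7_padding
from yt_dlp.utils import bytes_to_intlist, intlist_to_bytes
def encrypt_eapi(api_path, payload, key=b'e82ckenh8dichen8'):
    # md5 signature over the path and the compact JSON body
    text = json.dumps(payload, separators=(',', ':'))
    digest = md5(f'nobody{api_path}use{text}md5forencrypt'.encode('latin1')).hexdigest()
    # join with the fixed separator, pad to the AES block size, encrypt in ECB mode
    data = f'{api_path}-36cd479b6b5-{text}-36cd479b6b5-{digest}'
    padded = pkcs7_padding(bytes_to_intlist(data.encode('utf-8')))
    encrypted = intlist_to_bytes(aes_ecb_encrypt(padded, bytes_to_intlist(key)))
    return 'params=' + hexlify(encrypted).decode('ascii').upper()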
def _call_player_api(self, song_id, bitrate):
url = 'https://interface3.music.163.com/eapi/song/enhance/player/url'
data, headers = self.make_player_api_request_data_and_headers(song_id, bitrate)
try:
msg = 'empty result'
result = self._download_json(
url, song_id, data=data.encode('ascii'), headers=headers)
if result:
return result
except ExtractorError as e:
if type(e.cause) in (ValueError, TypeError):
# JSON load failure
raise
except Exception as e:
msg = error_to_compat_str(e)
self.report_warning('%s API call (%s) failed: %s' % (
song_id, bitrate, msg))
return {}
def extract_formats(self, info):
err = 0
formats = []
song_id = info['id']
for song_format in self._FORMATS:
details = info.get(song_format)
if not details:
continue
song_file_path = '/%s/%s.%s' % (
self._encrypt(details['dfsId']), details['dfsId'], details['extension'])
# 203.130.59.9, 124.40.233.182, 115.231.74.139, etc is a reverse proxy-like feature
# from NetEase's CDN provider that can be used if m5.music.126.net does not
# work, especially for users outside of Mainland China
# via: https://github.com/JixunMoe/unblock-163/issues/3#issuecomment-163115880
for host in ('http://m5.music.126.net', 'http://115.231.74.139/m1.music.126.net',
'http://124.40.233.182/m1.music.126.net', 'http://203.130.59.9/m1.music.126.net'):
song_url = host + song_file_path
bitrate = int_or_none(details.get('bitrate')) or 999000
data = self._call_player_api(song_id, bitrate)
for song in try_get(data, lambda x: x['data'], list) or []:
song_url = try_get(song, lambda x: x['url'])
if not song_url:
continue
if self._is_valid_url(song_url, info['id'], 'song'):
formats.append({
'url': song_url,
'ext': details.get('extension'),
'abr': float_or_none(details.get('bitrate'), scale=1000),
'abr': float_or_none(song.get('br'), scale=1000),
'format_id': song_format,
'filesize': details.get('size'),
'asr': details.get('sr')
'filesize': int_or_none(song.get('size')),
'asr': int_or_none(details.get('sr')),
})
break
elif err == 0:
err = try_get(song, lambda x: x['code'], int)
if not formats:
msg = 'No media links found'
if err != 0 and (err < 200 or err >= 400):
raise ExtractorError(
'%s (site code %d)' % (msg, err, ), expected=True)
else:
self.raise_geo_restricted(
msg + ': probably this video is not available from your location due to geo restriction.',
countries=['CN'])
return formats
@classmethod
@ -67,33 +154,19 @@ def query_api(self, endpoint, video_id, note):
class NetEaseMusicIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:song'
IE_DESC = '网易云音乐'
_VALID_URL = r'https?://music\.163\.com/(#/)?song\?id=(?P<id>[0-9]+)'
_VALID_URL = r'https?://(y\.)?music\.163\.com/(?:[#m]/)?song\?.*?\bid=(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://music.163.com/#/song?id=32102397',
'md5': 'f2e97280e6345c74ba9d5677dd5dcb45',
'md5': '3e909614ce09b1ccef4a3eb205441190',
'info_dict': {
'id': '32102397',
'ext': 'mp3',
'title': 'Bad Blood (feat. Kendrick Lamar)',
'title': 'Bad Blood',
'creator': 'Taylor Swift / Kendrick Lamar',
'upload_date': '20150517',
'timestamp': 1431878400,
'description': 'md5:a10a54589c2860300d02e1de821eb2ef',
'upload_date': '20150516',
'timestamp': 1431792000,
'description': 'md5:25fc5f27e47aad975aa6d36382c7833c',
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'No lyrics translation.',
'url': 'http://music.163.com/#/song?id=29822014',
'info_dict': {
'id': '29822014',
'ext': 'mp3',
'title': '听见下雨的声音',
'creator': '周杰伦',
'upload_date': '20141225',
'timestamp': 1419523200,
'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c',
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'No lyrics.',
'url': 'http://music.163.com/song?id=17241424',
@ -103,9 +176,9 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'title': 'Opus 28',
'creator': 'Dustin O\'Halloran',
'upload_date': '20080211',
'description': 'md5:f12945b0f6e0365e3b73c5032e1b0ff4',
'timestamp': 1202745600,
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'Has translated name.',
'url': 'http://music.163.com/#/song?id=22735043',
@ -119,7 +192,18 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'timestamp': 1264608000,
'alt_title': '说出愿望吧(Genie)',
},
'skip': 'Blocked outside Mainland China',
}, {
'url': 'https://y.music.163.com/m/song?app_version=8.8.45&id=95670&uct2=sKnvS4+0YStsWkqsPhFijw%3D%3D&dlt=0846',
'md5': '95826c73ea50b1c288b22180ec9e754d',
'info_dict': {
'id': '95670',
'ext': 'mp3',
'title': '国际歌',
'creator': '马备',
'upload_date': '19911130',
'timestamp': 691516800,
'description': 'md5:1ba2f911a2b0aa398479f595224f2141',
},
}]
def _process_lyrics(self, lyrics_info):
yt_dlp/extractor/nrk.py
@ -58,8 +58,7 @@ def _call_api(self, path, video_id, item=None, note=None, fatal=True, query=None
return self._download_json(
urljoin('https://psapi.nrk.no/', path),
video_id, note or 'Downloading %s JSON' % item,
fatal=fatal, query=query,
headers={'Accept-Encoding': 'gzip, deflate, br'})
fatal=fatal, query=query)
class NRKIE(NRKBaseIE):
yt_dlp/extractor/qingting.py
@ -0,0 +1,47 @@
from .common import InfoExtractor
from ..utils import traverse_obj
class QingTingIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|m\.)?(?:qingting\.fm|qtfm\.cn)/v?channels/(?P<channel>\d+)/programs/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.qingting.fm/channels/378005/programs/22257411/',
'md5': '47e6a94f4e621ed832c316fd1888fb3c',
'info_dict': {
'id': '22257411',
'title': '用了十年才修改,谁在乎教科书?',
'channel_id': '378005',
'channel': '睡前消息',
'uploader': '马督工',
'ext': 'm4a',
}
}, {
'url': 'https://m.qtfm.cn/vchannels/378005/programs/23023573/',
'md5': '2703120b6abe63b5fa90b975a58f4c0e',
'info_dict': {
'id': '23023573',
'title': '【睡前消息488】重庆山火之后有图≠真相',
'channel_id': '378005',
'channel': '睡前消息',
'uploader': '马督工',
'ext': 'm4a',
}
}]
def _real_extract(self, url):
channel_id, pid = self._match_valid_url(url).group('channel', 'id')
webpage = self._download_webpage(
f'https://m.qtfm.cn/vchannels/{channel_id}/programs/{pid}/', pid)
info = self._search_json(r'window\.__initStores\s*=', webpage, 'program info', pid)
return {
'id': pid,
'title': traverse_obj(info, ('ProgramStore', 'programInfo', 'title')),
'channel_id': channel_id,
'channel': traverse_obj(info, ('ProgramStore', 'channelInfo', 'title')),
'uploader': traverse_obj(info, ('ProgramStore', 'podcasterInfo', 'podcaster', 'nickname')),
'url': traverse_obj(info, ('ProgramStore', 'programInfo', 'audioUrl')),
'vcodec': 'none',
'acodec': 'm4a',
'ext': 'm4a',
}
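_search_json above pulls the JSON blob assigned to window.__initStores out of the page; roughly the same extraction by hand, on a made-up snippet (note that _search_json balances braces properly, while the lazy regex here only suits this flat demo):
import json
import re
html = '<script>window.__initStores = {"ProgramStore": {"programInfo": {"title": "demo"}}};</script>'
blob = re.search(r'window\.__initStores\s*=\s*({.+?});', html).group(1)
assert json.loads(blob)['ProgramStore']['programInfo']['title'] == 'demo'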
yt_dlp/extractor/redgifs.py
@ -1,4 +1,5 @@
import functools
import urllib
from .common import InfoExtractor
from ..compat import compat_parse_qs
@ -72,14 +73,20 @@ def _fetch_oauth_token(self, video_id):
self._API_HEADERS['authorization'] = f'Bearer {auth["token"]}'
def _call_api(self, ep, video_id, *args, **kwargs):
if 'authorization' not in self._API_HEADERS:
self._fetch_oauth_token(video_id)
assert 'authorization' in self._API_HEADERS
for attempt in range(2):
if 'authorization' not in self._API_HEADERS:
self._fetch_oauth_token(video_id)
try:
headers = dict(self._API_HEADERS)
headers['x-customheader'] = f'https://www.redgifs.com/watch/{video_id}'
data = self._download_json(
f'https://api.redgifs.com/v2/{ep}', video_id, headers=headers, *args, **kwargs)
break
except ExtractorError as e:
if not attempt and isinstance(e.cause, urllib.error.HTTPError) and e.cause.code == 401:
del self._API_HEADERS['authorization'] # refresh the token
raise
headers = dict(self._API_HEADERS)
headers['x-customheader'] = f'https://www.redgifs.com/watch/{video_id}'
data = self._download_json(
f'https://api.redgifs.com/v2/{ep}', video_id, headers=headers, *args, **kwargs)
if 'error' in data:
raise ExtractorError(f'RedGifs said: {data["error"]}', expected=True, video_id=video_id)
return data
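The loop above is a refresh-once pattern: on the first HTTP 401 the cached token is discarded and the request is repeated with a fresh one, while a second failure propagates. A generic sketch of the idea (all names are illustrative; PermissionError stands in for the 401):
def call_with_token_refresh(do_request, fetch_token):
    token = None
    for attempt in range(2):
        token = token or fetch_token()
        try:
            return do_request(token)
        except PermissionError:  # stand-in for an HTTP 401
            if attempt:
                raise  # a fresh token did not help either
            token = None  # drop the cached token and retry once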
yt_dlp/extractor/skyit.py
@ -25,7 +25,6 @@ class SkyItPlayerIE(InfoExtractor):
'salesforce': 'C6D585FD1615272C98DE38235F38BD86',
'sitocommerciale': 'VJwfFuSGnLKnd9Phe9y96WkXgYDCguPMJ2dLhGMb2RE',
'sky': 'F96WlOd8yoFmLQgiqv6fNQRvHZcsWk5jDaYnDvhbiJk',
'skyacademy': 'A6LAn7EkO2Q26FRy0IAMBekX6jzDXYL3',
'skyarte': 'LWk29hfiU39NNdq87ePeRach3nzTSV20o0lTv2001Cd',
'theupfront': 'PRSGmDMsg6QMGc04Obpoy7Vsbn7i2Whp',
}
@ -42,11 +41,7 @@ def _parse_video(self, video, video_id):
if not hls_url and video.get('geoblock' if is_live else 'geob'):
self.raise_geo_restricted(countries=['IT'])
if is_live:
formats = self._extract_m3u8_formats(hls_url, video_id, 'mp4')
else:
formats = self._extract_akamai_formats(
hls_url, video_id, {'http': 'videoplatform.sky.it'})
formats = self._extract_m3u8_formats(hls_url, video_id, 'mp4')
self._sort_formats(formats)
return {
@ -80,14 +75,17 @@ class SkyItVideoIE(SkyItPlayerIE):
_VALID_URL = r'https?://(?:masterchef|video|xfactor)\.sky\.it(?:/[^/]+)*/video/[0-9a-z-]+-(?P<id>\d+)'
_TESTS = [{
'url': 'https://video.sky.it/news/mondo/video/uomo-ucciso-da-uno-squalo-in-australia-631227',
'md5': 'fe5c91e59a84a3437eaa0bca6e134ccd',
'md5': '5b858a62d9ffe2ab77b397553024184a',
'info_dict': {
'id': '631227',
'ext': 'mp4',
'title': 'Uomo ucciso da uno squalo in Australia',
'timestamp': 1606036192,
'upload_date': '20201122',
}
'duration': 26,
'thumbnail': 'https://video.sky.it/captures/thumbs/631227/631227_thumb_880x494.jpg',
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://xfactor.sky.it/video/x-factor-2020-replay-audizioni-1-615820',
'only_matching': True,
@ -110,7 +108,8 @@ class SkyItVideoLiveIE(SkyItPlayerIE):
'id': '1',
'ext': 'mp4',
'title': r're:Diretta TG24 \d{4}-\d{2}-\d{2} \d{2}:\d{2}',
'description': 'Guarda la diretta streaming di SkyTg24, segui con Sky tutti gli appuntamenti e gli speciali di Tg24.',
'description': r're:(?:Clicca play e )?[Gg]uarda la diretta streaming di SkyTg24, segui con Sky tutti gli appuntamenti e gli speciali di Tg24\.',
'live_status': 'is_live',
},
'params': {
# m3u8 download
@ -132,15 +131,17 @@ class SkyItIE(SkyItPlayerIE):
IE_NAME = 'sky.it'
_VALID_URL = r'https?://(?:sport|tg24)\.sky\.it(?:/[^/]+)*/\d{4}/\d{2}/\d{2}/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://sport.sky.it/calcio/serie-a/2020/11/21/juventus-cagliari-risultato-gol',
'url': 'https://sport.sky.it/calcio/serie-a/2022/11/03/brozovic-inter-news',
'info_dict': {
'id': '631201',
'id': '789222',
'ext': 'mp4',
'title': 'Un rosso alla violenza: in campo per i diritti delle donne',
'upload_date': '20201121',
'timestamp': 1605995753,
'title': 'Brozovic con il gruppo: verso convocazione per Juve-Inter',
'upload_date': '20221103',
'timestamp': 1667484130,
'duration': 22,
'thumbnail': 'https://videoplatform.sky.it/still/2022/11/03/1667480526353_brozovic_videostill_1.jpg',
},
'expected_warnings': ['Unable to download f4m manifest'],
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://tg24.sky.it/mondo/2020/11/22/australia-squalo-uccide-uomo',
'md5': 'fe5c91e59a84a3437eaa0bca6e134ccd',
@ -150,7 +151,10 @@ class SkyItIE(SkyItPlayerIE):
'title': 'Uomo ucciso da uno squalo in Australia',
'timestamp': 1606036192,
'upload_date': '20201122',
'duration': 26,
'thumbnail': 'https://video.sky.it/captures/thumbs/631227/631227_thumb_880x494.jpg',
},
'params': {'skip_download': 'm3u8'},
}]
_VIDEO_ID_REGEX = r'data-videoid="(\d+)"'
@ -162,40 +166,25 @@ def _real_extract(self, url):
return self._player_url_result(video_id)
class SkyItAcademyIE(SkyItIE):
IE_NAME = 'skyacademy.it'
_VALID_URL = r'https?://(?:www\.)?skyacademy\.it(?:/[^/]+)*/\d{4}/\d{2}/\d{2}/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.skyacademy.it/eventi-speciali/2019/07/05/a-lezione-di-cinema-con-sky-academy-/',
'md5': 'ced5c26638b7863190cbc44dd6f6ba08',
'info_dict': {
'id': '523458',
'ext': 'mp4',
'title': 'Sky Academy "The Best CineCamp 2019"',
'timestamp': 1562843784,
'upload_date': '20190711',
}
}]
_DOMAIN = 'skyacademy'
_VIDEO_ID_REGEX = r'id="news-videoId_(\d+)"'
class SkyItArteIE(SkyItIE):
IE_NAME = 'arte.sky.it'
_VALID_URL = r'https?://arte\.sky\.it/video/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://arte.sky.it/video/serie-musei-venezia-collezionismo-12-novembre/',
'url': 'https://arte.sky.it/video/oliviero-toscani-torino-galleria-mazzoleni-788962',
'md5': '515aee97b87d7a018b6c80727d3e7e17',
'info_dict': {
'id': '627926',
'id': '788962',
'ext': 'mp4',
'title': "Musei Galleria Franchetti alla Ca' d'Oro Palazzo Grimani",
'upload_date': '20201106',
'timestamp': 1604664493,
}
'title': 'La fotografia di Oliviero Toscani conquista Torino',
'upload_date': '20221102',
'timestamp': 1667399996,
'duration': 12,
'thumbnail': 'https://videoplatform.sky.it/still/2022/11/02/1667396388552_oliviero-toscani-torino-galleria-mazzoleni_videostill_1.jpg',
},
'params': {'skip_download': 'm3u8'},
}]
_DOMAIN = 'skyarte'
_VIDEO_ID_REGEX = r'(?s)<iframe[^>]+src="(?:https:)?//player\.sky\.it/player/external\.html\?[^"]*\bid=(\d+)'
_VIDEO_ID_REGEX = r'"embedUrl"\s*:\s*"(?:https:)?//player\.sky\.it/player/external\.html\?[^"]*\bid=(\d+)'
class CieloTVItIE(SkyItIE):
@ -210,7 +199,10 @@ class CieloTVItIE(SkyItIE):
'title': 'Il lunedì è sempre un dramma',
'upload_date': '20190329',
'timestamp': 1553862178,
}
'duration': 30,
'thumbnail': 'https://videoplatform.sky.it/still/2019/03/29/1553858575610_lunedi_dramma_mant_videostill_1.jpg',
},
'params': {'skip_download': 'm3u8'},
}]
_DOMAIN = 'cielo'
_VIDEO_ID_REGEX = r'videoId\s*=\s*"(\d+)"'
@ -218,9 +210,9 @@ class CieloTVItIE(SkyItIE):
class TV8ItIE(SkyItVideoIE):
IE_NAME = 'tv8.it'
_VALID_URL = r'https?://tv8\.it/showvideo/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?tv8\.it/(?:show)?video/[0-9a-z-]+-(?P<id>\d+)'
_TESTS = [{
'url': 'https://tv8.it/showvideo/630529/ogni-mattina-ucciso-asino-di-andrea-lo-cicero/18-11-2020/',
'url': 'https://www.tv8.it/video/ogni-mattina-ucciso-asino-di-andrea-lo-cicero-630529',
'md5': '9ab906a3f75ea342ed928442f9dabd21',
'info_dict': {
'id': '630529',
@ -228,6 +220,9 @@ class TV8ItIE(SkyItVideoIE):
'title': 'Ogni mattina - Ucciso asino di Andrea Lo Cicero',
'timestamp': 1605721374,
'upload_date': '20201118',
}
'duration': 114,
'thumbnail': 'https://videoplatform.sky.it/still/2020/11/18/1605717753954_ogni-mattina-ucciso-asino-di-andrea-lo-cicero_videostill_1.jpg',
},
'params': {'skip_download': 'm3u8'},
}]
_DOMAIN = 'mtv8'
yt_dlp/extractor/stripchat.py
@ -1,22 +1,15 @@
from .common import InfoExtractor
from ..compat import (
compat_str,
)
from ..utils import (
ExtractorError,
lowercase_escape,
try_get,
)
from ..utils import ExtractorError, lowercase_escape, traverse_obj
class StripchatIE(InfoExtractor):
_VALID_URL = r'https?://stripchat\.com/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://stripchat.com/feel_me',
'url': 'https://stripchat.com/Joselin_Flower',
'info_dict': {
'id': 'feel_me',
'id': 'Joselin_Flower',
'ext': 'mp4',
'title': 're:^feel_me [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'title': 're:^Joselin_Flower [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': str,
'is_live': True,
'age_limit': 18,
@ -39,18 +32,22 @@ def _real_extract(self, url):
if not data:
raise ExtractorError('Unable to find configuration for stream.')
if try_get(data, lambda x: x['viewCam']['show'], dict):
if traverse_obj(data, ('viewCam', 'show'), expected_type=dict):
raise ExtractorError('Model is in private show', expected=True)
elif not try_get(data, lambda x: x['viewCam']['model']['isLive'], bool):
elif not traverse_obj(data, ('viewCam', 'model', 'isLive'), expected_type=bool):
raise ExtractorError('Model is offline', expected=True)
server = try_get(data, lambda x: x['viewCam']['viewServers']['flashphoner-hls'], compat_str)
host = try_get(data, lambda x: x['config']['data']['hlsStreamHost'], compat_str)
model_id = try_get(data, lambda x: x['viewCam']['model']['id'], int)
server = traverse_obj(data, ('viewCam', 'viewServers', 'flashphoner-hls'), expected_type=str)
model_id = traverse_obj(data, ('viewCam', 'model', 'id'), expected_type=int)
for host in traverse_obj(data, (
'config', 'data', (('featuresV2', 'hlsFallback', 'fallbackDomains', ...), 'hlsStreamHost'))):
formats = self._extract_m3u8_formats(
f'https://b-{server}.{host}/hls/{model_id}/{model_id}.m3u8',
video_id, ext='mp4', m3u8_id='hls', fatal=False, live=True)
if formats:
break
formats = self._extract_m3u8_formats(
'https://b-%s.%s/hls/%d/%d.m3u8' % (server, host, model_id, model_id),
video_id, ext='mp4', m3u8_id='hls', fatal=False, live=True)
self._sort_formats(formats)
return {
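The new host loop relies on traverse_obj branching: a nested tuple in the path traverses every alternative and collects all matches, so the HLS fallback domains come out ahead of the primary hlsStreamHost. A toy illustration with made-up data:
from yt_dlp.utils import traverse_obj
data = {'config': {'data': {
    'featuresV2': {'hlsFallback': {'fallbackDomains': ['edge1.example', 'edge2.example']}},
    'hlsStreamHost': 'primary.example',
}}}
hosts = traverse_obj(data, (
    'config', 'data', (('featuresV2', 'hlsFallback', 'fallbackDomains', ...), 'hlsStreamHost')))
assert hosts == ['edge1.example', 'edge2.example', 'primary.example']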
yt_dlp/extractor/swearnet.py
@ -0,0 +1,73 @@
from .common import InfoExtractor
from ..utils import int_or_none, traverse_obj
class SwearnetEpisodeIE(InfoExtractor):
_VALID_URL = r'https?://www\.swearnet\.com/shows/(?P<id>[\w-]+)/seasons/(?P<season_num>\d+)/episodes/(?P<episode_num>\d+)'
_TESTS = [{
'url': 'https://www.swearnet.com/shows/gettin-learnt-with-ricky/seasons/1/episodes/1',
'info_dict': {
'id': '232819',
'ext': 'mp4',
'episode_number': 1,
'episode': 'Episode 1',
'duration': 719,
'description': 'md5:c48ef71440ce466284c07085cd7bd761',
'season': 'Season 1',
'title': 'Episode 1 - Grilled Cheese Sammich',
'season_number': 1,
'thumbnail': 'https://cdn.vidyard.com/thumbnails/232819/_RX04IKIq60a2V6rIRqq_Q_small.jpg',
}
}]
def _get_formats_and_subtitle(self, video_source, video_id):
video_source = video_source or {}
formats, subtitles = [], {}
for key, value in video_source.items():
if key == 'hls':
for video_hls in value:
fmts, subs = self._extract_m3u8_formats_and_subtitles(video_hls.get('url'), video_id)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.extend({
'url': video_mp4.get('url'),
'ext': 'mp4'
} for video_mp4 in value)
return formats, subtitles
def _get_direct_subtitle(self, caption_json):
subs = {}
for caption in caption_json:
subs.setdefault(caption.get('language') or 'und', []).append({
'url': caption.get('vttUrl'),
'name': caption.get('name')
})
return subs
def _real_extract(self, url):
display_id, season_number, episode_number = self._match_valid_url(url).group('id', 'season_num', 'episode_num')
webpage = self._download_webpage(url, display_id)
external_id = self._search_regex(r'externalid\s*=\s*"([^"]+)', webpage, 'externalid')
json_data = self._download_json(
f'https://play.vidyard.com/player/{external_id}.json', display_id)['payload']['chapters'][0]
formats, subtitles = self._get_formats_and_subtitle(json_data['sources'], display_id)
self._merge_subtitles(self._get_direct_subtitle(json_data.get('captions')), target=subtitles)
return {
'id': str(json_data['videoId']),
'title': json_data.get('name') or self._html_search_meta(['og:title', 'twitter:title'], webpage),
'description': (json_data.get('description')
or self._html_search_meta(['og:description', 'twitter:description'])),
'duration': int_or_none(json_data.get('seconds')),
'formats': formats,
'subtitles': subtitles,
'season_number': int_or_none(season_number),
'episode_number': int_or_none(episode_number),
'thumbnails': [{'url': thumbnail_url}
for thumbnail_url in traverse_obj(json_data, ('thumbnailUrls', ...))]
}
yt_dlp/extractor/telegram.py
@ -1,41 +1,137 @@
import re
from .common import InfoExtractor
from ..utils import clean_html, get_element_by_class
from ..utils import (
clean_html,
format_field,
get_element_by_class,
parse_duration,
parse_qs,
traverse_obj,
unified_timestamp,
update_url_query,
url_basename,
)
class TelegramEmbedIE(InfoExtractor):
IE_NAME = 'telegram:embed'
_VALID_URL = r'https?://t\.me/(?P<channel_name>[^/]+)/(?P<id>\d+)'
_VALID_URL = r'https?://t\.me/(?P<channel_id>[^/]+)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://t.me/europa_press/613',
'md5': 'dd707708aea958c11a590e8068825f22',
'info_dict': {
'id': '613',
'ext': 'mp4',
'title': 'Europa Press',
'description': '6ce2d7e8d56eda16d80607b23db7b252',
'thumbnail': r're:^https?:\/\/cdn.*?telesco\.pe\/file\/\w+',
'title': 'md5:6ce2d7e8d56eda16d80607b23db7b252',
'description': 'md5:6ce2d7e8d56eda16d80607b23db7b252',
'channel_id': 'europa_press',
'channel': 'Europa Press ✔',
'thumbnail': r're:^https?://.+',
'timestamp': 1635631203,
'upload_date': '20211030',
'duration': 61,
},
}, {
# 2-video post
'url': 'https://t.me/vorposte/29342',
'info_dict': {
'id': 'vorposte-29342',
'title': 'Форпост 29342',
'description': 'md5:9d92e22169a3e136d5d69df25f82c3dc',
},
'playlist_count': 2,
'params': {
'skip_download': True,
},
}, {
# 2-video post with --no-playlist
'url': 'https://t.me/vorposte/29343',
'md5': '1724e96053c18e788c8464038876e245',
'info_dict': {
'id': '29343',
'ext': 'mp4',
'title': 'md5:9d92e22169a3e136d5d69df25f82c3dc',
'description': 'md5:9d92e22169a3e136d5d69df25f82c3dc',
'channel_id': 'vorposte',
'channel': 'Форпост',
'thumbnail': r're:^https?://.+',
'timestamp': 1666384480,
'upload_date': '20221021',
'duration': 35,
},
'params': {
'noplaylist': True,
}
}, {
# 2-video post with 'single' query param
'url': 'https://t.me/vorposte/29342?single',
'md5': 'd20b202f1e41400a9f43201428add18f',
'info_dict': {
'id': '29342',
'ext': 'mp4',
'title': 'md5:9d92e22169a3e136d5d69df25f82c3dc',
'description': 'md5:9d92e22169a3e136d5d69df25f82c3dc',
'channel_id': 'vorposte',
'channel': 'Форпост',
'thumbnail': r're:^https?://.+',
'timestamp': 1666384480,
'upload_date': '20221021',
'duration': 33,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, query={'embed': 0})
webpage_embed = self._download_webpage(url, video_id, query={'embed': 1}, note='Downloading embed page')
channel_id, msg_id = self._match_valid_url(url).group('channel_id', 'id')
embed = self._download_webpage(
url, msg_id, query={'embed': '1', 'single': []}, note='Downloading embed frame')
formats = [{
'url': self._proto_relative_url(self._search_regex(
'<video[^>]+src="([^"]+)"', webpage_embed, 'source')),
'ext': 'mp4',
}]
self._sort_formats(formats)
def clean_text(html_class, html):
text = clean_html(get_element_by_class(html_class, html))
return text.replace('\n', ' ') if text else None
return {
'id': video_id,
'title': self._html_search_meta(['og:title', 'twitter:title'], webpage, default=None),
'description': self._html_search_meta(
['og:description', 'twitter:description'], webpage,
default=clean_html(get_element_by_class('tgme_widget_message_text', webpage_embed))),
'thumbnail': self._search_regex(
r'tgme_widget_message_video_thumb"[^>]+background-image:url\(\'([^\']+)\'\)',
webpage_embed, 'thumbnail'),
'formats': formats,
description = clean_text('tgme_widget_message_text', embed)
message = {
'title': description or '',
'description': description,
'channel': clean_text('tgme_widget_message_author', embed),
'channel_id': channel_id,
'timestamp': unified_timestamp(self._search_regex(
r'<time[^>]*datetime="([^"]*)"', embed, 'timestamp', fatal=False)),
}
videos = []
for video in re.findall(r'<a class="tgme_widget_message_video_player(?s:.+?)</time>', embed):
video_url = self._search_regex(
r'<video[^>]+src="([^"]+)"', video, 'video URL', fatal=False)
webpage_url = self._search_regex(
r'<a class="tgme_widget_message_video_player[^>]+href="([^"]+)"',
video, 'webpage URL', fatal=False)
if not video_url or not webpage_url:
continue
formats = [{
'url': video_url,
'ext': 'mp4',
}]
self._sort_formats(formats)
videos.append({
'id': url_basename(webpage_url),
'webpage_url': update_url_query(webpage_url, {'single': True}),
'duration': parse_duration(self._search_regex(
r'<time[^>]+duration[^>]*>([\d:]+)</time>', video, 'duration', fatal=False)),
'thumbnail': self._search_regex(
r'tgme_widget_message_video_thumb"[^>]+background-image:url\(\'([^\']+)\'\)',
video, 'thumbnail', fatal=False),
'formats': formats,
**message,
})
playlist_id = None
if len(videos) > 1 and 'single' not in parse_qs(url, keep_blank_values=True):
playlist_id = f'{channel_id}-{msg_id}'
if self._yes_playlist(playlist_id, msg_id):
return self.playlist_result(
videos, playlist_id, format_field(message, 'channel', f'%s {msg_id}'), description)
else:
return traverse_obj(videos, lambda _, x: x['id'] == msg_id, get_all=False)
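The 'single' check above passes keep_blank_values=True because the parameter has no value in URLs like ...?single; without the flag the parser drops it entirely:
from yt_dlp.utils import parse_qs
url = 'https://t.me/vorposte/29342?single'
assert 'single' in parse_qs(url, keep_blank_values=True)
assert 'single' not in parse_qs(url)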
yt_dlp/extractor/tvp.py
@ -4,40 +4,51 @@
from .common import InfoExtractor
from ..utils import (
clean_html,
determine_ext,
dict_get,
ExtractorError,
int_or_none,
js_to_json,
orderedSet,
str_or_none,
strip_or_none,
traverse_obj,
try_get,
url_or_none,
)
class TVPIE(InfoExtractor):
IE_NAME = 'tvp'
IE_DESC = 'Telewizja Polska'
_VALID_URL = r'https?://(?:[^/]+\.)?(?:tvp(?:parlament)?\.(?:pl|info)|polandin\.com)/(?:video/(?:[^,\s]*,)*|(?:(?!\d+/)[^/]+/)*)(?P<id>\d+)'
_VALID_URL = r'https?://(?:[^/]+\.)?(?:tvp(?:parlament)?\.(?:pl|info)|tvpworld\.com|swipeto\.pl)/(?:(?!\d+/)[^/]+/)*(?P<id>\d+)'
_TESTS = [{
# TVPlayer 2 in js wrapper
'url': 'https://vod.tvp.pl/video/czas-honoru,i-seria-odc-13,194536',
'url': 'https://swipeto.pl/64095316/uliczny-foxtrot-wypozyczalnia-kaset-kto-pamieta-dvdvideo',
'info_dict': {
'id': '194536',
'id': '64095316',
'ext': 'mp4',
'title': 'Czas honoru, odc. 13 Władek',
'description': 'md5:437f48b93558370b031740546b696e24',
'age_limit': 12,
'title': 'Uliczny Foxtrot — Wypożyczalnia kaset. Kto pamięta DVD-Video?',
'age_limit': 0,
'duration': 374,
'thumbnail': r're:https://.+',
},
'expected_warnings': [
'Failed to download ISM manifest: HTTP Error 404: Not Found',
'Failed to download m3u8 information: HTTP Error 404: Not Found',
],
}, {
# TVPlayer legacy
'url': 'http://www.tvp.pl/there-can-be-anything-so-i-shortened-it/17916176',
'url': 'https://www.tvp.pl/polska-press-video-uploader/wideo/62042351',
'info_dict': {
'id': '17916176',
'id': '62042351',
'ext': 'mp4',
'title': 'TVP Gorzów pokaże filmy studentów z podroży dookoła świata',
'description': 'TVP Gorzów pokaże filmy studentów z podroży dookoła świata',
'title': 'Wideo',
'description': 'Wideo Kamera',
'duration': 24,
'age_limit': 0,
'thumbnail': r're:https://.+',
},
}, {
# TVPlayer 2 in iframe
@ -48,6 +59,8 @@ class TVPIE(InfoExtractor):
'title': 'Dzieci na sprzedaż dla homoseksualistów',
'description': 'md5:7d318eef04e55ddd9f87a8488ac7d590',
'age_limit': 12,
'duration': 259,
'thumbnail': r're:https://.+',
},
}, {
# TVPlayer 2 in client-side rendered website (regional; window.__newsData)
@ -58,7 +71,11 @@ class TVPIE(InfoExtractor):
'title': 'Studio Yayo',
'upload_date': '20160616',
'timestamp': 1466075700,
}
'age_limit': 0,
'duration': 20,
'thumbnail': r're:https://.+',
},
'skip': 'Geo-blocked outside PL',
}, {
# TVPlayer 2 in client-side rendered website (tvp.info; window.__videoData)
'url': 'https://www.tvp.info/52880236/09042021-0800',
@ -66,7 +83,10 @@ class TVPIE(InfoExtractor):
'id': '52880236',
'ext': 'mp4',
'title': '09.04.2021, 08:00',
'age_limit': 0,
'thumbnail': r're:https://.+',
},
'skip': 'Geo-blocked outside PL',
}, {
# client-side rendered (regional) program (playlist) page
'url': 'https://opole.tvp.pl/9660819/rozmowa-dnia',
@ -122,7 +142,7 @@ class TVPIE(InfoExtractor):
'url': 'https://www.tvpparlament.pl/retransmisje-vod/inne/wizyta-premiera-mateusza-morawieckiego-w-firmie-berotu-sp-z-oo/48857277',
'only_matching': True,
}, {
'url': 'https://polandin.com/47942651/pln-10-billion-in-subsidies-transferred-to-companies-pm',
'url': 'https://tvpworld.com/48583640/tescos-polish-business-bought-by-danish-chain-netto',
'only_matching': True,
}]
@ -151,16 +171,13 @@ def _extract_vue_video(self, video_data, page_id=None):
is_website = video_data.get('type') == 'website'
if is_website:
url = video_data['url']
fucked_up_url_parts = re.match(r'https?://vod\.tvp\.pl/(\d+)/([^/?#]+)', url)
if fucked_up_url_parts:
url = f'https://vod.tvp.pl/website/{fucked_up_url_parts.group(2)},{fucked_up_url_parts.group(1)}'
else:
url = 'tvp:' + str_or_none(video_data.get('_id') or page_id)
return {
'_type': 'url_transparent',
'id': str_or_none(video_data.get('_id') or page_id),
'url': url,
'ie_key': 'TVPEmbed' if not is_website else 'TVPWebsite',
'ie_key': (TVPIE if is_website else TVPEmbedIE).ie_key(),
'title': str_or_none(video_data.get('title')),
'description': str_or_none(video_data.get('lead')),
'timestamp': int_or_none(video_data.get('release_date_long')),
@ -217,8 +234,9 @@ def _real_extract(self, url):
# The URL may redirect to a VOD
# example: https://vod.tvp.pl/48463890/wadowickie-spotkania-z-janem-pawlem-ii
if TVPWebsiteIE.suitable(urlh.url):
return self.url_result(urlh.url, ie=TVPWebsiteIE.ie_key(), video_id=page_id)
for ie_cls in (TVPVODSeriesIE, TVPVODVideoIE):
if ie_cls.suitable(urlh.url):
return self.url_result(urlh.url, ie=ie_cls.ie_key(), video_id=page_id)
if re.search(
r'window\.__(?:video|news|website|directory)Data\s*=',
@ -297,12 +315,13 @@ def _real_extract(self, url):
class TVPEmbedIE(InfoExtractor):
IE_NAME = 'tvp:embed'
IE_DESC = 'Telewizja Polska'
_GEO_BYPASS = False
_VALID_URL = r'''(?x)
(?:
tvp:
|https?://
(?:[^/]+\.)?
(?:tvp(?:parlament)?\.pl|tvp\.info|polandin\.com)/
(?:tvp(?:parlament)?\.pl|tvp\.info|tvpworld\.com|swipeto\.pl)/
(?:sess/
(?:tvplayer\.php\?.*?object_id
|TVPlayer2/(?:embed|api)\.php\?.*[Ii][Dd])
@ -320,6 +339,12 @@ class TVPEmbedIE(InfoExtractor):
'title': 'Czas honoru, odc. 13 Władek',
'description': 'md5:76649d2014f65c99477be17f23a4dead',
'age_limit': 12,
'duration': 2652,
'series': 'Czas honoru',
'episode': 'Episode 13',
'episode_number': 13,
'season': 'sezon 1',
'thumbnail': r're:https://.+',
},
}, {
'url': 'https://www.tvp.pl/sess/tvplayer.php?object_id=51247504&amp;autoplay=false',
@ -327,6 +352,9 @@ class TVPEmbedIE(InfoExtractor):
'id': '51247504',
'ext': 'mp4',
'title': 'Razmova 091220',
'duration': 876,
'age_limit': 0,
'thumbnail': r're:https://.+',
},
}, {
# TVPlayer2 embed URL
@ -361,40 +389,48 @@ def _real_extract(self, url):
# stripping JSONP padding
datastr = webpage[15 + len(callback):-3]
if datastr.startswith('null,'):
error = self._parse_json(datastr[5:], video_id)
raise ExtractorError(error[0]['desc'])
error = self._parse_json(datastr[5:], video_id, fatal=False)
error_desc = traverse_obj(error, (0, 'desc'))
if error_desc == 'Obiekt wymaga płatności':
raise ExtractorError('Video requires payment and log-in, but log-in is not implemented')
raise ExtractorError(error_desc or 'unexpected JSON error')
content = self._parse_json(datastr, video_id)['content']
info = content['info']
is_live = try_get(info, lambda x: x['isLive'], bool)
if info.get('isGeoBlocked'):
# actual country list is not provided, we just assume it's always available in PL
self.raise_geo_restricted(countries=['PL'])
formats = []
for file in content['files']:
video_url = file.get('url')
video_url = url_or_none(file.get('url'))
if not video_url:
continue
if video_url.endswith('.m3u8'):
ext = determine_ext(video_url, None)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(video_url, video_id, m3u8_id='hls', fatal=False, live=is_live))
elif video_url.endswith('.mpd'):
elif ext == 'mpd':
if is_live:
# doesn't work with either ffmpeg or native downloader
continue
formats.extend(self._extract_mpd_formats(video_url, video_id, mpd_id='dash', fatal=False))
elif video_url.endswith('.f4m'):
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(video_url, video_id, f4m_id='hds', fatal=False))
elif video_url.endswith('.ism/manifest'):
formats.extend(self._extract_ism_formats(video_url, video_id, ism_id='mss', fatal=False))
else:
# mp4, wmv or something
quality = file.get('quality', {})
formats.append({
'format_id': 'direct',
'url': video_url,
'ext': determine_ext(video_url, file['type']),
'fps': int_or_none(quality.get('fps')),
'tbr': int_or_none(quality.get('bitrate')),
'width': int_or_none(quality.get('width')),
'height': int_or_none(quality.get('height')),
'ext': ext or file.get('type'),
'fps': int_or_none(traverse_obj(file, ('quality', 'fps'))),
'tbr': int_or_none(traverse_obj(file, ('quality', 'bitrate')), scale=1000),
'width': int_or_none(traverse_obj(file, ('quality', 'width'))),
'height': int_or_none(traverse_obj(file, ('quality', 'height'))),
})
self._sort_formats(formats)
@ -449,57 +485,105 @@ def _real_extract(self, url):
return info_dict
class TVPWebsiteIE(InfoExtractor):
IE_NAME = 'tvp:series'
_VALID_URL = r'https?://vod\.tvp\.pl/website/(?P<display_id>[^,]+),(?P<id>\d+)'
class TVPVODBaseIE(InfoExtractor):
_API_BASE_URL = 'https://vod.tvp.pl/api/products'
def _call_api(self, resource, video_id, **kwargs):
return self._download_json(
f'{self._API_BASE_URL}/{resource}', video_id,
query={'lang': 'pl', 'platform': 'BROWSER'}, **kwargs)
def _parse_video(self, video):
return {
'_type': 'url',
'url': 'tvp:' + video['externalUid'],
'ie_key': TVPEmbedIE.ie_key(),
'title': video.get('title'),
'description': traverse_obj(video, ('lead', 'description')),
'age_limit': int_or_none(video.get('rating')),
'duration': int_or_none(video.get('duration')),
}
class TVPVODVideoIE(TVPVODBaseIE):
IE_NAME = 'tvp:vod'
_VALID_URL = r'https?://vod\.tvp\.pl/[a-z\d-]+,\d+/[a-z\d-]+(?<!-odcinki)(?:-odcinki,\d+/odcinek-\d+,S\d+E\d+)?,(?P<id>\d+)(?:\?[^#]+)?(?:#.+)?$'
_TESTS = [{
# series
'url': 'https://vod.tvp.pl/website/wspaniale-stulecie,17069012/video',
'url': 'https://vod.tvp.pl/dla-dzieci,24/laboratorium-alchemika-odcinki,309338/odcinek-24,S01E24,311357',
'info_dict': {
'id': '17069012',
},
'playlist_count': 312,
}, {
# film
'url': 'https://vod.tvp.pl/website/krzysztof-krawczyk-cale-moje-zycie,51374466',
'info_dict': {
'id': '51374509',
'id': '60468609',
'ext': 'mp4',
'title': 'Krzysztof Krawczyk całe moje życie, Krzysztof Krawczyk całe moje życie',
'description': 'md5:2e80823f00f5fc263555482f76f8fa42',
'age_limit': 12,
'title': 'Laboratorium alchemika, Tusze termiczne. Jak zobaczyć niewidoczne. Odcinek 24',
'description': 'md5:1d4098d3e537092ccbac1abf49b7cd4c',
'duration': 300,
'episode_number': 24,
'episode': 'Episode 24',
'age_limit': 0,
'series': 'Laboratorium alchemika',
'thumbnail': 're:https://.+',
},
'params': {
'skip_download': True,
},
'add_ie': ['TVPEmbed'],
}, {
'url': 'https://vod.tvp.pl/website/lzy-cennet,38678312',
'url': 'https://vod.tvp.pl/filmy-dokumentalne,163/ukrainski-sluga-narodu,339667',
'info_dict': {
'id': '51640077',
'ext': 'mp4',
'title': 'Ukraiński sługa narodu, Ukraiński sługa narodu',
'series': 'Ukraiński sługa narodu',
'description': 'md5:b7940c0a8e439b0c81653a986f544ef3',
'age_limit': 12,
'episode': 'Episode 0',
'episode_number': 0,
'duration': 3051,
'thumbnail': 're:https://.+',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self._parse_video(self._call_api(f'vods/{video_id}', video_id))
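The (?<!-odcinki) lookbehind in TVPVODVideoIE._VALID_URL keeps series listing URLs out of the video extractor so they fall through to TVPVODSeriesIE below; a quick check of the core pattern against the test URLs:
import re
video_re = (r'https?://vod\.tvp\.pl/[a-z\d-]+,\d+/[a-z\d-]+(?<!-odcinki)'
            r'(?:-odcinki,\d+/odcinek-\d+,S\d+E\d+)?,(?P<id>\d+)')
# a film URL and an episode URL both match...
assert re.match(video_re, 'https://vod.tvp.pl/filmy-dokumentalne,163/ukrainski-sluga-narodu,339667')
assert re.match(video_re, 'https://vod.tvp.pl/dla-dzieci,24/laboratorium-alchemika-odcinki,309338/odcinek-24,S01E24,311357')
# ...but a series listing URL does not
assert not re.match(video_re, 'https://vod.tvp.pl/seriale,18/ranczo-odcinki,316445')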
class TVPVODSeriesIE(TVPVODBaseIE):
IE_NAME = 'tvp:vod:series'
_VALID_URL = r'https?://vod\.tvp\.pl/[a-z\d-]+,\d+/[a-z\d-]+-odcinki,(?P<id>\d+)(?:\?[^#]+)?(?:#.+)?$'
_TESTS = [{
'url': 'https://vod.tvp.pl/seriale,18/ranczo-odcinki,316445',
'info_dict': {
'id': '316445',
'title': 'Ranczo',
'age_limit': 12,
'categories': ['seriale'],
},
'playlist_count': 129,
}, {
'url': 'https://vod.tvp.pl/programy,88/rolnik-szuka-zony-odcinki,284514',
'only_matching': True,
}, {
'url': 'https://vod.tvp.pl/dla-dzieci,24/laboratorium-alchemika-odcinki,309338',
'only_matching': True,
}]
def _entries(self, display_id, playlist_id):
url = 'https://vod.tvp.pl/website/%s,%s/video' % (display_id, playlist_id)
for page_num in itertools.count(1):
page = self._download_webpage(
url, display_id, 'Downloading page %d' % page_num,
query={'page': page_num})
video_ids = orderedSet(re.findall(
r'<a[^>]+\bhref=["\']/video/%s,[^,]+,(\d+)' % display_id,
page))
if not video_ids:
break
for video_id in video_ids:
yield self.url_result(
'tvp:%s' % video_id, ie=TVPEmbedIE.ie_key(),
video_id=video_id)
def _entries(self, seasons, playlist_id):
for season in seasons:
episodes = self._call_api(
f'vods/serials/{playlist_id}/seasons/{season["id"]}/episodes', playlist_id,
note=f'Downloading episode list for {season["title"]}')
yield from map(self._parse_video, episodes)
def _real_extract(self, url):
mobj = self._match_valid_url(url)
display_id, playlist_id = mobj.group('display_id', 'id')
playlist_id = self._match_id(url)
metadata = self._call_api(
f'vods/serials/{playlist_id}', playlist_id,
note='Downloading serial metadata')
seasons = self._call_api(
f'vods/serials/{playlist_id}/seasons', playlist_id,
note='Downloading season list')
return self.playlist_result(
self._entries(display_id, playlist_id), playlist_id)
self._entries(seasons, playlist_id), playlist_id, strip_or_none(metadata.get('title')),
clean_html(traverse_obj(metadata, ('description', 'lead'), expected_type=strip_or_none)),
categories=[traverse_obj(metadata, ('mainCategory', 'name'))],
age_limit=int_or_none(metadata.get('rating')),
)
yt_dlp/extractor/vimeo.py
@ -870,7 +870,7 @@ def _real_extract(self, url):
if '://player.vimeo.com/video/' in url:
config = self._parse_json(self._search_regex(
r'\bconfig\s*=\s*({.+?})\s*;', webpage, 'info section'), video_id)
r'\b(?:playerC|c)onfig\s*=\s*({.+?})\s*;', webpage, 'info section'), video_id)
if config.get('view') == 4:
config = self._verify_player_video_password(
redirect_url, video_id, headers)
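The widened regex accepts both the old config = {...} and the newer playerConfig = {...} assignment in player pages; a quick check:
import re
pattern = r'\b(?:playerC|c)onfig\s*=\s*({.+?})\s*;'
assert re.search(pattern, 'var config = {"view": 1};')
assert re.search(pattern, 'window.playerConfig = {"view": 4};')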
yt_dlp/extractor/vlive.py
@ -13,6 +13,7 @@
merge_dicts,
str_or_none,
strip_or_none,
traverse_obj,
try_get,
urlencode_postdata,
url_or_none,
@ -81,6 +82,13 @@ class VLiveIE(VLiveBaseIE):
'upload_date': '20150817',
'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
'timestamp': 1439816449,
'like_count': int,
'channel': 'Girl\'s Day',
'channel_id': 'FDF27',
'comment_count': int,
'release_timestamp': 1439818140,
'release_date': '20150817',
'duration': 1014,
},
'params': {
'skip_download': True,
@ -98,6 +106,13 @@ class VLiveIE(VLiveBaseIE):
'upload_date': '20161112',
'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
'timestamp': 1478923074,
'like_count': int,
'channel': 'EXO',
'channel_id': 'F94BD',
'comment_count': int,
'release_timestamp': 1478924280,
'release_date': '20161112',
'duration': 906,
},
'params': {
'skip_download': True,
@ -169,6 +184,7 @@ def get_common_fields():
'like_count': int_or_none(video.get('likeCount')),
'comment_count': int_or_none(video.get('commentCount')),
'timestamp': int_or_none(video.get('createdAt'), scale=1000),
'release_timestamp': int_or_none(traverse_obj(video, 'onAirStartAt', 'willStartAt'), scale=1000),
'thumbnail': video.get('thumb'),
}
yt_dlp/extractor/yandexvideo.py
@ -255,7 +255,7 @@ class ZenYandexIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
redirect = self._search_json(r'var it\s*=\s*', webpage, 'redirect', id, default={}).get('retpath')
redirect = self._search_json(r'var it\s*=', webpage, 'redirect', id, default={}).get('retpath')
if redirect:
video_id = self._match_id(redirect)
webpage = self._download_webpage(redirect, video_id, note='Redirecting')
@ -373,7 +373,7 @@ def _real_extract(self, url):
item_id = self._match_id(url)
webpage = self._download_webpage(url, item_id)
redirect = self._search_json(
r'var it\s*=\s*', webpage, 'redirect', item_id, default={}).get('retpath')
r'var it\s*=', webpage, 'redirect', item_id, default={}).get('retpath')
if redirect:
item_id = self._match_id(redirect)
webpage = self._download_webpage(redirect, item_id, note='Redirecting')
yt_dlp/extractor/youtube.py
@ -369,14 +369,24 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
r'(?:www\.)?hpniueoejy4opn7bc4ftgazyqjoeqwlvh2uiku2xqku6zpoa4bf5ruid\.onion',
# piped instances from https://github.com/TeamPiped/Piped/wiki/Instances
r'(?:www\.)?piped\.kavin\.rocks',
r'(?:www\.)?piped\.silkky\.cloud',
r'(?:www\.)?piped\.tokhmi\.xyz',
r'(?:www\.)?piped\.moomoo\.me',
r'(?:www\.)?il\.ax',
r'(?:www\.)?piped\.syncpundit\.com',
r'(?:www\.)?piped\.syncpundit\.io',
r'(?:www\.)?piped\.mha\.fi',
r'(?:www\.)?watch\.whatever\.social',
r'(?:www\.)?piped\.garudalinux\.org',
r'(?:www\.)?piped\.rivo\.lol',
r'(?:www\.)?piped-libre\.kavin\.rocks',
r'(?:www\.)?yt\.jae\.fi',
r'(?:www\.)?piped\.mint\.lgbt',
r'(?:www\.)?piped\.privacy\.com\.de',
r'(?:www\.)?il\.ax',
r'(?:www\.)?piped\.esmailelbob\.xyz',
r'(?:www\.)?piped\.projectsegfau\.lt',
r'(?:www\.)?piped\.privacydev\.net',
r'(?:www\.)?piped\.palveluntarjoaja\.eu',
r'(?:www\.)?piped\.smnz\.de',
r'(?:www\.)?piped\.adminforge\.de',
r'(?:www\.)?watch\.whatevertinfoil\.de',
r'(?:www\.)?piped\.qdi\.fi',
)
# extracted from account/account_menu ep

View File

@@ -3,13 +3,14 @@
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
NO_DEFAULT,
ExtractorError,
determine_ext,
extract_attributes,
float_or_none,
int_or_none,
join_nonempty,
merge_dicts,
NO_DEFAULT,
orderedSet,
parse_codecs,
qualities,
traverse_obj,
@@ -188,7 +189,7 @@ class ZDFIE(ZDFBaseIE):
},
}, {
'url': 'https://www.zdf.de/funk/druck-11790/funk-alles-ist-verzaubert-102.html',
'md5': '57af4423db0455a3975d2dc4578536bc',
'md5': '1b93bdec7d02fc0b703c5e7687461628',
'info_dict': {
'ext': 'mp4',
'id': 'video_funk_1770473',
@@ -250,17 +251,15 @@ def _extract_entry(self, url, player, content, video_id):
title = content.get('title') or content['teaserHeadline']
t = content['mainVideoContent']['http://zdf.de/rels/target']
ptmd_path = t.get('http://zdf.de/rels/streams/ptmd')
ptmd_path = traverse_obj(t, (
(('streams', 'default'), None),
('http://zdf.de/rels/streams/ptmd', 'http://zdf.de/rels/streams/ptmd-template')
), get_all=False)
if not ptmd_path:
ptmd_path = traverse_obj(
t, ('streams', 'default', 'http://zdf.de/rels/streams/ptmd-template'),
'http://zdf.de/rels/streams/ptmd-template').replace(
'{playerId}', 'ngplayer_2_4')
raise ExtractorError('Could not extract ptmd_path')
info = self._extract_ptmd(
urljoin(url, ptmd_path), video_id, player['apiToken'], url)
urljoin(url, ptmd_path.replace('{playerId}', 'ngplayer_2_4')), video_id, player['apiToken'], url)
thumbnails = []
layouts = try_get(
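The branching `traverse_obj` path checks `streams.default` and then the root object for either ptmd link, taking the first hit (`get_all=False`). A rough stand-in with plain loops — the payload is invented, the keys mirror the ZDF API:

```python
t = {'streams': {'default': {
    'http://zdf.de/rels/streams/ptmd-template': '/ptmd/{playerId}/x'}}}

def first_match(obj, bases, keys):
    for base in bases:                 # (('streams', 'default'), None) branch
        node = obj
        for part in (base or ()):      # None keeps us at the root object
            node = node.get(part) or {}
        for key in keys:               # ptmd first, then the template variant
            if node.get(key):
                return node[key]

ptmd_path = first_match(
    t, [('streams', 'default'), None],
    ['http://zdf.de/rels/streams/ptmd', 'http://zdf.de/rels/streams/ptmd-template'])
print(ptmd_path.replace('{playerId}', 'ngplayer_2_4'))  # /ptmd/ngplayer_2_4/x
```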
@@ -309,15 +308,16 @@ def _extract_mobile(self, video_id):
'https://zdf-cdn.live.cellular.de/mediathekV2/document/%s' % video_id,
video_id)
document = video['document']
title = document['titel']
content_id = document['basename']
formats = []
format_urls = set()
for f in document['formitaeten']:
self._extract_format(content_id, formats, format_urls, f)
formitaeten = try_get(video, lambda x: x['document']['formitaeten'], list)
document = formitaeten and video['document']
if formitaeten:
title = document['titel']
content_id = document['basename']
format_urls = set()
for f in formitaeten or []:
self._extract_format(content_id, formats, format_urls, f)
self._sort_formats(formats)
thumbnails = []
@@ -364,9 +364,9 @@ class ZDFChannelIE(ZDFBaseIE):
'url': 'https://www.zdf.de/sport/das-aktuelle-sportstudio',
'info_dict': {
'id': 'das-aktuelle-sportstudio',
'title': 'das aktuelle sportstudio | ZDF',
'title': 'das aktuelle sportstudio',
},
'playlist_mincount': 23,
'playlist_mincount': 18,
}, {
'url': 'https://www.zdf.de/dokumentation/planet-e',
'info_dict': {
@@ -374,6 +374,14 @@ class ZDFChannelIE(ZDFBaseIE):
'title': 'planet e.',
},
'playlist_mincount': 50,
}, {
'url': 'https://www.zdf.de/gesellschaft/aktenzeichen-xy-ungeloest',
'info_dict': {
'id': 'aktenzeichen-xy-ungeloest',
'title': 'Aktenzeichen XY... ungelöst',
'entries': "lambda x: not any('xy580-fall1-kindermoerder-gesucht-100' in e['url'] for e in x)",
},
'playlist_mincount': 2,
}, {
'url': 'https://www.zdf.de/filme/taunuskrimi/',
'only_matching': True,
@@ -383,60 +391,36 @@ class ZDFChannelIE(ZDFBaseIE):
def suitable(cls, url):
return False if ZDFIE.suitable(url) else super(ZDFChannelIE, cls).suitable(url)
def _og_search_title(self, webpage, fatal=False):
title = super(ZDFChannelIE, self)._og_search_title(webpage, fatal=fatal)
return re.split(r'\s+[-|]\s+ZDF(?:mediathek)?$', title or '')[0] or None
def _real_extract(self, url):
channel_id = self._match_id(url)
webpage = self._download_webpage(url, channel_id)
entries = [
self.url_result(item_url, ie=ZDFIE.ie_key())
for item_url in orderedSet(re.findall(
r'data-plusbar-url=["\'](http.+?\.html)', webpage))]
matches = re.finditer(
r'''<div\b[^>]*?\sdata-plusbar-id\s*=\s*(["'])(?P<p_id>[\w-]+)\1[^>]*?\sdata-plusbar-url=\1(?P<url>%s)\1''' % ZDFIE._VALID_URL,
webpage)
return self.playlist_result(
entries, channel_id, self._og_search_title(webpage, fatal=False))
if self._downloader.params.get('noplaylist', False):
entry = next(
(self.url_result(m.group('url'), ie=ZDFIE.ie_key()) for m in matches),
None)
self.to_screen('Downloading just the main video because of --no-playlist')
if entry:
return entry
else:
self.to_screen('Downloading playlist %s - add --no-playlist to download just the main video' % (channel_id, ))
r"""
player = self._extract_player(webpage, channel_id)
def check_video(m):
v_ref = self._search_regex(
r'''(<a\b[^>]*?\shref\s*=[^>]+?\sdata-target-id\s*=\s*(["'])%s\2[^>]*>)''' % (m.group('p_id'), ),
webpage, 'check id', default='')
v_ref = extract_attributes(v_ref)
return v_ref.get('data-target-video-type') != 'novideo'
channel_id = self._search_regex(
r'docId\s*:\s*(["\'])(?P<id>(?!\1).+?)\1', webpage,
'channel id', group='id')
channel = self._call_api(
'https://api.zdf.de/content/documents/%s.json' % channel_id,
player, url, channel_id)
items = []
for module in channel['module']:
for teaser in try_get(module, lambda x: x['teaser'], list) or []:
t = try_get(
teaser, lambda x: x['http://zdf.de/rels/target'], dict)
if not t:
continue
items.extend(try_get(
t,
lambda x: x['resultsWithVideo']['http://zdf.de/rels/search/results'],
list) or [])
items.extend(try_get(
module,
lambda x: x['filterRef']['resultsWithVideo']['http://zdf.de/rels/search/results'],
list) or [])
entries = []
entry_urls = set()
for item in items:
t = try_get(item, lambda x: x['http://zdf.de/rels/target'], dict)
if not t:
continue
sharing_url = t.get('http://zdf.de/rels/sharing-url')
if not sharing_url or not isinstance(sharing_url, compat_str):
continue
if sharing_url in entry_urls:
continue
entry_urls.add(sharing_url)
entries.append(self.url_result(
sharing_url, ie=ZDFIE.ie_key(), video_id=t.get('id')))
return self.playlist_result(entries, channel_id, channel.get('title'))
"""
return self.playlist_from_matches(
(m.group('url') for m in matches if check_video(m)),
channel_id, self._og_search_title(webpage, fatal=False))
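For reference, the filtering idea behind `check_video` above: each teaser id found on the page is looked up again, and entries whose anchor is explicitly marked `data-target-video-type="novideo"` are skipped. A toy reconstruction over miniature, made-up HTML:

```python
import re

webpage = '''
<a href="#" data-target-id="clip-1" data-target-video-type="novideo">teaser</a>
<a href="#" data-target-id="clip-2">teaser</a>
'''

def has_video(p_id):
    # Find the anchor referencing this teaser id and inspect its attributes
    m = re.search(rf'<a\b[^>]*\bdata-target-id\s*=\s*"{p_id}"[^>]*>', webpage)
    return not (m and 'data-target-video-type="novideo"' in m.group(0))

print([p for p in ('clip-1', 'clip-2') if has_video(p)])  # ['clip-2']
```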

View File

@@ -294,9 +294,10 @@ def _create_alias(option, opt_str, value, parser):
aliases = (x if x.startswith('-') else f'--{x}' for x in map(str.strip, aliases.split(',')))
try:
args = [f'ARG{i}' for i in range(nargs)]
alias_group.add_option(
*aliases, help=opts, nargs=nargs, dest=parser.ALIAS_DEST, type='str' if nargs else None,
metavar=' '.join(f'ARG{i}' for i in range(nargs)), action='callback',
*aliases, nargs=nargs, dest=parser.ALIAS_DEST, type='str' if nargs else None,
metavar=' '.join(args), help=opts.format(*args), action='callback',
callback=_alias_callback, callback_kwargs={'opts': opts, 'nargs': nargs})
except Exception as err:
raise optparse.OptionValueError(f'wrong {opt_str} formatting; {err}')
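The reordering also fixes the rendered help: `opts.format(*args)` substitutes the `ARG0`/`ARG1` placeholder names into the alias's option string, so they line up with the metavar. A hypothetical illustration (the alias string is invented):

```python
opts = '--min-filesize {0} --max-filesize {1}'   # user-supplied alias options
nargs = 2
args = [f'ARG{i}' for i in range(nargs)]
print(' '.join(args))      # metavar: "ARG0 ARG1"
print(opts.format(*args))  # help:    "--min-filesize ARG0 --max-filesize ARG1"
```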
@@ -549,11 +550,11 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
selection.add_option(
'--min-filesize',
metavar='SIZE', dest='min_filesize', default=None,
help='Do not download any videos smaller than SIZE, e.g. 50k or 44.6M')
help='Abort download if filesize is smaller than SIZE, e.g. 50k or 44.6M')
selection.add_option(
'--max-filesize',
metavar='SIZE', dest='max_filesize', default=None,
help='Do not download any videos larger than SIZE, e.g. 50k or 44.6M')
help='Abort download if filesize is larger than SIZE, e.g. 50k or 44.6M')
selection.add_option(
'--date',
metavar='DATE', dest='date', default=None,

View File

@@ -174,6 +174,7 @@ def release_hash(self):
def _report_error(self, msg, expected=False):
self.ydl.report_error(msg, tb=False if expected else None)
self.ydl._download_retcode = 100
def _report_permission_error(self, file):
self._report_error(f'Unable to write to {file}; Try running as administrator', True)

View File

@@ -480,6 +480,7 @@ def handle_endtag(self, tag):
raise self.HTMLBreakOnClosingTagException()
# XXX: This should be far less strict
def get_element_text_and_html_by_tag(tag, html):
"""
For the first element with the specified tag in the passed HTML document
@@ -524,6 +525,7 @@ def __init__(self):
def handle_starttag(self, tag, attrs):
self.attrs = dict(attrs)
raise compat_HTMLParseError('done')
class HTMLListAttrsParser(html.parser.HTMLParser):
@@ -684,7 +686,8 @@ def replace_insane(char):
return '\0_'
return char
if restricted and is_id is NO_DEFAULT:
# Replace look-alike Unicode glyphs
if restricted and (is_id is NO_DEFAULT or not is_id):
s = unicodedata.normalize('NFKC', s)
s = re.sub(r'[0-9]+(?::[0-9]+)+', lambda m: m.group(0).replace(':', '_'), s) # Handle timestamps
result = ''.join(map(replace_insane, s))
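The NFKC step folds look-alike compatibility glyphs (fullwidth letters, fullwidth colons, and the like) to their ASCII counterparts before the per-character sanitization runs. A minimal demo of just these two lines:

```python
import re
import unicodedata

s = 'Ｖｉｄｅｏ：12：34'  # fullwidth letters and colons
s = unicodedata.normalize('NFKC', s)                     # -> 'Video:12:34'
s = re.sub(r'[0-9]+(?::[0-9]+)+',
           lambda m: m.group(0).replace(':', '_'), s)    # timestamps keep '_'
print(s)  # Video:12_34  (the remaining ':' is handled by replace_insane later)
```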
@@ -985,6 +988,25 @@ def make_HTTPS_handler(params, **kwargs):
context.options |= 4 # SSL_OP_LEGACY_SERVER_CONNECT
# Allow use of weaker ciphers in Python 3.10+. See https://bugs.python.org/issue43998
context.set_ciphers('DEFAULT')
elif (
sys.version_info < (3, 10)
and ssl.OPENSSL_VERSION_INFO >= (1, 1, 1)
and not ssl.OPENSSL_VERSION.startswith('LibreSSL')
):
# Backport the default SSL ciphers and minimum TLS version settings from Python 3.10 [1].
# This is to ensure consistent behavior across Python versions, and help avoid fingerprinting
# in some situations [2][3].
# Python 3.10 only supports OpenSSL 1.1.1+ [4]. Because this change is likely
# untested on older versions, we only apply this to OpenSSL 1.1.1+ to be safe.
# LibreSSL is excluded until further investigation due to cipher support issues [5][6].
# 1. https://github.com/python/cpython/commit/e983252b516edb15d4338b0a47631b59ef1e2536
# 2. https://github.com/yt-dlp/yt-dlp/issues/4627
# 3. https://github.com/yt-dlp/yt-dlp/pull/5294
# 4. https://peps.python.org/pep-0644/
# 5. https://peps.python.org/pep-0644/#libressl-support
# 6. https://github.com/yt-dlp/yt-dlp/commit/5b9f253fa0aee996cf1ed30185d4b502e00609c4#commitcomment-89054368
context.set_ciphers('@SECLEVEL=2:ECDH+AESGCM:ECDH+CHACHA20:ECDH+AES:DHE+AES:!aNULL:!eNULL:!aDSS:!SHA1:!AESCCM')
context.minimum_version = ssl.TLSVersion.TLSv1_2
context.verify_mode = ssl.CERT_REQUIRED if opts_check_certificate else ssl.CERT_NONE
if opts_check_certificate:
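A standalone sketch of the backported defaults, with the cipher string and minimum TLS version copied from the diff above; the guarding logic is simplified and this is not the full `make_HTTPS_handler`:

```python
import ssl

context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
# Only apply on OpenSSL 1.1.1+, and never on LibreSSL (see the notes above)
if ssl.OPENSSL_VERSION_INFO >= (1, 1, 1) and not ssl.OPENSSL_VERSION.startswith('LibreSSL'):
    context.set_ciphers('@SECLEVEL=2:ECDH+AESGCM:ECDH+CHACHA20:ECDH+AES:DHE+AES:'
                        '!aNULL:!eNULL:!aDSS:!SHA1:!AESCCM')
    context.minimum_version = ssl.TLSVersion.TLSv1_2
print(ssl.OPENSSL_VERSION)
```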
@@ -1982,12 +2004,13 @@ def system_identifier():
with contextlib.suppress(OSError): # We may not have access to the executable
libc_ver = platform.libc_ver()
return 'Python %s (%s %s) - %s %s' % (
return 'Python %s (%s %s) - %s (%s%s)' % (
platform.python_version(),
python_implementation,
platform.architecture()[0],
platform.platform(),
format_field(join_nonempty(*libc_ver, delim=' '), None, '(%s)'),
ssl.OPENSSL_VERSION,
format_field(join_nonempty(*libc_ver, delim=' '), None, ', %s'),
)
@@ -3078,8 +3101,8 @@ def escape_url(url):
).geturl()
def parse_qs(url):
return urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
def parse_qs(url, **kwargs):
return urllib.parse.parse_qs(urllib.parse.urlparse(url).query, **kwargs)
def read_batch_urls(batch_fd):
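With `**kwargs` forwarded, callers can pass any `urllib.parse.parse_qs` option through, e.g. `keep_blank_values`. A self-contained sketch of the extended helper:

```python
from urllib.parse import parse_qs as _parse_qs, urlparse

def parse_qs(url, **kwargs):
    # Forward keyword arguments straight to urllib's parser
    return _parse_qs(urlparse(url).query, **kwargs)

print(parse_qs('https://example.com/?a=1&b='))                          # {'a': ['1']}
print(parse_qs('https://example.com/?a=1&b=', keep_blank_values=True))  # {'a': ['1'], 'b': ['']}
```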