Mirror of https://github.com/yt-dlp/yt-dlp.git, synced 2024-09-27 16:56:30 +02:00

Compare commits: 6141346d18...e9ce4e9250 (19 commits)
| Author | SHA1 | Date |
|---|---|---|
| | e9ce4e9250 | |
| | 5da08bde9e | |
| | ff48fc04d0 | |
| | 46d09f8707 | |
| | db4678e448 | |
| | a349d4d641 | |
| | ac8e69dd32 | |
| | 96b9e9cf62 | |
| | cb1553e966 | |
| | 0d2a0ecac3 | |
| | c94df4d19d | |
| | 728f4b5c2e | |
| | 8c188d5d09 | |
| | e14ea7fbd9 | |
| | 7053aa3a48 | |
| | 049565df2e | |
| | cc1d3bf96b | |
| | 5b9f253fa0 | |
| | d715b0e413 | |
12 README.md
@@ -12,7 +12,7 @@
[![License: Unlicense](https://img.shields.io/badge/-Unlicense-blue.svg?style=for-the-badge)](LICENSE "License")
[![CI Status](https://img.shields.io/github/workflow/status/yt-dlp/yt-dlp/Core%20Tests/master?label=Tests&style=for-the-badge)](https://github.com/yt-dlp/yt-dlp/actions "CI Status")
[![Commits](https://img.shields.io/github/commit-activity/m/yt-dlp/yt-dlp?label=commits&style=for-the-badge)](https://github.com/yt-dlp/yt-dlp/commits "Commit History")
[![Last Commit](https://img.shields.io/github/last-commit/yt-dlp/yt-dlp/master?label=&style=for-the-badge)](https://github.com/yt-dlp/yt-dlp/commits "Commit History")
[![Last Commit](https://img.shields.io/github/last-commit/yt-dlp/yt-dlp/master?label=&style=for-the-badge&display_timestamp=committer)](https://github.com/yt-dlp/yt-dlp/commits "Commit History")

</div>
<!-- MANPAGE: END EXCLUDED SECTION -->
@@ -1642,9 +1642,9 @@ # MODIFYING METADATA

`--replace-in-metadata FIELDS REGEX REPLACE` is used to replace text in any metadata field using [python regular expression](https://docs.python.org/3/library/re.html#regular-expression-syntax). [Backreferences](https://docs.python.org/3/library/re.html?highlight=backreferences#re.sub) can be used in the replace string for advanced use.

The general syntax of `--parse-metadata FROM:TO` is to give the name of a field or an [output template](#output-template) to extract data from, and the format to interpret it as, separated by a colon `:`. Either a [python regular expression](https://docs.python.org/3/library/re.html#regular-expression-syntax) with named capture groups or a similar syntax to the [output template](#output-template) (only `%(field)s` formatting is supported) can be used for `TO`. The option can be used multiple times to parse and modify various fields.
The general syntax of `--parse-metadata FROM:TO` is to give the name of a field or an [output template](#output-template) to extract data from, and the format to interpret it as, separated by a colon `:`. Either a [python regular expression](https://docs.python.org/3/library/re.html#regular-expression-syntax) with named capture groups, a single field name, or a similar syntax to the [output template](#output-template) (only `%(field)s` formatting is supported) can be used for `TO`. The option can be used multiple times to parse and modify various fields.

Note that any field created by this can be used in the [output template](#output-template) and will also affect the media file's metadata added when using `--embed-metadata`.
Note that these options preserve their relative order, allowing replacements to be made in parsed fields and viceversa. Also, any field thus created can be used in the [output template](#output-template) and will also affect the media file's metadata added when using `--embed-metadata`.

This option also has a few special uses:

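The `FROM:TO` parsing and the regex replacement described above can be approximated with the standard `re` module. This is a minimal sketch, not yt-dlp's implementation: `to_regex`, `parse_metadata` and `replace_in_metadata` are hypothetical helpers, `FROM` is simplified to a single field name (the README also accepts an output template), and metadata is a plain dict. It covers the three accepted `TO` forms: a single field name, an output-template-like string using only `%(field)s`, and a regex with named groups.

```python
import re

_FIELD_RE = re.compile(r'%\((\w+)\)s')


def to_regex(to: str) -> str:
    """Hypothetical: turn a --parse-metadata TO spec into a regex."""
    if re.fullmatch(r'\w+', to):
        # A single field name captures the whole source value
        return rf'(?P<{to}>.+)'
    if not _FIELD_RE.search(to):
        # No %(field)s placeholders: assume TO is already a regex
        return to
    pattern, last = '', 0
    for m in _FIELD_RE.finditer(to):
        # Escape literal text between placeholders, capture each field lazily
        pattern += re.escape(to[last:m.start()]) + rf'(?P<{m.group(1)}>.+?)'
        last = m.end()
    return '^' + pattern + re.escape(to[last:]) + '$'


def parse_metadata(info: dict, source: str, to: str) -> dict:
    """Hypothetical: copy the named groups matched in info[source] into info."""
    m = re.search(to_regex(to), str(info.get(source, '')))
    if m:
        info.update({k: v for k, v in m.groupdict().items() if v is not None})
    return info


def replace_in_metadata(info: dict, fields, regex: str, replace: str) -> dict:
    """Hypothetical: re.sub over each listed string field; backreferences work."""
    for field in fields:
        if isinstance(info.get(field), str):
            info[field] = re.sub(regex, replace, info[field])
    return info


info = {'title': 'Artist - Song'}
parse_metadata(info, 'title', '%(artist)s - %(track)s')  # adds artist and track
replace_in_metadata(info, ['track'], r'^(\w+)$', r'\1 (Live)')
# info == {'title': 'Artist - Song', 'artist': 'Artist', 'track': 'Song (Live)'}
```

Calling the helpers in command-line order mirrors the note that these options preserve their relative order: the `track` field created by `parse_metadata` is then rewritten by `replace_in_metadata`.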
@@ -1733,11 +1733,7 @@ #### funimation
* `language`: Audio languages to extract, e.g. `funimation:language=english,japanese`
* `version`: The video version to extract - `uncut` or `simulcast`

#### crunchyroll
* `language`: Audio languages to extract, e.g. `crunchyroll:language=jaJp`
* `hardsub`: Which hard-sub versions to extract, e.g. `crunchyroll:hardsub=None,enUS`

#### crunchyrollbeta
#### crunchyrollbeta (Crunchyroll)
* `format`: Which stream type(s) to extract (default: `adaptive_hls`). Potentially useful values include `adaptive_hls`, `adaptive_dash`, `vo_adaptive_hls`, `vo_adaptive_dash`, `download_hls`, `download_dash`, `multitrack_adaptive_hls_v2`
* `hardsub`: Preference order for which hardsub versions to extract, or `all` (default: `None` = no hardsubs), e.g. `crunchyrollbeta:hardsub=en-US,None`

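An extractor argument such as `crunchyrollbeta:hardsub=en-US,None` packs the extractor key, the argument name and a comma-separated preference list into one string. The sketch below splits such a string and applies a preference order the way the `hardsub` description reads. Both functions are hypothetical illustrations rather than yt-dlp code; the `;` separator between multiple arguments is assumed from yt-dlp's general `--extractor-args KEY:ARGS` syntax, and the literal `None` is taken to mean the version without hardsubs.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def parse_extractor_arg(spec: str) -> Tuple[str, Dict[str, List[str]]]:
    """Hypothetical: split 'IE_KEY:ARG=V1,V2;ARG2=V3' into a key and an arg dict."""
    ie_key, _, args = spec.partition(':')
    parsed: Dict[str, List[str]] = {}
    for arg in filter(None, args.split(';')):  # assumed ';' between arguments
        name, _, values = arg.partition('=')
        parsed[name.strip()] = [v.strip() for v in values.split(',')] if values else []
    return ie_key.strip(), parsed


def select_hardsubs(prefs: Sequence[str],
                    available: Sequence[Optional[str]]) -> List[Optional[str]]:
    """Hypothetical: choose hardsub versions by preference.

    `available` lists the hardsub locales offered, with None standing for the
    version without burned-in subtitles. 'all' selects every version;
    otherwise the first preference that is available wins.
    """
    if 'all' in prefs:
        return list(available)
    for pref in prefs:
        wanted = None if pref == 'None' else pref
        if wanted in available:
            return [wanted]
    return []
```

Returning a list keeps `all` and the single-choice case under one return type; an empty list signals that no preference matched.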
@@ -23,7 +23,7 @@ # Supported sites
- **9now.com.au**
- **abc.net.au**
- **abc.net.au:iview**
- **abc.net.au:iview:showseries**
- **abc.net.au:iview:showseries**
- **abcnews**
- **abcnews:video**
- **abcotvs**: ABC Owned Television Stations
@@ -124,8 +124,8 @@ # Supported sites
- **bbc**: [<abbr title="netrc machine"><em>bbc</em></abbr>] BBC
- **bbc.co.uk**: [<abbr title="netrc machine"><em>bbc</em></abbr>] BBC iPlayer
- **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:episodes**
- **bbc.co.uk:iplayer:group**
- **bbc.co.uk:iplayer:episodes**
- **bbc.co.uk:iplayer:group**
- **bbc.co.uk:playlist**
- **BBVTV**: [<abbr title="netrc machine"><em>bbvtv</em></abbr>]
- **BBVTVLive**: [<abbr title="netrc machine"><em>bbvtv</em></abbr>]
@@ -274,7 +274,7 @@ # Supported sites
- **crunchyroll**: [<abbr title="netrc machine"><em>crunchyroll</em></abbr>]
- **crunchyroll:beta**: [<abbr title="netrc machine"><em>crunchyroll</em></abbr>]
- **crunchyroll:playlist**: [<abbr title="netrc machine"><em>crunchyroll</em></abbr>]
- **crunchyroll:playlist:beta**: [<abbr title="netrc machine"><em>crunchyroll</em></abbr>]
- **crunchyroll:playlist:beta**: [<abbr title="netrc machine"><em>crunchyroll</em></abbr>]
- **CSpan**: C-SPAN
- **CSpanCongress**
- **CtsNews**: 華視新聞
@@ -483,7 +483,7 @@ # Supported sites
- **Golem**
- **goodgame:stream**
- **google:podcasts**
- **google:podcasts:feed**
- **google:podcasts:feed**
- **GoogleDrive**
- **GoogleDrive:Folder**
- **GoPlay**: [<abbr title="netrc machine"><em>goplay</em></abbr>]
@@ -618,7 +618,7 @@ # Supported sites
- **kuwo:singer**: 酷我音乐 - 歌手
- **kuwo:song**: 酷我音乐
- **la7.it**
- **la7.it:pod:episode**
- **la7.it:pod:episode**
- **la7.it:podcast**
- **laola1tv**
- **laola1tv:embed**
@@ -652,7 +652,7 @@ # Supported sites
- **LineLiveChannel**
- **LinkedIn**: [<abbr title="netrc machine"><em>linkedin</em></abbr>]
- **linkedin:learning**: [<abbr title="netrc machine"><em>linkedin</em></abbr>]
- **linkedin:learning:course**: [<abbr title="netrc machine"><em>linkedin</em></abbr>]
- **linkedin:learning:course**: [<abbr title="netrc machine"><em>linkedin</em></abbr>]
- **LinuxAcademy**: [<abbr title="netrc machine"><em>linuxacademy</em></abbr>]
- **Liputan6**
- **LiTV**
@@ -673,7 +673,7 @@ # Supported sites
- **MagentaMusik360**
- **mailru**: Видео@Mail.Ru
- **mailru:music**: Музыка@Mail.Ru
- **mailru:music:search**: Музыка@Mail.Ru
- **mailru:music:search**: Музыка@Mail.Ru
- **MainStreaming**: MainStreaming Player
- **MallTV**
- **mangomolo:live**
@@ -718,7 +718,7 @@ # Supported sites
- **microsoftstream**: Microsoft Stream
- **mildom**: Record ongoing live by specific user in Mildom
- **mildom:clip**: Clip in Mildom
- **mildom:user:vod**: Download all VODs from specific user in Mildom
- **mildom:user:vod**: Download all VODs from specific user in Mildom
- **mildom:vod**: VOD in Mildom
- **minds**
- **minds:channel**
@@ -803,7 +803,7 @@ # Supported sites
- **navernow**
- **NBA**
- **nba:watch**
- **nba:watch:collection**
- **nba:watch:collection**
- **NBAChannel**
- **NBAEmbed**
- **NBAWatchEmbed**
@@ -817,7 +817,7 @@ # Supported sites
- **NBCStations**
- **ndr**: NDR.de - Norddeutscher Rundfunk
- **ndr:embed**
- **ndr:embed:base**
- **ndr:embed:base**
- **NDTV**
- **Nebula**: [<abbr title="netrc machine"><em>watchnebula</em></abbr>]
- **nebula:channel**: [<abbr title="netrc machine"><em>watchnebula</em></abbr>]
@@ -869,7 +869,7 @@ # Supported sites
- **niconico:tag**: NicoNico video tag URLs
- **NiconicoUser**
- **nicovideo:search**: Nico video search; "nicosearch:" prefix
- **nicovideo:search:date**: Nico video search, newest first; "nicosearchdate:" prefix
- **nicovideo:search:date**: Nico video search, newest first; "nicosearchdate:" prefix
- **nicovideo:search_url**: Nico video search URLs
- **Nintendo**
- **Nitter**
@@ -892,7 +892,7 @@ # Supported sites
- **npo**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **npo.nl:live**
- **npo.nl:radio**
- **npo.nl:radio:fragment**
- **npo.nl:radio:fragment**
- **Npr**
- **NRK**
- **NRKPlaylist**
@@ -933,7 +933,7 @@ # Supported sites
- **openrec:capture**
- **openrec:movie**
- **OraTV**
- **orf:fm4:story**: fm4.orf.at stories
- **orf:fm4:story**: fm4.orf.at stories
- **orf:iptv**: iptv.ORF.at
- **orf:radio**
- **orf:tvthek**: ORF TVthek
@@ -981,7 +981,7 @@ # Supported sites
- **Pinterest**
- **PinterestCollection**
- **pixiv:sketch**
- **pixiv:sketch:user**
- **pixiv:sketch:user**
- **Pladform**
- **PlanetMarathi**
- **Platzi**: [<abbr title="netrc machine"><em>platzi</em></abbr>]
@@ -1010,7 +1010,7 @@ # Supported sites
- **polskieradio:kierowcow**
- **polskieradio:player**
- **polskieradio:podcast**
- **polskieradio:podcast:list**
- **polskieradio:podcast:list**
- **PolskieRadioCategory**
- **Popcorntimes**
- **PopcornTV**
@@ -1122,7 +1122,7 @@ # Supported sites
- **rtl.nl**: rtl.nl and rtlxl.nl
- **rtl2**
- **rtl2:you**
- **rtl2:you:series**
- **rtl2:you:series**
- **RTLLuLive**
- **RTLLuRadio**
- **RTNews**
@@ -1198,9 +1198,9 @@ # Supported sites
- **Skeb**
- **sky.it**
- **sky:news**
- **sky:news:story**
- **sky:news:story**
- **sky:sports**
- **sky:sports:news**
- **sky:sports:news**
- **skyacademy.it**
- **SkylineWebcams**
- **skynewsarabia:article**
@@ -1289,7 +1289,7 @@ # Supported sites
- **Teachable**: [<abbr title="netrc machine"><em>teachable</em></abbr>]
- **TeachableCourse**: [<abbr title="netrc machine"><em>teachable</em></abbr>]
- **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos
- **teachertube:user:collection**: teachertube.com user and collection videos
- **TeachingChannel**
- **Teamcoco**
- **TeamTreeHouse**: [<abbr title="netrc machine"><em>teamtreehouse</em></abbr>]
@@ -1614,12 +1614,12 @@ # Supported sites
- **XXXYMovies**
- **Yahoo**: Yahoo screen and movies
- **yahoo:gyao**
- **yahoo:gyao:player**
- **yahoo:gyao:player**
- **yahoo:japannews**: Yahoo! Japan News
- **YandexDisk**
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:artist:albums**: Яндекс.Музыка - Артист - Альбомы
- **yandexmusic:artist:tracks**: Яндекс.Музыка - Артист - Треки
- **yandexmusic:artist:albums**: Яндекс.Музыка - Артист - Альбомы
- **yandexmusic:artist:tracks**: Яндекс.Музыка - Артист - Треки
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
- **yandexmusic:track**: Яндекс.Музыка - Трек
- **YandexVideo**
@@ -1641,14 +1641,14 @@ # Supported sites
- **youtube:clip**
- **youtube:favorites**: YouTube liked videos; ":ytfav" keyword (requires cookies)
- **youtube:history**: Youtube watch history; ":ythis" keyword (requires cookies)
- **youtube:music:search_url**: YouTube music search URLs with selectable sections, e.g. #songs
- **youtube:music:search_url**: YouTube music search URLs with selectable sections, e.g. #songs
- **youtube:notif**: YouTube notifications; ":ytnotif" keyword (requires cookies)
- **youtube:playlist**: YouTube playlists
- **youtube:recommended**: YouTube recommended videos; ":ytrec" keyword
- **youtube:search**: YouTube search; "ytsearch:" prefix
- **youtube:search:date**: YouTube search, newest videos first; "ytsearchdate:" prefix
- **youtube:search:date**: YouTube search, newest videos first; "ytsearchdate:" prefix
- **youtube:search_url**: YouTube search URLs with sorting and filter support
- **youtube:shorts:pivot:audio**: YouTube Shorts audio pivot (Shorts using audio of a given video)
- **youtube:shorts:pivot:audio**: YouTube Shorts audio pivot (Shorts using audio of a given video)
- **youtube:stories**: YouTube channel stories; "ytstories:" prefix
- **youtube:subscriptions**: YouTube subscriptions feed; ":ytsubs" keyword (requires cookies)
- **youtube:tab**: YouTube Tabs

@@ -260,8 +260,8 @@ def _repr(v):
info_dict_str += ''.join(
f' {_repr(k)}: {_repr(test_info_dict[k])},\n'
for k in missing_keys)
write_string(
'\n\'info_dict\': {\n' + info_dict_str + '},\n', out=sys.stderr)
info_dict_str = '\n\'info_dict\': {\n' + info_dict_str + '},\n'
write_string(info_dict_str.replace('\n', '\n '), out=sys.stderr)
self.assertFalse(
missing_keys,
'Missing keys in test definition: %s' % (

@@ -11,7 +11,6 @@
import base64

from yt_dlp.aes import (
BLOCK_SIZE_BYTES,
aes_cbc_decrypt,
aes_cbc_decrypt_bytes,
aes_cbc_encrypt,
@@ -103,8 +102,7 @@ def test_decrypt_text(self):

def test_ecb_encrypt(self):
data = bytes_to_intlist(self.secret_msg)
data += [0x08] * (BLOCK_SIZE_BYTES - len(data) % BLOCK_SIZE_BYTES)
encrypted = intlist_to_bytes(aes_ecb_encrypt(data, self.key, self.iv))
encrypted = intlist_to_bytes(aes_ecb_encrypt(data, self.key))
self.assertEqual(
encrypted,
b'\xaa\x86]\x81\x97>\x02\x92\x9d\x1bR[[L/u\xd3&\xd1(h\xde{\x81\x94\xba\x02\xae\xbd\xa6\xd0:')

@@ -28,11 +28,23 @@ def aes_cbc_encrypt_bytes(data, key, iv, **kwargs):
return intlist_to_bytes(aes_cbc_encrypt(*map(bytes_to_intlist, (data, key, iv)), **kwargs))


BLOCK_SIZE_BYTES = 16


def unpad_pkcs7(data):
return data[:-compat_ord(data[-1])]


BLOCK_SIZE_BYTES = 16
def pkcs7_padding(data):
"""
PKCS#7 padding

@param {int[]} data cleartext
@returns {int[]} padding data
"""

remaining_length = BLOCK_SIZE_BYTES - len(data) % BLOCK_SIZE_BYTES
return data + [remaining_length] * remaining_length


def pad_block(block, padding_mode):
@@ -64,7 +76,7 @@ def pad_block(block, padding_mode):

def aes_ecb_encrypt(data, key, iv=None):
"""
Encrypt with aes in ECB mode
Encrypt with aes in ECB mode. Using PKCS#7 padding

@param {int[]} data cleartext
@param {int[]} key 16/24/32-Byte cipher key
@@ -77,8 +89,7 @@ def aes_ecb_encrypt(data, key, iv=None):
encrypted_data = []
for i in range(block_count):
block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]
encrypted_data += aes_encrypt(block, expanded_key)
encrypted_data = encrypted_data[:len(data)]
encrypted_data += aes_encrypt(pkcs7_padding(block), expanded_key)

return encrypted_data

@@ -551,5 +562,6 @@ def ghash(subkey, data):

'key_expansion',
'pad_block',
'pkcs7_padding',
'unpad_pkcs7',
]

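The `pkcs7_padding` and `unpad_pkcs7` helpers above work on lists of ints. The byte-oriented sketch below shows the same PKCS#7 rule: append N bytes of value N, where N is the distance to the next 16-byte boundary, so input that is already block-aligned gains a full extra block. The names `pkcs7_pad` and `pkcs7_unpad` are illustrative, and unlike the `unpad_pkcs7` in the diff, which trusts the last byte, this version validates the padding before stripping it.

```python
BLOCK_SIZE_BYTES = 16  # AES block size, as in the diff


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE_BYTES) -> bytes:
    """Append N bytes of value N so that len(result) is a multiple of block_size."""
    n = block_size - len(data) % block_size  # always in 1..block_size
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes, block_size: int = BLOCK_SIZE_BYTES) -> bytes:
    """Strip PKCS#7 padding, rejecting input whose padding is malformed."""
    if not data or len(data) % block_size:
        raise ValueError('data length is not a positive multiple of the block size')
    n = data[-1]
    # Every one of the last n bytes must equal n, and n must fit in one block
    if not 1 <= n <= block_size or data[-n:] != bytes([n]) * n:
        raise ValueError('invalid PKCS#7 padding')
    return data[:-n]
```

Validating before stripping means a corrupted ciphertext or a wrong key surfaces as an error instead of silently truncating the plaintext by an arbitrary amount.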
@@ -14,7 +14,7 @@
# HTMLParseError has been deprecated in Python 3.3 and removed in
# Python 3.5. Introducing dummy exception for Python >3.5 for compatible
# and uniform cross-version exception handling
class compat_HTMLParseError(Exception):
class compat_HTMLParseError(ValueError):
pass


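Deriving `compat_HTMLParseError` from `ValueError` instead of `Exception` means a handler written as `except ValueError` now also catches it, while `except compat_HTMLParseError` keeps working. The snippet below only demonstrates that Python behaviour with a stand-in class; `CompatHTMLParseError` and `parse_or_none` are hypothetical, and the diff itself does not say which callers rely on the wider catch.

```python
class CompatHTMLParseError(ValueError):
    """Stand-in for compat_HTMLParseError after the change (hypothetical)."""


def parse_or_none(raw: str):
    # Hypothetical parser that signals bad input with the compat exception
    try:
        if '<' not in raw:
            raise CompatHTMLParseError('no markup found')
        return raw
    except ValueError:
        # Caught here only because CompatHTMLParseError now derives from ValueError
        return None
```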
@@ -48,6 +48,7 @@ def compat_setenv(key, value, env=os.environ):


compat_basestring = str
compat_casefold = str.casefold
compat_chr = chr
compat_collections_abc = collections.abc
compat_cookiejar = http.cookiejar

@@ -372,8 +372,6 @@
CrowdBunkerChannelIE,
)
from .crunchyroll import (
CrunchyrollIE,
CrunchyrollShowPlaylistIE,
CrunchyrollBetaIE,
CrunchyrollBetaShowIE,
)
@@ -470,6 +468,10 @@
)
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
from .deuxm import (
DeuxMIE,
DeuxMNewsIE
)
from .digitalconcerthall import DigitalConcertHallIE
from .discovery import DiscoveryIE
from .disney import DisneyIE
@@ -586,6 +588,7 @@
from .foxnews import (
FoxNewsIE,
FoxNewsArticleIE,
FoxNewsVideoIE,
)
from .foxsports import FoxSportsIE
from .fptplay import FptplayIE
@@ -908,6 +911,7 @@
)
from .linuxacademy import LinuxAcademyIE
from .liputan6 import Liputan6IE
from .listennotes import ListenNotesIE
from .litv import LiTVIE
from .livejournal import LiveJournalIE
from .livestream import (
@@ -1427,6 +1431,7 @@
)
from .puls4 import Puls4IE
from .pyvideo import PyvideoIE
from .qingting import QingTingIE
from .qqmusic import (
QQMusicIE,
QQMusicSingerIE,
@@ -1640,7 +1645,6 @@
SkyItVideoIE,
SkyItVideoLiveIE,
SkyItIE,
SkyItAcademyIE,
SkyItArteIE,
CieloTVItIE,
TV8ItIE,
@@ -1760,6 +1764,7 @@
SVTPlayIE,
SVTSeriesIE,
)
from .swearnet import SwearnetEpisodeIE
from .swrmediathek import SWRMediathekIE
from .syvdk import SYVDKIE
from .syfy import SyfyIE
@@ -1960,7 +1965,8 @@
TVPEmbedIE,
TVPIE,
TVPStreamIE,
TVPWebsiteIE,
TVPVODSeriesIE,
TVPVODVideoIE,
)
from .tvplay import (
TVPlayIE,

@@ -161,7 +161,7 @@ class AcFunBangumiIE(AcFunVideoBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
ac_idx = parse_qs(url).get('ac', [None])[-1]
video_id = f'{video_id}{format_field(ac_idx, template="__%s")}'
video_id = f'{video_id}{format_field(ac_idx, None, "__%s")}'

webpage = self._download_webpage(url, video_id)
json_bangumi_data = self._search_json(r'window.bangumiData\s*=', webpage, 'bangumiData', video_id)

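In the AcFun change, `format_field(ac_idx, None, "__%s")` appends `__<ac>` to the video ID only when the `ac` query parameter is present. The sketch below reproduces that observable behaviour with the standard library. `simple_format_field` and `bangumi_id` are hypothetical stand-ins (yt-dlp's own `format_field` and `parse_qs` accept more arguments), and the example.com URLs used with it are illustrative.

```python
from urllib.parse import parse_qs, urlparse


def simple_format_field(value, template='%s', default=''):
    """Hypothetical: render template % value, or default when value is None."""
    return default if value is None else template % value


def bangumi_id(url: str, video_id: str) -> str:
    # Last 'ac' query value, mirroring parse_qs(url).get('ac', [None])[-1]
    ac_idx = parse_qs(urlparse(url).query).get('ac', [None])[-1]
    return f'{video_id}{simple_format_field(ac_idx, "__%s")}'
```

For example, `bangumi_id('https://example.com/bangumi/aa123?ac=2', 'aa123')` gives `'aa123__2'`, while the same URL without `?ac=2` leaves the ID unchanged.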
@@ -28,30 +28,34 @@


class ADNIE(InfoExtractor):
IE_DESC = 'Anime Digital Network'
_VALID_URL = r'https?://(?:www\.)?animedigitalnetwork\.fr/video/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites',
'md5': '0319c99885ff5547565cacb4f3f9348d',
IE_DESC = 'Animation Digital Network'
_VALID_URL = r'https?://(?:www\.)?(?:animation|anime)digitalnetwork\.fr/video/[^/]+/(?P<id>\d+)'
_TESTS = [{
'url': 'https://animationdigitalnetwork.fr/video/fruits-basket/9841-episode-1-a-ce-soir',
'md5': '1c9ef066ceb302c86f80c2b371615261',
'info_dict': {
'id': '7778',
'id': '9841',
'ext': 'mp4',
'title': 'Blue Exorcist - Kyôto Saga - Episode 1',
'description': 'md5:2f7b5aa76edbc1a7a92cedcda8a528d5',
'series': 'Blue Exorcist - Kyôto Saga',
'duration': 1467,
'release_date': '20170106',
'title': 'Fruits Basket - Episode 1',
'description': 'md5:14be2f72c3c96809b0ca424b0097d336',
'series': 'Fruits Basket',
'duration': 1437,
'release_date': '20190405',
'comment_count': int,
'average_rating': float,
'season_number': 2,
'episode': 'Début des hostilités',
'season_number': 1,
'episode': 'À ce soir !',
'episode_number': 1,
}
}
},
'skip': 'Only available in region (FR, ...)',
}, {
'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites',
'only_matching': True,
}]

_NETRC_MACHINE = 'animedigitalnetwork'
_BASE_URL = 'http://animedigitalnetwork.fr'
_API_BASE_URL = 'https://gw.api.animedigitalnetwork.fr/'
_NETRC_MACHINE = 'animationdigitalnetwork'
_BASE = 'animationdigitalnetwork.fr'
_API_BASE_URL = 'https://gw.api.' + _BASE + '/'
_PLAYER_BASE_URL = _API_BASE_URL + 'player/'
_HEADERS = {}
_LOGIN_ERR_MESSAGE = 'Unable to log in'
@@ -75,11 +79,11 @@ def _get_subtitles(self, sub_url, video_id):
if subtitle_location:
enc_subtitles = self._download_webpage(
subtitle_location, video_id, 'Downloading subtitles data',
fatal=False, headers={'Origin': 'https://animedigitalnetwork.fr'})
fatal=False, headers={'Origin': 'https://' + self._BASE})
if not enc_subtitles:
return None

# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
# http://animationdigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = unpad_pkcs7(aes_cbc_decrypt_bytes(
compat_b64decode(enc_subtitles[24:]),
binascii.unhexlify(self._K + '7fac1178830cfe0c'),

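The widened `_VALID_URL` accepts both the old `animedigitalnetwork.fr` host and the new `animationdigitalnetwork.fr` host, which is why the old Blue Exorcist URL survives as an `only_matching` test. A quick check with the standard `re` module, using the pattern copied verbatim from the diff (`adn_video_id` is an illustrative wrapper, not part of the extractor):

```python
import re

# Pattern copied from the new ADNIE._VALID_URL in the diff
VALID_URL = r'https?://(?:www\.)?(?:animation|anime)digitalnetwork\.fr/video/[^/]+/(?P<id>\d+)'


def adn_video_id(url: str):
    """Return the numeric video ID, or None if the URL is not an ADN video page."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None
```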
@@ -368,7 +368,7 @@ def _real_extract(self, url):
or '正在观看预览,大会员免费看全片' in webpage):
self.raise_login_required('This video is for premium members only')

play_info = self._search_json(r'window\.__playinfo__\s*=\s*', webpage, 'play info', video_id)['data']
play_info = self._search_json(r'window\.__playinfo__\s*=', webpage, 'play info', video_id)['data']
formats = self.extract_formats(play_info)
if (not formats and '成为大会员抢先看' in webpage
and play_info.get('durl') and not play_info.get('dash')):

@@ -9,6 +9,7 @@
ExtractorError,
float_or_none,
sanitized_Request,
str_or_none,
traverse_obj,
urlencode_postdata,
USER_AGENTS,
@@ -16,13 +17,13 @@


class CeskaTelevizeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/(?:ivysilani|porady)/(?:[^/?#&]+/)*(?P<id>[^/#?]+)'
_VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/(?:ivysilani|porady|zive)/(?:[^/?#&]+/)*(?P<id>[^/#?]+)'
_TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/10441294653-hyde-park-civilizace/215411058090502/bonus/20641-bonus-01-en',
'info_dict': {
'id': '61924494877028507',
'ext': 'mp4',
'title': 'Hyde Park Civilizace: Bonus 01 - En',
'title': 'Bonus 01 - En - Hyde Park Civilizace',
'description': 'English Subtittles',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 81.3,
@@ -33,18 +34,29 @@ class CeskaTelevizeIE(InfoExtractor):
},
}, {
# live stream
'url': 'http://www.ceskatelevize.cz/ivysilani/zive/ct4/',
'url': 'http://www.ceskatelevize.cz/zive/ct1/',
'info_dict': {
'id': 402,
'id': '102',
'ext': 'mp4',
'title': r're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'title': r'ČT1 - živé vysílání online',
'description': 'Sledujte živé vysílání kanálu ČT1 online. Vybírat si můžete i z dalších kanálů České televize na kterémkoli z vašich zařízení.',
'is_live': True,
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Georestricted to Czech Republic',
}, {
# another
'url': 'http://www.ceskatelevize.cz/ivysilani/zive/ct4/',
'only_matching': True,
'info_dict': {
'id': 402,
'ext': 'mp4',
'title': r're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'is_live': True,
},
# 'skip': 'Georestricted to Czech Republic',
}, {
'url': 'http://www.ceskatelevize.cz/ivysilani/embed/iFramePlayer.php?hash=d6a3e1370d2e4fa76296b90bad4dfc19673b641e&IDEC=217 562 22150/0004&channelID=1&width=100%25',
'only_matching': True,
@@ -53,21 +65,21 @@ class CeskaTelevizeIE(InfoExtractor):
'url': 'http://www.ceskatelevize.cz/porady/10520528904-queer/215562210900007-bogotart/',
'info_dict': {
'id': '215562210900007-bogotart',
'title': 'Queer: Bogotart',
'description': 'Hlavní město Kolumbie v doprovodu queer umělců. Vroucí svět plný vášně, sebevědomí, ale i násilí a bolesti. Připravil Peter Serge Butko',
'title': 'Bogotart - Queer',
'description': 'Hlavní město Kolumbie v doprovodu queer umělců. Vroucí svět plný vášně, sebevědomí, ale i násilí a bolesti',
},
'playlist': [{
'info_dict': {
'id': '61924494877311053',
'ext': 'mp4',
'title': 'Queer: Bogotart (Varování 18+)',
'title': 'Bogotart - Queer (Varování 18+)',
'duration': 11.9,
},
}, {
'info_dict': {
'id': '61924494877068022',
'ext': 'mp4',
'title': 'Queer: Bogotart (Queer)',
'title': 'Bogotart - Queer (Queer)',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 1558.3,
},
@@ -84,28 +96,42 @@ class CeskaTelevizeIE(InfoExtractor):

def _real_extract(self, url):
playlist_id = self._match_id(url)
parsed_url = compat_urllib_parse_urlparse(url)
webpage = self._download_webpage(url, playlist_id)
site_name = self._og_search_property('site_name', webpage, fatal=False, default=None)
webpage, urlh = self._download_webpage_handle(url, playlist_id)
parsed_url = compat_urllib_parse_urlparse(urlh.geturl())
site_name = self._og_search_property('site_name', webpage, fatal=False, default='Česká televize')
playlist_title = self._og_search_title(webpage, default=None)
if site_name and playlist_title:
playlist_title = playlist_title.replace(f' — {site_name}', '', 1)
playlist_title = re.split(r'\s*[—|]\s*%s' % (site_name, ), playlist_title, 1)[0]
playlist_description = self._og_search_description(webpage, default=None)
if playlist_description:
playlist_description = playlist_description.replace('\xa0', ' ')

if parsed_url.path.startswith('/porady/'):
type_ = 'IDEC'
if re.search(r'(^/porady|/zive)/', parsed_url.path):
next_data = self._search_nextjs_data(webpage, playlist_id)
if '/zive/' in parsed_url.path:
idec = traverse_obj(next_data, ('props', 'pageProps', 'data', 'liveBroadcast', 'current', 'idec'), get_all=False)
else:
idec = traverse_obj(next_data, ('props', 'pageProps', 'data', ('show', 'mediaMeta'), 'idec'), get_all=False)
if not idec:
idec = traverse_obj(next_data, ('props', 'pageProps', 'data', 'videobonusDetail', 'bonusId'), get_all=False)
if idec:
type_ = 'bonus'
if not idec:
raise ExtractorError('Failed to find IDEC id')
iframe_hash = self._download_webpage('https://www.ceskatelevize.cz/v-api/iframe-hash/', playlist_id)
webpage = self._download_webpage('https://www.ceskatelevize.cz/ivysilani/embed/iFramePlayer.php', playlist_id,
query={'hash': iframe_hash, 'origin': 'iVysilani', 'autoStart': 'true', 'IDEC': idec})
iframe_hash = self._download_webpage(
'https://www.ceskatelevize.cz/v-api/iframe-hash/',
playlist_id, note='Getting IFRAME hash')
query = {'hash': iframe_hash, 'origin': 'iVysilani', 'autoStart': 'true', type_: idec, }
webpage = self._download_webpage(
'https://www.ceskatelevize.cz/ivysilani/embed/iFramePlayer.php',
playlist_id, note='Downloading player', query=query)

NOT_AVAILABLE_STRING = 'This content is not available at your territory due to limited copyright.'
if '%s</p>' % NOT_AVAILABLE_STRING in webpage:
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
self.raise_geo_restricted(NOT_AVAILABLE_STRING)
if any(not_found in webpage for not_found in ('Neplatný parametr pro videopřehrávač', 'IDEC nebyl nalezen', )):
raise ExtractorError('no video with IDEC available', video_id=idec, expected=True)

type_ = None
episode_id = None
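Two steps of the rewritten `CeskaTelevizeIE._real_extract` can be tried in isolation. The sketch below reimplements them in plain Python under stated assumptions. `strip_site_suffix` mirrors the `re.split(r'\s*[—|]\s*%s' ...)` call that drops a trailing em-dash or pipe separator plus the site name, but escapes the site name with `re.escape` (the diff interpolates it unescaped) and passes `maxsplit` by keyword. `find_idec` and `_get` are hypothetical, dictionary-only stand-ins for the `traverse_obj` lookups; they follow the same order (live broadcast for `/zive/` pages, then `show` or `mediaMeta`, then the bonus ID) and only approximate `get_all=False`, since `or` also skips falsy non-None values.

```python
import re
from typing import Optional, Tuple


def strip_site_suffix(title: str, site_name: str) -> str:
    """Drop a trailing ' — Site' or ' | Site' suffix from an og:title."""
    return re.split(rf'\s*[—|]\s*{re.escape(site_name)}', title, maxsplit=1)[0]


def _get(obj, *path):
    # Hypothetical helper: walk nested dicts, returning None on any miss
    for key in path:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def find_idec(next_data: dict, is_live: bool) -> Tuple[str, Optional[str]]:
    """Hypothetical: return (player query key, id) in the order used by the diff."""
    data = _get(next_data, 'props', 'pageProps', 'data')
    if is_live:
        return 'IDEC', _get(data, 'liveBroadcast', 'current', 'idec')
    idec = _get(data, 'show', 'idec') or _get(data, 'mediaMeta', 'idec')
    if idec:
        return 'IDEC', idec
    bonus_id = _get(data, 'videobonusDetail', 'bonusId')
    return ('bonus', bonus_id) if bonus_id else ('IDEC', None)
```

Returning the query key together with the ID matches how the diff builds `query = {..., type_: idec}`, where a bonus clip is requested with `bonus=` instead of `IDEC=`.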
@@ -174,7 +200,6 @@ def _real_extract(self, url):
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item.get('streamUrls', {}).items():
stream_url = stream_url.replace('https://', 'http://')
if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4', 'm3u8_native',
@@ -196,7 +221,7 @@ def _real_extract(self, url):
entries[num]['formats'].extend(formats)
continue

item_id = item.get('id') or item['assetId']
item_id = str_or_none(item.get('id') or item['assetId'])
title = item['title']

duration = float_or_none(item.get('duration'))
@@ -227,6 +252,8 @@ def _real_extract(self, url):
for e in entries:
self._sort_formats(e['formats'])

if len(entries) == 1:
return entries[0]
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)

def _get_subtitles(self, episode_id, subs):

@@ -3725,7 +3725,8 @@ def description(cls, *, markdown=True, search_examples=None):
if not cls.working():
desc += ' (**Currently broken**)' if markdown else ' (Currently broken)'

name = f' - **{cls.IE_NAME}**' if markdown else cls.IE_NAME
# Escape emojis. Ref: https://github.com/github/markup/issues/1153
name = (' - **%s**' % re.sub(r':(\w+:)', ':\u200B\\g<1>', cls.IE_NAME)) if markdown else cls.IE_NAME
return f'{name}:{desc}' if desc else name

def extract_subtitles(self, *args, **kwargs):

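The new `description()` line escapes GitHub emoji shortcodes by inserting a zero-width space (U+200B) after a colon that starts a `:word:` run. That substitution is probably why the old and new entries in the supported-sites hunks above look identical: the only difference is an invisible character. The function below wraps the exact `re.sub` call from the diff; `escape_ie_name` is an illustrative name.

```python
import re


def escape_ie_name(ie_name: str) -> str:
    # Same substitution as the diff: ':iview:' becomes ':<U+200B>iview:' so that
    # GitHub's markdown renderer no longer sees a ':shortcode:' pattern
    return re.sub(r':(\w+:)', ':\u200B\\g<1>', ie_name)
```

Names with a single colon, such as `youtube:tab`, are left unchanged because the pattern needs a closing colon after the word.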
@@ -1,40 +1,16 @@
import base64
import json
import re
import urllib.request
import xml.etree.ElementTree
import zlib
from hashlib import sha1
from math import floor, pow, sqrt
import urllib.parse

from .common import InfoExtractor
from .vrv import VRVBaseIE
from ..aes import aes_cbc_decrypt
from ..compat import (
compat_b64decode,
compat_etree_fromstring,
compat_str,
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
ExtractorError,
bytes_to_intlist,
extract_attributes,
float_or_none,
format_field,
int_or_none,
intlist_to_bytes,
join_nonempty,
lowercase_escape,
merge_dicts,
parse_iso8601,
qualities,
remove_end,
sanitized_Request,
traverse_obj,
try_get,
xpath_text,
)


@ -42,16 +18,7 @@ class CrunchyrollBaseIE(InfoExtractor):
|
||||
_LOGIN_URL = 'https://www.crunchyroll.com/welcome/login'
|
||||
_API_BASE = 'https://api.crunchyroll.com'
|
||||
_NETRC_MACHINE = 'crunchyroll'
|
||||
|
||||
def _call_rpc_api(self, method, video_id, note=None, data=None):
|
||||
data = data or {}
|
||||
data['req'] = 'RpcApi' + method
|
||||
data = compat_urllib_parse_urlencode(data).encode('utf-8')
|
||||
return self._download_xml(
|
||||
'https://www.crunchyroll.com/xml/',
|
||||
video_id, note, fatal=False, data=data, headers={
|
||||
'Content-Type': 'application/x-www-form-urlencoded',
|
||||
})
|
||||
params = None
|
||||
|
||||
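The removed `_call_rpc_api` posts a form-encoded body to `https://www.crunchyroll.com/xml/`, with the RPC method name carried in a `req` field prefixed with `RpcApi`. The sketch below only builds an equivalent `urllib.request.Request` and never sends it. `build_rpc_request` is a hypothetical name. Unlike the original, it copies `data` instead of mutating the caller's dict.

```python
import urllib.parse
import urllib.request


def build_rpc_request(method, data=None):
    # Same encoding as the removed _call_rpc_api: the RPC method travels as a
    # 'req' form field prefixed with 'RpcApi', and the body is URL-encoded
    data = dict(data or {})  # copy, so the caller's dict is left untouched
    data['req'] = 'RpcApi' + method
    body = urllib.parse.urlencode(data).encode('utf-8')
    # Supplying data makes urllib use POST; nothing is sent until urlopen()
    return urllib.request.Request(
        'https://www.crunchyroll.com/xml/', data=body,
        headers={'Content-Type': 'application/x-www-form-urlencoded'})


req = build_rpc_request('VideoPlayer_GetMediaMetadata', {'media_id': '645513'})
# req.data == b'media_id=645513&req=RpcApiVideoPlayer_GetMediaMetadata'
```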
    def _perform_login(self, username, password):
        if self._get_cookies(self._LOGIN_URL).get('etp_rt'):
@ -72,7 +39,7 @@ def _perform_login(self, username, password):

        login_response = self._download_json(
            f'{self._API_BASE}/login.1.json', None, 'Logging in',
            data=compat_urllib_parse_urlencode({
            data=urllib.parse.urlencode({
                'account': username,
                'password': password,
                'session_id': session_id
@ -82,652 +49,23 @@ def _perform_login(self, username, password):
        if not self._get_cookies(self._LOGIN_URL).get('etp_rt'):
            raise ExtractorError('Login succeeded but did not set etp_rt cookie')

    # Beta-specific, but needed for redirects
    def _get_beta_embedded_json(self, webpage, display_id):
    def _get_embedded_json(self, webpage, display_id):
        initial_state = self._parse_json(self._search_regex(
            r'__INITIAL_STATE__\s*=\s*({.+?})\s*;', webpage, 'initial state'), display_id)
        app_config = self._parse_json(self._search_regex(
            r'__APP_CONFIG__\s*=\s*({.+?})\s*;', webpage, 'app config'), display_id)
        return initial_state, app_config

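`_get_embedded_json` pulls two JavaScript object literals, `__INITIAL_STATE__` and `__APP_CONFIG__`, out of the page with lazy regexes. `_redirect_to_beta` then joins `baseSiteUrl` with the current router pathname. The sketch below uses `json.loads` in place of yt-dlp's `_parse_json`/`_search_regex`. The page fragment is hypothetical and is shaped only after the keys the diff reads. Because the lazy `{.+?}` stops at the first `}` followed by `;`, a string value containing `};` would truncate the match.

```python
import json
import re


def get_embedded_json(webpage):
    # Same patterns as _get_embedded_json: the lazy '{.+?}' stops at the first
    # '}' that is followed (after optional whitespace) by ';'
    initial_state = json.loads(re.search(
        r'__INITIAL_STATE__\s*=\s*({.+?})\s*;', webpage).group(1))
    app_config = json.loads(re.search(
        r'__APP_CONFIG__\s*=\s*({.+?})\s*;', webpage).group(1))
    return initial_state, app_config


def redirect_url(initial_state, app_config):
    # How _redirect_to_beta builds its target URL from the two objects
    return app_config['baseSiteUrl'] + initial_state['router']['locations']['current']['pathname']


# Hypothetical page fragment, not captured from crunchyroll.com
page = ('<script>window.__INITIAL_STATE__ = {"router": {"locations": {"current": '
        '{"pathname": "/watch/GY2P1Q98Y/to-the-future"}}}};'
        'window.__APP_CONFIG__ = {"baseSiteUrl": "https://www.crunchyroll.com"};</script>')
state, config = get_embedded_json(page)
```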
    def _redirect_to_beta(self, webpage, iekey, video_id):
        if not self._get_cookies(self._LOGIN_URL).get('etp_rt'):
            raise ExtractorError('Received a beta page from non-beta url when not logged in.')
        initial_state, app_config = self._get_beta_embedded_json(webpage, video_id)
        url = app_config['baseSiteUrl'] + initial_state['router']['locations']['current']['pathname']
        self.to_screen(f'{video_id}: Redirected to beta site - {url}')
        return self.url_result(f'{url}', iekey, video_id)

    @staticmethod
    def _add_skip_wall(url):
        parsed_url = compat_urlparse.urlparse(url)
        qs = compat_urlparse.parse_qs(parsed_url.query)
        # Always force skip_wall to bypass maturity wall, namely 18+ confirmation message:
        # > This content may be inappropriate for some people.
        # > Are you sure you want to continue?
        # since it's not disabled by default in crunchyroll account's settings.
        # See https://github.com/ytdl-org/youtube-dl/issues/7202.
        qs['skip_wall'] = ['1']
        return compat_urlparse.urlunparse(
            parsed_url._replace(query=compat_urllib_parse_urlencode(qs, True)))


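`_add_skip_wall` rewrites the query string so it always carries `skip_wall=1`, overwriting any existing value. The same steps with the standard-library functions that the `compat_*` aliases point to are shown below. `add_skip_wall` is a standalone name used only here. Note that `parse_qs` drops parameters with empty values unless `keep_blank_values=True` is passed.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def add_skip_wall(url):
    parsed_url = urlparse(url)
    qs = parse_qs(parsed_url.query)  # {'key': ['value', ...]}
    # Force skip_wall=1, replacing whatever value the URL already had
    qs['skip_wall'] = ['1']
    # doseq=True expands the list values from parse_qs into key=value pairs
    return urlunparse(parsed_url._replace(query=urlencode(qs, True)))


# add_skip_wall('http://www.crunchyroll.com/cosplay-complex-ova')
# returns 'http://www.crunchyroll.com/cosplay-complex-ova?skip_wall=1'
```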
class CrunchyrollIE(CrunchyrollBaseIE, VRVBaseIE):
    IE_NAME = 'crunchyroll'
    _VALID_URL = r'''(?x)
        https?://(?:(?P<prefix>www|m)\.)?(?P<url>
            crunchyroll\.(?:com|fr)/(?:
                media(?:-|/\?id=)|
                (?!series/|watch/)(?:[^/]+/){1,2}[^/?&#]*?
            )(?P<id>[0-9]+)
        )(?:[/?&#]|$)'''

    _TESTS = [{
        'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
        'info_dict': {
            'id': '645513',
            'ext': 'mp4',
            'title': 'Wanna be the Strongest in the World Episode 1 – An Idol-Wrestler is Born!',
            'description': 'md5:2d17137920c64f2f49981a7797d275ef',
            'thumbnail': r're:^https?://.*\.jpg$',
            'uploader': 'Yomiuri Telecasting Corporation (YTV)',
            'upload_date': '20131013',
            'url': 're:(?!.*&)',
        },
        'params': {
            # rtmp
            'skip_download': True,
        },
        'skip': 'Video gone',
    }, {
        'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1',
        'info_dict': {
            'id': '589804',
            'ext': 'flv',
            'title': 'Culture Japan Episode 1 – Rebuilding Japan after the 3.11',
            'description': 'md5:2fbc01f90b87e8e9137296f37b461c12',
            'thumbnail': r're:^https?://.*\.jpg$',
            'uploader': 'Danny Choo Network',
            'upload_date': '20120213',
        },
        'params': {
            # rtmp
            'skip_download': True,
        },
        'skip': 'Video gone',
    }, {
        'url': 'http://www.crunchyroll.com/rezero-starting-life-in-another-world-/episode-5-the-morning-of-our-promise-is-still-distant-702409',
        'info_dict': {
            'id': '702409',
            'ext': 'mp4',
            'title': compat_str,
            'description': compat_str,
            'thumbnail': r're:^https?://.*\.jpg$',
            'uploader': 'Re:Zero Partners',
            'timestamp': 1462098900,
            'upload_date': '20160501',
        },
        'params': {
            # m3u8 download
            'skip_download': True,
        },
    }, {
        'url': 'http://www.crunchyroll.com/konosuba-gods-blessing-on-this-wonderful-world/episode-1-give-me-deliverance-from-this-judicial-injustice-727589',
        'info_dict': {
            'id': '727589',
            'ext': 'mp4',
            'title': compat_str,
            'description': compat_str,
            'thumbnail': r're:^https?://.*\.jpg$',
            'uploader': 'Kadokawa Pictures Inc.',
            'timestamp': 1484130900,
            'upload_date': '20170111',
            'series': compat_str,
            'season': "KONOSUBA -God's blessing on this wonderful world! 2",
            'season_number': 2,
            'episode': 'Give Me Deliverance From This Judicial Injustice!',
            'episode_number': 1,
        },
        'params': {
            # m3u8 download
            'skip_download': True,
        },
    }, {
        'url': 'http://www.crunchyroll.fr/girl-friend-beta/episode-11-goodbye-la-mode-661697',
        'only_matching': True,
    }, {
        # geo-restricted (US), 18+ maturity wall, non-premium available
        'url': 'http://www.crunchyroll.com/cosplay-complex-ova/episode-1-the-birth-of-the-cosplay-club-565617',
        'only_matching': True,
    }, {
        # A description with double quotes
        'url': 'http://www.crunchyroll.com/11eyes/episode-1-piros-jszaka-red-night-535080',
        'info_dict': {
            'id': '535080',
            'ext': 'mp4',
            'title': compat_str,
            'description': compat_str,
            'uploader': 'Marvelous AQL Inc.',
            'timestamp': 1255512600,
            'upload_date': '20091014',
        },
        'params': {
            # Just test metadata extraction
            'skip_download': True,
        },
    }, {
        # make sure we can extract an uploader name that's not a link
        'url': 'http://www.crunchyroll.com/hakuoki-reimeiroku/episode-1-dawn-of-the-divine-warriors-606899',
        'info_dict': {
            'id': '606899',
            'ext': 'mp4',
            'title': 'Hakuoki Reimeiroku Episode 1 – Dawn of the Divine Warriors',
            'description': 'Ryunosuke was left to die, but Serizawa-san asked him a simple question "Do you want to live?"',
            'uploader': 'Geneon Entertainment',
            'upload_date': '20120717',
        },
        'params': {
            # just test metadata extraction
            'skip_download': True,
        },
        'skip': 'Video gone',
    }, {
        # A video with a vastly different season name compared to the series name
        'url': 'http://www.crunchyroll.com/nyarko-san-another-crawling-chaos/episode-1-test-590532',
        'info_dict': {
            'id': '590532',
            'ext': 'mp4',
            'title': compat_str,
            'description': compat_str,
            'uploader': 'TV TOKYO',
            'timestamp': 1330956000,
            'upload_date': '20120305',
            'series': 'Nyarko-san: Another Crawling Chaos',
            'season': 'Haiyoru! Nyaruani (ONA)',
        },
        'params': {
            # Just test metadata extraction
            'skip_download': True,
        },
    }, {
        'url': 'http://www.crunchyroll.com/media-723735',
        'only_matching': True,
    }, {
        'url': 'https://www.crunchyroll.com/en-gb/mob-psycho-100/episode-2-urban-legends-encountering-rumors-780921',
        'only_matching': True,
    }]

    _FORMAT_IDS = {
        '360': ('60', '106'),
        '480': ('61', '106'),
        '720': ('62', '106'),
        '1080': ('80', '108'),
    }

    def _download_webpage(self, url_or_request, *args, **kwargs):
        request = (url_or_request if isinstance(url_or_request, urllib.request.Request)
                   else sanitized_Request(url_or_request))
        # Accept-Language must be set explicitly to accept any language to avoid issues
        # similar to https://github.com/ytdl-org/youtube-dl/issues/6797.
        # Along with IP address Crunchyroll uses Accept-Language to guess whether georestriction
        # should be imposed or not (from what I can see it just takes the first language
        # ignoring the priority and requires it to correspond the IP). By the way this causes
        # Crunchyroll to not work in georestriction cases in some browsers that don't place
        # the locale lang first in header. However allowing any language seems to workaround the issue.
        request.add_header('Accept-Language', '*')
        return super(CrunchyrollBaseIE, self)._download_webpage(request, *args, **kwargs)

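The `_download_webpage` override forces `Accept-Language: *`. According to its comment, Crunchyroll appears to compare the first listed language with the client IP when deciding on geo-restriction. The sketch below uses plain `urllib.request.Request` in place of yt-dlp's `sanitized_Request`. `with_any_language` is a hypothetical name, and no request is sent.

```python
import urllib.request


def with_any_language(url_or_request):
    # Accept either a URL string or an existing Request, as the override does
    request = (url_or_request if isinstance(url_or_request, urllib.request.Request)
               else urllib.request.Request(url_or_request))
    # '*' accepts any language; add_header replaces any value already set
    request.add_header('Accept-Language', '*')
    return request


req = with_any_language('http://www.crunchyroll.com/media-723735')
# urllib stores header names capitalized, so the lookup key is 'Accept-language'
```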
    def _decrypt_subtitles(self, data, iv, id):
        data = bytes_to_intlist(compat_b64decode(data))
        iv = bytes_to_intlist(compat_b64decode(iv))
        id = int(id)

        def obfuscate_key_aux(count, modulo, start):
            output = list(start)
            for _ in range(count):
                output.append(output[-1] + output[-2])
            # cut off start values
            output = output[2:]
            output = list(map(lambda x: x % modulo + 33, output))
            return output

        def obfuscate_key(key):
            num1 = int(floor(pow(2, 25) * sqrt(6.9)))
            num2 = (num1 ^ key) << 5
            num3 = key ^ num1
            num4 = num3 ^ (num3 >> 3) ^ num2
            prefix = intlist_to_bytes(obfuscate_key_aux(20, 97, (1, 2)))
            shaHash = bytes_to_intlist(sha1(prefix + str(num4).encode('ascii')).digest())
            # Extend 160 Bit hash to 256 Bit
            return shaHash + [0] * 12

        key = obfuscate_key(id)

        decrypted_data = intlist_to_bytes(aes_cbc_decrypt(data, key, iv))
        return zlib.decompress(decrypted_data)

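The removed `_decrypt_subtitles` derives a 256-bit AES key from the numeric subtitle id, then AES-CBC-decrypts and zlib-decompresses the payload. The sketch below reproduces only the key derivation, using the standard library. `bytes(...)` and `list(...)` stand in for yt-dlp's `intlist_to_bytes` and `bytes_to_intlist`, and `2 ** 25` replaces `math.pow(2, 25)`, which gives the same float product. It decrypts nothing. The function names simply mirror the nested helpers in the diff.

```python
import hashlib
import math


def obfuscate_key_aux(count, modulo, start):
    # Fibonacci-style sequence seeded with `start`; the seeds are dropped and
    # each value is mapped into the printable range 33 .. 33 + modulo - 1
    output = list(start)
    for _ in range(count):
        output.append(output[-1] + output[-2])
    return [x % modulo + 33 for x in output[2:]]


def obfuscate_key(media_id):
    num1 = int(math.floor(2 ** 25 * math.sqrt(6.9)))
    num2 = (num1 ^ media_id) << 5
    num3 = media_id ^ num1
    num4 = num3 ^ (num3 >> 3) ^ num2
    prefix = bytes(obfuscate_key_aux(20, 97, (1, 2)))
    sha_hash = list(hashlib.sha1(prefix + str(num4).encode('ascii')).digest())
    # Pad the 160-bit SHA-1 digest with 12 zero bytes to get a 256-bit key
    return sha_hash + [0] * 12


key = obfuscate_key(645513)  # list of 32 ints in 0..255
```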
    def _convert_subtitles_to_srt(self, sub_root):
        output = ''

        for i, event in enumerate(sub_root.findall('./events/event'), 1):
            start = event.attrib['start'].replace('.', ',')
            end = event.attrib['end'].replace('.', ',')
            text = event.attrib['text'].replace('\\N', '\n')
            output += '%d\n%s --> %s\n%s\n\n' % (i, start, end, text)
        return output

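`_convert_subtitles_to_srt` turns each `<event>` of the decrypted XML into one numbered SRT cue. It swaps `.` for `,` in the timestamps and turns the ASS line break `\N` into a real newline. The sketch below runs the same logic on a hypothetical XML sample. The root tag `subtitle_script` is an assumption, since only the relative path `./events/event` comes from the diff. Timestamps are not zero-padded to `HH:MM:SS,mmm`, so strict SRT parsers may object.

```python
import xml.etree.ElementTree as ET


def convert_subtitles_to_srt(sub_root):
    # One cue per <event>; only '.' -> ',' and '\N' -> newline, as in the diff
    output = ''
    for i, event in enumerate(sub_root.findall('./events/event'), 1):
        start = event.attrib['start'].replace('.', ',')
        end = event.attrib['end'].replace('.', ',')
        text = event.attrib['text'].replace('\\N', '\n')
        output += '%d\n%s --> %s\n%s\n\n' % (i, start, end, text)
    return output


# Hypothetical sample; the attribute value contains a literal backslash-N
sample = ET.fromstring(
    '<subtitle_script><events>'
    '<event start="0:00:01.00" end="0:00:02.50" text="Hello\\NWorld"/>'
    '</events></subtitle_script>')
srt = convert_subtitles_to_srt(sample)
# srt == '1\n0:00:01,00 --> 0:00:02,50\nHello\nWorld\n\n'
```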
    def _convert_subtitles_to_ass(self, sub_root):
        output = ''

        def ass_bool(strvalue):
            assvalue = '0'
            if strvalue == '1':
                assvalue = '-1'
            return assvalue

        output = '[Script Info]\n'
        output += 'Title: %s\n' % sub_root.attrib['title']
        output += 'ScriptType: v4.00+\n'
        output += 'WrapStyle: %s\n' % sub_root.attrib['wrap_style']
        output += 'PlayResX: %s\n' % sub_root.attrib['play_res_x']
        output += 'PlayResY: %s\n' % sub_root.attrib['play_res_y']
        output += """
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
"""
        for style in sub_root.findall('./styles/style'):
            output += 'Style: ' + style.attrib['name']
            output += ',' + style.attrib['font_name']
            output += ',' + style.attrib['font_size']
            output += ',' + style.attrib['primary_colour']
            output += ',' + style.attrib['secondary_colour']
            output += ',' + style.attrib['outline_colour']
            output += ',' + style.attrib['back_colour']
            output += ',' + ass_bool(style.attrib['bold'])
            output += ',' + ass_bool(style.attrib['italic'])
            output += ',' + ass_bool(style.attrib['underline'])
            output += ',' + ass_bool(style.attrib['strikeout'])
            output += ',' + style.attrib['scale_x']
            output += ',' + style.attrib['scale_y']
            output += ',' + style.attrib['spacing']
            output += ',' + style.attrib['angle']
            output += ',' + style.attrib['border_style']
            output += ',' + style.attrib['outline']
            output += ',' + style.attrib['shadow']
            output += ',' + style.attrib['alignment']
            output += ',' + style.attrib['margin_l']
            output += ',' + style.attrib['margin_r']
            output += ',' + style.attrib['margin_v']
            output += ',' + style.attrib['encoding']
            output += '\n'

        output += """
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""
        for event in sub_root.findall('./events/event'):
            output += 'Dialogue: 0'
            output += ',' + event.attrib['start']
            output += ',' + event.attrib['end']
            output += ',' + event.attrib['style']
            output += ',' + event.attrib['name']
            output += ',' + event.attrib['margin_l']
            output += ',' + event.attrib['margin_r']
            output += ',' + event.attrib['margin_v']
            output += ',' + event.attrib['effect']
            output += ',' + event.attrib['text']
            output += '\n'

        return output

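In `_convert_subtitles_to_ass`, each `[Events]` row is `Dialogue: 0` followed by the event attributes in the order given by the `Format:` line. `ass_bool` maps Crunchyroll's `'1'` to ASS's true value `-1` and everything else to `0`. The sketch below expresses the same row construction with `str.join`. `EVENT_FIELDS` and `dialogue_line` are hypothetical names, and the sample event's attribute values are invented.

```python
import xml.etree.ElementTree as ET

# Attribute order matching the [Events] "Format:" line (Layer is fixed at 0)
EVENT_FIELDS = ('start', 'end', 'style', 'name', 'margin_l', 'margin_r',
                'margin_v', 'effect', 'text')


def ass_bool(strvalue):
    # ASS uses -1 for true and 0 for false; the XML uses '1' for true
    return '-1' if strvalue == '1' else '0'


def dialogue_line(event):
    # Same string the diff builds with repeated `output += ',' + ...`
    return ','.join(['Dialogue: 0'] + [event.attrib[f] for f in EVENT_FIELDS])


event = ET.fromstring(
    '<event start="0:00:01.00" end="0:00:02.50" style="Default" name="" '
    'margin_l="0000" margin_r="0000" margin_v="0000" effect="" text="Hello"/>')
line = dialogue_line(event)
# line == 'Dialogue: 0,0:00:01.00,0:00:02.50,Default,,0000,0000,0000,,Hello'
```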
    def _extract_subtitles(self, subtitle):
        sub_root = compat_etree_fromstring(subtitle)
        return [{
            'ext': 'srt',
            'data': self._convert_subtitles_to_srt(sub_root),
        }, {
            'ext': 'ass',
            'data': self._convert_subtitles_to_ass(sub_root),
        }]

    def _get_subtitles(self, video_id, webpage):
        subtitles = {}
        for sub_id, sub_name in re.findall(r'\bssid=([0-9]+)"[^>]+?\btitle="([^"]+)', webpage):
            sub_doc = self._call_rpc_api(
                'Subtitle_GetXml', video_id,
                'Downloading subtitles for ' + sub_name, data={
                    'subtitle_script_id': sub_id,
                })
            if not isinstance(sub_doc, xml.etree.ElementTree.Element):
                continue
            sid = sub_doc.get('id')
            iv = xpath_text(sub_doc, 'iv', 'subtitle iv')
            data = xpath_text(sub_doc, 'data', 'subtitle data')
            if not sid or not iv or not data:
                continue
            subtitle = self._decrypt_subtitles(data, iv, sid).decode('utf-8')
            lang_code = self._search_regex(r'lang_code=["\']([^"\']+)', subtitle, 'subtitle_lang_code', fatal=False)
            if not lang_code:
                continue
            subtitles[lang_code] = self._extract_subtitles(subtitle)
        return subtitles

    def _real_extract(self, url):
        mobj = self._match_valid_url(url)
        video_id = mobj.group('id')

        if mobj.group('prefix') == 'm':
            mobile_webpage = self._download_webpage(url, video_id, 'Downloading mobile webpage')
            webpage_url = self._search_regex(r'<link rel="canonical" href="([^"]+)" />', mobile_webpage, 'webpage_url')
        else:
            webpage_url = 'http://www.' + mobj.group('url')

        webpage = self._download_webpage(
            self._add_skip_wall(webpage_url), video_id,
            headers=self.geo_verification_headers())
        if re.search(r'<div id="preload-data">', webpage):
            return self._redirect_to_beta(webpage, CrunchyrollBetaIE.ie_key(), video_id)
        note_m = self._html_search_regex(
            r'<div class="showmedia-trailer-notice">(.+?)</div>',
            webpage, 'trailer-notice', default='')
        if note_m:
            raise ExtractorError(note_m, expected=True)

        mobj = re.search(r'Page\.messaging_box_controller\.addItems\(\[(?P<msg>{.+?})\]\)', webpage)
        if mobj:
            msg = json.loads(mobj.group('msg'))
            if msg.get('type') == 'error':
                raise ExtractorError('crunchyroll returned error: %s' % msg['message_body'], expected=True)

        if 'To view this, please log in to verify you are 18 or older.' in webpage:
            self.raise_login_required()

        media = self._parse_json(self._search_regex(
            r'vilos\.config\.media\s*=\s*({.+?});',
            webpage, 'vilos media', default='{}'), video_id)
        media_metadata = media.get('metadata') or {}

        language = self._search_regex(
            r'(?:vilos\.config\.player\.language|LOCALE)\s*=\s*(["\'])(?P<lang>(?:(?!\1).)+)\1',
            webpage, 'language', default=None, group='lang')

        video_title = self._html_search_regex(
            (r'(?s)<h1[^>]*>((?:(?!<h1).)*?<(?:span[^>]+itemprop=["\']title["\']|meta[^>]+itemprop=["\']position["\'])[^>]*>(?:(?!<h1).)+?)</h1>',
             r'<title>(.+?),\s+-\s+.+? Crunchyroll'),
            webpage, 'video_title', default=None)
        if not video_title:
            video_title = re.sub(r'^Watch\s+', '', self._og_search_description(webpage))
        video_title = re.sub(r' {2,}', ' ', video_title)
        video_description = (self._parse_json(self._html_search_regex(
            r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
            webpage, 'description', default='{}'), video_id) or media_metadata).get('description')

        thumbnails = []
        thumbnail_url = (self._parse_json(self._html_search_regex(
            r'<script type="application\/ld\+json">\n\s*(.+?)<\/script>',
            webpage, 'thumbnail_url', default='{}'), video_id)).get('image')
        if thumbnail_url:
            thumbnails.append({
                'url': thumbnail_url,
                'width': 1920,
                'height': 1080
            })

        if video_description:
            video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
        video_uploader = self._html_search_regex(
            # try looking for both an uploader that's a link and one that's not
            [r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
            webpage, 'video_uploader', default=False)

        requested_languages = self._configuration_arg('language')
        requested_hardsubs = [('' if val == 'none' else val) for val in self._configuration_arg('hardsub')]
        language_preference = qualities((requested_languages or [language or ''])[::-1])
        hardsub_preference = qualities((requested_hardsubs or ['', language or ''])[::-1])

        formats = []
        for stream in media.get('streams', []):
            audio_lang = stream.get('audio_lang') or ''
            hardsub_lang = stream.get('hardsub_lang') or ''
            if (requested_languages and audio_lang.lower() not in requested_languages
                    or requested_hardsubs and hardsub_lang.lower() not in requested_hardsubs):
                continue
            vrv_formats = self._extract_vrv_formats(
                stream.get('url'), video_id, stream.get('format'),
                audio_lang, hardsub_lang)
            for f in vrv_formats:
                f['language_preference'] = language_preference(audio_lang)
                f['quality'] = hardsub_preference(hardsub_lang)
            formats.extend(vrv_formats)
        if not formats:
            available_fmts = []
            for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
                attrs = extract_attributes(a)
                href = attrs.get('href')
                if href and '/freetrial' in href:
                    continue
                available_fmts.append(fmt)
            if not available_fmts:
                for p in (r'token=["\']showmedia\.([0-9]{3,4})p"', r'showmedia\.([0-9]{3,4})p'):
                    available_fmts = re.findall(p, webpage)
                    if available_fmts:
                        break
            if not available_fmts:
                available_fmts = self._FORMAT_IDS.keys()
            video_encode_ids = []

            for fmt in available_fmts:
                stream_quality, stream_format = self._FORMAT_IDS[fmt]
                video_format = fmt + 'p'
                stream_infos = []
                streamdata = self._call_rpc_api(
                    'VideoPlayer_GetStandardConfig', video_id,
                    'Downloading media info for %s' % video_format, data={
                        'media_id': video_id,
                        'video_format': stream_format,
                        'video_quality': stream_quality,
                        'current_page': url,
                    })
                if isinstance(streamdata, xml.etree.ElementTree.Element):
                    stream_info = streamdata.find('./{default}preload/stream_info')
                    if stream_info is not None:
                        stream_infos.append(stream_info)
                stream_info = self._call_rpc_api(
                    'VideoEncode_GetStreamInfo', video_id,
                    'Downloading stream info for %s' % video_format, data={
                        'media_id': video_id,
                        'video_format': stream_format,
                        'video_encode_quality': stream_quality,
                    })
                if isinstance(stream_info, xml.etree.ElementTree.Element):
                    stream_infos.append(stream_info)
                for stream_info in stream_infos:
                    video_encode_id = xpath_text(stream_info, './video_encode_id')
                    if video_encode_id in video_encode_ids:
                        continue
                    video_encode_ids.append(video_encode_id)

                    video_file = xpath_text(stream_info, './file')
                    if not video_file:
                        continue
                    if video_file.startswith('http'):
                        formats.extend(self._extract_m3u8_formats(
                            video_file, video_id, 'mp4', entry_protocol='m3u8_native',
                            m3u8_id='hls', fatal=False))
                        continue

                    video_url = xpath_text(stream_info, './host')
                    if not video_url:
                        continue
                    metadata = stream_info.find('./metadata')
                    format_info = {
                        'format': video_format,
                        'height': int_or_none(xpath_text(metadata, './height')),
                        'width': int_or_none(xpath_text(metadata, './width')),
                    }

                    if '.fplive.net/' in video_url:
                        video_url = re.sub(r'^rtmpe?://', 'http://', video_url.strip())
                        parsed_video_url = compat_urlparse.urlparse(video_url)
                        direct_video_url = compat_urlparse.urlunparse(parsed_video_url._replace(
                            netloc='v.lvlt.crcdn.net',
                            path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_file.split(':')[-1])))
                        if self._is_valid_url(direct_video_url, video_id, video_format):
                            format_info.update({
                                'format_id': 'http-' + video_format,
                                'url': direct_video_url,
                            })
                            formats.append(format_info)
                            continue

                    format_info.update({
                        'format_id': 'rtmp-' + video_format,
                        'url': video_url,
                        'play_path': video_file,
                        'ext': 'flv',
                    })
                    formats.append(format_info)
        self._sort_formats(formats)

        metadata = self._call_rpc_api(
            'VideoPlayer_GetMediaMetadata', video_id,
            note='Downloading media info', data={
                'media_id': video_id,
            })

        subtitles = {}
        for subtitle in media.get('subtitles', []):
            subtitle_url = subtitle.get('url')
            if not subtitle_url:
                continue
            subtitles.setdefault(subtitle.get('language', 'enUS'), []).append({
                'url': subtitle_url,
                'ext': subtitle.get('format', 'ass'),
            })
        if not subtitles:
            subtitles = self.extract_subtitles(video_id, webpage)

        # webpage provide more accurate data than series_title from XML
        series = self._html_search_regex(
            r'(?s)<h\d[^>]+\bid=["\']showmedia_about_episode_num[^>]+>(.+?)</h\d',
            webpage, 'series', fatal=False)

        season = episode = episode_number = duration = None

        if isinstance(metadata, xml.etree.ElementTree.Element):
            season = xpath_text(metadata, 'series_title')
            episode = xpath_text(metadata, 'episode_title')
            episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
            duration = float_or_none(media_metadata.get('duration'), 1000)

        if not episode:
            episode = media_metadata.get('title')
        if not episode_number:
            episode_number = int_or_none(media_metadata.get('episode_number'))
        thumbnail_url = try_get(media, lambda x: x['thumbnail']['url'])
        if thumbnail_url:
            thumbnails.append({
                'url': thumbnail_url,
                'width': 640,
                'height': 360
            })

        season_number = int_or_none(self._search_regex(
            r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
            webpage, 'season number', default=None))

        info = self._search_json_ld(webpage, video_id, default={})

        return merge_dicts({
            'id': video_id,
            'title': video_title,
            'description': video_description,
            'duration': duration,
            'thumbnails': thumbnails,
            'uploader': video_uploader,
            'series': series,
            'season': season,
            'season_number': season_number,
            'episode': episode,
            'episode_number': episode_number,
            'subtitles': subtitles,
            'formats': formats,
        }, info)


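The removed `_real_extract` ranks VRV streams with `qualities()` on reversed lists. As a result, the first requested audio language (or, failing that, the page language) scores highest. When no `hardsub` argument is given, a stream without hardsubs (`''`) outranks one hardsubbed in the page language. The sketch below re-implements `qualities` locally instead of importing it. This stand-in is assumed to behave like the yt-dlp helper (list index, or -1 when absent). `build_preferences` is a hypothetical wrapper around the four lines in the diff.

```python
def qualities(quality_ids):
    # Assumed to mirror yt_dlp.utils.qualities: position in the list,
    # so later entries rank higher; unknown values get -1
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


def build_preferences(requested_languages, requested_hardsubs, page_language):
    # 'none' in the hardsub extractor argument means "no hardsub", stored as ''
    requested_hardsubs = ['' if val == 'none' else val for val in requested_hardsubs]
    # Reversing makes the FIRST requested value the one with the highest index
    language_preference = qualities((requested_languages or [page_language or ''])[::-1])
    hardsub_preference = qualities((requested_hardsubs or ['', page_language or ''])[::-1])
    return language_preference, hardsub_preference


lang_pref, hardsub_pref = build_preferences(['ja-jp', 'en-us'], [], 'enUS')
```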
class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
    IE_NAME = 'crunchyroll:playlist'
    _VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?:\w{2}(?:-\w{2})?/)?(?!(?:news|anime-news|library|forum|launchcalendar|lineup|store|comics|freetrial|login|media-\d+))(?P<id>[\w\-]+))/?(?:\?|$)'

    _TESTS = [{
        'url': 'https://www.crunchyroll.com/a-bridge-to-the-starry-skies-hoshizora-e-kakaru-hashi',
        'info_dict': {
            'id': 'a-bridge-to-the-starry-skies-hoshizora-e-kakaru-hashi',
            'title': 'A Bridge to the Starry Skies - Hoshizora e Kakaru Hashi'
        },
        'playlist_count': 13,
    }, {
        # geo-restricted (US), 18+ maturity wall, non-premium available
        'url': 'http://www.crunchyroll.com/cosplay-complex-ova',
        'info_dict': {
            'id': 'cosplay-complex-ova',
            'title': 'Cosplay Complex OVA'
        },
        'playlist_count': 3,
        'skip': 'Georestricted',
    }, {
        # geo-restricted (US), 18+ maturity wall, non-premium will be available since 2015.11.14
        'url': 'http://www.crunchyroll.com/ladies-versus-butlers?skip_wall=1',
        'only_matching': True,
    }, {
        'url': 'http://www.crunchyroll.com/fr/ladies-versus-butlers',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        show_id = self._match_id(url)

        webpage = self._download_webpage(
            # https:// gives a 403, but http:// does not
            self._add_skip_wall(url).replace('https://', 'http://'), show_id,
            headers=self.geo_verification_headers())
        if re.search(r'<div id="preload-data">', webpage):
            return self._redirect_to_beta(webpage, CrunchyrollBetaShowIE.ie_key(), show_id)
        title = self._html_search_meta('name', webpage, default=None)

        episode_re = r'<li id="showview_videos_media_(\d+)"[^>]+>.*?<a href="([^"]+)"'
        season_re = r'<a [^>]+season-dropdown[^>]+>([^<]+)'
        paths = re.findall(f'(?s){episode_re}|{season_re}', webpage)

        entries, current_season = [], None
        for ep_id, ep, season in paths:
            if season:
                current_season = season
                continue
            entries.append(self.url_result(
                f'http://www.crunchyroll.com{ep}', CrunchyrollIE.ie_key(), ep_id, season=current_season))

        return {
            '_type': 'playlist',
            'id': show_id,
            'title': title,
            'entries': reversed(entries),
        }


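The playlist extractor runs one `re.findall` over the alternation `episode_re|season_re`, so episodes and season headers come back in page order. Each match is a 3-tuple, and the groups of the alternative that did not match are returned as `''`. That is what lets the loop tell a season header from an episode. The sketch below keeps the diff's two patterns but returns plain `(season, id, url)` tuples instead of `url_result` objects. The HTML sample is hypothetical.

```python
import re

episode_re = r'<li id="showview_videos_media_(\d+)"[^>]+>.*?<a href="([^"]+)"'
season_re = r'<a [^>]+season-dropdown[^>]+>([^<]+)'


def list_episodes(webpage):
    entries, current_season = [], None
    for ep_id, ep, season in re.findall(f'(?s){episode_re}|{season_re}', webpage):
        if season:
            # A season header applies to every episode listed after it
            current_season = season
            continue
        entries.append((current_season, ep_id, f'http://www.crunchyroll.com{ep}'))
    # The extractor also returns reversed(entries)
    return list(reversed(entries))


# Hypothetical page fragment, newest episode listed first
page = ('<a class="season-dropdown" title="S1">Season 1</a>'
        '<li id="showview_videos_media_102" class="ep"><a href="/show/episode-2-102">'
        '<li id="showview_videos_media_101" class="ep"><a href="/show/episode-1-101">')
episodes = list_episodes(page)
```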
class CrunchyrollBetaBaseIE(CrunchyrollBaseIE):
    params = None

    def _get_params(self, lang):
        if not CrunchyrollBetaBaseIE.params:
            if self._get_cookies(f'https://beta.crunchyroll.com/{lang}').get('etp_rt'):
        if not CrunchyrollBaseIE.params:
            if self._get_cookies(f'https://www.crunchyroll.com/{lang}').get('etp_rt'):
                grant_type, key = 'etp_rt_cookie', 'accountAuthClientId'
            else:
                grant_type, key = 'client_id', 'anonClientId'

            initial_state, app_config = self._get_beta_embedded_json(self._download_webpage(
                f'https://beta.crunchyroll.com/{lang}', None, note='Retrieving main page'), None)
            api_domain = app_config['cxApiParams']['apiDomain']
            initial_state, app_config = self._get_embedded_json(self._download_webpage(
                f'https://www.crunchyroll.com/{lang}', None, note='Retrieving main page'), None)
            api_domain = app_config['cxApiParams']['apiDomain'].replace('beta.crunchyroll.com', 'www.crunchyroll.com')

            auth_response = self._download_json(
                f'{api_domain}/auth/v1/token', None, note=f'Authenticating with grant_type={grant_type}',
@ -739,7 +77,7 @@ def _get_params(self, lang):
                headers={
                    'Authorization': auth_response['token_type'] + ' ' + auth_response['access_token']
                })
            cms = traverse_obj(policy_response, 'cms_beta', 'cms')
            cms = policy_response.get('cms_web')
            bucket = cms['bucket']
            params = {
                'Policy': cms['policy'],
@ -749,19 +87,19 @@ def _get_params(self, lang):
            locale = traverse_obj(initial_state, ('localization', 'locale'))
            if locale:
                params['locale'] = locale
            CrunchyrollBetaBaseIE.params = (api_domain, bucket, params)
        return CrunchyrollBetaBaseIE.params
            CrunchyrollBaseIE.params = (api_domain, bucket, params)
        return CrunchyrollBaseIE.params


class CrunchyrollBetaIE(CrunchyrollBetaBaseIE):
    IE_NAME = 'crunchyroll:beta'
class CrunchyrollBetaIE(CrunchyrollBaseIE):
    IE_NAME = 'crunchyroll'
    _VALID_URL = r'''(?x)
        https?://beta\.crunchyroll\.com/
        https?://(?:beta|www)\.crunchyroll\.com/
        (?P<lang>(?:\w{2}(?:-\w{2})?/)?)
        watch/(?P<id>\w+)
        (?:/(?P<display_id>[\w-]+))?/?(?:[?#]|$)'''
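The new `_VALID_URL` accepts both `beta.` and `www.` hosts, an optional `xx/` or `xx-yy/` language prefix, and an optional slug after the episode id. The sketch below checks the pattern against URLs taken from the tests in this diff. `parse_watch_url` is a hypothetical helper. Inside a character class, `re.VERBOSE` does not treat `#` as a comment, so `[?#]` works as written.

```python
import re

# New pattern from the diff (verbose mode: whitespace and newlines ignored)
VALID_URL = r'''(?x)
    https?://(?:beta|www)\.crunchyroll\.com/
    (?P<lang>(?:\w{2}(?:-\w{2})?/)?)
    watch/(?P<id>\w+)
    (?:/(?P<display_id>[\w-]+))?/?(?:[?#]|$)'''


def parse_watch_url(url):
    # Returns (lang, id, display_id), or None when the URL does not match
    mobj = re.match(VALID_URL, url)
    return mobj and mobj.group('lang', 'id', 'display_id')


parts = parse_watch_url('https://beta.crunchyroll.com/pt-br/watch/G8WUN8VKP/the-ruler-of-conspiracy')
# parts == ('pt-br/', 'G8WUN8VKP', 'the-ruler-of-conspiracy')
```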
    _TESTS = [{
        'url': 'https://beta.crunchyroll.com/watch/GY2P1Q98Y/to-the-future',
        'url': 'https://www.crunchyroll.com/watch/GY2P1Q98Y/to-the-future',
        'info_dict': {
            'id': 'GY2P1Q98Y',
            'ext': 'mp4',
@ -777,11 +115,11 @@ class CrunchyrollBetaIE(CrunchyrollBetaBaseIE):
            'season_number': 1,
            'episode': 'To the Future',
            'episode_number': 73,
            'thumbnail': r're:^https://beta.crunchyroll.com/imgsrv/.*\.jpeg$',
            'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg$',
        },
        'params': {'skip_download': 'm3u8', 'format': 'all[format_id~=hardsub]'},
    }, {
        'url': 'https://beta.crunchyroll.com/watch/GYE5WKQGR',
        'url': 'https://www.crunchyroll.com/watch/GYE5WKQGR',
        'info_dict': {
            'id': 'GYE5WKQGR',
            'ext': 'mp4',
@ -797,12 +135,12 @@ class CrunchyrollBetaIE(CrunchyrollBetaBaseIE):
            'season_number': 1,
            'episode': 'Porter Robinson presents Shelter the Animation',
            'episode_number': 0,
            'thumbnail': r're:^https://beta.crunchyroll.com/imgsrv/.*\.jpeg$',
            'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg$',
        },
        'params': {'skip_download': True},
        'skip': 'Video is Premium only',
    }, {
        'url': 'https://beta.crunchyroll.com/watch/GY2P1Q98Y',
        'url': 'https://www.crunchyroll.com/watch/GY2P1Q98Y',
        'only_matching': True,
    }, {
        'url': 'https://beta.crunchyroll.com/pt-br/watch/G8WUN8VKP/the-ruler-of-conspiracy',
@ -901,15 +239,15 @@ def _real_extract(self, url):
        }


class CrunchyrollBetaShowIE(CrunchyrollBetaBaseIE):
    IE_NAME = 'crunchyroll:playlist:beta'
class CrunchyrollBetaShowIE(CrunchyrollBaseIE):
    IE_NAME = 'crunchyroll:playlist'
    _VALID_URL = r'''(?x)
        https?://beta\.crunchyroll\.com/
        https?://(?:beta|www)\.crunchyroll\.com/
        (?P<lang>(?:\w{2}(?:-\w{2})?/)?)
        series/(?P<id>\w+)
        (?:/(?P<display_id>[\w-]+))?/?(?:[?#]|$)'''
    _TESTS = [{
        'url': 'https://beta.crunchyroll.com/series/GY19NQ2QR/Girl-Friend-BETA',
        'url': 'https://www.crunchyroll.com/series/GY19NQ2QR/Girl-Friend-BETA',
        'info_dict': {
            'id': 'GY19NQ2QR',
            'title': 'Girl Friend BETA',
@ -942,7 +280,7 @@ def entries():
                    episode_display_id = episode['slug_title']
                    yield {
                        '_type': 'url',
                        'url': f'https://beta.crunchyroll.com/{lang}watch/{episode_id}/{episode_display_id}',
                        'url': f'https://www.crunchyroll.com/{lang}watch/{episode_id}/{episode_display_id}',
                        'ie_key': CrunchyrollBetaIE.ie_key(),
                        'id': episode_id,
                        'title': '%s Episode %s – %s' % (episode.get('season_title'), episode.get('episode'), episode.get('title')),
|
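The show extractor's `_VALID_URL` now accepts both the `beta.` and the `www.` host, and `entries()` builds episode URLs on `www.crunchyroll.com`. Below is a minimal standalone sketch of that matching and URL-building step using only `re`. The pattern is copied from the diff. `parse_series_url` and `episode_url` are hypothetical helpers for illustration, not yt-dlp functions.

```python
import re

# Verbose pattern copied from CrunchyrollBetaShowIE._VALID_URL (new side of the diff)
SERIES_URL_RE = re.compile(r'''(?x)
    https?://(?:beta|www)\.crunchyroll\.com/
    (?P<lang>(?:\w{2}(?:-\w{2})?/)?)
    series/(?P<id>\w+)
    (?:/(?P<display_id>[\w-]+))?/?(?:[?#]|$)''')


def parse_series_url(url):
    """Return (lang, series_id, display_id), or None if the URL is not a series page."""
    m = SERIES_URL_RE.match(url)
    if not m:
        return None
    return m.group('lang'), m.group('id'), m.group('display_id')


def episode_url(lang, episode_id, slug):
    # `lang` is either empty or keeps its trailing slash (e.g. 'pt-br/'), as in entries()
    return f'https://www.crunchyroll.com/{lang}watch/{episode_id}/{slug}'


# parse_series_url('https://beta.crunchyroll.com/series/GY19NQ2QR/Girl-Friend-BETA')
# -> ('', 'GY19NQ2QR', 'Girl-Friend-BETA')
```

Because `lang` carries its own trailing slash, the f-string in `entries()` has no slash between `{lang}` and `watch`.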
76
yt_dlp/extractor/deuxm.py
Normal file
@@ -0,0 +1,76 @@
from .common import InfoExtractor
from ..utils import url_or_none


class DeuxMIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?2m\.ma/[^/]+/replay/single/(?P<id>([\w.]{1,24})+)'

    _TESTS = [{
        'url': 'https://2m.ma/fr/replay/single/6351d439b15e1a613b3debe8',
        'md5': '5f761f04c9d686e553b685134dca5d32',
        'info_dict': {
            'id': '6351d439b15e1a613b3debe8',
            'ext': 'mp4',
            'title': 'Grand Angle : Jeudi 20 Octobre 2022',
            'thumbnail': r're:^https?://2msoread-ww.amagi.tv/mediasfiles/videos/images/.*\.png$'
        }
    }, {
        'url': 'https://2m.ma/fr/replay/single/635c0aeab4eec832622356da',
        'md5': 'ad6af2f5e4d5b2ad2194a84b6e890b4c',
        'info_dict': {
            'id': '635c0aeab4eec832622356da',
            'ext': 'mp4',
            'title': 'Journal Amazigh : Vendredi 28 Octobre 2022',
            'thumbnail': r're:^https?://2msoread-ww.amagi.tv/mediasfiles/videos/images/.*\.png$'
        }
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        video = self._download_json(
            f'https://2m.ma/api/watchDetail/{video_id}', video_id)['response']['News']
        return {
            'id': video_id,
            'title': video.get('titre'),
            'url': video['url'],
            'description': video.get('description'),
            'thumbnail': url_or_none(video.get('image')),
        }


class DeuxMNewsIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?2m\.ma/(?P<lang>\w+)/news/(?P<id>[^/#?]+)'

    _TESTS = [{
        'url': 'https://2m.ma/fr/news/Kan-Ya-Mkan-d%C3%A9poussi%C3%A8re-l-histoire-du-phare-du-Cap-Beddouza-20221028',
        'md5': '43d5e693a53fa0b71e8a5204c7d4542a',
        'info_dict': {
            'id': '635c5d1233b83834e35b282e',
            'ext': 'mp4',
            'title': 'Kan Ya Mkan d\u00e9poussi\u00e8re l\u2019histoire du phare du Cap Beddouza',
            'description': 'md5:99dcf29b82f1d7f2a4acafed1d487527',
            'thumbnail': r're:^https?://2msoread-ww.amagi.tv/mediasfiles/videos/images/.*\.png$'
        }
    }, {
        'url': 'https://2m.ma/fr/news/Interview-Casablanca-hors-des-sentiers-battus-avec-Abderrahim-KASSOU-Replay--20221017',
        'md5': '7aca29f02230945ef635eb8290283c0c',
        'info_dict': {
            'id': '634d9e108b70d40bc51a844b',
            'ext': 'mp4',
            'title': 'Interview: Casablanca hors des sentiers battus avec Abderrahim KASSOU (Replay) ',
            'description': 'md5:3b8e78111de9fcc6ef7f7dd6cff2430c',
            'thumbnail': r're:^https?://2msoread-ww.amagi.tv/mediasfiles/videos/images/.*\.png$'
        }
    }]

    def _real_extract(self, url):
        article_name, lang = self._match_valid_url(url).group('id', 'lang')
        video = self._download_json(
            f'https://2m.ma/api/articlesByUrl?lang={lang}&url=/news/{article_name}', article_name)['response']['article'][0]
        return {
            'id': video['id'],
            'title': video.get('title'),
            'url': video['image'][0],
            'description': video.get('content'),
            'thumbnail': url_or_none(video.get('cover')),
        }
@@ -1,4 +1,5 @@
from .common import InfoExtractor
from ..utils import extract_attributes, get_element_html_by_id


class EpochIE(InfoExtractor):
@@ -28,13 +29,21 @@ class EpochIE(InfoExtractor):
'title': 'Kash Patel: A ‘6-Year-Saga’ of Government Corruption, From Russiagate to Mar-a-Lago',
}
},
{
'url': 'https://www.theepochtimes.com/dick-morris-discusses-his-book-the-return-trumps-big-2024-comeback_4819205.html',
'info_dict': {
'id': '9489f994-2a20-4812-b233-ac0e5c345632',
'ext': 'mp4',
'title': 'Dick Morris Discusses His Book ‘The Return: Trump’s Big 2024 Comeback’',
}
},
]

def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)

youmaker_video_id = self._search_regex(r'data-trailer="[\w-]+" data-id="([\w-]+)"', webpage, 'url')
youmaker_video_id = extract_attributes(get_element_html_by_id('videobox', webpage))['data-id']
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
f'http://vs1.youmaker.com/assets/{youmaker_video_id}/playlist.m3u8', video_id, 'mp4', m3u8_id='hls')
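The rewritten `_real_extract` reads `data-id` from the element whose `id` is `videobox`. The old regex required `data-trailer` to appear immediately before `data-id`; the new lookup does not depend on attribute order. Below is a rough standard-library equivalent of that lookup using `html.parser`. `find_attrs_by_id` and `youmaker_playlist_url` are hypothetical names, and the markup in the test is invented.

```python
from html.parser import HTMLParser


class _AttrsById(HTMLParser):
    """Remember the attributes of the first start tag whose id matches."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.attrs is None and attrs.get('id') == self.element_id:
            self.attrs = attrs


def find_attrs_by_id(html, element_id):
    parser = _AttrsById(element_id)
    parser.feed(html)
    parser.close()
    return parser.attrs or {}


def youmaker_playlist_url(webpage):
    # Attribute order no longer matters, unlike the old data-trailer/data-id regex
    video_id = find_attrs_by_id(webpage, 'videobox')['data-id']
    return f'http://vs1.youmaker.com/assets/{video_id}/playlist.m3u8'
```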
@@ -75,6 +75,29 @@ def _real_extract(self, url):
return info


class FoxNewsVideoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?foxnews\.com/video/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.foxnews.com/video/6313058664112',
'info_dict': {
'id': '6313058664112',
'ext': 'mp4',
'thumbnail': r're:https://.+/1280x720/match/image\.jpg',
'upload_date': '20220930',
'description': 'New York City, Kids Therapy, Biden',
'duration': 2415,
'title': 'Gutfeld! - Thursday, September 29',
'timestamp': 1664527538,
},
'expected_warnings': ['Ignoring subtitle tracks'],
'params': {'skip_download': 'm3u8'},
}]

def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(f'https://video.foxnews.com/v/{video_id}', FoxNewsIE, video_id)


class FoxNewsArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:insider\.)?foxnews\.com/(?!v)([^/]+/)+(?P<id>[a-z-]+)'
IE_NAME = 'foxnews:article'
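The new `FoxNewsVideoIE` only rewrites `foxnews.com/video/<id>` into the `video.foxnews.com/v/<id>` form that `FoxNewsIE` handles. The `(?!v)` lookahead keeps `FoxNewsArticleIE` from also claiming those URLs. Below is a small `re`-only sketch of that routing using the two patterns from the diff. `classify` and `video_page_url` are hypothetical helper names.

```python
import re

# Patterns copied from FoxNewsVideoIE and FoxNewsArticleIE above
VIDEO_RE = r'https?://(?:www\.)?foxnews\.com/video/(?P<id>\d+)'
ARTICLE_RE = r'https?://(?:www\.)?(?:insider\.)?foxnews\.com/(?!v)([^/]+/)+(?P<id>[a-z-]+)'


def classify(url):
    """Return ('video', id), ('article', id), or None if neither pattern matches."""
    for kind, pattern in (('video', VIDEO_RE), ('article', ARTICLE_RE)):
        m = re.match(pattern, url)
        if m:
            return kind, m.group('id')
    return None


def video_page_url(video_id):
    # FoxNewsVideoIE delegates to this URL form via url_result()
    return f'https://video.foxnews.com/v/{video_id}'
```

The lookahead rejects any first path segment that starts with `v`, not only `video`. A hypothetical `/veterans/...` path would therefore also fall outside the article pattern.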
86
yt_dlp/extractor/listennotes.py
Normal file
@@ -0,0 +1,86 @@
import re

from .common import InfoExtractor
from ..utils import (
    clean_html,
    extract_attributes,
    get_element_by_class,
    get_element_html_by_id,
    get_element_text_and_html_by_tag,
    parse_duration,
    strip_or_none,
    traverse_obj,
    try_call,
)


class ListenNotesIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?listennotes\.com/podcasts/[^/]+/[^/]+-(?P<id>.+)/'
    _TESTS = [{
        'url': 'https://www.listennotes.com/podcasts/thriving-on-overload/tim-oreilly-on-noticing-KrDgvNb_u1n/',
        'md5': '5b91a32f841e5788fb82b72a1a8af7f7',
        'info_dict': {
            'id': 'KrDgvNb_u1n',
            'ext': 'mp3',
            'title': 'md5:32236591a921adf17bbdbf0441b6c0e9',
            'description': 'md5:c581ed197eeddcee55a67cdb547c8cbd',
            'duration': 2148.0,
            'channel': 'Thriving on Overload',
            'channel_id': 'ed84wITivxF',
            'episode_id': 'e1312583fa7b4e24acfbb5131050be00',
            'thumbnail': 'https://production.listennotes.com/podcasts/thriving-on-overload-ross-dawson-1wb_KospA3P-ed84wITivxF.300x300.jpg',
            'channel_url': 'https://www.listennotes.com/podcasts/thriving-on-overload-ross-dawson-ed84wITivxF/',
            'cast': ['Tim O’Reilly', 'Cookie Monster', 'Lao Tzu', 'Wallace Steven', 'Eric Raymond', 'Christine Peterson', 'John Maynard Keyne', 'Ross Dawson'],
        }
    }, {
        'url': 'https://www.listennotes.com/podcasts/ask-noah-show/episode-177-wireguard-with-lwEA3154JzG/',
        'md5': '62fb4ffe7fc525632a1138bf72a5ce53',
        'info_dict': {
            'id': 'lwEA3154JzG',
            'ext': 'mp3',
            'title': 'Episode 177: WireGuard with Jason Donenfeld',
            'description': 'md5:24744f36456a3e95f83c1193a3458594',
            'duration': 3861.0,
            'channel': 'Ask Noah Show',
            'channel_id': '4DQTzdS5-j7',
            'episode_id': '8c8954b95e0b4859ad1eecec8bf6d3a4',
            'channel_url': 'https://www.listennotes.com/podcasts/ask-noah-show-noah-j-chelliah-4DQTzdS5-j7/',
            'thumbnail': 'https://production.listennotes.com/podcasts/ask-noah-show-noah-j-chelliah-cfbRUw9Gs3F-4DQTzdS5-j7.300x300.jpg',
            'cast': ['noah showlink', 'noah show', 'noah dashboard', 'jason donenfeld'],
        }
    }]

    def _clean_description(self, description):
        return clean_html(re.sub(r'(</?(div|p)>\s*)+', '<br/><br/>', description or ''))

    def _real_extract(self, url):
        audio_id = self._match_id(url)
        webpage = self._download_webpage(url, audio_id)
        data = self._search_json(
            r'<script id="original-content"[^>]+\btype="application/json">', webpage, 'content', audio_id)
        data.update(extract_attributes(get_element_html_by_id(
            r'episode-play-button-toolbar|episode-no-play-button-toolbar', webpage, escape_value=False)))

        duration, description = self._search_regex(
            r'(?P<duration>[\d:]+)\s*-\s*(?P<description>.+)',
            self._html_search_meta(['og:description', 'description', 'twitter:description'], webpage),
            'description', fatal=False, group=('duration', 'description')) or (None, None)

        return {
            'id': audio_id,
            'url': data['audio'],
            'title': (data.get('data-title')
                      or try_call(lambda: get_element_text_and_html_by_tag('h1', webpage)[0])
                      or self._html_search_meta(('og:title', 'title', 'twitter:title'), webpage, 'title')),
            'description': (self._clean_description(get_element_by_class('ln-text-p', webpage))
                            or strip_or_none(description)),
            'duration': parse_duration(traverse_obj(data, 'audio_length', 'data-duration') or duration),
            'episode_id': traverse_obj(data, 'uuid', 'data-episode-uuid'),
            **traverse_obj(data, {
                'thumbnail': 'data-image',
                'channel': 'data-channel-title',
                'cast': ('nlp_entities', ..., 'name'),
                'channel_url': 'channel_url',
                'channel_id': 'channel_short_uuid',
            })
        }
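When the embedded JSON has neither `audio_length` nor `data-duration`, `_real_extract` falls back to the meta description. Its regex expects that description to begin with a duration such as `35:48 - `. Separately, `_clean_description` turns runs of `<p>`/`<div>` tags into double line breaks before calling `clean_html`. Below is a rough stdlib-only sketch of both string steps. `hms_to_seconds` is a simplified stand-in for yt-dlp's `parse_duration`, and the other helper names are hypothetical.

```python
import re

# Same pattern as the og:description fallback in ListenNotesIE
DESC_RE = re.compile(r'(?P<duration>[\d:]+)\s*-\s*(?P<description>.+)')


def hms_to_seconds(value):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' to seconds (simplified stand-in for parse_duration)."""
    seconds = 0
    for part in value.split(':'):
        seconds = seconds * 60 + int(part or 0)
    return seconds


def split_meta_description(text):
    """Split a '<duration> - <description>' string; (None, None) if it does not fit."""
    m = DESC_RE.search(text or '')
    if not m:
        return None, None
    return hms_to_seconds(m.group('duration')), m.group('description').strip()


def paragraphs_to_breaks(html):
    # Same substitution _clean_description applies before clean_html()
    return re.sub(r'(</?(div|p)>\s*)+', '<br/><br/>', html or '')
```

For example, `'1:04:21'` gives 3861 seconds, the same value as the `duration` in the second test above.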
@@ -1,8 +1,12 @@
import re

from .common import InfoExtractor
from ..utils import (
determine_ext,
extract_attributes,
int_or_none,
str_to_int,
url_or_none,
urlencode_postdata,
)

@@ -17,17 +21,20 @@ class ManyVidsIE(InfoExtractor):
'id': '133957',
'ext': 'mp4',
'title': 'everthing about me (Preview)',
'uploader': 'ellyxxix',
'view_count': int,
'like_count': int,
},
}, {
# full video
'url': 'https://www.manyvids.com/Video/935718/MY-FACE-REVEAL/',
'md5': 'f3e8f7086409e9b470e2643edb96bdcc',
'md5': 'bb47bab0e0802c2a60c24ef079dfe60f',
'info_dict': {
'id': '935718',
'ext': 'mp4',
'title': 'MY FACE REVEAL',
'description': 'md5:ec5901d41808b3746fed90face161612',
'uploader': 'Sarah Calanthe',
'view_count': int,
'like_count': int,
},
@@ -36,17 +43,50 @@ class ManyVidsIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)

real_url = 'https://www.manyvids.com/video/%s/gtm.js' % (video_id, )
try:
webpage = self._download_webpage(real_url, video_id)
except Exception:
# probably useless fallback
webpage = self._download_webpage(url, video_id)

video_url = self._search_regex(
r'data-(?:video-filepath|meta-video)\s*=s*(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'video URL', group='url')
info = self._search_regex(
r'''(<div\b[^>]*\bid\s*=\s*(['"])pageMetaDetails\2[^>]*>)''',
webpage, 'meta details', default='')
info = extract_attributes(info)

title = self._html_search_regex(
player = self._search_regex(
r'''(<div\b[^>]*\bid\s*=\s*(['"])rmpPlayerStream\2[^>]*>)''',
webpage, 'player details', default='')
player = extract_attributes(player)

video_urls_and_ids = (
(info.get('data-meta-video'), 'video'),
(player.get('data-video-transcoded'), 'transcoded'),
(player.get('data-video-filepath'), 'filepath'),
(self._og_search_video_url(webpage, secure=False, default=None), 'og_video'),
)

def txt_or_none(s, default=None):
return (s.strip() or default) if isinstance(s, str) else default

uploader = txt_or_none(info.get('data-meta-author'))

def mung_title(s):
if uploader:
s = re.sub(r'^\s*%s\s+[|-]' % (re.escape(uploader), ), '', s)
return txt_or_none(s)

title = (
mung_title(info.get('data-meta-title'))
or self._html_search_regex(
(r'<span[^>]+class=["\']item-title[^>]+>([^<]+)',
r'<h2[^>]+class=["\']h2 m-0["\'][^>]*>([^<]+)'),
webpage, 'title', default=None) or self._html_search_meta(
'twitter:title', webpage, 'title', fatal=True)
webpage, 'title', default=None)
or self._html_search_meta(
'twitter:title', webpage, 'title', fatal=True))

title = re.sub(r'\s*[|-]\s+ManyVids\s*$', '', title) or title

if any(p in webpage for p in ('preview_videos', '_preview.mp4')):
title += ' (Preview)'
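The new ManyVids title logic works in three steps:

1. It prefers `data-meta-title`.
2. It strips a leading `<uploader> -` or `<uploader> |` prefix via `mung_title`.
3. It removes a trailing `| ManyVids` suffix.

Below is a condensed standalone sketch of those string steps. The HTML and `twitter:title` fallbacks are omitted. `clean_title` is a hypothetical name, and the sample titles in the test are invented.

```python
import re


def txt_or_none(s, default=None):
    # Mirrors the helper in the diff: strip strings, map empty or non-str values to default
    return (s.strip() or default) if isinstance(s, str) else default


def clean_title(raw_title, uploader=None):
    """Drop an '<uploader> -' / '<uploader> |' prefix and a trailing '| ManyVids' suffix."""
    title = raw_title
    if uploader:
        title = re.sub(r'^\s*%s\s+[|-]' % re.escape(uploader), '', title)
    title = txt_or_none(title)
    if title is None:
        return None
    # Keep the original text if removing the suffix would leave nothing
    return re.sub(r'\s*[|-]\s+ManyVids\s*$', '', title) or title
```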
@@ -59,7 +99,8 @@ def _real_extract(self, url):
# Sets some cookies
self._download_webpage(
'https://www.manyvids.com/includes/ajax_repository/you_had_me_at_hello.php',
video_id, fatal=False, data=urlencode_postdata({
video_id, note='Setting format cookies', fatal=False,
data=urlencode_postdata({
'mvtoken': mv_token,
'vid': video_id,
}), headers={
@@ -67,24 +108,56 @@ def _real_extract(self, url):
'X-Requested-With': 'XMLHttpRequest'
})

if determine_ext(video_url) == 'm3u8':
formats = self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
formats = []
for v_url, fmt in video_urls_and_ids:
v_url = url_or_none(v_url)
if not v_url:
continue
if determine_ext(v_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
v_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls'))
else:
formats = [{'url': video_url}]
formats.append({
'url': v_url,
'format_id': fmt,
})

like_count = int_or_none(self._search_regex(
r'data-likes=["\'](\d+)', webpage, 'like count', default=None))
view_count = str_to_int(self._html_search_regex(
r'(?s)<span[^>]+class="views-wrapper"[^>]*>(.+?)</span', webpage,
'view count', default=None))
self._remove_duplicate_formats(formats)

for f in formats:
if f.get('height') is None:
f['height'] = int_or_none(
self._search_regex(r'_(\d{2,3}[02468])_', f['url'], 'video height', default=None))
if '/preview/' in f['url']:
f['format_id'] = '_'.join(filter(None, (f.get('format_id'), 'preview')))
f['preference'] = -10
if 'transcoded' in f['format_id']:
f['preference'] = f.get('preference', -1) - 1

self._sort_formats(formats)

def get_likes():
likes = self._search_regex(
r'''(<a\b[^>]*\bdata-id\s*=\s*(['"])%s\2[^>]*>)''' % (video_id, ),
webpage, 'likes', default='')
likes = extract_attributes(likes)
return int_or_none(likes.get('data-likes'))

def get_views():
return str_to_int(self._html_search_regex(
r'''(?s)<span\b[^>]*\bclass\s*=["']views-wrapper\b[^>]+>.+?<span\b[^>]+>\s*(\d[\d,.]*)\s*</span>''',
webpage, 'view count', default=None))

return {
'id': video_id,
'title': title,
'view_count': view_count,
'like_count': like_count,
'formats': formats,
'uploader': self._html_search_regex(r'<meta[^>]+name="author"[^>]*>([^<]+)', webpage, 'uploader'),
'description': txt_or_none(info.get('data-meta-description')),
'uploader': txt_or_none(info.get('data-meta-author')),
'thumbnail': (
url_or_none(info.get('data-meta-image'))
or url_or_none(player.get('data-video-screenshot'))),
'view_count': get_views(),
'like_count': get_likes(),
}
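After collecting candidate URLs, the new ManyVids loop post-processes each format:

- It guesses a missing height from a `_<digits>_` token in the URL.
- It tags `/preview/` URLs with a `_preview` suffix and a low preference.
- It lowers the preference of `transcoded` formats by one.

Below is a sketch of just that post-processing on plain dicts, with `re` standing in for `_search_regex`. The function name and sample URLs are hypothetical.

```python
import re


def annotate_formats(formats):
    """Fill in height, flag previews and demote transcoded formats (sketch of the new loop)."""
    for f in formats:
        if f.get('height') is None:
            # 3 or 4 digits ending in an even digit between underscores, e.g. '_720_' or '_1080_'
            m = re.search(r'_(\d{2,3}[02468])_', f['url'])
            f['height'] = int(m.group(1)) if m else None
        if '/preview/' in f['url']:
            f['format_id'] = '_'.join(filter(None, (f.get('format_id'), 'preview')))
            f['preference'] = -10
        if 'transcoded' in f.get('format_id', ''):
            # Starts from -1 when no preference was set, so transcoded ends up at -2
            f['preference'] = f.get('preference', -1) - 1
    return formats
```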
@@ -69,7 +69,7 @@ class MotherlessIE(InfoExtractor):
'title': 'a/ Hot Teens',
'categories': list,
'upload_date': '20210104',
'uploader_id': 'yonbiw',
'uploader_id': 'anonymous',
'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18,
},
@@ -123,11 +123,12 @@ def _real_extract(self, url):
kwargs = {_AGO_UNITS.get(uploaded_ago[-1]): delta}
upload_date = (datetime.datetime.utcnow() - datetime.timedelta(**kwargs)).strftime('%Y%m%d')

comment_count = webpage.count('class="media-comment-contents"')
comment_count = len(re.findall(r'''class\s*=\s*['"]media-comment-contents\b''', webpage))
uploader_id = self._html_search_regex(
(r'"media-meta-member">\s+<a href="/m/([^"]+)"',
r'<span\b[^>]+\bclass="username">([^<]+)</span>'),
(r'''<span\b[^>]+\bclass\s*=\s*["']username\b[^>]*>([^<]+)</span>''',
r'''(?s)['"](?:media-meta-member|thumb-member-username)\b[^>]+>\s*<a\b[^>]+\bhref\s*=\s*['"]/m/([^"']+)'''),
webpage, 'uploader_id', fatal=False)

categories = self._html_search_meta('keywords', webpage, default=None)
if categories:
categories = [cat.strip() for cat in categories.split(',')]
@@ -217,19 +218,19 @@ def _real_extract(self, url):
r'<title>([\w\s]+\w)\s+-', webpage, 'title', fatal=False)
description = self._html_search_meta(
'description', webpage, fatal=False)
page_count = self._int(self._search_regex(
r'(\d+)</(?:a|span)><(?:a|span)[^>]+rel="next">',
webpage, 'page_count', default=0), 'page_count')
page_count = str_to_int(self._search_regex(
r'(\d+)\s*</(?:a|span)>\s*<(?:a|span)[^>]+(?:>\s*NEXT|\brel\s*=\s*["\']?next)\b',
webpage, 'page_count', default=0))
if not page_count:
message = self._search_regex(
r'class="error-page"[^>]*>\s*<p[^>]*>\s*(?P<error_msg>[^<]+)(?<=\S)\s*',
r'''class\s*=\s*['"]error-page\b[^>]*>\s*<p[^>]*>\s*(?P<error_msg>[^<]+)(?<=\S)\s*''',
webpage, 'error_msg', default=None) or 'This group has no videos.'
self.report_warning(message, group_id)
page_count = 1
PAGE_SIZE = 80

def _get_page(idx):
if not page_count:
return
if idx > 0:
webpage = self._download_webpage(
page_url, group_id, query={'page': idx + 1},
note='Downloading page %d/%d' % (idx + 1, page_count)
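The updated Motherless patterns tolerate whitespace around `=` and extra class names. The page-count regex also accepts a `NEXT` link text instead of `rel="next"`. The sketch below runs the comment and page-count patterns, copied from the new side of the hunks, against made-up markup. `count_comments` and `last_page_number` are hypothetical helpers.

```python
import re

# Patterns copied from the new side of the Motherless hunks above
COMMENT_RE = r'''class\s*=\s*['"]media-comment-contents\b'''
PAGE_COUNT_RE = r'(\d+)\s*</(?:a|span)>\s*<(?:a|span)[^>]+(?:>\s*NEXT|\brel\s*=\s*["\']?next)\b'


def count_comments(webpage):
    # One match per comment container, regardless of quoting or extra classes
    return len(re.findall(COMMENT_RE, webpage))


def last_page_number(webpage, default=0):
    # The number in the pager element immediately before the NEXT / rel="next" link
    m = re.search(PAGE_COUNT_RE, webpage)
    return int(m.group(1)) if m else default
```

The old `webpage.count('class="media-comment-contents"')` would count only the first container in the test markup, because the second uses single quotes and an extra class.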
@@ -1,12 +1,26 @@
import itertools
import json
import re
import time
from base64 import b64encode
from binascii import hexlify
from datetime import datetime
from hashlib import md5
from random import randint

from .common import InfoExtractor
from ..compat import compat_str, compat_urllib_parse_urlencode
from ..utils import float_or_none, sanitized_Request
from ..aes import aes_ecb_encrypt, pkcs7_padding
from ..compat import compat_urllib_parse_urlencode
from ..utils import (
ExtractorError,
bytes_to_intlist,
error_to_compat_str,
float_or_none,
int_or_none,
intlist_to_bytes,
sanitized_Request,
try_get,
)


class NetEaseMusicBaseIE(InfoExtractor):
@@ -17,7 +31,7 @@ class NetEaseMusicBaseIE(InfoExtractor):
@classmethod
def _encrypt(cls, dfsid):
salt_bytes = bytearray(cls._NETEASE_SALT.encode('utf-8'))
string_bytes = bytearray(compat_str(dfsid).encode('ascii'))
string_bytes = bytearray(str(dfsid).encode('ascii'))
salt_len = len(salt_bytes)
for i in range(len(string_bytes)):
string_bytes[i] = string_bytes[i] ^ salt_bytes[i % salt_len]
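`_encrypt` XORs the `dfsId` digits with a repeating salt. In lines just outside this hunk, it then hashes the result and base64-encodes it with `/` and `+` swapped. Below is a standalone sketch of that process. It assumes the elided lines MD5-hash the XORed bytes, which the `md5` import and the `m.digest()` call at the top of the next hunk suggest. The salt argument is a placeholder, since `_NETEASE_SALT` is not shown here, and `encrypt_dfsid` is a hypothetical name.

```python
from base64 import b64encode
from hashlib import md5


def encrypt_dfsid(dfsid, salt):
    """XOR the dfsId digits with a repeating salt, MD5 the result, then base64 it URL-safely."""
    salt_bytes = bytearray(salt.encode('utf-8'))
    string_bytes = bytearray(str(dfsid).encode('ascii'))
    for i in range(len(string_bytes)):
        string_bytes[i] ^= salt_bytes[i % len(salt_bytes)]
    # Assumption: the lines elided between the two hunks hash the XORed bytes with MD5
    digest = md5(bytes(string_bytes)).digest()
    result = b64encode(digest).decode('ascii')
    return result.replace('/', '_').replace('+', '-')
```

The two `replace` calls produce the same alphabet as `base64.urlsafe_b64encode`, and a 16-byte MD5 digest always encodes to 24 characters ending in `==`.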
@@ -26,32 +40,105 @@ def _encrypt(cls, dfsid):
result = b64encode(m.digest()).decode('ascii')
return result.replace('/', '_').replace('+', '-')

def make_player_api_request_data_and_headers(self, song_id, bitrate):
KEY = b'e82ckenh8dichen8'
URL = '/api/song/enhance/player/url'
now = int(time.time() * 1000)
rand = randint(0, 1000)
cookie = {
'osver': None,
'deviceId': None,
'appver': '8.0.0',
'versioncode': '140',
'mobilename': None,
'buildver': '1623435496',
'resolution': '1920x1080',
'__csrf': '',
'os': 'pc',
'channel': None,
'requestId': '{0}_{1:04}'.format(now, rand),
}
request_text = json.dumps(
{'ids': '[{0}]'.format(song_id), 'br': bitrate, 'header': cookie},
separators=(',', ':'))
message = 'nobody{0}use{1}md5forencrypt'.format(
URL, request_text).encode('latin1')
msg_digest = md5(message).hexdigest()

data = '{0}-36cd479b6b5-{1}-36cd479b6b5-{2}'.format(
URL, request_text, msg_digest)
data = pkcs7_padding(bytes_to_intlist(data))
encrypted = intlist_to_bytes(aes_ecb_encrypt(data, bytes_to_intlist(KEY)))
encrypted_params = hexlify(encrypted).decode('ascii').upper()

cookie = '; '.join(
['{0}={1}'.format(k, v if v is not None else 'undefined')
for [k, v] in cookie.items()])

headers = {
'User-Agent': self.extractor.get_param('http_headers')['User-Agent'],
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': 'https://music.163.com',
'Cookie': cookie,
}
return ('params={0}'.format(encrypted_params), headers)

def _call_player_api(self, song_id, bitrate):
url = 'https://interface3.music.163.com/eapi/song/enhance/player/url'
data, headers = self.make_player_api_request_data_and_headers(song_id, bitrate)
try:
msg = 'empty result'
result = self._download_json(
url, song_id, data=data.encode('ascii'), headers=headers)
if result:
return result
except ExtractorError as e:
if type(e.cause) in (ValueError, TypeError):
# JSON load failure
raise
except Exception as e:
msg = error_to_compat_str(e)
self.report_warning('%s API call (%s) failed: %s' % (
song_id, bitrate, msg))
return {}

def extract_formats(self, info):
err = 0
formats = []
song_id = info['id']
for song_format in self._FORMATS:
details = info.get(song_format)
if not details:
continue
song_file_path = '/%s/%s.%s' % (
self._encrypt(details['dfsId']), details['dfsId'], details['extension'])

# 203.130.59.9, 124.40.233.182, 115.231.74.139, etc is a reverse proxy-like feature
# from NetEase's CDN provider that can be used if m5.music.126.net does not
# work, especially for users outside of Mainland China
# via: https://github.com/JixunMoe/unblock-163/issues/3#issuecomment-163115880
for host in ('http://m5.music.126.net', 'http://115.231.74.139/m1.music.126.net',
'http://124.40.233.182/m1.music.126.net', 'http://203.130.59.9/m1.music.126.net'):
song_url = host + song_file_path
bitrate = int_or_none(details.get('bitrate')) or 999000
data = self._call_player_api(song_id, bitrate)
for song in try_get(data, lambda x: x['data'], list) or []:
song_url = try_get(song, lambda x: x['url'])
if not song_url:
continue
if self._is_valid_url(song_url, info['id'], 'song'):
formats.append({
'url': song_url,
'ext': details.get('extension'),
'abr': float_or_none(details.get('bitrate'), scale=1000),
'abr': float_or_none(song.get('br'), scale=1000),
'format_id': song_format,
'filesize': details.get('size'),
'asr': details.get('sr')
'filesize': int_or_none(song.get('size')),
'asr': int_or_none(details.get('sr')),
})
break
elif err == 0:
err = try_get(song, lambda x: x['code'], int)

if not formats:
msg = 'No media links found'
if err != 0 and (err < 200 or err >= 400):
raise ExtractorError(
'%s (site code %d)' % (msg, err, ), expected=True)
else:
self.raise_geo_restricted(
msg + ': probably this video is not available from your location due to geo restriction.',
countries=['CN'])

return formats

@classmethod
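`make_player_api_request_data_and_headers` builds the `params` form field in these steps:

1. Sign the JSON request with an MD5 digest.
2. Join path, request and digest with the `-36cd479b6b5-` separator.
3. PKCS#7-pad the result.
4. AES-ECB-encrypt it with the 16-byte `KEY` and hex-encode it in upper case.

The standard library has no AES, so this sketch reproduces only the plaintext construction and the padding. The encryption step would still need an AES-128-ECB implementation, such as the `aes_ecb_encrypt` helper imported in the diff. `eapi_plaintext` and `pkcs7_pad` are hypothetical names.

```python
import json
from hashlib import md5

EAPI_PATH = '/api/song/enhance/player/url'  # value of URL in the diff
SEPARATOR = '-36cd479b6b5-'


def eapi_plaintext(path, payload):
    """Build the string that is later AES-encrypted into the `params` form field."""
    request_text = json.dumps(payload, separators=(',', ':'))
    message = f'nobody{path}use{request_text}md5forencrypt'.encode('latin1')
    digest = md5(message).hexdigest()
    return f'{path}{SEPARATOR}{request_text}{SEPARATOR}{digest}'


def pkcs7_pad(data, block_size=16):
    """PKCS#7: append N bytes of value N so the length becomes a multiple of block_size."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n
```

Input that is already block-aligned still gains a full block of padding, which is what lets the receiver strip it unambiguously.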
@@ -67,33 +154,19 @@ def query_api(self, endpoint, video_id, note):
class NetEaseMusicIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:song'
IE_DESC = '网易云音乐'
_VALID_URL = r'https?://music\.163\.com/(#/)?song\?id=(?P<id>[0-9]+)'
_VALID_URL = r'https?://(y\.)?music\.163\.com/(?:[#m]/)?song\?.*?\bid=(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://music.163.com/#/song?id=32102397',
'md5': 'f2e97280e6345c74ba9d5677dd5dcb45',
'md5': '3e909614ce09b1ccef4a3eb205441190',
'info_dict': {
'id': '32102397',
'ext': 'mp3',
'title': 'Bad Blood (feat. Kendrick Lamar)',
'title': 'Bad Blood',
'creator': 'Taylor Swift / Kendrick Lamar',
'upload_date': '20150517',
'timestamp': 1431878400,
'description': 'md5:a10a54589c2860300d02e1de821eb2ef',
'upload_date': '20150516',
'timestamp': 1431792000,
'description': 'md5:25fc5f27e47aad975aa6d36382c7833c',
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'No lyrics translation.',
'url': 'http://music.163.com/#/song?id=29822014',
'info_dict': {
'id': '29822014',
'ext': 'mp3',
'title': '听见下雨的声音',
'creator': '周杰伦',
'upload_date': '20141225',
'timestamp': 1419523200,
'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c',
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'No lyrics.',
'url': 'http://music.163.com/song?id=17241424',
@@ -103,9 +176,9 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'title': 'Opus 28',
'creator': 'Dustin O\'Halloran',
'upload_date': '20080211',
'description': 'md5:f12945b0f6e0365e3b73c5032e1b0ff4',
'timestamp': 1202745600,
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'Has translated name.',
'url': 'http://music.163.com/#/song?id=22735043',
@@ -119,7 +192,18 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'timestamp': 1264608000,
'alt_title': '说出愿望吧(Genie)',
},
'skip': 'Blocked outside Mainland China',
}, {
'url': 'https://y.music.163.com/m/song?app_version=8.8.45&id=95670&uct2=sKnvS4+0YStsWkqsPhFijw%3D%3D&dlt=0846',
'md5': '95826c73ea50b1c288b22180ec9e754d',
'info_dict': {
'id': '95670',
'ext': 'mp3',
'title': '国际歌',
'creator': '马备',
'upload_date': '19911130',
'timestamp': 691516800,
'description': 'md5:1ba2f911a2b0aa398479f595224f2141',
},
}]

def _process_lyrics(self, lyrics_info):
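The widened `_VALID_URL` also accepts mobile `y.music.163.com/m/song?...` links, where `id` need not be the first query parameter. Below is a quick `re` comparison of the old and new patterns, both copied from the diff. `song_id` is a hypothetical helper.

```python
import re

OLD_RE = r'https?://music\.163\.com/(#/)?song\?id=(?P<id>[0-9]+)'
NEW_RE = r'https?://(y\.)?music\.163\.com/(?:[#m]/)?song\?.*?\bid=(?P<id>[0-9]+)'


def song_id(url, pattern=NEW_RE):
    """Return the numeric song id matched by `pattern`, or None."""
    m = re.match(pattern, url)
    return m.group('id') if m else None
```

The `\b` before `id=` prevents the lazy `.*?` from matching inside a longer parameter name such as `uid=`.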
@@ -58,8 +58,7 @@ def _call_api(self, path, video_id, item=None, note=None, fatal=True, query=None
return self._download_json(
urljoin('https://psapi.nrk.no/', path),
video_id, note or 'Downloading %s JSON' % item,
fatal=fatal, query=query,
headers={'Accept-Encoding': 'gzip, deflate, br'})
fatal=fatal, query=query)


class NRKIE(NRKBaseIE):
47
yt_dlp/extractor/qingting.py
Normal file
@@ -0,0 +1,47 @@
from .common import InfoExtractor

from ..utils import traverse_obj


class QingTingIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.|m\.)?(?:qingting\.fm|qtfm\.cn)/v?channels/(?P<channel>\d+)/programs/(?P<id>\d+)'
    _TESTS = [{
        'url': 'https://www.qingting.fm/channels/378005/programs/22257411/',
        'md5': '47e6a94f4e621ed832c316fd1888fb3c',
        'info_dict': {
            'id': '22257411',
            'title': '用了十年才修改,谁在乎教科书?',
            'channel_id': '378005',
            'channel': '睡前消息',
            'uploader': '马督工',
            'ext': 'm4a',
        }
    }, {
        'url': 'https://m.qtfm.cn/vchannels/378005/programs/23023573/',
        'md5': '2703120b6abe63b5fa90b975a58f4c0e',
        'info_dict': {
            'id': '23023573',
            'title': '【睡前消息488】重庆山火之后,有图≠真相',
            'channel_id': '378005',
            'channel': '睡前消息',
            'uploader': '马督工',
            'ext': 'm4a',
        }
    }]

    def _real_extract(self, url):
        channel_id, pid = self._match_valid_url(url).group('channel', 'id')
        webpage = self._download_webpage(
            f'https://m.qtfm.cn/vchannels/{channel_id}/programs/{pid}/', pid)
        info = self._search_json(r'window\.__initStores\s*=', webpage, 'program info', pid)
        return {
            'id': pid,
            'title': traverse_obj(info, ('ProgramStore', 'programInfo', 'title')),
            'channel_id': channel_id,
            'channel': traverse_obj(info, ('ProgramStore', 'channelInfo', 'title')),
            'uploader': traverse_obj(info, ('ProgramStore', 'podcasterInfo', 'podcaster', 'nickname')),
            'url': traverse_obj(info, ('ProgramStore', 'programInfo', 'audioUrl')),
            'vcodec': 'none',
            'acodec': 'm4a',
            'ext': 'm4a',
        }
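`QingTingIE._real_extract` pulls the `window.__initStores` JSON out of the mobile page. It then walks `ProgramStore` for the title, channel, podcaster and `audioUrl`. Below is a stdlib approximation of `_search_json` plus `traverse_obj` for that step. `extract_init_stores` and `dig` are hypothetical, simplified stand-ins, and the sample page in the test is invented.

```python
import json
import re


def extract_init_stores(webpage):
    """Decode the JSON object assigned to window.__initStores, or return None."""
    m = re.search(r'window\.__initStores\s*=\s*', webpage)
    if not m:
        return None
    # raw_decode stops at the end of the object and ignores the trailing ';</script>'
    obj, _ = json.JSONDecoder().raw_decode(webpage, m.end())
    return obj


def dig(obj, *path):
    """Follow nested dict keys, returning None at the first missing key (simplified traverse_obj)."""
    for key in path:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj
```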
@@ -1,4 +1,5 @@
import functools
import urllib

from .common import InfoExtractor
from ..compat import compat_parse_qs
@@ -72,14 +73,20 @@ def _fetch_oauth_token(self, video_id):
self._API_HEADERS['authorization'] = f'Bearer {auth["token"]}'

def _call_api(self, ep, video_id, *args, **kwargs):
for attempt in range(2):
if 'authorization' not in self._API_HEADERS:
self._fetch_oauth_token(video_id)
assert 'authorization' in self._API_HEADERS

try:
headers = dict(self._API_HEADERS)
headers['x-customheader'] = f'https://www.redgifs.com/watch/{video_id}'
data = self._download_json(
f'https://api.redgifs.com/v2/{ep}', video_id, headers=headers, *args, **kwargs)
break
except ExtractorError as e:
if not attempt and isinstance(e.cause, urllib.error.HTTPError) and e.cause.code == 401:
del self._API_HEADERS['authorization'] # refresh the token
raise

if 'error' in data:
raise ExtractorError(f'RedGifs said: {data["error"]}', expected=True, video_id=video_id)
return data
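The new RedGifs `_call_api` makes at most two attempts. If the first request fails with HTTP 401, it drops the cached `authorization` header so that a fresh token is fetched before retrying. Below is a generic sketch of that retry-once pattern. It uses injected callables instead of yt-dlp's downloader, and the class and callable names are hypothetical. The sketch reaches the second attempt with an explicit `continue`.

```python
import urllib.error


class TokenRetryClient:
    """Call an API, refreshing the bearer token once if the first call gets HTTP 401."""

    def __init__(self, fetch_token, send):
        self._fetch_token = fetch_token  # hypothetical: () -> new token string
        self._send = send                # hypothetical: (headers dict) -> parsed response
        self.headers = {}

    def call(self):
        for attempt in range(2):
            if 'authorization' not in self.headers:
                self.headers['authorization'] = f'Bearer {self._fetch_token()}'
            try:
                return self._send(dict(self.headers))
            except urllib.error.HTTPError as e:
                if attempt == 0 and e.code == 401:
                    del self.headers['authorization']  # forces a new token on the next iteration
                    continue
                raise
```

Any other HTTP error, and a 401 on the second attempt, propagates to the caller unchanged.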
@ -25,7 +25,6 @@ class SkyItPlayerIE(InfoExtractor):
|
||||
'salesforce': 'C6D585FD1615272C98DE38235F38BD86',
|
||||
'sitocommerciale': 'VJwfFuSGnLKnd9Phe9y96WkXgYDCguPMJ2dLhGMb2RE',
|
||||
'sky': 'F96WlOd8yoFmLQgiqv6fNQRvHZcsWk5jDaYnDvhbiJk',
|
||||
'skyacademy': 'A6LAn7EkO2Q26FRy0IAMBekX6jzDXYL3',
|
||||
'skyarte': 'LWk29hfiU39NNdq87ePeRach3nzTSV20o0lTv2001Cd',
|
||||
'theupfront': 'PRSGmDMsg6QMGc04Obpoy7Vsbn7i2Whp',
|
||||
}
|
||||
@ -42,11 +41,7 @@ def _parse_video(self, video, video_id):
|
||||
if not hls_url and video.get('geoblock' if is_live else 'geob'):
|
||||
self.raise_geo_restricted(countries=['IT'])
|
||||
|
||||
if is_live:
|
||||
formats = self._extract_m3u8_formats(hls_url, video_id, 'mp4')
|
||||
else:
|
||||
formats = self._extract_akamai_formats(
|
||||
hls_url, video_id, {'http': 'videoplatform.sky.it'})
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
@ -80,14 +75,17 @@ class SkyItVideoIE(SkyItPlayerIE):
_VALID_URL = r'https?://(?:masterchef|video|xfactor)\.sky\.it(?:/[^/]+)*/video/[0-9a-z-]+-(?P<id>\d+)'
_TESTS = [{
'url': 'https://video.sky.it/news/mondo/video/uomo-ucciso-da-uno-squalo-in-australia-631227',
'md5': 'fe5c91e59a84a3437eaa0bca6e134ccd',
'md5': '5b858a62d9ffe2ab77b397553024184a',
'info_dict': {
'id': '631227',
'ext': 'mp4',
'title': 'Uomo ucciso da uno squalo in Australia',
'timestamp': 1606036192,
'upload_date': '20201122',
}
'duration': 26,
'thumbnail': 'https://video.sky.it/captures/thumbs/631227/631227_thumb_880x494.jpg',
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://xfactor.sky.it/video/x-factor-2020-replay-audizioni-1-615820',
'only_matching': True,
@ -110,7 +108,8 @@ class SkyItVideoLiveIE(SkyItPlayerIE):
'id': '1',
'ext': 'mp4',
'title': r're:Diretta TG24 \d{4}-\d{2}-\d{2} \d{2}:\d{2}',
'description': 'Guarda la diretta streaming di SkyTg24, segui con Sky tutti gli appuntamenti e gli speciali di Tg24.',
'description': r're:(?:Clicca play e )?[Gg]uarda la diretta streaming di SkyTg24, segui con Sky tutti gli appuntamenti e gli speciali di Tg24\.',
'live_status': 'is_live',
},
'params': {
# m3u8 download
@ -132,15 +131,17 @@ class SkyItIE(SkyItPlayerIE):
IE_NAME = 'sky.it'
_VALID_URL = r'https?://(?:sport|tg24)\.sky\.it(?:/[^/]+)*/\d{4}/\d{2}/\d{2}/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://sport.sky.it/calcio/serie-a/2020/11/21/juventus-cagliari-risultato-gol',
'url': 'https://sport.sky.it/calcio/serie-a/2022/11/03/brozovic-inter-news',
'info_dict': {
'id': '631201',
'id': '789222',
'ext': 'mp4',
'title': 'Un rosso alla violenza: in campo per i diritti delle donne',
'upload_date': '20201121',
'timestamp': 1605995753,
'title': 'Brozovic con il gruppo: verso convocazione per Juve-Inter',
'upload_date': '20221103',
'timestamp': 1667484130,
'duration': 22,
'thumbnail': 'https://videoplatform.sky.it/still/2022/11/03/1667480526353_brozovic_videostill_1.jpg',
},
'expected_warnings': ['Unable to download f4m manifest'],
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://tg24.sky.it/mondo/2020/11/22/australia-squalo-uccide-uomo',
'md5': 'fe5c91e59a84a3437eaa0bca6e134ccd',
@ -150,7 +151,10 @@ class SkyItIE(SkyItPlayerIE):
'title': 'Uomo ucciso da uno squalo in Australia',
'timestamp': 1606036192,
'upload_date': '20201122',
'duration': 26,
'thumbnail': 'https://video.sky.it/captures/thumbs/631227/631227_thumb_880x494.jpg',
},
'params': {'skip_download': 'm3u8'},
}]
_VIDEO_ID_REGEX = r'data-videoid="(\d+)"'
@ -162,40 +166,25 @@ def _real_extract(self, url):
return self._player_url_result(video_id)


class SkyItAcademyIE(SkyItIE):
IE_NAME = 'skyacademy.it'
_VALID_URL = r'https?://(?:www\.)?skyacademy\.it(?:/[^/]+)*/\d{4}/\d{2}/\d{2}/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.skyacademy.it/eventi-speciali/2019/07/05/a-lezione-di-cinema-con-sky-academy-/',
'md5': 'ced5c26638b7863190cbc44dd6f6ba08',
'info_dict': {
'id': '523458',
'ext': 'mp4',
'title': 'Sky Academy "The Best CineCamp 2019"',
'timestamp': 1562843784,
'upload_date': '20190711',
}
}]
_DOMAIN = 'skyacademy'
_VIDEO_ID_REGEX = r'id="news-videoId_(\d+)"'


class SkyItArteIE(SkyItIE):
IE_NAME = 'arte.sky.it'
_VALID_URL = r'https?://arte\.sky\.it/video/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://arte.sky.it/video/serie-musei-venezia-collezionismo-12-novembre/',
'url': 'https://arte.sky.it/video/oliviero-toscani-torino-galleria-mazzoleni-788962',
'md5': '515aee97b87d7a018b6c80727d3e7e17',
'info_dict': {
'id': '627926',
'id': '788962',
'ext': 'mp4',
'title': "Musei Galleria Franchetti alla Ca' d'Oro Palazzo Grimani",
'upload_date': '20201106',
'timestamp': 1604664493,
}
'title': 'La fotografia di Oliviero Toscani conquista Torino',
'upload_date': '20221102',
'timestamp': 1667399996,
'duration': 12,
'thumbnail': 'https://videoplatform.sky.it/still/2022/11/02/1667396388552_oliviero-toscani-torino-galleria-mazzoleni_videostill_1.jpg',
},
'params': {'skip_download': 'm3u8'},
}]
_DOMAIN = 'skyarte'
_VIDEO_ID_REGEX = r'(?s)<iframe[^>]+src="(?:https:)?//player\.sky\.it/player/external\.html\?[^"]*\bid=(\d+)'
_VIDEO_ID_REGEX = r'"embedUrl"\s*:\s*"(?:https:)?//player\.sky\.it/player/external\.html\?[^"]*\bid=(\d+)'


class CieloTVItIE(SkyItIE):
@ -210,7 +199,10 @@ class CieloTVItIE(SkyItIE):
'title': 'Il lunedì è sempre un dramma',
'upload_date': '20190329',
'timestamp': 1553862178,
}
'duration': 30,
'thumbnail': 'https://videoplatform.sky.it/still/2019/03/29/1553858575610_lunedi_dramma_mant_videostill_1.jpg',
},
'params': {'skip_download': 'm3u8'},
}]
_DOMAIN = 'cielo'
_VIDEO_ID_REGEX = r'videoId\s*=\s*"(\d+)"'
@ -218,9 +210,9 @@ class CieloTVItIE(SkyItIE):
class TV8ItIE(SkyItVideoIE):
IE_NAME = 'tv8.it'
_VALID_URL = r'https?://tv8\.it/showvideo/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?tv8\.it/(?:show)?video/[0-9a-z-]+-(?P<id>\d+)'
_TESTS = [{
'url': 'https://tv8.it/showvideo/630529/ogni-mattina-ucciso-asino-di-andrea-lo-cicero/18-11-2020/',
'url': 'https://www.tv8.it/video/ogni-mattina-ucciso-asino-di-andrea-lo-cicero-630529',
'md5': '9ab906a3f75ea342ed928442f9dabd21',
'info_dict': {
'id': '630529',
@ -228,6 +220,9 @@ class TV8ItIE(SkyItVideoIE):
'title': 'Ogni mattina - Ucciso asino di Andrea Lo Cicero',
'timestamp': 1605721374,
'upload_date': '20201118',
}
'duration': 114,
'thumbnail': 'https://videoplatform.sky.it/still/2020/11/18/1605717753954_ogni-mattina-ucciso-asino-di-andrea-lo-cicero_videostill_1.jpg',
},
'params': {'skip_download': 'm3u8'},
}]
_DOMAIN = 'mtv8'
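The new `TV8ItIE._VALID_URL` accepts an optional `www.` and both `/video/` and `/showvideo/` paths, but expects the numeric ID at the end of a hyphenated slug. A quick check of the pattern with Python's `re` module against the two test URLs in the hunk above; `tv8_video_id` is a hypothetical helper, not part of the extractor.

```python
import re

# Pattern copied from the new TV8ItIE._VALID_URL above
TV8_VALID_URL = r'https?://(?:www\.)?tv8\.it/(?:show)?video/[0-9a-z-]+-(?P<id>\d+)'


def tv8_video_id(url):
    """Return the numeric video ID, or None if the URL does not match."""
    m = re.match(TV8_VALID_URL, url)
    return m.group('id') if m else None


print(tv8_video_id('https://www.tv8.it/video/ogni-mattina-ucciso-asino-di-andrea-lo-cicero-630529'))  # 630529
print(tv8_video_id('https://tv8.it/showvideo/630529/ogni-mattina-ucciso-asino-di-andrea-lo-cicero/18-11-2020/'))  # None
```

Under this pattern the old `showvideo/<id>/<slug>/<date>/` layout no longer matches, which is consistent with that test URL being replaced.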
@ -1,22 +1,15 @@
from .common import InfoExtractor
from ..compat import (
compat_str,
)
from ..utils import (
ExtractorError,
lowercase_escape,
try_get,
)
from ..utils import ExtractorError, lowercase_escape, traverse_obj


class StripchatIE(InfoExtractor):
_VALID_URL = r'https?://stripchat\.com/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://stripchat.com/feel_me',
'url': 'https://stripchat.com/Joselin_Flower',
'info_dict': {
'id': 'feel_me',
'id': 'Joselin_Flower',
'ext': 'mp4',
'title': 're:^feel_me [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'title': 're:^Joselin_Flower [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': str,
'is_live': True,
'age_limit': 18,
@ -39,18 +32,22 @@ def _real_extract(self, url):
if not data:
raise ExtractorError('Unable to find configuration for stream.')

if try_get(data, lambda x: x['viewCam']['show'], dict):
if traverse_obj(data, ('viewCam', 'show'), expected_type=dict):
raise ExtractorError('Model is in private show', expected=True)
elif not try_get(data, lambda x: x['viewCam']['model']['isLive'], bool):
elif not traverse_obj(data, ('viewCam', 'model', 'isLive'), expected_type=bool):
raise ExtractorError('Model is offline', expected=True)

server = try_get(data, lambda x: x['viewCam']['viewServers']['flashphoner-hls'], compat_str)
host = try_get(data, lambda x: x['config']['data']['hlsStreamHost'], compat_str)
model_id = try_get(data, lambda x: x['viewCam']['model']['id'], int)
server = traverse_obj(data, ('viewCam', 'viewServers', 'flashphoner-hls'), expected_type=str)
model_id = traverse_obj(data, ('viewCam', 'model', 'id'), expected_type=int)

for host in traverse_obj(data, (
'config', 'data', (('featuresV2', 'hlsFallback', 'fallbackDomains', ...), 'hlsStreamHost'))):
formats = self._extract_m3u8_formats(
'https://b-%s.%s/hls/%d/%d.m3u8' % (server, host, model_id, model_id),
f'https://b-{server}.{host}/hls/{model_id}/{model_id}.m3u8',
video_id, ext='mp4', m3u8_id='hls', fatal=False, live=True)
if formats:
break

self._sort_formats(formats)

return {
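The new Stripchat loop walks the `traverse_obj` path `('config', 'data', (('featuresV2', 'hlsFallback', 'fallbackDomains', ...), 'hlsStreamHost'))`, which should yield the fallback domains and then the primary `hlsStreamHost`, in the order the branches are listed. It stops at the first host whose playlist yields formats. A plain-Python sketch under those assumptions; the JSON shape is inferred from the path, and `probe(url)` is a hypothetical callable standing in for `_extract_m3u8_formats(..., fatal=False)`.

```python
def candidate_hosts(data):
    """Fallback domains first, then the primary HLS host (assumed JSON shape)."""
    cfg = (data.get('config') or {}).get('data') or {}
    fallback = ((cfg.get('featuresV2') or {}).get('hlsFallback') or {}).get('fallbackDomains') or []
    hosts = [h for h in fallback if h]
    if cfg.get('hlsStreamHost'):
        hosts.append(cfg['hlsStreamHost'])
    return hosts


def first_working_formats(data, server, model_id, probe):
    """Try each candidate host in turn; return the first non-empty format list."""
    formats = []
    for host in candidate_hosts(data):
        formats = probe(f'https://b-{server}.{host}/hls/{model_id}/{model_id}.m3u8')
        if formats:
            break
    return formats
```

The sketch initializes `formats` before the loop, so an empty host list returns `[]` instead of leaving the name unbound when it is used afterwards.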
73
yt_dlp/extractor/swearnet.py
Normal file
@ -0,0 +1,73 @@
from .common import InfoExtractor
from ..utils import int_or_none, traverse_obj


class SwearnetEpisodeIE(InfoExtractor):
_VALID_URL = r'https?://www\.swearnet\.com/shows/(?P<id>[\w-]+)/seasons/(?P<season_num>\d+)/episodes/(?P<episode_num>\d+)'
_TESTS = [{
'url': 'https://www.swearnet.com/shows/gettin-learnt-with-ricky/seasons/1/episodes/1',
'info_dict': {
'id': '232819',
'ext': 'mp4',
'episode_number': 1,
'episode': 'Episode 1',
'duration': 719,
'description': 'md5:c48ef71440ce466284c07085cd7bd761',
'season': 'Season 1',
'title': 'Episode 1 - Grilled Cheese Sammich',
'season_number': 1,
'thumbnail': 'https://cdn.vidyard.com/thumbnails/232819/_RX04IKIq60a2V6rIRqq_Q_small.jpg',
}
}]

def _get_formats_and_subtitle(self, video_source, video_id):
video_source = video_source or {}
formats, subtitles = [], {}
for key, value in video_source.items():
if key == 'hls':
for video_hls in value:
fmts, subs = self._extract_m3u8_formats_and_subtitles(video_hls.get('url'), video_id)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.extend({
'url': video_mp4.get('url'),
'ext': 'mp4'
} for video_mp4 in value)

return formats, subtitles

def _get_direct_subtitle(self, caption_json):
subs = {}
for caption in caption_json:
subs.setdefault(caption.get('language') or 'und', []).append({
'url': caption.get('vttUrl'),
'name': caption.get('name')
})

return subs

def _real_extract(self, url):
display_id, season_number, episode_number = self._match_valid_url(url).group('id', 'season_num', 'episode_num')
webpage = self._download_webpage(url, display_id)

external_id = self._search_regex(r'externalid\s*=\s*"([^"]+)', webpage, 'externalid')
json_data = self._download_json(
f'https://play.vidyard.com/player/{external_id}.json', display_id)['payload']['chapters'][0]

formats, subtitles = self._get_formats_and_subtitle(json_data['sources'], display_id)
self._merge_subtitles(self._get_direct_subtitle(json_data.get('captions')), target=subtitles)

return {
'id': str(json_data['videoId']),
'title': json_data.get('name') or self._html_search_meta(['og:title', 'twitter:title'], webpage),
'description': (json_data.get('description')
or self._html_search_meta(['og:description', 'twitter:description'])),
'duration': int_or_none(json_data.get('seconds')),
'formats': formats,
'subtitles': subtitles,
'season_number': int_or_none(season_number),
'episode_number': int_or_none(episode_number),
'thumbnails': [{'url': thumbnail_url}
for thumbnail_url in traverse_obj(json_data, ('thumbnailUrls', ...))]
}
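`_get_direct_subtitle` groups Vidyard caption entries by language with `dict.setdefault`, using `'und'` (undetermined) when an entry has no `language`. A standalone version, assuming the caption fields used above (`language`, `vttUrl`, `name`). Unlike the original, it also tolerates a missing `captions` key: the hunk passes `json_data.get('captions')`, which can be `None`, straight into the loop.

```python
def group_captions(caption_json):
    """Group caption entries into {language: [{'url', 'name'}, ...]}."""
    subs = {}
    for caption in caption_json or []:  # treat a missing caption list as empty
        subs.setdefault(caption.get('language') or 'und', []).append({
            'url': caption.get('vttUrl'),
            'name': caption.get('name'),
        })
    return subs
```

Because `setdefault` returns the existing list when the key is already present, several tracks in the same language end up in one list, in their original order.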
@ -1,41 +1,137 @@
import re

from .common import InfoExtractor
from ..utils import clean_html, get_element_by_class
from ..utils import (
clean_html,
format_field,
get_element_by_class,
parse_duration,
parse_qs,
traverse_obj,
unified_timestamp,
update_url_query,
url_basename,
)


class TelegramEmbedIE(InfoExtractor):
IE_NAME = 'telegram:embed'
_VALID_URL = r'https?://t\.me/(?P<channel_name>[^/]+)/(?P<id>\d+)'
_VALID_URL = r'https?://t\.me/(?P<channel_id>[^/]+)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://t.me/europa_press/613',
'md5': 'dd707708aea958c11a590e8068825f22',
'info_dict': {
'id': '613',
'ext': 'mp4',
'title': 'Europa Press',
'description': '6ce2d7e8d56eda16d80607b23db7b252',
'thumbnail': r're:^https?:\/\/cdn.*?telesco\.pe\/file\/\w+',
'title': 'md5:6ce2d7e8d56eda16d80607b23db7b252',
'description': 'md5:6ce2d7e8d56eda16d80607b23db7b252',
'channel_id': 'europa_press',
'channel': 'Europa Press ✔',
'thumbnail': r're:^https?://.+',
'timestamp': 1635631203,
'upload_date': '20211030',
'duration': 61,
},
}, {
# 2-video post
'url': 'https://t.me/vorposte/29342',
'info_dict': {
'id': 'vorposte-29342',
'title': 'Форпост 29342',
'description': 'md5:9d92e22169a3e136d5d69df25f82c3dc',
},
'playlist_count': 2,
'params': {
'skip_download': True,
},
}, {
# 2-video post with --no-playlist
'url': 'https://t.me/vorposte/29343',
'md5': '1724e96053c18e788c8464038876e245',
'info_dict': {
'id': '29343',
'ext': 'mp4',
'title': 'md5:9d92e22169a3e136d5d69df25f82c3dc',
'description': 'md5:9d92e22169a3e136d5d69df25f82c3dc',
'channel_id': 'vorposte',
'channel': 'Форпост',
'thumbnail': r're:^https?://.+',
'timestamp': 1666384480,
'upload_date': '20221021',
'duration': 35,
},
'params': {
'noplaylist': True,
}
}, {
# 2-video post with 'single' query param
'url': 'https://t.me/vorposte/29342?single',
'md5': 'd20b202f1e41400a9f43201428add18f',
'info_dict': {
'id': '29342',
'ext': 'mp4',
'title': 'md5:9d92e22169a3e136d5d69df25f82c3dc',
'description': 'md5:9d92e22169a3e136d5d69df25f82c3dc',
'channel_id': 'vorposte',
'channel': 'Форпост',
'thumbnail': r're:^https?://.+',
'timestamp': 1666384480,
'upload_date': '20221021',
'duration': 33,
},
}]

def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, query={'embed': 0})
webpage_embed = self._download_webpage(url, video_id, query={'embed': 1}, note='Downloading ermbed page')
channel_id, msg_id = self._match_valid_url(url).group('channel_id', 'id')
embed = self._download_webpage(
url, msg_id, query={'embed': '1', 'single': []}, note='Downloading embed frame')

def clean_text(html_class, html):
text = clean_html(get_element_by_class(html_class, html))
return text.replace('\n', ' ') if text else None

description = clean_text('tgme_widget_message_text', embed)
message = {
'title': description or '',
'description': description,
'channel': clean_text('tgme_widget_message_author', embed),
'channel_id': channel_id,
'timestamp': unified_timestamp(self._search_regex(
r'<time[^>]*datetime="([^"]*)"', embed, 'timestamp', fatal=False)),
}

videos = []
for video in re.findall(r'<a class="tgme_widget_message_video_player(?s:.+?)</time>', embed):
video_url = self._search_regex(
r'<video[^>]+src="([^"]+)"', video, 'video URL', fatal=False)
webpage_url = self._search_regex(
r'<a class="tgme_widget_message_video_player[^>]+href="([^"]+)"',
video, 'webpage URL', fatal=False)
if not video_url or not webpage_url:
continue
formats = [{
'url': self._proto_relative_url(self._search_regex(
'<video[^>]+src="([^"]+)"', webpage_embed, 'source')),
'url': video_url,
'ext': 'mp4',
}]
self._sort_formats(formats)

return {
'id': video_id,
'title': self._html_search_meta(['og:title', 'twitter:title'], webpage, default=None),
'description': self._html_search_meta(
['og:description', 'twitter:description'], webpage,
default=clean_html(get_element_by_class('tgme_widget_message_text', webpage_embed))),
videos.append({
'id': url_basename(webpage_url),
'webpage_url': update_url_query(webpage_url, {'single': True}),
'duration': parse_duration(self._search_regex(
r'<time[^>]+duration[^>]*>([\d:]+)</time>', video, 'duration', fatal=False)),
'thumbnail': self._search_regex(
r'tgme_widget_message_video_thumb"[^>]+background-image:url\(\'([^\']+)\'\)',
webpage_embed, 'thumbnail'),
video, 'thumbnail', fatal=False),
'formats': formats,
}
**message,
})

playlist_id = None
if len(videos) > 1 and 'single' not in parse_qs(url, keep_blank_values=True):
playlist_id = f'{channel_id}-{msg_id}'

if self._yes_playlist(playlist_id, msg_id):
return self.playlist_result(
videos, playlist_id, format_field(message, 'channel', f'%s {msg_id}'), description)
else:
return traverse_obj(videos, lambda _, x: x['id'] == msg_id, get_all=False)
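The Telegram extractor treats a post as a playlist only when it holds more than one video and the URL has no `single` query parameter. Because `?single` carries no value, `keep_blank_values=True` is what keeps the key after parsing. The sketch below uses the standard library instead of yt-dlp's `parse_qs` helper (which, as called above, takes the whole URL); `playlist_id_for` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlparse


def playlist_id_for(url, channel_id, msg_id, video_count):
    """Return '<channel>-<msg>' when the post should become a playlist, else None."""
    # Without keep_blank_values, a bare '?single' would parse to {} and be ignored
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    if video_count > 1 and 'single' not in query:
        return f'{channel_id}-{msg_id}'
    return None
```

With `playlist_id_for('https://t.me/vorposte/29342', 'vorposte', '29342', 2)` the result is `'vorposte-29342'`, matching the playlist test above, while the `?single` variant returns `None`.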
@ -4,40 +4,51 @@
from .common import InfoExtractor
from ..utils import (
clean_html,
determine_ext,
dict_get,
ExtractorError,
int_or_none,
js_to_json,
orderedSet,
str_or_none,
strip_or_none,
traverse_obj,
try_get,
url_or_none,
)


class TVPIE(InfoExtractor):
IE_NAME = 'tvp'
IE_DESC = 'Telewizja Polska'
_VALID_URL = r'https?://(?:[^/]+\.)?(?:tvp(?:parlament)?\.(?:pl|info)|polandin\.com)/(?:video/(?:[^,\s]*,)*|(?:(?!\d+/)[^/]+/)*)(?P<id>\d+)'
_VALID_URL = r'https?://(?:[^/]+\.)?(?:tvp(?:parlament)?\.(?:pl|info)|tvpworld\.com|swipeto\.pl)/(?:(?!\d+/)[^/]+/)*(?P<id>\d+)'

_TESTS = [{
# TVPlayer 2 in js wrapper
'url': 'https://vod.tvp.pl/video/czas-honoru,i-seria-odc-13,194536',
'url': 'https://swipeto.pl/64095316/uliczny-foxtrot-wypozyczalnia-kaset-kto-pamieta-dvdvideo',
'info_dict': {
'id': '194536',
'id': '64095316',
'ext': 'mp4',
'title': 'Czas honoru, odc. 13 – Władek',
'description': 'md5:437f48b93558370b031740546b696e24',
'age_limit': 12,
'title': 'Uliczny Foxtrot — Wypożyczalnia kaset. Kto pamięta DVD-Video?',
'age_limit': 0,
'duration': 374,
'thumbnail': r're:https://.+',
},
'expected_warnings': [
'Failed to download ISM manifest: HTTP Error 404: Not Found',
'Failed to download m3u8 information: HTTP Error 404: Not Found',
],
}, {
# TVPlayer legacy
'url': 'http://www.tvp.pl/there-can-be-anything-so-i-shortened-it/17916176',
'url': 'https://www.tvp.pl/polska-press-video-uploader/wideo/62042351',
'info_dict': {
'id': '17916176',
'id': '62042351',
'ext': 'mp4',
'title': 'TVP Gorzów pokaże filmy studentów z podroży dookoła świata',
'description': 'TVP Gorzów pokaże filmy studentów z podroży dookoła świata',
'title': 'Wideo',
'description': 'Wideo Kamera',
'duration': 24,
'age_limit': 0,
'thumbnail': r're:https://.+',
},
}, {
# TVPlayer 2 in iframe
@ -48,6 +59,8 @@ class TVPIE(InfoExtractor):
'title': 'Dzieci na sprzedaż dla homoseksualistów',
'description': 'md5:7d318eef04e55ddd9f87a8488ac7d590',
'age_limit': 12,
'duration': 259,
'thumbnail': r're:https://.+',
},
}, {
# TVPlayer 2 in client-side rendered website (regional; window.__newsData)
@ -58,7 +71,11 @@ class TVPIE(InfoExtractor):
'title': 'Studio Yayo',
'upload_date': '20160616',
'timestamp': 1466075700,
}
'age_limit': 0,
'duration': 20,
'thumbnail': r're:https://.+',
},
'skip': 'Geo-blocked outside PL',
}, {
# TVPlayer 2 in client-side rendered website (tvp.info; window.__videoData)
'url': 'https://www.tvp.info/52880236/09042021-0800',
@ -66,7 +83,10 @@ class TVPIE(InfoExtractor):
'id': '52880236',
'ext': 'mp4',
'title': '09.04.2021, 08:00',
'age_limit': 0,
'thumbnail': r're:https://.+',
},
'skip': 'Geo-blocked outside PL',
}, {
# client-side rendered (regional) program (playlist) page
'url': 'https://opole.tvp.pl/9660819/rozmowa-dnia',
@ -122,7 +142,7 @@ class TVPIE(InfoExtractor):
'url': 'https://www.tvpparlament.pl/retransmisje-vod/inne/wizyta-premiera-mateusza-morawieckiego-w-firmie-berotu-sp-z-oo/48857277',
'only_matching': True,
}, {
'url': 'https://polandin.com/47942651/pln-10-billion-in-subsidies-transferred-to-companies-pm',
'url': 'https://tvpworld.com/48583640/tescos-polish-business-bought-by-danish-chain-netto',
'only_matching': True,
}]
@ -151,16 +171,13 @@ def _extract_vue_video(self, video_data, page_id=None):
is_website = video_data.get('type') == 'website'
if is_website:
url = video_data['url']
fucked_up_url_parts = re.match(r'https?://vod\.tvp\.pl/(\d+)/([^/?#]+)', url)
if fucked_up_url_parts:
url = f'https://vod.tvp.pl/website/{fucked_up_url_parts.group(2)},{fucked_up_url_parts.group(1)}'
else:
url = 'tvp:' + str_or_none(video_data.get('_id') or page_id)
return {
'_type': 'url_transparent',
'id': str_or_none(video_data.get('_id') or page_id),
'url': url,
'ie_key': 'TVPEmbed' if not is_website else 'TVPWebsite',
'ie_key': (TVPIE if is_website else TVPEmbedIE).ie_key(),
'title': str_or_none(video_data.get('title')),
'description': str_or_none(video_data.get('lead')),
'timestamp': int_or_none(video_data.get('release_date_long')),
@ -217,8 +234,9 @@ def _real_extract(self, url):
# The URL may redirect to a VOD
# example: https://vod.tvp.pl/48463890/wadowickie-spotkania-z-janem-pawlem-ii
if TVPWebsiteIE.suitable(urlh.url):
return self.url_result(urlh.url, ie=TVPWebsiteIE.ie_key(), video_id=page_id)
for ie_cls in (TVPVODSeriesIE, TVPVODVideoIE):
if ie_cls.suitable(urlh.url):
return self.url_result(urlh.url, ie=ie_cls.ie_key(), video_id=page_id)

if re.search(
r'window\.__(?:video|news|website|directory)Data\s*=',
@ -297,12 +315,13 @@ def _real_extract(self, url):
class TVPEmbedIE(InfoExtractor):
IE_NAME = 'tvp:embed'
IE_DESC = 'Telewizja Polska'
_GEO_BYPASS = False
_VALID_URL = r'''(?x)
(?:
tvp:
|https?://
(?:[^/]+\.)?
(?:tvp(?:parlament)?\.pl|tvp\.info|polandin\.com)/
(?:tvp(?:parlament)?\.pl|tvp\.info|tvpworld\.com|swipeto\.pl)/
(?:sess/
(?:tvplayer\.php\?.*?object_id
|TVPlayer2/(?:embed|api)\.php\?.*[Ii][Dd])
@ -320,6 +339,12 @@ class TVPEmbedIE(InfoExtractor):
'title': 'Czas honoru, odc. 13 – Władek',
'description': 'md5:76649d2014f65c99477be17f23a4dead',
'age_limit': 12,
'duration': 2652,
'series': 'Czas honoru',
'episode': 'Episode 13',
'episode_number': 13,
'season': 'sezon 1',
'thumbnail': r're:https://.+',
},
}, {
'url': 'https://www.tvp.pl/sess/tvplayer.php?object_id=51247504&autoplay=false',
@ -327,6 +352,9 @@ class TVPEmbedIE(InfoExtractor):
'id': '51247504',
'ext': 'mp4',
'title': 'Razmova 091220',
'duration': 876,
'age_limit': 0,
'thumbnail': r're:https://.+',
},
}, {
# TVPlayer2 embed URL
@ -361,40 +389,48 @@ def _real_extract(self, url):
# stripping JSONP padding
datastr = webpage[15 + len(callback):-3]
if datastr.startswith('null,'):
error = self._parse_json(datastr[5:], video_id)
raise ExtractorError(error[0]['desc'])
error = self._parse_json(datastr[5:], video_id, fatal=False)
error_desc = traverse_obj(error, (0, 'desc'))

if error_desc == 'Obiekt wymaga płatności':
raise ExtractorError('Video requires payment and log-in, but log-in is not implemented')

raise ExtractorError(error_desc or 'unexpected JSON error')

content = self._parse_json(datastr, video_id)['content']
info = content['info']
is_live = try_get(info, lambda x: x['isLive'], bool)

if info.get('isGeoBlocked'):
# actual country list is not provided, we just assume it's always available in PL
self.raise_geo_restricted(countries=['PL'])

formats = []
for file in content['files']:
video_url = file.get('url')
video_url = url_or_none(file.get('url'))
if not video_url:
continue
if video_url.endswith('.m3u8'):
ext = determine_ext(video_url, None)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(video_url, video_id, m3u8_id='hls', fatal=False, live=is_live))
elif video_url.endswith('.mpd'):
elif ext == 'mpd':
if is_live:
# doesn't work with either ffmpeg or native downloader
continue
formats.extend(self._extract_mpd_formats(video_url, video_id, mpd_id='dash', fatal=False))
elif video_url.endswith('.f4m'):
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(video_url, video_id, f4m_id='hds', fatal=False))
elif video_url.endswith('.ism/manifest'):
formats.extend(self._extract_ism_formats(video_url, video_id, ism_id='mss', fatal=False))
else:
# mp4, wmv or something
quality = file.get('quality', {})
formats.append({
'format_id': 'direct',
'url': video_url,
'ext': determine_ext(video_url, file['type']),
'fps': int_or_none(quality.get('fps')),
'tbr': int_or_none(quality.get('bitrate')),
'width': int_or_none(quality.get('width')),
'height': int_or_none(quality.get('height')),
'ext': ext or file.get('type'),
'fps': int_or_none(traverse_obj(file, ('quality', 'fps'))),
'tbr': int_or_none(traverse_obj(file, ('quality', 'bitrate')), scale=1000),
'width': int_or_none(traverse_obj(file, ('quality', 'width'))),
'height': int_or_none(traverse_obj(file, ('quality', 'height'))),
})

self._sort_formats(formats)
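The reworked `TVPEmbedIE` loop chooses an extraction path from the file extension. For direct files it now passes `scale=1000` to `int_or_none` for `tbr`, presumably because the API reports bitrate in bit/s while yt-dlp's `tbr` is in kbit/s. The sketch below mirrors that dispatch in plain Python. `guess_ext` is a deliberately simplified stand-in for yt-dlp's `determine_ext`, `classify_file` is hypothetical, and the `url_or_none` validation and the live-DASH skip are left out.

```python
from urllib.parse import urlparse

MANIFEST_KINDS = {'m3u8': 'hls', 'mpd': 'dash', 'f4m': 'hds'}


def guess_ext(url):
    """Simplified stand-in for determine_ext: suffix of the last path segment."""
    segment = urlparse(url).path.rsplit('/', 1)[-1]
    return segment.rsplit('.', 1)[-1].lower() if '.' in segment else None


def classify_file(file):
    """Return (kind, payload) for one entry of content['files'], or None."""
    url = file.get('url')
    if not url:
        return None
    ext = guess_ext(url)
    if ext in MANIFEST_KINDS:
        return MANIFEST_KINDS[ext], url
    if url.endswith('.ism/manifest'):
        return 'mss', url
    # Direct file (mp4, wmv, ...): fall back to the declared type for the extension
    bitrate = (file.get('quality') or {}).get('bitrate')
    return 'direct', {
        'url': url,
        'ext': ext or file.get('type'),
        # bit/s -> kbit/s (assumed units), mirroring int_or_none(..., scale=1000)
        'tbr': int(bitrate) // 1000 if bitrate is not None else None,
    }
```

Switching from `endswith()` checks to an extension lookup, as the hunk does, also lets manifest URLs that carry a query string reach the right branch.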
@ -449,57 +485,105 @@ def _real_extract(self, url):
|
||||
return info_dict
|
||||
|
||||
|
||||
class TVPWebsiteIE(InfoExtractor):
|
||||
IE_NAME = 'tvp:series'
|
||||
_VALID_URL = r'https?://vod\.tvp\.pl/website/(?P<display_id>[^,]+),(?P<id>\d+)'
|
||||
class TVPVODBaseIE(InfoExtractor):
|
||||
_API_BASE_URL = 'https://vod.tvp.pl/api/products'
|
||||
|
||||
def _call_api(self, resource, video_id, **kwargs):
|
||||
return self._download_json(
|
||||
f'{self._API_BASE_URL}/{resource}', video_id,
|
||||
query={'lang': 'pl', 'platform': 'BROWSER'}, **kwargs)
|
||||
|
||||
def _parse_video(self, video):
|
||||
return {
|
||||
'_type': 'url',
|
||||
'url': 'tvp:' + video['externalUid'],
|
||||
'ie_key': TVPEmbedIE.ie_key(),
|
||||
'title': video.get('title'),
|
||||
'description': traverse_obj(video, ('lead', 'description')),
|
||||
'age_limit': int_or_none(video.get('rating')),
|
||||
'duration': int_or_none(video.get('duration')),
|
||||
}
|
||||
|
||||
|
||||
class TVPVODVideoIE(TVPVODBaseIE):
|
||||
IE_NAME = 'tvp:vod'
|
||||
_VALID_URL = r'https?://vod\.tvp\.pl/[a-z\d-]+,\d+/[a-z\d-]+(?<!-odcinki)(?:-odcinki,\d+/odcinek-\d+,S\d+E\d+)?,(?P<id>\d+)(?:\?[^#]+)?(?:#.+)?$'
|
||||
|
||||
_TESTS = [{
|
||||
# series
|
||||
'url': 'https://vod.tvp.pl/website/wspaniale-stulecie,17069012/video',
|
||||
'url': 'https://vod.tvp.pl/dla-dzieci,24/laboratorium-alchemika-odcinki,309338/odcinek-24,S01E24,311357',
|
||||
'info_dict': {
|
||||
'id': '17069012',
|
||||
},
|
||||
'playlist_count': 312,
|
||||
}, {
|
||||
# film
|
||||
'url': 'https://vod.tvp.pl/website/krzysztof-krawczyk-cale-moje-zycie,51374466',
|
||||
'info_dict': {
|
||||
'id': '51374509',
|
||||
'id': '60468609',
|
||||
'ext': 'mp4',
|
||||
'title': 'Krzysztof Krawczyk – całe moje życie, Krzysztof Krawczyk – całe moje życie',
|
||||
'description': 'md5:2e80823f00f5fc263555482f76f8fa42',
|
||||
'age_limit': 12,
|
||||
'title': 'Laboratorium alchemika, Tusze termiczne. Jak zobaczyć niewidoczne. Odcinek 24',
|
||||
'description': 'md5:1d4098d3e537092ccbac1abf49b7cd4c',
|
||||
'duration': 300,
|
||||
'episode_number': 24,
|
||||
'episode': 'Episode 24',
|
||||
'age_limit': 0,
|
||||
'series': 'Laboratorium alchemika',
|
||||
'thumbnail': 're:https://.+',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
'add_ie': ['TVPEmbed'],
|
||||
}, {
|
||||
'url': 'https://vod.tvp.pl/website/lzy-cennet,38678312',
|
||||
'url': 'https://vod.tvp.pl/filmy-dokumentalne,163/ukrainski-sluga-narodu,339667',
|
||||
'info_dict': {
|
||||
'id': '51640077',
|
||||
'ext': 'mp4',
|
||||
'title': 'Ukraiński sługa narodu, Ukraiński sługa narodu',
|
||||
'series': 'Ukraiński sługa narodu',
|
||||
'description': 'md5:b7940c0a8e439b0c81653a986f544ef3',
|
||||
'age_limit': 12,
|
||||
'episode': 'Episode 0',
'episode_number': 0,
'duration': 3051,
'thumbnail': 're:https://.+',
},
}]

def _real_extract(self, url):
video_id = self._match_id(url)

return self._parse_video(self._call_api(f'vods/{video_id}', video_id))


class TVPVODSeriesIE(TVPVODBaseIE):
IE_NAME = 'tvp:vod:series'
_VALID_URL = r'https?://vod\.tvp\.pl/[a-z\d-]+,\d+/[a-z\d-]+-odcinki,(?P<id>\d+)(?:\?[^#]+)?(?:#.+)?$'

_TESTS = [{
'url': 'https://vod.tvp.pl/seriale,18/ranczo-odcinki,316445',
'info_dict': {
'id': '316445',
'title': 'Ranczo',
'age_limit': 12,
'categories': ['seriale'],
},
'playlist_count': 129,
}, {
'url': 'https://vod.tvp.pl/programy,88/rolnik-szuka-zony-odcinki,284514',
'only_matching': True,
}, {
'url': 'https://vod.tvp.pl/dla-dzieci,24/laboratorium-alchemika-odcinki,309338',
'only_matching': True,
}]

def _entries(self, display_id, playlist_id):
url = 'https://vod.tvp.pl/website/%s,%s/video' % (display_id, playlist_id)
for page_num in itertools.count(1):
page = self._download_webpage(
url, display_id, 'Downloading page %d' % page_num,
query={'page': page_num})

video_ids = orderedSet(re.findall(
r'<a[^>]+\bhref=["\']/video/%s,[^,]+,(\d+)' % display_id,
page))

if not video_ids:
break

for video_id in video_ids:
yield self.url_result(
'tvp:%s' % video_id, ie=TVPEmbedIE.ie_key(),
video_id=video_id)
def _entries(self, seasons, playlist_id):
for season in seasons:
episodes = self._call_api(
f'vods/serials/{playlist_id}/seasons/{season["id"]}/episodes', playlist_id,
note=f'Downloading episode list for {season["title"]}')
yield from map(self._parse_video, episodes)

def _real_extract(self, url):
mobj = self._match_valid_url(url)
display_id, playlist_id = mobj.group('display_id', 'id')
playlist_id = self._match_id(url)
metadata = self._call_api(
f'vods/serials/{playlist_id}', playlist_id,
note='Downloading serial metadata')
seasons = self._call_api(
f'vods/serials/{playlist_id}/seasons', playlist_id,
note='Downloading season list')
return self.playlist_result(
self._entries(display_id, playlist_id), playlist_id)
self._entries(seasons, playlist_id), playlist_id, strip_or_none(metadata.get('title')),
clean_html(traverse_obj(metadata, ('description', 'lead'), expected_type=strip_or_none)),
categories=[traverse_obj(metadata, ('mainCategory', 'name'))],
age_limit=int_or_none(metadata.get('rating')),
)
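The new `TVPVODSeriesIE._entries` no longer scrapes paginated HTML. It walks the season list returned by the API and flattens each season's episode list into one generator. The following is a minimal standalone sketch of that pattern. `iter_series_entries` is hypothetical, and the `fetch_episodes` and `parse_video` callables stand in for the extractor's `_call_api` and `_parse_video` methods. The sample data is made up.

```python
def iter_series_entries(seasons, fetch_episodes, parse_video):
    """Yield one parsed entry per episode, season by season (hypothetical helper)."""
    for season in seasons:
        # One request per season, like the f'vods/serials/{id}/seasons/{season_id}/episodes' call
        episodes = fetch_episodes(season['id'])
        # Parse lazily, like `yield from map(self._parse_video, episodes)`
        yield from map(parse_video, episodes)


# Made-up stand-in data; real API responses have a different shape
fake_api = {1: [{'id': 'a'}, {'id': 'b'}], 2: [{'id': 'c'}]}
seasons = [{'id': 1, 'title': 'Sezon 1'}, {'id': 2, 'title': 'Sezon 2'}]
entries = list(iter_series_entries(seasons, fake_api.__getitem__, lambda ep: ep['id']))
print(entries)  # ['a', 'b', 'c']
```

Because this is a generator, a season's episode list is only requested when the playlist consumer reaches that season.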
|
@ -870,7 +870,7 @@ def _real_extract(self, url):

if '://player.vimeo.com/video/' in url:
config = self._parse_json(self._search_regex(
r'\bconfig\s*=\s*({.+?})\s*;', webpage, 'info section'), video_id)
r'\b(?:playerC|c)onfig\s*=\s*({.+?})\s*;', webpage, 'info section'), video_id)
if config.get('view') == 4:
config = self._verify_player_video_password(
redirect_url, video_id, headers)
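The Vimeo change widens the config regex so that it also matches a `playerConfig = {...};` assignment. The old lowercase `\bconfig` pattern does not match that form. The sketch below uses the exact pattern from the new line together with `json.loads`, in place of `_search_regex`/`_parse_json`. The two HTML snippets are invented examples.

```python
import json
import re

# Pattern from the new side of the hunk
CONFIG_RE = r'\b(?:playerC|c)onfig\s*=\s*({.+?})\s*;'


def find_player_config(webpage):
    """Return the first embedded config object as a dict, or None if absent."""
    m = re.search(CONFIG_RE, webpage)
    return json.loads(m.group(1)) if m else None


old_style = '<script>var config = {"view": 1};</script>'
new_style = '<script>window.playerConfig = {"view": 4};</script>'
print(find_player_config(old_style))  # {'view': 1}
print(find_player_config(new_style))  # {'view': 4}
```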
|
@ -13,6 +13,7 @@
merge_dicts,
str_or_none,
strip_or_none,
traverse_obj,
try_get,
urlencode_postdata,
url_or_none,
@ -81,6 +82,13 @@ class VLiveIE(VLiveBaseIE):
'upload_date': '20150817',
'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
'timestamp': 1439816449,
'like_count': int,
'channel': 'Girl\'s Day',
'channel_id': 'FDF27',
'comment_count': int,
'release_timestamp': 1439818140,
'release_date': '20150817',
'duration': 1014,
},
'params': {
'skip_download': True,
@ -98,6 +106,13 @@ class VLiveIE(VLiveBaseIE):
'upload_date': '20161112',
'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
'timestamp': 1478923074,
'like_count': int,
'channel': 'EXO',
'channel_id': 'F94BD',
'comment_count': int,
'release_timestamp': 1478924280,
'release_date': '20161112',
'duration': 906,
},
'params': {
'skip_download': True,
@ -169,6 +184,7 @@ def get_common_fields():
'like_count': int_or_none(video.get('likeCount')),
'comment_count': int_or_none(video.get('commentCount')),
'timestamp': int_or_none(video.get('createdAt'), scale=1000),
'release_timestamp': int_or_none(traverse_obj(video, 'onAirStartAt', 'willStartAt'), scale=1000),
'thumbnail': video.get('thumb'),
}
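The new `release_timestamp` field takes the first of `onAirStartAt` or `willStartAt` that is present and converts it from milliseconds to seconds with `int_or_none(..., scale=1000)`. The rough plain-Python equivalent below avoids yt-dlp's helpers. `first_present` and `ms_to_seconds` are hypothetical names. The input dict is illustrative and was built by multiplying the test's expected values by 1000.

```python
def first_present(obj, *keys):
    """Return the value of the first key whose value is not None (hypothetical helper)."""
    for key in keys:
        if obj.get(key) is not None:
            return obj[key]
    return None


def ms_to_seconds(value):
    """Integer-divide a millisecond timestamp by 1000, passing None through."""
    return None if value is None else int(value) // 1000


# Illustrative input only: the test's expected seconds, expressed in milliseconds
video = {'createdAt': 1439816449000, 'willStartAt': 1439818140000}
release = ms_to_seconds(first_present(video, 'onAirStartAt', 'willStartAt'))
print(release)  # 1439818140
```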
|
@ -255,7 +255,7 @@ class ZenYandexIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
redirect = self._search_json(r'var it\s*=\s*', webpage, 'redirect', id, default={}).get('retpath')
redirect = self._search_json(r'var it\s*=', webpage, 'redirect', id, default={}).get('retpath')
if redirect:
video_id = self._match_id(redirect)
webpage = self._download_webpage(redirect, video_id, note='Redirecting')
@ -373,7 +373,7 @@ def _real_extract(self, url):
item_id = self._match_id(url)
webpage = self._download_webpage(url, item_id)
redirect = self._search_json(
r'var it\s*=\s*', webpage, 'redirect', item_id, default={}).get('retpath')
r'var it\s*=', webpage, 'redirect', item_id, default={}).get('retpath')
if redirect:
item_id = self._match_id(redirect)
webpage = self._download_webpage(redirect, item_id, note='Redirecting')
|
@ -369,14 +369,24 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
r'(?:www\.)?hpniueoejy4opn7bc4ftgazyqjoeqwlvh2uiku2xqku6zpoa4bf5ruid\.onion',
# piped instances from https://github.com/TeamPiped/Piped/wiki/Instances
r'(?:www\.)?piped\.kavin\.rocks',
r'(?:www\.)?piped\.silkky\.cloud',
r'(?:www\.)?piped\.tokhmi\.xyz',
r'(?:www\.)?piped\.moomoo\.me',
r'(?:www\.)?il\.ax',
r'(?:www\.)?piped\.syncpundit\.com',
r'(?:www\.)?piped\.syncpundit\.io',
r'(?:www\.)?piped\.mha\.fi',
r'(?:www\.)?watch\.whatever\.social',
r'(?:www\.)?piped\.garudalinux\.org',
r'(?:www\.)?piped\.rivo\.lol',
r'(?:www\.)?piped-libre\.kavin\.rocks',
r'(?:www\.)?yt\.jae\.fi',
r'(?:www\.)?piped\.mint\.lgbt',
r'(?:www\.)?piped\.privacy\.com\.de',
r'(?:www\.)?il\.ax',
r'(?:www\.)?piped\.esmailelbob\.xyz',
r'(?:www\.)?piped\.projectsegfau\.lt',
r'(?:www\.)?piped\.privacydev\.net',
r'(?:www\.)?piped\.palveluntarjoaja\.eu',
r'(?:www\.)?piped\.smnz\.de',
r'(?:www\.)?piped\.adminforge\.de',
r'(?:www\.)?watch\.whatevertinfoil\.de',
r'(?:www\.)?piped\.qdi\.fi',
)

# extracted from account/account_menu ep
|
@ -3,13 +3,14 @@
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
NO_DEFAULT,
ExtractorError,
determine_ext,
extract_attributes,
float_or_none,
int_or_none,
join_nonempty,
merge_dicts,
NO_DEFAULT,
orderedSet,
parse_codecs,
qualities,
traverse_obj,
@ -188,7 +189,7 @@ class ZDFIE(ZDFBaseIE):
},
}, {
'url': 'https://www.zdf.de/funk/druck-11790/funk-alles-ist-verzaubert-102.html',
'md5': '57af4423db0455a3975d2dc4578536bc',
'md5': '1b93bdec7d02fc0b703c5e7687461628',
'info_dict': {
'ext': 'mp4',
'id': 'video_funk_1770473',
@ -250,17 +251,15 @@ def _extract_entry(self, url, player, content, video_id):
title = content.get('title') or content['teaserHeadline']

t = content['mainVideoContent']['http://zdf.de/rels/target']

ptmd_path = t.get('http://zdf.de/rels/streams/ptmd')

ptmd_path = traverse_obj(t, (
(('streams', 'default'), None),
('http://zdf.de/rels/streams/ptmd', 'http://zdf.de/rels/streams/ptmd-template')
), get_all=False)
if not ptmd_path:
ptmd_path = traverse_obj(
t, ('streams', 'default', 'http://zdf.de/rels/streams/ptmd-template'),
'http://zdf.de/rels/streams/ptmd-template').replace(
'{playerId}', 'ngplayer_2_4')
raise ExtractorError('Could not extract ptmd_path')

info = self._extract_ptmd(
urljoin(url, ptmd_path), video_id, player['apiToken'], url)
urljoin(url, ptmd_path.replace('{playerId}', 'ngplayer_2_4')), video_id, player['apiToken'], url)

thumbnails = []
layouts = try_get(
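The new `traverse_obj` call first looks for a PTMD path under `t['streams']['default']` and then at the top level of `t`. At each level it tries the `ptmd` key before the `ptmd-template` key. The `{playerId}` placeholder is now filled unconditionally when the URL is joined. The sketch below approximates that lookup order in plain Python. It is not `traverse_obj` itself: `find_ptmd_path` is a hypothetical helper, and the sample path is made up.

```python
PTMD_KEYS = (
    'http://zdf.de/rels/streams/ptmd',
    'http://zdf.de/rels/streams/ptmd-template',
)


def find_ptmd_path(target):
    """Return the first PTMD path, checking target['streams']['default'] before target itself."""
    streams = target.get('streams')
    default = streams.get('default') if isinstance(streams, dict) else None
    for container in (default, target):
        if not isinstance(container, dict):
            continue
        for key in PTMD_KEYS:
            if container.get(key):
                return container[key]
    return None


# Made-up document fragment containing only a template path
target = {'streams': {'default': {
    'http://zdf.de/rels/streams/ptmd-template': '/tmd/2/{playerId}/vod/ptmd/mediathek/example'}}}
path = find_ptmd_path(target)
# The placeholder is filled the same way as in the new urljoin(...) line
print(path.replace('{playerId}', 'ngplayer_2_4'))  # /tmd/2/ngplayer_2_4/vod/ptmd/mediathek/example
```

Calling `.replace` on a path without the placeholder is a no-op, which is why the new line can apply it to both key variants.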
@ -309,14 +308,15 @@ def _extract_mobile(self, video_id):
'https://zdf-cdn.live.cellular.de/mediathekV2/document/%s' % video_id,
video_id)

document = video['document']

formats = []
formitaeten = try_get(video, lambda x: x['document']['formitaeten'], list)
document = formitaeten and video['document']
if formitaeten:
title = document['titel']
content_id = document['basename']

formats = []
format_urls = set()
for f in document['formitaeten']:
for f in formitaeten or []:
self._extract_format(content_id, formats, format_urls, f)
self._sort_formats(formats)
@ -364,9 +364,9 @@ class ZDFChannelIE(ZDFBaseIE):
'url': 'https://www.zdf.de/sport/das-aktuelle-sportstudio',
'info_dict': {
'id': 'das-aktuelle-sportstudio',
'title': 'das aktuelle sportstudio | ZDF',
'title': 'das aktuelle sportstudio',
},
'playlist_mincount': 23,
'playlist_mincount': 18,
}, {
'url': 'https://www.zdf.de/dokumentation/planet-e',
'info_dict': {
@ -374,6 +374,14 @@ class ZDFChannelIE(ZDFBaseIE):
'title': 'planet e.',
},
'playlist_mincount': 50,
}, {
'url': 'https://www.zdf.de/gesellschaft/aktenzeichen-xy-ungeloest',
'info_dict': {
'id': 'aktenzeichen-xy-ungeloest',
'title': 'Aktenzeichen XY... ungelöst',
'entries': "lambda x: not any('xy580-fall1-kindermoerder-gesucht-100' in e['url'] for e in x)",
},
'playlist_mincount': 2,
}, {
'url': 'https://www.zdf.de/filme/taunuskrimi/',
'only_matching': True,
@ -383,60 +391,36 @@ class ZDFChannelIE(ZDFBaseIE):
def suitable(cls, url):
return False if ZDFIE.suitable(url) else super(ZDFChannelIE, cls).suitable(url)

def _og_search_title(self, webpage, fatal=False):
title = super(ZDFChannelIE, self)._og_search_title(webpage, fatal=fatal)
return re.split(r'\s+[-|]\s+ZDF(?:mediathek)?$', title or '')[0] or None

def _real_extract(self, url):
channel_id = self._match_id(url)

webpage = self._download_webpage(url, channel_id)

entries = [
self.url_result(item_url, ie=ZDFIE.ie_key())
for item_url in orderedSet(re.findall(
r'data-plusbar-url=["\'](http.+?\.html)', webpage))]
matches = re.finditer(
r'''<div\b[^>]*?\sdata-plusbar-id\s*=\s*(["'])(?P<p_id>[\w-]+)\1[^>]*?\sdata-plusbar-url=\1(?P<url>%s)\1''' % ZDFIE._VALID_URL,
webpage)

return self.playlist_result(
entries, channel_id, self._og_search_title(webpage, fatal=False))
if self._downloader.params.get('noplaylist', False):
entry = next(
(self.url_result(m.group('url'), ie=ZDFIE.ie_key()) for m in matches),
None)
self.to_screen('Downloading just the main video because of --no-playlist')
if entry:
return entry
else:
self.to_screen('Downloading playlist %s - add --no-playlist to download just the main video' % (channel_id, ))

r"""
player = self._extract_player(webpage, channel_id)
def check_video(m):
v_ref = self._search_regex(
r'''(<a\b[^>]*?\shref\s*=[^>]+?\sdata-target-id\s*=\s*(["'])%s\2[^>]*>)''' % (m.group('p_id'), ),
webpage, 'check id', default='')
v_ref = extract_attributes(v_ref)
return v_ref.get('data-target-video-type') != 'novideo'

channel_id = self._search_regex(
r'docId\s*:\s*(["\'])(?P<id>(?!\1).+?)\1', webpage,
'channel id', group='id')

channel = self._call_api(
'https://api.zdf.de/content/documents/%s.json' % channel_id,
player, url, channel_id)

items = []
for module in channel['module']:
for teaser in try_get(module, lambda x: x['teaser'], list) or []:
t = try_get(
teaser, lambda x: x['http://zdf.de/rels/target'], dict)
if not t:
continue
items.extend(try_get(
t,
lambda x: x['resultsWithVideo']['http://zdf.de/rels/search/results'],
list) or [])
items.extend(try_get(
module,
lambda x: x['filterRef']['resultsWithVideo']['http://zdf.de/rels/search/results'],
list) or [])

entries = []
entry_urls = set()
for item in items:
t = try_get(item, lambda x: x['http://zdf.de/rels/target'], dict)
if not t:
continue
sharing_url = t.get('http://zdf.de/rels/sharing-url')
if not sharing_url or not isinstance(sharing_url, compat_str):
continue
if sharing_url in entry_urls:
continue
entry_urls.add(sharing_url)
entries.append(self.url_result(
sharing_url, ie=ZDFIE.ie_key(), video_id=t.get('id')))

return self.playlist_result(entries, channel_id, channel.get('title'))
"""
return self.playlist_from_matches(
(m.group('url') for m in matches if check_video(m)),
channel_id, self._og_search_title(webpage, fatal=False))
|
@ -294,9 +294,10 @@ def _create_alias(option, opt_str, value, parser):

aliases = (x if x.startswith('-') else f'--{x}' for x in map(str.strip, aliases.split(',')))
try:
args = [f'ARG{i}' for i in range(nargs)]
alias_group.add_option(
*aliases, help=opts, nargs=nargs, dest=parser.ALIAS_DEST, type='str' if nargs else None,
metavar=' '.join(f'ARG{i}' for i in range(nargs)), action='callback',
*aliases, nargs=nargs, dest=parser.ALIAS_DEST, type='str' if nargs else None,
metavar=' '.join(args), help=opts.format(*args), action='callback',
callback=_alias_callback, callback_kwargs={'opts': opts, 'nargs': nargs})
except Exception as err:
raise optparse.OptionValueError(f'wrong {opt_str} formatting; {err}')
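With this change, the alias help text shows the placeholder names instead of raw `{0}`-style fields. The same `args` list drives both the metavar and `opts.format(*args)`. The small sketch below reproduces that string handling. `alias_help` is a hypothetical helper, and the alias value is an example in the `{0}` placeholder style.

```python
def alias_help(opts, nargs):
    """Return (metavar, help) the way the new add_option call builds them (hypothetical helper)."""
    args = [f'ARG{i}' for i in range(nargs)]
    return ' '.join(args), opts.format(*args)


metavar, help_text = alias_help('-S=aext:{0},abr -x --audio-format {0}', 1)
print(metavar)    # ARG0
print(help_text)  # -S=aext:ARG0,abr -x --audio-format ARG0
```

Because `opts.format(*args)` now runs inside the `try`, an alias whose option string has malformed braces is reported through the existing `OptionValueError` path.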
@ -549,11 +550,11 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
selection.add_option(
'--min-filesize',
metavar='SIZE', dest='min_filesize', default=None,
help='Do not download any videos smaller than SIZE, e.g. 50k or 44.6M')
help='Abort download if filesize is smaller than SIZE, e.g. 50k or 44.6M')
selection.add_option(
'--max-filesize',
metavar='SIZE', dest='max_filesize', default=None,
help='Do not download any videos larger than SIZE, e.g. 50k or 44.6M')
help='Abort download if filesize if larger than SIZE, e.g. 50k or 44.6M')
selection.add_option(
'--date',
metavar='DATE', dest='date', default=None,
|
@ -174,6 +174,7 @@ def release_hash(self):

def _report_error(self, msg, expected=False):
self.ydl.report_error(msg, tb=False if expected else None)
self.ydl._download_retcode = 100

def _report_permission_error(self, file):
self._report_error(f'Unable to write to {file}; Try running as administrator', True)
|
@ -480,6 +480,7 @@ def handle_endtag(self, tag):
raise self.HTMLBreakOnClosingTagException()


# XXX: This should be far less strict
def get_element_text_and_html_by_tag(tag, html):
"""
For the first element with the specified tag in the passed HTML document
@ -524,6 +525,7 @@ def __init__(self):

def handle_starttag(self, tag, attrs):
self.attrs = dict(attrs)
raise compat_HTMLParseError('done')


class HTMLListAttrsParser(html.parser.HTMLParser):
@ -684,7 +686,8 @@ def replace_insane(char):
return '\0_'
return char

if restricted and is_id is NO_DEFAULT:
# Replace look-alike Unicode glyphs
if restricted and (is_id is NO_DEFAULT or not is_id):
s = unicodedata.normalize('NFKC', s)
s = re.sub(r'[0-9]+(?::[0-9]+)+', lambda m: m.group(0).replace(':', '_'), s) # Handle timestamps
result = ''.join(map(replace_insane, s))
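The restricted-filename branch now also runs when `is_id` is false, not only when it is left at `NO_DEFAULT`. The branch folds look-alike Unicode glyphs with NFKC and rewrites colon-separated timestamps before the per-character pass. The sketch below isolates those two steps. `normalize_for_restricted` is a hypothetical name, and it omits the rest of `sanitize_filename`.

```python
import re
import unicodedata


def normalize_for_restricted(s):
    """Apply the two pre-processing steps from the restricted branch (hypothetical helper)."""
    # NFKC maps compatibility characters, e.g. fullwidth letters, to their plain forms
    s = unicodedata.normalize('NFKC', s)
    # Turn runs like 01:02:03 into 01_02_03 before the per-character colon handling sees them
    return re.sub(r'[0-9]+(?::[0-9]+)+', lambda m: m.group(0).replace(':', '_'), s)


print(normalize_for_restricted('Ｒｅｃａｐ 01:02:03'))  # Recap 01_02_03
```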
@ -985,6 +988,25 @@ def make_HTTPS_handler(params, **kwargs):
context.options |= 4 # SSL_OP_LEGACY_SERVER_CONNECT
# Allow use of weaker ciphers in Python 3.10+. See https://bugs.python.org/issue43998
context.set_ciphers('DEFAULT')
elif (
sys.version_info < (3, 10)
and ssl.OPENSSL_VERSION_INFO >= (1, 1, 1)
and not ssl.OPENSSL_VERSION.startswith('LibreSSL')
):
# Backport the default SSL ciphers and minimum TLS version settings from Python 3.10 [1].
# This is to ensure consistent behavior across Python versions, and help avoid fingerprinting
# in some situations [2][3].
# Python 3.10 only supports OpenSSL 1.1.1+ [4]. Because this change is likely
# untested on older versions, we only apply this to OpenSSL 1.1.1+ to be safe.
# LibreSSL is excluded until further investigation due to cipher support issues [5][6].
# 1. https://github.com/python/cpython/commit/e983252b516edb15d4338b0a47631b59ef1e2536
# 2. https://github.com/yt-dlp/yt-dlp/issues/4627
# 3. https://github.com/yt-dlp/yt-dlp/pull/5294
# 4. https://peps.python.org/pep-0644/
# 5. https://peps.python.org/pep-0644/#libressl-support
# 6. https://github.com/yt-dlp/yt-dlp/commit/5b9f253fa0aee996cf1ed30185d4b502e00609c4#commitcomment-89054368
context.set_ciphers('@SECLEVEL=2:ECDH+AESGCM:ECDH+CHACHA20:ECDH+AES:DHE+AES:!aNULL:!eNULL:!aDSS:!SHA1:!AESCCM')
context.minimum_version = ssl.TLSVersion.TLSv1_2

context.verify_mode = ssl.CERT_REQUIRED if opts_check_certificate else ssl.CERT_NONE
if opts_check_certificate:
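The new `elif` branch backports Python 3.10's client cipher list and TLS 1.2 floor to older interpreters linked against OpenSSL 1.1.1+ (LibreSSL is excluded). The sketch below applies the same condition and settings to a context from `ssl.create_default_context()`. That is an assumption: the real `make_HTTPS_handler` builds and configures its context differently. `make_client_context` is a hypothetical name.

```python
import ssl
import sys

# Cipher string copied verbatim from the hunk
BACKPORTED_CIPHERS = ('@SECLEVEL=2:ECDH+AESGCM:ECDH+CHACHA20:ECDH+AES:DHE+AES'
                      ':!aNULL:!eNULL:!aDSS:!SHA1:!AESCCM')


def make_client_context(check_certificate=True):
    """Build a client SSLContext, applying the 3.10-style defaults on older Pythons (sketch)."""
    context = ssl.create_default_context()
    if (
        sys.version_info < (3, 10)
        and ssl.OPENSSL_VERSION_INFO >= (1, 1, 1)
        and not ssl.OPENSSL_VERSION.startswith('LibreSSL')
    ):
        context.set_ciphers(BACKPORTED_CIPHERS)
        context.minimum_version = ssl.TLSVersion.TLSv1_2
    if not check_certificate:
        # check_hostname must be off before verify_mode can be set to CERT_NONE
        context.check_hostname = False
    context.verify_mode = ssl.CERT_REQUIRED if check_certificate else ssl.CERT_NONE
    return context


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
```

On Python 3.10+ the branch is skipped, because those versions already ship equivalent defaults.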
@ -1982,12 +2004,13 @@ def system_identifier():
with contextlib.suppress(OSError): # We may not have access to the executable
libc_ver = platform.libc_ver()

return 'Python %s (%s %s) - %s %s' % (
return 'Python %s (%s %s) - %s (%s%s)' % (
platform.python_version(),
python_implementation,
platform.architecture()[0],
platform.platform(),
format_field(join_nonempty(*libc_ver, delim=' '), None, '(%s)'),
ssl.OPENSSL_VERSION,
format_field(join_nonempty(*libc_ver, delim=' '), None, ', %s'),
)
@ -3078,8 +3101,8 @@ def escape_url(url):
).geturl()


def parse_qs(url):
return urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
def parse_qs(url, **kwargs):
return urllib.parse.parse_qs(urllib.parse.urlparse(url).query, **kwargs)


def read_batch_urls(batch_fd):
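The new `parse_qs` signature forwards extra keyword arguments to `urllib.parse.parse_qs`, so callers can, for example, keep parameters that have empty values. The short sketch below uses the new definition unchanged. The URL is an invented example.

```python
import urllib.parse


def parse_qs(url, **kwargs):
    # Extra keyword arguments go straight to urllib.parse.parse_qs
    return urllib.parse.parse_qs(urllib.parse.urlparse(url).query, **kwargs)


url = 'https://example.com/watch?v=abc&t='
print(parse_qs(url))                          # {'v': ['abc']}
print(parse_qs(url, keep_blank_values=True))  # {'v': ['abc'], 't': ['']}
```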
|