Generic: use compat_urllib_parse_unquote to prevent utf8 mangling

of the entire page in Python 2.

Requires the fixed compat_urllib_parse_unquote.

Example: the following command saves the file with a mangled playlist title instead of the kanji for 'tsunami'. This affects all UTF-8-encoded URLs as well.

youtube-dl -f18 -o '%(playlist_title)s-%(title)s.%(ext)s' \
61c14c1e3a/tsunami.html
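
For illustration only, the kind of mangling described above can be approximated in Python 3. This is a sketch, not the project's code: it assumes that decoding each percent escape as a single Latin-1 character is a fair stand-in for what Python 2's `urllib.unquote` does to a unicode string, and it uses the standard-library `urllib.parse.unquote` rather than `compat_urllib_parse_unquote`. The sample title is an assumed spelling of 'tsunami' in kanji.

```python
from urllib.parse import quote, unquote

# Assumed sample value: the kanji for 'tsunami'.
title = "津波"

# Percent-encode the UTF-8 bytes, as they might appear inside a page or URL.
escaped = quote(title)

# Stand-in for the Python 2 behaviour (an assumption of this sketch):
# every %XX escape becomes one character, so multi-byte UTF-8
# sequences fall apart into unrelated Latin-1 characters.
mangled = unquote(escaped, encoding="latin-1")

# UTF-8-aware unquoting groups the escaped bytes and decodes them together.
restored = unquote(escaped, encoding="utf-8")

print(repr(mangled))
print(restored)
```

In this sketch the mangled string still carries the original bytes, so re-encoding it as Latin-1 and decoding as UTF-8 recovers the title; decoding the escapes as UTF-8 in the first place avoids that detour.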
2024-07-22 06:41:01 +02:00 · 2015-07-15 15:30:47 -05:00 · 45eedbe58c
commit 45eedbe58c
parent e37c932fca
1 changed file with 1 addition and 1 deletion
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -1115,7 +1115,7 @@ def _real_extract(self, url):
        # Sometimes embedded video player is hidden behind percent encoding
        # (e.g. https://github.com/rg3/youtube-dl/issues/2448)
        # Unescaping the whole page allows to handle those cases in a generic way
-        webpage = compat_urllib_parse.unquote(webpage)
+        webpage = compat_urllib_parse_unquote(webpage)

        # it's tempting to parse this further, but you would
        # have to take into account all the variations like