Generic: use compat_urllib_parse_unquote to prevent utf8 mangling

of the entire page in python 2.

-requires- fixed compat_urllib_parse_unquote

example - the following will save with a mangled playlist title,
 instead of the kanji for 'tsunami'. This affects all utf8encoded
 urls as well

youtube-dl -f18 -o '%(playlist_title)s-%(title)s.%(ext)s' \
  61c14c1e3a/tsunami.html
This commit is contained in:
fnord 2015-07-15 15:30:47 -05:00
parent e37c932fca
commit 45eedbe58c

View File

@ -1115,7 +1115,7 @@ def _real_extract(self, url):
# Sometimes embedded video player is hidden behind percent encoding
# (e.g. https://github.com/rg3/youtube-dl/issues/2448)
# Unescaping the whole page allows to handle those cases in a generic way
webpage = compat_urllib_parse.unquote(webpage)
webpage = compat_urllib_parse_unquote(webpage)
# it's tempting to parse this further, but you would
# have to take into account all the variations like