We need to keep the orginal subtitles information, so that the '--load-info' option can be used to list or select the subtitles again.
We'll also be able to have a separate field for storing the automatic captions info.
For each language the extractor builds a list with the available formats sorted (like for video formats), then YoutubeDL selects one of them using the '--sub-format' option which now allows giving the format preferences (for example 'ass/srt/best').
For each format the 'url' field can be set so that we only download the contents if needed, or if the contents needs to be processed (like in crunchyroll) the 'data' field can be used.
The reasons for this change are:
* We weren't checking that the format given with '--sub-format' was available, checking it in each extractor would be repetitive.
* It allows to easily support giving a format preference.
* The subtitles were automatically downloaded in the extractor, but I think that if you use for example the '--dump-json' option you want to finish as fast as possible.
Currently only the ted extractor has been updated, but the old system still works.
If you run 'while read aurl ; do youtube-dl --extract-audio "${aurl}"; done < path_to_batch_file' (batch_file contains one url per line) each call to youtube-dl consumed some characters and 'read' would assing to 'aurl' a non valid url, something like 'tube.com/watch?v=<id>'.
On Python 2.x on Windows, if there are any unicode arguments in the command argument list, the whole list is converted to unicode internally.
Therefore, we need to call encodeArgument on every argument.
Fixes#4337 and #4668.
With avconv and older versions of ffmpeg the video is partially copied.
The duration difference between the audio and the video seem to be really small, so it's probably not noticeable.
Instead of having to configure PPs in code, this allows us and embedding programs not to worry about imports or finer details, similarly to how we handle IEs.
From now on, the line
from __future__ import unicode_literals
should be contained in every single Python file lest we run into any more 2.x/3.x issues.
Going forward, we're likely to develop on 3.x only and would likely miss subtle bugs otherwise.
utils is large enough without these compatibility functions.
Everything that is present in newer versions of Python (i.e. with dev Python it's just an import) goes into compat.py .
Everything else (i.e. youtube-dl-specific helpers) goes into utils.py .