Mirror of https://github.com/nexus-stc/hyperboria, synced 2024-06-15 01:20:01 +02:00
Nexus Pylon
Pylon is a downloader for scientific publications.
- Looks up articles by DOI, MD5 or IPFS multihash
- Validates downloaded items
- Streams data by chunks
- gRPC-ready
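To illustrate what chunked streaming combined with validation can look like, here is a minimal sketch in plain Python. It is not Pylon's implementation: the names `iter_chunks` and `download_and_validate` are hypothetical, the MD5 comparison is only one possible check, and an in-memory buffer stands in for a network response.

```python
import hashlib
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def download_and_validate(stream: BinaryIO, expected_md5: str, sink: BinaryIO) -> bool:
    """Copy stream to sink chunk by chunk and report whether the MD5 matches."""
    digest = hashlib.md5()
    for chunk in iter_chunks(stream):
        digest.update(chunk)  # hash incrementally, never holding the whole file
        sink.write(chunk)
    return digest.hexdigest() == expected_md5.lower()


# Usage: in-memory bytes stand in for a downloaded file (hypothetical data).
payload = b"%PDF-1.7 example bytes"
expected = hashlib.md5(payload).hexdigest()
sink = io.BytesIO()
ok = download_and_validate(io.BytesIO(payload), expected, sink)
```

Hashing while copying means the file is read only once and memory use stays bounded by the chunk size.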
Build
bazel build -c opt nexus-pylon-wheel
Install
PIP
pip install nexus-pylon
Nexus Pylon CLI
Download a scientific publication by its DOI:
pylon download --doi 10.1182/blood-2011-03-325258 --output article.pdf
Download a file by its MD5 hash:
pylon download --md5 f07707ee92fa675fd4ee53e3fee977d1 --output article.pdf
Download a file by its IPFS multihash:
pylon download --ipfs-multihashes '["bafykbzacea3vduqii3u52xkzdqan5oc54vsvedmed25dfybrqxyafahjl3rzu"]' --output article.pdf
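When driving the CLI from a script, the three invocations above can be assembled programmatically. The sketch below uses only the flags shown in this README (`--doi`, `--md5`, `--ipfs-multihashes`, `--output`). The helper names `build_download_command` and `download` are hypothetical, and the rule that exactly one identifier is passed is a simplification of this sketch, not documented CLI behavior.

```python
import json
import subprocess
from typing import List, Optional, Sequence


def build_download_command(
    output: str,
    doi: Optional[str] = None,
    md5: Optional[str] = None,
    ipfs_multihashes: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build the argv list for `pylon download` from a single identifier."""
    given = [doi is not None, md5 is not None, ipfs_multihashes is not None]
    if sum(given) != 1:
        # Assumption of this sketch: one identifier per call.
        raise ValueError("pass exactly one of doi, md5, ipfs_multihashes")
    cmd = ["pylon", "download"]
    if doi is not None:
        cmd += ["--doi", doi]
    elif md5 is not None:
        cmd += ["--md5", md5]
    else:
        # The README passes multihashes as a JSON array in a single argument.
        cmd += ["--ipfs-multihashes", json.dumps(list(ipfs_multihashes))]
    cmd += ["--output", output]
    return cmd


def download(**kwargs) -> None:
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_download_command(**kwargs), check=True)


cmd = build_download_command("article.pdf", doi="10.1182/blood-2011-03-325258")
```

Passing an argument list to `subprocess.run` instead of a shell string avoids quoting problems with the JSON array.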
Using with Selenium
Create a directory for exchanging files between the host and the Selenium instance running in Docker
mkdir downloads
Launch Selenium in Docker
docker run -e SE_START_XVFB=false -v "$(pwd)/downloads:/downloads" -p 4444:4444 selenium/standalone-chrome:latest
Launch Pylon
pylon download --doi 10.1101/2022.09.09.507349 --output article.pdf \
--wd-endpoint 'http://127.0.0.1:4444/wd/hub' \
--wd-directory /downloads --wd-host-directory "$(pwd)/downloads" --debug
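The two directory flags appear to name the same bind-mounted directory from both sides: `--wd-directory` as seen inside the Selenium container and `--wd-host-directory` as seen on the host. That reading is inferred from the `-v` mapping in the `docker run` command, not confirmed by this README. Under that assumption, translating a path reported by the browser into a host path can be sketched as follows; `container_to_host` is a hypothetical helper, not part of Pylon.

```python
from pathlib import PurePosixPath


def container_to_host(container_path: str, wd_directory: str, wd_host_directory: str) -> str:
    """Translate a path under the container's download dir to the host path.

    Raises ValueError if container_path is not inside wd_directory.
    """
    relative = PurePosixPath(container_path).relative_to(wd_directory)
    return str(PurePosixPath(wd_host_directory) / relative)


host_path = container_to_host(
    "/downloads/article.pdf",
    wd_directory="/downloads",
    wd_host_directory="/home/user/project/downloads",  # hypothetical host path
)
```

If the two flags do not point at the same mounted directory, a file saved by the browser would not be visible at the expected host location, so it is worth checking that both match the `-v` argument.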