mirror of
https://github.com/nexus-stc/hyperboria
Nexus Pylon
Pylon is a downloader for scientific publications.
- Looks up articles by DOI, MD5, or IPFS hashes
- Validates downloaded items
- Streams data in chunks
- gRPC-ready
Build
bazel build -c opt nexus-pylon-wheel
Install
PIP
pip install nexus-pylon
Nexus Pylon CLI
Download a scientific publication by its DOI:
pylon download --doi 10.1182/blood-2011-03-325258 --output article.pdf
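To fetch several publications in one go, the command above can be wrapped in a loop. The following is a minimal sketch, not part of Pylon itself: `download_dois` is a hypothetical helper name, the list file format (one DOI per line, `#` for comments) is an assumption, and it presumes that `pylon` exits with a non-zero status when a download fails.

```shell
# download_dois LIST_FILE OUT_DIR (hypothetical helper, not shipped with Pylon)
# Reads one DOI per line from LIST_FILE and saves each article into OUT_DIR.
download_dois() (
  list_file=$1
  out_dir=$2
  mkdir -p "$out_dir" || exit 1
  while IFS= read -r doi || [ -n "$doi" ]; do
    # Skip blank lines and comment lines.
    case $doi in ''|'#'*) continue ;; esac
    # DOIs contain '/', so map it to '_' to get a usable file name.
    name=$(printf '%s' "$doi" | tr '/' '_')
    # stdin is redirected so the command cannot consume the rest of the list.
    pylon download --doi "$doi" --output "$out_dir/$name.pdf" </dev/null \
      || echo "failed: $doi" >&2
  done < "$list_file"
)

# Usage:
#   download_dois dois.txt articles
```

Failures are reported on stderr and the loop continues, so one unavailable article does not stop the rest of the batch.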
Download a file by its MD5 hash:
pylon download --md5 f07707ee92fa675fd4ee53e3fee977d1 --output article.pdf
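Pylon validates downloaded items itself, but when a file was requested by MD5 it is easy to double-check the result independently. The sketch below assumes GNU coreutils `md5sum` is available (macOS ships `md5` instead) and that the expected digest is written in lowercase hex; `verify_md5` is a hypothetical helper name.

```shell
# verify_md5 FILE EXPECTED (hypothetical helper)
# Succeeds only if the MD5 digest of FILE equals EXPECTED (lowercase hex).
verify_md5() (
  # Reading from stdin keeps the file name out of md5sum's output.
  actual=$(md5sum < "$1" | cut -d ' ' -f 1)
  [ "$actual" = "$2" ]
)

# Usage:
#   verify_md5 article.pdf f07707ee92fa675fd4ee53e3fee977d1 && echo "checksum OK"
```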
Download a file by its IPFS multihash:
pylon download --ipfs-multihashes '["bafykbzacea3vduqii3u52xkzdqan5oc54vsvedmed25dfybrqxyafahjl3rzu"]' --output article.pdf
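As the example shows, `--ipfs-multihashes` takes a JSON array passed as a single quoted argument. When the hashes come from a script, the array can be assembled as sketched below. `to_json_array` is a hypothetical helper; it does no escaping, which is only safe on the assumption that the values contain no quotes or backslashes (true for base32 multihashes such as the one above).

```shell
# to_json_array ITEM... (hypothetical helper)
# Prints the arguments as a JSON array of strings, without any escaping.
to_json_array() (
  out=
  for item in "$@"; do
    # Prepend a comma only when something has already been collected.
    out="${out:+$out,}\"$item\""
  done
  printf '[%s]\n' "$out"
)

# Usage:
#   pylon download --ipfs-multihashes "$(to_json_array "$hash1" "$hash2")" --output article.pdf
```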
Using with Selenium
Create a directory for exchanging files between the host and Selenium running in Docker:
mkdir downloads
Launch Selenium in Docker
docker run -e SE_START_XVFB=false -v "$(pwd)/downloads":/downloads -p 4444:4444 selenium/standalone-chrome:latest
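The Selenium container needs a few seconds before it accepts sessions, so a script that starts it in the background may want to wait before calling Pylon. The sketch below is one way to do that, assuming `curl` is installed and that the server answers on its `/status` endpoint at the port published above. `wait_until` and `selenium_up` are hypothetical helper names, and the check only confirms that the endpoint responds, not that the reported `ready` flag is true.

```shell
# wait_until TRIES DELAY CMD... (hypothetical helper)
# Runs CMD until it succeeds, at most TRIES times, sleeping DELAY seconds
# after each failed attempt.
wait_until() (
  tries=$1
  delay=$2
  shift 2
  while [ "$tries" -gt 0 ]; do
    "$@" && exit 0
    tries=$((tries - 1))
    sleep "$delay"
  done
  exit 1
)

# Succeeds once the Selenium server answers HTTP requests on /status.
selenium_up() {
  curl -fsS "http://127.0.0.1:4444/status" >/dev/null 2>&1
}

# Usage:
#   wait_until 30 1 selenium_up || { echo "Selenium did not start" >&2; exit 1; }
```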
Launch Pylon
pylon download --doi 10.1101/2022.09.09.507349 --output article.pdf \
--wd-endpoint 'http://127.0.0.1:4444/wd/hub' \
--wd-directory /downloads --wd-host-directory "$(pwd)/downloads" --debug