hyperboria/nexus/pylon
the-superpirate a683e0ce18 - [nexus] Development

GitOrigin-RevId: 5d5feedff7b70be4c788abeb22f89c6758431d33
2022-09-13 17:28:58 +03:00
configs - [nexus] Development 2022-09-13 17:28:58 +03:00
drivers - [nexus] Development 2022-09-13 17:28:58 +03:00
pdftools - [nexus] Development 2022-09-13 17:28:58 +03:00
proto - Send Pylon to golden 2021-01-08 23:09:19 +03:00
resolvers - [nexus] Development 2022-09-13 17:28:58 +03:00
validators - [nexus] Development 2022-09-13 17:28:58 +03:00
__init__.py - [nexus] Update schema 2022-09-02 19:15:47 +03:00
BUILD.bazel - [nexus] Development 2022-09-13 17:28:58 +03:00
cli.py - [nexus] Development 2022-09-13 17:28:58 +03:00
client.py - [nexus] Development 2022-09-13 17:28:58 +03:00
consts.py - [nexus] Update schema 2022-09-02 19:15:47 +03:00
exceptions.py - Send Pylon to golden 2021-01-08 23:09:19 +03:00
matcher.py - [nexus] Development 2022-09-13 17:28:58 +03:00
network_agent.py - [nexus] Development 2022-09-13 17:28:58 +03:00
prepared_request.py - [nexus] Development 2022-09-13 17:28:58 +03:00
proxy_manager.py - [nexus] Development 2022-09-13 17:28:58 +03:00
README.md - [nexus] Development 2022-09-13 17:28:58 +03:00
source.py - [nexus] Development 2022-09-13 17:28:58 +03:00

Nexus Pylon

Pylon is a downloader for scientific publications.

  • Looks up articles by DOI, MD5 hash, or IPFS multihash
  • Validates downloaded items
  • Streams data in chunks
  • gRPC-ready

Build

bazel build -c opt nexus-pylon-wheel

Install

PIP

pip install nexus-pylon

Nexus Pylon CLI

Download a scientific publication by its DOI:

pylon download --doi 10.1182/blood-2011-03-325258 --output article.pdf

Download a file by its MD5 hash:

pylon download --md5 f07707ee92fa675fd4ee53e3fee977d1 --output article.pdf
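To double-check the result by hand, the downloaded file can be hashed and compared with the value passed to `--md5`. The following is a minimal sketch that assumes the `--md5` value is the MD5 digest of the file contents. The helper names `file_md5` and `matches_md5` are hypothetical and are not part of Pylon.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Compare the file's digest with the expected one, ignoring case and whitespace."""
    return file_md5(path) == expected_hex.strip().lower()
```

For the command above, `matches_md5("article.pdf", "f07707ee92fa675fd4ee53e3fee977d1")` would be expected to return `True` if the download succeeded and the assumption holds.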

Download a file by its IPFS multihash (the argument is a JSON array of multihashes):

pylon download --ipfs-multihashes '["bafykbzacea3vduqii3u52xkzdqan5oc54vsvedmed25dfybrqxyafahjl3rzu"]' --output article.pdf
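Because `--ipfs-multihashes` takes a JSON array, quoting it by hand is easy to get wrong when the command is generated from a script. The sketch below builds the same command line with the standard library. The function name `build_ipfs_download_command` is hypothetical, and only the flags shown above are used.

```python
import json
import shlex


def build_ipfs_download_command(multihashes, output):
    """Build a shell-safe `pylon download` command for a list of IPFS multihashes."""
    argv = [
        "pylon", "download",
        # The CLI example above passes the multihashes as a JSON array string.
        "--ipfs-multihashes", json.dumps(list(multihashes)),
        "--output", output,
    ]
    # shlex.join quotes each argument so the JSON survives shell parsing (Python 3.8+).
    return shlex.join(argv)


if __name__ == "__main__":
    print(build_ipfs_download_command(
        ["bafykbzacea3vduqii3u52xkzdqan5oc54vsvedmed25dfybrqxyafahjl3rzu"],
        "article.pdf",
    ))
```

When running the command from Python rather than printing it, passing the argument list directly to `subprocess.run` avoids shell quoting altogether.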

Using with Selenium

Create a directory for exchanging files between the host and Selenium running in Docker:

mkdir downloads

Launch Selenium in Docker, mounting that directory at /downloads inside the container:

docker run -e SE_START_XVFB=false -v "$(pwd)/downloads:/downloads" -p 4444:4444 selenium/standalone-chrome:latest

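Launch Pylon, pointing it at the Selenium endpoint and at the shared directory as seen from the container (--wd-directory) and from the host (--wd-host-directory):

pylon download --doi 10.1101/2022.09.09.507349 --output article.pdf \
--wd-endpoint 'http://127.0.0.1:4444/wd/hub' \
--wd-directory /downloads --wd-host-directory "$(pwd)/downloads" --debug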
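The two directory flags appear to name the same mounted directory from two sides, as the `-v` mount in the Docker command suggests. Under that assumption, a file saved by the browser inside the container can be located on the host with a simple path translation. This is an illustrative sketch only: `to_host_path` is a hypothetical helper, not Pylon's implementation, and the host path in the usage note is made up.

```python
from pathlib import PurePosixPath


def to_host_path(container_path, wd_directory, wd_host_directory):
    """Translate a path under the container mount into the matching host path."""
    # relative_to() raises ValueError if the file is outside the mounted directory.
    relative = PurePosixPath(container_path).relative_to(wd_directory)
    return PurePosixPath(wd_host_directory) / relative
```

For example, with `--wd-directory /downloads` and a host directory of `/home/user/downloads`, `to_host_path("/downloads/article.pdf", "/downloads", "/home/user/downloads")` yields `/home/user/downloads/article.pdf`.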