hyperboria/nexus/ingest
the-superpirate d51e5ab65d - fix: Various fixes for release
- fix: Translation fixes
- fix: Various fixes
- feat: PB translations, configuration changes
- fix: Bugfixes

GitOrigin-RevId: 55f8b148c42a296162fc707c36a5146ca0073b4b
2021-01-29 11:26:51 +03:00

Nexus Ingest

Ingest fetches data from the Internet and sends the retrieved data to the Kafka queue of operations. The configs subdirectory has been cut from this version because the configs depend heavily on the network infrastructure you are using. You have to write your own configs, using the example below as a starting point.

Sample configs/base.yaml

---
jobs:
  crossref-api:
    class: nexus.ingest.jobs.CrossrefApiJob
    kwargs:
      actions:
        - class: nexus.actions.crossref_api.CrossrefApiToThinScimagPbAction
        - class: nexus.actions.scimag.ScimagPbToDocumentOperationBytesAction
      base_url: https://api.crossref.org/
      max_retries: 60
      retry_delay: 10
      sinks:
        - class: nexus.ingest.sinks.KafkaSink
          kwargs:
            kafka_hosts:
              - kafka-0.example.net
              - kafka-1.example.net
            topic_name: operations_binary
  libgen-api:
    class: nexus.ingest.jobs.LibgenApiJob
    kwargs:
      actions:
        - class: nexus.actions.libgen_api.LibgenApiToScitechPbAction
        - class: nexus.actions.scitech.ScitechPbToDocumentOperationBytesAction
      base_url: libgen.example.net
      max_retries: 60
      retry_delay: 10
      sinks:
        - class: nexus.ingest.sinks.KafkaSink
          kwargs:
            kafka_hosts:
              - kafka-0.example.net
              - kafka-1.example.net
            topic_name: operations_binary
log_path: '/var/log/nexus-ingest/{{ ENV_TYPE }}'
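
When writing your own config, it can help to check its shape before starting Ingest. The following is a minimal sketch and is not part of Nexus: `check_config` is a hypothetical helper, and the set of keys it treats as required is an assumption inferred from the sample above, not a documented schema. A plain dict stands in for the parsed YAML.

```python
# Hypothetical pre-flight check for a hand-written Ingest config.
# The "required" keys below are inferred from the sample config only;
# the real jobs and sinks may accept or demand other keys.

REQUIRED_JOB_KWARGS = ("actions", "base_url", "sinks")


def check_config(config: dict) -> list:
    """Return a list of problems; an empty list means the shape matches the sample."""
    problems = []
    jobs = config.get("jobs") or {}
    if not jobs:
        problems.append("no jobs defined")
    for name, job in jobs.items():
        if "class" not in job:
            problems.append(f"{name}: missing 'class'")
        kwargs = job.get("kwargs") or {}
        for key in REQUIRED_JOB_KWARGS:
            if not kwargs.get(key):
                problems.append(f"{name}: missing kwargs.{key}")
        for i, sink in enumerate(kwargs.get("sinks") or []):
            # Only Kafka sinks are checked, since that is the only sink the sample shows.
            if not str(sink.get("class", "")).endswith("KafkaSink"):
                continue
            sink_kwargs = sink.get("kwargs") or {}
            if not sink_kwargs.get("kafka_hosts"):
                problems.append(f"{name}: sinks[{i}] has no kafka_hosts")
            if not sink_kwargs.get("topic_name"):
                problems.append(f"{name}: sinks[{i}] has no topic_name")
    return problems


# The crossref-api job from the sample, written as the dict a YAML parser would produce.
SAMPLE = {
    "jobs": {
        "crossref-api": {
            "class": "nexus.ingest.jobs.CrossrefApiJob",
            "kwargs": {
                "actions": [
                    {"class": "nexus.actions.crossref_api.CrossrefApiToThinScimagPbAction"},
                    {"class": "nexus.actions.scimag.ScimagPbToDocumentOperationBytesAction"},
                ],
                "base_url": "https://api.crossref.org/",
                "max_retries": 60,
                "retry_delay": 10,
                "sinks": [
                    {
                        "class": "nexus.ingest.sinks.KafkaSink",
                        "kwargs": {
                            "kafka_hosts": ["kafka-0.example.net", "kafka-1.example.net"],
                            "topic_name": "operations_binary",
                        },
                    }
                ],
            },
        }
    },
    "log_path": "/var/log/nexus-ingest/{{ ENV_TYPE }}",
}

if __name__ == "__main__":
    for problem in check_config(SAMPLE):
        print(problem)
```

With PyYAML installed, the dict could come from `yaml.safe_load(open("configs/base.yaml"))` instead of the inline literal. Note that the `{{ ENV_TYPE }}` placeholder in `log_path` is left untouched here; how it gets substituted is not described in this README.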