mirror of https://github.com/nexus-stc/hyperboria synced 2024-11-23 11:36:51 +01:00

the-superpirate 8e8a92f1b1 - feat(nexus): Refactoring Cognitron

2 internal commit(s)

GitOrigin-RevId: bdefcb9130693f1bc6c56d23d44fc4e41ff4672d

2021-04-30 16:16:13 +03:00

Roadmap v.0.1

This paper is composed of lifetime goals for Nexus STC (Standard Template Construct).

Although many of these goals look complex and far away, I strongly believe that we will be able to survive and prosper only by doing impossible things.

Legend

(*) Big theoretical task
(E) Non-essential but still worth trying

Accessibility of Science

Software Accessibility

Infrastructure

Hermetic and reproducible build of the hyperboria project
Publishing slim images of all required parts to DockerHub (via public services)
Mirroring repository to IPFS
Modern one-click app in .deb, .dmg, .exe and Docker formats with update support

Public Mirrors

(E) Create Yggdrasil configuration
(E) Promote Yggdrasil itself
Create Onion configuration
Discuss the possibility of switching the original LibGen backend to Nexus

Data Accessibility

Infrastructure

Putting scimag collection onto IPFS
Announce data dumps for both scitech and scimag collections
Pinning feature in the app that lets users easily pin a subset of the collection
(*) Consider various reliable ways to announce new releases of initial data dumps
Maintain and curate the list of already publicly available journals in Pylon

Decentralized Publishing

Search Server Prerequisites

Reconsider the search schema, taking into account the new conditions and the points of this section
Writing API in Summa/Tantivy that supports immutability of existing data
(*) Consider various ways to produce reproducible segments/chunks of data when the same records arrive in a different order
Replication API in Summa that efficiently streams records from one replica to another
Signing API in Summa for signing every search record and verifying signatures during replication
(*) Consider various ways of broadcasting records without coordination
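
One possible direction for the reproducible-segments item above, given only as a minimal sketch: hash each record through a canonical encoding, then sort the per-record digests before hashing the segment, so that arrival order stops mattering. The function names `record_digest` and `segment_digest`, the JSON record shape and the choice of SHA-256 are assumptions for illustration only; nothing here reflects an existing Summa or Tantivy API.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """Hash one record via a canonical JSON encoding (sorted keys, fixed separators)."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def segment_digest(records: list[dict]) -> str:
    """Hash a segment so that the arrival order of records does not affect the result."""
    hasher = hashlib.sha256()
    # Sorting the per-record digests removes any dependence on input order.
    for digest in sorted(record_digest(r) for r in records):
        hasher.update(bytes.fromhex(digest))
    return hasher.hexdigest()


# Hypothetical records; the DOIs are placeholders, not real papers.
a = {"doi": "10.1000/example.1", "title": "First"}
b = {"doi": "10.1000/example.2", "title": "Second"}
same_regardless_of_order = segment_digest([a, b]) == segment_digest([b, a])
```

Two replicas that receive the same set of records in a different order would compute the same segment digest under this scheme. It does not address how records are grouped into segments in the first place, which is the harder part of the task.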

Establishing replication network

Containerize nexus-pipe for ingesting feed from CrossRef
Carry out tests with several ingesting leader nodes and multiple replicas

Observability of Science

Massive OCR

(E) Fork/take Grobid project under curation
Pair the Summa server with OCR capability
Extend schema with full article content
Find CPU capacity to OCR all legacy papers

References

Maintain graph statistics (at least PageRank) in Summa/Meta API
Clickable reference links in Cognitron Web (as in the bot)
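
To make the PageRank item above concrete, here is a minimal sketch of the standard power-iteration formulation over a citation graph. The function name `pagerank`, the adjacency-dict input format and the default damping factor of 0.85 are assumptions for illustration; this is not how Summa or the Meta API computes or stores graph statistics.

```python
def pagerank(
    links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50
) -> dict[str, float]:
    """Approximate PageRank by power iteration.

    `links` maps each paper to the list of papers it references.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by papers without outgoing references is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-paper graph: A and B both cite C.
citations = {"A": ["C"], "B": ["C"], "C": []}
ranks = pagerank(citations)
most_cited = max(ranks, key=ranks.get)
```

In this toy graph the paper cited by both others ends up with the highest score, and the scores sum to one. A real citation graph would need a sparse representation and a convergence check instead of a fixed iteration count.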

Entity Extraction

(*) Consider tools like SciBERT and other emerging technologies for automated entity recognition
Separate indexing for entities and navigation over them

Usage Statistics

(*) Consider various reliable ways of exchanging reading/downloading statistics of papers

Broadcasting

(*) Make new papers visible to relevant users

Automated Science (to be done)

Technology Alliance (to be done)
