# Roadmap v.0.1

This paper lists the lifetime goals of Nexus STC (Standard Template Construct).

Although many of these goals look complex and far away, I strongly believe that we will be able to survive and prosper only by doing impossible things.

#### Legend
- (*) Big theoretical task
- (E) Non-essential but still worth trying

## Accessibility of Science

### Software Accessibility

#### Infrastructure

- Hermetic and reproducible build of the `hyperboria` project
- Publishing slim images of all required parts to DockerHub (via public services)
- Mirroring the repository to IPFS
- Modern one-click app in .deb, .dmg, .exe and Docker formats with update support

#### Public Mirrors

- (E) Create Yggdrasil configuration
- (E) Promote Yggdrasil itself
- Create Onion configuration
- Discuss the possibility of switching the original LibGen backend to Nexus

### Data Accessibility

#### Infrastructure

- Putting the scimag collection onto IPFS
- Announce data dumps for both the scitech and scimag collections
- Pinning feature in the app that lets users easily pin a subset of the collection
- (*) Consider various **reliable** ways to announce new releases of **initial** data dumps
- Maintain and curate the list of already publicly available journals in Pylon

### Decentralized Publishing

#### Search Server Prerequisites

- Reconsider the search schema, taking into account the new conditions and the points of this section
- `Writing API` in Summa/Tantivy that supports immutability of already existing data
- (*) Consider various ways to produce reproducible segments/chunks of data when the same records arrive in a different order
- `Replication API` in Summa that allows records to be streamed efficiently from one replica to another
- `Signing API` in Summa for signing every search record and checking the signature during replication
- (*) Consider various ways of broadcasting records without coordination
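
As an illustration of the reproducible segments item above, here is a minimal sketch of one possible approach: serialize every record deterministically, hash each record separately, and combine the sorted hashes into a single segment digest. The function names `canonical_bytes` and `segment_digest` and the record layout are hypothetical and are not part of Summa or Tantivy. The sketch only shows that a digest can be made independent of arrival order.

```python
import hashlib
import json
from typing import Iterable, Mapping


def canonical_bytes(record: Mapping) -> bytes:
    """Serialize a record deterministically: sorted keys, fixed separators."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def segment_digest(records: Iterable[Mapping]) -> str:
    """Return a digest that depends on the records, not on their arrival order."""
    # Hash each record on its own, then sort the hashes so that order is irrelevant.
    leaf_hashes = sorted(hashlib.sha256(canonical_bytes(r)).digest() for r in records)
    combined = hashlib.sha256()
    for leaf in leaf_hashes:
        combined.update(leaf)
    return combined.hexdigest()


# Hypothetical records, used only to demonstrate order independence.
first = {"doi": "10.1000/example.1", "title": "First paper"}
second = {"doi": "10.1000/example.2", "title": "Second paper"}
assert segment_digest([first, second]) == segment_digest([second, first])
```

Two replicas that received the same records in a different order could compare such digests to check that they hold the same segment. Whether this fits the actual segment format is an open question for the task itself.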

#### Establishing a Replication Network

- Containerize `nexus-pipe` to ingest the feed from CrossRef
- Carry out tests with several ingesting leader nodes and multiple replicas

## Observability of Science

### Massive OCR

- (E) Fork the Grobid project or take it under curation
- Pair the Summa server with OCR capability
- Extend the schema with full article content
- Find CPU capacity to OCR all legacy papers

### References

- Maintain graph statistics (at least PageRank) in Summa/Meta API
- Clickable reference links in Cognitron Web (as in the bot)
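
To make the graph statistics item concrete, here is a minimal sketch of PageRank computed by power iteration over a citation graph. The `pagerank` function, its parameters and the input format (a mapping from a paper id to the ids it cites) are assumptions made for illustration and do not describe the Summa/Meta API.

```python
from typing import Dict, Hashable, Iterable, Mapping


def pagerank(
    graph: Mapping[Hashable, Iterable[Hashable]],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[Hashable, float]:
    """Compute PageRank for a graph given as {paper: [cited papers]}."""
    links = {source: list(targets) for source, targets in graph.items()}
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all papers.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: papers "a" and "b" both cite "c".
scores = pagerank({"a": ["c"], "b": ["c"], "c": []})
assert max(scores, key=scores.get) == "c"
```

A real deployment would need a sparse or incremental computation for a graph with many millions of papers. The sketch only fixes what the statistic means.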

### Entity Extraction

- (*) Consider tools like SciBERT and other upcoming technologies for automated entity recognition
- Separate indexing of entities and navigation over them

### Usage Statistics

- (*) Consider various **reliable** ways of exchanging reading/downloading statistics of papers

### Broadcasting

- (*) Make new papers visible to relevant users

## Automated Science (to be done)

## Technology Alliance (to be done)