- feat(nexus): Refactoring Cognitron

2 internal commit(s)

GitOrigin-RevId: bdefcb9130693f1bc6c56d23d44fc4e41ff4672d
the-superpirate 2021-04-30 16:05:39 +03:00
parent e86f778bf7
commit 8e8a92f1b1
59 changed files with 2152 additions and 773 deletions

View File

@ -20,18 +20,3 @@ platform(
],
)
load("@rules_rust//proto:toolchain.bzl", "rust_proto_toolchain")
rust_proto_toolchain(
name = "proto-toolchain-impl",
grpc_plugin = "//rules/rust/cargo:cargo_bin_protoc_gen_rust_grpc",
proto_plugin = "//rules/rust/cargo:cargo_bin_protoc_gen_rust",
protoc = "@com_google_protobuf//:protoc",
)
toolchain(
name = "proto-toolchain",
toolchain = ":proto-toolchain-impl",
toolchain_type = "@rules_rust//proto:toolchain",
)

View File

@ -1,39 +1,16 @@
# Hyperboria
![Ancient Tech](media/ancient-tech.jpg)
## Introduction
The Hyperboria repository is a set of tools for working with SciMag and SciTech collections.
Hyperboria is a monorepository of tools aimed at improving the availability of science.
It consists of a configurable [`search engine`](nexus/cognitron) and a [`pipeline`](nexus/pipe) for [`ingesting`](nexus/ingest) data
from upstream sources. So-called [`actions`](nexus/actions) convert data from external APIs
into the [`internal Protobuf format`](nexus/models) and land the converted data in databases and/or search engines.
Here you will find applications for accessing and searching the largest libraries on Earth, along with other supporting tools.
## Prerequisite
Install system packages for various OSes:
```shell script
sudo ./repository/install-packages.sh
```
### Ubuntu 20.04
#### Docker
[Installation Guide](https://docs.docker.com/engine/install/ubuntu/)
#### IPFS
[Installation Guide](https://docs.ipfs.io/install/)
### MacOS
#### Docker
[Installation Guide](https://docs.docker.com/docker-for-mac/install/)
#### IPFS
[Installation Guide](https://docs.ipfs.io/install/)
All sources are under [The Unlicense](https://unlicense.org). They are literally yours.
## Content
- [`images`](images) - base docker images for [`nexus`](nexus)
- [`library`](library) - shared libraries
- [`nexus`](nexus) - processing and searching in scientific text collections
- [`rules`](rules) - build rules
- [`apps`](apps) - ready applications and images to deploy in various environments

View File

@ -188,8 +188,6 @@ load("//rules/rust:crates.bzl", "raze_fetch_remote_crates")
raze_fetch_remote_crates()
register_toolchains("//:proto-toolchain")
# NodeJS
load("@build_bazel_rules_nodejs//:index.bzl", "node_repositories", "yarn_install")

apps/README.md Normal file
View File

@ -0,0 +1,15 @@
# Packages
All packages require data dumps. Older base dumps are listed at the end of this page.
## Packages
- [`Telegram Bot`](nexus-bot)
- [`Headless Search Server`](nexus-cognitron)
- [`Web Application`](nexus-cognitron-web)
## Data Dumps
| Date | IPFS Hash |
| --- | ----------- |
| 2021-03-01 | `bafykbzacebzohi352bddfunaub5rgqv5b324nejk5v6fltjh45be5ykw5jsjg` |

View File

@ -0,0 +1,29 @@
---
services:
nexus-bot:
depends_on:
- nexus-meta-api
environment:
ENV_TYPE: production
image: thesuperpirate/nexus-bot:latest
ports:
- '3000:3000'
nexus-meta-api:
depends_on:
- summa
environment:
ENV_TYPE: production
NEXUS_META_API_grpc.address: '0.0.0.0'
NEXUS_META_API_grpc.port: 9090
NEXUS_META_API_summa.url: 'http://summa:8082'
image: thesuperpirate/nexus-meta-api:latest
summa:
environment:
ENV_TYPE: production
SUMMA_debug: 'true'
SUMMA_http.address: '0.0.0.0'
SUMMA_http.port: '8082'
image: izihawa/summa:latest
volumes:
- '${DATA_PATH}:/summa/data'
version: "3"

View File

@ -0,0 +1,22 @@
## Prerequisite
Follow the [development guide](../../papers-please/99-development.md) to install Docker and IPFS.
## Guide
#### 1. Download data dumps
```shell script
git clone https://github.com/nexus-stc/hyperboria
cd hyperboria/apps/nexus-cognitron-web
export COLLECTION=bafykbzacebzohi352bddfunaub5rgqv5b324nejk5v6fltjh45be5ykw5jsjg
ipfs get $COLLECTION -o data && ipfs pin add $COLLECTION
export DATA_PATH=$(realpath ./data)
```
#### 2. Launch
```shell script
docker-compose pull && docker-compose up
```
then go to [http://localhost:3000](http://localhost:3000)

View File

@ -5,9 +5,11 @@ services:
- nexus-meta-api-envoy
environment:
ENV_TYPE: production
NEXUS_COGNITRON_WEB_application.address: 0.0.0.0
NEXUS_COGNITRON_WEB_application.port: 3000
NEXUS_COGNITRON_WEB_ipfs.gateway.url: https://cloudflare-ipfs.com
NEXUS_COGNITRON_WEB_meta_api.url: http://localhost:8080
image: thesuperpirate/cognitron-web:latest
image: thesuperpirate/nexus-cognitron-web:latest
ports:
- '3000:3000'
nexus-meta-api:
@ -18,7 +20,7 @@ services:
NEXUS_META_API_grpc.address: '0.0.0.0'
NEXUS_META_API_grpc.port: 9090
NEXUS_META_API_summa.url: 'http://summa:8082'
image: thesuperpirate/meta-api:latest
image: thesuperpirate/nexus-meta-api:latest
nexus-meta-api-envoy:
depends_on:
- nexus-meta-api
@ -26,7 +28,7 @@ services:
ports:
- '8080:8080'
volumes:
- './nexus-meta-api-envoy.yaml:/etc/envoy/envoy.yaml'
- './envoy.yaml:/etc/envoy/envoy.yaml'
summa:
environment:
ENV_TYPE: production

View File

@ -63,7 +63,13 @@ class Configurator(RichDict):
env_prefix = env_prefix.lower()
for name, value in os.environ.items():
if name.lower().startswith(env_prefix):
env_dict[name[len(env_prefix):].lstrip('_')] = value
stripped_name = name[len(env_prefix):].lstrip('_')
if stripped_name[-2:] == '[]':
if stripped_name[:-2] not in env_dict:
env_dict[stripped_name[:-2]] = []
env_dict[stripped_name[:-2]].append(value)
else:
env_dict[stripped_name] = value
env_dict = unflatten(env_dict, sep=env_key_separator)
for config in ([os.environ] + configs + [env_dict]):

media/ancient-tech.jpg Normal file

Binary file not shown. Size: 146 KiB

View File

@ -63,3 +63,21 @@ container_push(
repository = "nexus-bot",
tag = "latest",
)
container_push(
name = "push-public-latest",
format = "Docker",
image = ":image",
registry = "registry.hub.docker.com",
repository = "thesuperpirate/nexus-bot",
tag = "latest",
)
container_push(
name = "push-public-testing",
format = "Docker",
image = ":image",
registry = "registry.hub.docker.com",
repository = "thesuperpirate/nexus-bot",
tag = "testing",
)

View File

@ -1 +1,2 @@
data
data
ipfs

View File

@ -1,3 +0,0 @@
package(default_visibility = ["//visibility:public"])

View File

@ -1,28 +1,4 @@
# Nexus Cognitron
## Prerequisite
Follow the [root guide](../../README.md) to install Docker, IPFS and Bazel (optionally)
## Guide
#### 1. Download data dumps
```shell script
export COLLECTION=bafykbzacebzohi352bddfunaub5rgqv5b324nejk5v6fltjh45be5ykw5jsjg
ipfs get $COLLECTION -o data && ipfs pin add $COLLECTION
export DATA_PATH=$(realpath ./data)
```
#### 2. Launch Nexus Cognitron
Create [`docker-compose.yml`](docker-compose.yml) file to set up Nexus Cognitron and then launch it:
```shell script
docker-compose pull && docker-compose up
```
then go to [http://localhost:3000](http://localhost:3000)
#### 3. (Optional) Deploy data dumps into your database
#### Deploy data dumps into your database
There is a function `work` in [`traversing script`](installer/scripts/iterate.py)
that you can reimplement to iterate over the whole dataset and insert it into your

View File

@ -1,4 +1,3 @@
load("@build_bazel_rules_nodejs//:index.bzl", "js_library")
load("@io_bazel_rules_docker//container:container.bzl", "container_push")
load("@io_bazel_rules_docker//nodejs:image.bzl", "nodejs_image")
load("@npm//nuxt:index.bzl", "nuxt")
@ -20,21 +19,16 @@ deps = [
"@npm//bootstrap-vue",
"@npm//core-js",
"@npm//dateformat",
"@npm//electron",
"@npm//pug",
"@npm//pug-plain-loader",
"@npm//sass",
"@npm//sass-loader",
"@npm//vue",
"//nexus/meta_api/js/client",
"//nexus/views/js",
]
js_library(
name = "nexus-cognitron-web",
package_name = "nexus-cognitron-web",
srcs = files,
deps = deps,
)
nuxt(
name = "web-dev",
args = [
@ -42,7 +36,7 @@ nuxt(
"nexus/cognitron/web/nuxt.config.js",
"--watch-poll",
],
data = [":nexus-cognitron-web"],
data = files + deps,
)
nuxt(
@ -82,7 +76,7 @@ container_push(
format = "Docker",
image = ":image",
registry = "registry.hub.docker.com",
repository = "thesuperpirate/cognitron-web",
repository = "thesuperpirate/nexus-cognitron-web",
tag = "latest",
)
@ -91,6 +85,7 @@ container_push(
format = "Docker",
image = ":image",
registry = "registry.hub.docker.com",
repository = "thesuperpirate/cognitron-web",
repository = "thesuperpirate/nexus-cognitron-web",
tag = "testing",
)

View File

@ -2,5 +2,5 @@
#### Development
```shell script
bazel run web_dev
bazel run web-dev
```

View File

@ -1,6 +1,7 @@
<template lang="pug">
div.document
v-scitech(:document="document")
v-scimag(v-if="document.schema === 'scimag'" :document="document")
v-scitech(v-if="document.schema === 'scitech'" :document="document")
</template>
<script>

View File

@ -1,16 +1,19 @@
<template lang="pug">
ul
li(v-for='scoredDocument in scoredDocuments')
search-item(:scored-document='scoredDocument', :key='scoredDocument.typedDocument.scitech.id')
li(v-for='document in documents')
v-scimag-search-item(v-if="document.schema == 'scimag'", :document='document', :key='document.id')
v-scitech-search-item(v-if="document.schema == 'scitech'", :document='document', :key='document.id')
</template>
<script>
import SearchItem from '@/components/search-item'
import VScimagSearchItem from '@/components/v-scimag-search-item'
import VScitechSearchItem from '@/components/v-scitech-search-item'
export default {
name: 'SearchList',
components: { SearchItem },
components: { VScimagSearchItem, VScitechSearchItem },
props: {
scoredDocuments: {
documents: {
type: Array,
required: true
}

View File

@ -0,0 +1,23 @@
<template lang="pug">
nav.navbar.fixed-bottom.ml-auto
ul.navbar-nav.ml-auto
li.nav-item
| Powered by&nbsp;
a(href="https://github.com/nexus-stc/hyperboria") Nexus STC
| , 2025
</template>
<script>
export default {
name: 'VFooter',
data () {
return {
query: ''
}
}
}
</script>
<style scoped lang="scss">
</style>

View File

@ -3,6 +3,8 @@
b-container
nuxt-link(to="/" title="Go to search!").logo
| > Nexus Cognitron
a.nav-link(href="https://t.me/nexus_search" title="News")
| News
</template>
<script>

View File

@ -0,0 +1,71 @@
<template lang="pug">
div.d-flex
div
nuxt-link(:to="{ name: 'documents-schema-id', params: { schema: document.schema, id: document.id }}") {{ document.icon }} {{ document.title }}
.detail
div
i.mr-1 DOI:
span {{ document.doi }}
div(v-if='document.getFirstAuthors(false, 1)')
span {{ document.getFirstAuthors(false, 1) }} {{ issuedAt }}
.gp
span.el.text-uppercase {{ document.getFormattedFiledata() }}
</template>
<script>
import { getIssuedDate } from '@/plugins/helpers'
export default {
name: 'SearchItem',
props: {
document: {
type: Object,
required: true
}
},
computed: {
issuedAt: function () {
const date = getIssuedDate(this.document.issuedAt)
if (date != null) return '(' + date + ')'
return null
}
}
}
</script>
<style scoped lang="scss">
.el {
display: block;
line-height: 1em;
margin-right: 10px;
padding-right: 10px;
border-right: 1px solid;
&:last-child {
border-right: 0;
}
}
img {
margin-left: 15px;
max-width: 48px;
max-height: 48px;
object-fit: contain;
width: auto;
}
.key {
font-weight: bold;
}
.gp {
margin-top: 2px;
display: flex;
}
.detail {
font-size: 12px;
}
i {
text-transform: uppercase;
}
</style>

View File

@ -2,8 +2,9 @@
div
.top
h6 {{ document.title }}
h6
i {{ document.locator }}
.top
i
h6 {{ document.getFormattedLocator() }}
table
tbody
v-tr(label="DOI", :value="document.doi")
@ -11,17 +12,17 @@
v-tr(label="Tags", :value="tags")
v-tr(label="ISSNS", :value="issns")
v-tr(label="ISBNS", :value="isbns")
v-tr(label="File", :value="document.filedata")
v-tr-link(label="Download link", v-if="ipfsMultihash" :value="document.filename", :url="ipfsUrl")
v-tr(label="File", :value="document.getFormattedFiledata()")
v-tr-multi-link(label="Links", :links="links")
</template>
<script>
import { getIssuedDate } from '@/plugins/helpers'
import VTr from './v-tr'
import VTrLink from './v-tr-link'
import VTrMultiLink from './v-tr-multi-link'
export default {
name: 'VScimag',
components: { VTrLink, VTr },
components: { VTr, VTrMultiLink },
props: {
document: {
type: Object,
@ -29,13 +30,13 @@ export default {
}
},
computed: {
pages: function () {
pages () {
if (this.document.firstPage && this.document.lastPage && this.document.firstPage !== this.document.lastPage) {
return `${this.document.firstPage}-${this.document.lastPage}`
}
return null
},
page: function () {
page () {
if (this.document.firstPage) {
if (this.document.lastPage) {
if (this.document.firstPage === this.document.lastPage) {
@ -49,26 +50,35 @@ export default {
}
return null
},
issns: function () {
issns () {
return (this.document.issnsList || []).join('; ')
},
isbns: function () {
isbns () {
return (this.document.isbnsList || []).join('; ')
},
issuedAt: function () {
issuedAt () {
return getIssuedDate(this.document.issuedAt)
},
ipfsUrl: function () {
if (!this.ipfsMultihash) return null
return `${this.$config.ipfs.gateway.url}/ipfs/${this.ipfsMultihash}?filename=${this.filename}&download=true`
ipfsUrl () {
if (!this.document.getIpfsMultihash()) return null
return `${this.$config.ipfs.gateway.url}/ipfs/${this.document.getIpfsMultihash()}?filename=${this.document.getFilename()}&download=true`
},
ipfsMultihash: function () {
if (this.document.ipfsMultihashesList) {
return this.document.ipfsMultihashesList[0]
links () {
const links = []
if (this.ipfsUrl) {
links.push({
url: this.ipfsUrl,
value: 'IPFS.io'
})
} else {
links.push({
url: this.document.getTelegramLink(),
value: 'Nexus Bot'
})
}
return null
return links
},
tags: function () {
tags () {
return (this.document.tagsList || []).join('; ')
}
}
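
The `ipfsUrl` and `links` computed properties above can be read as two small pure functions. The following is a minimal sketch rather than the component's actual code: `buildIpfsUrl` and `buildLinks` are hypothetical names, and percent-encoding the filename with `encodeURIComponent` is an assumption that the original template string does not make.

```javascript
// Hypothetical helper: builds a gateway download URL, or null when no hash is known.
function buildIpfsUrl (gatewayUrl, multihash, filename) {
  if (!multihash) return null
  // Assumption: the filename is percent-encoded so spaces and '&' survive in the query.
  return `${gatewayUrl}/ipfs/${multihash}?filename=${encodeURIComponent(filename)}&download=true`
}

// Hypothetical helper: prefers the IPFS link and falls back to the Telegram bot link.
function buildLinks (ipfsUrl, telegramLink) {
  if (ipfsUrl) {
    return [{ url: ipfsUrl, value: 'IPFS.io' }]
  }
  return [{ url: telegramLink, value: 'Nexus Bot' }]
}
```

Keeping the fallback in one place means the template only ever iterates over a `links` array and never has to branch on whether a multihash exists.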

View File

@ -1,21 +1,15 @@
<template lang="pug">
div.d-flex
div
nuxt-link(:to="{ name: 'documents-schema-id', params: { schema: schema, id: document.id }}") {{ document.title }}
nuxt-link(:to="{ name: 'documents-schema-id', params: { schema: document.schema, id: document.id }}") {{ document.icon }} {{ document.title }}
.detail
div
i.mr-1(v-if='document.doi') DOI:
span {{ document.doi }}
div(v-if='document.firstAuthors')
span {{ document.firstAuthors }} {{ issuedAt }}
div(v-if='document.getFirstAuthors(false, 1)')
span {{ document.getFirstAuthors(false, 1) }} {{ issuedAt }}
.gp
span.el.text-uppercase(v-if="document.extension") {{ document.extension }}
span.el.text-uppercase(v-if="document.language") {{ document.language }}
span.el.text-uppercase(v-if="document.filesize") {{ document.filesize }}
span.el(v-if="document.pages")
span.mr-2 {{ document.pages }}
span pages
span.el.text-uppercase {{ document.getFormattedFiledata() }}
</template>
<script>
@ -25,25 +19,19 @@ import { getIssuedDate } from '@/plugins/helpers'
export default {
name: 'SearchItem',
props: {
scoredDocument: {
document: {
type: Object,
required: true
}
},
computed: {
document: function () {
return this.scoredDocument.typedDocument[this.schema]
},
issuedAt: function () {
const date = getIssuedDate(this.document.issuedAt)
if (date != null) return '(' + date + ')'
return null
},
schema: function () {
const td = this.scoredDocument.typedDocument
return Object.keys(td).filter(k => td[k] !== undefined)[0]
}
}
}
</script>

View File

@ -4,7 +4,7 @@
h6 {{ document.title }}
.top
i
h6 {{ document.locator }}
h6 {{ document.getFormattedLocator() }}
table
tbody
v-tr(label="DOI", :value="document.doi")
@ -12,8 +12,8 @@
v-tr(label="Tags", :value="tags")
v-tr(label="ISBNS", :value="isbns")
v-tr(label="ISSNS", :value="issns")
v-tr(label="File", :value="document.filedata")
v-tr-link(label="Download link", v-if="ipfsMultihash" :value="document.filename", :url="ipfsUrl")
v-tr(label="File", :value="document.getFormattedFiledata()")
v-tr-multi-link(label="Links", :links="links")
</template>
<script>
@ -27,26 +27,44 @@ export default {
}
},
computed: {
isbns: function () {
isbns () {
return (this.document.isbnsList || []).join('; ')
},
issns: function () {
issns () {
return (this.document.issnsList || []).join('; ')
},
issuedAt: function () {
issuedAt () {
return getIssuedDate(this.document.issuedAt)
},
ipfsUrl: function () {
ipfsUrl () {
if (!this.ipfsMultihash) return null
return `${this.$config.ipfs.gateway.url}/ipfs/${this.ipfsMultihash}?filename=${this.document.filename}&download=true`
return `${this.$config.ipfs.gateway.url}/ipfs/${this.ipfsMultihash}?filename=${this.document.getFilename()}&download=true`
},
ipfsMultihash: function () {
ipfsMultihash () {
if (this.document.ipfsMultihashesList) {
return this.document.ipfsMultihashesList[0]
}
return ''
},
tags: function () {
links () {
const links = []
if (this.ipfsUrl) {
links.push({
url: this.ipfsUrl,
value: 'IPFS.io'
})
} else {
links.push({
url: this.document.getTelegramLink(),
value: 'Nexus Bot'
})
}
return links
},
locator () {
return ''
},
tags () {
return (this.document.tagsList || []).join('; ')
}
}

View File

@ -1,23 +1,18 @@
<template lang="pug">
tr(v-if="value")
tr
th {{ label }}
td
a(:href="url" download) {{ value }}
a(v-for="link in links" :href="link.url" download) {{ link.value }}
</template>
<script>
export default {
name: 'VTrLink',
name: 'VTrMultiLink',
props: {
value: {
default: null,
links: {
required: true,
type: String
},
url: {
required: true,
type: String
type: Array
},
label: {
required: true,
@ -39,4 +34,7 @@ export default {
th {
white-space: nowrap;
}
td > a {
margin-right: 10px;
}
</style>

View File

@ -3,4 +3,5 @@
v-header
b-container.mt-3
nuxt
v-footer
</template>

View File

@ -7,8 +7,8 @@ if (buildDir) {
module.exports = {
server: {
host: '0.0.0.0',
port: 3000
host: process.env['NEXUS_COGNITRON_WEB_application.address'] || '0.0.0.0',
port: process.env['NEXUS_COGNITRON_WEB_application.port'] || 3000
},
buildDir: buildDir,
srcDir: 'nexus/cognitron/web',
@ -40,7 +40,8 @@ module.exports = {
publicRuntimeConfig: {
meta_api: {
url: process.env['NEXUS_COGNITRON_WEB_meta_api.url'] || 'http://localhost:8080'
hostname: process.env['NEXUS_COGNITRON_WEB_meta_api.hostname'],
url: process.env['NEXUS_COGNITRON_WEB_meta_api.url']
},
ipfs: {
gateway: {
@ -52,7 +53,7 @@ module.exports = {
// Plugins to run before rendering page (https://go.nuxtjs.dev/config-plugins)
plugins: [
'plugins/helpers',
'plugins/meta-api',
{ src: 'plugins/meta-api', mode: 'client' },
'plugins/utils'
],
@ -81,6 +82,9 @@ module.exports = {
extend (config) {
config.resolve.alias['~'] = process.cwd()
},
transpile: ['nexus-meta-api-js-client'],
transpile: ['nexus-meta-api-js-client', 'nexus-views-js']
},
node: {
window: 'empty'
}
}
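
The `server` section above reads its address and port from environment variables, which always arrive as strings. As a hedged sketch of that lookup, assuming a hypothetical helper named `readServerConfig` that is not part of the original config, the port can be converted to a number with the same defaults:

```javascript
// Hypothetical helper mirroring the `server` section of nuxt.config.js.
function readServerConfig (env) {
  const rawPort = env['NEXUS_COGNITRON_WEB_application.port']
  // Environment values are strings; convert the port and fall back to 3000
  // when it is missing or not a number.
  const port = Number.parseInt(rawPort, 10)
  return {
    host: env['NEXUS_COGNITRON_WEB_application.address'] || '0.0.0.0',
    port: Number.isNaN(port) ? 3000 : port
  }
}
```

In the config file this would be called as `readServerConfig(process.env)`; passing the environment in as an argument keeps the function easy to check in isolation.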

View File

@ -15,15 +15,7 @@ export default {
}
},
async fetch () {
const response = await this.$meta_api.getView(this.$route.params.schema, this.$route.params.id)
this.document = {
...response.typedDocument.scitech,
filedata: response.filedata,
filename: response.filename,
filesize: response.filesize,
firstAuthors: response.firstAuthors,
locator: response.locator
}
this.document = await this.$meta_api.get(this.$route.params.schema, this.$route.params.id)
},
fetchOnServer: false
}

View File

@ -3,65 +3,56 @@
form
.input-group
b-form-input(v-model='query' placeholder='Enter book name or DOI')
b-button(type='submit' @click.stop.prevent='submit(query, 1, schema)') Search
b-form-radio-group(
v-model="schema"
:options="schemas"
class="radio-group"
b-button(type='submit' @click.stop.prevent='submit(query, 1, schemas)') Search
b-form-checkbox-group.checkbox-group(
v-model="schemas"
:options="availableSchemas"
value-field="item"
text-field="name")
p.mt-5(v-if="scoredDocuments.length == 0") Nothing found
b-pagination(v-if='scoredDocuments.length > 0' v-model='page' :total-rows='totalRows' :per-page='perPage' limit="2" :disabled="isLoading")
.search_list
search-list(:scored-documents='scoredDocuments')
b-pagination(v-if='scoredDocuments.length > 0' v-model='page' :total-rows='totalRows' :per-page='perPage' limit="2" :disabled="isLoading")
p.mt-5(v-if="nothingFound") Nothing found
b-pagination(v-if='documents.length > 0' v-model='page' :total-rows='totalRows' :per-page='perPage' limit="2" :disabled="isLoading")
.search-list
search-list(:documents='documents')
b-pagination(v-if='documents.length > 0' v-model='page' :total-rows='totalRows' :per-page='perPage' limit="2" :disabled="isLoading")
</template>
<script>
import SearchList from '@/components/search-list'
export default {
name: 'Index',
components: { SearchList },
loading: true,
data () {
return {
query: '',
scoredDocuments: [],
defaultSchema: 'scitech',
schema: 'scitech',
schemas: [
{ item: 'scitech', name: 'Scitech' },
// { item: 'scimag', name: 'Scimag' }
availableSchemas: [
{ item: 'scitech', name: 'SciTech' },
{ item: 'scimag', name: 'SciMag' }
],
documents: [],
nothingFound: false,
page: 1,
totalRows: 10,
perPage: 1
perPage: 1,
query: '',
schemas: ['scimag', 'scitech'],
totalRows: 10
}
},
async fetch () {
this.nothingFound = false
this.query = this.$route.query.query
if (!this.query) {
await this.$router.push({ path: '/' })
this.scoredDocuments = []
return
this.documents = []
return this.$router.push({ path: '/' })
}
this.schemas = (this.$route.query.schemas || '').split(',').filter(Boolean)
if (this.schemas.length === 0) {
this.schemas = ['scimag', 'scitech']
}
this.page = this.$route.query.page
this.schema = this.$route.query.schema || this.defaultSchema
if (!process.server) {
this.$nuxt.$loading.start()
}
const response = await this.$meta_api.search(this.schema, this.query, this.page - 1, 5)
if (response.hasNext) {
this.totalRows = Number(this.page) + 1
} else {
this.totalRows = this.page
}
this.scoredDocuments = response.scoredDocumentsList
if (!process.server) {
this.$nuxt.$loading.finish()
}
await this.retrieveDocuments()
},
fetchOnServer: false,
computed: {
@ -71,29 +62,41 @@ export default {
},
watch: {
'$route.query': '$fetch',
schema () {
async schemas () {
if (this.query) {
this.submit(this.query, this.page, this.schema)
await this.submit(this.query, 1, this.schemas)
}
},
page () {
this.submit(this.query, this.page, this.schema)
async page () {
await this.submit(this.query, this.page, this.schemas)
}
},
methods: {
submit (query, page, schema) {
this.$router.push({ path: '/', query: { query: query, page: page, schema: schema } })
async submit (query, page, schemas) {
await this.$router.push({ path: '/', query: { query: query, page: page, schemas: schemas.join(',') } })
},
async retrieveDocuments () {
const response = await this.$meta_api.search(this.schemas, this.query, this.page - 1, 5)
if (response.hasNext) {
this.totalRows = Number(this.page) + 1
} else {
this.totalRows = this.page
}
if (response.documents.length === 0) {
this.nothingFound = true
}
this.documents = response.documents
}
}
}
</script>
<style scoped>
.search_list {
.search-list {
padding-top: 15px;
padding-bottom: 15px;
}
.radio-group {
.checkbox-group {
margin: 10px 0;
}
</style>
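
The query handling in `fetch` and the pagination update in `retrieveDocuments` can be sketched as two pure functions. This is an illustration under stated assumptions, not the page's code: `parseSchemas`, `nextTotalRows` and `DEFAULT_SCHEMAS` are hypothetical names, and the `schemas` query value is assumed to be a single comma-separated string.

```javascript
const DEFAULT_SCHEMAS = ['scimag', 'scitech']

// Hypothetical helper: turns the `schemas` query value into a list of schema names.
function parseSchemas (raw, defaults = DEFAULT_SCHEMAS) {
  // ''.split(',') yields [''], so empty entries are filtered out explicitly.
  const schemas = (raw || '').split(',').filter(Boolean)
  return schemas.length > 0 ? schemas : [...defaults]
}

// Hypothetical helper: with one page per pagination row, expose one extra row
// only while the backend reports that a next page exists.
function nextTotalRows (page, hasNext) {
  // The page comes from the query string, so it is converted to a number first.
  const current = Number(page) || 1
  return hasNext ? current + 1 : current
}
```

Because the total number of results is unknown, the pager only ever learns about one page beyond the current one, which is why `totalRows` grows by one step at a time.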

View File

@ -5,7 +5,6 @@ export function getIssuedDate (unixtime) {
try {
return dateFormat(new Date(unixtime * 1000), 'yyyy')
} catch (e) {
console.error(e)
return null
}
}

View File

@ -1,6 +1,41 @@
import { ScimagView, ScitechView } from 'nexus-views-js'
import MetaApi from 'nexus-meta-api-js-client'
export default ({ $config }, inject) => {
const metaApi = new MetaApi($config.meta_api)
inject('meta_api', metaApi)
function getSchema (typedDocument) {
return Object.keys(typedDocument).filter(k => typedDocument[k] !== undefined)[0]
}
function schemaToView (schema, pb) {
if (schema === 'scimag') {
return new ScimagView(pb)
} else if (schema === 'scitech') {
return new ScitechView(pb)
}
}
class MetaApiWrapper {
constructor (metaApiConfig) {
this.metaApi = new MetaApi(metaApiConfig.url || ('http://' + window.location.host), metaApiConfig.hostname)
}
async get (schema, id) {
const response = await this.metaApi.get(schema, id)
return schemaToView(schema, response[schema])
}
async search (schemas, query, page, pageSize) {
const response = await this.metaApi.search(schemas, query, page, pageSize)
const documents = response.scoredDocumentsList.map((scoredDocument) => {
const schema = getSchema(scoredDocument.typedDocument)
return schemaToView(schema, scoredDocument.typedDocument[schema])
})
return {
hasNext: response.hasNext,
documents: documents
}
}
}
export default ({ $config }, inject) => {
const metaApiWrapper = new MetaApiWrapper($config.meta_api)
inject('meta_api', metaApiWrapper)
}
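
The wrapper above detects each document's schema from the first populated field of `typedDocument` and wraps the payload in the matching view class. The sketch below illustrates that mapping with stand-in classes; the real `ScimagView` and `ScitechView` come from `nexus-views-js` and their shape is assumed here, `toViews` is a hypothetical name, and skipping unknown schemas is a design choice of the sketch (the original `schemaToView` returns `undefined` in that case).

```javascript
// Stand-ins for the real view classes from 'nexus-views-js' (assumed shape).
class ScimagView {
  constructor (pb) { this.schema = 'scimag'; Object.assign(this, pb) }
}
class ScitechView {
  constructor (pb) { this.schema = 'scitech'; Object.assign(this, pb) }
}

const VIEW_BY_SCHEMA = new Map([
  ['scimag', ScimagView],
  ['scitech', ScitechView]
])

// Returns the name of the first populated field of a typed document.
function getSchema (typedDocument) {
  return Object.keys(typedDocument).find(k => typedDocument[k] !== undefined)
}

// Wraps each scored document in its view class and skips unknown schemas.
function toViews (scoredDocumentsList) {
  return scoredDocumentsList.flatMap(({ typedDocument }) => {
    const schema = getSchema(typedDocument)
    const View = VIEW_BY_SCHEMA.get(schema)
    return View ? [new View(typedDocument[schema])] : []
  })
}
```

A lookup table keeps the schema-to-view mapping in one place, so supporting another schema would only need one more entry.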

View File

@ -4,27 +4,3 @@ const MULTIWHITESPACE_REGEX = /\s+/g
export function castStringToSingleString (s) {
return s.replace(ALNUMWHITESPACE_REGEX, ' ').replace(MULTIWHITESPACE_REGEX, '-')
}
export function escapeFormat (text) {
return text.replace(/_+/g, '_')
.replace(/\*+/g, '*')
.replace(/`+/g, "'")
.replace(/\[+/g, '`[`')
.replace(/]+/g, '`]`')
}
export function quoteUrl (url, safe) {
if (typeof (safe) !== 'string') {
safe = '/'
}
url = encodeURIComponent(url)
const toUnencode = []
for (let i = safe.length - 1; i >= 0; --i) {
const encoded = encodeURIComponent(safe[i])
if (encoded !== safe.charAt(i)) {
toUnencode.push(encoded)
}
}
url = url.replace(new RegExp(toUnencode.join('|'), 'ig'), decodeURIComponent)
return url
}

View File

@ -63,7 +63,7 @@ container_push(
format = "Docker",
image = ":image",
registry = "registry.hub.docker.com",
repository = "thesuperpirate/meta-api",
repository = "thesuperpirate/nexus-meta-api",
tag = "latest",
)
@ -72,6 +72,6 @@ container_push(
format = "Docker",
image = ":image",
registry = "registry.hub.docker.com",
repository = "thesuperpirate/meta-api",
repository = "thesuperpirate/nexus-meta-api",
tag = "testing",
)

View File

@ -2,9 +2,13 @@ import documentsProto from 'meta-api-grpc-web-js/meta-api-grpc-web-js_pb/nexus/m
import searchProto from 'meta-api-grpc-web-js/meta-api-grpc-web-js_pb/nexus/meta_api/proto/search_service_grpc_web_pb'
export default class MetaApi {
constructor (config) {
this.documentsClient = new documentsProto.DocumentsPromiseClient(config.url)
this.searchClient = new searchProto.SearchPromiseClient(config.url)
constructor (url, hostname) {
this.metadata = {}
if (hostname) {
this.metadata['X-Forwarded-Host'] = hostname
}
this.documentsClient = new documentsProto.DocumentsPromiseClient(url)
this.searchClient = new searchProto.SearchPromiseClient(url)
}
generateId (length) {
@ -17,23 +21,27 @@ export default class MetaApi {
return result.join('')
}
async getView (schema, documentId) {
prepareMetadata () {
return Object.assign({ 'request-id': this.generateId(12) }, this.metadata)
}
async get (schema, documentId) {
const request = new documentsProto.TypedDocumentRequest()
request.setSchema(schema)
request.setDocumentId(documentId)
request.setSessionId(this.generateId(8))
const response = await this.documentsClient.get_view(request, { 'request-id': this.generateId(12) })
const response = await this.documentsClient.get(request, this.prepareMetadata())
return response.toObject()
}
async search (schema, query, page, pageSize = 5) {
async search (schemas, query, page, pageSize = 5) {
const request = new searchProto.SearchRequest()
request.setPage(page)
request.setPageSize(pageSize)
request.addSchemas(schema)
schemas.forEach((schema) => request.addSchemas(schema))
request.setQuery(query)
request.setSessionId(this.generateId(8))
const response = await this.searchClient.search(request, { 'request-id': this.generateId(12) })
const response = await this.searchClient.search(request, this.prepareMetadata())
return response.toObject()
}
}
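
The client above now sends the same fixed metadata with every call and adds a fresh `request-id` per request. A minimal sketch of that merge follows; the body of `generateId` is not shown in this diff, so the version here is a hypothetical stand-in, and `buildBaseMetadata` is an illustrative name for what the constructor does inline.

```javascript
// Hypothetical stand-in for the client's generateId; the real body is not shown.
function generateId (length) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789'
  const result = []
  for (let i = 0; i < length; i++) {
    result.push(alphabet.charAt(Math.floor(Math.random() * alphabet.length)))
  }
  return result.join('')
}

// Builds the fixed metadata once, as the constructor does.
function buildBaseMetadata (hostname) {
  const metadata = {}
  if (hostname) metadata['X-Forwarded-Host'] = hostname
  return metadata
}

// Merges a fresh request id with the fixed metadata for every call.
// Object.assign copies into a new object, so the base metadata is never mutated.
function prepareMetadata (baseMetadata) {
  return Object.assign({ 'request-id': generateId(12) }, baseMetadata)
}
```

Since the base metadata is assigned last, any key it shares with the per-request object would win; with only `request-id` generated per call, that ordering has no visible effect here.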

View File

@ -1,5 +1,4 @@
load("@com_github_grpc_grpc//bazel:python_rules.bzl", "py_grpc_library", "py_proto_library")
load("@rules_rust//proto:proto.bzl", "rust_proto_library")
load("@rules_proto//proto:defs.bzl", "proto_library")
load("@rules_proto_grpc//js:defs.bzl", "js_grpc_web_library")
@ -27,13 +26,6 @@ py_grpc_library(
deps = [":meta-api-proto-py"],
)
rust_proto_library(
name = "meta-api-proto-rust",
rust_deps = ["//rules/rust/cargo:protobuf"],
visibility = ["//visibility:public"],
deps = [":meta-api-proto"],
)
js_grpc_web_library(
name = "meta-api-grpc-web-js",
protos = [

View File

@ -33,20 +33,8 @@ message TypedDocumentRequest {
int64 user_id = 5;
}
message PutTypedDocumentResponse {}
message GetViewResponse {
nexus.models.proto.TypedDocument typed_document = 1;
string filedata = 2;
string filename = 3;
string filesize = 4;
string first_authors = 5;
string locator = 6;
}
service Documents {
rpc get (TypedDocumentRequest) returns (nexus.models.proto.TypedDocument) {}
rpc get_view (TypedDocumentRequest) returns (GetViewResponse) {}
rpc roll (RollRequest) returns (RollResponse) {}
rpc top_missed (TopMissedRequest) returns (TopMissedResponse) {}
}

View File

@ -21,7 +21,7 @@ DESCRIPTOR = _descriptor.FileDescriptor(
syntax='proto3',
serialized_options=None,
create_key=_descriptor._internal_create_key,
serialized_pb=b'\n,nexus/meta_api/proto/documents_service.proto\x12\x14nexus.meta_api.proto\x1a\'nexus/models/proto/typed_document.proto\"D\n\x0bRollRequest\x12\x10\n\x08language\x18\x01 \x01(\t\x12\x12\n\nsession_id\x18\x02 \x01(\t\x12\x0f\n\x07user_id\x18\x03 \x01(\x03\"#\n\x0cRollResponse\x12\x13\n\x0b\x64ocument_id\x18\x01 \x01(\x04\"X\n\x10TopMissedRequest\x12\x0c\n\x04page\x18\x01 \x01(\r\x12\x11\n\tpage_size\x18\x02 \x01(\r\x12\x12\n\nsession_id\x18\x03 \x01(\t\x12\x0f\n\x07user_id\x18\x04 \x01(\x03\"a\n\x11TopMissedResponse\x12:\n\x0ftyped_documents\x18\x01 \x03(\x0b\x32!.nexus.models.proto.TypedDocument\x12\x10\n\x08has_next\x18\x02 \x01(\x08\"r\n\x14TypedDocumentRequest\x12\x0e\n\x06schema\x18\x01 \x01(\t\x12\x13\n\x0b\x64ocument_id\x18\x02 \x01(\x04\x12\x10\n\x08position\x18\x03 \x01(\r\x12\x12\n\nsession_id\x18\x04 \x01(\t\x12\x0f\n\x07user_id\x18\x05 \x01(\x03\"\x1a\n\x18PutTypedDocumentResponse\"u\n\x0fGetViewResponse\x12\x39\n\x0etyped_document\x18\x01 \x01(\x0b\x32!.nexus.models.proto.TypedDocument\x12\x10\n\x08\x66ilename\x18\x02 \x01(\t\x12\x15\n\rfirst_authors\x18\x03 \x01(\t2\xf6\x02\n\tDocuments\x12V\n\x03get\x12*.nexus.meta_api.proto.TypedDocumentRequest\x1a!.nexus.models.proto.TypedDocument\"\x00\x12_\n\x08get_view\x12*.nexus.meta_api.proto.TypedDocumentRequest\x1a%.nexus.meta_api.proto.GetViewResponse\"\x00\x12O\n\x04roll\x12!.nexus.meta_api.proto.RollRequest\x1a\".nexus.meta_api.proto.RollResponse\"\x00\x12_\n\ntop_missed\x12&.nexus.meta_api.proto.TopMissedRequest\x1a\'.nexus.meta_api.proto.TopMissedResponse\"\x00\x62\x06proto3'
serialized_pb=b'\n,nexus/meta_api/proto/documents_service.proto\x12\x14nexus.meta_api.proto\x1a\'nexus/models/proto/typed_document.proto\"D\n\x0bRollRequest\x12\x10\n\x08language\x18\x01 \x01(\t\x12\x12\n\nsession_id\x18\x02 \x01(\t\x12\x0f\n\x07user_id\x18\x03 \x01(\x03\"#\n\x0cRollResponse\x12\x13\n\x0b\x64ocument_id\x18\x01 \x01(\x04\"X\n\x10TopMissedRequest\x12\x0c\n\x04page\x18\x01 \x01(\r\x12\x11\n\tpage_size\x18\x02 \x01(\r\x12\x12\n\nsession_id\x18\x03 \x01(\t\x12\x0f\n\x07user_id\x18\x04 \x01(\x03\"a\n\x11TopMissedResponse\x12:\n\x0ftyped_documents\x18\x01 \x03(\x0b\x32!.nexus.models.proto.TypedDocument\x12\x10\n\x08has_next\x18\x02 \x01(\x08\"r\n\x14TypedDocumentRequest\x12\x0e\n\x06schema\x18\x01 \x01(\t\x12\x13\n\x0b\x64ocument_id\x18\x02 \x01(\x04\x12\x10\n\x08position\x18\x03 \x01(\r\x12\x12\n\nsession_id\x18\x04 \x01(\t\x12\x0f\n\x07user_id\x18\x05 \x01(\x03\x32\x95\x02\n\tDocuments\x12V\n\x03get\x12*.nexus.meta_api.proto.TypedDocumentRequest\x1a!.nexus.models.proto.TypedDocument\"\x00\x12O\n\x04roll\x12!.nexus.meta_api.proto.RollRequest\x1a\".nexus.meta_api.proto.RollResponse\"\x00\x12_\n\ntop_missed\x12&.nexus.meta_api.proto.TopMissedRequest\x1a\'.nexus.meta_api.proto.TopMissedResponse\"\x00\x62\x06proto3'
,
dependencies=[nexus_dot_models_dot_proto_dot_typed__document__pb2.DESCRIPTOR,])
@ -257,86 +257,12 @@ _TYPEDDOCUMENTREQUEST = _descriptor.Descriptor(
serialized_end=521,
)
_PUTTYPEDDOCUMENTRESPONSE = _descriptor.Descriptor(
name='PutTypedDocumentResponse',
full_name='nexus.meta_api.proto.PutTypedDocumentResponse',
filename=None,
file=DESCRIPTOR,
containing_type=None,
create_key=_descriptor._internal_create_key,
fields=[
],
extensions=[
],
nested_types=[],
enum_types=[
],
serialized_options=None,
is_extendable=False,
syntax='proto3',
extension_ranges=[],
oneofs=[
],
serialized_start=523,
serialized_end=549,
)
_GETVIEWRESPONSE = _descriptor.Descriptor(
name='GetViewResponse',
full_name='nexus.meta_api.proto.GetViewResponse',
filename=None,
file=DESCRIPTOR,
containing_type=None,
create_key=_descriptor._internal_create_key,
fields=[
_descriptor.FieldDescriptor(
name='typed_document', full_name='nexus.meta_api.proto.GetViewResponse.typed_document', index=0,
number=1, type=11, cpp_type=10, label=1,
has_default_value=False, default_value=None,
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key),
_descriptor.FieldDescriptor(
name='filename', full_name='nexus.meta_api.proto.GetViewResponse.filename', index=1,
number=2, type=9, cpp_type=9, label=1,
has_default_value=False, default_value=b"".decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key),
_descriptor.FieldDescriptor(
name='first_authors', full_name='nexus.meta_api.proto.GetViewResponse.first_authors', index=2,
number=3, type=9, cpp_type=9, label=1,
has_default_value=False, default_value=b"".decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key),
],
extensions=[
],
nested_types=[],
enum_types=[
],
serialized_options=None,
is_extendable=False,
syntax='proto3',
extension_ranges=[],
oneofs=[
],
serialized_start=551,
serialized_end=668,
)
_TOPMISSEDRESPONSE.fields_by_name['typed_documents'].message_type = nexus_dot_models_dot_proto_dot_typed__document__pb2._TYPEDDOCUMENT
_GETVIEWRESPONSE.fields_by_name['typed_document'].message_type = nexus_dot_models_dot_proto_dot_typed__document__pb2._TYPEDDOCUMENT
DESCRIPTOR.message_types_by_name['RollRequest'] = _ROLLREQUEST
DESCRIPTOR.message_types_by_name['RollResponse'] = _ROLLRESPONSE
DESCRIPTOR.message_types_by_name['TopMissedRequest'] = _TOPMISSEDREQUEST
DESCRIPTOR.message_types_by_name['TopMissedResponse'] = _TOPMISSEDRESPONSE
DESCRIPTOR.message_types_by_name['TypedDocumentRequest'] = _TYPEDDOCUMENTREQUEST
DESCRIPTOR.message_types_by_name['PutTypedDocumentResponse'] = _PUTTYPEDDOCUMENTRESPONSE
DESCRIPTOR.message_types_by_name['GetViewResponse'] = _GETVIEWRESPONSE
_sym_db.RegisterFileDescriptor(DESCRIPTOR)
RollRequest = _reflection.GeneratedProtocolMessageType('RollRequest', (_message.Message,), {
@ -374,20 +300,6 @@ TypedDocumentRequest = _reflection.GeneratedProtocolMessageType('TypedDocumentRe
})
_sym_db.RegisterMessage(TypedDocumentRequest)
PutTypedDocumentResponse = _reflection.GeneratedProtocolMessageType('PutTypedDocumentResponse', (_message.Message,), {
'DESCRIPTOR' : _PUTTYPEDDOCUMENTRESPONSE,
'__module__' : 'nexus.meta_api.proto.documents_service_pb2'
# @@protoc_insertion_point(class_scope:nexus.meta_api.proto.PutTypedDocumentResponse)
})
_sym_db.RegisterMessage(PutTypedDocumentResponse)
GetViewResponse = _reflection.GeneratedProtocolMessageType('GetViewResponse', (_message.Message,), {
'DESCRIPTOR' : _GETVIEWRESPONSE,
'__module__' : 'nexus.meta_api.proto.documents_service_pb2'
# @@protoc_insertion_point(class_scope:nexus.meta_api.proto.GetViewResponse)
})
_sym_db.RegisterMessage(GetViewResponse)
_DOCUMENTS = _descriptor.ServiceDescriptor(
@ -397,8 +309,8 @@ _DOCUMENTS = _descriptor.ServiceDescriptor(
index=0,
serialized_options=None,
create_key=_descriptor._internal_create_key,
serialized_start=671,
serialized_end=1045,
serialized_start=524,
serialized_end=801,
methods=[
_descriptor.MethodDescriptor(
name='get',
@ -410,20 +322,10 @@ _DOCUMENTS = _descriptor.ServiceDescriptor(
serialized_options=None,
create_key=_descriptor._internal_create_key,
),
_descriptor.MethodDescriptor(
name='get_view',
full_name='nexus.meta_api.proto.Documents.get_view',
index=1,
containing_service=None,
input_type=_TYPEDDOCUMENTREQUEST,
output_type=_GETVIEWRESPONSE,
serialized_options=None,
create_key=_descriptor._internal_create_key,
),
_descriptor.MethodDescriptor(
name='roll',
full_name='nexus.meta_api.proto.Documents.roll',
index=2,
index=1,
containing_service=None,
input_type=_ROLLREQUEST,
output_type=_ROLLRESPONSE,
@ -433,7 +335,7 @@ _DOCUMENTS = _descriptor.ServiceDescriptor(
_descriptor.MethodDescriptor(
name='top_missed',
full_name='nexus.meta_api.proto.Documents.top_missed',
index=3,
index=2,
containing_service=None,
input_type=_TOPMISSEDREQUEST,
output_type=_TOPMISSEDRESPONSE,

View File

@ -22,11 +22,6 @@ class DocumentsStub(object):
request_serializer=nexus_dot_meta__api_dot_proto_dot_documents__service__pb2.TypedDocumentRequest.SerializeToString,
response_deserializer=nexus_dot_models_dot_proto_dot_typed__document__pb2.TypedDocument.FromString,
)
self.get_view = channel.unary_unary(
'/nexus.meta_api.proto.Documents/get_view',
request_serializer=nexus_dot_meta__api_dot_proto_dot_documents__service__pb2.TypedDocumentRequest.SerializeToString,
response_deserializer=nexus_dot_meta__api_dot_proto_dot_documents__service__pb2.GetViewResponse.FromString,
)
self.roll = channel.unary_unary(
'/nexus.meta_api.proto.Documents/roll',
request_serializer=nexus_dot_meta__api_dot_proto_dot_documents__service__pb2.RollRequest.SerializeToString,
@ -48,12 +43,6 @@ class DocumentsServicer(object):
context.set_details('Method not implemented!')
raise NotImplementedError('Method not implemented!')
def get_view(self, request, context):
"""Missing associated documentation comment in .proto file."""
context.set_code(grpc.StatusCode.UNIMPLEMENTED)
context.set_details('Method not implemented!')
raise NotImplementedError('Method not implemented!')
def roll(self, request, context):
"""Missing associated documentation comment in .proto file."""
context.set_code(grpc.StatusCode.UNIMPLEMENTED)
@ -74,11 +63,6 @@ def add_DocumentsServicer_to_server(servicer, server):
request_deserializer=nexus_dot_meta__api_dot_proto_dot_documents__service__pb2.TypedDocumentRequest.FromString,
response_serializer=nexus_dot_models_dot_proto_dot_typed__document__pb2.TypedDocument.SerializeToString,
),
'get_view': grpc.unary_unary_rpc_method_handler(
servicer.get_view,
request_deserializer=nexus_dot_meta__api_dot_proto_dot_documents__service__pb2.TypedDocumentRequest.FromString,
response_serializer=nexus_dot_meta__api_dot_proto_dot_documents__service__pb2.GetViewResponse.SerializeToString,
),
'roll': grpc.unary_unary_rpc_method_handler(
servicer.roll,
request_deserializer=nexus_dot_meta__api_dot_proto_dot_documents__service__pb2.RollRequest.FromString,
@ -116,23 +100,6 @@ class Documents(object):
options, channel_credentials,
insecure, call_credentials, compression, wait_for_ready, timeout, metadata)
@staticmethod
def get_view(request,
target,
options=(),
channel_credentials=None,
call_credentials=None,
insecure=False,
compression=None,
wait_for_ready=None,
timeout=None,
metadata=None):
return grpc.experimental.unary_unary(request, target, '/nexus.meta_api.proto.Documents/get_view',
nexus_dot_meta__api_dot_proto_dot_documents__service__pb2.TypedDocumentRequest.SerializeToString,
nexus_dot_meta__api_dot_proto_dot_documents__service__pb2.GetViewResponse.FromString,
options, channel_credentials,
insecure, call_credentials, compression, wait_for_ready, timeout, metadata)
@staticmethod
def roll(request,
target,

View File

@ -3,8 +3,6 @@ import time
from grpc import StatusCode
from library.aiogrpctools.base import aiogrpc_request_wrapper
from nexus.meta_api.proto.documents_service_pb2 import \
GetViewResponse as GetViewResponsePb
from nexus.meta_api.proto.documents_service_pb2 import \
RollResponse as RollResponsePb
from nexus.meta_api.proto.documents_service_pb2 import \
@ -16,7 +14,6 @@ from nexus.meta_api.proto.documents_service_pb2_grpc import (
from nexus.models.proto.scimag_pb2 import Scimag as ScimagPb
from nexus.models.proto.typed_document_pb2 import \
TypedDocument as TypedDocumentPb
from nexus.views.telegram import parse_typed_document_to_view
from nexus.views.telegram.registry import pb_registry
from .base import BaseService
@ -52,7 +49,8 @@ class DocumentsService(DocumentsServicer, BaseService):
async def start(self):
add_DocumentsServicer_to_server(self, self.server)
async def _get_typed_document(self, request, context, metadata):
@aiogrpc_request_wrapper()
async def get(self, request, context, metadata) -> TypedDocumentPb:
document = await self.get_document(request.schema, request.document_id, metadata['request-id'], context)
if document.get('original_id'):
original_document = await self.get_document(
@ -101,24 +99,6 @@ class DocumentsService(DocumentsServicer, BaseService):
**{request.schema: document_pb},
)
@aiogrpc_request_wrapper()
async def get(self, request, context, metadata) -> TypedDocumentPb:
return await self._get_typed_document(request, context, metadata)
@aiogrpc_request_wrapper()
async def get_view(self, request, context, metadata) -> GetViewResponsePb:
typed_document = await self._get_typed_document(request, context, metadata)
view = parse_typed_document_to_view(typed_document)
return GetViewResponsePb(
typed_document=typed_document,
filedata=view.get_formatted_filedata(show_filesize=True),
filename=view.get_filename(),
filesize=view.get_formatted_filesize(),
first_authors=view.get_first_authors(),
locator=view.get_formatted_locator(),
)
@aiogrpc_request_wrapper()
async def roll(self, request, context, metadata):
random_id = await self.data_provider.random_id(request.language)

View File

@ -103,10 +103,11 @@ class Searcher(BaseService):
processor_response = None
cache_hit = True
page_size = request.page_size or 5
schemas = tuple(sorted(request.schemas))
if (
(request.user_id, request.language, request.query) not in self.query_cache
or len(self.query_cache[(request.user_id, request.language, request.query)].scored_documents) == 0
(request.user_id, request.language, schemas, request.query) not in self.query_cache
or len(self.query_cache[(request.user_id, request.language, schemas, request.query)].scored_documents) == 0
):
cache_hit = False
query = despace_full(request.query)
@ -121,7 +122,7 @@ class Searcher(BaseService):
):
with attempt:
requests = []
for schema in request.schemas:
for schema in schemas:
requests.append(
self.summa_client.search(
schema=schema,
@ -149,7 +150,7 @@ class Searcher(BaseService):
)
search_response['scored_documents'] = rescored_documents
search_response_pb = self.cast_search_response(search_response)
self.query_cache[(request.user_id, request.language, request.query)] = search_response_pb
self.query_cache[(request.user_id, request.language, schemas, request.query)] = search_response_pb
logging.getLogger('query').info({
'action': 'request',
@ -162,12 +163,12 @@ class Searcher(BaseService):
'query': request.query,
'query_class': processor_response['class'].value if processor_response else None,
'request_id': metadata['request-id'],
'schemas': [schema for schema in request.schemas],
'schemas': schemas,
'session_id': request.session_id,
'user_id': request.user_id,
})
scored_documents = self.query_cache[(request.user_id, request.language, request.query)].scored_documents
scored_documents = self.query_cache[(request.user_id, request.language, schemas, request.query)].scored_documents
left_offset = request.page * page_size
right_offset = left_offset + page_size
has_next = len(scored_documents) > right_offset
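
The hunk above adds the requested schemas to the `query_cache` key, so the same user, language and query no longer share one cached response across different schema sets. A minimal Python sketch of that idea follows. `make_cache_key` and `paginate` are hypothetical helpers introduced only for illustration (they are not names from the repository), a plain dict stands in for `self.query_cache`, and the final slice in `paginate` is an assumption about how the offsets are used.

```python
from typing import Dict, Iterable, List, Tuple

CacheKey = Tuple[int, str, Tuple[str, ...], str]


def make_cache_key(user_id: int, language: str, schemas: Iterable[str], query: str) -> CacheKey:
    # Sorting makes the key independent of the order in which schemas were requested;
    # converting to a tuple makes it hashable, so it can be used as a dict key.
    return (user_id, language, tuple(sorted(schemas)), query)


def paginate(scored_documents: List[str], page: int, page_size: int = 5) -> Tuple[List[str], bool]:
    # Same offset arithmetic as in the diff; the slice itself is an assumption.
    left_offset = page * page_size
    right_offset = left_offset + page_size
    has_next = len(scored_documents) > right_offset
    return scored_documents[left_offset:right_offset], has_next


# A plain dict stands in for the service's query cache (hypothetical data).
query_cache: Dict[CacheKey, List[str]] = {}
both = make_cache_key(1, 'en', ['scitech', 'scimag'], 'entropy')
query_cache[both] = [f'doc-{i}' for i in range(7)]

# The same schemas in another order hit the same entry.
same = make_cache_key(1, 'en', ['scimag', 'scitech'], 'entropy')
# A narrower schema set gets its own entry instead of reusing mixed results.
only_scimag = make_cache_key(1, 'en', ['scimag'], 'entropy')

first_page, has_next = paginate(query_cache[same], page=0)
print(same == both, only_scimag in query_cache, first_page, has_next)
```

Sorting before building the tuple is what keeps two requests with the same schemas in a different order from producing two cache entries.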

View File

@ -1,5 +1,4 @@
load("@com_github_grpc_grpc//bazel:python_rules.bzl", "py_proto_library")
load("@rules_rust//proto:proto.bzl", "rust_proto_library")
load("@rules_proto//proto:defs.bzl", "proto_library")
load("@rules_proto_grpc//js:defs.bzl", "js_proto_library")
@ -17,13 +16,6 @@ py_proto_library(
deps = [":models_proto"],
)
rust_proto_library(
name = "models_proto_rust",
rust_deps = ["//rules/rust/cargo:protobuf"],
visibility = ["//visibility:public"],
deps = [":models_proto"],
)
js_proto_library(
name = "models_proto_js",
protos = [":models_proto"],

View File

@ -35,7 +35,7 @@ from tenacity import (
stop_after_attempt,
)
DEFAULT_USER_AGENT = 'PylonBot/1.0 (Linux x86_64) PylonBot/1.0.0'
DEFAULT_USER_AGENT = 'curl/7.68.0'
class KeepAliveClientRequest(ClientRequest):

View File

@ -0,0 +1,11 @@
load("@build_bazel_rules_nodejs//:index.bzl", "js_library")
js_library(
name = "js",
package_name = "nexus-views-js",
srcs = glob(["*.js"]),
visibility = ["//visibility:public"],
deps = [
"@npm//dateformat",
],
)

132
nexus/views/js/base.js Normal file

@ -0,0 +1,132 @@
import { castStringToSingleString, quoteUrl } from './utils'
import { getIssuedDate } from './helpers'
export class BaseView {
constructor (dataPb) {
Object.assign(this, dataPb)
}
getFilename () {
const processedAuthor = castStringToSingleString((this.getFirstAuthors()).toLowerCase())
const processedTitle = castStringToSingleString(this.getRobustTitle()).toLowerCase()
const parts = []
if (processedAuthor) {
parts.push(processedAuthor)
}
if (processedTitle) {
parts.push(processedTitle)
}
let filename = parts.join('-')
if (!filename) {
if (this.doi) {
filename = quoteUrl(this.doi, '')
} else {
filename = this.md5
}
}
const year = getIssuedDate(this.issuedAt)
if (year) {
filename = `${filename}-${year}`
}
filename = filename.replace(/-+/g, '-')
return `${filename}.${this.getExtension()}`
}
getExtension () {
if (this.extension) {
return this.extension
} else {
return 'pdf'
}
}
getFirstAuthors (etAl = true, firstNAuthors = 1) {
let etAlSuffix = ''
if (etAl) {
etAlSuffix = ' et al'
}
if (this.authorsList) {
if (this.authorsList.length > firstNAuthors) {
return this.authorsList.slice(0, firstNAuthors).join('; ') + etAlSuffix
} else if (this.authorsList.length === 1) {
if (this.authorsList[0].split(';').length - 1 >= 1) {
const commaAuthors = this.authorsList[0].split(';').map(function (el) {
return el.trim()
})
if (commaAuthors.length > firstNAuthors) {
return (commaAuthors.slice(0, firstNAuthors)).join('; ') + etAlSuffix
} else {
return commaAuthors.join('; ')
}
}
return this.authorsList[0]
} else {
return this.authorsList.join('; ')
}
} else {
return ''
}
}
getFormattedDatetime () {
if (this.issuedAt) {
const date = new Date(this.issuedAt * 1000)
const today = new Date()
const diffTime = Math.abs(date - today)
const diffDays = Math.ceil(diffTime / (1000 * 60 * 60 * 24))
if (diffDays < 365) {
// getUTCMonth() is zero-based, so add 1 to get the calendar month
return `${date.getUTCFullYear()}.${date.getUTCMonth() + 1}`
} else {
return date.getUTCFullYear()
}
}
}
getFormattedFiledata () {
const parts = []
if (this.language) {
parts.push(this.language.toUpperCase())
}
parts.push(this.getExtension().toUpperCase())
if (this.filesize) {
parts.push(this.getFormattedFilesize())
}
return parts.join(' | ')
}
getFormattedFilesize () {
if (this.filesize) {
return (Math.max(1024, this.filesize) / (1024 * 1024)).toFixed(2) + 'Mb'
}
return ''
}
getIpfsMultihash () {
if (this.ipfsMultihashesList) {
return this.ipfsMultihashesList[0]
}
return ''
}
getTelegramLink () {
return `https://t.me/libgen_scihub_bot?start=${Buffer.from('NID: ' + this.id.toString()).toString('base64')}`
}
getRobustTitle () {
let result = this.title || ''
if (this.volume) {
if (this.title) {
result += ` ${this.volume}`
} else {
result += this.volume
}
}
return result
}
}

42
nexus/views/js/helpers.js Normal file

@ -0,0 +1,42 @@
import dateFormat from 'dateformat'
export function getMegabytes (bytes) {
try {
if (bytes) {
return (bytes / (1024 * 1024)).toFixed(2) + ' Mb'
}
} catch {
return null
}
}
export function getIssuedDate (unixtime) {
if (!unixtime) return null
try {
return dateFormat(new Date(unixtime * 1000), 'yyyy')
} catch (e) {
console.error(e)
return null
}
}
export function getCoverUrl (cu, fictionId, libgenId, cuSuf, md5) {
if (cu) return cu
let r = ''
if (libgenId || fictionId) {
if (libgenId) {
const bulkId = (libgenId - (libgenId % 1000))
r = `covers/${bulkId}/${md5}`
} else if (fictionId) {
const bulkId = (fictionId - (fictionId % 1000))
r = `fictioncovers/${bulkId}/${md5}`
} else {
return null
}
}
if (cuSuf) {
r = r + `-${cuSuf}`
return `http://gen.lib.rus.ec/${r}.jpg`
}
return null
}

2
nexus/views/js/index.js Normal file

@ -0,0 +1,2 @@
export { ScimagView } from './scimag'
export { ScitechView } from './scitech'

76
nexus/views/js/scimag.js Normal file

@ -0,0 +1,76 @@
import { BaseView } from './base'
export class ScimagView extends BaseView {
schema = 'scimag'
icon = '🔬'
getFormattedLocator () {
const parts = []
if (this.authorsList) {
parts.push(this.getFirstAuthors(true, 3))
}
const journal = this.getRobustJournal()
if (journal) {
parts.push('in', journal)
}
const dt = this.getFormattedDatetime()
if (dt) {
parts.push(`(${dt})`)
}
if (this.getRobustVolume()) {
parts.push(this.getRobustVolume())
}
if (this.getPages()) {
parts.push(this.getPages())
}
return parts.join(' ')
}
getPages () {
if (this.firstPage) {
if (this.lastPage) {
if (this.firstPage === this.lastPage) {
return `p. ${this.firstPage}`
} else {
return `pp. ${this.firstPage}-${this.lastPage}`
}
} else {
return `p. ${this.firstPage}`
}
} else if (this.lastPage) {
return `p. ${this.lastPage}`
}
}
getRobustJournal () {
if (this.type !== 'chapter' && this.type !== 'book-chapter') {
return this.containerTitle
}
}
getRobustTitle () {
let result = this.title || this.doi
if (this.volume) {
if (this.type === 'chapter' || this.type === 'book-chapter') {
result += ` in ${this.containerTitle} ${this.volume}`
} else {
result = this.volume
}
}
return result
}
getRobustVolume () {
if (this.volume) {
if (this.issue) {
return `vol. ${this.volume}(${this.issue})`
} else {
if (this.volume === parseInt(this.volume, 10)) {
return `vol. ${this.volume}`
} else {
return this.volume
}
}
}
}
}

21
nexus/views/js/scitech.js Normal file

@ -0,0 +1,21 @@
import { BaseView } from './base'
export class ScitechView extends BaseView {
schema = 'scitech'
icon = '📚'
getFormattedLocator () {
const parts = []
if (this.authorsList) {
parts.push(this.getFirstAuthors(true, 3))
}
if (this.issuedAt) {
const date = new Date(this.issuedAt * 1000)
parts.push(`(${date.getUTCFullYear()})`)
}
if (this.pages) {
parts.push(`pp. ${this.pages}`)
}
return parts.join(' ')
}
}

24
nexus/views/js/utils.js Normal file

@ -0,0 +1,24 @@
const ALNUMWHITESPACE_REGEX = /([^\s\w])/gu
const MULTIWHITESPACE_REGEX = /\s+/g
export function castStringToSingleString (s) {
let processed = s.replace(ALNUMWHITESPACE_REGEX, ' ')
processed = processed.replace(MULTIWHITESPACE_REGEX, '-')
return processed
}
export function quoteUrl (url, safe) {
if (typeof (safe) !== 'string') {
safe = '/'
}
url = encodeURIComponent(url)
const toUnencode = []
for (let i = safe.length - 1; i >= 0; --i) {
const encoded = encodeURIComponent(safe[i])
if (encoded !== safe.charAt(i)) {
toUnencode.push(encoded)
}
}
url = url.replace(new RegExp(toUnencode.join('|'), 'ig'), decodeURIComponent)
return url
}
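
For comparison, the two helpers in `utils.js` look like ports of common Python idioms: `quoteUrl` resembles `urllib.parse.quote`, and `castStringToSingleString` is a two-step regex slug. The Python sketch below is an assumption about the intended behavior, not code from the repository; `cast_string_to_single_string` and the sample inputs are hypothetical.

```python
import re
from urllib.parse import quote

# Mirrors ALNUMWHITESPACE_REGEX. re.ASCII limits \w to [A-Za-z0-9_], which is
# also what \w matches in the JavaScript pattern, even with the `u` flag.
NON_ALNUM_WHITESPACE = re.compile(r'[^\s\w]', re.ASCII)
# Mirrors MULTIWHITESPACE_REGEX.
MULTI_WHITESPACE = re.compile(r'\s+', re.ASCII)


def cast_string_to_single_string(s: str) -> str:
    # Step 1: every character that is neither whitespace nor [A-Za-z0-9_] becomes a space.
    processed = NON_ALNUM_WHITESPACE.sub(' ', s)
    # Step 2: every run of whitespace collapses into a single dash.
    return MULTI_WHITESPACE.sub('-', processed)


# Hypothetical inputs, chosen only to show the shape of the output.
author_slug = cast_string_to_single_string('Smith, J. et al')
doi_fallback = quote('10.1000/xyz123', safe='')
print(author_slug, doi_fallback)
```

Two consequences are worth keeping in mind when reading `getFilename`. Non-ASCII letters are not matched by `\w`, so a title written in a non-Latin script collapses to dashes. And `encodeURIComponent` leaves `!`, `'`, `(`, `)` and `*` unescaped, whereas Python's `quote` escapes them, so the two are not exact equivalents.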

View File

@ -0,0 +1,20 @@
# New Conditions
We have silently crossed the Rubicon. The Internet entered our lives and has become an integral and essential part of them.
It has multiplied our powers, and it has also multiplied the dangers we face.
## Technological Leviathan
Since the 2010s, tensions have been rising on the digital frontiers. The Internet, which was created to unite people across the world, is now breaking up into isolated islands. The rules of these fragments are dictated by those who hunger to manage and control for the sake of their own stability, and often not for the sake of those who hunger to learn and move humankind forward.
Here are just a few attacks on freedom worth mentioning:
- The Great Firewall of China, which cuts the entire country off from the rest of the world
- US corporations that take on the responsibility of judging what is good and evil, using the full power of their technologies and de facto applying US law extraterritorially
- Russia, which is moving rapidly along the Chinese path in her attempts to fence off Internet traffic. The ultimate goal is to spread lies and propaganda, both inside and outside the country, to keep people ignorant.
Many of the moves that governments make imply that we are considered not sane enough to live in the digital world.
## Continuous Education
The growing demand for educated people is tightly linked to the accessibility of the knowledge corpus. The world has changed and data flows
have sped up. We will no longer be able to rely heavily on classical forms of education such as fixed-term study at universities.

View File

@ -1,4 +1,4 @@
# Agenda v.0.1
# Roadmap v.0.1
This paper is composed of lifetime goals for Nexus STC (Standard Template Construct).
@ -6,7 +6,7 @@ Although many of goals looks complex and faraway I strongly believe that we will
#### Legend
- (*) Big theoretical task
- (E) Perhaps non-essential but worth to try
- (E) Non-essential but still worth trying
## Accessibility of Science
@ -26,25 +26,16 @@ Although many of goals looks complex and faraway I strongly believe that we will
- Create Onion configuration
- Discuss the possibility of switching original LibGen backend to Nexus
#### Community
- Announce goals widely
- Write and maintain documentation in clean English language
### Data Accessibility
#### Infrastructure
- Putting scimag collection onto IPFS
- Announce data dumps for both scitech and scimag collections
- Pinning feature in the app that will allow to users pinning subset of the collection in an easy way
- Pinning feature in the app that will allow users to pin a subset of the collection in an easy way
- (*) Consider various **reliable** ways to announce new releases of **initial** data dumps
- Maintain and curate the list of already publicly available journals in Pylon
#### Community
- Encourage people to pin in ideological, social and competitionus ways
### Decentralized Publishing
#### Search Server Prerequisites
@ -73,7 +64,7 @@ Although many of goals looks complex and faraway I strongly believe that we will
### References
- Maintain graph statistics (at least PageRank) in Summa/Meta API
- Reference links in Cognitron Web
- Clickable reference links in Cognitron Web (as in the bot)
### Entity Extraction

View File

@ -0,0 +1,13 @@
# Community
The Technological Leviathan has already usurped the greater part of our technological and scientific achievements.
Confronting him out of public view therefore has little chance of success. Only spreading the idea that equal, free and comfortable access to knowledge is vitally necessary among wide layers of society can lead to a real shift.
The ultimate goal is wide acceptance of the idea that knowledge has no master and that all of us benefit far more from a freely accessible and searchable corpus of already discovered knowledge.
Putting aside the dark sides of what big tech companies are doing right now, they have also democratized access to the Internet, but they remain incapable of doing the same for valuable parts of the knowledge corpus because of various technological and legal issues.
- Announce goals widely
- Write and maintain documentation in clean English language
- Encourage people to help spread these ideas through ideological and social means

View File

@ -1,20 +1,4 @@
# New Conditions
We have silently crossed Rubicon. The Internet entered in our life and now it has become an integral and essential part of our lives.
It multiplied our powers and also it multiplied dangers we are put under.
## Technological Leviathan
Starting from 2010s there are rising tensions on the digital frontiers. The Internet that has been created to unite people across the world now is dissipating into divided islands. Rules of these dissected pieces are dictated by those who is hunger to manage and control for the sake of their own stability but oftenly not for the sake of who are hunger to learn and move humankind forward.
Here just a few attacks on freedom to mention:
- Great Firewall of China that is banning the entire country out of presence in the world
- US Corporations that taking responsibility of judging what is good and evil using full power of their technologies and de-facto applying laws of USA extraterritorially
- Russia that is moving rapidly on the Chinese path in her attempts to border Internet traffic. The ultimate goal is spreading lies and propaganda inside and to outside to keep people ignorant.
Many moves that governments make means that we are considered not sane enough to live in the digital world of information.
Is this a fate we are destined to live with?
## Freedom Armory
There are plenty of projects that need your time or donations to keep fighting against digital borders:

View File

@ -0,0 +1,22 @@
## Prerequisite
Install system packages for various OSes:
```shell script
sudo ./repository/install-packages.sh
```
### Ubuntu 20.04
#### Docker
[Installation Guide](https://docs.docker.com/engine/install/ubuntu/)
#### IPFS
[Installation Guide](https://docs.ipfs.io/install/)
### MacOS
#### Docker
[Installation Guide](https://docs.docker.com/docker-for-mac/install/)
#### IPFS
[Installation Guide](https://docs.ipfs.io/install/)

View File

@ -21,6 +21,7 @@
"css-loader": "^5.0.1",
"dateformat": "^4.4.1",
"deepmerge": "^4.2.2",
"electron": "^12.0.5",
"eslint": "^7.17.0",
"eslint-config-standard": "^16.0.2",
"eslint-plugin-import": "^2.22.1",
@ -36,6 +37,7 @@
"grpc-web": "^1.2.1",
"html-entities": "^2.3.2",
"html-webpack-plugin": "^5.3.1",
"ipfs-http-client": "^49.0.4",
"js-cookie": "^2.2.1",
"lodash": "^4.17.20",
"loglevel": "^1.7.1",

File diff suppressed because it is too large