Merge pull request #41 from the-superpirate/master

- fix(nexus): Fix DOI detection in messages …
the-superpirate 2021-05-03 15:01:34 +03:00 committed by GitHub
commit 7db858386f
7 changed files with 52 additions and 36 deletions

@ -1,5 +1,9 @@
# Nexus Search: Meta API
Meta API is a wrapper for the Summa Search server that merges results from several sources,
reorders search results using ML and expands queries to return more relevant results for
search queries.
```
NEXUS_META_API_summa.url=http://summa bazel run -c opt binary
NEXUS_META_API_summa.url=http://summa bazel run binary
```

@ -9,11 +9,6 @@ message ScoredDocument {
uint32 position = 3;
}
message SearchResponse {
repeated ScoredDocument scored_documents = 1;
bool has_next = 2;
}
message SearchRequest {
repeated string schemas = 1;
string query = 2;
@ -22,6 +17,11 @@ message SearchRequest {
string language = 5;
}
message SearchResponse {
repeated ScoredDocument scored_documents = 1;
bool has_next = 2;
}
service Search {
rpc search (SearchRequest) returns (SearchResponse) {}
}

@ -5,6 +5,7 @@ from nexus.nlptools.regex import (
DOI_REGEX,
ISBN_REGEX,
NID_REGEX,
ONLY_DOI_REGEX,
URL_REGEX,
)
@ -20,10 +21,10 @@ class QueryClass(Enum):
def check_doi(query) -> (QueryClass, str):
# ToDo: rewrite normally, just hotfixed
if query.startswith('references:'):
return
if r := re.search(DOI_REGEX, query):
if (
((r := re.search(DOI_REGEX, query)) and re.search(URL_REGEX, query))
or re.search(ONLY_DOI_REGEX, query)
):
doi = (r[1] + '/' + r[2]).lower()
return {
'doi': doi,

@ -29,4 +29,5 @@ DOI_REGEX = re.compile(r'(10.\d{4,9})\s?/\s?([-._;()<>/:A-Za-z0-9]+[^.?\s])')
ISBN_REGEX = re.compile(r'^(?:[iI][sS][bB][nN]\:?\s*)?((97(8|9))?\-?\d{9}(\d|X))$')
MD5_REGEX = re.compile(r'([A-Fa-f0-9]{32})')
NID_REGEX = re.compile(r'(?:[Nn][Ii][Dd]\s?:?\s*)([0-9]+)')
ONLY_DOI_REGEX = re.compile(r'^(10.\d{4,9})\s?/\s?([-._;()<>/:A-Za-z0-9]+[^.?\s])$')
PUBMED_ID_REGEX = re.compile(r'(?:(?:https?://)?(?:www.)?ncbi.nlm.nih.gov/pubmed/|[Pp][Mm][Ii][Dd]\s?:?\s*)([0-9]+)')
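
The DOI handling touched by this commit can be illustrated with a minimal, self-contained sketch. `DOI_REGEX` and `ONLY_DOI_REGEX` are copied verbatim from the hunks above. `URL_REGEX` is not shown in this diff, so the pattern below is a hypothetical stand-in, and `extract_doi` is an illustrative helper name rather than the project's `check_doi`. The sketch assumes the intent is that a DOI counts only when the message also contains a URL or consists of nothing but the DOI.

```python
import re

# Copied verbatim from the diff above.
DOI_REGEX = re.compile(r'(10.\d{4,9})\s?/\s?([-._;()<>/:A-Za-z0-9]+[^.?\s])')
ONLY_DOI_REGEX = re.compile(r'^(10.\d{4,9})\s?/\s?([-._;()<>/:A-Za-z0-9]+[^.?\s])$')
# Hypothetical stand-in: the real URL_REGEX is not part of this diff.
URL_REGEX = re.compile(r'https?://\S+')


def extract_doi(query):
    """Return a lowercased DOI if the query looks like a DOI lookup, else None.

    Illustrative helper, not the project's check_doi.
    """
    # Queries such as "references:<doi>" are handled elsewhere.
    if query.startswith('references:'):
        return None
    doi_match = DOI_REGEX.search(query)
    only_doi_match = ONLY_DOI_REGEX.search(query)
    # Accept a DOI embedded in a link, or a message that is only a DOI.
    is_doi_link = doi_match is not None and URL_REGEX.search(query) is not None
    if is_doi_link or only_doi_match is not None:
        match = doi_match or only_doi_match
        return (match[1] + '/' + match[2]).lower()
    return None
```

Under these assumptions, a bare DOI or a DOI link resolves to the normalized DOI, while a DOI mentioned in the middle of an ordinary sentence is ignored, which appears to be the behaviour the commit title describes.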

@ -3,18 +3,37 @@
We have silently crossed the Rubicon. The Internet entered our lives and has become an integral and essential part of them.
It multiplied our powers, and it also multiplied the dangers we face.
Here I'd like to consider two issues that look important to me, are interlinked, and could be addressed by a single movement.
## Scientific Frontiers and Automation of the Science
For centuries the primary form of scholarly communication has been publishing research in peer-reviewed journals and books.
Thus, every further move in science naturally stood and stands on the shoulders of predecessors. Our intrinsic drive to explore, to learn and to share what we have learnt was born much earlier than we ever recognized ourselves as human. The fact we must admit is that knowledge discovery has never shared anything with market laws
but just collaborated with them for its own prosperity.
It is the main reason why recent decades in academic publishing have passed under the shadow of ongoing controversies. The advent of the Internet was a game-changing event that opened the way to reduce paper costs to zero and relieve financial pressure on science. Nevertheless, practices of past times still keep us in the chains of outdated copyright laws and publishers' monopoly.
While humankind is already armed with state-of-the-art technologies to process enormous amounts of textual data, we are still not applying this power because of these burdensome chains.
Currently we have a unique historical chance to align scientific movement with its natural order.
Organizing a continuous flow of knowledge available to everybody and in the name of our prosperity is a first-class goal that
should be achieved in the short term.
## Technological Leviathan
Starting from 2010s there are rising tensions on the digital frontiers. The Internet that has been created to unite people across the world now is dissipating into divided islands. Rules of these dissected pieces are dictated by those who is hunger to manage and control for the sake of their own stability but oftenly not for the sake of who are hunger to learn and move humankind forward.
Since the 2010s, tensions have been rising on the digital frontiers.
The Internet, which was created to unite people across the world, is now dissolving into divided islands.
The rules of these dissected pieces are dictated by those who hunger to manage and control
for the sake of their own stability, but often not for the sake of those who hunger to learn and move humankind forward.
Here are just a few attacks on freedom worth mentioning:
- The Great Firewall of China, which cuts an entire country off from the rest of the world
- US corporations that take on the responsibility of judging what is good and evil, using the full power of their technologies and de facto applying US law extraterritorially
- Russia, which is moving rapidly along the Chinese path in its attempts to fence off Internet traffic. The ultimate goal is to spread lies and propaganda both inside and outside the country to keep people ignorant.
Many of the moves governments make imply that we are considered not sane enough to live in the digital world.
What is more important is that these actions are disrupting natural flows of knowledge exchange.
Wide adoption of new decentralized technologies like IPFS, libp2p and distributed routing schemes (such as cjdns/yggdrasil)
is essential for science to flourish in a world ruled and crowded by ignorant people.
## Continuous Education
Increasing demand for educated people is tightly linked with the accessibility of the knowledge corpus. The world has changed and data flows
have sped up. We can no longer rely heavily on classical forms of education such as fixed-term learning in universities.

@ -1,13 +1,14 @@
# Community
# Community Roadmap v.0.1
The Technological Leviathan has already usurped the biggest part of our technological and scientific achievements.
## Documenting
Thus, confronting him in a non-public manner has little chance of success. Only spreading the idea of the vital necessity of equal, free and comfortable access to knowledge among wide layers of people can lead to a real shift.
- Write and maintain documentation in clean English language
The ultimate goal is wide acceptance of the idea that knowledge has no master and that it is much more beneficial for all of us to have a freely accessible and searchable corpus of already discovered knowledge.
Putting aside the dark sides of what big tech companies are doing right now, they have also democratized access to the Internet, but they are still incapable of doing the same for valuable parts of the knowledge corpus due to various technological and legal issues.
## Finding Participants
- Announcing goals widely
- Write and maintain documentation in clean English language
- Encourage people to participate in spreading by ideological and social ways
## Data Hoarding
- Establish reliable connections with people who are capable of hosting and seeding large amounts of data

@ -1,10 +0,0 @@
## Freedom Armory
There are plenty of projects that need your time or donations to keep fighting against digital borders:
- [Library Genesis](https://libgen.fun) [[1]](http://libgen.rs) [[2]](https://t.me/libgen_scihub_bot) - the biggest scientific library in the world
- [Sci-Hub](https://sci-hub.do) [[1]](https://t.me/libgen_scihub_bot) - project aimed to make scientific knowledge accessible for everybody
- [IPFS](https://ipfs.io) - user-friendly replacement for torrent technology allowing you to exchange files without giving copyright enforcers a way to ban the exchange
- [TOR](https://www.torproject.org) / [I2P](https://geti2p.net) - tools for improving your anonymity on the Internet by hiding your IPs and other traits that could deanonymize you
- [Yggdrasil](https://yggdrasil-network.github.io) / [Cjdns](https://github.com/cjdelisle/cjdns) - tools that let you route your Internet packets without relying on centralized state-controlled equipment. They can be useful for countering Internet connectivity disruptions arranged by governments. They also allow you to create mesh networks with your neighbors to keep connectivity high, which would be useful in densely populated areas or even during peaceful demonstrations.
- [Nexus STC](https://github.com/nexus-stc/hyperboria) that is aimed to store important data and make them searchable.