Recall that {{site.title}} maintains [a bouquet of services](services/); here we air some technical concerns and ideas.
We have a [{{site.title}} group](http://gl.kwarc.info/SIGMathLing) on the [GitLab](https://en.wikipedia.org/wiki/GitLab) server [gl.kwarc.info](http://gl.kwarc.info), where we will start creating repositories.
This allows us to use Git permissions for access control and the GitLab permission UI for management.
We estimate that for the first two years {{site.title}} will have fewer than 25 members (which keeps traffic low) and less than 5 TB of data sets.
gl.kwarc.info should be able to handle that load, given that most data sets will be served via [Git LFS](https://git-lfs.github.com/).
Should space or traffic become a problem for the KWARC servers to handle, we will try to raise money for a more scalable solution.
We will also have a close look at [Zenodo](http://zenodo.org) and see whether we can delegate hosting to them.
### Standardizing Datasets and Resources
We will need to develop standards for representing, classifying, describing, and citing data sets and resources.
1. *Representation*: file formats, repository layout, data models
2. *Classification/description*: is the dataset
* a corpus (raw, processed, ...),
* a set of annotations to a corpus,
* manually/automatically created, by which process/system?
* an evaluation data set (gold standard)?
* what is the quality (e.g. F-measure)?
* what is the license?
3. *Citation*: the idea is to have a "landing page" per resource that addresses all
the points in 1. and 2. and lists the authors who can be cited. The landing page
should also have pre-made BibTeX (and possibly EndNote) entries to make citations
easier.
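
To make the three points above concrete, here is a minimal sketch of what a per-resource metadata record and its pre-made BibTeX entry could look like. Nothing here is a decided standard: the class name `ResourceRecord`, its field names, the choice of `@misc`, and the helper `to_bibtex` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceRecord:
    """Hypothetical metadata record for one data set or resource."""
    key: str                           # citation key for the BibTeX entry
    title: str
    authors: List[str]                 # authors that can be cited
    year: int
    url: str                           # landing page URL (placeholder)
    kind: str                          # e.g. "corpus", "annotations", "evaluation"
    creation: str                      # "manual" or "automatic"
    license: str
    created_by: Optional[str] = None   # process/system that produced the data
    f_measure: Optional[float] = None  # quality, if it has been measured


def to_bibtex(rec: ResourceRecord) -> str:
    """Render a record as a @misc BibTeX entry (entry type is an assumption)."""
    fields = [
        ("author", " and ".join(rec.authors)),
        ("title", rec.title),
        ("year", str(rec.year)),
        ("howpublished", f"\\url{{{rec.url}}}"),
        ("note", f"{rec.kind}; license: {rec.license}"),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{rec.key},\n{body}\n}}"
```

A landing page generator could call `to_bibtex(record)` once per resource and embed the result verbatim, so that the citation data and the description always come from the same record.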
Currently, these are just manually curated pages on the {{site.title}} web site ([resources](/resources/) and [systems](/systems/)); eventually we will generate them statically from an internal database of resources and/or from metadata harvested from the repositories. Licensing should be made transparent.
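
As a rough sketch of the static generation step, the function below turns a list of resource records into a Markdown table that a site generator could include. The record keys (`title`, `url`, `kind`, `license`) and the function name `render_resource_page` are hypothetical; the real internal database and its schema are still undecided.

```python
from typing import Dict, List


def render_resource_page(records: List[Dict[str, str]]) -> str:
    """Render resource records as a Markdown table, sorted by title."""
    lines = [
        "| Resource | Kind | License |",
        "| --- | --- | --- |",
    ]
    for rec in sorted(records, key=lambda r: r["title"].lower()):
        # The license gets its own column so that licensing stays transparent.
        lines.append(
            f"| [{rec['title']}]({rec['url']}) | {rec['kind']} | {rec['license']} |"
        )
    return "\n".join(lines) + "\n"
```

The same function would work whether the records come from a hand-maintained file or are harvested from metadata files in the individual repositories.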
### Math Analysis Blackboard
MK would like to develop and publish an annotation schema (using the KAT schema as a starting point) and establish a math result triple store that manages all of these. The technical details of how best to do this are still open, and Deyan is quite skeptical.
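
Purely to illustrate the triple store idea, here is a toy in-memory version with wildcard queries. It is not based on the KAT schema or on any existing system; the class `TripleStore`, the identifiers, and the predicate names in the usage example are all invented for this sketch.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Toy in-memory triple store; a real deployment would use an RDF store."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical usage: two analysis results attached to the same formula.
store = TripleStore()
store.add("doc1#formula3", "hasAnnotation", "ann17")
store.add("ann17", "producedBy", "some-analysis-tool")
store.add("doc1#formula3", "hasAnnotation", "ann42")
annotations = store.match(subject="doc1#formula3", predicate="hasAnnotation")
```

Here `annotations` holds both annotation triples for the formula, which is the kind of query a shared blackboard would need to answer across tools.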