---
layout: page
title: SigMathLing - Technical Concerns
---
Recall that {{site.title}} maintains [a bouquet of services](services/); here we air some technical concerns and ideas.

### Resource Repositories

We have a [{{site.title}} group](http://gl.kwarc.info/SIGMathLing) on the [GitLab](https://en.wikipedia.org/wiki/GitLab) server [gl.kwarc.info](http://gl.kwarc.info), where we will start creating repositories.
This allows us to use GitLab's repository permissions for access control and its web UI for managing them.
We estimate that for the first two years {{site.title}} will have fewer than 25 members (which keeps traffic low) and less than 5 TB of data sets.
gl.kwarc.info should be able to handle this, given that most data sets will be served via [Git LFS](https://git-lfs.github.com/).
Should space or traffic become too much for the KWARC servers to handle, we will try to raise funds for a more scalable solution.

We will also have a close look at [Zenodo](http://zenodo.org) and see whether we can delegate hosting to them. 

### Standardizing Datasets and Resources

We will need to develop standards for representing, classifying, describing, and citing data sets and resources:

1. *Representation*: file formats, repository layout, data models
2. *Classification/description*: is the dataset
   * a corpus (raw, processed, ...),
   * a set of annotations to a corpus,
   * created manually or automatically, and by which process/system,
   * an evaluation data set (gold standard)?
   * What is its quality (e.g. F-measure)?
   * What is its license?
3. *Citation*: the idea is to have a "landing page" per resource that addresses all
   the points in 1. and 2. and lists the authors to be cited. The landing page
   should also offer pre-made BibTeX (and possibly EndNote) entries to make citation
   easier.
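
To make points 1 to 3 more concrete, here is a minimal Python sketch of a metadata record that a landing page could be generated from, plus a function that renders the pre-made BibTeX entry. The class `ResourceRecord`, its fields, and all example values are hypothetical illustrations, not an agreed {{site.title}} format. Using `@misc` with a `howpublished` URL is one common convention for citing data sets, and `\url` assumes the LaTeX `url` (or `hyperref`) package.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceRecord:
    """Hypothetical landing-page metadata for one resource (field names are illustrative)."""
    key: str            # BibTeX citation key
    title: str
    authors: List[str]
    year: int
    url: str
    kind: str           # e.g. "corpus", "annotations", "gold standard"
    creation: str       # "manual" or "automatic"
    created_by: str     # process/system that produced the data
    license: str
    quality: Dict[str, float] = field(default_factory=dict)  # e.g. {"F-measure": 0.9}


def to_bibtex(rec: ResourceRecord) -> str:
    """Render a pre-made BibTeX @misc entry for the resource's landing page."""
    # BibTeX separates multiple authors with " and ".
    authors = " and ".join(rec.authors)
    lines = [
        "@misc{" + rec.key + ",",
        "  author = {" + authors + "},",
        "  title = {" + rec.title + "},",
        "  year = {" + str(rec.year) + "},",
        "  howpublished = {\\url{" + rec.url + "}},",
        "  note = {License: " + rec.license + "}",
        "}",
    ]
    return "\n".join(lines)


# Hypothetical example resource.
record = ResourceRecord(
    key="example-corpus",
    title="Example Math Corpus",
    authors=["Jane Doe", "John Roe"],
    year=2016,
    url="https://example.org/corpus",
    kind="corpus",
    creation="automatic",
    created_by="hypothetical-converter",
    license="CC-BY-4.0",
)
print(to_bibtex(record))
```

The entry is assembled by plain string concatenation rather than f-strings because f-strings would need doubled braces, which Jekyll's Liquid engine would try to interpret as template output tags if the snippet were ever pasted into a page like this one.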

### Resource Reference Page

Currently, this is just a manually curated [page on the {{site.title}} web site](/resources/); eventually, we will generate it statically from an internal database of resources and/or from metadata harvested from the resource repositories. The licensing of each resource should be made transparent.
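
As a rough illustration of "statically generating" the page, the sketch below turns a list of resource records into a Markdown table with an explicit license column, so that missing license information is visible rather than silently omitted. The `RESOURCES` entries and their keys are hypothetical; they stand in for either an internal database or metadata harvested from the repositories.

```python
from typing import Dict, List

# Hypothetical entries of an internal resource database.
RESOURCES: List[Dict[str, str]] = [
    {"name": "Example gold standard", "url": "https://example.org/gold",
     "kind": "gold standard"},
    {"name": "Example corpus", "url": "https://example.org/corpus",
     "kind": "corpus", "license": "CC-BY-4.0"},
]


def render_resource_page(resources: List[Dict[str, str]]) -> str:
    """Render resources as a Markdown table, sorted by name, with a license column."""
    rows = [
        "| Resource | Kind | License |",
        "|----------|------|---------|",
    ]
    for r in sorted(resources, key=lambda r: r["name"].lower()):
        # Make a missing license explicit instead of leaving the cell empty.
        license_ = r.get("license") or "not stated"
        rows.append(f"| [{r['name']}]({r['url']}) | {r['kind']} | {license_} |")
    return "\n".join(rows) + "\n"


print(render_resource_page(RESOURCES))
```

A Jekyll-native alternative would be to keep the records in a YAML file under `_data/` and loop over `site.data` in a Liquid template, which avoids a separate generation step.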

### Suite of Systems and Libraries

Currently, this is just a manually curated [page on the {{site.title}} web site](/systems/); eventually, we will generate it statically from an internal database of systems and libraries and/or from metadata harvested from their repositories. The licensing of each system and library should be made transparent.

### Math Analysis Blackboard

MK would like to develop and publish an annotation schema (using the KAT schema as a starting point) and to establish a triple store for math results that manages all of these annotations. The technical details of how best to do this are still open, but Deyan is quite skeptical.
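
To illustrate what a triple store that manages annotations amounts to at the data-structure level, here is a toy, in-memory Python sketch. It is not based on the KAT schema, and the document identifiers and predicate names are hypothetical. A real deployment would more likely use an existing RDF triple store queried via SPARQL.

```python
from typing import Iterator, Optional, Set, Tuple

# A triple is (subject, predicate, object).
Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory triple store; stands in for a real RDF store."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield all triples matching the pattern; None acts as a wildcard."""
        for s, p, o in self._triples:
            if ((subject is None or s == subject)
                    and (predicate is None or p == predicate)
                    and (obj is None or o == obj)):
                yield (s, p, o)


# Hypothetical annotations of fragments of one paper.
store = TripleStore()
store.add("paper-1#thm-1", "type", "Theorem")
store.add("paper-1#thm-1", "annotatedBy", "annotator-A")
store.add("paper-1#def-2", "type", "Definition")

# All fragments annotated as theorems (sorted, since set order is arbitrary).
theorems = sorted(s for s, _, _ in store.match(predicate="type", obj="Theorem"))
print(theorems)
```

Matching a single pattern with wildcards is the simplest case of what SPARQL triple patterns express; a production store would add indexes and joins across several patterns.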