---
layout: page
title: SIGMathLing - Technical Concerns
---
Recall that SIGMathLing maintains [a bouquet of services](services/); here we air some technical concerns and ideas.

### Resource Repositories

We have a [SIGMathLing group](http://gl.kwarc.info/SIGMathLing) on the [GitLab](https://en.wikipedia.org/wiki/GitLab) server [gl.kwarc.info](http://gl.kwarc.info), where we will start creating repositories.
This allows us to use Git permissions for access control and the GitLab permission UI for management.
We estimate that for the first two years SIGMathLing will have fewer than 25 members (which limits the traffic) and less than 5 TB of data sets.
gl.kwarc.info should be able to handle that load, given that most data sets will be served via [Git LFS](https://git-lfs.github.com/).
Should space or traffic become a problem for the KWARC servers to handle, we will try to raise money for a more scalable solution.
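
As a rough illustration of the Git LFS route, the following sketch generates the `.gitattributes` lines that send matching files through LFS. The attribute string is the one `git lfs track` writes; the helper name `lfs_attribute_lines` and the file patterns are hypothetical and only serve the example.

```python
def lfs_attribute_lines(patterns):
    """Return .gitattributes lines that route matching files through Git LFS."""
    # Same attribute set that `git lfs track "<pattern>"` writes for a pattern.
    return [f"{p} filter=lfs diff=lfs merge=lfs -text" for p in patterns]


# Hypothetical patterns for large corpus archives (assumption, not from the source).
lines = lfs_attribute_lines(["*.zip", "*.tar.gz"])
print("\n".join(lines))
```

The output would be committed as `.gitattributes` in a data set repository, so that only small pointer files live in the Git history itself.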

We will also have a close look at [Zenodo](http://zenodo.org) and see whether we can delegate hosting to them. 

### Standardizing Datasets and Resources

We will need to develop standards for representing, classifying, describing, and citing data sets and resources.

1. *Representation*: file formats, repository layout, data models
2. *Classification/description*: what kind of data set is it, and how good is it?
   * Is it a corpus (raw, processed, ...)?
   * Is it a set of annotations to a corpus?
   * Was it created automatically or manually, and by which process/system?
   * Is it an evaluation data set (gold standard)?
   * What is the quality (e.g. F-measure)?
   * What is the license?
3. *Citation*: how to cite them.
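
The questions above could eventually be captured in a machine-readable description record. The following is only a minimal sketch of what such a record might look like; the class `DatasetRecord`, its field names, the allowed values, and the example entry are all assumptions made for illustration, not an agreed SIGMathLing standard.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies derived from the classification questions.
KINDS = {"corpus", "annotations", "evaluation"}
ORIGINS = {"automatic", "manual"}


@dataclass
class DatasetRecord:
    name: str
    kind: str                 # one of KINDS
    origin: str               # one of ORIGINS
    created_by: str           # process or system that produced the data
    license: str
    file_format: str
    f_measure: Optional[float] = None  # quality, if it has been measured

    def __post_init__(self):
        # Reject values outside the (hypothetical) controlled vocabularies.
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")
        if self.origin not in ORIGINS:
            raise ValueError(f"unknown origin: {self.origin!r}")
        if self.f_measure is not None and not 0.0 <= self.f_measure <= 1.0:
            raise ValueError("f_measure must lie in [0, 1]")


# Placeholder entry; none of these values describe a real data set.
record = DatasetRecord(
    name="example-corpus",
    kind="corpus",
    origin="automatic",
    created_by="example-pipeline",
    license="CC-BY-4.0",
    file_format="XML",
)
print(record)
```

Validating at construction time keeps malformed descriptions out of any page or index later generated from such records.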

### Resource Reference Page

Currently, this is just a manually curated [page on the SIGMathLing web site](/resources/). Eventually we will statically generate it from an internal database of resources and/or from data harvested from the repositories. Licensing should be made transparent.
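
Static generation of such a page could be as simple as rendering a list of records to Markdown. The sketch below assumes a hypothetical record layout with `name`, `url`, and `license` keys and uses placeholder entries; how the records are actually stored or harvested is left open.

```python
def render_resource_list(records):
    """Render resource records as a Markdown bullet list, sorted by name."""
    lines = []
    for r in sorted(records, key=lambda r: r["name"].lower()):
        # Show the license next to each entry so licensing stays transparent.
        lines.append(f"* [{r['name']}]({r['url']}) (license: {r['license']})")
    return "\n".join(lines)


# Placeholder records; names and URLs are illustrative only.
records = [
    {"name": "Example Corpus", "url": "https://example.org/corpus", "license": "CC-BY-4.0"},
    {"name": "Annotation Set A", "url": "https://example.org/annotations", "license": "CC0-1.0"},
]
page = render_resource_list(records)
print(page)
```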

### Suite of Systems and Libraries

Currently, this is just a manually curated [page on the SIGMathLing web site](/systems/). Eventually we will statically generate it from an internal database of resources and/or from data harvested from the repositories. Licensing should be made transparent.

### Math Analysis Blackboard

MK would like to develop and publish an annotation schema (using the KAT schema as a starting point) and establish a math result triple store that manages all of these. How best to do this is still technically open, and Deyan is quite skeptical.
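
To make the triple store idea concrete, here is a deliberately tiny in-memory sketch. It does not reflect the KAT schema or any decided design; the class `TripleStore`, the predicate names, and the identifiers are all hypothetical, and a real deployment would more likely sit on an existing RDF store.

```python
class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self._triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


# Hypothetical annotation results attached to formulae in a document.
store = TripleStore()
store.add("doc1#formula3", "hasAnnotation", "anno17")
store.add("anno17", "producedBy", "example-tool")
store.add("doc1#formula7", "hasAnnotation", "anno18")
results = store.match(predicate="hasAnnotation")
print(results)
```

Recording which tool produced each annotation as its own triple is one way different analysis systems could share results on a common blackboard.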