---
layout: page
title: SigMathLing - Technical Concerns
---
Recall that {{site.title}} maintains [a bouquet of services](services/); here we air some technical concerns and ideas.

### Resource Repositories

We have a [{{site.title}} group](http://gl.kwarc.info/SIGMathLing) on the [GitLab](https://en.wikipedia.org/wiki/GitLab) server [gl.kwarc.info](http://gl.kwarc.info), where we have hosted a range of data repositories.
This allows us to use Git permissions for access control and the GitLab permission UI for managing them.
We estimate that in the first two years (2017-2019) {{site.title}} will have fewer than 25 members (which keeps the traffic low) and less than 5 TB of data sets.
gl.kwarc.info should be able to handle this load, given that most data sets will be served via [Git LFS](https://git-lfs.github.com/).
Should space or traffic become a problem for the KWARC servers to handle, we will try to raise money for a more scalable solution.

[Zenodo](http://zenodo.org) has officially declined to host the SIGMathLing resources due to the large volume of data, but we are open to exploring alternative providers; feel free to reach out!

### Standardizing Datasets and Resources

We will need to develop standards for representing, classifying, describing, and citing data sets and resources.
1. *Representation*: file formats, repository layout, data models
2. *Classification/description*: is the data set
   * a corpus (raw, processed, ...),
   * a set of annotations to a corpus,
   * created manually or automatically, and if automatically, by which process/system?
   * an evaluation data set (gold standard)?
   * What is its quality (e.g. F-measure)?
   * What is its license?
3. *Identification*: we are looking into obtaining a DOI (Digital Object Identifier) for each resource.
4. *Citation*: the idea is to have a landing page per resource that addresses all
   the points in 1. and 2. and lists the authors to be cited. The landing page
   should also offer pre-made BibTeX (and possibly EndNote) entries to make citing the resource easy.
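To make items 2 to 4 above more concrete, here is a minimal Python sketch of a per-resource metadata record and of the pre-made BibTeX entry a landing page could offer for it. The field names, the `Kind` categories, the use of BibTeX's `@misc` entry type, and all example values are assumptions for illustration, not an agreed SIGMathLing schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    # Hypothetical categories, following the classification questions above.
    CORPUS = "corpus"
    ANNOTATIONS = "annotations to a corpus"
    EVALUATION = "evaluation data set (gold standard)"


@dataclass
class Resource:
    # Hypothetical metadata fields; not an agreed SIGMathLing schema.
    key: str                           # BibTeX citation key
    title: str
    authors: List[str]
    year: int
    kind: Kind
    created: str                       # "manual" or "automatic"
    license: str
    url: str                           # landing page URL
    process: Optional[str] = None      # creating system, if automatic
    f_measure: Optional[float] = None  # quality, if it has been measured
    doi: Optional[str] = None          # filled in once a DOI is assigned

    def __post_init__(self) -> None:
        if self.created not in ("manual", "automatic"):
            raise ValueError("created must be 'manual' or 'automatic'")


def to_bibtex(r: Resource) -> str:
    """Render a pre-made BibTeX @misc entry for a resource's landing page."""
    fields = [
        ("author", " and ".join(r.authors)),
        ("title", r.title),
        ("year", str(r.year)),
        ("howpublished", "\\url{" + r.url + "}"),
        ("note", r.kind.value + "; license: " + r.license),
    ]
    if r.doi is not None:
        fields.append(("doi", r.doi))
    body = ",\n".join("  " + name + " = {" + value + "}" for name, value in fields)
    return "@misc{" + r.key + ",\n" + body + "\n}"


if __name__ == "__main__":
    example = Resource(key="example-corpus", title="Example Math Corpus",
                       authors=["Jane Doe", "John Roe"], year=2017,
                       kind=Kind.CORPUS, created="automatic",
                       license="CC BY-SA 4.0",
                       url="https://example.org/example-corpus")
    print(to_bibtex(example))
```

The entry is assembled by plain string concatenation rather than f-strings, because f-strings would need doubled curly braces here, and Jekyll's Liquid engine would try to evaluate those on this page. An EndNote export could be generated from the same record in the same way.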

### Resource Reference Page

Currently, this is just a manually curated [page on the {{site.title}} web site](/resources/); eventually we will statically generate it from an internal database of resources and/or from metadata harvested from the repositories. The licensing of each resource should be made transparent.
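A minimal Python sketch of such static generation: it renders a Jekyll Markdown page with one table row per resource and a dedicated license column, so that licensing is visible at a glance. The record fields (`name`, `description`, `license`, `url`), the page title, and the assumption that records arrive as a plain list of dictionaries are all hypothetical; how the internal database or the harvesting will actually work is not decided yet.

```python
from typing import Dict, List

# Hypothetical record format: the internal resource database is not designed yet.
Record = Dict[str, str]


def _cell(text: str) -> str:
    # Escape pipe characters so they cannot break the Markdown table.
    return text.replace("|", "\\|")


def render_resource_page(records: List[Record]) -> str:
    """Render a Jekyll Markdown page with one table row per resource.

    Records without a "license" entry are flagged as "unknown" so that
    missing licensing information is visible rather than silently omitted.
    """
    lines = [
        "---",
        "layout: page",
        "title: Resources",  # hypothetical page title
        "---",
        "",
        "| Resource | Description | License |",
        "|---|---|---|",
    ]
    for rec in sorted(records, key=lambda r: r["name"].lower()):
        link = "[" + _cell(rec["name"]) + "](" + rec["url"] + ")"
        lic = _cell(rec.get("license", "unknown"))
        lines.append(f"| {link} | {_cell(rec['description'])} | {lic} |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        {"name": "Example corpus", "description": "raw papers",
         "license": "CC BY 4.0", "url": "https://example.org/corpus"},
    ]
    print(render_resource_page(demo), end="")
```

Flagging a missing license as "unknown" instead of leaving the cell empty is one way to push towards the licensing transparency asked for above.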

### Suite of Systems and Libraries

Currently, this is just a manually curated [page on the {{site.title}} web site](/systems/); eventually we will statically generate it from an internal database of systems and libraries and/or from metadata harvested from their repositories. The licensing of each system should be made transparent.

### Math Analysis Blackboard

MK would like to develop and publish an annotation schema (using the KAT schema as a starting point) and establish a math result triple store that manages all of these annotations. The technical details of how best to do this are still open, and Deyan is quite skeptical.
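Since the technical details are still open, the following Python sketch is only a toy illustration of what a triple store for annotations means: each annotation is stored as a (subject, predicate, object) triple and retrieved by pattern matching. It is not based on the KAT schema; the formula identifiers and the predicate names `annotatedAs` and `annotatedBy` are hypothetical.

```python
from typing import Iterator, Optional, Set, Tuple

# (subject, predicate, object)
Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory triple store; None in a query pattern acts as a wildcard."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()  # a set, so duplicates are ignored

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield all stored triples matching the pattern, in sorted order."""
        for t in sorted(self._triples):
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


if __name__ == "__main__":
    store = TripleStore()
    # Hypothetical annotations of formulae in hypothetical papers.
    store.add("paper42#formula3", "annotatedAs", "definition")
    store.add("paper42#formula3", "annotatedBy", "annotator1")
    store.add("paper7#formula1", "annotatedAs", "theorem")
    print([t[0] for t in store.match(p="annotatedAs", o="definition")])
    # -> ['paper42#formula3']
```

An actual deployment would more likely use an existing RDF triple store queried with SPARQL; the sketch only shows the underlying data model.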