---
layout: page
title: SigMathLing - Technical Concerns
---

Recall that {{site.title}} maintains [a bouquet of services](services/); here we air some technical concerns and ideas.

### Resource Repositories

We have a [{{site.title}} group](http://gl.kwarc.info/SIGMathLing) on the [GitLab](https://en.wikipedia.org/wiki/GitLab) server [gl.kwarc.info](http://gl.kwarc.info), where we host a range of data repositories. This allows us to use Git permissions for access control and the GitLab permission UI for managing them.

We estimate that for the first two years (2017-2019) {{site.title}} will have fewer than 25 members (which keeps traffic low) and less than 5 TB of data sets. gl.kwarc.info should be able to handle this, since most data sets will be served via [Git LFS](https://git-lfs.github.com/). Should storage space or traffic become too much for the KWARC servers to handle, we will try to raise money for a more scalable solution.

[Zenodo](http://zenodo.org) has officially declined to host the SIGMathLing resources due to the large volume of data, but we are open to exploring alternative providers; feel free to reach out!

### Standardizing Datasets and Resources

We will need to develop standards for representing, classifying, describing, and citing data sets and resources.

1. *Representation*: file formats, repository layout, data models
2. *Classification/description*: is the dataset
   * a corpus (raw, processed, ...),
   * a set of annotations to a corpus,
   * manually/automatically created, and by which process/system?
   * an evaluation data set (gold standard)?
   * what is its quality (e.g. f-measure)?
   * what is its license?
3. *Identification*: we are looking into obtaining a DOI (Digital Object Identifier) for each resource
4. *Citation*

The idea is to have a "landing page" per resource that addresses all the points in 1. and 2. and lists the authors to be cited. The landing page should also provide pre-made BibTeX (and possibly EndNote) entries to make citations easier.
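
As a minimal sketch of how such pre-made entries could be produced, the Python snippet below renders a BibTeX `@misc` entry from a resource's metadata. The `Resource` record, its fields, the `braced`/`to_bibtex` helpers, the citation key, and the example URL are all hypothetical, not an existing {{site.title}} format. Using `@misc` with `howpublished = \url{...}` is an assumption (it needs the LaTeX `url` package), and BibTeX special characters are not escaped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    """Hypothetical metadata record for one resource (not an existing format)."""
    key: str             # BibTeX citation key
    title: str
    authors: List[str]
    year: int
    url: str             # landing-page URL
    license: str
    doi: str = ""        # left empty until a DOI has been obtained


def braced(value: str) -> str:
    """Wrap a field value in BibTeX braces (no escaping of special characters)."""
    return "{" + value + "}"


def to_bibtex(res: Resource) -> str:
    """Render a @misc BibTeX entry for a resource's landing page."""
    fields = [
        ("author", " and ".join(res.authors)),
        ("title", res.title),
        ("year", str(res.year)),
        ("howpublished", "\\url" + braced(res.url)),
        ("note", "License: " + res.license),
    ]
    if res.doi:
        # Only emit a doi field once an identifier has actually been assigned.
        fields.append(("doi", res.doi))
    body = ",\n".join(f"  {name} = {braced(value)}" for name, value in fields)
    return "@misc" + braced(res.key + ",\n" + body + "\n")


if __name__ == "__main__":
    example = Resource(
        key="example-corpus",
        title="Example Math Corpus",
        authors=["Jane Doe", "John Roe"],
        year=2017,
        url="https://example.org/sigmathling/example-corpus",
        license="CC BY-SA 4.0",
    )
    print(to_bibtex(example))
```

The string concatenation in `braced` is deliberate: it avoids doubled curly braces in the source, which Jekyll's Liquid templating would otherwise try to interpret on this page. An EndNote export could be generated from the same record in the same way.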
### Resource Reference Page

Currently, this is just a manually curated [page on the {{site.title}} web site](/resources/); eventually we will generate it statically from an internal database of resources and/or from metadata harvested from the repositories. Licensing should be made transparent.

### Suite of Systems and Libraries

Currently, this is just a manually curated [page on the {{site.title}} web site](/systems/); eventually we will generate it statically from an internal database of resources and/or from metadata harvested from the repositories. Licensing should be made transparent.

### Math Analysis Blackboard

MK would like to develop and publish an annotation schema (using the KAT schema as a starting point) and to establish a math result triple store that manages all such annotations. How best to do this technically is still an open question, and Deyan is quite skeptical.
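
Since the triple store is still only an idea, the following is merely a toy illustration of the data model. It assumes that annotations are flattened into (subject, predicate, object) triples and queried by pattern matching, with `None` acting as a wildcard. The `TripleStore` class, the predicate names (`hasRole`, `annotatedBy`), and the formula identifiers are hypothetical and unrelated to the actual KAT schema. A real deployment might rather build on an existing RDF triple store queried via SPARQL.

```python
from typing import Iterator, Optional, Set, Tuple

# A triple is (subject, predicate, object); all parts are plain strings here.
Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory triple store; duplicate triples are stored only once."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield matching triples in sorted order; None acts as a wildcard."""
        for s, p, o in sorted(self._triples):
            if (
                (subject is None or s == subject)
                and (predicate is None or p == predicate)
                and (obj is None or o == obj)
            ):
                yield (s, p, o)


if __name__ == "__main__":
    store = TripleStore()
    # Hypothetical annotations of formulae in two papers.
    store.add("paper-1#formula-3", "hasRole", "theorem-statement")
    store.add("paper-1#formula-3", "annotatedBy", "annotator-a")
    store.add("paper-2#formula-1", "hasRole", "definition")
    theorems = [s for s, _, _ in store.match(predicate="hasRole", obj="theorem-statement")]
    print(theorems)  # ['paper-1#formula-3']
```

Keeping provenance (here `annotatedBy`) as ordinary triples means that annotations from different annotators or tools can coexist and be filtered with the same query mechanism.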