Commit f922d0e9 authored by Michael Kohlhase's avatar Michael Kohlhase
more

parent 8cf5b89f
......@@ -9,19 +9,18 @@
</div>
<nav class="sidebar-nav">
{% include sidebar-nav-item.html url="/" title="Home" %},<br/>
{% include sidebar-nav-item.html url="/documents/" title="Documents"%},
{% include sidebar-nav-item.html url="/" title="Home" %},
{% include sidebar-nav-item.html url="/resources/" title="Resources"%},<br/>
{% include sidebar-nav-item.html url="/development/" title="Development" %},
{% include sidebar-nav-item.html url="/projects/" title="Projects"%},
{% include sidebar-nav-item.html url="/software/" title="Software &amp; Tools"%}.<br/>
{% include sidebar-nav-item.html url="/development/" title="Devel." %},
{% include sidebar-nav-item.html url="/systems/" title="Software/Tools"%}.<br/>
{% include sidebar-nav-item.html url="/news/" title="News" %},
{% include sidebar-nav-item.html url="/follow/" title="Follow &amp; Contact us" %}
{% include sidebar-nav-item.html url="/follow/" title="Contact" %},
<span class="sidebar-nav-item">
<a href="{{ site.baseurl }}/atom.xml"><img class="icon" src="{{ site.baseurl }}/public/feed_w.png" alt="atom feed"/></a>
<a href="https://twitter.com/{{ site.author.twitter }}"><img class="icon" src="{{ site.baseurl }}/public/twitter_w.png" alt="twitter"/></a>
<a href="https://github.com/{{ site.github.owner_name }}"><img class="icon" src="{{ site.baseurl }}/public/github_w.png" alt="github"/></a>
<!-- <a href="https://twitter.com/{{ site.author.twitter }}"><img class="icon"
src="{{ site.baseurl }}/public/twitter_w.png" alt="twitter"/></a> -->
<a href="https://github.com/{{ site.github.owner_name }}"><img class="icon" src="{{ site.baseurl }}/public/gitlab.png" alt="github"/></a>
</span><br/>
<a class="sidebar-nav-item" href="{{ site.github.zip_url | replace: 'zipball',
......
---
layout: page
title: SIGMathLing Resources
title: SIGMathLing - Datasets and Resources
---
none yet, but see the [plan](/techical/)
......@@ -6,8 +6,7 @@ title: SIGMathLing Servcies
SIGMathLing provides its members and the research community with a set of services to meet its [objectives](objectives). These are jointly funded and maintained by SIGMathLing members ([technical concerns](technical/)).
In particular, SIGMathLing maintains
1. a system of repositories for math linguistics resources.
2. the math analysis blackboard, i.e. an information system, where analysis results
4. a public citable reference page of all the math linguistic resources (just descriptions; not necessarily public access for non-members).
5. a suite of systems and libraries
6. internal and outreach communication channels.
1. a system of repositories for math linguistics datasets and resources.
2. a public citable reference page of all the math linguistic resources (just descriptions; not necessarily public access for non-members).
3. a suite of systems and libraries
4. internal and outreach communication channels.
......@@ -3,14 +3,46 @@ layout: page
title: Technical Concerns
---
Recall that SIGMathLing maintains [a bouquet of services](services/); here we air some
technical concerns and ideas.
1. a system of **resource repositories**. MK: I would just make a GitHub/Lab organization and somehow pay for their services or use our KWARC GitLab. Git LFS should help us deal with the large files involved and Git would take care of permission management.
2. the **math analysis blackboard** I would develop and publish an annotation schema
(using the KAT schema as a starting point) and establish a math result triple store
that manages all of these. Technical details are still open how best to do this, but I am sure Deyan has some ideas.
3. A **web site**, MK: I would go via GitHub/GitLab pages and jekyll, that makes communal
development
4. a **resource reference page**: MK, this is just a page on the web page, probably automatically generated from an internal data base of resources and/or harvested from the repositories. Licensing should be made transparent.
5. a **suite of systems and libraries**: Initially, this will be a page on the website with links to their repositories (the LlaMaPuN library, CorTeX, KaT, .... ), mostly by reference to public resources.
6. **communication channels**: we start out with a members mailing list and a public atom feed for announcements (from the web site); later there may even be a regular newsletter that digests these.
Recall that SIGMathLing maintains [a bouquet of services](services/); here we air some technical concerns and ideas.
### Resource Repositories
We have a [SIGMathLing group](http://gl.kwarc.info/SIGMathLing) on the [GitLab](https://en.wikipedia.org/wiki/GitLab) server [gl.kwarc.info](http://gl.kwarc.info), on which we will start creating repositories.
This allows us to use Git permissions for access control and the GitLab permission UI for management.
We estimate that for the first two years SIGMathLing will have fewer than 25 members (which limits the traffic) and less than 5 TB of data sets.
gl.kwarc.info should be able to serve that, given that most data sets will be served via [Git LFS](https://git-lfs.github.com/).
Should space or traffic become a problem for the KWARC servers to handle, we will try to raise money for a more scalable solution.
We will also have a close look at [Zenodo](http://zenodo.org) and see whether we can delegate hosting to them.
### Standardizing Datasets and Resources
We will need to develop standards for representing, classifying, describing, and citing data sets and resources.
1. *Representation*: file formats, repository layout, data models
2. *Classification/description*: is the dataset
* a corpus (raw, processed, ...),
* a set of annotations to a corpus,
* manually/automatically created, by which process/system?
* an evaluation data set (gold standard)?
* what is the quality (e.g. f-measure)?
* what is the license?
3. how to cite them.
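
As a minimal sketch of what a machine-readable description along the lines of the classification questions above might look like, the following Python record captures kind, creation mode, quality, and license. All names here (`DatasetDescription`, `Kind`, `Creation`, the field names) are assumptions made for illustration; no such schema has been agreed on yet.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    CORPUS = "corpus"            # raw or processed corpus
    ANNOTATIONS = "annotations"  # a set of annotations to a corpus
    EVALUATION = "evaluation"    # evaluation data set (gold standard)


class Creation(Enum):
    MANUAL = "manual"
    AUTOMATIC = "automatic"


@dataclass
class DatasetDescription:
    """Hypothetical description record; the field names are illustrative only."""
    name: str
    kind: Kind
    creation: Creation
    license: str
    created_by: Optional[str] = None   # process or system that produced the data
    f_measure: Optional[float] = None  # quality, if it has been measured

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if none were found)."""
        problems = []
        if not self.license:
            problems.append("license is missing")
        if self.f_measure is not None and not 0.0 <= self.f_measure <= 1.0:
            problems.append("f_measure must lie in [0, 1]")
        if self.creation is Creation.AUTOMATIC and not self.created_by:
            problems.append("automatically created datasets should name the process/system")
        return problems

    def to_json(self) -> str:
        """Serialize the record, storing enum members by their string value."""
        record = asdict(self)
        record["kind"] = self.kind.value
        record["creation"] = self.creation.value
        return json.dumps(record, sort_keys=True)
```

A record like this could live as a small JSON file in each repository, which would also give a citation or reference page something uniform to harvest.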
### Resource Reference Page
Currently, this is just a manually curated [page on the SIGMathLing web site](/resources/); eventually we will generate it statically from an internal database of resources and/or from data harvested from the repositories. Licensing should be made transparent.
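
Static generation of this page could be as simple as the following Python sketch, which turns a list of resource records into a Jekyll page with front matter. The function name `render_reference_page` and the record keys (`name`, `license`, `description`) are hypothetical; the real source might be a database or files harvested from the repositories.

```python
def render_reference_page(resources):
    """Render resource records (dicts) as a Markdown page with Jekyll front matter.

    The record keys used here are an assumption for illustration.
    """
    lines = [
        "---",
        "layout: page",
        "title: SIGMathLing - Datasets and Resources",
        "---",
        "",
    ]
    # Sort case-insensitively by name so the generated page is stable across runs.
    for record in sorted(resources, key=lambda r: r["name"].lower()):
        # Make a missing license visible instead of silently omitting it.
        license_text = record.get("license") or "license not stated"
        lines.append(f"* **{record['name']}** ({license_text}): {record['description']}")
    return "\n".join(lines) + "\n"
```

Printing the license next to every entry, including a visible marker when none is recorded, is one way to keep licensing transparent.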
### Suite of Systems and Libraries
Currently, this is just a manually curated [page on the SIGMathLing web site](/systems/); eventually we will generate it statically from an internal database of resources and/or from data harvested from the repositories. Licensing should be made transparent.
Initially, this will be a page on the website with links to the systems' repositories (the LlaMaPuN library, CorTeX, KaT, ...), mostly by reference to public resources.
### Communication Channels
We start out with a members mailing list and a public atom feed for announcements (from the web site); later there may even be a regular newsletter that digests these.
### Math Analysis Blackboard
MK would like to develop and publish an annotation schema (using the KAT schema as a starting point) and establish a math result triple store that manages all of these. How best to do this is technically still open, but Deyan is quite skeptical.
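
To make the triple store idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the subject/predicate/object pattern: the class `TripleStore`, the identifiers, and the predicate names in the usage example are invented, and nothing here reflects the actual KAT schema or a chosen storage technology.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Toy in-memory store for analysis results; not a real RDF implementation."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record one result; adding the same triple twice has no effect."""
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return all triples matching the pattern, sorted; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple
            for triple in self._triples
            if all(p is None or p == value for p, value in zip(pattern, triple))
        )


# Usage with made-up identifiers and predicates:
store = TripleStore()
store.add("doc1#formula3", "hasAnnotation", "declaration")
store.add("doc1#formula3", "producedBy", "some-analysis-tool")
store.add("doc2#formula1", "hasAnnotation", "declaration")
results = store.match(predicate="hasAnnotation", obj="declaration")
```

Here `results` holds the two `hasAnnotation` triples, one per document fragment. A production setup would more likely rely on an existing RDF store and query language than on a hand-written class like this.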