website merge requests
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests
2018-01-24T18:58:37Z

Document 08.2017 arxlmiv dataset release
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/1
2018-01-24T18:58:37Z, Deyan Ginev

Here is a first draft of the page, with some fields yet to be filled in (e.g. license, citation BibTeX, two of the dataset info rows).
Feedback welcome!
I guess GitLab calls these "merge requests"; I will try to remember.

Assignee: Michael Kohlhase (michael.kohlhase@fau.de)

arxmliv 08.2018 release
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/2
2018-09-25T14:47:47Z, Deyan Ginev

arxmliv 08.2018 release:
* Ready with the dataset repository and description for SIGMathLing website
* Working on the embedding generation and release, at which point this PR can be merged

Assignee: Deyan Ginev

Update embeddings description with "nomath" controls
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/6
2019-07-20T02:54:50Z, Deyan Ginev

I have added a vocabulary and GloVe embedding with all mathematics from the data discarded/ignored.
They have been used to evaluate the specific contribution of the math lexemes to statement classification models. I'm pushing all resources out before submitting a write-up for that experiment.

Add arxmliv subresource page for bibtex url to resolve
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/7
2019-07-20T22:09:13Z, Deyan Ginev

I just realized our official BibTeX for the datasets points to a URL that does not exist:
```bibtex
@MISC{SML:arXMLiv:08.2018,
  author = {Deyan Ginev},
  title = {arXMLiv:08.2018 dataset, an HTML5 conversion of arXiv.org},
  howpublished = {hosted at \url{https://sigmathling.kwarc.info/resources/arxmliv/}},
  note = {SIGMathLing -- Special Interest Group on Math Linguistics},
  year = 2018}
```
So I am adding that placeholder page without linking it anywhere: the resource has already been cited, so the URL should resolve. I will make sure to use the specific dataset URL for the 2019 BibTeX entry.

First announcement of statement dataset
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/8
2019-08-30T00:33:12Z, Deyan Ginev

* Adding a new release note for the statement dataset
* Dataset is now uploaded at https://gl.kwarc.info/SIGMathLing/statements-arxmliv-08-2018
* TODO: arXiv submission happening soon, need to update paper link before merging

Should be good to go in a couple of days.

Adding 2019 arXMLiv resources
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/9
2019-09-19T13:06:54Z, Deyan Ginev

The new dataset has been packaged and a GloVe embedding has been generated. This PR documents and announces both, tidies up the various related pages, and adds a news item.

updated 2019 embeddings handling apostrophe tokenization better
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/10
2019-10-01T12:33:35Z, Deyan Ginev

In the 2019 version of the embeddings I have made an effort to preserve text-mode punctuation in the plain-text model, so that we at least have the option of using the full context, as is now customary when pre-training language models on very large corpora.
However, some of that is subtler than it seems at first glance. In particular, the best practice for apostrophes is to conserve a fixed set of English abbreviated uses as individual word entries. So I updated the llamapun generation to do so, keeping ten hand-selected words and using a standalone "apostrophe token" in all other cases, which are mostly quotations in single 'quotes'.
The resulting entries in the GloVe vocabulary are:

| vocabulary rank | word | corpus frequency |
|----:|:----|-----:|
| 240 | 's | 6623738 |
| 267 | ' | 5697131 |
| 2545 | 't | 327186 |
| 8069 | 'un | 40822 |
| 9238 | 'th | 30858 |
| 9768 | 'll | 27310 |
| 11821 | 'il | 18036 |
| 11946 | 'd | 17683 |
| 12283 | 've | 16714 |
| 15093 | 're | 10872 |
| 20422 | 'm | 5803 |
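
For readers who want to experiment with the apostrophe rule described above, here is a minimal Python sketch. It is not the llamapun implementation: the ten kept forms are copied from the vocabulary table, while the function names `tokenize` and `build_vocabulary`, the regular expression, the lowercasing, and the handling of other punctuation as single tokens are all assumptions made for illustration. It also assumes plain ASCII apostrophes.

```python
import re
from collections import Counter

# The ten apostrophe forms kept as word entries, copied from the table.
KEPT_FORMS = ("'s", "'t", "'un", "'th", "'ll", "'il", "'d", "'ve", "'re", "'m")

# Longest forms first, so "'th" is tried before "'t".
_kept = "|".join(re.escape(f) for f in sorted(KEPT_FORMS, key=len, reverse=True))

# Assumed tokenization rule (illustrative only):
#   1. a kept form, but only directly after a word character and when the
#      form ends at a word boundary;
#   2. otherwise a run of word characters;
#   3. otherwise a single punctuation character, which is where every
#      remaining apostrophe ends up as the standalone apostrophe token.
TOKEN_RE = re.compile(rf"(?<=\w)(?:{_kept})\b|\w+|[^\w\s]")


def tokenize(text):
    """Split lowercased text into words, kept apostrophe forms and punctuation."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(lines, min_freq=5):
    """Count tokens over all lines and keep those seen at least min_freq times."""
    counts = Counter(tok for line in lines for tok in tokenize(line))
    return [(word, n) for word, n in counts.most_common() if n >= min_freq]


if __name__ == "__main__":
    print(tokenize("It's a 'quote', isn't it?"))
    print(build_vocabulary(["it's"] * 5 + ["a rare 'word'"]))
```

The `min_freq=5` default mirrors the "freq 5+" cutoff mentioned for the vocabulary; the real pipeline may apply its threshold differently.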
This leads to a further denoised vocabulary, now counting 989,136 distinct words (frequency 5 or higher). I'll upload to GitLab and merge the pull request when the data looks good.

Fix gitlab link in sidebar
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/11
2020-01-31T18:34:45Z, Deyan Ginev

Minor, but since I spotted it, I might as well fix it.

Grounding dataset v1
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/12
2020-03-26T06:30:16Z, Takuto ASAKURA

I have added a landing page for our "Dataset for Grounding of Formulae, Version 1".
The repository of the actual dataset, referenced as "[download link](https://gl.kwarc.info/SIGMathLing/grounding-dataset-v1)", has also been prepared.
Please kindly check that all the changes to the website are fine with you.

Announce arxmliv 2020 release
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/13
2021-01-24T20:18:40Z, Deyan Ginev

The 2020 arxmliv release is now ready at the git LFS server. Proofreading and merging here in a bit...

Assignee: Deyan Ginev

update seminar.md: add a link to Andre's talk
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/14
2021-03-24T14:23:11Z, Takuto ASAKURA

It seems that even a "Developer" cannot push commits to protected branches, including the master branch, so I need to open this merge request. Please kindly merge it.

Assignee: Frederik Schaefer

download links
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/15
2022-07-01T07:22:24Z, Frederik Schaefer

Replace download links with links to the new download link repository.

Assignee: Frederik Schaefer

Resolve "collect a SIGMathLing bibliography"
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/5
2021-01-20T15:25:56Z, Luis Berlioz

Closes #13

Add bibliography
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/4
2019-07-17T19:27:35Z, Luis Berlioz

Start a common bibliography file for SIGMathLing.
To create a PDF file with the \nocite{*} option, run `make` inside the `bibliography` directory.
Request additions and changes on the issues page.

WIP: Resolve "collect a SIGMathLing bibliography"
https://gl.kwarc.info/SIGMathLing/website/-/merge_requests/3
2019-07-13T19:59:07Z, Luis Berlioz

Closes #13