Semantics extraction based on machine learning

We have a couple of corpora from which we want to extract semantic features.

Examples are

quantity expressions like "3m/s" (three meters per second) or "two furlongs per fortnight"
polarity of identifiers in formulae (essentially, which symbols in a formula can be substituted for)
the location of definitions, theorems, and assumptions (and their definienda, definientia, and statements)
or, more generally, the content form of a formula.

If we know any of these, we could build useful semantic services (e.g. better screen readers for visually challenged people or better scientific search engines) relatively directly. We have a couple of large corpora, e.g. the arXMLiv corpus or the data behind the On-Line Encyclopedia of Integer Sequences (OEIS). All of them are (probably) amenable to machine-learning methods. In some cases, we already have some data about the phenomena above, which can act as a baseline.
The topic is to pick one or more of these aspects of semantics, see what contemporary statistical AI methods can do to scale their extraction up to corpus size, and develop a symbolic application on top of the results (possibly with a lot of help from the group).
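
To make the first example (quantity expressions) concrete, here is a minimal rule-based sketch in Python of what a hand-written baseline spotter could look like. It is not the group's existing baseline: the unit list, the number-word list, and the names `QUANTITY` and `find_quantities` are all assumptions made up for illustration, and a learned model would be expected to cover far more variation than these patterns do.

```python
import re

# Hypothetical, deliberately tiny inventories; a real baseline would need
# a proper unit ontology and a full number-word grammar.
UNITS = ["meters", "meter", "furlongs", "furlong", "fortnight",
         "seconds", "second", "kg", "m", "s"]
NUMBER_WORDS = ["one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]

# Longest alternatives first, so that "meters" is tried before "m".
_unit = "(?:" + "|".join(sorted(UNITS, key=len, reverse=True)) + ")"
# A value is either a decimal numeral or one of the number words.
_number = r"(?:\d+(?:\.\d+)?|" + "|".join(NUMBER_WORDS) + ")"
# Two ways to write a quotient of units: "m/s" or "furlongs per fortnight".
_sep = r"(?:\s*/\s*|\s+per\s+)"

QUANTITY = re.compile(
    rf"\b(?P<value>{_number})\s*(?P<unit>{_unit}(?:{_sep}{_unit})?)\b",
    re.IGNORECASE,
)


def find_quantities(text):
    """Return (value, unit) string pairs for each quantity expression found."""
    return [(m.group("value"), m.group("unit")) for m in QUANTITY.finditer(text)]


if __name__ == "__main__":
    sample = "It moved at 3m/s, roughly two furlongs per fortnight."
    for value, unit in find_quantities(sample):
        print(value, "|", unit)
```

A spotter like this is only useful as a point of comparison: its output on a corpus sample could serve as weak labels or as the baseline that a statistical method has to beat.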

Activity

Michael Kohlhase added Obsolete -- only as an example label 2 months ago
