Created Apr 13, 2017 by Michael Kohlhase (@mkohlhase, Owner)

Semantics extraction based on machine learning

We have a couple of corpora from which we want to extract semantic features.

Examples are

  • quantity expressions like "3m/s" (three meters per second) or "two furlongs per fortnight"
  • polarity of identifiers in formulae (essentially, which symbols in a formula can be substituted for)
  • where the "definitions/theorems/assumptions" are (and what their definienda, definientia, and statements are).
  • or, more generally, what the content form of a formula is.

If we know any of those, we could build nice semantic features on top (e.g. better screen readers for visually challenged people or better scientific search engines) relatively directly. We have a couple of large corpora, e.g. the arXMLiv corpus or the data behind the On-Line Encyclopedia of Integer Sequences (OEIS). All of them are (probably) amenable to machine-learning methods. In some cases, we already have some data about the phenomena above which can act as a baseline.

The topic is to pick one or more of these aspects of semantics, see what contemporary statistical AI methods can do to scale them up to corpus size, and develop a symbolic application (possibly with a lot of help from the group).
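
To make the quantity-expression example a bit more concrete, here is a minimal sketch of what a purely rule-based baseline could look like. It is not the group's existing tooling; the function name `extract_quantities` and the tiny unit vocabulary are assumptions made up for illustration only.

```python
import re

# Hypothetical, deliberately tiny unit vocabulary (an assumption for this sketch);
# a realistic baseline would need a far larger unit inventory.
UNIT = r"(?:km|cm|mm|m|kg|g|s|h)"

# A number, optional whitespace, then a unit or a "unit/unit" quotient.
QUANTITY = re.compile(
    rf"(?<![\w.])(\d+(?:\.\d+)?)\s*({UNIT}(?:/{UNIT})?)(?![\w/])"
)

def extract_quantities(text):
    """Return (value, unit) pairs for simple numeral-plus-unit expressions."""
    return [(float(m.group(1)), m.group(2)) for m in QUANTITY.finditer(text)]

print(extract_quantities("The ball moves at 3m/s for 2.5 s."))
# → [(3.0, 'm/s'), (2.5, 's')]
```

A pattern like this finds "3m/s" but returns nothing for "two furlongs per fortnight", which is the kind of gap where learned methods would be expected to help.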