@@ 240,16 +240,16 @@ This enables (i) prototyping universally applicable Deep FAIR services that impr
% \caption{Summary of expertise}\label{fig:expertisecoverage}
% \end{figure}
Even though the \TheProject consortium has only seven members, it brings together partners representing all the major stakeholders' in mathematical data:
Even though the \TheProject consortium has only seven members, it brings together partners representing all the major stakeholders in mathematical data:
\begin{compactitem}
\item\emph{Dataset authors} are represented by \site{FAU} (OAF) for symbolic, \site{FIZ} (zbMATH) for linked, \site{UL} (graph datasets), and \site{CHA} (LMFDB) for concrete data.
\item\emph{Mathematicians as users} are represented by \site{PS}, \site{CHA} and \site{UL}; in fact, the \site{PS} PI Nicolas Thiery only initiated the OpenDreamKit project in order to improve the tools so that he could better pursue his actual research.
Primo\v{z} Poto\v{c}nik at \site{UL} and Stefan Lemurell at \site{CHA} need services built on top of datasets both as their users as well as providers of several datasets.
\item\emph{Users from other sciences} are represented by \site{CAE} and \site{EMS} (via the PI's personal research) for the modeling of cyberphysical systems.
\item\emph{Service providers} are represented by \site{FAU} (MathHub and MathWebSearch) and \site{CAE} (Emmo) for symbolic data, \site{FIZ} (zbMATH and swMATH) for linked data, \site{CHA} (LMFDB), and \site{UL} (DiscreteZOO) for concrete data.
\item\emph{Mathematical institutions} are represented by \site{EMS} (European Mathematical Society) in general and for mathematical knowledge bases by \site{FAU} and \site{FIZ} (the PIs are members of the International Mathematical Knowledge Trust IMKT).
\item\textbf{Dataset authors} are represented by the sites \site{FAU} (OAF) for symbolic, \site{FIZ} (zbMATH) for linked, and \site{UL} (graph datasets) and \site{CHA} (LMFDB) for concrete data.
\item\textbf{Mathematicians as users} are represented by the sites \site{PS}, \site{CHA} and \site{UL}; in fact, the \site{PS} PI Nicolas Thiery only initiated the OpenDreamKit project in order to improve the tools so that he could better pursue his actual research.
Similarly, the PIs Primo\v{z} Poto\v{c}nik at \site{UL} and Stefan Lemurell at \site{CHA} need services both as providers and users of their datasets.
\item\textbf{Users from other sciences} are represented by \site{CAE} and \site{EMS} (via the PIs' personal research) for the modeling of cyberphysical systems.
\item\textbf{Service providers} are represented by the sites \site{FAU} (MathHub and MathWebSearch) and \site{CAE} (Emmo) for symbolic data, \site{FIZ} (zbMATH and swMATH) for linked data, and \site{CHA} (LMFDB) and \site{UL} (DiscreteZOO) for concrete data.
\item\textbf{Mathematical institutions} are represented by the sites \site{EMS} (European Mathematical Society) in general and for mathematical knowledge bases by \site{FAU} and \site{FIZ} (the PIs are members of the International Mathematical Knowledge Trust IMKT).
\end{compactitem}
This allows optimally leveraging reusing stakeholder knowledge, networks, and user communities.
This allows optimally leveraging stakeholder knowledge, networks, and user communities.
@@ 296,17 +296,20 @@ This allows optimally leveraging reusing stakeholder knowledge, networks, and us
\eucommentary{Describe any national or international research and innovation activities which will be linked with the project, especially where the outputs from these will feed into the project}
\site{FAU} and \site{UL} are collaborating on a living survey of concrete record datasets~\cite{bercic:cmo:table}, which has served as a ``market study'' for the \pn project. This study was very well received in the mathematical community, and by community word-of-mouth even led to the discovery of several orphaned datasets and collections.
Even though the \pn initiative wants to eventually give a home on the EOSC to all mathematical datasets, we commit to integrating a (representative) subset in the project itself.
The concrete selection presented in this section is driven by the availability of mathematical services driven by them and the overlap with the project participants; see Section~\ref{sec:pilotsets} for details.
The sites \site{FAU} and \site{UL} are collaborating on a living survey of concrete record datasets~\cite{bercic:cmo:table}, which has served as a market study for the \pn project.
This study was very well received in the mathematical community, and word of mouth even led to the discovery of several orphaned datasets.
To maximize impact and advertise FAIR sharing, we commit to integrating a significant, representative collection of datasets already during the project lifetime.
The concrete selection presented in this section is driven by the size and importance of the datasets, the sizes of the involved user groups, and the goal of reaching a diverse set of users. See Section~\ref{sec:pilotsets} for details.
\pn will incorporate the data collections in Figure~\ref{fig:datasets} and link to the external infrastructures summarized in Figure~\ref{fig:services}; the internal technologies used in \pn are listed and discussed separately in Section~\ref{sec:trls}.
This means that the datasets and the input/output facilities of the services need to be semantically preloaded i.e. enriched by semantic information as discussed in~\ref{sec:saod} above and brought into the \pn data standard form (see \WPref{services}).
Concretely, \pn will incorporate the datasets from Figure~\ref{fig:datasets} and link to the external infrastructures summarized in Figure~\ref{fig:services}.
Note that the technologies used internally in \pn are listed and discussed separately in Section~\ref{sec:trls}.
%This means that the datasets and the input/output facilities of the services need to be semantically preloaded i.e. enriched by semantic information as discussed in~\ref{sec:saod} above and brought into the \pn data standard form (see \WPref{services}).
Infrastructure & Maintainer & Description & PI contact \\\hline
\hline
EOSC/EUDat & EU & European Open Science Cloud & Florian Rabe (\site{FAU})\\\hline
GitHub & GitHub, Inc. & data repositories, e.g. for Modelica & Peter Harman (\site{CAE})\\\hline
% MathWebSearch & \cite{MathWebSearch:on} & \site{FAU} & Symbolic data search engine & Michael Kohlhase (\site{FAU})\\\hline
...
...
@@ 327,10 +330,12 @@ This means that the datasets and the input/output facilities of the services nee
FR: the example activities above are EU-speak for the innovation cycle; more relevant for us are the activities in the scope section, which I've used to derive the formulations used below; avoid deleting those phrases when revising this section}
\paragraph{Open data framework}
\inparahighlight{We will develop a framework for representing mathematical datasets using symbolic, concrete, and linked data with accessible semantics.}
\highlight{We will develop a framework for representing mathematical datasets using symbolic, concrete, and linked data with accessible semantics.}
%This will allow the automated discovery and reuse of datasets and items within these sets.
%This will critically boost user-oriented open science because it covers the typical usage scenario where the reuser is not familiar with the details of the dataset she is hoping to find or reuse.
Because the distinct advantages of the three kinds of data are very difficult to combine, we allow each kind, employing established widely used open formats:
Because the distinct advantages of the three kinds of data are very difficult to combine, we allow each kind and integrate all kinds into a coherent whole.
This will be based on established open formats:
\begin{compactitem}
\item For symbolic data, we use the OMDoc representation language.
It provides uniform encodings of symbolic data in a single standardized concrete language.
...
...
@@ 341,15 +346,16 @@ It provides uniform encodings of symbolic data in a single standardized concrete
To make the semantics of linked data identifiers accessible, we use MMT URIs that allow symbolic and linked data to share the same identifiers.
\end{compactitem}
All datasets will have URIs via which they become citable.
We will also define a dataset metadata standard that allows for tracking provenance, version, and license.
These URIs and the associated metadata form a living linked dataset themselves, which is also stored in the framework.
All datasets and all objects in them will have URIs via which they become accessible.
We will also define a metadata standard that allows for tracking provenance, version, and license of mathematical datasets and their entries.
These URIs and the associated metadata form a linked dataset themselves, which is also stored in the framework.
These research efforts are detailed in \WPref{foundations}.
\paragraph{Pilot datasets}\label{sec:pilotsets}
\inparahighlight{We integrate a representative selection of major datasets from different areas of mathematics into our infrastructure.}
We have carefully put together our consortium such that these communities are represented by partners, because the partners have built or maintain the database themselves or have close ties to them.
\highlight{We integrate a representative selection of major datasets from different areas of mathematics into our infrastructure.}
We have carefully put together our consortium such that these communities are represented by partners, i.e., for all pilot datasets there are partners who are the maintainers themselves or have close ties to them.
These pilot datasets were chosen for multiple purposes:
\begin{compactitem}
...
...
@@ 362,34 +368,34 @@ These pilot datasets were chosen for multiple purposes:
Figure~\ref{fig:datasets} gives an overview of the datasets we have chosen.
It also indicates
\begin{inparaenum}
\begin{inparaenum}[(i)]
\item how formidable the challenge is given the number and sizes of the datasets and
\item how well the \TheProject consortium matches the challenge, with each partner being an expert on one of these datasets.
\end{inparaenum}
These research efforts are detailed in \WPref{cases}.
\paragraph{Service prototypes and their integration into the EOSC Hub}
\inparahighlight{The core service prototyped by \TheProject is the semantics-aware FAIR data sharing infrastructure that allows the uniform integration and interoperation of mathematical services across all datasets.}
\highlight{The core service prototyped by \TheProject is the semantics-aware FAIR data sharing infrastructure that allows the uniform integration and interoperation of mathematical services across all datasets.}
This service will be deployed on a major server or small cluster of servers that are funded by \TheProject.
A reference instance of these services will be maintained by FAU but all hardware and software will be designed such that the services can be easily ported or replicated by other providers such as FIZ Karlsruhe.
A reference instance of these services will be maintained by the coordinating site \site{FAU}, but all hardware and software will be designed such that the services can be easily ported or replicated by other providers such as the \site{FIZ} site.
These servers can be maintained for some time beyond the \TheProject lifetime.
In addition several deeper innovative services will be realized that are enabled by the accessible semantics of the datasets.
These are browsing and visualization, validation, citability, versioning, and provenance tracking (see~\WPref{services} for details).
In addition, several advanced innovative services will be realized that are enabled by the accessible semantics of the datasets.
These are deep citability, versioning, and provenance tracking; validation, browsing, and visualization; as well as computation, including the integration with existing computation systems (see~\WPref{services} for details).
To ensure that our service prototypes can be integrated into the EOSC Hub at the completion of \TheProject, we undertake some efforts already during the \TheProject lifetime.
These include in particular the collection hardware and software interfaces, legal issues, and accessibility requirements (e.g., those set out in projects funded under the EINFRA-12-2017 topic).

These include in particular the specification of hardware and software interfaces, legal issues, and accessibility requirements (e.g., those set out in projects funded under the EINFRA-12-2017 topic).
These research efforts are detailed in \WPref{services}.
\paragraph{Outreach and Adoption in Different User Communities}
\inparahighlight{We ensure the scalability of our results by providing our services to user communities from different disciplines during the \TheProject lifetime.}
In fact, we will start with outreach activities immediately at the start of the project and gradually ramp them up.
\highlight{We ensure the scalability of our results by providing our services to user communities from different disciplines during the \TheProject lifetime.}
We will start with outreach activities immediately at the start of the project and gradually ramp them up.
This will also help raise awareness of FAIR concepts in the mathematical community.
Concretely, we adopt a two-pronged approach.
Concretely, we adopt a multi-pronged approach.
Firstly, we engage existing dataset maintainers and support them in sharing their datasets via the existing EOSC services.
If necessary, we offer to do this for them in order to reduce the effort required on their side.
This will quickly and for the first time create a collection of a few dozen datasets from all kinds of mathematical fields accessible in a single place.
...
...
@@ 399,10 +405,10 @@ Because this is already a major improvement on the current disparate and adhoc
Secondly, the initial collection of datasets described above will at first not be reusable or interoperable, let alone searchable, because the datasets will not yet conform to a common standard.
To develop requirements for, advertise, and collect feedback on our standard and the enabled advanced services, we will organize two extended workshop events, which we dub ``Summer of Math Data''.
They will feature a series of partially overlapping research visits of individual dataset providers, anchored by short workshops open to all mathematicians.
Finally, we will organize topical workshops on mathematical software conferences like the ICMS and CICM, and in the summer of 2022, we will organize a major workshop at the $4$-annual International Congress of Mathematicians ICM, where we will officially release the final results of \TheProject.
They will feature a series of partially overlapping research visits of individual dataset providers, anchored by short workshops and conferences open to all mathematicians.
Finally, we will organize topical workshops at mathematical software conferences like the ICMS and CICM.
In the summer of 2022, we will organize a major workshop at the quadrennial International Congress of Mathematicians (ICM), where we will officially release the final results of \TheProject.