For symbolic data, this has been solved by the use of meta-logical frameworks.

For linked and concrete data, the use of ontology languages like OWL or, respectively, database schemas is common; but these offer only general-purpose datatypes like numbers, strings, and untyped lists, which is too weak for the complex datatypes that pervade the mathematical sciences, such as polynomials, multidimensional arrays, graphs, towers of algebraic structures (e.g., matrices over polynomials over algebraic extensions over finite fields), physical quantities, or numbers with error intervals.

%These include base types such as string, integers, boolean; collection types such as finite sets, lists, vectors, matrices; aggregation types such as products, unions, and records; algebraic types such as rings and fields; and symbolic types such as rational fields and polynomials.

These have to be \emph{encoded} in terms of the low-level datatypes.

If these encodings are not described in detail, the data is not reusable.

For example, Kohonen's lattice dataset uses five encoding steps: lattices are encoded as graphs with canonically labelled nodes, the graphs as adjacency matrices, the adjacency matrices as bit vectors, the bit vectors as \texttt{digraph6} strings (similar to \texttt{base64}), and finally the entire file containing many lattices is gzipped.

Similar steps are needed for the graph datasets.
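The encoding chain above can be sketched as follows. This is an illustrative simplification, not the dataset's actual code: canonical labelling is omitted, and standard \texttt{base64} stands in for the similar \texttt{digraph6} format.

```python
import base64
import gzip

def lattice_to_adjacency(covers, n):
    """Encode a lattice, given by its cover relations on nodes 0..n-1,
    as an n-by-n adjacency matrix (steps: lattice -> graph -> matrix)."""
    matrix = [[0] * n for _ in range(n)]
    for lo, hi in covers:
        matrix[lo][hi] = 1
    return matrix

def adjacency_to_bits(matrix):
    """Flatten the adjacency matrix row by row into a bit string."""
    return "".join(str(bit) for row in matrix for bit in row)

def bits_to_text(bits):
    """Pack the bit string into bytes and render them as printable text
    (base64 here; the real dataset uses the similar digraph6 format)."""
    padded = bits + "0" * (-len(bits) % 8)
    raw = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return base64.b64encode(raw).decode("ascii")

def encode_dataset(lattices, n):
    """Final step: one line per lattice, then gzip the whole file."""
    lines = "\n".join(
        bits_to_text(adjacency_to_bits(lattice_to_adjacency(l, n)))
        for l in lattices)
    return gzip.compress(lines.encode("ascii"))

# The diamond lattice on 4 elements, given by its cover relations.
diamond = [(0, 1), (0, 2), (1, 3), (2, 3)]
blob = encode_dataset([diamond], 4)
```

Note how little of the mathematical structure survives in `blob`: without a description of all five layers, the compressed bytes cannot be interpreted as lattices at all.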

Even when these encodings are documented, they are tedious and error-prone, and they hinder the automated processing needed for data validation, reproduction, or machine learning.

In the OpenDreamKit project, the FAU group has developed a systematic solution by annotating datasets with formal schemas that specify both the high-level mathematical type and the encoding function.
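The core idea can be sketched as follows; this is a hypothetical illustration of a schema pairing a mathematical type with its encoding, not the actual OpenDreamKit/MMT API, and all names are invented for exposition.

```python
# Illustrative sketch: a schema ("codec") that records both the high-level
# mathematical type and the explicit encoding/decoding functions.
from dataclasses import dataclass
from fractions import Fraction
from typing import Any, Callable

@dataclass
class Codec:
    """Names the mathematical type and fixes its low-level encoding."""
    math_type: str
    encode: Callable[[Any], Any]
    decode: Callable[[Any], Any]

# Example: a rational number encoded as a pair of integer strings.
rational_as_pair = Codec(
    math_type="QQ",
    encode=lambda q: [str(q.numerator), str(q.denominator)],
    decode=lambda p: Fraction(int(p[0]), int(p[1])),
)

low_level = rational_as_pair.encode(Fraction(22, 7))   # ["22", "7"]
recovered = rational_as_pair.decode(low_level)
```

Because the schema carries the decoding function alongside the type, a generic tool can recover the mathematical object from the stored representation without dataset-specific knowledge.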

In this task, we expand on these efforts.

We standardize a fixed set of mathematical datatypes (such as the ones mentioned above) that subsumes at least all datatypes occurring in the datasets of \WPref{cases}.

Moreover, we standardize encodings for these datatypes, again subsuming those used by practitioners in building their datasets.

The biggest subtask here is surveying the practically used datasets and ensuring that our standard is comprehensive enough.
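Standardized encodings compose naturally, which is what makes towers of algebraic structures tractable. A minimal sketch, with invented combinator names: a matrix over polynomials is encoded by nesting a list codec twice around a polynomial codec.

```python
# Hypothetical codec combinators; encode/decode pairs compose like the
# type constructors they mirror.

def int_codec():
    """Base codec: integers as decimal strings."""
    return (lambda n: str(n), lambda s: int(s))

def list_of(codec):
    """Combinator: lift an element codec to lists, elementwise."""
    enc, dec = codec
    return (lambda xs: [enc(x) for x in xs],
            lambda ys: [dec(y) for y in ys])

# Polynomial = list of integer coefficients;
# matrix over polynomials = list of rows of polynomials.
poly_codec = list_of(int_codec())
matrix_codec = list_of(list_of(poly_codec))

enc, dec = matrix_codec
m = [[[1, 0], [2]], [[0], [3, 4]]]   # a 2x2 matrix of polynomials
wire = enc(m)                        # every leaf becomes a string
```

The same handful of combinators then covers most of the tower-shaped datatypes found in practice, so the survey reduces to identifying the base types and combinators each dataset needs.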

This will be a from-scratch implementation, but one building on previous experience.

In addition to a human-oriented user interface for the shallow services, we will develop two advanced accessibility services that are specific to the needs of users in the mathematical sciences and engineering.

Firstly, we develop accessibility services for users with disabilities, e.g., to read out mathematical datasets for blind users.

This critically requires the codec framework developed in \taskref{foundations}{dtypes} as reading the encoded data is useless to humans --- only the decoded data yields a mathematical object that can be communicated to a user.

Secondly, we adapt the existing visualization components developed by the partners to make mathematical data accessible in ways that are more enticing and practical for human users.

This includes the browsing and management of large symbolic datasets (MathHub, \site{FAU}), the visualization of large graphs of mathematical objects (TGView(3D), \site{FAU}), the semantic interaction with symbolic data (MMT, \site{FAU}), property-based presentation of mathematical objects (Sage-explorer, \site{PS}), native visualization of mathematical objects within computational systems (Sage, \site{PS}), and the exploration of datasets of mathematical objects via their mathematical invariants (DiscreteZOO, \site{UL}; LMFDB, \site{CHA}).

...

This includes a substitution tree index for all symbolic data and a value index.

We use this index to build an efficient search service and integrate it into the user interface.

This will also allow for an innovative form of conjecturing by finding connections between seemingly unrelated data objects in different datasets that share sub-objects.
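The conjecturing mechanism can be sketched as an inverted index from normalized sub-objects to the entries containing them; the names and the tuple representation of symbolic expressions below are illustrative, not a project API.

```python
# Minimal sketch: index every sub-object of every entry, so that entries
# from different datasets sharing a sub-object can be matched.
from collections import defaultdict

def subobjects(term):
    """Yield all subterms of a symbolic expression given as nested
    tuples, e.g. ("plus", ("times", "x", "y"), "z")."""
    yield term
    if isinstance(term, tuple):
        for arg in term[1:]:
            yield from subobjects(arg)

index = defaultdict(set)

def add_entry(dataset, entry_id, term):
    """Register one dataset entry under each of its sub-objects."""
    for sub in subobjects(term):
        index[sub].add((dataset, entry_id))

add_entry("datasetA", 1, ("plus", ("times", "x", "y"), "z"))
add_entry("datasetB", 7, ("minus", ("times", "x", "y"), "1"))

# Entries in different datasets sharing the sub-object x*y:
shared = index[("times", "x", "y")]
```

Pairs of entries retrieved this way are exactly the "seemingly unrelated data objects sharing sub-objects" that can seed conjectures; a production index would of course normalize terms and use a substitution tree rather than exact-match hashing.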

This will be based on several technologies already developed by project partners: the MathWebSearch search engine for symbolic data at \site{FAU}, the publication meta-data search service at \site{FIZ}, and the search capabilities developed for concrete mathematical data in the LMFDB (\site{CHA}).

This task will be led by \site{UL} and \site{FAU}, with contributions from \site{FIZ} and \site{CHA} as indicated above.