Commit 15eba9fb authored by Florian Rabe

no message

parent c8a851f7
@@ -111,7 +111,7 @@ This task will be led by \site{UL} and \site{FAU}, with contributions from \site
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{task}[id=I,title=System Interoperability and Computation,lead=PS,partners={UL},PM=12,wphases=12-36!.5]
 In this task, we demonstrate the interoperability between the FAIRMat services and computational systems, by implementing a client in the SageMath system and showcasing its use in highly user-oriented applications.
 The minimal functionality is to allow upload/download of datasets to/from our servers directly from within SageMath.
 This will already allow users to interact with shared datasets at the ``push of a button'' -- additional functionality will be added in later tasks of this work package, e.g. search and querying in \localtaskref{F}.
@@ -121,16 +121,10 @@ Because SageMath is written in Python and MMT exposes a Python API, SageMath can
 This will allow seamless and immediate computation with \TheProject datasets, e.g., to access a dataset, perform a computation on it, and share the results as a second dataset --- all in a single operation.
 The implementation will be based on and sustain the continuity of OpenDreamKit results, where \site{FAU} and \site{PS} have developed the integration of external datasets (specifically those of LMFDB) with SageMath via codecs.
-\begin{newpart}{MK@FR: re-read}
-This task will also study the trade-offs between data storage and lazy re-computation, and develop joint abstractions of schema theories and virtual theories that allow to leave open (and thus transparently change) between those two modes of data provisioning.
-Even flexible mixed models and models that self-adapt under differing loads are possible and will be studied.
-We expect that this will give rise to programming abstractions that allow to memoize useful and computationally expensive objects persistently in data stores and thus directly support large-scale and distributed computations.
-We will build on very promising experiments in the OpenDreamKit project.
-\end{newpart}
 In addition, this task will enable several innovative uses of the EOSC-level infrastructure.
 Firstly, it yields deep accessibility and deep reuse where only fragments of a dataset are accessed, computed with, or updated.
-Secondly, it enables the use of the FAIRMat sharing services as persistent memoization layer for computational systems.
+Secondly, it enables the use of the \TheProject sharing services as a persistent memoization layer that allows better trade-offs between data storage and on-demand re-computation, including the possibility of transparently changing between them.
+This would allow memoizing computationally expensive objects persistently in data stores and thus support large-scale distributed computations.
 This task will be led by \site{PS}, which is one of the sites developing the SageMath system.
 \site{UL} will contribute with the DiscreteZOO package and by providing the needed server-side interfaces.
......
@@ -34,11 +34,9 @@ SageMath has become extremely popular among mathematicians, especially those act
 Due to its size and integrative nature, it has become somewhat of a cross--programming language packaging and distribution platform, via which researchers can disseminate specialized Open Source computation libraries.
 It integrates a number of important mathematical datasets such as copies of some LMFDB datasets or GAP's small groups library.
-\begin{newpart}{MK@FR: re-read}
-Note that there is a blurry line between computation and data in mathematics -- and any science where data is computed as opposed to measured: instead of storing data we can always recompute it on demand.
-Dually, computation can often replaced by tabulation of results, and which one is ``better'' depends on ``secondary virtues'' like computational complexity, storage costs, and input change rates.
-The underlying trade-offs have not been explored for symbolic, record, and linked data; getting a good handle on them will be an important consideration for the \pn data/software framework (see \taskref{services}{I}).
-\end{newpart}
+Note that there is a blurry line between computation and data in mathematics --- and any science where data is computed as opposed to measured: instead of storing data, we can always recompute it on demand; dually, computation can be replaced by tabulation of results.
+Which one is better, depends on the trade-off between time and space, i.e., computing and storage costs.
+And this trade-off in turn changes constantly according to hardware power and costs as well as community size and usage patterns.
 \paragraph{Semantic Interoperability}\label{sec:mitm}
 The SageMath integration layer described above is largely non-semantic in the sense that it relies on custom ``glue code'' in Python that is unverified and can be broken by any update of one of the integrated systems or datasets.
......