We have seen how a viewfinder can be used for theory \emph{discovery} and for finding constants with specific desired properties, but many other potential use cases are imaginable. The main challenge in supporting them is less one of algorithm or software design than of user interfaces.


The theory discovery use case described in Sec. \ref{sec:usecase} is mostly desirable in a setting where a user is actively writing or editing a theory, so the integration in jEdit is sensible. However, the across-library use case in Sec. \ref{sec:pvs} already would be a lot more useful in a theory exploration setting, such as when browsing available archives on MathHub~\cite{mathhub} or in the graph viewer integrated in \mmt~\cite{RupKohMue:fitgv17}. Additional specialized user interfaces would enable or improve the following use cases:

\end{newpart}

\begin{itemize}

\item\textbf{Model-/Countermodel Finding:} If the codomain of a view is a theory representing a specific model, the view would tell the user that those

\end{itemize}

@article{imps,

author="W. Farmer and J. Guttman and F. Thayer",

title="{IMPS: An Interactive Mathematical Proof System}",

journal="{Journal of Automated Reasoning}",

pages="213--248",

volume="11",

number="2",

year="1993",

}

@book{isabelle,

author="L. Paulson",

title="{Isabelle: A Generic Theorem Prover}",

publisher="Springer",

series="Lecture Notes in Computer Science",

volume="828",

year="1994",

}

@inproceedings{NorKoh:efnrsmk07,

author={Immanuel Normann and Michael Kohlhase},

title={Extended Formula Normalization for $\epsilon$-Retrieval and

Sharing of Mathematical Knowledge},

pages={266--279},

crossref={MKM07},

}

@inproceedings{KMOR:pvs:17,

author="M. Kohlhase and D. M{\"u}ller and S. Owre and F. Rabe",

title="{Making PVS Accessible to Generic Services by Interpretation in a Universal Format}",

year="2017",

pages="319--335",

booktitle="Interactive Theorem Proving",

editor="M. Ayala-Rincon and C. Munoz",

publisher="Springer"

}

@inproceedings{hol_isahol_matching,

author={T. Gauthier and C. Kaliszyk},

title="{Matching concepts across HOL libraries}",

booktitle={Intelligent Computer Mathematics},

editor={S. Watt and J. Davenport and A. Sexton and P. Sojka and J. Urban},

pages={267--281},

year=2014,

publisher={Springer}

}

@online{OAFproject:on,

label={OAF},

url={https://kwarc.info/projects/oaf/},

title={{OAF}: An Open Archive for Formalizations},

urldate={2018-04-26}}

@article{lf,

author="R. Harper and F. Honsell and G. Plotkin",

title="{A framework for defining logics}",

journal="{Journal of the Association for Computing Machinery}",

year=1993,

volume=40,

number=1,

pages="143--184",

}

@inproceedings{KKMR:alignments:16,

author="C. Kaliszyk and M. Kohlhase and D. M{\"u}ller and F. Rabe",

title="{A Standard for Aligning Mathematical Concepts}",

year="2016",

pages="229--244",

booktitle="Work in Progress at CICM 2016",

editor="A. Kohlhase and M. Kohlhase and P. Libbrecht and B. Miller and F. Tompa and A. Naummowicz and W. Neuper and P. Quaresma and M. Suda",

publisher="CEUR-WS.org"

}

@inproceedings{ODK:mitm:16,

author="P. Dehaye and M. Iancu and M. Kohlhase and A. Konovalov and S. Leli{\`e}vre and D. M{\"u}ller and M. Pfeiffer and F. Rabe and N. Thi{\'e}ry and T. Wiesing",

title="{Interoperability in the ODK Project: The Math-in-the-Middle Approach}",

year="2016",

pages="117--131",

booktitle="Intelligent Computer Mathematics",

editor="M. Kohlhase and L. {de Moura} and M. Johansson and B. Miller and F. Tompa",

publisher="Springer"

}

@inproceedings{MRLR:alignments:17,

author="D. M{\"u}ller and C. Rothgang and Y. Liu and F. Rabe",

title="{Alignment-based Translations Across Formal Systems Using Interface Theories}",

year="2017",

pages="77--93",

booktitle="Proof eXchange for Theorem Proving",

editor="C. Dubois and B. {Woltzenlogel Paleo}",

publisher="Open Publishing Association"

}

@inproceedings{mathhub,

author="M. Iancu and C. Jucovschi and M. Kohlhase and T. Wiesing",

title="{System Description: MathHub.info}",

booktitle="Intelligent Computer Mathematics",

editor="S. Watt and J. Davenport and A. Sexton and P. Sojka and J. Urban",

pages="431--434",

publisher="Springer",

year={2014},

}

@inproceedings{RupKohMue:fitgv17,

author={Marcel Rupprecht and Michael Kohlhase and Dennis M{\"u}ller},

title={{A Flexible, Interactive Theory-Graph Viewer}},

booktitle={Mathematical User Interfaces (MathUI 2017)},

year={2017}}

Automatically and systematically searching for new theory morphisms was first undertaken in \cite{NorKoh:efnrsmk07}.

However, at that time no large corpora of formalized mathematics were available in standardized formats that would have allowed easily testing the ideas in large corpora.

This situation has changed since then as multiple such exports have become available.



In particular, we have developed the MMT language \cite{RK:mmt:10} and the concrete syntax of the OMDoc XML format \cite{omdoc} as a uniform representation language for such corpora.

And we have translated multiple proof assistant libraries into this format, including the ones of PVS in \cite{KMOR:pvs:17} and HOL Light in \cite{KR:hollight:14}.

Building on these developments, we are now able, for the first time, to apply generic methods --- i.e., methods that work at the MMT level --- to search for theory morphisms in these libraries.

While inspired by the ideas of \cite{NorKoh:efnrsmk07}, our design and implementation are completely novel.

In particular, our approach makes use of the rigorous language-independent definitions of \emph{theory} and \emph{theory morphism} provided by MMT, and the practical implementation makes use of the MMT system, which provides high-level APIs for these concepts.


\cite{hol_isahol_matching} applies techniques related to ours to a related problem.

Instead of theory morphisms inside a single corpus, they use machine learning to find similar constants in two different corpora.

Their results can roughly be seen as a single partial morphism from one corpus to the other.

Defined constants can be eliminated by definition expansion.

As part of the OAF project~\cite{OAFproject:on}, we have imported several proof assistant libraries into the MMT system. To motivate some of the design choices made in this paper, we will outline the general procedure behind these imports.


\paragraph{} First, we formalize the core logical foundation of the system. We do so by using the logical framework LF~\cite{lf} (at its core a dependently-typed lambda calculus) and various extensions thereof, which are implemented in and supported by the MMT system. In LF, we can formalize the foundational primitives using the usual judgments-as-types and higher-order abstract syntax encodings -- hence theorems and axioms are represented as constants with a type $\vdash P$ for some proposition $P$, and primitive constructs like lambdas are formalized as LF-functions taking LF-lambda-expressions -- which serve as a general encoding of any variable binders -- as arguments.
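As a rough illustration of this encoding style (given here in Twelf-style LF syntax; the symbol names are ours for exposition and do not correspond to any particular import), a small propositional fragment could be declared as:

```
prop   : type.                    % the type of propositions
term   : type.                    % the type of individuals
ded    : prop -> type.            % judgments as types: proofs of P inhabit (ded P)
and    : prop -> prop -> prop.
forall : (term -> prop) -> prop.  % binders as LF-functions (HOAS)
```

A theorem $\vdash P$ is then a constant of type $\cn{ded}\;P$, and a quantified formula $\forall x.P$ is represented as $\cn{forall}$ applied to the LF-function $\lambda x.P$.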

The resulting formalizations are then used as meta-theory for imports of the libraries of the system under consideration. This results in a theory graph as in Figure \ref{fig:oaf}.

We have so far assumed one fixed meta-theory for all theories involved; we will now drop this assumption.

Obviously, various differences in available foundational primitives as well as library-specific best practices and idiosyncrasies can prevent the algorithm from finding desired matches. There are two approaches to increasing the number of results in these cases:

\begin{itemize}


\item In many instances, the translation between two foundations is too complex to be discovered purely syntactically. In these cases we can provide arbitrary translations between theories, which are applied before computing the encoding (e.g. alignment-based translations as in \cite{MRLR:alignments:17}).

\item We can do additional transformations before preprocessing theories, such as normalizing expressions, eliminating higher-order abstract syntax encodings or encoding-related redundant information (such as the type of a typed equality, which in the presence of subtyping can be different from the types of both sides of an equation), or elaborating abbreviations/definitions.

\end{itemize}
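The combined effect of these two steps can be sketched as follows. This is a deliberately naive stand-in assuming a string-level term representation; none of the names correspond to the actual MMT API:

```python
def apply_translations(term, translations):
    """Apply user-provided cross-foundation translations (e.g. the
    alignment-based ones mentioned above) before encoding."""
    for pattern, replacement in translations:
        term = term.replace(pattern, replacement)
    return term

def normalize(term):
    """Stand-in for the normalizations described above; here we merely
    collapse redundant whitespace in the toy string representation."""
    return " ".join(term.split())

def preprocess(term, translations):
    """Translate first, then normalize; only the result is encoded."""
    return normalize(apply_translations(term, translations))
```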

When elaborating definitions, it is important to consider that this may also reduce the number of results, if both theories use similar abbreviations for complex terms, or the same concept is declared axiomatically in one theory, but definitionally in the other. For that reason, we can allow \textbf{several abstract syntax trees for the same constant}, such as one with definitions expanded and one ``as is''.
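The idea of keeping several abstract syntax trees per constant can be sketched with terms as nested tuples; both the representation and all names here are hypothetical:

```python
def expand(tree, definitions):
    """Recursively replace defined constants by their definientia."""
    if isinstance(tree, str):
        return expand(definitions[tree], definitions) if tree in definitions else tree
    return tuple(expand(sub, definitions) for sub in tree)

def encodings(tree, definitions):
    """Keep both the tree 'as is' and its definition-expanded variant
    as candidates for matching."""
    return {tree, expand(tree, definitions)}
```

For a constant with definition `sq = times x x`, the term `plus sq y` yields two candidate trees, so it can match theories that use either the abbreviated or the expanded form.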

Therefore, we consider the \emph{common} symbols as a fixed part of the abstract syntax tree.

\paragraph{Picking starting theories:} If we try to find views between whole collections of theories, we can obviously disregard all theories that are already included in some other theory in our collections, since we work with a normalized (and dependency-closed) version of a theory. Consequently, by only using maximal theories we do not find any fewer views but speed up the search significantly.
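A minimal sketch of this filtering step, assuming the transitive include relation has already been computed (names are illustrative, not the actual API):

```python
def maximal_theories(includes):
    """includes maps each theory to the set of theories it (transitively)
    includes; a theory is maximal iff no other theory includes it."""
    included_somewhere = set()
    for deps in includes.values():
        included_somewhere |= deps
    return {thy for thy in includes if thy not in included_somewhere}
```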

\begin{newpart}{DM}


\paragraph{Morphism aggregation:} The morphisms found by the algorithm are always induced by a single assignment $c\mapsto c'$. Depending on what we intend to do with the results, we might prefer to consider them individually (e.g. to yield \emph{alignments} in the sense of \cite{KKMR:alignments:16}), or to aggregate them into ideally total views with varying degrees of modularity, by merging compatible morphisms (two morphisms $v_1,v_2$ are compatible if there are no assignments $(c\mapsto c_1)\in v_1$ and $(c\mapsto c_2)\in v_2$ with $c_1\neq c_2$).

\end{newpart}
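The compatibility condition and a greedy variant of the aggregation can be sketched as follows, with morphisms as plain dicts of assignments; this is an illustration under our own simplifications, not the actual implementation:

```python
def compatible(v1, v2):
    """v1 and v2 are compatible iff no constant is sent to two
    different targets."""
    return all(v2.get(c, d) == d for c, d in v1.items())

def aggregate(morphisms):
    """Greedily merge each morphism into the first compatible view."""
    views = []
    for m in morphisms:
        for v in views:
            if compatible(v, m):
                v.update(m)
                break
        else:
            views.append(dict(m))
    return views
```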


\paragraph{Storing Encodings:} Finally, for computing the encodings of a theory we only need to know the symbols to be fixed in the component $t$ of an abstract syntax tree, for which only the meta-theories of the theories are relevant. These also determine the specific preprocessings and translations we will likely want to use. Since only a small number of meta-theories is relevant in practice, we can store and retrieve the encodings for the most important situations. Since computing the encodings (as well as sorting the theories in a library by their dependencies) is the most expensive part of the algorithm, doing this once and storing the results makes the viewfinding process itself rather efficient.
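The caching idea can be sketched as a simple store keyed by theory and meta-theory; all names are assumptions, and `compute` stands in for the expensive encoding step:

```python
_encoding_cache = {}

def encode(theory, meta_theory, compute):
    """Return the stored encoding for (theory, meta_theory), invoking
    the expensive computation only on a cache miss."""
    key = (theory, meta_theory)
    if key not in _encoding_cache:
        _encoding_cache[key] = compute(theory, meta_theory)
    return _encoding_cache[key]
```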