@@ -18,7 +18,7 @@ challenges but user interfaces and determining the right application context.

\item\textbf{Model-/Countermodel Finding:} If the codomain of a view is a theory representing a specific model, it would tell Jane that those are \emph{examples} of her abstract theory.

Furthermore, partial views -- especially those that are total on some included theory -- could lead to insightful \emph{counterexamples}.

%

\item\textbf{Library Refactoring:} Given that the view finder looks for \emph{partial} views, we can use it to find natural extensions of a starting theory. Imagine Jane removing the last of her axioms for ``beautiful sets'' -- the other axioms (disregarding finitude of her sets) would allow her to find e.g. both Matroids and \emph{Ideals}, which would suggest to her to refactor her library accordingly.

\item\textbf{Library Refactoring:} Given that the view finder looks for \emph{partial} views, we can use it to find natural extensions of a starting theory. Imagine Jane removing the last of her axioms for ``beautiful sets'' -- the other axioms (disregarding finiteness of her sets) would allow her to find e.g. both Matroids and \emph{Ideals}, which would suggest to her to possibly refactor her library such that both extend her new theory.

Additionally, \emph{surjective} partial views would inform her, that her theory would probably better be refactored as an extension of the codomain, which would allow her to use all theorems and definitions therein.

%

\item\textbf{Theory Generalization:} If we additionally consider views into and out of the theories found, this can make theory discovery even more attractive. For example, a view from a theory of vector spaces intro matroids could inform Jane additionally, that her beautiful sets, being matroids, form a generalization of the notion of linear independence in linear algebra.

@@ -56,7 +56,7 @@ of the fourth condition -- and maybe rethink the name.

% \caption{Theory Classification for beautiful sets}\label{fig:theory-classification-ex}

% \end{figure}

In this paper we reduce the theory classification problem to that of finding theory morphisms (views) between \MMTtheories in a library $\cL$: given a query theory $Q$, the view finder computes all (total) views from $Q$ into $\cL$ and returns presentations of target theories and the assignments made by the views.

In this paper we reduce the theory classification problem to the problem of finding theory morphisms (views) between theories in a library $\cL$: given a query theory $Q$, the algorithm computes all (total) views from $Q$ into $\cL$ and returns presentations of target theories and the assignments made by the views.

\paragraph{Related Work}

Existing systems have so far only worked with explicitly given views, e.g., in IMPS \cite{imps} or Isabelle \cite{isabelle}.

...

...

@@ -76,7 +76,7 @@ Instead of views inside a single corpus, they use machine learning to find simil

Their results can roughly be seen as a single partial view from one corpus to the other.

\paragraph{Approach and Contribution}

Our contribution is twofold. Firstly, we present a the design and implementation of a

Our contribution is twofold. Firstly, we present the design and implementation of a

generic view finder that works with arbitrary corpora represented in MMT. The

algorithm tries to match two symbols by unifying their types. This is made efficient by

separating the term into a hashed representation of its abstract syntax tree (which serves

Intuitively, \mmt is a declarative language for theories and views over an arbitrary object language.

Its treatment of object languages is abstract enough to subsume most practically relevant logics and type theories.

Its treatment of object languages is abstract enough to subsume most logics and type theories which are practically relevant.

Fig.~\ref{fig:mmt} gives an overview of the fundamental MMT concepts.

In the simplest case, \textbf{theories}$\Sigma$ are lists of \textbf{constant declarations}$c:E$, where $E$ is an expression that may use the previously declared constants.

...

...

@@ -9,7 +9,7 @@ We say that $\Sigma'$ includes $\Sigma$ if it contains every constant declaratio

\begin{figure}[htb]\centering

\begin{tabular}{|l|l|l|}\hline

\multicolumn{3}{|c|}{meta-theory: a fixed theory $M$}\\\hline

& Theory $\Sigma$& iew $\sigma:\Sigma\to\Sigma'$\\\hline

& Theory $\Sigma$&View $\sigma:\Sigma\to\Sigma'$\\\hline

set of & typed constant declarations $c:E$& assignments $c\mapsto E'$\\

$\Sigma$-expressions $E$& formed from $M$- and $\Sigma$-constants & mapped to $\Sigma'$ expressions \\\hline

\end{tabular}

...

...

@@ -21,10 +21,10 @@ To be well-typed, $\sigma$ must preserve typing, i.e., we must have $\vdash_{\Si

Here $\ov{\sigma}$ is the homomorphic extension of $\sigma$, i.e., the map of $\Sigma$-expressions to $\Sigma'$-expressions that substitutes every occurrence of a $\Sigma'$-constant with the $\Sigma'$-expression assigned by $\sigma$.

We call $\sigma$\textbf{simple} if the expressions $e'$ are always $\Sigma'$-\emph{constants} rather than complex expressions.

The type-preservation condition for an assignment $c\mapsto c'$ reduces to $\ov{\sigma}(E)=E'$ where $E$ and $E'$ are the types of $c$ and $c'$.

We call $\sigma$\textbf{partial} if it does not contain an assignment for every $\Sigma$-constant.

We call $\sigma$\textbf{partial} if it does not contain an assignment for every $\Sigma$-constant and \textbf{total} otherwise.

A partial view from $\Sigma$ to $\Sigma'$ is the same as a total view from some theory included by $\Sigma$ to $\Sigma'$.

Importantly, we can then show generally at the MMT-level that if $\sigma$ is well-typed, then $\ov{\sigma}$ preserves all $\Sigma$-judgments.

Importantly, we can then show generally at the MMT-level that if $\sigma$ is well-typed, then $\ov{\sigma}$ preserves all typing and equality judgments over $\Sigma$.

In particular, if we represent proofs as typed terms, views preserve the theoremhood of propositions.

This property makes views so valuable for structuring, refactoring, and integrating large corpora.

...

...

@@ -59,7 +59,7 @@ For example, the axiom $\forall x:\cn{set},y:\cn{set}.\; \cn{beautiful}(x) \wedg

Finally, we remark on a few additional features of the MMT language that are important for large-scale case studies but not critical to understand the basic intuitions of results.

MMT provides a module system that allows theories to instantiate and import each other. The module system is conservative: every theory can be elaborated into one that only declares constants.

MMT provides a module system that allows theories to instantiate and import each other. The module system is conservative: every theory can be \emph{elaborated} into one that only declares constants.

MMT constants may carry an optional definiens, in which case we write $c:E=e$.

Defined constants can be eliminated by definition expansion.

@@ -4,7 +4,7 @@ However, due to the various idiosyncrasies of logics, tools' library structuring

For example, unless the logics are closely related, meta-views usually do not even exist and must be approximated.

Therefore, a lot of tweaking is typically necessary, and it is possible that multiple runs with different trade-offs give different interesting results.

As an example, we present a large case study where we find view from the MitM library used in the running example so far into the PVS/NASA library.

As an example, we present a large case study where we find views from the MitM library used in the running example so far into the PVS/NASA library.

These include elaboration of structured theories and definition expansion, which we mentioned in Sect.~\ref{sec:prelim}.

Importantly, we do not fully eliminate defined constant declarations $c:E=e$ from a theory $\Sigma$: instead, we replace them with primitive constants $c:E$ and replace every occurrence of $c$ in other declarations with $e$.

If $\Sigma$ is the domain theory, we can simply ignore $c:E$ (because views do not have to provide an assignment to defined constants).

But if the $\Sigma$ is the codomain theory, retaining $c:E$ increases the number of views we can find.

But if the $\Sigma$ is the codomain theory, retaining $c:E$ increases the number of views we can find; in particular in situations where $E$ is a type of proofs, and hence $c$ a theorem.

\textbf{Meta-theory-level normalization} applies an $M$-specific normalization function.

In general, we assume this normalization to be given as a black box.

...

...

@@ -56,8 +56,8 @@ However, because many practically important normalization steps are widely reusa

This has the additional benefit or shrinking the abstract syntax trees and speeding up the search.

\item Equalities are normalized such that the left hand side has a smaller abstract syntax tree.

\end{compactenum}

Above multiple normalization steps make use of a total order on abstract syntax trees$(t,s)$.

We omit the details and only remark that we try to avoid using the names of the constants in $s$ in the definition of the order --- otherwise, declarations that could be matched by a view would be normalized differently.

Above multiple normalization steps make use of a total order on abstract syntax trees.

We omit the details and only remark that we try to avoid using the names of constants in the definition of the order --- otherwise, declarations that could be matched by a view would be normalized differently.

Even when breaking ties between requires comparing two constants, we can first try to recursively compare the syntax trees of their types.

%In addition to the above optimizations in the algorithm, we can normalize the theories themselves in various ways to improve the number and quality of the morphisms found. To accommodate these, we add two additional features to the algorithm:

...

...

@@ -104,7 +104,7 @@ The corresponding long syntax tree is :

For our algorithm, we pick the \emph{long} abstract syntax tree, which may appear surprising.

The reason is that shortness is not preserved when applying a simple view: whenever a view maps two different constants to the same constant, the resulting tree is not short anymore.

Longness, on the other hand, is preserved.

Length, on the other hand, is preserved.

The disadvantage that long trees take more time to traverse is outweighed by the advantage that we never have to renormalize the trees.

%\begin{oldpart}{FR: replaced with the above}

...

...

@@ -126,7 +126,7 @@ The disadvantage that long trees take more time to traverse is outweighed by the

\subsection{Search}

Consider two constants $c:E$ and $c':E'$ preprocessed into long abstract syntax trees $(t,s)$ and $(t',s')$.

Consider two constants $c:E$ and $c':E'$, where $E$ and $E'$ are preprocessed into long abstract syntax trees $(t,s)$ and $(t',s')$.

It is now straightforward to show the following Lemma:

\begin{lemma}

The assignment $c\mapsto c'$ is well-typed in a view $\sigma$ if $t=t'$ (in which case $s$ and $s'$ must have the same length $l$) and $\sigma$ also contains $s_i\mapsto s'_i$ for $i=1,\ldots,l$.

...

...

@@ -135,7 +135,7 @@ Of course, the condition about $s_i\mapsto s'_i$ may be redundant if $s$ contain

This lemma is the center of our search algorithm explained in

\begin{lemma}[Core Algorithm]

Consider two constant declarations $c$ and $c'$ in theories $\Sigma'$ and $\Sigma'$.

Consider two constant declarations $c$ and $c'$ in theories $\Sigma$ and $\Sigma'$.

We define a view by starting with $\sigma=c\mapsto c'$ and recursively adding all prerequisites to $\sigma$ until

\begin{compactitem}

\item either the recursion terminates

...

...

@@ -167,7 +167,7 @@ The union of compatible well-typed views is again well-typed.

Consider the partial view from Example \ref{ex:beautifinite} and imagine a second partial view for the axioms

$\cn{beautiful}(\emptyset)$ and $\cn{finite}(\emptyset)$. The former has the requirements

The latter requires only $\cn{set}\mapsto\cn{powerset}$ and $\emptyset\mapsto\emptyset$. Since both views agree on all assignments, we can merge them into one view mapping both axioms and all requirements of both.

The latter requires only $\cn{set}\mapsto\cn{powerset}$ and $\emptyset\mapsto\emptyset$. Since both views agree on all assignments, we can merge all oo them them into a single view, mapping both axioms and all requirements of both.

- "view finder" is not a concept that most people will be familiar with, you should explain it at the beginning of the paper.

-> TODO

- When the key phrase 'view finder' is introduced it should have some explanatory annotation to avoid semantic dissonance as the tyro reader meets it for the first time. For example "With a view finder (something to helpfind logical views) ..." or something better.

-> TODO

- Popular proof assistants such is Isabelle/HOL, Coq and Mizar have large amounts of theory knowledge tied up in various ways in their type systems. The paper does not discuss how this knowledge could be accessed. Has this been considered and what are the challenges?

-> Don't know much about Isabelle/Coq/Mizar -- @Florian?

- Up front, this section needs to make clear that the "constants" of a theory include types, constants, functions and proof witnesses as in Automath or constructive type theories, and that in declarations "c : E", the types E can be not only types, but also type classses and propositions (by a propositions as types correspondence). A concrete simple example of of a theory would help ensure the reader appreciates this.

-> Check

- The phrase "Naturally, E must be subject to some type system" is cryptic, especially given E itself is understood to be a kind of type.

-> Confused?

- When introducing the "view" terminology, give synonyms such as "theory morphism" and "theory interpretation" to help the reader. Is the introduction of a new term "view" really necessary? It confuses as much as helps this reviewer.

-> Check

- Explain the phrase "\Sigma-judgements".

-> Check

- Explicitly define what it means for a view to be total.

-> TODO

-> Extended version?

- A few more words on the "module system" would help, given that expanding out theory imports and instantiations is a key part of the normalisation pre-processing step.

-> Check

- Make clear that constants with definiens can be used to encode the derived theorems of a theory.

-> Check

- The idea of "refactoring" theories needs explanation.

-> TODO

- What does "elaboration" mean?

-> TODO

- Are "structured theories" the same as "modules" of the previously mentioned "module system"? Could just one term be used for these?

-> Check

-> Extended version?

- Add a remark on how normalisation handles the case of defined constants c : E = e when E is a theorem in a system like PVS where proof witnesses e do not exist.

-> Rebuttal? (Insert dummy proofs)

-> Extended vesion?

- The core algorithm is very informally sketched. A little more precision would be helpful.

-> Page limitations; rebuttal?

- p8 Can it be made clear that the two \emptyset{}s come from different theories?

-> TODO

-> Extended version

- 3.2 ends rather abruptly. So how much amalgamation is done? Does the implementation compute all maximal amalgamations? Is every possible amalgamation reported, or just the maximal ones?

-> Design choice - Rebuttal?

-> Design choice - Extended version?

- Remark on the example working because the typed sets happen to be implemented as predicates.

-> TODO

-> Extended version

- Sec 4: The discussion of meta-views is very condensed. Could an example be given which illustrates how they have to be approximated?

-> Check

- Fig 5. The ability to discover commutative operators seems very poor. Can this be discussed?

-> Check

-> Extended version?

- Fig 6.

- The "Aggregated" column needs explanation.

- It would be nice if the rows for the inter and intra library cases

were clearer. Maybe add a column for the co-domain with a "NASA

Library" entry for each row.

-> Check

- The presentation of the results is not that satisfactory. The text suggests that most of the numbers in the table in Fig 6 are spurious, but the reasons why are not well explained. And the good examples found of theory morphisms were hard to understand. The last sentence of Section 4 needs expanding. Maybe some concrete details would help. The reader is left with the impression that the overall performance is poor. Is that true?

-> Check

- Here and elsewhere the paper has good discussion of possible applications of morphism discovery. But the evidence that the implemented system generates falls far short of what these applications need. More honesty is needed about the preliminary nature of the work and the challenges ahead.

-> Rebuttal?

-> Extended version?

NITPICKS:

- Add a period after the subheading on paragraphs. For example,

instead of "Motivation" write "Motivation.".

- p. 2. Define MMT before using it.

- p. 2, fourth paragraph. "Firstly, we present a the..." should be

"Firstly, we present the..."

- p. 3, second paragraph. Reword "practically relevant logics...",

because relevance logic is often called relevant logic. This

sentence can be confusing.

- p. 3, Fig. 1. "iew" should be "View"

- p. 7, Lemma 2. "in theories \Sigma' and \Sigma'..." should be

"in theories \Sigma and \Sigma'...".

- p. 10, last paragraph. "we find view..." should be "we find

views...".

- We have simple English words for two cases already and do not need neologisms longness => length, finitude => finiteness

- In Fig 1, "iew" -> "View"

- p6 First use of "(t,s)" notation before the notation is introduced is

cryptic. Some reordering of the introduction of this notation is

needed.

- p7 "Consider two constants c ∶ E and c' ∶ E' preprocessed...". -> "Consider two constants c ∶ E and c' ∶ E' where types E and E' are preprocessed...".