Commit e4243a1a authored by Ulrich

text

parent b172b1e8
......@@ -208,17 +208,30 @@ to hear the encoding of the expression, here \mq{three M sub circled dot operato
%running example. \ednote{reference the report for the mathml association?}
\subsection{Semantic Search}
\label{ssec:implharvest}
Tom Wiesing created a semantic search engine for quantity expressions~\cite{wiesingbsc} that
also retrieves results in equivalent formats.
For instance, a search for \mq{$25 \; \rm m/s$} also returns results containing \mq{$90 \; \rm km/h$}.
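The equivalence in this example is ordinary unit arithmetic. It is shown here only for illustration and does not describe how the search engine computes it internally: since $1\,\mathrm{km} = 1000\,\mathrm{m}$ and $1\,\mathrm{h} = 3600\,\mathrm{s}$,
\[
  25 \; \mathrm{m/s}
  = 25 \cdot \frac{3600}{1000} \; \mathrm{km/h}
  = 90 \; \mathrm{km/h}.
\]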
However, Wiesing's search engine depends on an external data source.
We therefore postprocess the results of our spotting of quantity expressions and use them
to provide a corpus for Wiesing's search engine. This requires only a simple transformation from the
output format of the spotter (compare Figure~\ref{fig:EntryExample}) to the
format displayed in Figure~\ref{fig:tomharvest}.
The resulting file contains one \inline{rdf:Description} entry per spotted quantity expression.
Here, \mq{Identifier} denotes a unique path to the document and to the quantity expression therein.
We fill it with the value of the \inline{resource} attribute from the spotter output, since this attribute
references exactly the quantity expression in the original document.
Likewise, \mq{Content MathML} coincides with the corresponding expression in the spotter output.
During the conversion, we omit quantity expressions with a negative score.
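The conversion described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the implementation used in this thesis: we assume that each spotter entry is available as a dictionary with the keys \texttt{resource}, \texttt{content\_mathml} and \texttt{score}, and we use a hypothetical placeholder namespace and hypothetical element names instead of the exact vocabulary of Figure~\ref{fig:tomharvest}.

\begin{lstlisting}[language=Python, frame=single]
# Minimal sketch of the conversion from spotter output to a harvest file.
# ASSUMPTIONS (hypothetical, not the formats used in this thesis):
#   * each spotter entry is a dict with the keys "resource",
#     "content_mathml" (a Content MathML string) and "score";
#   * HARVEST_NS and the element names "identifier" and "contentMathML"
#     are placeholders for the vocabulary of the actual harvest format.
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
HARVEST_NS = "http://example.org/harvest#"  # hypothetical placeholder

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("h", HARVEST_NS)


def to_harvest(entries):
    """Return an rdf:RDF element with one rdf:Description per kept entry."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for entry in entries:
        # Quantity expressions with a negative score are omitted.
        if entry["score"] < 0:
            continue
        desc = ET.SubElement(root, f"{{{RDF_NS}}}Description")
        # The identifier reuses the resource attribute of the spotter output,
        # which points exactly to the expression in the original document.
        ident = ET.SubElement(desc, f"{{{HARVEST_NS}}}identifier")
        ident.text = entry["resource"]
        # The Content MathML is copied unchanged from the spotter output.
        cmml = ET.SubElement(desc, f"{{{HARVEST_NS}}}contentMathML")
        cmml.append(ET.fromstring(entry["content_mathml"]))
    return root


if __name__ == "__main__":
    entries = [
        {"resource": "doc.xhtml#qe1",  # hypothetical example data
         "content_mathml": "<cn>25</cn>", "score": 0.9},
        {"resource": "doc.xhtml#qe2",
         "content_mathml": "<cn>3</cn>", "score": -0.4},
    ]
    print(ET.tostring(to_harvest(entries), encoding="unicode"))
\end{lstlisting}

Building the file as an element tree rather than by string concatenation lets the XML library take care of escaping and namespace declarations; only the filtering by score and the copying of the two fields reflect the conversion described above.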
\begin{figure}
\lstinputlisting[language=XML, frame=single]{xml/tom.xml}
\centering
\caption{The general frame of harvest files for Tom Wiesing's search engine.}
\label{fig:tomharvest}
\end{figure}
\ednote{adapt to Tom's format}
%MathWebSearch~\cite{kohlhase2006search} is a search engine for mathematical
%formulae that works on content representations such as MathML.
%Due to problems with its maintenance, the author could not use it and can thus
......@@ -238,10 +251,10 @@ to provide a corpus for Wiesing's search engine.
%we still need a frontend which has to extract the
%semantics of the user input and to transform it into
%the query language of the search engine.
%
%\begin{figure}
% \lstinputlisting[language=XML, frame=single]{xml/mwsharvest.xml}
% \centering
% \caption{The general frame of harvest files for MathWebSearch.}
% \label{fig:mwsharvest}
%\end{figure}
\ No newline at end of file
......@@ -201,12 +201,13 @@ Erlangen, \today
\label{sec:implementation}
\input{tex/implementation.tex}
\section{Semantic Services}
\label{sec:applications}
\input{tex/applications.tex}
\section{Future Work}
\label{sec:futurework}
This section lists suggestions for further development of the presented system.
The first two items focus on the use of additional natural language processing tools and on the
detection of more quantity expressions. New technologies that can enhance this system
......@@ -245,18 +246,22 @@ are suggested in item 3 and 4. The last recommendation mentions the runtime.
the arXMLiv documents.
\end{enumerate}
\section{Conclusion}
\label{sec:conclusion}
In this thesis, we have described the extraction
of meaning from STEM documents with a special focus on quantity expressions and units.
We have presented a rule-based and modular implementation thereof, applied it to the arXMLiv corpus and obtained promising results.
The architecture proved to be easily extensible with additional detection methods such as Frederik Schäfer's declaration spotter.
We have exploited the detection results to offer useful semantic services such as the automatic conversion of
units in scientific papers. Since readers can convert a quantity expression simply by right-clicking on it,
this service helps them stay focused while reading and avoid calculation errors.
With the semantic enhancement of screen reading programs, we have also contributed
to the accessibility of STEM documents and thereby lowered the barrier for
visually impaired people to participate in the scientific discourse.
Additionally, we converted the spotting results into a format that Tom Wiesing's semantic search
engine for quantity expressions can exploit. This engine searches not only for the entered expression but also takes
equivalent forms into account; for example, it also finds 212 degrees Fahrenheit when searching for 100 degrees Celsius.
These applications demonstrate the additional benefit of semantic information over purely syntactic data.
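The equivalence of 100 degrees Celsius and 212 degrees Fahrenheit is not a mere rescaling, because temperature scales also differ by an offset. The following Python sketch illustrates one common way to detect such equivalences, namely normalizing every quantity to SI base units before comparing; it is an assumption for illustration, not a description of Wiesing's engine, and its unit table is a small hypothetical excerpt.

\begin{lstlisting}[language=Python, frame=single]
# Minimal sketch: detect equivalent quantity expressions by normalizing
# them to SI base units. ASSUMPTION: an illustration of one common
# approach, not the implementation of Wiesing's search engine; the unit
# table is a small hypothetical excerpt.
from fractions import Fraction

# unit -> (dimension, factor, offset), with si_value = value * factor + offset
TO_SI = {
    "m/s": ("velocity", Fraction(1), Fraction(0)),
    "km/h": ("velocity", Fraction(1000, 3600), Fraction(0)),
    "K": ("temperature", Fraction(1), Fraction(0)),
    "degC": ("temperature", Fraction(1), Fraction(27315, 100)),
    # T_K = 5/9 * (T_F - 32) + 273.15 = 5/9 * T_F + (273.15 - 160/9)
    "degF": ("temperature", Fraction(5, 9),
             Fraction(27315, 100) - Fraction(160, 9)),
}


def to_si(value, unit):
    """Return (dimension, exact value in SI base units) of a quantity."""
    dimension, factor, offset = TO_SI[unit]
    return dimension, Fraction(value) * factor + offset


def equivalent(q1, q2):
    """Two (value, unit) pairs are equivalent if they agree in SI units."""
    return to_si(*q1) == to_si(*q2)


if __name__ == "__main__":
    print(equivalent((100, "degC"), (212, "degF")))  # True
    print(equivalent((25, "m/s"), (90, "km/h")))     # True
\end{lstlisting}

Exact rational arithmetic via \texttt{fractions.Fraction} avoids floating-point rounding, so both 100 degrees Celsius and 212 degrees Fahrenheit map to exactly 373.15\,K; comparing the dimension as well prevents, e.g., a velocity from matching a temperature with the same numerical value.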
\newpage