Commit 1a043e14 authored by Ulrich

small bugfixes

parent 267fab09
@@ -40,7 +40,7 @@ z zepto
y yocto
Units:
m meter
m meters meter
g gram
s sec second
A ampere
......
@@ -169,7 +169,7 @@ pub fn analyze_part(part : &str, mut word_id : i32, para : &Node, dom : &DOM, pa
let (byte, char) = new_content.char_indices().nth(i).unwrap();
let c_string = char.to_string();
let c_str = &c_string.as_str();
if intermediate_delimiter.contains(c_str) {
if intermediate_delimiter.contains(c_str) && !char.eq(&',') {
// println!("byte {}, char {}", byte, char);
// println!("first {}", &new_content[last..byte]);
@@ -191,7 +191,7 @@ pub fn analyze_part(part : &str, mut word_id : i32, para : &Node, dom : &DOM, pa
word_id += 1;
para.add_child(&wnode2).unwrap();
last = byte + char.len_utf8();
}else if char.is_digit(10) || char.eq(&'.'){
}else if char.is_digit(10) || char.eq(&'.') || char.eq(&','){
digits.push(char);
if byte > last{
......
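The two `analyze_part` changes stop treating ',' as an intermediate delimiter and instead collect it into the digit buffer together with digits and '.', so a number such as "1,500.5" stays in one piece. A minimal sketch of the resulting tokenization, written as a hypothetical standalone function `numeric_runs` (the real `analyze_part` also creates DOM nodes, which is left out here):

```rust
/// Hypothetical helper, not part of the commit: collects maximal runs of the
/// characters the patched `analyze_part` treats as numeric, i.e. decimal
/// digits, '.' and (new in this commit) ','.
fn numeric_runs(text: &str) -> Vec<&str> {
    let mut runs = Vec::new();
    // Byte offset where the current numeric run started, if we are in one.
    let mut start: Option<usize> = None;
    for (byte, ch) in text.char_indices() {
        let numeric = ch.is_digit(10) || ch == '.' || ch == ',';
        match (numeric, start) {
            // A numeric character opens a new run.
            (true, None) => start = Some(byte),
            // A non-numeric character closes the open run.
            (false, Some(s)) => {
                runs.push(&text[s..byte]);
                start = None;
            }
            _ => {}
        }
    }
    // Flush a run that reaches the end of the text.
    if let Some(s) = start {
        runs.push(&text[s..]);
    }
    runs
}
```

Note that in this sketch a lone comma in running text, as in "x, y", also forms a run of its own; nothing guards against that.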
@@ -455,8 +455,23 @@ pub fn parse_config(map_path : &str) -> Config{
let p_m = prefix_map.clone();
let u_m = units_map.clone();
let prefixes : Vec<String> = p_m.keys().map(|x| x.to_owned()).collect();
let units : Vec<String> = u_m.keys().map(|x| x.to_owned()).collect();
let mut prefixes : Vec<String> = p_m.keys().map(|x| x.to_owned()).collect();
let mut units : Vec<String> = u_m.keys().map(|x| x.to_owned()).collect();
let mut prefix_values : Vec<String> = p_m.values().map(|x| x.to_owned()).collect();
let mut unit_values : Vec<String> = u_m.values().map(|x| x.to_owned()).collect();
for p in prefix_values.clone(){
prefix_map.insert(p.clone(),p);
}
for u in unit_values.clone(){
units_map.insert(u.clone(),u);
}
prefixes.append(&mut prefix_values);
units.append(&mut unit_values);
Config {prefix_map : prefix_map, units_map : units_map, prefix_symbols: prefixes,
unit_symbols: units, not_found_error : String::from("UNDEF CHECK MAPPING")
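The `parse_config` hunk makes every mapped value also a key that maps to itself, so that both a symbol and its mapped value are recognized, and appends the values to the symbol lists. A sketch of the map part on a plain `HashMap<String, String>`; the function name `with_identity_entries` and the sample entries in the test are hypothetical:

```rust
use std::collections::HashMap;

/// Returns a copy of `map` in which every value is additionally mapped to
/// itself (e.g. "meter" -> "meter" next to "m" -> "meter").
fn with_identity_entries(map: &HashMap<String, String>) -> HashMap<String, String> {
    let mut extended = map.clone();
    for value in map.values() {
        // `entry().or_insert_with()` leaves an existing key untouched,
        // unlike `insert`, which would overwrite it.
        extended
            .entry(value.clone())
            .or_insert_with(|| value.clone());
    }
    extended
}
```

The design choice differs slightly from the diff: the commit calls `insert`, which replaces the existing entry if a value happens to coincide with an existing key, whereas the sketch keeps the original entry in that case.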
@@ -480,6 +495,7 @@ pub fn node_is_or_ends_with_numeric(node : &Node, document : &Document) -> Optio
let mut content = node.get_all_content();
content = content.trim().to_string();
content = content.trim_left().to_string();
content = content.replace(",","");
let res = content.parse::<f64>();
......
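The line added to `node_is_or_ends_with_numeric` removes commas before calling `parse::<f64>()`, because Rust's float parser rejects thousands separators. A standalone sketch under a hypothetical name `parse_number`, assuming ',' only ever occurs as a thousands separator:

```rust
/// Trims the text, drops ',' thousands separators and parses the rest as f64.
/// Returns None if the cleaned text is not a valid float literal.
fn parse_number(content: &str) -> Option<f64> {
    let cleaned = content.trim().replace(',', "");
    cleaned.parse::<f64>().ok()
}
```

Under that assumption a decimal comma is misread: "1,5" becomes 15 rather than 1.5.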
@@ -197,16 +197,21 @@ In order to spot quantity expressions, the rough pattern for their
detection consists of a numeric expression followed by prefix and unit
symbols. There are two variants of this pattern: one for expressions that
are entirely in MathML and one for expressions that switch between text and math.
We discuss the former first. Its pattern is shown in Figure \ref{fig:PatternCNML}.
For this, we have to define numeric terms, which we do inductively as follows:
\begin{figure}
\lstinputlisting[language=XML, frame=single]{xml/basicPattern.xml}
\centering
\caption{Basic pattern for finding quantity expressions in Content MathML.}
\label{fig:PatternCNML}
\end{figure}
We discuss the former first.
For the detection of quantity expressions in MathML, we first define the two
terms \textit{numeric expression} and \textit{unit symbol expression}. Numeric expressions are
inductively defined as follows; this definition allows us to detect,
for instance, terms like $5.0 \cdot 10^{-20}$.
%Its pattern is shown in Figure \ref{fig:PatternCNML}.
%For this, we have to define numeric terms, which we do inductively as follows:
%\begin{figure}
% \lstinputlisting[language=XML, frame=single]{xml/basicPattern.xml}
% \centering
% \caption{Basic pattern for finding quantity expressions in Content MathML.}
% \label{fig:PatternCNML}
%\end{figure}
%
\begin{itemize}
\item \inline{<cn>} nodes are numeric
\item \inline{<apply>} nodes are numeric if the first child is a
@@ -218,29 +223,59 @@ For this, we have to define numeric terms, which we do inductively as follows:
\end{itemize}
and all other children are numeric.
\end{itemize}
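The inductive definition of numeric expressions maps directly onto a recursive check. The sketch below uses a small hypothetical `Node` enum in place of the real DOM types. Because the list of operators allowed as the first child of an `<apply>` is elided in this view, the sketch takes it as a parameter instead of hard-coding it:

```rust
/// Hypothetical, simplified stand-in for Content MathML nodes.
#[derive(Debug)]
enum Node {
    Cn(String),      // <cn>...</cn>
    Ci(String),      // <ci>...</ci>
    Csymbol(String), // <csymbol>...</csymbol>
    Op(String),      // empty operator element such as <times/>
    Apply(Vec<Node>),
}

/// <cn> is numeric; <apply> is numeric if its first child is one of
/// `allowed_ops` and all remaining children are numeric.
fn is_numeric(node: &Node, allowed_ops: &[&str]) -> bool {
    match node {
        Node::Cn(_) => true,
        Node::Apply(children) => match children.split_first() {
            Some((Node::Op(op), rest)) | Some((Node::Csymbol(op), rest)) => {
                allowed_ops.contains(&op.as_str())
                    && rest.iter().all(|child| is_numeric(child, allowed_ops))
            }
            _ => false,
        },
        // <ci>, bare operators etc. are not numeric on their own.
        _ => false,
    }
}
```

With `times` and `superscript` allowed, $5.0 \cdot 10^{-20}$ is accepted, while $5\,m$ is rejected because of the `<ci>` child.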
Using this definition, terms like e.g. $5.0 \cdot 10^{-20}$ can
successfully be identified as numeric terms.
Quantity terms are either \inline{ci} or \inline{mtext} nodes or
\inline{apply} nodes, where
the first child is a \inline{<csymbol>superscript</csymbol>}, the second one is
a \inline{ci} or \inline{mtext} node and the third one -- the exponent --
is numeric.
\noindent
%Using this definition, terms like e.g. $5.0 \cdot 10^{-20}$ can
%successfully be identified as numeric terms.
We use the term \textit{unit symbol expression} to denote the encoding of unit symbols, possibly with
their corresponding exponents, in MathML. The simplest cases of this definition are \inline{ci}
and \inline{mtext} nodes that contain unit symbols.
If a unit symbol carries an exponent, the expression has the following format:
\lstinputlisting[language=XML]{xml/applysuperscript.xml}
Note that \inline{ci} and \inline{mtext} nodes are further processed in
exactly the same way, since both are often used similarly, as we discussed
in the last section.
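A unit symbol expression can likewise be reduced to a `(symbol, exponent)` pair. This sketch uses the same kind of hypothetical `Node` enum and, as a simplification, accepts only an integer `<cn>` as the exponent instead of an arbitrary numeric expression:

```rust
/// Hypothetical, simplified stand-in for Content MathML nodes.
#[derive(Debug)]
enum Node {
    Cn(String),
    Ci(String),
    Mtext(String),
    Csymbol(String),
    Apply(Vec<Node>),
}

/// Returns (symbol, exponent) if `node` is a unit symbol expression:
/// a <ci> or <mtext> (exponent 1), or an <apply> whose children are
/// <csymbol>superscript</csymbol>, a <ci>/<mtext> base and an integer <cn>.
fn unit_symbol_expression(node: &Node) -> Option<(String, i32)> {
    match node {
        // <ci> and <mtext> are treated identically.
        Node::Ci(s) | Node::Mtext(s) => Some((s.clone(), 1)),
        Node::Apply(children) => match children.as_slice() {
            [Node::Csymbol(op), Node::Ci(base) | Node::Mtext(base), Node::Cn(exp)]
                if op == "superscript" =>
            {
                exp.trim().parse::<i32>().ok().map(|e| (base.clone(), e))
            }
            // Subscripts and everything else are not handled here.
            _ => None,
        },
        _ => None,
    }
}
```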
%are either \inline{ci} or \inline{mtext} nodes or
%\inline{apply} nodes, where
%the first child is a \inline{<csymbol>superscript</csymbol>}, the second one is
%a \inline{ci} or \inline{mtext} node and the third one -- the exponent --
%is numeric.
\inline{apply} nodes indicating a subscript are also quantity terms, but represent
a special case. Nodes with subscripts are ignored by default and only
exceptions are further processed. At the moment, these are limited to
astronomic mass units, like $M_\odot$ for the mass of the sun and
$M_\oplus$ for the mass of the earth, compare Examples 8 and 9 from Table~\ref{tab:simpmutcat}.
astronomic mass and radius units, like $M_\odot$ for the mass of the sun and
$R_\oplus$ for the radius of the earth, compare Examples 8, 9 and 10 from Table~\ref{tab:simpmutcat}.
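The subscript exception amounts to a whitelist lookup. A sketch with a hypothetical table `SUBSCRIPT_UNITS` holding the symbols mentioned in the text ($M_\odot$, $M_\oplus$, $R_\oplus$); the Unicode code points used for $\odot$ (U+2299) and $\oplus$ (U+2295) are an assumption about how the subscripts appear in the converted MathML:

```rust
/// Hypothetical whitelist of (base, subscript) pairs kept as unit symbols;
/// every other subscripted identifier is ignored.
const SUBSCRIPT_UNITS: &[(&str, &str)] = &[
    ("M", "\u{2299}"), // M_\odot, solar mass
    ("M", "\u{2295}"), // M_\oplus, Earth mass
    ("R", "\u{2295}"), // R_\oplus, Earth radius
];

/// Returns a combined symbol such as "M_⊙" if the pair is whitelisted.
fn subscript_unit(base: &str, subscript: &str) -> Option<String> {
    SUBSCRIPT_UNITS
        .iter()
        .find(|(b, s)| *b == base && *s == subscript)
        .map(|(b, s)| format!("{}_{}", b, s))
}
```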
Using the definition of numeric and unit symbol expressions, we can now
distinguish different formats of quantity expressions in MathML.
This can be illustrated by the following three examples: $5\;s$, $5/s$ and $5\;m/s$.
The general pattern of the encoding of the first two is displayed in
Figure~\ref{fig:PatternCNML} with \inline{times} or \inline{divide}
respectively. The latter example is encoded as one multiplication and one
division -- the corresponding Content MathML pattern is shown in Figure~\ref{fig:DivPatternCNML}.
\begin{figure}
\lstinputlisting[language=XML, frame=single]{xml/basicPattern.xml}
\centering
\caption{Basic pattern for quantity expressions in Content MathML.}
\label{fig:PatternCNML}
\end{figure}
\begin{figure}
\lstinputlisting[language=XML, frame=single]{xml/dividePattern.xml}
\centering
\caption{Divide pattern for quantity expressions in Content MathML.}
\label{fig:DivPatternCNML}
\end{figure}
This pattern matches for the examples on the left hand side of Figure~\ref{fig:first}
and~\ref{fig:second}. For the further analysis the quantity terms are simplified
Two samples matching the presented patterns are displayed on the left side of
Figure~\ref{fig:first} and~\ref{fig:second}.
For the further analysis the quantity terms are simplified
to a list of strings with the corresponding exponents attached, i.e.
[(\textmu, 1), (m,1)] and [(Wcm, -2), (\textmu, 1), (m, 2)] for the mentioned examples.
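Putting the pieces together, the basic pattern (an `<apply>` with `<times/>` or `<divide/>`, a numeric expression and unit symbol expressions) can be matched and simplified into the `(symbol, exponent)` list described above. This is only a sketch: the numeric part is reduced to a bare `<cn>`, the nested divide pattern is not handled, and negating exponents under `<divide/>` is an assumption that the text does not state:

```rust
/// Hypothetical, simplified stand-in for Content MathML nodes.
#[derive(Debug)]
enum Node {
    Cn(String),
    Ci(String),
    Mtext(String),
    Csymbol(String),
    Op(String), // empty operator element such as <times/>
    Apply(Vec<Node>),
}

/// <ci>/<mtext>, or <csymbol>superscript</csymbol> with an integer exponent.
fn unit_term(node: &Node) -> Option<(String, i32)> {
    match node {
        Node::Ci(s) | Node::Mtext(s) => Some((s.clone(), 1)),
        Node::Apply(ch) => match ch.as_slice() {
            [Node::Csymbol(op), Node::Ci(b) | Node::Mtext(b), Node::Cn(e)]
                if op == "superscript" =>
            {
                e.trim().parse::<i32>().ok().map(|e| (b.clone(), e))
            }
            _ => None,
        },
        _ => None,
    }
}

/// Matches <apply> <times/>|<divide/> number unit... </apply> and returns the
/// number text plus the simplified (symbol, exponent) list.
fn match_basic_pattern(node: &Node) -> Option<(String, Vec<(String, i32)>)> {
    let children = match node {
        Node::Apply(children) => children,
        _ => return None,
    };
    let (op, rest) = match children.split_first()? {
        (Node::Op(op), rest) => (op.as_str(), rest),
        _ => return None,
    };
    // Assumption: units under <divide/> end up with negated exponents.
    let sign = match op {
        "times" => 1,
        "divide" => -1,
        _ => return None,
    };
    let (number, units) = match rest.split_first()? {
        (Node::Cn(n), units) if !units.is_empty() => (n.clone(), units),
        _ => return None,
    };
    let mut terms = Vec::new();
    for u in units {
        // Any child that is not a unit symbol expression rejects the match.
        let (symbol, exp) = unit_term(u)?;
        terms.push((symbol, sign * exp));
    }
    Some((number, terms))
}
```

If `5 μm` is encoded as a multiplication of `5`, `μ` and `m`, the sketch yields `[("μ", 1), ("m", 1)]`, the same shape as the first list above.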
%This pattern matches for the examples on the left hand side of Figure~\ref{fig:first}
%and~\ref{fig:second}.
%which has the advantage that there is no difference in
@@ -360,7 +395,6 @@ at the end:
% multiplication and followed by a ``C'' or ``F'', then the unit is
% classified as either degree Celsius or degree Fahrenheit. Otherwise it
% is classified as degree of arc.
\ednote{find examples of degree C and add them to the categorization}
\subsubsection{Format of the Annotations} \label{sssec:output}
......
<apply>
<times/> or <divide/>
Numeric Term
Quantity Terms
Numeric expression
Unit symbol expressions
</apply>