We now consider practical aspects arising from the increased expressiveness provided by algebraic datatypes for the purpose of machine learning.
In most machine learning systems missing values receive special treatment. Unfortunately, this often does not suit well to the particular application, because there may be different reasons why values are missing: they may have been dropped because they were not observable, they may not make sense in a specific context, etc. Therefore, a uniform treatment does not really solve the problem.
A straightforward and interesting solution to the treatment of missing values is provided by algebraic datatypes. For example, let’s assume we have a variable of type t consisting of values A and B, but that also allows for missing values. Then we could encode this as follows:
data t = A | B data maybe_t = Missing | Observed t
This way the machine learning system can explicitly learn models for cases where missing values occur without having to discriminate on t, which encodings like
data maybe_t = A | B | Missing
would enforce.
More about recoding existing data specifications using algebraic datatypes, advantages that can be achieved this way and more details on how to treat missing values in a fine-grained way can be found in [Mottl2002].
The lack of structure on both input and output values, as is the case for attribute-value representations, makes decision trees not very suitable for expert systems: some data may be collected in a step-wise manner, yielding ever increasing levels of detail. Since only structured attributes allow this, an expert system building on normal decision trees would sometimes force the user to reveal more information in one step to the system than necessary for particular queries. The dual case is that users may only want to know certain aspects of an output value, but normal decision trees would require them to provide sufficient information until a leaf node is reached, even though the partial answer might already be predictable at a much earlier point of time.
As we have explained in section 3.1.2, the presented algorithm (including model factorization) guarantees that a maximum of information is predicted with a minimum of input. Therefore, when a user enters data, the system can immediately tell, what can already be predicted from this information. Dually, when the user asks for a specific part of the result, the system will exactly ask questions only that are really required for determining the requested value.
There is a tight relationship between algebraic datatypes and context-free grammars (CFGs)22: from a mathematical point of view they are both so-called semirings, i.e. they are both defined using equations involving a product operator (product types vs. string concatenation) and a sum operator (sum types vs. alternative productions), both of which are semigroup operations. Types correspond to nonterminals of a CFG while the creation of sum types (alternatives) corresponds to alternative productions of a certain nonterminal. Arguments of constructors correspond to the structure of a production.
Thus, there is always an isomorphic data specification for context-free grammars involving algebraic datatypes. For example, let’s take the following CFG in Backus-Naur form (BNF):
<Sentence> := <Subject> <Predicate> | <Interjection> "!" <Subject> := "He" | "She" <Predicate> := "eats" | "sleeps" <Interjection> := "Wow" | "Ouch"
A corresponding set of type definitions could be specified as follows:
data sentence = SP subject predicate | IJ interjection data subject = He | She data predicate = Eats | Sleeps data interjection = Wow | Ouch
As we can see, each production requires the introduction of a new constructor. If some production contains structure, this constructor (e.g. SP) takes corresponding arguments for each nonterminal in the production. Recursive grammars imply recursive type equations.
Applying the generalized decision tree learner to two such encoded grammars (one for the source language, one for the target language) and sample data (parse trees of words in the two context-free languages), the system essentially induces tree transducers for derivation trees of context-free languages. The resulting algebraic datatypes can be unambiguously converted to words of the context-free target language again.