Using Algebraic Datatypes as Uniform Representation for Structured Data

Author: Markus Mottl
Date: March 10, 2003


Austrian Research Institute for Artificial Intelligence
Machine Learning Group

Abstract: The question of how to uniformly encode structured data for machine learning has been rather neglected by researchers so far, in contrast to the abundance of approaches to inferring knowledge from particular representations. We therefore propose a well-understood concept from the formal semantics of programming languages as a vehicle for the uniform representation of discrete data, namely algebraic datatypes. We demonstrate in theory and practice that current encodings severely limit the power of widespread machine learning techniques, especially in the handling of structured information; that algebraic datatypes elegantly extend expressiveness to evade these limitations; and that this concept can guide the way to new learning algorithms. As an example, we show how ordinary decision tree learning can be efficiently generalized to this data representation, and that the representation provides an interesting solution both to the missing value problem and to the representation of structured multi-attribute goals. Tight theoretical relations to logic yield valuable insights into complexity and expressiveness.


Submitted to: Machine Learning Journal, Special Issue on Inductive Logic Programming and Relational Learning
Copyright © 2003 Kluwer Academic Publishers
Author: Markus Mottl ⟨markus@oefai.at⟩