Data Mining in Networks

When the results of data mining algorithms contribute to high-level decision making, it is particularly important that the learned models be both easy to understand (comprehensible) and relatively simple (parsimonious). This enables human analysts to understand what has been discovered, to diagnose potential biases and errors, and to guide future data collection.

Unfortunately, many techniques in wide use today do not produce models that are comprehensible and parsimonious. In some cases, this is due to the underlying representation used to express the model. Neural networks, for example, can be extremely difficult or impossible for humans to understand. Another problem is excess complexity introduced by flawed algorithms. Tim Oates and I published a series of papers in the late 1990's on one such a flaw in algorithms for constructing classification trees. Finally, some recent work I've done with Jen Neville has shown that statistical biases can cause some relational learning algorithms to construct models with incorrect components. These models indicate that certain variables are correlated when no such correlation actually exists.