Data Mining in Networks

A key challenge of data mining for counter-terrorism, as opposed to the applications I discussed earlier, is that the data are often relational. That is, the data consist of relationships among people, places, things, and events. Recall the example of money laundering detection: the key facts were relationships among several persons, businesses, bank accounts, and deposits. Individual records are essentially meaningless in themselves; only the network of relations among the people, places, things, and events forms a meaningful pattern.
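To make the idea of relational data concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each record is a triple linking two entities, and it groups entities that are connected directly or indirectly. The entity names, relation labels, and the helper functions build_graph and connected_components are all hypothetical; nothing here describes an actual system or real data.

```python
from collections import defaultdict

# Hypothetical toy records: (entity, relation, entity). Every name is invented
# for illustration; none comes from the text or from real data.
RECORDS = [
    ("person:A", "owns", "business:X"),
    ("person:B", "employed_by", "business:X"),
    ("business:X", "holds", "account:1"),
    ("deposit:d1", "credited_to", "account:1"),
    ("deposit:d2", "credited_to", "account:1"),
    ("person:C", "holds", "account:2"),
    ("deposit:d3", "credited_to", "account:2"),
]


def build_graph(records):
    """Return an undirected adjacency map; relation labels are dropped."""
    graph = defaultdict(set)
    for left, _relation, right in records:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def connected_components(graph):
    """Group entities that are linked, directly or indirectly."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            # Only follow neighbours not yet placed in this component.
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


COMPONENTS = connected_components(build_graph(RECORDS))
for component in COMPONENTS:
    print(sorted(component))
```

No single triple says anything about how person A and person B relate to the deposits; that only becomes visible once the records are assembled into a network. Note that this grouping merely collects linked records. Deciding whether a group forms a meaningful pattern is a separate and much harder inference problem.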

That the data are relational should be clear to nearly everyone, based on what has appeared in the news media since the September 11th attacks. Headlines talk of "terrorist networks" and "links" between individuals and known terrorist groups. The stories talk of meetings, financial transactions, and familial ties. Data about relations can be assembled to provide a picture of higher-level organizations and activities. These, in turn, form the basis for watch lists, indicators, and warnings of terrorist attacks.

This relational structure has important implications for the type of data mining and inference that is needed. The problem is far worse than "finding a needle in a haystack." In that analogy, the needle is easy to identify once it is observed. In contrast, many problems of counter-terrorism are, in the words of DARPA's Ted Senator, about "assembling and identifying dangerous needles in stacks of needle pieces." The problem is to infer the existence of clandestine organizations and activities, based on lower-level records that relate people, places, things, and events.