A company called Paterva has, for the past few years, been developing a data mining tool, or as they describe it, an "open source intelligence" package, called Maltego. The "open source" does not refer to the source code, but to the sources of information it uses.
Essentially, it takes small pieces of information, like domain names, phone numbers, NS and MX records, email addresses and webpages, and "transforms" them, which is a fancy word for running things like WHOIS or dig or other network tools that most people who have been Internet geeks for a long time already know how to use manually. That part isn't terribly interesting, and anyone likely to be interested in this software already knows how to do most of it.
What is interesting is not the kind of data it gets, although it is sort of a killer app for doing all the things you already do when investigating a target. What is interesting is how it correlates the information, lets you automate large numbers of such queries, and shows the map of pieces of data and their connections in a number of graphical formats. These range from a "mining" view that is suitable for gathering information to an "edge-based" view that enlarges each node according to its number of incoming links, to indicate which ones might actually be important and worth pursuing further.
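To make the "edge-based" idea concrete, here is a minimal sketch of sizing nodes by incoming links. It is not Maltego's code; the edge list, the function name `node_sizes` and the `base`/`step` scaling are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical edge list: (source entity, target entity) pairs of the kind a
# series of lookups might produce. None of these names come from Maltego.
edges = [
    ("example.com", "mail.example.com"),
    ("example.com", "ns1.example.net"),
    ("example.org", "ns1.example.net"),
    ("example.net", "ns1.example.net"),
    ("alice@example.com", "example.com"),
]

def node_sizes(edges, base=1.0, step=0.5):
    """Return a display size per node that grows with its incoming-link count."""
    incoming = Counter(target for _, target in edges)
    nodes = {name for edge in edges for name in edge}
    # Counter returns 0 for nodes nothing points at, so they keep the base size.
    return {name: base + step * incoming[name] for name in nodes}

sizes = node_sizes(edges)
# Print the biggest (most linked-to) nodes first.
for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{size:4.1f}  {name}")
```

In this toy data the shared nameserver ends up as the largest node, which is the kind of thing the view is meant to make jump out.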
It has a number of clever selection and deselection tools to make it possible to winnow out the enormous amounts of chaff it can generate when doing shotgun searches.
There's a series of videos on YouTube explaining how to use it. I'd advise watching them first, because the package is highly complex with a fairly steep learning curve, and I haven't yet grokked whether it has great utility (at least for my purposes) or whether it just shows promise.
I also have to warn everyone not to just download it and start doing massive numbers of searches. A lot of the "transforms" are quite cryptic if you don't know what they are, and some of them (DNS zone transfers, for instance) are the sort of thing that looks like (and kind of is) hacking. Other utilities, similarly, may do things like issue hundreds or thousands of DNS requests to nameservers, another thing that might look like hacking or a DoS attack. Be careful, and don't do anything if you don't know what it is. Be particularly wary of running all available transforms simultaneously.
The software isn't limited to Internet sources. Users can import their own data from CSV files (databases) or other formats, merge it into existing datasets, export the results, invent their own "transforms," and otherwise tweak the software. That kind of stuff is above my ability as of now.
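For a rough idea of what merging your own CSV data into an existing set of connections amounts to, here is a small sketch. It does not use Maltego's import feature or file format; the column names, the sample rows and the `merge_csv` helper are all hypothetical.

```python
import csv
import io

# Hypothetical CSV of the user's own data; the column names are made up.
own_data = io.StringIO(
    "person,email\n"
    "Alice,alice@example.com\n"
    "Bob,bob@example.org\n"
)

# Existing dataset: a set of (entity, entity) links already gathered.
links = {("alice@example.com", "example.com")}

def merge_csv(links, handle, source_col, target_col):
    """Return a copy of links with one extra link per CSV row.

    Because links are stored in a set, duplicate rows collapse automatically.
    """
    merged = set(links)
    for row in csv.DictReader(handle):
        merged.add((row[source_col].strip(), row[target_col].strip()))
    return merged

merged = merge_csv(links, own_data, "person", "email")
for source, target in sorted(merged):
    print(f"{source} -> {target}")
```

Here Alice gets tied to example.com through her email address, a connection that was in neither dataset alone.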
This software used to have Facebook scraping capabilities, but the makers removed that after legal threats. Apparently, the user community has re-created this capability and it's possible to put it back in.
While the examples in the videos are mostly mapping corporate networks, matching IP and MAC addresses in a list of suspected hash crackers to user IDs and passwords to try to figure out a network of crackers, and things like this, it can also be used on social networks and groups of people. When combining the sets of Internet-based information with other knowledge one has, the outlying nodes can be surprising and interesting. A lot of it is stuff I could have come up with eventually by fumbling around for a while, but it can turn up some connections remarkably quickly.
Caveats: the user interface is very tricky, and the software does things that could attract unwanted attention if used carelessly, especially by creating large amounts of suspicious-looking network activity. Things like scraping Facebook and other social networks also raise legal issues and, at the very least, risk violating the Facebook TOS and getting your account canned. Additionally, many of the more arcane searches are done through Paterva's servers, which, in the event of a legal issue resulting in subpoenas, could leave discoverable evidence there.
There is a free version that limits the number of results each "transform" can return to 12 (the commercial version goes up to thousands), but for most of my uses, that would seem to be sufficient. When the targets are people in person-to-person social networks, if you're getting much over 12, most of the results are going to be pure noise anyway. The commercial version of the client is $650 with a $320 annual license renewal. That would probably be well worth it to a commercial user using it for its core purposes. For the high rollers they appear to be marketing to, they also have a server product, which essentially lets you run your own server instead of transmitting potentially sensitive information over the Internet. The server license is $19,000 for "professional" and $12,000 for "basic," with an annual license renewal fee of 20% of the initial cost. For that kind of price, of course, they guarantee tech support to keep it working. This seems to be marketed at law enforcement agencies, who obviously can't be transmitting that kind of information to other people's servers. It's also marketed at anyone who has their own proprietary information they want to mine, which obviously wouldn't be on Paterva's servers in the first place.
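To put those prices in perspective, here is some back-of-the-envelope arithmetic for three years of ownership. The three-year horizon and the assumption that a renewal is due for each year after the first are mine, not Paterva's published terms.

```python
def total_cost(initial, renewal, years):
    """Initial purchase plus one renewal for each year after the first (assumed)."""
    return initial + renewal * max(years - 1, 0)

def server_renewal(initial):
    """Annual server renewal: 20% of the initial license cost, in whole dollars."""
    return initial * 20 // 100

YEARS = 3  # arbitrary horizon chosen for illustration

client = total_cost(650, 320, YEARS)                             # 650 + 2 * 320
server_pro = total_cost(19_000, server_renewal(19_000), YEARS)   # 19000 + 2 * 3800
server_basic = total_cost(12_000, server_renewal(12_000), YEARS) # 12000 + 2 * 2400

print(f"client:       ${client:,}")        # $1,290
print(f"server pro:   ${server_pro:,}")    # $26,600
print(f"server basic: ${server_basic:,}")  # $16,800
```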
I'm currently just toying with this, but it may be useful in investigating the networks of crazy birthers. If so, even the free version is probably worth playing with. Again, be careful with it. It does a lot of things you probably shouldn't do unless you know what they are. Specifically, I'd avoid doing DNS zone transfers unless you have a legitimate purpose for it. Some network admins take it amiss. (Of course, most of those simply turn off the ability to do it to them, but you never know.)
Edit: Fixing a misquote of the price.