NSA has Greatest Sociological Dataset of the 21st Century

Between the phone records, emails, and instant messaging, the NSA can now map the social and business fabric of America and see how it is changing over time. By collecting our phone records, the NSA has created a dataset that maps our social connections. As I mentioned in my previous post, the NSA can now map the calls we make and the calls our callees make, creating a giant tree of connections. A wrong number might get you placed on a no-fly list, but I'm not going to get into the morality of it. (The current administration is the de facto law, so legality becomes a moot argument.) I'm just going to explore how they might use this data and what they could learn from it.

The phone call data structure (date/time, origin number, destination number, duration, ID) could easily be adapted to scan emails and instant messages as well (date/time, origin address, destination address, length, ID). Using the same reflexive queries, the NSA can track our social and business connections through email. They could also tell how those annoying chain letters get spread ("Good Times" virus warning: forward this to all your friends). However, each record would be larger, probably more than double the ~48 bytes required for a phone record, since addresses tend to be longer than phone numbers. The larger records would slow down queries and require more storage. Also, geocoding the emails would be more difficult, although it could be done based on where the person writing them maintains a physical address, or on the originating Internet Protocol address.
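To make the "reflexive query" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample numbers are all made up for illustration; nothing is known here about how the NSA actually stores or queries its records. The same table joined to itself gives the second hop: the numbers called by the people a target number called.

```python
import sqlite3

# Hypothetical schema mirroring the fields listed above; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE calls (
        id          INTEGER PRIMARY KEY,
        call_time   TEXT    NOT NULL,   -- date/time of the call
        origin      TEXT    NOT NULL,   -- origin number (or address, for email)
        destination TEXT    NOT NULL,   -- destination number (or address)
        duration    INTEGER NOT NULL    -- seconds (or message length)
    )
    """
)

# Made-up sample records.
conn.executemany(
    "INSERT INTO calls (call_time, origin, destination, duration) VALUES (?, ?, ?, ?)",
    [
        ("2006-05-01 09:00", "555-0100", "555-0101", 120),
        ("2006-05-01 09:30", "555-0101", "555-0102", 45),
        ("2006-05-02 18:15", "555-0101", "555-0103", 300),
        ("2006-05-03 12:00", "555-0104", "555-0100", 60),
    ],
)


def second_hop(conn, number):
    """Return the numbers called by the people that `number` called."""
    rows = conn.execute(
        """
        SELECT DISTINCT b.destination
        FROM calls AS a
        JOIN calls AS b ON b.origin = a.destination   -- the table joined to itself
        WHERE a.origin = ? AND b.destination <> ?
        ORDER BY b.destination
        """,
        (number, number),
    ).fetchall()
    return [row[0] for row in rows]


print(second_hop(conn, "555-0100"))
```

Each additional hop is one more self-join on the same table, which is why record size and indexing matter so much at this scale.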

With all this data, the NSA is in a position to know the difference between social networks that use telephones and those that use email. If they've had this program in place for long enough, they could even recognize trends in social communication. With the right set of queries, you could spot the difference between neighborhoods where people know each other and neighborhoods where nobody knows each other, and plot them in different colors on a map. What percentage of Americans call their mom on Mother's Day? In which neighborhoods do people call their mothers most often?
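One rough way to tell those neighborhoods apart is to measure how many calls stay inside the neighborhood. The sketch below assumes a hypothetical lookup from phone number to neighborhood (in practice that would come out of geocoding) and uses made-up call pairs; the metric is my own illustration, not a standard sociological measure.

```python
from collections import defaultdict

# Hypothetical number-to-neighborhood lookup; real data would come from geocoding.
NEIGHBORHOOD = {
    "555-0100": "Elm Street",
    "555-0101": "Elm Street",
    "555-0102": "Elm Street",
    "555-0200": "Oak Court",
    "555-0201": "Oak Court",
}

# Made-up (origin, destination) pairs.
CALLS = [
    ("555-0100", "555-0101"),
    ("555-0101", "555-0102"),
    ("555-0102", "555-0100"),
    ("555-0100", "555-0200"),
    ("555-0200", "555-0100"),
    ("555-0201", "555-0101"),
]


def local_call_share(calls, neighborhood_of):
    """For each neighborhood, the fraction of outgoing calls that stay inside it."""
    total = defaultdict(int)
    local = defaultdict(int)
    for origin, destination in calls:
        home = neighborhood_of.get(origin)
        if home is None:
            continue  # skip callers we cannot place on the map
        total[home] += 1
        if neighborhood_of.get(destination) == home:
            local[home] += 1
    return {hood: local[hood] / total[hood] for hood in total}


for hood, share in sorted(local_call_share(CALLS, NEIGHBORHOOD).items()):
    print(f"{hood}: {share:.0%} of calls stay in the neighborhood")
```

A high share suggests neighbors who talk to each other; a share near zero suggests people whose social ties all lie elsewhere. Those shares are the numbers you would bucket into colors on the map.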

It would also be fairly easy to plot our connections in a GIS tool such as Google Earth. You would just export your target connection tree into a KML file (an XML format; KMZ is the zipped version) and import it into Google Earth or ESRI's ArcGIS. The graphs derived from the data would be anonymous. (Of course, research ethics might prevent us from using the data.)
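The export step could look something like the following sketch, which writes each connection as a KML line between two geocoded endpoints using only Python's standard library. The function name, the edge format, and the coordinates are assumptions for illustration; the only KML-specific facts relied on are the namespace and that coordinates are written as longitude,latitude,altitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def connections_to_kml(edges):
    """Build a KML document from (name, (lon, lat), (lon, lat)) tuples.

    The tuple layout is a hypothetical format chosen for this sketch.
    """
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, start, end in edges:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        line = ET.SubElement(placemark, f"{{{KML_NS}}}LineString")
        coords = ET.SubElement(line, f"{{{KML_NS}}}coordinates")
        # KML expects longitude,latitude,altitude with points separated by spaces.
        coords.text = " ".join(f"{lon},{lat},0" for lon, lat in (start, end))
    return ET.tostring(kml, encoding="unicode")


# Made-up, anonymized edges: only endpoints, no names or numbers.
EDGES = [
    ("connection 1", (-122.1, 37.4), (-77.0, 38.9)),
    ("connection 2", (-77.0, 38.9), (-87.6, 41.9)),
]

kml_text = connections_to_kml(EDGES)
with open("connections.kml", "w", encoding="utf-8") as handle:
    handle.write(kml_text)
```

The resulting connections.kml should open directly in Google Earth; labeling edges with sequence numbers rather than phone numbers is what keeps the plotted graph anonymous.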

Sadly, none of the derived information will ever see the light of day. Academically, it would produce fascinating results that could teach us about how we communicate with each other and how our communications are changing. Then again, no local, state, or federal government information system about citizens has ever been immune from misuse. IRS workers read their neighbors' tax returns. Cops run license "plates for dates," and use NCIC (National Crime Information Center) criminal background checks to help those running for election denounce their opponents. If a dataset about us exists, it will be abused by those with access.

About this Entry

This page contains a single entry by Larry published on May 13, 2006 2:36 PM.

How the NSA Might use our Phone Records was the previous entry in this blog.

Reflexive vs. Recursive Queries and Self-Joins is the next entry in this blog.
