Programming: May 2006 Archives

The Washington Post's David Ignatius postulated about how the National Security Agency's system might work. In doing so, he provided an excellent example of data mining. What the NSA is trying to do is simple and complex at the same time. The data structure is simple, but the sheer volume makes it complex.

The problem may seem hopelessly complex, but if you use common sense, you can see how the NSA has tried to solve it. Suppose you lost your own cellphone and bought a new one, and people really needed to find out that new number. If they could search all calling records, they would soon find a number with the same pattern of traffic as your old one -- calls to your spouse, your kids, your office, your golf buddies. They wouldn't have to listen to the calls themselves to know it was your phone. Simple pattern analysis would be adequate -- so long as they had access to all the records.

The trouble is, simple pattern analysis isn't that simple when you start trying to code it. You would have a giant data cube, and you would have millions of slices to compare with each other. On the other hand, if you have one target number and have a query that pulls all its callees, you could craft another query that searches for those same numbers. You could then score new numbers based on old queries: each query would have a rating of between 0 and 1 with 1 being just like the original number.

If you have voice matching that can confirm a 1, you could design an artifcial neural network that learns as it targets new numbers. Voice matching would require eavesdropping -- but if you got a score of 1, it would be worth the trouble. This way, your neural network could learn what the score is between 0 and 1 that should trigger voice matching.

People (mostly database people) keep asking what I meant by reflexive query in my previous posts. Some thought I was confusing a self-join with a recursive query that would allow my DNS server to answer a DNS query for a domain for which it is not authoritative. What I mean is a query that returns a caller and a callee; and then another query that returns the callee's callees. In a self-join, I can match employers to their managers, since both are in the same table. While the NSA-phone tracking system might use some self-joins, what makes the network part work is getting queries from queries, and jumping from callers to callees. Of course I'd like all the results timestamped, too. Recursive queries are a part of this, and SQL 2005 can do it. If I had a few gigabytes worth of phone data, I'd love to let SQL 2005 loose and see what connections I could see. When I say reflexive, though, I mean that I'm going to use my queries to start other queries, and not just as sub-queries. The recursive part could lead to infinite loops. I wonder if the NSA hit any infinite loops when testing their system. Fortunately, you can specify limits on recursion in SQL 2005.

The scary part of this is what would happen if I had the resources to make it run really fast and tuned it to be as efficient as possible. I could select a target and find all of its connections in a few seconds or a few minutes. The difference between seconds and minutes would make a huge difference. In a system where it takes a couple of seconds to generate results, nobody would notice if I ran a few "unofficial" queries on my friends. If it took a few minutes and precious computer time, then people would notice. Utilitarian ethics.

This begs the question: how much computer time (and tax dollars) does our government use tracking down everyone who calls reporters? Then again, leaking classified information is unethical, but people do it anway. Contextual ethics. Of course, there's also the issue of selective enforcement, but that's a legal issue, not an ethical one.

Thus we're left with utilitarian ethics vs. contextual ethics. Who knew that efficient queries and more processing power could give one type of ethics an advantage over the other?

I do Windows, Unix (Solaris), and Linux (mostly Red Hat). Everyone who's into "open-source" keeps telling me how much more secure it is. I'm a CISSP and I've been installing open-source OSes since I had to know the chipset, IRQ and DMA of the NICs in my box to get networking to work. (The DEC Tulip was my favorite.) When I started working with Solaris 7 and Red Hat 4.x, telnet was enabled by default. I still wonder if telnet was enabled on a Trusted Solaris 7 default install. People who tell me any form of Unix is inherently more secure than any Windows don't seem to be familiar with the Morris worm, the Leshka Sendmail exploit, or BIND vulnerabilities. In fact, just mentioning BIND and sendmail in the same sentence is likely to send your security coordinator into the bunker for the rest of the day. Mind you, I've also seen IIS flaws. Can't we all just get along and implement security best practices on whatever platforms we're using?

Ruby on Rails shows a lot of promise as to helping people get up and running on applications quickly. The tutorials are pretty helpful , but there are a a couple of caveats:

In the configuration wizard, you can also just accept all of the defaults, except that in the security panel you must uncheck the "Modify Security Settings" checkbox (Figure 4). This is because starting with version 4.1.7, MySQL uses a new authentication algorithm that is not compatible with older client software, including the current version of Rails. By unchecking this box, you can access MySQL without a password.

This is not the path to secure computing. MySQL should NOT ship with a blank root password. Tutorials should not encourage the use of blank root passwords.

And they have you set up your server as to leave database.yml publicly available. I see Drupal attacks (xmlrpc.php) every day; it's only a matter of time before I start to see RoR attacks.

It's the developers' job to make it work. It's your job to make it work securely. Today's hackers don't even know C and have never heard of Kernighan and Ritchie; all they need is a script and an Internet connection to take advantage of your vulnerabilities.

About this Archive

This page is a archive of entries in the Programming category from May 2006.

Programming: April 2006 is the previous archive.

Programming: April 2007 is the next archive.

Find recent content on the main index or look in the archives to find all content.