
April 9, 2007

I graduate from grad school, get a new job, a new house, and become a father.

I graduated from Grad School with a Master's in Information Systems Technology, focusing on Management Information Systems.

It's official. George Washington University sent me my degree in the mail. They took three and a half months to get it out. Even the registrar didn't change my status until March after a couple of phone calls. A lot has happened since then: I moved into a new (old) house, started a new job, and am about to become a father.

What did I really learn in grad school anyway? I learned a lot, but every class covered, to some degree, entity-relationship diagrams (ERDs), data-flow diagrams (DFDs), and object-oriented diagrams, which can be state charts, class diagrams, and use cases, to name a few. Some classes went so far as to cover the theory behind them. Every class covered the relational database model, which hasn't changed much in thirty years and is still useful and relevant to just about every information system I've ever worked with.

Since IS grad school is part of the business school, we learned to work in teams. It's not about writing code -- it's about finishing projects on time. The funny part is the professors don't teach much about team projects -- they just expect you to manage yourselves.

October 2, 2006

How to Reverse Engineer a Database with Microsoft Visio

What do we study in Information Systems Grad School? If there's a single topic that comes up in every class, it's databases. If we haven't memorized the first three normal forms by now, we haven't learned much. While few of us will bother going into Boyce-Codd Normal Form, 4NF and 5NF, every specification for a system we write that has a database needs an entity relationship diagram. An ERD is a visual representation of your data model, and your data model is probably the single most important part of any system you design. A good data model will survive several major versions of your software; a poor data model will make your system useless. Thus, we spend a lot of time doing data models and documenting them with ERDs.

As much as I love Visio, drawing the things from scratch is somewhat tedious. It's much easier to design and test in Access. (I have it on good authority that even elite Oracle DBAs who hand-tune Solaris for better performance will design and test in Access just because it's easy.) So what do you do when you have a decent test DB in Access and you don't want to diagram every little change in your masterful Visio ERD? Reverse engineer.

In Visio, it's fairly easy, but there are a couple of spots where it doesn't behave as nicely as it should. I'm going to refer to Visio 2007, still in Beta and free for the download and registration. Visio 2003 is almost the same. Visio 2003 Enterprise Architect Edition will create the database from your diagram, in case you can design an enterprise DB but don't know how to create the tables in SQL. (Not really someone you'd want touching your SQL server.)

1. Open Visio and select New | Software & Database | Database Model Diagram with the units of your choice.
2. Now that you have a database model diagram open, the Database menu will appear. Select "Reverse Engineer" from the Database menu.
3. A confusing dialog box will appear. Use it to verify you have the right drivers installed.
4. For Microsoft Access, choose Microsoft Access as your driver, and hit Next.
5. A username and password dialog box will pop up. Unless you've assigned a username and pw to the database, leave it blank and hit OK.
6. Navigate your filesystem and select your database. Ignore the clunkiness and be grateful that you can see filenames longer than 8.3.
7. When you have found your .MDB file, choose it and hit OK.
8. Select the types of objects you would like to import and hit Next. (No, you don't get stored procedures and triggers in Access, but you would in SQL.)
9. Select the specific tables, queries, etc. you want to see in your diagram and hit Next.
10. Select Yes to add them to your current diagram, and hit Finish. (Select No if you have a lot of tables, queries, etc.)
11. You should see your tables in the diagram.
12. To add the crow's feet and cardinality, select options on the database menu. (Database | Options | Document )
13. You get three sections to change here: The General tab covers symbol sets: IDEF1X or Relational, Conceptual, Physical, both, or names based on symbol set. The Table tab lets you display keys, indexes, non-keys, and the IDEF1X optionality 0. The Relationship tab lets you display relationships (duh), crow's feet, cardinality, and referential actions. You must select cardinality before you select crow's feet. (Cardinality gets greyed out when crow's feet is checked.)
14. To update your diagram, select "Refresh Model" on the Database menu...

Now that you can see your information model, you know why it's messed up. That database that your business/organization/department runs on -- it's not in any kind of normal form. Or it has about 100 tables more than you thought it should.

The database people in my office have a debate: were the software engineers just trying to make it impossible to wean your organization off of their support, or were they just bad at information modeling?

September 22, 2006

Visit my new Educational Technologies Blog

Given that I have a new job in education, I started doing a few entries about recent developments in education and technology. I use the framework of educational technology to answer President Bush's question, "Is our children learning?"

At least it explains the dearth of recent posts here.

September 14, 2006

Threat Analysis and Modeling Tool, Office Groove Beta

Just when I thought I would stop experimenting with new Microsoft products and hunker down into my new job and my capstone project, I found a couple that will save me time and aggravation. The first is Microsoft's Threat Analysis and Modeling Tool. This .NET 2.0 application has a wizard that creates a CRUD matrix for us, which we can paste into our system security chapter. A CRUD matrix is simply a table of which users need what kind of access (create, read, update, delete) to which tables. It can get more complex if you have column-level security in your database. We don't, so it won't be overly detailed.
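For anyone who hasn't seen a CRUD matrix, here is a minimal sketch in Python of what one looks like. This is not what the Threat Analysis and Modeling Tool produces; the roles, tables, and permissions are all made up for illustration.

```python
# Hypothetical roles, tables, and permissions; none of these come from our project.
PERMISSIONS = [
    ("clerk", "orders", "CRU"),
    ("clerk", "customers", "R"),
    ("manager", "orders", "CRUD"),
    ("manager", "customers", "CRUD"),
    ("auditor", "orders", "R"),
]


def build_crud_matrix(permissions):
    """Return {role: {table: ops}} with ops normalized to C, R, U, D order."""
    matrix = {}
    for role, table, ops in permissions:
        cell = matrix.setdefault(role, {}).get(table, "")
        # Merge repeated grants for the same role and table.
        merged = "".join(op for op in "CRUD" if op in cell + ops.upper())
        matrix[role][table] = merged
    return matrix


def format_matrix(matrix):
    """Render the matrix as a fixed-width text table, roles down the side."""
    tables = sorted({t for row in matrix.values() for t in row})
    width = max(len(name) for name in list(matrix) + tables) + 2
    lines = ["".ljust(width) + "".join(t.ljust(width) for t in tables)]
    for role in sorted(matrix):
        cells = (matrix[role].get(t, "-").ljust(width) for t in tables)
        lines.append(role.ljust(width) + "".join(cells))
    return "\n".join(line.rstrip() for line in lines)


if __name__ == "__main__":
    print(format_matrix(build_crud_matrix(PERMISSIONS)))
```

Column-level security would turn each table column into several columns, one per field, which is why the matrix grows so quickly.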

The other tool that can save me time, and possibly my group members, is Microsoft's Office Groove 2007 Beta. Like many cool Microsoft products, Groove did not originate at Microsoft. Most people I talk to about Groove don't really get what it does, but I blame that more on bad IT and IS metaphors than anything else. Groove is simply a shared workspace where a team can share documents, discussions, and contacts. When you set up Groove, you can set up one or more shared workspaces. You then add specific files to that workspace. Then all your invited and verified team members can access those documents whether you are online or off. No more uploading to Sharepoint or Blackboard, or whatever. You've got anywhere, anytime sharing, provided you have an Internet connection. I have it on my work computer and my home computer and share documents between them without having to email them back and forth. So far, I have no firewall issues either.

What if you want to secure your data and not have it stored on third-party servers? Use Groove Server, and give more money to Microsoft. Until the Office 2007 launch, both Groove and Groove Server are free to download; registration and product keys are required. Try it and at least you'll know what people are talking about. The next big thing is enabling teamwork and collaboration and making the world "flat."

August 4, 2006

Services for Unix in Six Easy Steps

After tiring of redoing smb.conf files over and over again, I finally tried out Microsoft's (free) Services for Unix for simple file sharing between my Fedora Core 4 box and servers on my domain.

1. Copy over /etc/passwd and /etc/group to a secure folder on the windows machine where you are going to install SfU.
2. Download and install Microsoft's Services for Unix, and tell it to use password files during the installation process. (This is not a lesson on setting up a NIS domain -- just connecting one Linux box to your Windows server quickly and reliably. SfU installs several other items by default, like Unix Perl and grep; ActiveState Perl is optional. You want NFS Server.) Reboot.
3. In Windows, right-click on the folder you'd like to share, click the NFS Sharing tab, and select "share this folder."
4. On your linux box, as root, add a line like this to /etc/fstab:
myserver.com:/somefolder /somefolder nfs defaults 0 0
5. On linux, mount /somefolder
6. cd /somefolder and ls -la to your heart's content.
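If the fstab line in step 4 looks cryptic, this small Python sketch splits it into its six fields. The field names are my own labels, and the parser assumes all six fields are present, as they are in the line above.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    device: str       # for NFS: server:/exported/path
    mount_point: str  # local directory where the share appears
    fs_type: str      # filesystem type, here nfs
    options: str      # mount options, here defaults
    dump: int         # 0 = skip this filesystem in dump backups
    fsck_pass: int    # 0 = do not fsck at boot


def parse_fstab_line(line: str) -> FstabEntry:
    """Split one whitespace-separated fstab line into named fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    device, mount_point, fs_type, options, dump, fsck_pass = fields
    return FstabEntry(device, mount_point, fs_type, options,
                      int(dump), int(fsck_pass))


entry = parse_fstab_line("myserver.com:/somefolder /somefolder nfs defaults 0 0")
host, _, export = entry.device.partition(":")
print(host, export, entry.mount_point, entry.fs_type)
```

The part before the colon is the Windows server running SfU, the part after is the share name, and the second field is where it lands on the Linux box.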

There are many security implications for Windows and Linux, like sharing your password and group files, and I'll try sniffing and cracking the passwords later.

August 3, 2006

Business Intelligence Studio: A Wizard for your Data Mart

My grad school project group finished our data mart for class. I learned a few things about SQL Server 2005 Analysis Services, Reporting Services, and Business Intelligence Studio along the way. One of our group members is a SAS programmer, so he provided us with simulated data: attendance records for a theoretical amusement park that included zip code and promotion type, with zip-code ACORN differences. Our idea was that if you came to the park with a coupon, we would know from the bar code where you came from; if you didn't have a coupon, the gate would ask you your zip code, just like at the retail store.

We had six million rows in our fact table, which included dateID, promotionID, zipID, and attendance, giving us three dimensions. We were going more for scalability rather than trying to pile in a lot of dimensions. I took the flat files from our SAS master and imported them into a SQL 2005 database. For some reason, SQL 2005's import tool defaults to a nvarchar(50) type. While six million records isn't much, the records were fixed-size and much smaller, so I was able to stuff them in an nchar type that was small and efficient because it doesn't require an offset column array. Our six million fact table records matched to 5 types of promotion, about 10,000 zip codes, and individual days for 3 years.
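A quick back-of-the-envelope check on those numbers, sketched in Python. The only assumption I'm adding is that three years means 3 x 365 days; the rest comes from the figures above.

```python
PROMOTION_TYPES = 5
ZIP_CODES = 10_000      # "about 10,000" zip codes
DAYS = 3 * 365          # assumption: three non-leap years
FACT_ROWS = 6_000_000

# Every fact row sits at one (date, promotion, zip) intersection.
possible_cells = PROMOTION_TYPES * ZIP_CODES * DAYS
density = FACT_ROWS / possible_cells

print(f"{possible_cells:,} possible cells, {density:.1%} populated")
```

So the cube is fairly sparse: only about one in nine possible date, promotion, and zip combinations has an attendance row.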

Once we created the database, I exited SQL Server Management Studio and opened up a new Business Intelligence Studio project. One difference between Management Studio and BI Studio is that BI Studio does not use SQL authentication. You must use Windows/Domain accounts. There is no sa in Business Intelligence Studio.

Once in BI Studio, you create a data source, just like with any project that involves a database. Then you create a data source view by selecting the tables you want to use and creating a dimensional model diagram. Microsoft calls this their Unified Dimensional Model. While your fact table doesn't need a primary key, your dimension tables do. The arrows should be pointing from your fact table to the dimension tables. I don't know why, but I often get this wrong when I'm creating a new data source view.

Finally, your fact table columns need to reference the right dimensions. Matching these up took a little longer than expected because I had to guess which fields were what, because our SAS genius didn't document which numbers were what. I looked at the 8-digit numbers and guessed wrong. The auto-build cube wizard worked fine, and the dimensions looked OK. Processing the cube failed when it timed out, and the error messages didn't say anything about orphaned fact rows.
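In hindsight, I could have checked for orphaned fact rows before processing the cube. Here is a rough sketch of that check in Python, using SQLite as a stand-in for SQL Server. The table names are hypothetical; the column names follow our fact table.

```python
import sqlite3


def find_orphans(conn, fact, fact_col, dim, dim_key):
    """Return distinct fact values in fact_col that have no matching dim row."""
    # Identifiers are interpolated directly, so pass trusted names only.
    sql = (
        f"SELECT DISTINCT f.{fact_col} FROM {fact} AS f "
        f"LEFT JOIN {dim} AS d ON f.{fact_col} = d.{dim_key} "
        f"WHERE d.{dim_key} IS NULL ORDER BY f.{fact_col}"
    )
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_promotion (promotionID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_attendance (dateID INTEGER, promotionID INTEGER,
                                  zipID INTEGER, attendance INTEGER);
    INSERT INTO dim_promotion VALUES (1, 'coupon'), (2, 'none');
    INSERT INTO fact_attendance VALUES
        (20060701, 1, 100, 42),
        (20060701, 2, 101, 17),
        (20060702, 9, 100, 5);   -- promotionID 9 has no dimension row
""")

orphans = find_orphans(conn, "fact_attendance", "promotionID",
                       "dim_promotion", "promotionID")
print(orphans)
```

Running the same LEFT JOIN once per dimension would have told me which column I had matched to the wrong table.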

Once the columns were matched up correctly, everything worked fine, and the auto-build cube wizard is pretty impressive. If you did everything right, building the cube should give you dimensions. You can create new dimensions if the wizard missed some. The time dimension has many options. At first I set time to be a regular dimension to avoid problems: is day of week a number or a name? Monday or 1? While this will give you good reports, the order of weekdays and months will be alphabetical. (Which makes me think we should rename all month and day names so they can be alphabetical and still be in order, or re-sort them so that April is the first month and Friday the first day of the week.) Setting the time dimension as a time dimension rather than a "standard" dimension will fix this.
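The alphabetical-order problem is easy to reproduce outside of Analysis Services. This Python sketch is only an illustration of the sorting issue, not of how the time dimension works internally: it sorts the names as labels, then sorts them again by their calendar position.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def calendar_sorted(labels, calendar_order):
    """Sort labels by their position in calendar_order instead of by spelling."""
    position = {name: i for i, name in enumerate(calendar_order)}
    return sorted(labels, key=position.__getitem__)


alphabetical_days = sorted(DAYS)
alphabetical_months = sorted(MONTHS)

print(alphabetical_days[0], alphabetical_months[0])
print(calendar_sorted(alphabetical_days, DAYS))
```

Sorting on a numeric key while displaying the name is the same trick a proper time dimension gives you for free.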

Processing the cube took under five minutes. (You also need appropriate account permissions to process the cube.) If the cube processes successfully, you can browse pivots immediately with the browser in BI Studio. Once you see that your dimensions work, you can start connecting to SQL Server Analysis Services from Excel or from SS Report Server. (Again, with appropriate Windows Domain accounts.)

Microsoft's BI Studio Cube and Dimension wizards work the way they should: if you have defined your dimensional model accurately and your data are valid, they'll create the right dimensions for you. The wizards are no substitute for a valid dimensional model and they can't fix bad data or orphaned records.

Next: Creating Pivot Table Reports using our cube and SQL Server Reporting Services for IIS.

July 20, 2006

Google vs. the National Security Agency

Apparently, watching Google is now as much sport as watching the NSA, according to the latest in Baseline Magazine. Discovering the way Google solves data-related problems may be more interesting because Google, unlike the NSA, is not encumbered by government contracting procedures and regulations.

Think about it: you can search the web with Google and find files faster than you can when you're looking for files on your own computer using Windows Search. To learn why it takes longer to search a hard disk on your own *#$% computer than the web, read the Baseline story.

The National Security Agency and Google are in the same business, essentially: take a firehose spitting out information and sort it into something useful. Both the NSA and Google keep their collective mouths shut about sources and methods. The NSA has been slightly better about keeping purchases of high bandwidth out of the news, but only because they have an organization advantage of operating outside the traditional business community (assuming Watkins-Johnson is not a normal business).

The Baseline story estimates the number of Google servers at somewhere around 450,000, but you should think of them as a much smaller number of MPP supercomputers. Google initially had trouble because most data centers couldn't deliver enough watts per square foot to power dense server blade environments, so they turned to AMD processors. That's a process of scaling up computing power, and I wonder how the NSA solved the same problem, although I assume they just pumped in more watts for processors and cooling. Those of you familiar with Microsoft's current file system, NTFS, may know that you can set disk cluster size from 4 kbytes to 64 kbytes. Google's file system has a cluster size of 64 Mbytes. Their files are large, and a large cluster size leads to more efficiency. Google has re-engineered kernel, filesystems, and who knows what else for scalability. Did they re-engineer from the ground up more efficiently than the NSA?
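To get a feel for why cluster size matters with large files, here is a small Python calculation. The 1 GB file size is my own example, not a figure from the Baseline story.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Number of clusters a file occupies, rounding the last one up."""
    return -(-file_size // cluster_size)  # ceiling division


file_size = 1 * GIB  # hypothetical file size
for label, size in [("4 KB", 4 * KIB), ("64 KB", 64 * KIB), ("64 MB", 64 * MIB)]:
    print(f"{label:>6} clusters: {clusters_needed(file_size, size):>7,}")
```

Fewer, larger clusters mean far less bookkeeping per file. The trade-off is wasted space for small files, since even a tiny file occupies a whole cluster.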

Another facet of the Baseline Google story is the office-in-a-box. As a former IT contractor for political campaigns, I had to figure out the cheapest fastest way to set up a computing infrastructure for a field office in, say, Des Moines, Manchester, or Columbia. (Columbia is the capital of South Carolina, for those of you out of the primary calendar loop.) My setup was fairly simple: router, firewall, server (Domain Controller also running DHCP, DNS, and a Global Catalog, of course), printer/copier.

Google has office IT-in-a-box that would put mine to shame. Google also has shipping containers converted into server infrastructures that they can ship anywhere. Baseline implies that the military's IT-infrastructures-in-a-shipping-container exist in Powerpoint only.

As far as the Google vs. NSA operating efficiency battle goes, at least there's more than one career option for deep geeks. I would have a hard time deciding between the two because they both offer serious computing power. As far as ethical considerations go, both have pluses and minuses. The NSA doesn't make money selling advertising; as hard as Google tries, running a business requires some level of compromise to make money. The power of both organizations could be abused. The way things are going in the Intelligence Community, Google will be more secret than the NSA in five years.

What do you think?

June 29, 2006

Network Analysis With Free EtherPeek: Ethereal Gets Some Competition

Way back in 1999, I was looking for a packet analyzer. I was familiar with EtherPeek for the Macintosh from a few years before, and I found that the AG Group was producing EtherPeek for Windows, too. The AG Group is now WildPackets, and they are exceedingly helpful to anyone that has to troubleshoot data networks. AG Group always offered some cool network freebies: IP Subnet Calculator, netTools and a great protocol reference chart.

One of their people, J. Scott Haugdahl, has an excellent book, Network Analysis and Troubleshooting, which offers a bottom-up review of the OSI 7-layer model. (Which one are you: All People Say They Need Data Processing or Please Do Not Throw Sausage Pizza Away?)

I liked EtherPeek and the book so much that I bought both and paid out of my own pocket even though my job was managing the network. Of course, this was back in the day when running tcpdump required you to know your IRQ, DMA and chip set (i.e. DEC Tulip). My job at the time was helping change a campus network from Netware to TCP/IP when Windows and Macintosh didn't even install a TCP/IP stack by default. We went from three-and-a-half network protocols (two different Netware frame types) to one and a half (we still had a couple of AppleTalk issues.) Each computer was on the Internet with a public IP address and no firewall. The ping of death still worked against most machines, and we also got hit with Smurf and Trinoo attacks that would disrupt all online activity.

WildPackets makes some excellent packet analyzers for wired and wireless networks. Now their base-level product is free: OmniPeek Personal. While I have been using Ethereal since my old version of EtherPeek became obsolete because it was on my ancient Dell laptop, I missed EtherPeek because it was the first packet analyzer I really got to know well. I could create filters and find exactly what I needed to find. EtherPeek also had good summary statistical functions, which could tell me who was producing the most traffic on my networks. Omnipeek Personal is better than my copy of EtherPeek was because it includes some expert analysis about bad packets and delayed response times. It also produces HTML statistics just like the original, and it has a better interface than Ethereal, using color to show differences between packets.

For those of you that underestimate the power of color, try printing a Google or Mapquest map in black and white and one in color and see which one is easier to read while you're driving. OmniPeek makes it easier to read your packet stats and is easier on your eyes than Ethereal. It's also supposed to do wireless captures -- I'll update when I get a compatible chipset wireless card.

June 28, 2006

The Visio 2007 Beta

I have loved Visio since I started using it in 1999 or so. Microsoft bought Visio for $1.5 billion, which was the most Microsoft had paid anyone for anything. Since then, Microsoft has incorporated it into its Office line.

I usually don't use a lot of Beta, but I had no choice. Visio 2003 does not connect to MS SQL Server 2005, even with SQL Native Client installed on my laptop. I had two choices: download and install my (student free) copy of Visio for Enterprise Architects on my soon-to-be-dead laptop, or download a free Visio 2007 Beta.

I use Visio for diagramming almost anything technical, from rack diagrams to network and Active Directory diagrams to schoolwork like data flow diagrams, class diagrams, statecharts, and entity relationship diagrams. (I can't afford ERWin.) You can even export from MS Project into MS Visio to create Gantt and PERT charts that are more customizable than what you can do in Project. (Although for updating diagrams on large projects quickly, nothing beats Critical Tools, which does a much better job of creating Work Breakdown Structures and PERT charts than MS Project.)

One of my favorite features in Visio is reverse-engineering databases. I find it much easier to create databases in Access and then reverse-engineer the diagram in Visio. I can also test out the Access database and see if I can get the reports I need with the right queries. (I hear that even Oracle DBAs with years of experience test things in Access.) I can also use this feature to investigate vendor-supplied databases. (One-size-fits-none databases tend to have hundreds of tables.)

In Visio, I just create a new database diagram, then select Database | Reverse Engineer and point it at my data source, which is still a little cumbersome to set up on a new non-Access database. After importing the tables, indexes and queries I need, I can select Database | Options | Document and hit the checkboxes for cardinality, crow's feet, and actions for relationships. This box has changed slightly for Visio 2007, and it looks like the IDEF1X symbol set is also new, and it will be especially helpful to defense contractors.

Another good thing about Visio 2007 is that I can use all my old stencils, including the giant pack of slightly dated Altima stencils that came with a 3com switch. Since I can't afford to buy lots of custom stencils, I am very happy to see that more vendors are offering free equipment representations for their products at places like the Visio Cafe.

If you're looking for a free version of Visio to work with, the Visio 2007 Beta will work. Mine hasn't even crashed yet.

June 20, 2006

Counting Web Attacks

I see a lot of 404 errors in my Apache logs. A 404 error is a file not found, i.e. someone has requested a file that's not there. Often it means I made a typo in a configuration or HTML someplace. More often, it means someone someplace is probing my server for weak web applications.

Linux and open source software have made it easy to add web applications running under Apache and MySQL. The problem is as more and more sites start using these cool web applications, hackers are able to find holes in them. The developers fix the holes and release patches, but many webmasters don't apply the patches. Thus I see probes like the one below in my Apache logs:

212.83.253.101 - - [19/Jun/2006:09:24:49 -0400] "GET /a1b2c3d4e5f6g7h8i9/nonexistentfile.php HTTP/1.0" 404 320 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:49 -0400] "GET /adxmlrpc.php HTTP/1.0" 404 294 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:49 -0400] "GET /adserver/adxmlrpc.php HTTP/1.0" 404 303 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:49 -0400] "GET /phpAdsNew/adxmlrpc.php HTTP/1.0" 404 304 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:50 -0400] "GET /phpadsnew/adxmlrpc.php HTTP/1.0" 404 304 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:50 -0400] "GET /phpads/adxmlrpc.php HTTP/1.0" 404 301 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:50 -0400] "GET /Ads/adxmlrpc.php HTTP/1.0" 404 298 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:50 -0400] "GET /ads/adxmlrpc.php HTTP/1.0" 404 298 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:50 -0400] "GET /xmlrpc.php HTTP/1.0" 404 292 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:51 -0400] "GET /xmlrpc/xmlrpc.php HTTP/1.0" 404 299 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:51 -0400] "GET /xmlsrv/xmlrpc.php HTTP/1.0" 404 299 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:51 -0400] "GET /blog/xmlrpc.php HTTP/1.0" 404 297 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:51 -0400] "GET /drupal/xmlrpc.php HTTP/1.0" 404 299 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:52 -0400] "GET /community/xmlrpc.php HTTP/1.0" 404 302 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:52 -0400] "GET /blogs/xmlrpc.php HTTP/1.0" 404 298 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:52 -0400] "GET /blogs/xmlsrv/xmlrpc.php HTTP/1.0" 404 305 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:52 -0400] "GET /blog/xmlsrv/xmlrpc.php HTTP/1.0" 404 304 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:52 -0400] "GET /blogtest/xmlsrv/xmlrpc.php HTTP/1.0" 404 308 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:53 -0400] "GET /b2/xmlsrv/xmlrpc.php HTTP/1.0" 404 302 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:53 -0400] "GET /b2evo/xmlsrv/xmlrpc.php HTTP/1.0" 404 305 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:53 -0400] "GET /wordpress/xmlrpc.php HTTP/1.0" 404 302 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:53 -0400] "GET /phpgroupware/xmlrpc.php HTTP/1.0" 404 305 "-" "-"

This is a probe, not an attack. There's nothing illegal about requesting files that aren't on my server, is there? But if I touch /var/www/html/adxmlrpc.php, we may find out what happens next. Note that most of these requests, while probing for different applications, share one thing in common: RPC on PHP.
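Counting these probes doesn't require anything fancy. Here is a minimal Python sketch that tallies 404s by date and requested file name. It assumes the log is in the combined format shown above, and the sample lines are copied from that excerpt.

```python
import re
from collections import Counter

# Matches the start of a combined-format access log line.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def count_404s(lines):
    """Count 404 responses by (date, requested file name)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match or match["status"] != "404":
            continue
        date = match["time"].split(":", 1)[0]        # e.g. 19/Jun/2006
        filename = match["path"].rsplit("/", 1)[-1]  # e.g. xmlrpc.php
        counts[(date, filename)] += 1
    return counts


SAMPLE = [
    '212.83.253.101 - - [19/Jun/2006:09:24:49 -0400] "GET /adxmlrpc.php HTTP/1.0" 404 294 "-" "-"',
    '212.83.253.101 - - [19/Jun/2006:09:24:50 -0400] "GET /xmlrpc.php HTTP/1.0" 404 292 "-" "-"',
    '212.83.253.101 - - [19/Jun/2006:09:24:51 -0400] "GET /blog/xmlrpc.php HTTP/1.0" 404 297 "-" "-"',
]

counts = count_404s(SAMPLE)
for (date, filename), n in counts.most_common():
    print(date, filename, n)
```

Grouping on the file name rather than the full path is what makes the pattern obvious: dozens of different directories, but only a couple of target scripts.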

Below is a chart of probes by date and request on this webserver. There's not enough space to list each request next to its color. (MS Excel shows me data point details on mouseover in my pivot table.)

Attacks by Application

June 14, 2006

Friendster and LinkedIn Meet Vonage and Skype, or Link Analysis and Transitive Closure

The ultimate social and business network would include more than just email contacts as Friendster or LinkedIn do. It would include the people you call using Vonage and/or Skype. Using the Call Detail Records (as the National Security Agency might) to do what law enforcement calls "Link Analysis," a social or business network could connect you via phone numbers.

The telcos have been doing link analysis for years as part of their fraud detection programs, and what the NSA might be doing is not much different. Link analysis is really transitive closure, but most computer security and law enforcement people don't know relational algebra, so they call it link analysis.

Transitive closure (aka recursive closure), at its simplest is this: The transitive closure of relation [table] R with attributes [columns] (A1, A2) defined on the same domain is the relation R augmented with all tuples [rows] successively deduced by transitivity; that is if (a,b) and (b,c) are tuples of R, the tuple (a,c) is also added to the result. (From Connolly and Begg's Database Systems, in reference to Timothy Merrett's Relational Information Systems, 1984). Since I was interested in the relational algebra, I bought a "new" copy of Merrett's book from an Amazon reseller for $8. In defining closure, Merrett refers to Aho, Hopcroft and Ullman (1974), and says, "to do so here would involve too much of a mathematical digression." It's 2006, and a book published in 2005 references another book from 1984 (!) that refers to a book from 1974. The relational database model has not changed much since E.F. Codd's work in 1971. What has changed is the scalability of hardware that we use to run our relational database management systems.
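The definition is easier to follow in code than in prose. This is a naive Python sketch of the closure of a binary relation, written straight from the definition above. It is not Merrett's formulation, and it is not efficient.

```python
def transitive_closure(relation):
    """Return the relation plus every tuple deducible by transitivity."""
    closure = set(relation)
    while True:
        # If (a, b) and (b, d) are present, (a, d) belongs in the result.
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


R = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(transitive_closure(R)))
```

The loop keeps joining the relation with itself until a pass adds nothing new, which is the "successively deduced" part of the definition.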

One example of transitive closure that project managers might understand is an exercise in the Merrett book: "Find the expression which gives PATHS the duration of all sets of activities in" and lists the data for the PERT chart. A query (or your relational algebra expression) would show all the paths through the network, and should probably show the critical path as well.
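I don't have Merrett's data handy, so here is a sketch in Python with a made-up activity network. It lists every path from start to finish with its total duration and picks the longest one as the critical path.

```python
# Hypothetical activity network: (from_node, to_node) -> duration in days.
ACTIVITIES = {
    ("start", "design"): 3,
    ("start", "procure"): 2,
    ("design", "build"): 5,
    ("procure", "build"): 4,
    ("build", "finish"): 2,
    ("design", "finish"): 9,
}


def all_paths(activities, source, target):
    """Return [(total_duration, [nodes...])] for every path; assumes no cycles."""
    successors = {}
    for (u, v), days in activities.items():
        successors.setdefault(u, []).append((v, days))
    results = []

    def walk(node, path, total):
        if node == target:
            results.append((total, path))
            return
        for nxt, days in successors.get(node, []):
            walk(nxt, path + [nxt], total + days)

    walk(source, [source], 0)
    return results


paths = all_paths(ACTIVITIES, "start", "finish")
critical = max(paths)  # longest total duration is the critical path
for total, path in sorted(paths, reverse=True):
    print(total, " -> ".join(path))
```

Each printed row is one tuple of a PATHS-style relation: the activities along the way plus their summed duration.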

What makes the PERT chart example interesting is that it can show more than one path through a network between two nodes. When talking about link analysis using call detail records, many models show single links between nodes. In Investigative Data Mining for Security and Criminal Detection, Jesus Mena lists a couple of commercial off-the-shelf (COTS) link analysis tools, ATAC and Analyst's Notebook. These systems can take call detail records and produce links and even chart them on graphs. Mena's book lists many tools, including some free applications and others with a free demo. For the documentation of the tools alone, the book is worth the price. Mena's book details a lot of the history of AI and data mining in the security community, but it also confuses database terminology (related relations, e.g.) to make it understandable by the law enforcement community. Despite this, Mena implies that law enforcement in the 21st Century is going to need a lot more artificial intelligence and database experts.

Sample query to bring up people in your network:

-- First hop: numbers the target called.
SELECT callee
FROM cdr
WHERE caller = 'my_target_no'
UNION
-- Second hop: numbers called by those first-hop contacts.
SELECT callee
FROM cdr
WHERE caller IN (SELECT callee FROM cdr WHERE caller = 'my_target_no');

The trouble with this query, adapted from the manager-employee recursive example that everybody learns in database school, is that if you keep extending it hop after hop, it would eventually return everyone with a telephone. Thus the iterations must be controlled, and I would need to adapt the query above from a recursive query into an iterative one to make it work on SQL 2005.
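Here is what the controlled, iterative version might look like, sketched in Python rather than T-SQL. The call records are hypothetical, and it only follows outgoing calls, as the query above does.

```python
def contacts_within(cdr, target, max_hops):
    """Return {number: hops} for numbers reachable from target in max_hops calls."""
    calls = {}
    for caller, callee in cdr:
        calls.setdefault(caller, set()).add(callee)

    found = {}
    frontier = {target}
    for hop in range(1, max_hops + 1):
        next_frontier = set()
        for caller in frontier:
            for callee in calls.get(caller, ()):
                # Skip the target itself and numbers already seen at a closer hop.
                if callee != target and callee not in found:
                    found[callee] = hop
                    next_frontier.add(callee)
        if not next_frontier:
            break
        frontier = next_frontier
    return found


# Hypothetical call detail records as (caller, callee) pairs.
CDR = [("100", "200"), ("100", "300"), ("200", "400"),
       ("400", "500"), ("300", "100")]
print(contacts_within(CDR, "100", 2))
```

The max_hops argument is the control on the iterations: two hops is a manageable network, while six would probably cover most of the phone book.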

June 11, 2006

Connecting Sharepoint to SQL 2005 Report Server

It seemed simple: Export OLAP reports from SQL 2005 Reporting Services into Sharepoint. I like Sharepoint because it solves a ton of problems in organizations. I'm still surprised at how many Microsoft shops don't use Sharepoint because it's free and it integrates into Active Directory. (Sharepoint Portal Server, a different product, costs money, scales more and is personalizable.) All you need for Sharepoint is IIS and SQL or the MSDE; and FrontPage 2003 if you want to edit graphics. Microsoft has a lot of Sharepoint resources available for download, but they're not well organized.

The details slowed me down a few hours. There are several different ways of configuring security contexts, and you will have to keep your accounts and passwords straight. I have yet to find a step-by-step on Technet, but I'm still looking. I did see a page showing cool OLAP reports in Sharepoint on Technet, but no link to help me set it up.

The biggest problem that I've seen many other folks have is the 400 Bad Request error in the /Reports ReportManager Virtual Directory. /ReportServer worked the first time, but without the ReportManager Virtual Directory, it's not so useful. At first I thought this was a DCOM security issue because of the event log entries I got. (Ten of these on the first request for http://myreportserver/reports after restarting IIS and then no more until restarting IIS.)

The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID {BA126AD1-2166-11D1-B1D0-00805FC1270E} to the user NT AUTHORITY\NETWORK SERVICE SID (S-1-5-20). This security permission can be modified using the Component Services administrative tool.

The trouble with that message is that there's no DCOM component in Component Services that corresponds to the CLSID. This didn't stop me from searching the registry for a while, finding that the CLSID is involved with about a dozen basic network services, none of which are in the Component Services MMC.

I gave up searching the Registry and I added NT Authority\Network Service to the DCOM user group on the local machine and restarted IIS. No joy. I was able to clear the event log of that error this way, but I still got the same error requesting http://myreportserver/Reports, just with no event log entries. I rechecked all the settings in SQL Server's Report Configuration Tool, which is very useful, but still didn't solve the problem.

I googled the source code on the error page:

System.Net.WebException: The request failed with HTTP status 400: Bad Request.

and found a site at MIT concerning a totally unrelated application that threw the same error. I had one other Virtual Web on the machine, so I deleted it, reset my Default Web Site to All Unassigned IP addresses, and restarted IIS. Bingo. I can manage reports over the Web -- it just takes a while to start up the first time you request http://myreportserver/Reports. I can access it from http://localhost/Reports on that box now; before, localhost requests failed, and I didn't know why.

I still have to set the right permissions for everything. I also need to choose whether to share a data connection or use the web visitor's security context. Just listing all the security contexts makes me dizzy: The Sharepoint App Pool, the Report Server App Pool, SQL Report Server Data Sources, the DCOM permissions mentioned above, and finally, your users' accounts in Sharepoint and Reports.

Sharepoint doesn't hold the Report -- it just passes your request on to the Report Server. Thus, you'll need to set permissions for the Sharepoint and the SQL Report Server. If you have Sharepoint permissions but not Report Server permissions, the Report Explorer web part will be blank.

Steps that worked for me:
1. Start with a good SQL 2005 install with all necessary components -- like Reporting Services.
2. Install IIS and ASP.net 2 if they're not installed already. I installed SQL 2005 Service Pack 1 after this step. (Make sure you have only a default web site on IIS to avoid my issues.)
3. Use the SQL 2005 Report Configuration Manager. You'll need to decide which security scheme you're going to use before you can complete this step. The Configuration Manager saves a lot of time because you won't have to touch IIS Manager. (The whole scripting IIS configurations in XML thing is going to make my IIS skills obsolete before long.)
4. Create a simple report. SQL Books Online has a tutorial using the Adventure Works database.
5. Verify that http://yourreportserver/Reports and http://yourreportserver/ReportServer work.

Now move to your Sharepoint box running WSS.

6. Use stsadm.exe to install the web part. You will find the Report Explorer and Report Viewer web parts on your SQL box (search the Reporting Services library for Sharepoint for more details):
C:\Program Files\Microsoft SQL Server\90\Tools\Reporting Services\SharePoint\RSWebParts.cab
7. Open your SharePoint site and add the Report Explorer Web Part from the Virtual Server Gallery.
8. Point the Report Explorer at http://yourreportserver/Reports and leave the start path blank for now.
9. You should be able to see your SQL Reports on your Sharepoint site.

My example runs on two boxes: SQL 2005 and Reporting Services/IIS on one box, (along with the Exchange 12 Beta), and my Sharepoint on another box. Sharepoint doesn't seem to run on the same box as the Exchange 12 Beta.

June 9, 2006

Your (Firewall) Data are Ugly. Please Fix It.

Data warehousing and data marts would be simple to construct if only the data were in a standard format. Five years from now, businesses will take OLAP for granted. (OLAP is a fancy way of saying we're going to automate the sums and averages of your sales data over time so you don't have to do all that stuff in Excel any more.) Five to ten years from now, businesses will live or die by their data mining algorithms. (I classify DM as a step above standard OLAP.) Before this can happen, the data have to be available in a usable form.

I come from an information security background, thus I spend far too much time poring over computer logs: web server access logs, firewall logs, Windows event logs, not to mention /var/log/*. I have learned lots of stupid log tricks, like using logwatch, grep (my favorite), Snare to send Windows logs to syslog, and now, Microsoft's free Logparser tool. Logparser has poor documentation but will certainly pay you back for time taken to learn to use it. There's even a non-Microsoft site dedicated to logparser.

Note: Syslog does not store data in 3NF rows. If you want to be able to sort by fields with destPort, sourcePort, sourceIP, destIP, without doing text search, you'll be doing a LOT of ETL work.
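
To make that note concrete, here is a minimal sketch of the kind of ETL involved, in Python. The log line and the pattern are made up for illustration; every firewall writes its own layout, which is exactly the problem. The field names follow the ones in the note above.

```python
import re

# Hypothetical firewall syslog line; real devices each use their own layout.
SAMPLE = ("Jun  9 04:02:11 router fw: DROP TCP "
          "src=203.0.113.7:4312 dst=192.0.2.10:445")

# Pattern written for the made-up format above, not for any real product.
LINE_RE = re.compile(
    r"(?P<action>\w+) (?P<proto>\w+) "
    r"src=(?P<sourceIP>[\d.]+):(?P<sourcePort>\d+) "
    r"dst=(?P<destIP>[\d.]+):(?P<destPort>\d+)"
)

def parse_line(line):
    """Return a dict of fields, or None if the line doesn't match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    row = match.groupdict()
    # Ports become integers so they can be sorted and filtered numerically.
    row["sourcePort"] = int(row["sourcePort"])
    row["destPort"] = int(row["destPort"])
    return row

row = parse_line(SAMPLE)
print(row["sourceIP"], row["destPort"])
```

Each vendor would need its own pattern, and lines that don't match come back as None so they can be set aside for manual cleanup.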

This week I was thinking about replacing my firewall/router (a Netopia R9100 with the hardware VPN upgrade that I trade off with a Linksys WRT-54GS (v3) when I'm not paranoid about using wireless). And yes, I'm not supposed to tell you that, but it doesn't really make a difference if we're both using nmap. So I looked at firewall vendors' websites to learn what I could about logging capabilities. I'm slightly less concerned about security in my home lab than I am about collecting data on attacks. Firewalls have been around for over ten years now, so you'd think they would have logging down.

Watchguard: several logging options, including syslog and XML, SNMP costs extra.

Juniper/NetScreen: syslog, SNMP, NetIQ (If I feel like paying for that, too.)

Checkpoint: "Eventia Reporter™ is a complete reporting system that delivers in-depth network security activity and event information from Check Point log data." This means I can look at CheckPoint logs, but I can't correlate them to anything else. This Checkpoint vs. Cisco page is also interesting.

SonicWall: "ViewPoint®, Local Log, Syslog, WebTrends" I can pay extra for SonicWall's "Viewpoint" product, but I still can't correlate SonicWall logs to any other logs. One SonicWall includes a "secure" switch in their firewall: I would love to see what happens when I try an arp spoof. (If I wanted a switch, I would buy one.)

Cisco PIX: SNMP, Syslog, and AAA ("Authentication, Authorization, and Accounting Support") It does Cisco logging. It also has a CLI. (Command-Line Interface.) Unless Cisco starts giving me free hardware, I'm not sure why I'd use a PIX. If I blow a command, my network is not secure. A CLI is fine when it's obvious if a command is working or not, as with routing, but with firewalls, it makes me nervous. Then again, you should test every port after entering a rule change on your firewall.

Microsoft ISA Server: "ISA Server 2004 provides detailed security and access logs in standard data formats, such as delimited text files, Microsoft SQL Server databases, or SQL Server 2000 Desktop Engine (MSDE) databases."

I don't even like software firewalls, but Microsoft makes it easy for me. At $1,500 plus $250 for decent software, Watchguard is more expensive than ISA server. Checkpoint and Juniper won't even tell me how much their products cost. Sonicwall, Watchguard, and ISA Server are all priced on CDW.

If firewall data are this disparate, I can't imagine what a pain it must be to build data warehouses with data from other sources. Current firewall products seem to create their own silos and make it difficult to track intruders across a network rather than just at the perimeter.

June 6, 2006

The Sum of All Ports, coming to a SQL server near you.

Using syslog, MS SQL 2005, SQL Server Analysis Services, and MS Excel, I can build a cube with my firewall log violations and then import the cube into Excel and produce pivot tables. While this might seem more complicated than it needs to be, I could produce a daily scorecard of attacks. The only catch is that I need a firewall that logs to SQL server or a syslog to SQL server connector. The syslog => SQL connection would be tough because my router/firewall doesn't do uniform syslog notifications. I know enterprise-level firewalls do much better logging, like the Watchguard X-series which I was fond of just because I could make them do almost anything. The last time I checked, though, they still cost $1,500 for the base model plus $500 for the appropriate software.

With the Watchguard's new XML logging, I could create a SQL Server Integration Services package to import the data regularly. From there, I could get SQL Server Analysis Services to process my cube each night. Then I could use Microsoft Sharepoint's Scorecard or OLAP web part to display statistics. Best of all, I wouldn't have to mess with doing my own manual extract-transform-load (ETL) of my router log data.

The graph below represents a simple count of attacks by port on my router. Port 0 corresponds to ICMP. (I don't respond to ping requests.) The rest of the ports are closed, except for port 80, which you're using now. I ban a few IPs on port 80 because they won't stop posting junk trackbacks onto my blog. The ports are in alphabetical order rather than numerical order because I must store them in text fields rather than numerical fields in the database. If the port numbers aren't text then SSAS will OLAP them and I'll end up with the sum of all ports, which is nonsense but nevertheless might make a good statistic for MBA-types. While the graphic may not be all that impressive, the scalability is. Using SQL and SSAS, I could track probes and attacks on hundreds of firewalls at a time, track trends over time, and even predict the level of future probes.
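
The ordering problem and the "sum of all ports" are easy to see in a few lines of Python. This is only a sketch with made-up probe counts, not data from my router:

```python
# Made-up probe counts keyed by port, stored as text as described above.
probes = {"80": 12, "135": 40, "0": 7, "445": 25, "22": 9}

# Text sort compares character by character, so "135" lands before "22".
text_order = sorted(probes)

# Numeric sort is what a human expects to see on the axis.
numeric_order = sorted(probes, key=int)

# Treating ports as numbers also invites a meaningless aggregate.
sum_of_all_ports = sum(int(port) for port in probes)

print(text_order)
print(numeric_order)
print(sum_of_all_ports)
```

The port number is a label, not a measure; only the counts are worth adding up.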

Probes by Port

June 5, 2006

Assessing Attacks; or 18th Century Epistolary Novels vs. Data Structures

Being assigned a data warehousing/data mining project for class sounds like fun, but where am I supposed to get a data set? I can buy a database of all area codes and exchanges with latitude and longitude, but I would still have to simulate a hundred million records to address scalability and query optimization issues. Then I could find out if my estimations of the size of records are within a factor of ten, but the networks I see still wouldn't be "real" and I would have no idea if that's what real social networks looked like. (As an undergrad English Lit major, I was reading 18th Century epistolary novels instead of taking Data Structures like my Computer Science major classmates. The sad part is that Data Structures would have been more interesting.)

Fortunately, data magically appear on my Linux box every day.

Each morning at four am, logwatch runs on my Fedora Core 4 (Red Hat Linux) box. It tells me how many times nonexistent files on my webserver have been requested, and how many router firewall violation attempts have been logged. It also tells me how many times Apache logged a "method not allowed" 405 code. I have several daily log files that give me useful information on attacks. The problem is that there are so many attacks that if I banned every IP that looked for a web application hole or probed a port I wouldn't have time for anything else.

So it makes sense to look for attack source IP (Internet Protocol) addresses that probe my router AND request holes in web apps. To do this I need three files: my router log from syslog, and two greps of all my Apache logs looking for 404 and 405 errors. (grep -h will suppress file names at the beginning of each line.) This gives me three tables, which I can inner join on source IP. Of course, I have to do some tedious data cleanup to get the text log files into Excel and from there Access. (I always underestimate the time it takes to clean up data.) From Access, I'm going to go to SQL 2005, Analysis Services, and build a cube. From there I should be able to "see" the attacks using Pivot Tables in Microsoft Excel.

If I see a source IP in my router log and Apache error logs, then it's probably worth banning. Correlating IP addresses to identify those involved in multiple methods of attack takes me from hundreds of IP addresses down to six.
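
The correlation itself is just an intersection. Here is a minimal sketch in Python, assuming the source IPs have already been pulled out of each file; the addresses are made up (documentation ranges), and the set names are mine.

```python
# Made-up source IPs pulled from each of the three files.
router_ips = {"198.51.100.4", "203.0.113.7", "203.0.113.99", "192.0.2.55"}
apache_404_ips = {"203.0.113.7", "198.51.100.4", "192.0.2.200"}
apache_405_ips = {"203.0.113.7", "192.0.2.55", "198.51.100.4"}

# Equivalent of inner-joining the three tables on source IP.
in_all_three = router_ips & apache_404_ips & apache_405_ips

# Looser rule: seen at the router and in either Apache list.
router_and_web = router_ips & (apache_404_ips | apache_405_ips)

print(sorted(in_all_three))
print(sorted(router_and_web))
```

The stricter rule gives the shortest ban list; the looser one catches hosts that probed the router and tripped only one kind of Apache error.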

May 31, 2006

Summer of SQL and Data Mining

Summer may be the best semester at George Washington University because all the undergrads are gone. The Marvin Center is almost completely vacant, and the food court is closed. I am taking two electives this summer, which will give me more electives than I need. Why am I doing this? I’m actually learning something useful. People keep asking me why I don’t go for the CIO Certification. Instead of doing databases, I’d be taking MGT 272 Information Resource Management and MGT 274 Survey of Advanced IT Technologies. I have yet to read a single government job description that even mentions CIO Certification. (The General Services Administration “invented” the CIO Certification, but the Office of Personnel Management sets job standards.)

I’ve taken academic classes in Information Systems at GWU for a while, and I have also taken classes at Learning Tree in SQL, Exchange, Solaris, Security and the programming language, C. The more advanced my classes get at GWU, the more they resemble a Learning Tree class, with one exception: at GWU, they teach theory and practice. At Learning Tree, it’s just practice. You can learn how Microsoft SQL Server works without learning a thing about normalization. Learning Tree is training database administrators, not database designers. My professor for both classes, John Artz, argues that vendor certifications will include more theory in the future; otherwise the vendor certs will become less relevant over time.

So what are my classes? Data Warehouse Design; and Database and Expert Systems. Database and Expert Systems includes no expert systems (I covered them in Decision Systems), and is mostly T-SQL for Microsoft SQL Server 2005. Data Warehouse Design is mostly theory with some implementation on SQL 2005 Analysis Services. Relational database theory hasn’t changed in about 30 years now, so you’d think I would have learned more relational database management systems earlier, but hey, it’s easy to become distracted with security and email and the web servers and Linux. Not that I haven’t used plenty of relational databases as back-ends to applications – I just didn’t think about the relational algebra that drives my queries.

While I can’t post class notes here, I can tell you which books the professor has chosen. When I’m searching for technology books, it’s hard to tell which ones are good.
Database and Expert Systems:
Dusan Petkovic’s SQL Server 2005: A Beginner’s Guide
Ken England’s Microsoft SQL Server 2000 Performance Optimization and Tuning Handbook. (There isn’t one for SQL 2005 yet.)
Data Warehouse Design:
Teo Lachev’s Applied Microsoft Analysis Services 2005 and Microsoft Business Intelligence Platform.
(And the John Artz manuscript.)
Both classes:
Connolly and Begg’s Database Systems. This is the book I wish my database class from last semester had used. It covers a lot more than that class's textbook, Hoffer, Prescott and McFadden’s Modern Database Management.

May 20, 2006

Google Analytics: Malkovich of the Internet?

If you’ve ever seen Being John Malkovich, you should remember what happens when Malkovich himself enters the Malkovich portal: everyone is John Malkovich and all they say is “Malkovich Malkovich.” On the Internet these days, I feel like it’s “Google Google.”

After months of waiting, I got my free Google Analytics invitation. I had set up Urchin reporting on a number of sites years ago, and I was disappointed that I couldn’t buy Urchin anymore because Google had swallowed them whole. Google Analytics is even better than Urchin was, and it includes Urchin’s campaign and e-commerce tracking modules. I also liked Urchin a lot more than Webtrends, which seems to have grown into bloatware since my first (positive) encounter with Webtrends in 1999. The trouble with Webtrends and Omniture is that they are focused on big business, because that’s where the big money is. Google Analytics is simple enough to use on this blog, but can scale as large as is needed. GA gives me the same tools that the big guys have. Thus, Google will make its money scaling out rather than scaling up.

Unlike Urchin, Google Analytics doesn’t run on my server. The JavaScript script goes back to Google, and I’m sure they can also see how many people are visiting my sites. On the up side, I didn’t have to remember to reconfigure Apache logging.
This is what I used to have to enter as root into httpd.conf and hope that I didn't make any typos:

LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\"" special 

Also, the Googlebot doesn’t show up in Analytics reports. Browsers that reject third-party cookies won’t be track-able. And Google will know a lot more about all of our browsing habits once more sites start implementing Google Analytics.

As far as the markets go, Google Analytics is a shot at other web analytics packages, and it may surpass them because Google can integrate your Google Adwords into your Google Analytics. (Malkovich Malkovich.) This is something that others can’t do, because Omniture and Overture are separate. Of course, Google Analytics, Adwords, and everything else Google integrates with your Gmail account. Before long, Google may be able to offer IP address resolution into Google Earth. (Malkovich Malkovich Malkovich)

I also recently created Google sitemaps for a couple of sites. Google uses sitemaps to find new pages and changed pages on sites faster. The sitemap generator is a Python script that looks at your log files and your web file system and creates an XML file that the Googlebot can download. After creating the sitemaps, the two sites have gotten a lot of hits from the Googlebot: 1,671 for this site, and 5,000 or so for the other site, which gets a lot more traffic than mine. By getting me to do some of the work of indexing, Google indexes more efficiently.
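
For a sense of what such a file looks like, here is a minimal sketch in Python. It is not Google's generator script. It assumes the sitemaps.org 0.9 schema (which may differ from what the generator emitted at the time), takes a hand-written list of pages instead of reading logs, and uses example.com as a stand-in; `build_sitemap` is a name I made up.

```python
import xml.etree.ElementTree as ET

# Namespace of the public sitemap protocol (an assumption for this sketch).
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: list of (url, last_modified_date_string) tuples."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages; example.com stands in for a real site.
xml_text = build_sitemap([
    ("http://www.example.com/", "2006-05-20"),
    ("http://www.example.com/archives.html", "2006-05-13"),
])
print(xml_text)
```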

Before long, Google will know more about traffic on the web than the NSA knows about traffic on the Public Switched Telephone Network.

May 13, 2006

NSA has Greatest Sociological Dataset of the 21st Century

Between the phone records, emails, and instant messaging, the NSA can now map the social and business fabric of America and see how it is changing over time. By collecting our phone records, the NSA has created a dataset that can map our social connections. As I've mentioned in my previous post, the NSA can now map the calls we make and the calls our callees make, creating a giant tree of connections. A wrong number might get you placed on a no-fly list, but I'm not going into morality. (The current administration is the de facto law, so legality becomes a moot argument.) I'm just going to explore how they might use this data and what they could find from it.

The phone call data structure (date/time, origin number, destination number, duration, ID) could easily be used to scan emails and instant messaging as well (date/time, origin address, destination address, length, ID). Using the same reflexive queries, the NSA can track our social and business connections through email. They could also tell how those annoying chain letters get spread (Good Times virus warning: forward this to all your friends). However, the size of each record would be larger, probably more than double the ~48 bytes required for a phone record. The size would slow down the query and require more storage. Also, geocoding the emails would be more difficult, although it could be done based on where the person writing them maintains a physical address, or on the originating Internet Protocol address.

With all this data, the NSA is in a position to know the difference between social networks that use telephones and those that use email. If they've had this program in place for long enough, they could even recognize trends in social communication. With the right set of queries, you could spot the difference between neighborhoods where people know each other and neighborhoods where nobody knows each other and plot them in different colors on a map. What percentage of Americans call their mom on Mother's Day? In which neighborhoods do people call their mothers most often?

It would also be somewhat easy to plot our connections in a GIS tool such as Google Earth. You would just export your target connection tree into an XML kmz/kml file and import it into Google Earth or ESRI's ArcGIS. The graphs derived from the data would be anonymous. (Of course, research ethics might prevent us from using the data.)
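
As a rough illustration of that export step, here is a Python sketch that writes connections as KML line strings. It assumes the OGC KML 2.2 namespace and already-geocoded endpoints; the coordinates are approximate and made up for the example, and `connections_to_kml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def connections_to_kml(calls):
    """calls: list of (label, (lat, lon), (lat, lon)) tuples."""
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    doc = ET.SubElement(kml, "Document")
    for label, (lat1, lon1), (lat2, lon2) in calls:
        mark = ET.SubElement(doc, "Placemark")
        ET.SubElement(mark, "name").text = label
        line = ET.SubElement(mark, "LineString")
        # KML wants longitude first, then latitude, then altitude.
        ET.SubElement(line, "coordinates").text = (
            f"{lon1},{lat1},0 {lon2},{lat2},0"
        )
    return ET.tostring(kml, encoding="unicode")

# Approximate city coordinates, used only as sample endpoints.
kml_text = connections_to_kml([
    ("call 1", (40.71, -74.01), (48.86, 2.35)),
    ("call 2", (48.86, 2.35), (31.95, 35.93)),
])
print(kml_text)
```

Saving the output with a .kml extension should be enough for a viewer to draw one line per connection.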

Sadly, none of the derived information will ever see the light of day. Academically, it would produce fascinating results that could teach us about how we communicate with each other and how our communications are changing. Then again, no local, state, or federal government information system about citizens has ever been immune from misuse. IRS workers read their neighbors' tax returns. Cops run license "plates for dates," and use NCIC (National Crime Information Center) criminal background checks to help those running for election denounce their opponents. If a dataset about us exists, it will be abused by those with access, but appropriate security controls should prevent that.

May 11, 2006

How the NSA Might use our Phone Records

Today, USA Today reported that the National Security Agency has been collecting domestic phone records of many of us U.S. citizens. Unlike everyone else blogging on this today, I'm taking no position on the ethicality of this activity. Instead, I'm going to tell you what I would do with those phone records from the perspective of a database geek. There's plenty of other analysis going on elsewhere, and I'm no constitutional lawyer.

I've been using Vonage for a while now, and I have access to my own phone records on the computer. It's easy enough to cut and paste my Vonage call records into Excel and from there into Access. From Access, I can easily export/import them into the Relational Database Management System of my choice, which for now is MS SQL 2005. However, there are many more out there.

Each record looks something like this: Date, Time (you can combine these into a LongDate), From phone number, To phone number, Duration, and a unique transaction ID. I get all this for incoming and outgoing calls. It's great for anyone that does billing for phone time. I'm assuming that these are the same kind of records that the NSA gets. Once the NSA gets these records, they do a data transform to make all the fields fit into their system in a uniform manner. Since the data is already fairly simple, they don't have to do much, and even a moderately skilled programmer like me could write something to transfer phone records almost as fast as they could get them.

If I had phone records from other people, I could combine them with my phone records into one massive table (relation, in database-speak). I could then do a reflexive query on them to pull a list of all the people I had contact with, through incoming or outgoing calls. I could then do another query to pull all contacts of all the people who had called me; this would show me my friends' friends. If I had access to more data about the phone numbers, say through geocoding (a fancy way of saying latitude and longitude attached to each phone number), I could create a map and track a phone tree. If I call someone in New York, and they call someone in Paris, and the person in Paris calls someone in Amman, I could draw lines making the connections on a map.
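
A reflexive query is just a table joined to itself. Here is a minimal sketch using Python's built-in sqlite3 module rather than MS SQL, with a handful of made-up call records; the table, view, and column names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE calls (
        id INTEGER PRIMARY KEY,
        call_time TEXT,
        from_number TEXT,
        to_number TEXT,
        duration INTEGER
    )
""")

# Made-up records using fictional 555-01xx numbers.
conn.executemany(
    "INSERT INTO calls (call_time, from_number, to_number, duration) "
    "VALUES (?, ?, ?, ?)",
    [
        ("2006-05-01 09:00", "202-555-0100", "212-555-0111", 120),
        ("2006-05-01 10:00", "212-555-0111", "331-555-0122", 300),
        ("2006-05-02 11:00", "331-555-0122", "962-555-0133", 60),
        ("2006-05-02 12:00", "703-555-0144", "202-555-0100", 45),
    ],
)

ME = "202-555-0100"

# Flatten each call into both directions so incoming and outgoing both count.
conn.execute("""
    CREATE VIEW contacts AS
    SELECT from_number AS a, to_number AS b FROM calls
    UNION
    SELECT to_number AS a, from_number AS b FROM calls
""")

# First hop: everyone I have talked to.
friends = [r[0] for r in conn.execute(
    "SELECT DISTINCT b FROM contacts WHERE a = ? ORDER BY b", (ME,))]

# Second hop: the view joined to itself gives my friends' friends.
friends_of_friends = [r[0] for r in conn.execute("""
    SELECT DISTINCT hop2.b
    FROM contacts AS hop1
    JOIN contacts AS hop2 ON hop2.a = hop1.b
    WHERE hop1.a = ? AND hop2.b <> ?
    ORDER BY hop2.b
""", (ME, ME))]

print(friends)
print(friends_of_friends)
```

Each extra hop is one more self-join, which is why the query cost climbs quickly as the tree gets deeper.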

For this level of tracking to work, the NSA has to have absolutely all the phone records they can possibly get their hands on. If they have a target talking to someone and that someone talks to someone else and the NSA's records drop at the first friend of the target, they're lost. It would be a dead end. If they get all the records, the creation of a massive data warehouse that shows connections between people is pretty much academic. The budget for doing all this has dropped dramatically over recent years: you might be able to do it with a couple of Netezza data warehouse appliances. Rumor has it the NSA was Netezza's first customer. All the hardware to do it might cost under a million dollars. The tricky part, as with all data mining projects, is getting good data, and the NSA has that problem solved.

The hardest part left for them is scalability: they're trying to drink from a firehose, but the records aren't that big, which makes it feasible. You might be able to store all the number-only data in a record as short as 40 bytes: LongDate, Number, Number, Number, Number. (I'm not going to get into data types in depth here, but let's assume we can store phone numbers as numbers and not text to save space.) Thus one million phone records would occupy 40 megabytes. If the US makes a hundred million phone calls a day, that's about 4 GB a day of data. Large, but manageable if you have a large budget. Even if you double the key identifier size to 16 bytes (to cover hundreds of millions of calls) you're still only up to 4.8 GB per 100 million calls.
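
The arithmetic is simple enough to check in a few lines of Python. This sketch assumes five 8-byte fields, which is what the 40-byte total implies, and decimal gigabytes; the yearly figure is just the daily number times 365.

```python
CALLS_PER_DAY = 100_000_000   # the hundred million calls a day assumed above

# Five 8-byte fields: LongDate plus four numbers.
base_record_bytes = 5 * 8
# Doubling one 8-byte identifier to 16 bytes adds 8 bytes per record.
wide_record_bytes = base_record_bytes + 8

def gigabytes(record_bytes, records):
    """Decimal gigabytes (10**9 bytes), matching the round numbers above."""
    return record_bytes * records / 10**9

daily_base = gigabytes(base_record_bytes, CALLS_PER_DAY)
daily_wide = gigabytes(wide_record_bytes, CALLS_PER_DAY)
yearly_base = daily_base * 365

print(daily_base, daily_wide, yearly_base)
```

At these sizes a full year of number-only records comes to roughly a terabyte and a half before indexes, which is consistent with the "large, but manageable" estimate.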

Only after you've identified a target would you want to create a join query that connects names and addresses with phone numbers; this would be far more efficient than attaching names to the phone record tables, and would give the NSA a chance to say they're recording numbers only. If the NSA uses a consumer data company like, say, Acxiom, to get information on phone numbers post-targeting, then they're not even subject to the Freedom of Information Act or US Privacy Law.

The end result is that the NSA has the capability to map our social and business networks; given enough time and hardware, they could even plot them on satellite photos, creating a cool mish-mash of lines across neighborhoods. They could create files on us all like Friendster lists our friends and their connections. Whether the NSA's system actually works efficiently, we'll never know.

May 10, 2006

The Job Interview

I used to think it happened only to me, and then I read Dilbert last Saturday. My interviewers are asking more and more real-world technical questions about real problems they have. More often than not, I can solve them in a couple of sentences. And later I wonder if they really had a job to offer or if they were just looking for cheap solutions.

Even months later, I can see users from my interviewers' companies reading my blog entries about how IIS, SQL, AD, and SharePoint work together. Thank you, Google Analytics. And no, I would never name your company here, because it just wouldn't be professional.

May 5, 2006

Finals: Neural Nets in a Nutshell?

I have three finals: one Monday evening and two Wednesday evening. The bad part about having two finals on one evening is that it's two exams for which I have to prepare. The good part is the crossover between the two classes: both cover neural networks. Although the topic is definitely on my database exam, my decision support systems class covered it better. The Marakas text had problems, including a typo and a cut-off paragraph, but it explained neural nets better than the database textbook or the db professor did, and I've gone over his PowerPoint several times looking for a good definition. Marakas:

neural networks attempt to mirror the way the human brain works in recognizing patterns by developing mathematical structures with the ability to learn.

One type of human intelligence is the ability to recognize patterns, and then learn to recognize patterns better. Thus, neural networks are one form of artificial intelligence.

If you search the web, though, you'll find as many definitions of neural networks as you do result pages.

April 21, 2006

I move this blog from dot.Net to SixApart.

While I liked my Presstopia blog fine and knew how to do a few customizations to it, I didn't have categories. There was no one page where I could go and find all the entries about script attacks or Exchange 12. I was also curious about updates in Six Apart's Movable Type 3.2, since I haven't done an MT install since about MT 1.6.something, along with comment assassin. With MT 3.2, all comments must be approved and I get emails about them.

By no means is this move based on Microsoft vs. Open-Source whatever. I am simply choosing an available tool, just like with anything else. It's not like MT is free either if I have more than one author.

The migration went OK -- I had to write a simple VBScript to convert my old entries to a format MT could understand. My old blog, Presstopia, was fine but didn't have any specific export capabilities other than Atom and RSS. Why MT can't import directly from RSS or Atom is beyond me. Why Presstopia can't shoot out all entries from RSS is a mystery -- unless I get source code access. Thus I wrote another SQL connect string and some VBScript to format the date properly. I have a lot to learn about date formatting from SQL 2000 in ASP. And about dot.Net.
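
For anyone facing the same conversion, here is a rough Python sketch of the reformatting step. It is not the VBScript described above. The field layout and separators are my recollection of Movable Type's plain-text import format and should be checked against the MT documentation before use; `to_mt_entry` and the sample values are made up.

```python
from datetime import datetime

def to_mt_entry(author, title, posted, body):
    """Format one entry in what I believe is MT's import layout (assumption)."""
    # Assumed date layout: MM/DD/YYYY hh:mm:ss AM|PM.
    date_text = posted.strftime("%m/%d/%Y %I:%M:%S %p")
    return (
        f"AUTHOR: {author}\n"
        f"TITLE: {title}\n"
        f"DATE: {date_text}\n"
        "-----\n"
        "BODY:\n"
        f"{body}\n"
        "-----\n"
        "--------\n"
    )

# Hypothetical entry standing in for a row pulled from the old blog's database.
entry = to_mt_entry(
    "me",
    "Is Cheating Rampant at GWU?",
    datetime(2006, 3, 21, 19, 30, 0),
    "Monday, March 20, 7:30 PM, Duques Computer Lab, GWU.",
)
print(entry)
```

The date line is where most of the fiddling goes, since the database hands back dates in whatever format the server locale prefers.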

Installing MT wasn't that bad, except Image::Magick is AFU on Fedora Core 4. It won't make because of some kind of language error that turns variables into foreign phrases, leaving me with thousands of error messages. I like NetPBM better, anyway. And MySQL permissions are funny. Localhost is NOT included in ANY in the host permissions section. D'oh.

Now I can compose within the blog itself, rather than saving in Word or something first because my session would time out before I hit the "post" button. MT is slower on the edit response time but reading is faster, since it's static HTML rather than a VB call to SQL. And my content is no longer held hostage to a system with limited export capabilities.

March 21, 2006

Is Cheating Rampant at GWU?

Monday, March 20, 7:30 PM, Duques Computer Lab, GWU. You can tell it's time for midterms at GWU again because the computer lab is full of students collaborating on take-home computerized midterms running through Blackboard.
One student takes the exam, the others gather around, write down the questions, and look up answers on Google. Later, the others will use the written-down questions to look up answers before they start the timed on-line version. It's a decidedly low-tech form of cheating in the information age.

Surprising factors: how loud they are, how little shame they have, and how nobody cares. So clueless were the students that I drafted this entire entry while sitting next to them. The professor could allow teamwork on midterms, but I highly doubt it. Then again, this is GWU’s Business School, training the executives of tomorrow, so perhaps it’s a nod to the reality of the business world.

While GWU specifically prohibits this kind of activity in its Code of Academic Integrity, it's pretty meaningless when the administration admits athletes to GWU who never completed high school. The difference between what is said and done increases as universities try to compete athletically and academically. Both student and administrative cheating devalue the credential for which I'm paying with time and money.

Until GWU's President, Stephen Joel Trachtenberg, takes a stand on one, the other is not going away any time soon.

Footnote: In Freakonomics, Steven Levitt and Stephen Dubner discuss Paul Feldman’s self-serve bagel business. Feldman’s honor-system payment and meticulous record-keeping showed that executives were more likely to steal bagels than lower-level workers. Feldman attributed the difference to an “overdeveloped sense of entitlement” while Levitt and Dubner suggest that “cheating was how they got to be executives.”

January 22, 2006

Grad School Textbook Blunders

My information system classes just started again and I’m actually enjoying Decision Support Systems. For the most part, the textbooks (which cost more used at the GWU Bookstore than new on Amazon) are excellent standards of information systems. However, when they go off-topic or try to predict trends, they start to, well, go off a bit.

George Marakas’ Decision Support Systems In the 21st Century is good from what I can tell after chapters 1, 2 and 13, but chapter 13, “The Systems Perspective of a DSS,” has some funny statements: “Because the World Wide Web can be easily accessed via any type of available hardware, end users having Windows, MAC-OS, LINIX [sic], UNIX, or even a home television Web service can easily share and access the DSS application.”


If you’re going to publish a textbook on DSS and get a university to charge me outrageous sums for it, you could at least spell Linux correctly. It makes me doubt the accuracy of the entire chapter.


Also, read this: “The current standard for Web-base applications is JAVA. Although showing great promise, JAVA is not yet capable of providing fast, reliable performance in complex application environments, primarily because it is interpreted, rather than compiled into machine code.” Personally, I would disagree with the statement on several levels. Java is a standard, but not the standard. Most languages used to process web requests, like ASP, PHP, Perl, and Python, are interpreted, and can run on several platforms. I believe interpreted languages are far more common for web applications than compiled languages, like C or C++.


My database textbook, Modern Database Management, says this: “The Bluetooth wireless standard will greatly accelerate development of wireless PDAs that connect to the Internet. This development will accentuate the importance of protecting data security in an increasingly wireless world.”



I would agree with the second sentence, but not the first. Bluetooth is used to synchronize mobile devices with your laptop or desktop, or to print wirelessly, not so much to connect to the Internet. That would be Wi-Fi, WiMAX, CDPD, CDMA, GSM, TDMA, FLEX, Mobitex, etc., but not so much Bluetooth. I am nitpicking, but for the cost of grad school and textbooks, you’d think they would get it right, even when they go off-topic. The authors would be better off sticking to what they know, but who knows, I could be wrong. What do you think?

November 9, 2005

All Classes will be on Wednesdays this Spring.

I just registered for the spring semester. All classes I need are on Wednesday evening, so I will be in class from 6pm until 10pm.
Wednesday:
Computerized Decision Systems - MGT 226 10
Database Systems - MGT 284 10
Monday: (my elective)
Topics in Higher Level Languages (Java) - MGT 283 10
I wasn't planning on a higher-level language this spring, much less Java, but the professor has her own book on object-oriented programming using Java, so it's hard to resist. Maybe I'll learn something.

All the other electives I wanted to take were also on Wednesday evening.