Distributed Computing

A number of themes appear to be converging to create what might just be the only genuinely new thing in computing for years. I've been interested in parallel processing & grid computing for some time, since first looking at the Transputer some years back, and later watching developments in specialist applications written to run in parallel, such as SETI@home. In my day job I've spent the last 10 years at Nortel Networks, in a variety of areas from access, through wireless, to optical transport, and have noticed a number of developing themes. As communications costs fall and compute systems become more open and modular, larger systems are beginning to be built from dispersed components connected together across cheap bandwidth.

One of the reasons that parallel computing remains a specialist niche is the difficulty of writing software for a parallel system. I did a course in parallel processing at university, which required learning a parallel language, so I've experienced some of the difficulties first-hand. It really is a different way of thinking, more akin to hardware than software, where things happen at the same time rather than in sequence. You can't just take a normal piece of code designed to run on one processor and expect it to run faster by executing it on a parallel computer.
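As a toy illustration of that last point, here is a minimal Python sketch (my own example, not drawn from any of the systems mentioned in this post). A calculation where every element is independent splits across processors trivially; one where each step feeds the next does not, and needs rethinking before it can be parallelised.

    # A minimal sketch of why sequential code doesn't parallelise for free.
    from multiprocessing import Pool

    def square(x):
        # Each call depends only on its own input, so the work can safely
        # be split across processors ("embarrassingly parallel").
        return x * x

    def running_total(values):
        # Each step depends on the result of the previous one, so this loop
        # can't simply be chopped up and handed to several processors.
        totals, total = [], 0
        for v in values:
            total += v
            totals.append(total)
        return totals

    if __name__ == "__main__":
        data = list(range(10))
        with Pool(4) as pool:
            print(pool.map(square, data))   # parallelises cleanly
        print(running_total(data))          # stays sequential without a redesign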

Computers are becoming more connected than ever. Thanks to advances over the last few years in high-capacity transport, and the growth of the Internet, the world is better connected than it has ever been. The Internet has so far reached about 800 million people, and is projected by some analysts to reach more than a billion by 2007. Ad-hoc wireless mesh networking is also starting to connect standalone devices into networks. See note on motes & ZigBee.

Open transaction standards are enabling individual companies to build applications that talk directly to applications built by other companies. The growth of XML-based systems connected to the Internet is one example of this. IDC talk about 'Dynamic IT', where there is a migration from large monolithic enterprise applications (such as SAP) to a set of open platforms (J2EE/.NET) running a set of best-in-class applications. HP talk about the Adaptive Enterprise, and IBM about On-Demand computing.
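To make "applications talking directly to applications" concrete, here is a minimal sketch in Python of the kind of exchange involved: one company's system builds a plain XML document and the other parses it, with nothing shared between them except an agreed schema. The element names and order details are invented purely for illustration.

    import xml.etree.ElementTree as ET

    # Company A builds a purchase order as plain XML...
    order = ET.Element("purchaseOrder", number="10452")
    ET.SubElement(order, "item", sku="A-1001", quantity="25")
    ET.SubElement(order, "deliverTo").text = "Harlow, UK"
    payload = ET.tostring(order, encoding="unicode")
    print(payload)  # in practice this would be POSTed to the supplier's URL

    # ...and Company B, on a completely different platform, parses it.
    received = ET.fromstring(payload)
    item = received.find("item")
    print("Order %s: %s x %s, deliver to %s" % (
        received.get("number"),
        item.get("quantity"),
        item.get("sku"),
        received.findtext("deliverTo"),
    ))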

This leads to a move away from the companies of the past, where one company built the whole system, or value chain, towards more specialised companies that focus on their core value and sell one component of a larger system. The same point can be seen in the changing labour force of 2005. Many of my previous colleagues from Nortel have given up on the idea of working for a single company full-time, and have started 'portfolio working'. They have identified their core competencies, and work with a number of larger companies on a short-term basis to fill the gaps those companies have. This gives the worker a lot of flexibility, and also allows the company to generate more value without committing itself to another full-time employee.

I believe that these themes will start to converge towards a more general case of distributed, network-attached compute, or general-purpose grid computing. There are two interesting examples today: Azul Systems, and Google & LAMP.

On the Java/J2EE side, Azul have realised that Java already requires the programmer to work within a virtual machine environment, and have built a parallel processing engine designed purely to execute J2EE code. They use the concept of Network Attached Compute, in much the same way that Brocade talk about Network Attached Storage.

I keep reading articles in Computing where some finance company or other has just pulled out a proprietary system and dropped a Linux/Intel (or 'Lintel') based system in its place, saving huge amounts of money on hardware and software. Google is probably the best example, using large numbers of Linux systems on low-cost Intel hardware to provide a large-scale distributed system for crawling the web and serving search results. Amazon and Lycos Europe are other examples. The general case of an application server built in this manner seems to be referred to as a LAMP system, comprising Linux, Apache, MySQL and PHP/Perl/Python. The three Ps are favoured because they are better at processing XML than Java.
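For anyone who hasn't come across the pattern, a LAMP page is typically just a short script that Apache runs, which queries MySQL and writes HTML back to the browser. Here is a minimal sketch in Python; the database name, credentials and table are invented for illustration, and MySQLdb is the usual Python driver for MySQL.

    #!/usr/bin/env python
    # Minimal LAMP-style page: Apache runs this script (e.g. via CGI),
    # it queries MySQL, and it prints an HTML fragment back to the browser.
    # Database name, credentials and table are purely illustrative.
    import MySQLdb

    def main():
        db = MySQLdb.connect(host="localhost", user="web", passwd="secret", db="shop")
        cursor = db.cursor()
        cursor.execute("SELECT name, price FROM products ORDER BY name")

        print("Content-Type: text/html\n")
        print("<ul>")
        for name, price in cursor.fetchall():
            print("  <li>%s &ndash; &pound;%.2f</li>" % (name, price))
        print("</ul>")
        db.close()

    if __name__ == "__main__":
        main()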

So - we’re not quite at the step of having completely virtualised compute resources, as both of the examples above are rather specialised. Taking the next step will require some clever network operating system, or a significant development of the current machine to machine communications standards. (From a hardware perspective, the Cell processor could also play a part.) Whoever manages to develop a suitable system could change the way computing works in such a discontinuity that might even topple Microsoft.

As an example of the sort of application I'm thinking of in the consumer world, I'd love to enable all of the networked computing devices in my house to co-operate. I have a PDA and 3 PCs, all networked, but to run an application I have to find the specific computer that has that application installed. I'd rather just submit a request to whatever's closest, have the 'compute network' solve the problem, and get back to me on whatever display is available. For example, I could initiate a request through my PDA for a search of TV programmes featuring certain actors. The PDA talks to the database on my PC, which then queries a server on the net. The results pop up on my plasma screen, along with a few short clips of the actor in each programme. I click on the one I'm interested in, and it sets my PVR to record the show. (This example also links into concepts such as ZigBee's ad-hoc networking, and perhaps Jini.)
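As a purely speculative sketch of the plumbing behind that scenario, here is what the PDA-to-PC leg might look like in Python, using XML-RPC from the standard library as a stand-in for whatever machine-to-machine standard eventually wins. The service name, the programme data and the shortcut of running both ends on one machine are all invented for illustration.

    # Speculative sketch only: one device exposes a search service, another
    # submits a request to it without caring where the work actually runs.
    import threading
    from xmlrpc.client import ServerProxy
    from xmlrpc.server import SimpleXMLRPCServer

    # --- The PC with the programme database exposes a service on the network ---
    PROGRAMMES = [
        {"title": "Spooks", "actor": "Matthew Macfadyen", "channel": "BBC One"},
        {"title": "Hustle", "actor": "Adrian Lester", "channel": "BBC One"},
    ]

    def search_programmes(actor):
        # Return the programmes featuring the named actor.
        return [p for p in PROGRAMMES if p["actor"] == actor]

    server = SimpleXMLRPCServer(("localhost", 8010), logRequests=False, allow_none=True)
    server.register_function(search_programmes)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # --- The PDA just submits a request and lets the network do the work ---
    pc = ServerProxy("http://localhost:8010")
    for programme in pc.search_programmes("Adrian Lester"):
        # In the scenario above this would go to the plasma screen and PVR,
        # not to a print statement.
        print("%(title)s on %(channel)s" % programme)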

Sources:
IBM On-Demand computing, IDC Dynamic IT, Hewlett-Packard Adaptive Enterprise, Oracle Grid, Azul Systems, Google, LAMP, O'Reilly OnLAMP