2/4/11

How To Criticize Computer Scientists
or
Avoiding Ineffective Deprecation And Making Insults More Pointed


In recent exchanges, members of the faculty have tried in vain to attack other Computer Scientists and disparage their work. Quite frankly, I find the results embarrassing -- instead of cutting the opponent down, many of the remarks have been laughably innocuous. Something must be done about it because any outsider who hears such blather will think less of our department: no group can hold the respect of others unless its members can deal a devastating verbal blow at will.

This short essay is an effort to help faculty make their remarks more pointed and to help them avoid wimpy invective. It explains how to insult CS research, shows where to find the Achilles' heel in any project, and illustrates how one can attack a researcher.

The Two Basic Types Of Research
Most lousy insults arise from a simple misimpression that all researchers agree on the overall aims of CS research. They do not. In particular, CS has inherited two quite opposite approaches from its roots in mathematics and engineering.

Researchers who follow the mathematical paradigm are called theorists, and include anyone working in an area that has the terms ``analysis'', ``evaluation'', ``algorithms'', or ``theory'' in the title.

Researchers who follow the engineering paradigm are called experimentalists, and include most people working in areas that have the terms ``experimental'', ``systems'', ``compiler'', ``network'', or ``database'' in the title.

Complex Theory And Simple Systems

Knowing the tradition from which a researcher comes provides the basis for a well-aimed insult.

Theorists Favor Sophistication

Like mathematicians, theorists in Computer Science take the greatest pride in knowing and using the most sophisticated mathematics to solve problems. For example, theorists will light up when telling you that they have discovered how an obscure theorem from geometry can be used in the analysis of a computer algorithm. Theorists focus on mathematical analysis and the asymptotic behavior of computation; they take pride in the beauty of equations and don't worry about constants. Although they usually imply that their results are relevant to real computers, they secretly dream about impressing mathematicians.

Experimentalists Favor Simplicity

Like engineers, systems researchers take pride in being able to invent the simplest system that offers a given level of functionality. For example, systems researchers will light up when telling you that they have constructed a system that is twice as fast, half the size, and more powerful than its predecessor. Experimentalists focus on the performance of real computer systems; they take pride in the beauty of their code and worry about constants. Although they usually imply that their results can extend beyond real computers, they secretly dream of filing patents that apply to extant hardware.

The Insult

Knowing that CS can be divided into two basic groups helps immensely when criticizing someone. There are two basic rules: identify the type of the researcher and issue an insult for that type. Avoid saying anything that inadvertently compliments them. If performed well, an insult will not only stun the researcher (who will be shocked to learn that not everyone agrees with his or her basic value system), but will also intimidate others in the audience.

Identifying A Type

Identifying the type of a researcher is usually easy and does not require a strong technical background or real thinking. It can be done using keyword matching according to the following lists.

Detecting Theory

You can tell someone is a theorist because they slip one or more of the following keywords and phrases into lectures and technical conversations: ``theorem'', ``lemma'', ``proof'', ``axiom'', ``polynomial time'', ``logarithmic'', ``semantics'', ``numerical'', ``complexity'', ``nondeterministic'' or ``nondeterminism'', and ``for large enough N''. They write lots of equations, brag about knocking off the ``extra log factor'', and often end their lecture with an uppercase ``O'' followed by a mathematical expression enclosed in parentheses. You can also recognize a theorist because they take forever to prove something that may seem quite obvious. (I once sat through an hour lecture where someone proved that after a computer executed an assignment statement that put the integer 1 into variable x, the value in x was 1.)

Detecting Systems

An experimentalist will slip one or more of the following keywords and phrases into lectures and technical conversations: ``architecture'', ``memory'', ``CPU'' (sometimes abbreviated ``CISC'' or ``RISC''), ``I/O'' or ``bus'', ``network'', ``interface'', ``virtual'', ``compile'' or ``compiler'', ``OS'' or ``system'', ``distributed'', ``program'' or ``code'', and ``binary''. They talk about building programs and running the resulting system on real computer systems. They refer to companies and products, and use acronyms liberally. Their lectures often end with a graph or chart of measured system performance. You can also recognize an experimentalist because they describe in excruciating detail how they set up an experiment to measure a certain value even if the measurement produced exactly the expected results. (I once sat through an hour lecture where someone carefully explained how they used three computer systems to measure network traffic, when their whole point was simply to show that the network was not the cause of the problem they were investigating.)

Forming An Insult

The key to a good insult lies in attacking whatever the researcher holds most dear and avoiding whatever the researcher does not care about. Thus, an insult lobbed at a theorist should focus on lack of sophisticated mathematics such as the following:
  • Despite all the equations, it seems to me that your work didn't require any real mathematical sophistication. Did I miss something? (This is an especially good ploy if you observe others struggling to understand the talk because they will not want to admit to that after you imply it was easy.)
  • Isn't this just a straightforward extension of an old result by Hartmanis? (Not even Hartmanis remembers all the theorems Hartmanis proved, but everyone else will assume you remember something they have forgotten.)
  • Am I missing something here? Can you identify any deep mathematical content in this work? (Once again, audience members who found the talk difficult to understand will be unwilling to admit it.)
In contrast, an insult lobbed at an experimentalist should imply that the techniques were used in previous systems or that the work isn't practical such as:
  • Wasn't all this done years ago at Xerox PARC? (No one remembers what was really done at PARC, but everyone else will assume you remember something they don't.)
  • Have you tested this on the chip Intel got running last week in their lab? (No one knows what chip Intel got running last week, but everyone will assume you do.)
  • Am I missing something? Isn't it obvious that there's a bottleneck in the system that prevents scaling to arbitrary size? (This is safe because there's a bottleneck in every system that prevents arbitrary scaling.)

How To Avoid Having An Insult Backfire On You

A misplaced insult can backfire, turning into an embarrassment for the attacker and a victory for the intended attackee. To avoid such occurrences, remember the following:
  • Never attempt to attack theoretical work as not considering constants, as unrelated to real computer systems, or as requiring too much sophisticated mathematics. (The intended victim is likely to smile and thank you for the flattery.)
  • Never attempt to attack a system as too small, too simple, or as lacking sophisticated mathematics. (Again, the intended victim is likely to smile and thank you for the flattery.)
  • Never attempt to attack systems work simply by saying that it's so simple and obvious that you could have done it. (For years, people said that about UNIX and the TCP/IP protocols.) In fact, this is merely an extension of a ploy used by children on a playground: ``Oh yeah? I could have done that if I wanted to.'' Don't try using it or someone will tell you to grow up.

Attacking Crossover Work

Although rare, a few researchers include both theoretical and experimental work in the same project. Insulting such combinations can be tricky because a researcher can escape unscathed by pointing to one part of their work or the other as the answer. You can try to attack both parts simultaneously:
  • I note that the systems aspect of this project seems quite complex. Do you think the cause of the convoluted implementation can be attributed to the more-or-less ``simplistic'' mathematical analysis you used?
However, a clever insult can avoid talking about the work by suggesting sinister reasons for the paradigm shift:
  • I notice that you did something unusual by combining both theory and experiment. Did you decide to try a second approach because you had insufficient results from the first?
  • You seem to have a little theory and a little experimental work combined into one project. Isn't it true that if you had a sufficiently strong contribution in one or the other you would have lectured about them separately?

A Final Plea

I certainly hope faculty will take this essay to heart and sharpen their insult skills. In the future, please make all your thrusts count.

Interesting Projects–A Collection

Alfred Thompson 29 Mar 2011 2:53 AM

It seems as though teachers are always looking for new projects to use with students. Projects get stale (at least to a teacher who has been grading lots of them for a long time), seem not to fit with the current crop of students, or just never seem right. So the hunt goes on for more. When I come up with programming projects, I like to post them here on my blog for use, comments, and feedback, in the hope that people will help make them better. I tag them with the projects tag to make them easier for people to find as well. But recently it struck me that an annotated list of some of the more interesting projects might be in order. So here it is.

Programming Projects Using Arrays - This is a collection for the APCS mailing list of projects teachers have suggested for teaching arrays. They should work with any programming language.

Whack Something Game for Windows Phone 7 – This is a “how to” I wrote for creating a whack-a-mole style game for Windows Phone 7. It could easily be used or modified to create a similar game for Windows or the Xbox, since it uses XNA Game Studio.

The Credit Card Project – Do you know how credit cards are validated? I think a lot of students would be interested in this project that includes knowing something about the codes that identify types of credit cards and a check digit to validate the number.
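For reference, the check-digit half of that project is usually the Luhn algorithm: double every second digit from the right, subtract 9 from any doubled result over 9, sum everything, and check that the total is divisible by 10. A minimal sketch (the function name is my own):

```javascript
// Luhn check: returns true when the number's check digit is consistent.
function luhnValid(number) {
  const digits = number.replace(/\D/g, "").split("").map(Number);
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = digits[i];
    // Double every second digit, counting from the right.
    if ((digits.length - 1 - i) % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
  }
  return digits.length > 1 && sum % 10 === 0;
}
```

Identifying the card type from its leading digits (4 for Visa, 5 for MasterCard, and so on) makes a nice second stage for students.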

FizzBuzz–A Programming Question – This was based on an interview question I read about. The comments are interesting and include both a lot of discussion about this particular project and similar questions. This one uses loops and decision statements in an interesting combination.
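For readers who haven't seen it, the classic statement: print the numbers 1 to N, replacing multiples of 3 with “Fizz”, multiples of 5 with “Buzz”, and multiples of both with “FizzBuzz”. One straightforward loop-and-decision sketch:

```javascript
// Returns the FizzBuzz sequence from 1 to n as an array of strings.
function fizzBuzz(n) {
  const out = [];
  for (let i = 1; i <= n; i++) {
    if (i % 15 === 0) out.push("FizzBuzz");   // multiple of both 3 and 5
    else if (i % 3 === 0) out.push("Fizz");
    else if (i % 5 === 0) out.push("Buzz");
    else out.push(String(i));
  }
  return out;
}
```

Note that the "divisible by 15" test must come first; that ordering mistake is exactly what makes it a useful interview question.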

Lines Can Be Fun - This is a discussion of some interesting graphical line drawing projects. There is some sample code using Small Basic but you could use these ideas in most languages that support simple graphics.

Would you play this game? - A simulation of a card game with the idea of determining if it is a reasonable game to play as defined by being something one can actually win at. It uses random numbers, arrays and loops.

Visualizations and Sorting - Some ideas for projects that show (or play, as sound) how sorting algorithms work. Something to make sorting more interesting than just “magic” behind the scenes.

ASCII Art For Fun and Projects – Old school ASCII art projects may seem passé but a lot of today’s students don’t know about them which makes these ideas “new to them.” And they can be fun.

Monte Carlo Simulation–Slot Machines – How do slot machines work? Add some graphics to this one and really make it rock.

Monte Carlo Simulation–Roulette – how does the house win at Roulette? Random numbers, probability and creating a simulation are all a part of this project.
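As a sketch of the kind of simulation this project calls for, here is one way to estimate the house edge on an even-money red/black bet in American roulette (38 pockets, only 18 of which win); the function name and pocket encoding are my own:

```javascript
// Estimate the expected return per unit staked on an even-money bet.
// American roulette has 38 pockets; a red (or black) bet wins on 18 of them.
function rouletteEdge(trials, rng = Math.random) {
  let net = 0;
  for (let t = 0; t < trials; t++) {
    const pocket = Math.floor(rng() * 38); // 0..37
    net += pocket < 18 ? 1 : -1;           // win pays 1:1, otherwise lose the stake
  }
  return net / trials; // converges to -2/38, about -5.26%
}
```

Comparing the simulated figure with the exact probability calculation is a good way to tie the programming back to the math.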

Who Designed That User Interface – How would you design an ATM interface? Yeah it involves money. This is a chance to not only have students implement a user interface but learn about data checking/validation and how it all fits with usability.

Are You Using a Strong Password – On one hand this is a simple data validation project that looks at characters and does some evaluation. On the other hand it is an opportunity to talk about security, what makes a strong password and why strong passwords are important.
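A sketch of the simple-validation half of the project: score a password by its length and by the character classes it uses. The 0-to-5 scale here is my own invention, purely for illustration:

```javascript
// Score a password from 0 (weak) to 5 (strong) by counting properties:
// adequate length plus four distinct character classes.
function passwordStrength(pw) {
  let score = 0;
  if (pw.length >= 8) score++;          // long enough
  if (/[a-z]/.test(pw)) score++;        // lowercase letters
  if (/[A-Z]/.test(pw)) score++;        // uppercase letters
  if (/[0-9]/.test(pw)) score++;        // digits
  if (/[^A-Za-z0-9]/.test(pw)) score++; // punctuation or other symbols
  return score;
}
```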

Coding Up A Coded Message – Not surprisingly this is about codes and cyphers. I find that a lot of younger kids are fascinated with hiding messages with codes. This allows for a lot of interesting character manipulation and some good algorithm discussions.
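The character manipulation mentioned here can start as simply as a Caesar shift, rotating each letter a fixed number of places through the alphabet; a minimal sketch:

```javascript
// Caesar cipher: rotate each letter by `shift` places, preserving case
// and leaving digits, punctuation, and spaces untouched.
function caesar(text, shift) {
  return text.replace(/[a-zA-Z]/g, (c) => {
    const base = c <= "Z" ? 65 : 97; // char code of "A" or "a"
    // Double modulo keeps the result positive for negative shifts.
    return String.fromCharCode(((c.charCodeAt(0) - base + shift) % 26 + 26) % 26 + base);
  });
}
```

Decoding is just encoding with the opposite shift, which leads naturally to a discussion of why such ciphers are easy to break.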

Fun With Formulas - Did you know that horsepower was based on James Watt finding that a work horse could lift a 1,000 pound weight 33 feet in 60 seconds? I didn’t either but it makes for a fun project. Sample code in C#, Visual Basic and a screenshot of a cool solution table using Excel. Yep, programming sorts of things in Excel. Who knew?
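That observation pins down the unit: 1,000 lb × 33 ft ÷ 60 s works out to 550 foot-pounds per second, which is exactly one horsepower. A tiny sketch of the formula (function name mine):

```javascript
// Horsepower from Watt's definition: 550 foot-pounds per second.
function horsepower(weightLb, distanceFt, timeSeconds) {
  const footPoundsPerSecond = (weightLb * distanceFt) / timeSeconds;
  return footPoundsPerSecond / 550;
}
```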

Fun With Colors - Move the sliders for red, blue and green to adjust the color values of a color display. This is the sort of thing designers use for all sorts of color picking routines. It shows something about how color mixing works as well as making a fast and easy project to let students experience success quickly.

Binary Number Game – A lot of traffic comes to this blog from people looking for ways to teach binary numbers. This post describes one good learning game/project and opens the door to more with a little imagination. One might as well make a game out of learning when possible.

The Four Digit Problem – How would you randomly pick a four-digit number with no repeating digits? Would you use recursion? You could. Or loops? That would work as well. What’s the best way to do this?
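One non-recursive answer, for comparison: shuffle the digits 0 through 9 and take the first four, swapping away a leading zero. The helper name and the swap trick are my own:

```javascript
// Pick a four-digit string with four distinct digits and no leading zero.
function fourUniqueDigits(rng = Math.random) {
  const digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
  // Fisher-Yates shuffle
  for (let i = digits.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [digits[i], digits[j]] = [digits[j], digits[i]];
  }
  // If zero landed first, swap it with a digit outside the chosen four.
  if (digits[0] === 0) [digits[0], digits[4]] = [digits[4], digits[0]];
  return digits.slice(0, 4).join("");
}
```

Contrasting this with the "pick and retry on duplicates" loop makes for a nice discussion of correctness versus expected running time.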

A Simple Check Digit Project - This project uses the formula for validating passport numbers. With more and more people needing passports at a younger and younger age, this project has some relevance to many. Having a meaningful project to discuss check digits (which are apparently not as inherently interesting to everyone as they are to me) makes this a pretty good project, if I do say so myself.

18 ways to be a good developer

This is as days pass by, by Stuart Langridge


And this is 18 ways to be a good developer, written Jan 22, 2006, and concerning Rants, Software

1. If someone requests support in a channel and you're around and don't know the answer, be sure not to say anything, so they can't tell the difference between you not knowing and you not being around. That way you can complain that users don't wait for answers, even if those answers are never going to come.

2. If a user hasn't read all of your website, all the mailing list archives, the wiki, the old wiki, all the blogs of all the developers, and the text files in CVS, refuse to answer their question. They should study harder before they're entitled to ask questions.

3. Don't bother to write stuff down. If a user asks a question which they could have worked out by spending two years studying the code, then they should have spent those two years.

4. Remember, you understand this software. If someone wants to use it to get their work done, they should be prepared to understand all aspects of it in as much detail as you do. All users should be developers.

5. NEVER thank people for feedback. If they have comments on the program, they should be sending in patches.

6. It's a lot more important to finish rewriting the XML library your program uses for the third time than it is to fix a bug that makes every Fedora user crash their whole desktop. They should be using Slackware anyway.

7. If you're an influential developer at all, your opinion matters more than the users'. Follow the previous rule, as it will definitely produce a positive outcome. Be sure to compare the users in question to mentally ill people and excrement.

8. What you think users will do is more important than what users actually do.

9. Don't use punctuation or bother with spell checking. This only slows down the communication between you and the user.

10. Insult the user. This establishes control, which is important. Support should be thought of as a battle. Popular insults include “asshole,” “mother f**ker,” “dipshit,” and “newb.” Insulting their mother is another good way of establishing control.

11. If you're confused by the “bug report” that the user is giving you, don't feel bad, as this isn't your fault. This is the user's fault. Users live in a different world. They're besuited, annoying, stupid people who aren't able to get points across clearly. Tell them this, as they probably don't realize it. It is sure to ease the communication.

12. As a developer, you know what users will want to do with the software better than they do. If they say something about the software, take it with a grain of salt. They're only the people who actually try out your theories; you're the one who came up with them.

13. Insist that all users run CVS or svn HEAD. If they're using your latest stable release, they should be prepared to check out CVS and compile it before commenting on it. A version release or downloadable binary distribution means nothing when there's something newer available from source control.

14. If someone you know tells you how they would use your software, and someone who actually uses it tells you differently, trust the person you know; after all, you know them. This is doubly important if the person you know is another developer.

15. Documentation is a pointless waste of time. If someone complains that they're finding it difficult to do anything with your program because there's nothing written anywhere on how to use it, then tell them to read the source; that's good enough.

16. If someone files a bug which turns out to be a duplicate, be sure to let them know how stupid they were when you link the two bugs. This is particularly important if the two bugs share no words in common whatsoever and only turn out to be duplicates after a week of digging and thought by you; after all, you had to work much harder than they did!

17. Anyone who switches away from your program to someone else's is clearly both stupid and an enemy of free software. You're lucky to get rid of them.

18. Programming ability and usability engineering are the same thing. If you know how to write code, you know about usability already; you certainly don't need to waste time studying it.

(with apologies to Christian "ChipX86" Hammond, who does none of these things)

Designing For The Future Web

By James Gardner, March 29th, 2011
Designing for the future Web. That’s a big subject. Where do we start when we’re talking about something that isn’t here yet?

In this article, we’ll look at what the future Web might look like and how we can adapt our current skills to this new environment, as well as how to create fluid websites that are built around a consistent core and that adapt to the limitations and features of the device on which they are viewed. We’ll also look at how our conceptual approach to designing websites should evolve: designing from the simplest design upwards, and not from the richest website down.

But before we get to that, let’s start with a question. What do we mean by the “future Web”?

What Is The Future Web?

Back in the old days: analogous Google queries would have taken 30 days. Image: dullhunk
The one word that I hear more than any other at the moment is mobile. Mobile websites, mobile devices, mobile apps: the list seems to go on and on. In fact, a large swell of opinion says that the future Web is mobile.

But despite all this, focusing just on mobile isn’t the answer.

The way we access the Internet is changing, of that we can be certain. And in the short term, this does mean more mobile devices. But in the long term, we have to look a little wider. Thomas Husson, senior analyst for Forrester, summed it up nicely in his 2011 Mobile Trends report when he said, “The term mobile will mean a lot more than mobile phones.” In the long term, the word we should use instead of mobile is portable.

Why Portable? How Has the Internet Changed to Make It So?
First, the physical infrastructure of the Internet is spreading rapidly, so that our ability to access the Internet wherever we are grows daily. In the last 10 years, the number of Internet users has grown by 444.8% and now includes 28.7% of the population. That’s nearly 2 billion people, the majority of whom are in Asia. This growth is fuelled by investment in the underlying hardware that gives us access to the Internet: millions and millions of computers, millions of miles of cables, hundreds of thousands of wireless hotspots and, on top of all this, growing 3G coverage around the globe (around 21% by the end of 2010 according to Morgan Stanley).

Secondly, the way we use the Internet is changing. We are increasingly orienting our online experience around services rather than search engines. Services such as Facebook, Twitter and LinkedIn are becoming the hub for our online life, and we are blending them to create our own unique Web of content: Facebook for our social life, LinkedIn for our professional life, Spotify for music, Netflix for television and film. We’re seeing a very different form of information consumption here, one in which we expect information to be pushed to us through our social circle, the people whom we trust. We’re moving away from the old paradigm of information retrieval, in which we are expected to seek information using search engines and links.

Some of these services are tied to a single device, but increasingly they are available across multiple platforms, including the desktop, mobile apps, Internet-enabled TVs and others. Only last month, Samsung created the first tweeting refrigerator. Okay, that might not be the greatest use of an Internet connection, but it is an example of how these services are starting to spread out, away from the desktop and into our everyday lives. Evrythng, a start-up currently in beta, is working on a platform that would give any physical object an online presence, effectively making the Internet a ubiquitous entity containing data that can be consumed anywhere and by anything.

Given these changes, it’s important that we not be overly rigid in our approach to creating new Web content; we mustn’t allow ourselves to think in terms of devices. Right now, we are producing mobile apps and standard websites to deliver our services, but in a few years’ time, we may be looking at a completely different landscape, one where knowing exactly where and how our content is being viewed is impossible. Our content must be portable in the sense that it can be displayed anywhere.

Media marketers have responded to the increasing use of mobile media. (Image: birgerking)

We may also find ourselves having to decide whether to concentrate on particular devices and channels at the expense of audience numbers or to take a less tailored approach and serve the widest spectrum possible.

Regardless of the route we take, the ability to deliver a consistent experience across all channels is paramount, and our ability as designers and developers to understand the options and deliver this consistency to our clients will be crucial.

So, this is the future Web, a mish-mash of devices and channels. Sounds good, doesn’t it? Let’s go back to the key word, portability.

How Do We Design For The Portable Web?
Ask yourself, how would your latest project cope in the following scenarios:
1. The user is watching House on their new Internet TV. Hugh Laurie’s not on screen, so the user decides to check their email. A friend has sent a link to your website, which the user opens in a sidebar and views simultaneously with the program.
2. The user is on a train back from work, probably delayed somewhere, accessing your website via 3G on an iPad.
3. The user is on a client’s website. They need to access your website to read an article, but they have only a company-supplied Sony Ericsson with Opera Mini installed.
Each of these scenarios presents us with a different problem to solve: (1) an odd aspect-ratio and browser combination, (2) a good display area but slow connection and (3) a very small display area. And they are all very possible scenarios. The first Internet TVs by big brands are now available from the big retailers. Opera Mini has over 85.5 million users and is the dominant browser in many areas of the world; in fact, in Asia, Opera and Nokia (with their combined 66.33% market share) are way ahead of the third-place browser (which is BlackBerry, with a 9.81% share). And Deloitte has predicted that 2011 will be the year of the tablet and that 50% of the “computing devices” sold will not be PCs.

Chances are that, unless you’ve really thought about it (and if you have, then you probably don’t need to read this article), your website won’t work in all of those cases.

When designing for the portable Web, we need to be aware of three things: design, content and integration. Approached in the right way, we can create websites that are accessible across the widest user base and that provide a consistent experience regardless of access method.

Consistent? How?
When faced with a multitude of devices to design for, all with varying specifications, the last goal that might come to mind is consistency, and with good reason. And yet we should be striving to achieve consistency. Not in design but in experience.

Conceptually, we should be thinking about our design in two layers: the core content or service, and then the display layer. The core of our website should not change from device to device, and it should deliver a consistent experience. As we shall see shortly, this means that we must ensure that elements such as the content and the navigation patterns work the same way always.

The Web’s future consists of vast possibilities; considering them all is virtually impossible. That is why we need consistency! Image: Juhan Sonin
Let’s say our user is at work and is browsing our website on an iPad. They work through the carefully designed navigation hierarchy to get to the piece of content that they want to read, but they are interrupted by a phone call and have to stop browsing. Later, on the train home, they access the website again, this time via their phone. The visual elements of the design will be different—by necessity—but crucially, the routes they took to find the content should be exactly the same, as should the content they read when they got there.

This consistency of experience is what will allow us to create great websites for the portable Web and a complete user experience.

Where Do I Start? And How Will I Know When I Get There?
If a single consistent experience is our goal, this raises the question: should we create a mobile website that scales up or a desktop website that degrades?

The answer is neither. We should try to create a single design that can be used across all devices without alteration. But in practice, at least for the moment, we should start with the simplest website and work up.

Why? Let’s go back to the introduction. On the portable Web, we have no control over how our content will be used or viewed, and as such we must throw out the idea that we are designing for a particular device or device size. We must approach the design of our website in a different way, one in which we create the core content or service first. After all, this will define our website in the end, not the visual elements. This may seem difficult initially, but as we shall see, many of the best practices for desktop website development hold true for the portable Web, especially with regard to content structure.

To recap, here are the key rules to bear in mind when working through a design for the portable Web:
1. The website should be available to as wide an audience as possible;
2. The website should contain the same content wherever it is viewed, where feasible;
3. The website’s structure should be the same wherever it is viewed;
4. The content should be displayed in a manner that is appropriate to its environment.
A website that meets all of these criteria would fit snugly in the future portable Web. But how do we go about making our websites do this?

Designing For The Portable Web
Design Using Web Standards: That Means HTML5
The good news is that the two most common browser engines on mobile, WebKit and Opera’s Presto, both support HTML5 very well; WebKit has supported HTML5 at least partially since November 2007.

Using standard and descriptive mark-up throughout our websites will have the benefit of delivering consistent output across most devices. And the extended capabilities of HTML5 to deliver media, animation and local storage make it a great choice for mobile applications.

These three abilities allow HTML5 websites to reproduce behaviours usually associated with native mobile applications, closing the experience gap between the two. Video can now be played natively through HTML5 using the video tag, while animations can be played using the HTML5 canvas. Finally, local storage allows a website to store database-like information on the device, allowing for fully functional offline use of the website.
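A compact sketch of those three capabilities in a single page fragment (the file names, element IDs, and storage keys are made up for illustration):

```html
<!-- Native video playback: no plugin required -->
<video src="intro.mp4" controls>
  Your browser does not support HTML5 video.
</video>

<!-- A canvas element for script-driven animation -->
<canvas id="stage" width="320" height="240"></canvas>

<script>
  // Local storage: persist a small piece of state on the device so the
  // page can restore it later, even when the user is offline.
  localStorage.setItem("lastArticle", "future-web");
  var resume = localStorage.getItem("lastArticle");
</script>
```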

YouTube, Netflix and Gmail all have HTML5 versions of their websites that are designed for the mobile experience and that take advantage of the new capabilities of HTML5. They’re a great starting point for any developer who wants to see what can be achieved.

HTML5 is now ready to be used for development, and there’s no reason why you can’t start right away. Many excellent resources and tutorials are available to help you get started:
■ Dive into HTML5: An overview of the HTML5 standard and a great starting point.
■ HTML5 Demos and Examples: A series of demonstrations showing the capabilities of HTML5, with source code.
■ HTML5 Gallery: A showcase of websites created in HTML5.
To get started using HTML5 in your projects, you can take advantage of any one of the number of development environments that support it. The most complete implementation is through Adobe’s Dreamweaver CS5; an HTML5 support pack can be downloaded that extends the built-in editor. Aptana also supports HTML5 in its beta of Aptana Studio 3. Links are provided at the end of this article.

Start Simple, Work Up
Thinking portable means thinking clean and simple. The wide variation in screen sizes—from a 40-inch LCD running at 1920 × 1080 pixels to a portrait-orientation mobile screen at 320 × 240 pixels—means that we must create designs that are scalable and adaptive. We must also be aware that someone may be interacting via a remote control or a fat stubby finger on a touchscreen. The simpler the design, the more adaptable it will be.

Bottom-up conceptualizing makes sense. Concentrate on the basic elements and let the context evolve around them. Image: Andrei Bocan
Create your basic website structure first and add only your core styles, the ones that are applicable to all devices and layouts. Starting simple gives us a great base on which to build. Essentially, we are starting from the most basic experience, available on even the smallest mobile device, and working our way up to the more capable desktop browsers.

Using @media queries in the CSS will enable your website to recognize the additional capabilities of desktop browsers and scale up for these environments, presenting a fuller and more interactive experience where possible.

A word of caution and a reason why we don’t work the other way around by degrading a desktop website to a mobile one: @media queries are not supported by every mobile device. Rachel Andrews provides a good overview of @media queries here on Smashing Magazine, albeit working from desktop to mobile, rather than the other way round.

Forget About Proprietary
Whatever you do, stay away from proprietary technologies, because relying on them is a sure way to guarantee an inconsistent experience. Flash and Silverlight as development platforms are living on borrowed time. Microsoft has already indicated that it will discontinue Silverlight development to concentrate on HTML5, while Flash is now used mainly as a game-development platform and video-delivery mechanism. If we are going to create truly cross-platform websites that display consistently across all devices, then Flash and Silverlight are not wise choices, because we cannot be certain that they will be installed on the user’s device. That is not to say that Flash doesn’t have its place; as a platform for Web-based games, it is currently unrivalled. It’s about choosing the best technology for the job at hand.

Be Wary of JavaScript… for the Time Being
The bad news is that we may have to sacrifice some of the things we take for granted now. We must learn to design for unknown screen sizes and ratios and allow content to flow as required. Think less about design structure and page layout and more about content structure.

We may have to forgo JavaScript and AJAX (both staples of desktop development) for creating more engaging user experiences, because some lower-end devices will not have the hardware muscle to deal with complex libraries. Trimming page weight will also be a priority: we cannot be certain that end users will have broadband-speed access to the Internet, so large libraries will be unacceptable overhead.

This is particularly important in light of the recent “hash bang” trend, started with Gawker Media’s controversial redesign of its websites. The websites (including Gizmodo, Lifehacker and Gawker) present a more application-like experience to users, but do so by relying on JavaScript for content delivery. In effect, the websites consist of a single page that is loaded with dynamic content on request, instead of the multiple pages that they consisted of previously. Any users whose browsers cannot process the JavaScript, for whatever reason, will be unable to browse the website; they are greeted with only a blank page.

However, a number of libraries are being developed to be lightweight and usable on portable devices. jQuery has an alpha of its mobile library available for testing. The project has the backing of industry players such as BlackBerry, Mozilla and Adobe, so it is worth keeping an eye on.

JavaScript support will mature as devices worldwide move onto more modern platforms and as older devices are taken out of service. But for the time being, a conservative approach to its use will mean a wider potential audience for your work.

Test, Test, Then Test Again
On the portable Web, there’s a good chance we won’t be able to test against every possible platform on which our content will be viewed. But that doesn’t take away the need to test. And test we must.
Opera Mini’s emulator lets you test your website in a virtual browser.

Buying a device from each platform would be prohibitive for the majority of designers. But alternatives are available. For most of the main platforms, device emulators are available that simulate the browsing experience. See the resources section at the end of this article for links.

At the other end of the scale, a paid service is available from DeviceAnywhere, which enables you to test your website on over 2000 handsets.

Unfortunately, there are no Internet TV emulators so far, but Google has released a guide to designing for Google TV.

Finally, of course, we mustn’t forget to test on our desktop browsers, too. The aim of designing for the portable Web is to create a single experience across as wide a set of devices as possible. Just because users are able to browse the Web in many ways doesn’t mean they will stop using their desktop, laptop or netbook. Use this to your advantage when testing simply by resizing your browser to ensure that your design scales and flows appropriately. The emulators will provide you with an exact rendering of your website on other devices.

The Ugly Duckling?
So, does the portable Web defy beauty and kick sand in the face of outstanding design? Of course not. Great design is not only about visual imagery, but about presenting information clearly, which involves establishing hierarchy and importance through innovative and well-thought-out typography, layouts and navigation. Which brings us to…

Content For The Portable Web
Content is once again king. The rise of Quora should be enough to convince anyone of that; it is a service based solely on content. On the portable Web, this is doubly true. By paring down the design elements, you leave even more focus on the content.

Understand What’s Important
Identifying what is most critical to users should be your first task when developing a portable website. There may not be room for complex navigation, especially on smaller screens, so keep it simple. Compare the mobile and desktop versions of YouTube’s start page:
YouTube’s standard home page.

YouTube’s HTML5-based home page works brilliantly on small screens.

Create a Solid Information Hierarchy
Structuring our content is important, for both readability and SEO. Understanding the content that we are presenting is essential to creating clear information hierarchies that guide users through it.

Map the user’s possible journeys through your content. There should be a clear route to every piece of content, starting with the top-level information categories and getting more granular with each click.

John Lewis’ mobile website has a clear information hierarchy to aid navigation.

A good example of this is the mobile website of John Lewis, a UK-based department store. From the home page, you can easily drill down to each department, and from there to individual products. It’s simple, and it also means that the amount of information on any given page is not overwhelming and that you know exactly where you are in the hierarchy at all times.
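To make the idea concrete, a site’s navigation can be modelled as a simple tree in which every page is reachable by drilling down from a top-level category. The sketch below is purely illustrative (the categories are invented, not John Lewis’ actual structure):

```python
# A hypothetical content hierarchy (invented categories): each dict key
# is a category, each list holds the pages at the bottom of that branch.
SITE = {
    "Home & Garden": {
        "Kitchen": ["Kettles", "Toasters"],
        "Bedroom": ["Duvets"],
    },
    "Electricals": {
        "Audio": ["Headphones"],
    },
}

def routes(tree, trail=()):
    """Yield the click-path to every page, top-level category first."""
    for name, child in tree.items():
        if isinstance(child, dict):
            yield from routes(child, trail + (name,))
        else:
            for page in child:
                yield trail + (name, page)

for path in routes(SITE):
    print(" > ".join(path))  # e.g. "Home & Garden > Kitchen > Kettles"
```

Walking the tree like this is also a quick sanity check: if any page comes back with a long or ambiguous trail, the hierarchy probably needs flattening.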

Keep Content Available
Even if users aren’t on a desktop, don’t treat them as second-class citizens. Provide as much content as is feasible. And for what content there is, present it appropriately. Remove the following:
■Superfluous images
If an image isn’t essential to the content, get rid of it.
■Unsupported file formats
Don’t include Flash or even the Flash placeholder if the file likely can’t be played.
■Unnecessary text
Good desktop copy doesn’t necessarily make for good portable copy. Is that second customer testimonial absolutely necessary? If not, remove it.
While we want to remove unnecessary content, we don’t want to remove too much. In the example below, we have a simple accessible website, but one that has no depth. The first level of information is presented well, but the headings for the company’s services at the bottom of the page should link to the next level of information. The way it is, if I want to find out more, I’m forced to visit the non-optimized website. This is a poor user experience, because it makes finding what I need more difficult.

Sapient Nitro’s mobile website displays really well but cuts a lot of information from the full website.

Integration And The Portable Web
If services are to become the new hub of the Internet, keeping our websites linked to these services becomes paramount.

Keep It Modular
Services will come and go (although the main ones will certainly remain for a long time yet… yes, I’m looking at you, Facebook), so keep your design modular. Being able to integrate with new services as they come online and to prune away those that have fallen by the wayside will ensure that your content is available to the widest possible audience.

The goal is to make it easy to push your content across multiple services and thus integrate your content into the fabric of the Web. Primarily, this will be through search engine optimization and social sharing.

Make Your Content Search-Engine Friendly
While the way people access content is becoming more social and less search-based, search engines are still a massive source of traffic. Keeping your content formatted for easy retrieval is a must. Quora has done this extremely well, leading to high rankings across the major search engines and generating traffic for its next-generation Q&A service. SEO may be old hat for some, but as quality of content becomes increasingly important, it will gain new life.
Quora plays nice with search engines, with great results.

Make Sharing Easy
SEO is important, but so are direct connections to other services through OAuth, OpenGraph and OpenID. If this isn’t an option for you, then at the very least give users some way to share your content. Services like AddThis and ShareThis make it simple to add sharing capabilities; take advantage of them. A single tweet can generate a lot of activity. Of course, modern development and content platforms such as WordPress have this functionality built in.

Bringing these three elements together will create websites that are discoverable, consistent and usable. Just one question now is raising its ugly head…

What About Apps? Aren’t They The Way Forward?
Apps are big business. Gartner forecasts that mobile app store revenue will top $15 billion in 2011. It’s no surprise that Google, Microsoft, Nokia and others are trying to get in on the act. But just because app stores are commercially successful, does that mean they should be our first port of call when designing for the future Web?

Let’s look at why one might want to create an app:
■Easy to purchase, install, use and throw away
Apps are so usable that even your granny could use them. Installing them on a smartphone is a smooth process that requires minimal involvement from the user. And when you’ve had enough, you simply delete the app and no trace of it remains. This is a great user experience, period. That’s why Apple is now pushing the same concept for full-blown Mac apps through the Mac App Store. Apps also provide, in most cases, a good user experience, with their native controls and design patterns.
■Brand association and lock-in
Apps are designed to do one thing and do it well. The most successful apps are exercises in brand association: “I want to search the Web, so I’ll use the Google app,” or “I want to check up on my friends, so I’ll use the Facebook app.” You experience the brand through the app. I could easily use the Safari browser on the iPhone to access both Facebook and Google, but the apps make it easy for me. I’m locked into the experience, which is great for the companies because their brands get planted square in the middle of my home screen; in this case, a big F and a big G.
■Money
The most attractive thing about apps to many companies is the profit. Apple’s App Store has shown that monetizing content is possible. Even for independent developers, making a lot of money in a relatively short period of time is possible.
What’s remarkable about all of these points is that they have nothing to do with information consumption. They are all about brand and user experience. However, there are also reasons why you should think twice:
■Apps are information silos:
Apps do what they do well. But they don’t do a good job of bringing in the wider Web. Want to follow a link? You might be able to view the page in-app, but you’re just as likely to get thrown out into the browser. That’s not a good user experience. You also lose control of the user’s actions and of their focus on your content.
■Apps are platform-specific:
Writing an app automatically ties you to the platform you are writing it for. This immediately limits your potential audience. Smartphone penetration is growing but is still a small segment of the overall Internet-enabled phone market. To take the US market as an example, even though 31% of the population have smartphones, only 6% of the population have iPhones. That’s 19 million out of 307 million. If you released an iOS-only app in the US, you would immediately lose the other 76.17 million potential smartphone users.
■Apps work best for big brands and services:
Regardless of how good the app is, you have to find a way to get it discovered among the tidal wave of apps that are released into app stores every day. Big brands can push their apps through their existing Web presence, but that’s a lot more difficult for smaller brands. And unless you can generate a lot of relevant content regularly, as the major services do, your app will be consigned to the trash very quickly. Research by Pinch Media (now Flurry) shows that free apps are used primarily in the first 10 days following installation, and then rapidly trail off to around 2% of the installation base after 70 days. Paid application usage drops off even more quickly.
■Mobile users prefer browsers over apps:
A study by Keynote Systems in October 2010 shows that users prefer mobile websites for nearly all types of Web content. The only categories in which apps came out on top were social networking, music and games, which makes sense because these apps usually take full advantage of a native platform’s capabilities.
So, if we want to create something with more permanence, that can evolve at a speed that suits us and our clients, then we need to look away from mobile apps and towards the mobile Web. We must execute good design, thoughtful content and solid integration to tie our portable websites into the social infrastructure of the Web.

Conclusion
The fully portable Web may not be here right now, but it will be before we know it. As it was with the browser wars, developers and designers must re-educate themselves to become the driving force behind these changes and be brave enough to let go of current design thinking and work to set new standards. Understanding how to create online presences that take full advantage of all platforms and preparing for the future shape of the Web will position us not just as technicians, but as people who can provide real value to our clients.

Resources
The HTML5 editors and device emulators mentioned above can be downloaded from the following websites.

HTML5 development environments:
■CS5 HTML5 Support Pack (part of the CS5 11.0.3 Updater), plus tutorial
■Aptana 3 beta

Device emulators:
■Android
■Opera Mini
■Apple iPhone (via iOS SDK)
■Windows Mobile (look for the latest Windows Phone Developer Tools)
■BlackBerry
■Nokia Mobile Browser

How Digg is Built

Written by Dave Beckett • Filed under Technology

At Digg we have substantially rebuilt our infrastructure over the last year in what we call "Digg V4". This blog post gives a high-level view of the systems and technologies involved and how we use them. Read on to find out the secrets of the Digg engineers!

Let us start by reviewing the public products that Digg provides to users:
1. a social news site for all users,
2. a personalized social news site for an individual,
3. an Ads platform,
4. an API service and
5. blog and documentation sites.

These sites are primarily accessed by people visiting in browsers or applications. Some people have Digg user accounts and are logged in; they get the personalized site, My News. Everyone gets the all-users site, which we call Top News. These products are all seen on 'digg.com' and the mobile site 'm.digg.com'. The API service is at 'services.digg.com'. Finally, there are the 'about.digg.com' (this one) and 'developers.digg.com' sites, which together provide the company blog and documentation for users, publishers, advertisers and developers.

This post will mainly cover the high-level technology of the social news products.

What we are trying to do
We are trying to build social news sites based on user-submitted stories and advertiser-submitted ad content.
Story Submission: Stories are submitted by logged-in users with some descriptive fields: a title, a paragraph, a media type, a topic and an optional thumbnail. These fields are extracted from the source document using a variety of metadata standards (such as the Facebook Open Graph protocol and OEmbed, plus some filtering), but the submitter has the final edit on all of them. Ads are submitted by publishers to a separate system but, if Dugg enough, may become stories.

Story Lists: All stories are shown in multiple "story lists" such as Most Recent (by date, newest first), by topic of the story, by media type and, if you follow the submitting user, in the personalized social news product, My News.

Story Actions: Users can perform actions on the stories and story lists: read them, click on them, Digg them, Bury them, make comments, vote on the comments and more. A non-logged-in user can only read or click stories.

Story Promotion: Several times per hour we determine stories to move from recent story lists to the Top News story list. The algorithm (our secret sauce!) picks the stories by looking at both user actions and content classification features.
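The real promotion algorithm is, as noted, a secret; purely as an illustration of the shape of the problem, here is a toy scorer that combines user-action signals with a content-classification feature. Every weight and field name below is invented:

```python
# Toy sketch only: Digg's actual promotion algorithm is not public.
# It combines user-action signals (Diggs, clicks, Buries) with content
# classification features; these weights are invented for illustration.
WEIGHTS = {"diggs": 3.0, "clicks": 0.1, "buries": -4.0, "topic_quality": 2.0}

def promotion_score(story):
    """Weighted sum of a story's signals; higher means more promotable."""
    return sum(WEIGHTS[k] * story.get(k, 0) for k in WEIGHTS)

def pick_promotions(recent_stories, n=2):
    """Choose the n highest-scoring recent stories for the Top News list."""
    return sorted(recent_stories, key=promotion_score, reverse=True)[:n]

stories = [
    {"id": 1, "diggs": 40, "clicks": 200, "buries": 2, "topic_quality": 0.9},
    {"id": 2, "diggs": 5, "clicks": 900, "buries": 30, "topic_quality": 0.2},
    {"id": 3, "diggs": 60, "clicks": 100, "buries": 1, "topic_quality": 0.7},
]
top_news = pick_promotions(stories)  # heavily Buried story 2 loses out
```

Note how the negative Bury weight lets a heavily clicked but widely disliked story sink; balancing signals like this is exactly the kind of tuning a real promotion algorithm would do.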

How do we do it?
Let us take a look at a high level view of how somebody visiting one of the Digg sites gets served with content and can do actions. The following picture shows the public view and the boundary to the internal services that Digg uses to provide the Pages, Images or API requests.
The edge of our internal systems is simplified here but does show that the API Servers proxy requests to our internal back end services servers. The front end servers are virtually stateless (apart from some caching) and rely on the same service layer. The CMS and Ads systems will not be described further in this post.

Taking a look at the internal high level services in an abstract fashion, these can be generally divided into two system parts:
- Online or Interactive or Synchronous
Serve user requests for a page or API directly or indirectly. Each service has to return its response within some number of milliseconds, and the aggregate latency seen by the user cannot be more than 1 or 2 seconds if the page is to feel responsive. This includes AJAX requests, which are asynchronous for the user in the browser but are request/response from the serving system's point of view.
- Offline or Batch or Asynchronous
Serve requests that are not in the interactive request-response loop and are typically only indirectly initiated by a user. The work here can take seconds, minutes or, rarely, hours.
The two parts above are used in Digg as shown in this diagram:
Looking deeper into the components.

Online Systems
The applications serving pages or API requests are mainly written in PHP (Front End, Drupal CMS) and Python (API server) using Tornado. They call the back end services via the Thrift protocol to a set of services written in Python. Many things are cached in the online applications (FE and BE) using Memcached and Redis; some items are primarily stored in Redis too, described below.

Messaging and Events
The online and offline worlds are connected synchronously by calls to the primary data stores and transient/logging systems, and asynchronously via RabbitMQ, which queues up events that have happened ("a user Dugg a story") or jobs to perform ("please compute this thing").
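In production the events flow through RabbitMQ; the sketch below uses the stdlib `queue.Queue` as a stand-in purely to show the publish/consume shape of the pattern. The event names, payloads and handlers are invented:

```python
import queue

# Stand-in for RabbitMQ: in production these events go through an AMQP
# broker, but the publish/consume shape is the same.
events = queue.Queue()

def publish(event_type, payload):
    events.put({"type": event_type, "payload": payload})

# Hypothetical job workers, keyed by the event type they handle.
def on_digg(payload):
    return f"recount diggs for story {payload['story_id']}"

def on_compute(payload):
    return f"computed {payload['what']}"

HANDLERS = {"user_dugg_story": on_digg, "please_compute": on_compute}

def drain():
    """Consume queued events, dispatching each to its worker."""
    results = []
    while not events.empty():
        msg = events.get()
        results.append(HANDLERS[msg["type"]](msg["payload"]))
    return results

publish("user_dugg_story", {"story_id": 42, "user": "kevin"})
publish("please_compute", {"what": "promotion scores"})
```

The point of the indirection is that the online side only ever does the cheap `publish`; all of the slow work happens later, on the batch side, when a worker drains the queue.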

Batch and Asynchronous Systems
When a message is found in a queue, a job worker is called to perform the specific action. Some messages are also triggered by a time-based, cron-like mechanism. The workers typically operate on data in the primary or offline stores (e.g. logs in HDFS) and then usually write the results back into one of the primary stores so that the online services can use them. Examples include indexing new stories, calculating the promotion algorithm and running analytics jobs over site activity.

Data Stores
Digg stores data in multiple types of systems depending on the type of data and the access patterns, and also for historical reasons in some cases :)

Cassandra: The primary store for "Object-like" access patterns for such things as Items (stories), Users, Diggs and the indexes that surround them. Since the Cassandra 0.6 version we use does not support secondary indexes, these are computed by application logic and stored here. This allows the services to look up, for example, a user by their username or email address rather than the user ID. We use it via the Python Lazyboy wrapper.
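A minimal sketch of that application-maintained index pattern, with plain dicts standing in for column families (the names are invented; real code goes through the Lazyboy wrapper):

```python
# Dicts standing in for Cassandra column families: the 0.6 series has no
# secondary indexes, so the application maintains its own lookup rows.
users_by_id = {}          # primary store, keyed by user ID
user_id_by_name = {}      # application-maintained secondary index
user_id_by_email = {}     # another secondary index

def save_user(user_id, username, email):
    users_by_id[user_id] = {"username": username, "email": email}
    # Keep the index rows in sync on every write.
    user_id_by_name[username] = user_id
    user_id_by_email[email] = user_id

def find_by_username(username):
    """Two lookups: index row first, then the primary row."""
    return users_by_id.get(user_id_by_name.get(username))

save_user(7, "kevinrose", "kevin@example.com")
```

The cost of this approach is that every write must update all index rows, and application code is responsible for keeping them consistent; that is exactly the bookkeeping that built-in secondary indexes would otherwise do.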

HDFS: Logs from site and API events, user activity. Data source and destination for batch jobs run with Map-Reduce and Hive in Hadoop. Big Data and Big Compute!

MogileFS: Stores image binaries for user icons, screenshots and other static assets. This is the backend store for the CDN origin servers which are an aspect of the Front End systems and can be fronted by different CDN vendors.

MySQL: This is currently the main store for the story promotion algorithm and its calculations, because they require lots of JOIN-heavy operations, which are not a natural fit for the other data stores at this time. However... HBase looks interesting.

Redis: The primary store for the personalized news data because it needs to be different for every user and quick to access and update. We use Redis to provide the Digg Streaming API and also for the real time view and click counts since it provides super low latency as a memory-based data storage system.
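The real-time view and click counts map naturally onto Redis's atomic INCR command. As a sketch of the pattern, a dict stands in for Redis below (real code would issue `incr` calls through a Redis client; the key names are invented):

```python
from collections import defaultdict

# Dict standing in for Redis: production code would call INCR on a real
# Redis server, but the counter pattern is identical.
counters = defaultdict(int)

def incr(key, by=1):
    """Mimic Redis INCR: bump a counter and return the new value."""
    counters[key] += by
    return counters[key]

def record_view(story_id):
    return incr(f"story:{story_id}:views")

def record_click(story_id):
    return incr(f"story:{story_id}:clicks")

for _ in range(3):
    record_view(42)
record_click(42)
```

Because INCR is a single in-memory operation on the server, counts like these can be updated on every page view without touching the heavier primary stores.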

SOLR: Used as the search index for text queries along with some structured fields like date, topic.

Scribe: the log collecting service. Although this is a primary store, the logs are rotated out of this system regularly and summaries written to HDFS.

Operating System and Configuration
Digg runs on Debian-stable-based GNU/Linux servers, which we configure with Clusto and Puppet, plus a configuration system built over Zookeeper.

More
In future blog posts we will describe in more detail some of the systems outlined here. Watch this space!

If you have feedback on this post or suggestions on what we should write about, please let us know. This post was written by Dave (@dajobe).
--------------------------------------------------------------------------------
This could be considered a follow-up to the How Digg Works post from 2008, which describes the earlier architecture.

Non-blocking Programmers

Wednesday, March 30, 2011
It is often stated that the productivity of individual programmers varies by an order of magnitude, and there is significant research supporting the 10x claim. More subjectively, I suspect every working developer quickly realizes that the productivity of their peers varies tremendously. Having been at this for a while, I suspect there is no single factor, or even small number of factors, that causes this variance. Instead there is a whole collection of practices, all of which add up to determine an individual developer's productivity.

To make things more interesting, many of the practices conflict in some way. We'll discuss three of them today.

1. Non-Blocking Operation
We don't just write code right up until the moment the product ships. Numerous steps depend on other people: code reviews, dependent modules or APIs, a test cycle, etc. When faced with a delay to wait for someone else, a developer can choose several possible responses.
blocking: while waiting for the response, do something other than produce code for the project. Codelabs, reading related documentation, and browsing the programming reddit are all examples.
non-blocking: switch to a different coding task in another workspace.

Versatility and wide ranging knowledge is a definite positive (see point 2), and people who spend time satisfying intellectual curiosity grow into better developers. The blocking developer spends time pursuing those interests. We'll ignore the less positive variations on this.

The non-blocking programmer makes progress on a different development task. This can of course be taken too far: having a dozen workspaces and context-switching to every one of them each day isn't productivity, it's ADHD.

One could also label these as single-tasking versus multi-tasking, but that analogy implies more than I intend.

Sometimes developers maximize their own productivity by immediately interrupting the person they are waiting for, generally with the lead-in of "I just sent you email." This impacts point 3, the amount of time developers can spend in a productive zone, and is one of the conflicts between practices which impact overall productivity.

2. Versatile Techniques
Here I'm obliged to make reference to a craftsman's toolbox, with hammers and nails and planers and other woodworking tools I haven't the slightest idea what to do with. The essential point is valid without understanding the specifics of carpentry: a developer with wide ranging expertise can bring more creative solutions to bear on a problem. For example:
  • Realizing that complex inputs would be better handled by a parser than by an increasingly creaky collection of string processing and regexes.
  • Recognizing that a collection of data would be better represented as a graph, or processed using a declarative language.
  • Recalling having read about just the right library to solve a specific problem.
Developers with a curiosity about their craft grow into better developers. This takes time away from the immediate work of pounding out code (point 1), but makes one more effective over the long run.

3. Typing Speed
This sounds too trivial to list, but the ability to type properly does make a difference. Steve Yegge dedicated an entire post to the topic. I concur with his assessment that the ability to touch type matters, far more than most developers think it should. I'll pay further homage to Yegge with a really long explanation as to why.

Developers work N hours per day, where N varies considerably, but the entire time is not spent writing code. We have interruptions, from meetings to questions from colleagues to physical necessities. The amount of time spent actually developing can be a small slice of one's day. More pressingly, we don't just sit down and immediately start pounding out program statements. There is a warm up period, to recall to mind the details of what is being worked on. Reviewing notes, re-reading code produced in the previous session, and so forth get one back into the mode of being productive. Interruptions which disrupt this productive mode have far greater impact than the few minutes it takes to answer the question.

Peopleware, the classic book on the productivity of programmers, refers to this focused state as "flow" and devotes sections of the book to suggestions on how to maximize it. As the book was published in 1987, some of the suggestions now seem quaint, like installing voice mail and allowing developers to turn off the telephone ringer. The essential point remains, though: a block of time is far more useful than the same amount of time broken up by interruptions, and developers do well to maximize these blocks of time.

Once in the zone, thoughts race ahead to the next several steps in what needs to be done. Ability to type quickly and accurately maximizes the effectiveness of time spent in the flow of programming. Hunting and pecking means you only capture a fraction of what could have been done.

There are other factors relating to flow which can be optimized. For example one can block off chunks of time, or work at odd hours when interruptions are minimal. Yet control of the calendar isn't entirely up to the individual, while learning to type most definitely is.

Conclusion
The most effective, productive programmer I know talks very fast and types even faster. He has worked in a number of different problem spaces in his career, and stays current by reading Communications of the ACM and other publications. He handles interruptions well, getting back into the flow of programming very quickly. He also swears profusely, though I suspect that isn't really a productivity factor.

Other highly effective programmers have different habits. The most important thing is to be aware of how to maximize your own effectiveness, rather than look for a single solution or adopt someone else's techniques wholesale. Especially not the swearing.

ZED SHAW ( this is not a blog )

The Master, The Expert, The Programmer

I spent most of my high school years living on Guam trying to stay alive long enough to leave and start a new life. It wasn’t a good time for me, and about the only good thing that came out of it was I started studying martial arts. These days I’m a lazy bastard, but back in the day I studied everything I could get my hands on. It was rough, but I came out of it fine and I’ve since used my knowledge of martial arts in just about everything I’ve done. Each one I studied taught me something different. Capoeira taught me that being balanced is more about being able to adapt and flex than root your stance. Aikido taught me that attacking a problem directly is rarely the solution. Muay Thai taught me that destroying the base will destroy the building. I studied Muay Thai, Ninjitsu, Wing Tsung, Judo, various weapons, and even spent a year getting the crap beat out of me by some rough sword fighters in the SCA. Unfortunately I never studied anything long enough to be considered very good at it. I just took what I found and moved on to the next interesting thing. What does this have to do with programming?

The Master
When I started studying Aikido I read “The Book of Five Rings” by Miyamoto Musashi about 20 times. I had a badly stained copy from walking to and from class every night in the rain. It was folded and written on as I tried to figure out what the hell he meant. I never really figured it out. I guess you had to be a Japanese samurai from the 1600s to figure it out. I did figure out, though, that Musashi was able to defeat his enemies by using unusual and revolutionary tactics. From what I understand of the history of Japanese martial arts, nobody used both swords at the same time until Musashi started doing it. They had two, they just never used them both simultaneously. Musashi was the first known man to break out both swords and defeat a very large number of people with them. He did things differently and it worked.

In Musashi’s case it wasn’t something he came up with and then developed later, but rather an accident that came out of necessity. He was apparently being attacked by a near-army of men, and in order to stay alive he took out his wakizashi (the small sword) to defend himself. He later developed this into a complete system, but originally it was self-preservation. He also did this after spending his whole adult life studying combat, the sword, and strategy in general. Even though it came from necessity, he had the tools and training necessary to make this radical leap possible. Without this training, he probably would have been killed trying to wield two swords at once.

Later in life Musashi retired and pretty much disappeared from the world. He taught a couple of people, but his art would mostly have died with him if he hadn’t written one very little book, “The Book of Five Rings”. In this book Musashi lays out what he knows about strategy in probably one of the most concise and best-written treatises on the subject. Every other person who wrote about strategy wrote huge tomes. If you think “The Art of War” is small, keep in mind that the Chinese had a huge number of “official” texts on strategy, of which “The Art of War” was only a minor part focusing on how war fits with Chinese politics of the era. “The Book of Five Rings” was different: it wasn’t specific to any era, it was small, it was austere, and it was like nothing before it. It was the final culmination of one man’s mastery.

After reading books on martial arts history for years, and studying everything I can, I started to see a commonly understood pattern. Almost all people considered masters of their art finally come to such a deep knowledge that they can do more with less. Rather than a flurry of complicated leaping and jumping, the master will simply step to the side and make one calculated strike. Every story about old masters is the same in that, even though they were frail and near death, their knowledge and abilities were so deep and clear that their simplest motions had the greatest power. For a master, the pompous and flowery motions were just wastes of energy.

Granted, these are all just stories I’d heard. Nothing more than myths and legends that were passed to me by my teachers and friends. But it’s these stories that make up what we perceive as a “masterful” person.

A similar story comes from a completely different side of the world and a different era. I read a story about Mestre Bimba—the originator of modern Capoeira from Brazil—where he fought a challenger in his 20s. Mestre Bimba was in his 80s and had limited mobility, but he still took the young buck on. The young guy started the fight by flying into the air doing this impressive flipping motion, ready to kick some ass. Mestre Bimba just rolled out in a slow cartwheel and stuck his foot out. The hot-head, still flipping, came out of the maneuver and right into Mestre Bimba’s steady foot, smashing himself unconscious. When he woke up he said, “What… what was that?”

“That was my foot my son,” is all Mestre Bimba said.

This kind of story is so common in the lore of the martial arts that it’s impossible to study a martial art and not hear at least one. Every teacher I’ve learned from had similar stories about their senseis, sifus, and mestres. Each one is about how some frail old man (or woman) could do amazing things with just the simplest of motions. These lessons all taught me the same thing, “A master wastes no energy. Every motion is precious.” A master makes everything look effortless. Nothing is frustrating or difficult for them because they do nothing that isn’t necessary. The master’s actions are pure and elegant.

The Expert
In all my martial arts studies I’ve always considered myself a novice. I never studied long enough to be an “expert”, but when I was younger I thought that I was. I studied so much that I couldn’t help being some kind of expert. I found that I could get very good at something rather quickly, but mastering something took far longer. I don’t think I’ve ever mastered anything I’ve studied. I’m just an expert, probably even less now that I don’t study regularly.

I have met quite a few false masters though. These are people who may be very good, and much better than myself. I usually took classes from them, but not because I thought they were “grand masters”, or “masters” or anything. These guys (they were always guys, women are hard to find) were flashy. They could do neat things, could teach really complex techniques, and could tell you every single thing about their martial art possible. Charging you for lesson after lesson was how they made their money after all. Teaching you movie stunt man moves was how they attracted and kept students.

Yet, none of these gentlemen were what I’d consider masters. They were great teachers, and I don’t want to insult them in any way, but none of them were masters of their art. None of them could clearly and simply explain their martial art’s concepts. When I’d ask a complicated question, they would give me a complicated answer. Sometimes their answers were just wrong. Like one guy who tried to show everyone how to break out of an arm lock by punching. He asked me to do the arm lock since I studied Judo, so I did it right and made sure that I rotated with him as he tried to punch. He kept trying to punch me, and I just kept rotating. I really wasn’t paying attention until he suddenly burst out, “Dammit stand still so I can demonstrate.” I said “sorry” but thought, “Yeah, like I’m gonna stand still.” He could have kicked my ass two ways from Sunday, but a simple arm lock frustrated him?

The main thing I noticed about the experts I’ve encountered is they are into impressing you with their abilities. They are usually incredibly good, but their need for recognition gets in the way of mastery. Everything they do is an attempt to prove themselves and in order to do this they must perform like an actor on stage. There’s nothing wrong with this, and I don’t think the expert can become a master without going through this stage in life. At some point though, the expert becomes comfortable with themselves or fed up with impressing everyone and starts to look inward to the core of their art.

The Programmer
I’m going to come out looking like an obnoxious pompous asshole here, but I’m not trying to be one. I’m simply trying to explain something I’ve noticed about the difference between code written by myself and that which “frustrated experts” write. I’m in no way saying that I’m some kind of grand master coder. I consider myself an advanced expert at best.

What I notice is that my peers are progressing to more and more complicated and convoluted designs. They are impressed with the flashiest APIs, the biggest buzzwords, and the most intricate of useless features. They are more than happy to write endless unit tests to test their endless refactoring all the while claiming that they follow XP’s “the simplest thing that works” mantra. I’ve actually seen a guy take a single class that did nothing more than encapsulate the addition of two strings, and somehow “refactor” it to be four classes and two interfaces. How is this improving things? How can more somehow equal simpler? This should never be the case.

These are the actions of an expert. These experts are very smart, capable, and skilled, but they are too busy impressing everyone to realize that their actions are only making things worse for themselves. In the end all of their impressive designs are doing nothing but making more work for themselves and everyone around them. It’s as if their work is only designed for getting them their next job, rather than keeping them in their current one.

I used to be this way. I used to love complicated designs and read everything I could about complicated technologies. But as I get more experienced and “older” as a programmer I find complex things just annoying. They aren’t a mental challenge to understand anymore, they are just irritating. I’ll pick apart the flashy crap, boil down the technology to its essence and then come up with a much simpler design for the task at hand almost every time.

What worries me though is how the experts react to my simplified designs. Typically they’ll say that what I’ve written is not “following best practices” or “isn’t well designed.” They’ll propose these endlessly complex designs with endlessly imagined failure scenarios, and not realize that what they are doing will be a nightmare to maintain. The experts will then saunter off to implement their Flaming Tower of Babel without any comments, horribly complex mock enabled tests, making sure EVERY SINGLE CLASS HAS AN INTERFACE, and ending every class with “Impl” because, well, that’s the best practice. After implementing it they’ll continue to complicate the design even further with endless seemingly aimless refactorings for no other reason than to refactor. And when they’re done, I’ll go in and read through their code and cry.

These are the actions of an expert. They love complexity because the art is still new to them, something which should be explored. A list is not just a container, it’s a linked list, or red-black tree, or doubly linked list. To me, it’s just a container. I realize now that they’re missing my need for simple beautiful things. They don’t love quiet elegance, and would rather shout their superiority from the top of the mountain. Meanwhile, I’m just a lazy old man who wants to get his job done and write something without any wasted energy. I want to climb the mountain with the least amount of effort and their shouting is causing an avalanche of bad code.

The Coming Professional Master
Programming is a very new discipline, so there aren’t too many master programmers out there. What’s worse is that the few people I would consider masters aren’t very exemplary of the software profession and art. They are typically professors who never write anything under a deadline and are given complete artistic freedom to develop whatever they want. Take Donald Knuth, who was able to take years off from teaching in order to complete TeX. There’s no way I could get away with telling my employer that it’ll take me years to finish their product. Knuth is basically a “master amateur”. A guy who worked in a complete utopia and was able to hone his skills without interference. I would compare him with a man who became a master by studying at a monastery for his entire life.

In contrast there are masters in the martial arts who learned their art as a means of survival and became masters in a realistic and hostile environment. We don’t have anyone like this in the programming profession, or at least I haven’t met any. I believe that my generation of developers will produce the kind of masters forged in the real professional world. (Yes, sorry professors, if you can’t get fired for missing a deadline then you aren’t a real programmer working in the real world.) Hopefully software development will continue as a profession and we’ll see a crop of master programmers emerge from industry to challenge the existing amateur masters. But, if the current experts continue to push for ever more complicated, convoluted, involved, and “impressive” designs and ideas then we’re in for a world of hurt.

So my final plea for all my fellow experts out there: Can we please start pushing the art and science of software development toward the austere? I’d love someday to hear a young coder tell a story about someone they idolized, like: “There was this guy I worked with who once optimized a complicated red-black tree and got a 300% performance boost. I was baffled and asked, ‘How’d you do that? That’s impossible.’ To which he responded…”

“'That’s my linked list my son.’”

All content Copyright (C) Zed A. Shaw since like 2000 or something like that.


27/3/11

Speeding Up Your Website’s Database

Website speed has always been a big issue, and it has become even more important since April 2010, when Google decided to use it in search rankings. However, the focus of the discussion is generally on minimizing file sizes, improving server settings and optimizing CSS and JavaScript.

The discussion glosses over another important factor: the speed with which your pages are actually put together on your server. Most big modern websites store their information in a database and use a language such as PHP or ASP to extract it, turn it into HTML and send it to the Web browser.

So, even if you get your home page down to 1.5 seconds (Google’s threshold for being considered a “fast” website), you can still frustrate customers if your search page takes too much time to respond, or if the product pages load quickly but the “Customer reviews” delay for several seconds.
Google’s threshold for a fast-loading website is about 1.5 seconds. This screenshot comes from Google Webmaster Tools (go to [domain name] → Diagnostics → Site Performance).

This article looks at these sorts of issues and describes some simple ways to speed up your website by optimizing your database. It starts with common knowledge but includes more complex techniques at the end, with links to further reading throughout. The article is intended for fearless database beginners and designers who have been thrown in at the deep end.

What Is A Database? What Is SQL?
A database is basically a collection of tables of information, such as a list of customers and their orders. It could be a filing cabinet, a bunch of spreadsheets, a Microsoft Access file or Amazon’s 40 terabytes of book and customer data.

A typical database for a blog has tables for users, categories, posts and comments. WordPress includes these and a few other starter tables. A typical database for an e-commerce website has tables for customers, products, categories, orders and order items (for the contents of shopping baskets). The open-source e-commerce software Magento includes these and many others. Databases have many other uses — such as for content management, customer relations, accounts and invoicing, and events — but these two common types (i.e. for a blog and an e-commerce website) will be referenced throughout this article.

Some tables in a database are connected to other tables. For example, a blog post can have many comments, and a customer can make multiple orders (these are one-to-many relationships). The most complicated type of database relationship is a many-to-many relationship. One relationship is at the core of all e-commerce databases: an order can contain many products, and a single product can be added to many different orders. This is where the “order items” table comes in: it sits between the products and the orders, and it records every time a product is added to an order. This will be relevant later on in the article, when we look at why some database queries are slow.
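As a minimal sketch of that many-to-many relationship (the table and column names here are illustrative assumptions, not taken from WordPress or Magento), the “order items” table simply holds one row for each time a product is added to an order:

```sql
-- Each order belongs to one customer (a one-to-many relationship).
CREATE TABLE orders (
  id INT AUTO_INCREMENT PRIMARY KEY,
  customerid INT NOT NULL,
  orderdate DATETIME
);

CREATE TABLE products (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100),
  price DECIMAL(8,2)
);

-- The junction table: one row per product added to an order,
-- linking orders and products in a many-to-many relationship.
CREATE TABLE orderitems (
  id INT AUTO_INCREMENT PRIMARY KEY,
  orderid INT NOT NULL,   -- refers to orders.id
  productid INT NOT NULL, -- refers to products.id
  quantity INT NOT NULL DEFAULT 1
);
```

The same order ID can appear on many order-item rows, and so can the same product ID, which is what makes the relationship many-to-many.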

The word database also refers to the software that contains all this data, as in “My database crashed while I was having breakfast,” or “I really need to upgrade my database.” Popular database software includes Microsoft Access 2010, Microsoft SQL Server, MySQL, PostgreSQL and Oracle Database 11g.

The acronym SQL comes up a lot when dealing with databases. It stands for “structured query language” and is pronounced “sequel” or “es-cue-el.” It’s the language used to ask and tell a database things — exciting things like SELECT lastname FROM customers WHERE city='Brighton'. This is called a database query because it queries the database for data. There are other types of database statements: INSERT for putting in new data, UPDATE for updating existing data, DELETE for deleting things, CREATE TABLE for creating tables, ALTER TABLE and many more.
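To make those statement types concrete, here is one small example of each, using a hypothetical customers table like the one queried above (the column names are assumptions for the sake of the example):

```sql
-- CREATE TABLE defines a new table
CREATE TABLE customers (
  id INT AUTO_INCREMENT PRIMARY KEY,
  lastname VARCHAR(50),
  city VARCHAR(50)
);

INSERT INTO customers (lastname, city) VALUES ('Smith', 'Brighton'); -- put in new data
UPDATE customers SET city = 'Hove' WHERE lastname = 'Smith';         -- change existing data
DELETE FROM customers WHERE city = 'Hove';                           -- delete rows
```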

How Can A Database Slow Down A Website?
A brand new empty website will run very fast, but as it grows and ages, you may notice some sluggishness on certain pages, particularly pages with complicated bits of functionality. Suppose you wanted to show “Customers who bought this product also bought…” at the bottom of a page of products. To extract this information from the database, you would need to do the following:

1. Start with the current product,
2. See how many times the product has recently been added to anyone’s shopping basket (the “order items” table from above),
3. Look at the orders related to those shopping baskets (for completed orders only),
4. Find the customers who made those orders,
5. Look at other orders made by those customers,
6. Look at the contents of those orders’ baskets (the “order items” again),
7. Look up the details of those products,
8. Identify the products that appear the most often and display them.
You could, in fact, do all of that in one massive database query, or you could split it up over several different queries. Either way, it might run very quickly when your database has 20 products, 12 customers, 18 orders and 67 order items (i.e. items in shopping baskets). But if it is not written and programmed efficiently, then it will be a lot slower with 500 products, 10,000 customers, 14,000 orders and 100,000 order items, and it will slow down the page.
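A single-query version might look roughly like the sketch below. Everything here is an assumption for illustration: the table names follow this article’s examples, product ID 123 stands in for the current product, and the status column marking completed orders is hypothetical:

```sql
SELECT p.id, p.name, COUNT(*) AS times_bought
FROM orderitems oi1                                   -- baskets containing the current product
JOIN orders o1      ON o1.id = oi1.orderid
                   AND o1.status = 'completed'        -- completed orders only
JOIN orders o2      ON o2.customerid = o1.customerid  -- other orders by those customers
                   AND o2.status = 'completed'
JOIN orderitems oi2 ON oi2.orderid = o2.id            -- contents of those baskets
JOIN products p     ON p.id = oi2.productid           -- details of those products
WHERE oi1.productid = 123                             -- the current product
  AND oi2.productid != 123                            -- exclude the product itself
GROUP BY p.id, p.name
ORDER BY times_bought DESC                            -- most frequent first
LIMIT 5;
```

With five joins over the largest tables in the database, it is easy to see how this gets slow as the order items pile up.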

This is a very complicated example, but it shows what kind of stuff goes on behind the scenes and why a seemingly innocuous bit of functionality can grind a website to a halt.

A website could slow down for many other reasons: the server running low on memory or disc space; another website on the same server consuming resources; the server sending out a lot of emails or churning away at some other task; a software, hardware or network fault; a misconfiguration. Or it may have suddenly become a popular website. The next two sections, therefore, will look at speed in more detail.

Is It My Database?
There are now several ways to analyze your website’s speed, including the Firebug plug-in for Firefox, the developer tools in Google Chrome (press Shift + Control + I, and then go to Resources → Enable Resource Tracking) and Yahoo YSlow. There are also websites such as WebPagetest, where you can enter a URL, and it will time it from your chosen location.

All of these tools will show you a diagram of all of the different resources (HTML, images, CSS and JavaScript files) used by your page, along with how long each took to load. They will also break down the time taken to perform a DNS lookup (i.e. to convert your domain name into an IP address), the time taken to connect to your server, the time spent waiting for your server to reply (aka “time to first byte”), and the time spent receiving (i.e. downloading) the data.

Many Web pages are constructed in their entirety by the Web server (including by the PHP that accesses the database) and then sent to the browser all at once, so any database delays would show up in the waiting time, and the receiving/downloading time would be proportional to the amount of data sent. So, if your 20 kB HTML page has a quick connection, a waiting time of 5 seconds and a download time of 0.05 seconds, then the delay would occur on the server, as the page is being built.

Not all Web pages are like this, though. The PHP flush function makes the server send the HTML it has built so far to the browser right away. Any further delays would then show up in the receiving time, rather than the waiting time.

Either way, you can compare the waiting/receiving time for your suspected slow and complicated Web page to the waiting time for a similarly sized HTML page (or image or other static resource) on the same server at the same time. This would rule out the possibility of a slow Internet connection or an overloaded server (both of which would cause delays) and allow you to compare the times taken to construct the pages. This is not an exact science, but it should give you some indication of where things are being held up.

The screenshots below show the analysis provided by Google Chrome’s Developer Tools of a 20 kB Web page versus a 20 kB image. The Web page waited 130 milliseconds (ms) and downloaded for 22 ms. The image waited for 51 ms and downloaded for 11 ms. The download/receiving times are about the same, as expected, but the server is spending about 80 ms extra on processing and constructing the Web page, which entails executing the PHP and calling the database.

When performing these tests, analyze the static resource by itself and click “Refresh,” so that you are not getting a quick cached version. Also, run each a few times to ensure that you’re not looking at a statistical anomaly. The third screenshot below shows that WebPagetest reports almost double the time that Chrome does for the same page at the same time, demonstrating that using the same environment for all tests is important.
Resource analysis using Google Chrome’s Developer Tools, showing a 130-ms wait time for a Web page.
The same tool, showing a 51-ms wait time for an image of about the same size.
Resource analysis of the same page from WebPagetest, with a 296-ms wait time and a 417-ms total time.

How To Time A Database Query In PHP And MySQL
The approach above was general; we can now get very specific. If you suspect that your database might be slowing down your website, then you need to figure out where the delay is coming from. I will define a couple of timing functions, and then use them to time every single database query that is run by a page. The code below is specific to PHP and MySQL, but the method could be used on any database-driven website:
function StartTimer ($what='') {
  global $MYTIMER; $MYTIMER = 0; //global variable to store the time
  //if ($_SERVER['REMOTE_ADDR'] != '127.0.0.1') return; //only show for my IP address
  echo '<p>';
  echo "About to run $what. "; flush(); //output this to the browser
  //$MYTIMER = microtime (true); //in PHP 5 you need only this line to get the time
  list ($usec, $sec) = explode (' ', microtime());
  $MYTIMER = ((float) $usec + (float) $sec); //set the timer
}
function StopTimer() {
  global $MYTIMER; if (!$MYTIMER) return; //no timer has been started
  list ($usec, $sec) = explode (' ', microtime()); //get the current time
  $MYTIMER = ((float) $usec + (float) $sec) - $MYTIMER; //the time taken in seconds
  echo 'Took ' . number_format ($MYTIMER, 4) . ' seconds.</p>'; flush();
}


StartTimer starts the timer and also prints whatever you are trying to time. The commented-out if line is a check of your IP address. This is very useful if you are doing this (temporarily) on a live website and don’t want everyone in the world to see the timing messages. Uncomment the line by removing the initial //, and replace the 127.0.0.1 with your IP address. StopTimer stops the timer and displays the time taken.

Most modern websites (especially well-programmed open-source ones) have a lot of PHP files but query the database in only a handful of places. Search through all of the PHP files for your website for mysql_db_query or mysql_query. Many software development packages such as BBEdit have functions to perform searches like this; or, if you are familiar with the Linux command line, try this:
grep mysql_query `find . -name \*php`

You may find something like this:
mysql_query ($sql);

For WordPress 3.0.4, this is on line 1112 of the file wp-includes/wp-db.php. You can copy and paste the functions above into the top of this file (or into any PHP file that is included by every page), and then add the timer before and after the mysql_query line. It will look like this:
StartTimer ($query);
$this->result = @mysql_query( $query, $dbh );
StopTimer();

Below is a partial screenshot of this being done on a brand new WordPress installation. It is running about 15 database queries in total, each taking about 0.0003 seconds (0.3 ms); so, less than 5 ms in total, which is to be expected for an empty database.
This shows and times all of the database queries that WordPress runs.
If you have found this line in other commonly used systems, please share this information by adding to the comments for this article.

You can also do other interesting things with it: you can see how fast your computer is compared to mine. Counting to 10 million takes my computer 2.9420 seconds. My Web server is a bit faster at 2.0726 seconds:
StartTimer ('counting to 10000000');
for ($i=0; $i<10000000; $i++); //count to a high number
StopTimer();

Notes on the Results
This technique gives you only comparative results. If your server was very busy at that moment, then all of the queries would be slower than normal. But you should have at least been able to determine how long a fast query takes on your server (maybe 1 to 5 ms), and therefore identify the slow-ish ones (200+ ms) and the really slow ones (1+ second). You can run the test a few times over the course of an hour or day (but not immediately after — see the section below about the database cache) to make sure you’re not getting a fluke.

This will also most likely severely mess up the graphical presentation of the page. It may also give you PHP warnings like “Cannot modify header information. Headers already sent by…” This is because the timing messages are interfering with cookie and session headers. As long as the page still displays below the warnings, you can ignore them. If the page does not display at all, then you may need to put the StartTimer and StopTimer around specific blocks of code, rather than around mysql_query.

This technique is essentially a quick hack to show some rough results. It should not be left on a live website.

What Else Could It Be?
If your database queries are not particularly slow, but the construction of your Web page is, then you might just have poorly written code. You can put the timer statements above around bigger and bigger blocks of code to see if and where the delay is occurring. It could be that you are looping through 10,000 full rows of product information, even if you are displaying only 20 product names.
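In that situation the fix is usually to push the work into the database, so that only the rows and columns actually being displayed are fetched. A hedged sketch, reusing the illustrative products table from this article:

```sql
-- fetch only the 20 product names being shown,
-- instead of looping through 10,000 full rows in PHP
SELECT name FROM products ORDER BY name LIMIT 20;
```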

Profiling
If you are still baffled and/or want more complete and accurate information about what’s happening in your code, you could try a debugging and profiling tool such as Xdebug, which analyzes a local copy of your website. It can even visually show where bottlenecks are occurring.

Indexing Database Tables
The experiment above may have surprised you by showing just how many database queries a page on your website is running, and hopefully, it has helped you identify particularly slow queries.

Let’s look now at some simple improvements to speed things up. To do this, you’ll need a way to run database queries on your database. Many server administration packages (like cPanel or Plesk) provide phpMyAdmin for this task. Alternatively, you could upload something like phpMiniAdmin to your website; this single PHP file enables you to look at your database and run queries. You’ll need to enter your database name, user name and password. If you don’t know these, you can usually find them in your website’s configuration file, if it has one (in WordPress, it’s wp-config.php).

Among the database queries that your page runs, you probably saw a few WHERE conditions. This is SQL’s way of filtering out results. For instance, if you are looking at an “Account history” type of page on your website, there is probably a query that looks up all of the orders someone has placed. Something like this:
SELECT * FROM orders WHERE customerid = 2;

This retrieves all orders placed by the customer with the database ID 2. On my computer, with 100,000 orders in the database, running this took 0.2158 seconds.

Columns like customerid, which appear in a lot of WHERE conditions with = or < or > and have many possible values, should be indexed. An index is like the index at the back of a book: it helps the database quickly retrieve indexed data. This is one of the quickest ways to speed up database queries.

What to Index
In order to know which columns to index, you need to understand a bit about how your database is being used. For example, if your website is often used to look up categories by name or events by date, then these columns should be indexed.
SELECT * FROM categories WHERE name = 'Books';
SELECT * FROM events WHERE startdate >= '2011-02-07';

Each of your database tables should already have an ID column (often called id, but sometimes ID or articleid or the like) that is listed as a PRIMARY KEY, as in the wp_posts screenshot below. These PRIMARY KEYs are automatically indexed. But you should also index any columns that refer to ID numbers in other tables, such as customerid in the example above. These are sometimes referred to as FOREIGN KEYs.
SELECT * FROM orders WHERE customerid = 2;
SELECT * FROM orderitems WHERE orderid = 231;

If a lot of text searches are being done, perhaps for descriptions of products or article content, then you can add another type of index called a FULL TEXT index. Queries using a FULL TEXT index can be done over multiple columns and are initially configured to work only with words of four or more letters. They also exclude certain common words like about and words that appear in more than 50% of the rows being searched. However, to use this type of index, you will need to change your SQL queries. Here is a typical text search written two ways, the first without and the second with a FULL TEXT index:
SELECT * FROM products WHERE name LIKE '%shoe%' OR description LIKE '%shoe%';
SELECT * FROM products WHERE MATCH(name,description) AGAINST ('shoe');

It may seem that you should go ahead and index everything. However, while indexing speeds up SELECTs, it slows down INSERTs, UPDATEs and DELETEs. So, if you have a products table that hardly ever changes, you can be more liberal with your indexing. But your orders and order items tables are probably being modified constantly, so you should be more sparing with them.

There are also cases where indexing may not help; for example, if most of the entries in a column have the same value. If you have a stock_status column that stores a value of 1 for “in stock,” and 95% of your products are in stock, then an index wouldn’t help someone search for in-stock products. Imagine if the word the was indexed at the back of a reference book: the index would list almost every page in the book.
SELECT * FROM products WHERE stock_status = 1;

How to Index
Using phpMyAdmin or phpMiniAdmin, you can look at the structure of each database table and see whether the relevant columns are already indexed. In phpMyAdmin, click the name of the table and browse to the bottom where it lists “Indexes.” In phpMiniAdmin, click “Show tables” at the top, and then “sct” for the table in question; this will show the database query needed to recreate the table, which will include any indices at the bottom — something like KEY 'orderidindex' ('orderid').
Using phpMiniAdmin to check for indices in the WordPress wp_posts table.

If the index does not exist, then you can add it. In phpMyAdmin, below the index, it says “Create an index on 1 columns”; click “Go” here, enter a useful name for the index (like customeridindex), choose the column on the next page, and press “Save,” as seen in this screenshot:
Indexing a column using phpMyAdmin.

In phpMiniAdmin, you’ll have to run the following database statement directly in the large SQL query box at the top:
ALTER TABLE orders ADD INDEX customeridindex (customerid);

Running the query again after indexing takes only 0.0019 seconds on my computer, 113 times faster.
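If you want to confirm that a query is really using the new index, MySQL’s EXPLAIN statement describes how a query will be executed; the possible_keys column of its output lists the indices MySQL considered, and the key column shows the one it chose (customeridindex here, assuming the index created above):

```sql
EXPLAIN SELECT * FROM orders WHERE customerid = 2;
```

If the key column is NULL, the query is scanning the whole table and the index is not being used.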

Adding a FULL TEXT index is a similar process. When you run searches against this index, you must list the same columns:
ALTER TABLE articles ADD FULLTEXT(title,author,articletext);
SELECT * FROM articles WHERE MATCH(title,author,articletext) AGAINST ('mysql');

Back-Ups and Security
Before altering your database tables in any way, make a back-up of the whole database. You can do this using phpMyAdmin or phpMiniAdmin by clicking “Export.” Especially if your database contains customer information, keep the back-ups in a safe place. You can also use the command mysqldump to back up a database via SSH:
mysqldump --user=myuser --password=mypassword \
  --single-transaction --add-drop-table mydatabase \
  > backup`date +%Y%e%d`.sql

These scripts also represent a security risk, because they make it much easier for someone to steal all of your data. While phpMyAdmin is often provided securely through your server management software, phpMiniAdmin is a single file that is very easy to upload and forget about. So, you may want to password-protect it or remove it after use.

Optimizing Tables
MySQL and other kinds of database software have built-in tools for optimizing their data. If your tables get modified a lot, then you can run the tools regularly to make the database tables smaller and more efficient. But they take some time to run (from a few seconds to a few minutes or more, depending on the size of the tables), and they can block other queries from running on the table during optimization, so doing this at a non-busy time is best. There’s also some debate about how often to optimize, with opinions ranging from never to once in a while to weekly.

To optimize a table, run database statements such as the following in phpMyAdmin or phpMiniAdmin:
OPTIMIZE TABLE orders;

For example, before I optimized my orders table with 100,000 orders, it was 31.2 MB in size and took 0.2676 seconds to run SELECT * FROM orders. After its first ever optimization, it shrank to 30.8 MB and took only 0.0595 seconds.

The PHP function below will optimize all of the tables in your database:
function OptimizeAllTables() {
    $tables = mysql_query ('SHOW TABLES'); // get all the tables
    while ($table = mysql_fetch_array ($tables))
        mysql_query ('OPTIMIZE TABLE ' . $table[0]); // optimize each one
}
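As a side note, if your site uses the newer mysqli extension rather than the mysql_* functions shown above, a rough equivalent would look like this (assuming $db is a connected mysqli object):

```php
function OptimizeAllTables($db) {
    $tables = $db->query('SHOW TABLES'); // get all the tables
    while ($row = $tables->fetch_row())
        $db->query('OPTIMIZE TABLE `' . $row[0] . '`'); // optimize each one
}
```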

Before calling this function, you have to connect to your database. Most modern websites will connect for you, so you don’t need to worry about it, but the relevant MySQL calls are shown here for the sake of completeness:
mysql_connect (DB_HOST, DB_USER, DB_PASSWORD);
mysql_select_db (DB_NAME);
OptimizeAllTables();

Making Sure To Use The Cache
Just as a Web browser caches copies of pages you visit, database software caches popular queries. As above, the query below took 0.0019 seconds when I ran it the first time with an index:
SELECT * FROM orders WHERE customerid=2;

Running the same query again right away takes only 0.0004 seconds. This is because MySQL has remembered the results and can return them a second time without looking them up again.
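If you want to confirm that the query cache is doing this on your server, MySQL keeps counters of cache hits and free cache memory that you can inspect with an ordinary statement in phpMyAdmin or phpMiniAdmin:

```sql
SHOW STATUS LIKE 'Qcache%';
```

A rising Qcache_hits value means repeated queries really are being served from the cache.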

However, many news websites and blogs might have queries like the following to ensure that articles are displayed only after their published date:
SELECT * FROM posts WHERE publisheddate > CURDATE();
SELECT * FROM articles WHERE publisheddate > NOW();

These queries cannot be cached because they depend on the current time or date. In a table with 100,000 rows, a query like the one above would take about 0.38 seconds every time I run it against an unindexed column on my computer.

If these queries are run on every page of your website, thousands of times per minute, it would speed things up considerably if they were cacheable. You can force queries to use the cache by replacing NOW() or CURDATE() with an actual time, like so:

SELECT * FROM articles WHERE publisheddate > '2011-01-17 17:00';

You can use PHP to make sure the time changes every five minutes or so:
$time = time();
$currenttime = date ('Y-m-d H:i', $time - ($time % 300));
mysql_query ("SELECT * FROM articles WHERE publisheddate > '$currenttime'");

The percentage sign (%) is PHP’s modulus operator. Subtracting $time % 300 rounds the time down to the last multiple of 300 seconds (that is, the last 5 minutes), so the query string stays the same for up to 5 minutes at a time and can be cached.
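To see the rounding in action, here is the same calculation with a fixed example timestamp instead of time(), so you can follow the arithmetic:

```php
$time = 1295283723;               // an example Unix timestamp
$remainder = $time % 300;         // 123 seconds past the last 5-minute mark
$rounded = $time - $remainder;    // 1295283600, the last multiple of 300
echo date ('Y-m-d H:i', $rounded);
```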

There are other uncacheable MySQL functions, too, such as RAND().

Outgrowing Your Cache
Outgrowing your MySQL cache can also make your website appear to slow down. The more posts, pages, categories, products, articles and so on that you have on your website, the more related queries there will be. Take a look at this example:
SELECT * FROM articles WHERE publisheddate > '2011-01-17 17:00' AND categoryid=12;

It could be that when your website had 500 categories, queries like this one all fit in the cache together and all returned in milliseconds. But with 1,000 regularly visited categories, they keep knocking each other out of the cache and return much more slowly. In this case, increasing the size of the cache might help. But giving more server RAM to the cache leaves less for everything else the server does, so consider this carefully. Plenty of advice is available about turning on and improving the efficiency of your cache by setting server variables.
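For example, you can check the current cache size and raise it without restarting the server; the 32 MB value here is only an illustration, and the right figure depends on how much RAM your server can spare:

```sql
SHOW VARIABLES LIKE 'query_cache_size';
SET GLOBAL query_cache_size = 33554432;
```

Note that SET GLOBAL needs administrator (SUPER) privileges, so it may not work from a shared-hosting account.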

When Caching Doesn’t Help
A cache is invalidated whenever a table changes. When a row is inserted, updated or deleted, all queries relying on that table are effectively cleared from the cache. So, if your articles table is updated every time someone views an article (perhaps to count the number of views), then the improvement suggested above might not help much.
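One common workaround is to move the frequently updated counter into its own small table, so that the writes stop invalidating cached queries against the main articles table. The articleviews table here is an assumed example:

```sql
CREATE TABLE articleviews (articleid INT PRIMARY KEY, views INT DEFAULT 0);
UPDATE articleviews SET views = views + 1 WHERE articleid = 12;
```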

In such cases, you may want to investigate an application-level cache, such as Memcached, or read the next section for ideas on making your own ad-hoc cache. Both require much bigger programming changes than anything discussed up to now.

Making Your Own Cache
If a particularly vicious database query takes ages but its results don’t change often, you can cache the results yourself.

Let’s say you want to show the 20 most popular articles on your website in the last week, using an advanced formula that takes into account searches, views, saves and “Send to a friend” hits. And you want to show these on your home page in an unordered (ul) HTML list.

It might be easiest to use PHP to run the database query once an hour or once a day and save the full list to a file somewhere, which you can then include on your home page.

Once you have written the PHP to create the include file, you could take a couple of approaches to scheduling it. You could use your server’s scheduler (in Plesk 8, go to Server → Scheduled Tasks) to call a PHP page every hour, with a command like this:

wget -O /dev/null -q http://www.mywebsite.co.uk/runhourly.php

Alternatively, you could get PHP to check whether the file is at least an hour old before running the query — something like this, where 3600 is the number of seconds in an hour:
$filestat = stat ('includes/complicatedfile.html'); // look up information about the file
if ($filestat['mtime'] < time() - 3600)
    RecreateComplicatedIncludeFile(); // the file is over 1 hour old
readfile ('includes/complicatedfile.html'); // include the file in the page

Returning to the involved example above for “Customers who bought this product also bought…,” you could also cache items in a new database column (or table). Once a week or so, you could run that long set of queries for each and every product, to figure out which other products customers are buying. You could then store the resulting product ID numbers in a new database column as a comma-separated list. Then, when you want to select the other products bought by customers who bought the product with the ID 12, you can run this query:
SELECT * FROM products WHERE FIND_IN_SET(12,otherproductids);
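Setting that up would mean adding the column once and then having the weekly job store each product’s list. A sketch, with made-up product IDs (otherproductids is the assumed new column on the products table):

```sql
ALTER TABLE products ADD otherproductids VARCHAR(255);
UPDATE products SET otherproductids = '3,14,25,67' WHERE id = 12;
```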

Reducing The Number Of Queries By Using JOINs
Somewhere in the management and control area of your e-commerce website is probably a list of your orders with the names of the customers who made them.

This page might have a query like the following to find all completed orders (with a status value indicating whether an order has been completed):
SELECT * FROM orders WHERE status>1;

And for each order it comes across, it might look up the customer’s details:
SELECT * FROM customers WHERE id=1;
SELECT * FROM customers WHERE id=2;
SELECT * FROM customers WHERE id=3;
…and so on.

If this page shows 100 orders at a time, then it has to run 101 queries. And if each of those customers looks up their delivery address in a different table, or looks for the total charge for all of their orders, then the time delay will start to add up. You can make it much faster by combining the queries into one using a JOIN. Here’s what a JOIN looks like for the queries above:
SELECT * FROM orders INNER JOIN customers
ON orders.customerid = customers.id WHERE orders.status>1;

Here is another way to write this, without the word JOIN:
SELECT * FROM orders, customers
WHERE orders.customerid = customers.id AND orders.status>1;

Restructuring queries to use JOINs can get complicated because it involves changing the accompanying PHP code. But if your slow page runs thousands of database statements, then it may be worth a look. For further information, Wikipedia offers a good explanation of JOINs. The columns with which you use a JOIN (customerid in this case) are also prime candidates for being INDEXed.
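As an aside, an INNER JOIN only returns orders that have a matching customer row. If you also wanted orders whose customer record has been deleted to appear (with NULLs in the customer columns), a LEFT JOIN over the same tables would do it:

```sql
SELECT * FROM orders LEFT JOIN customers
ON orders.customerid = customers.id WHERE orders.status>1;
```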

You could also ask MySQL to EXPLAIN a database query. This tells you which tables it will use and provides an “execution plan.” Below is a screenshot showing the EXPLAIN statement being used on one of the more complex WordPress queries from above:

[Screenshot: Using the EXPLAIN statement to explain how MySQL plans to deal with a complex query.]
The screenshot shows which tables and indices are being used, the JOIN types, the number of rows analyzed, and a lot more information. A comprehensive page on the MySQL website explains what the EXPLAIN explains, and another much shorter page goes over how to use that information to optimize your queries (by adding indices, for instance).
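For instance, putting EXPLAIN in front of the JOIN from the previous section will show whether the customeridindex is actually being used:

```sql
EXPLAIN SELECT * FROM orders INNER JOIN customers
ON orders.customerid = customers.id WHERE orders.status>1;
```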

…Or Just Cheat
Finally, returning again to the example above for “Customers who bought this product also bought…,” you could simply change the functionality to something less complicated in the first place. You could call it “Recommended products” and just return a few other products from the same category, or return some hand-picked recommendations.
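A sketch of what that simpler “Recommended products” query might look like, assuming the products table has a categoryid column and the product being viewed has the ID 12:

```sql
SELECT * FROM products WHERE categoryid=12 AND id<>12 LIMIT 4;
```

Unlike the full “also bought” calculation, this is a single query that can be indexed and cached like any other.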

Conclusion
This article has shown a number of techniques for improving database performance, ranging from simple to quite complex. While all well-built websites should already incorporate most of these techniques (particularly the database indices and JOINs), the techniques do get overlooked.

There is also a lot of debate on forums around the Web about the effectiveness and reliability of some of these techniques (measuring speed, indexing, optimization, how best to use the cache and so on), so the advice here is not definitive, but it should give you an overview of what’s available.

If your website starts to mysteriously slow down after a few months or years, you will at least have a starting point for figuring out what’s wrong.