Mostrando entradas con la etiqueta Web Developers. Mostrar todas las entradas
Mostrando entradas con la etiqueta Web Developers. Mostrar todas las entradas

27/3/11

Speeding Up Your Website’s Database

Website speed has always been a big issue, and it has become even more important since April 2010, when Google decided to use it in search rankings. However, the focus of the discussion is generally on minimizing file sizes, improving server settings and optimizing CSS and Javascript.

The discussion glosses over another important factor: the speed with which your pages are actually put together on your server. Most big modern websites store their information in a database and use a language such as PHP or ASP to extract it, turn it into HTML and send it to the Web browser.

So, even if you get your home page down to 1.5 seconds (Google’s threshold for being considered a “fast” website), you can still frustrate customers if your search page takes too much time to respond, or if the product pages load quickly but the “Customer reviews” delay for several seconds.Google’s threshold for a fast-loading website is about 1.5 seconds. This screenshot comes from Google Webmaster Tools (go to [domain name] → Diagnostics → Site Performance).

This article looks at these sorts of issues and describes some simple ways to speed up your website by optimizing your database. It starts with common knowledge but includes more complex techniques at the end, with links to further reading throughout. The article is intended for fearless database beginners and designers who have been thrown in at the deep end.

What Is A Database? What Is SQL?
A database is basically a collection of tables of information, such as a list of customers and their orders. It could be a filing cabinet, a bunch of spreadsheets, a Microsoft Access file or Amazon’s 40 terabytes of book and customer data.

A typical database for a blog has tables for users, categories, posts and comments. WordPress includes these and a few other starter tables. A typical database for an e-commerce website has tables for customers, products, categories, orders and order items (for the contents of shopping baskets). The open-source e-commerce software Magento includes these and many others. Databases have many other uses — such as for content management, customer relations, accounts and invoicing, and events — but these two common types (i.e. for a blog and an e-commerce website) will be referenced throughout this article.

Some tables in a database are connected to other tables. For example, a blog post can have many comments, and a customer can make multiple orders (these are one-to-many relationships). The most complicated type of database relationship is a many-to-many relationship. One relationship is at the core of all e-commerce databases: an order can contain many products, and a single product can be added to many different orders. This is where the “order items” table comes in: it sits between the products and the orders, and it records every time a product is added to an order. This will be relevant later on in the article, when we look at why some database queries are slow.

The word database also refers to the software that contains all this data, as in “My database crashed while I was having breakfast,” or “I really need to upgrade my database.” Popular database software include Microsoft Access 2010, Microsoft SQL Server, MySQL, PostgreSQL and Oracle Database 11g.

The acronym SQL comes up a lot when dealing with databases. It stands for “structured query language” and is pronounced “sequel” or “es-cue-el.” It’s the language used to ask and tell a database things — exciting things like SELECT lastname FROM customers WHERE city='Brighton'. This is called a database query because it queries the database for data. There are other types of database statements: INSERT for putting in new data, UPDATE for updating existing data, DELETE for deleting things, CREATE TABLE for creating tables, ALTER TABLE and many more.

How Can A Database Slow Down A Website?
A brand new empty website will run very fast, but as it grows and ages, you may notice some sluggishness on certain pages, particularly pages with complicated bits of functionality. Suppose you wanted to show “Customers who bought this product also bought…” at the bottom of a page of products. To extract this information from the database, you would need to do the following:

1.Start with the current product,
2.See how many times the product has recently been added to anyone’s shopping basket (the “order items” table from above),
3.Look at the orders related to those shopping baskets (for completed orders only),
4.Find the customers who made those orders,
5.Look at other orders made by those customers,
6.Look at the contents of those orders’ baskets (the “order items” again),
7.Look up the details of those products,
8.Identify the products that appear the most often and display them.
You could, in fact, do all of that in one massive database query, or you could split it up over several different queries. Either way, it might run very quickly when your database has 20 products, 12 customers, 18 orders and 67 order items (i.e. items in shopping baskets). But if it is not written and programmed efficiently, then it will be a lot slower with 500 products, 10,000 customers, 14,000 orders and 100,000 order items, and it will slow down the page.

This is a very complicated example, but it shows what kind of stuff goes on behind the scenes and why a seemingly innocuous bit of functionality can grind a website to a halt.

A website could slow down for many other reasons: the server running low on memory or disc space; another website on the same server consuming resources; the server sending out a lot of emails or churning away at some other task; a software, hardware or network fault; a misconfiguration. Or it may have suddenly become a popular website. The next two sections, therefore, will look at speed in more detail.

Is It My Database?
There are now several ways to analyze your website’s speed, including the Firebug plug-in for Firefox, the developer tools in Google Chrome (press Shift + Control + I, and then go to Resources → Enable Resource Tracking) and Yahoo YSlow. There are also websites such as WebPagetest, where you can enter a URL, and it will time it from your chosen location.

All of these tools will show you a diagram of all of the different resources (HTML, images, CSS and JavaScript files) used by your page, along with how long each took to load. They will also break down the time taken to perform a DNS lookup (i.e. to convert your domain name into an IP address), the time taken to connect to your server, the time spent waiting for your server to reply (aka “time to first byte”), and the time spent receiving (i.e. downloading) the data.

Many Web pages are constructed in their entirety by the Web server, including by PHP that accesses the database, and then sent to the browser all at once, so any database delays would lead to a long waiting time, and the receiving/downloading time would be proportional to the amount of data sent. So, if your 20 kB HTML page has a quick connection, a waiting time of 5 seconds and a download time of 0.05 seconds, then the delay would occur on the server, as the page is being built.

Not all Web pages are like this, though. The PHP flush function forces the browser to send the HTML that it has already built to the browser right away. Any further delays would then be in the receiving time, rather than the waiting time.

Either way, you can compare the waiting/receiving time for your suspected slow and complicated Web page to the waiting time for a similarly sized HTML page (or image or other static resource) on the same server at the same time. This would rule out the possibility of a slow Internet connection or an overloaded server (both of which would cause delays) and allow you to compare the times taken to construct the pages. This is not an exact science, but it should give you some indication of where things are being held up.

The screenshots below show the analysis provide by Google Chrome’s Developer Tools of a 20 kB Web page versus a 20 kB image. The Web page waited 130 milliseconds (ms) and downloaded for 22 ms. The image waited for 51 ms and downloaded for 11 ms. The download/receiving times are about the same, as expected, but the server is spending about 80 ms extra on processing and constructing the Web page, which entails executing the PHP and calling the database.

When performing these tests, analyze the static resource by itself and click “Refresh,” so that you are not getting a quick cached version. Also, run each a few times to ensure that you’re not looking at a statistical anomaly. The third screenshot below shows that WebPagetest indicates almost double the time of Google for the same page at the same time, demonstrating that using the same environment for all tests is important.Resource analysis using Google Chrome’s Developer Tools, showing a 130-ms wait time for a Web page.The same tool, showing a 51-ms wait time for an image of about the same size.Resource analysis of the same page from WebPagetest, with a 296-ms wait time and a 417-ms total time.

How To Time A Database Query In PHP And MySQL
The approach above was general; we can now get very specific. If you suspect that your database might be slowing down your website, then you need to figure out where the delay is coming from. I will define a couple of timing functions, and then use them to time every single database query that is run by a page. The code below is specific to PHP and MySQL, but the method could be used on any database-driven website:
1. function StartTimer ($what='') {
2. global $MYTIMER; $MYTIMER=0; //global variable to store time
3. //if ($_SERVER['REMOTE_ADDR'] != '127.0.0.1') return; //only show for my IP address
4.
5. echo '

';
6. echo "About to run $what. "; flush(); //output this to the browser
7. //$MYTIMER = microtime (true); //in PHP5 you need only this line to get the time
8.
9. list ($usec, $sec) = explode (' ', microtime());
10. $MYTIMER = ((float) $usec + (float) $sec); //set the timer
11. }
12. function StopTimer() {
13. global $MYTIMER; if (!$MYTIMER) return; //no timer has been started
14. list ($usec, $sec) = explode (' ', microtime()); //get the current time
15. $MYTIMER = ((float) $usec + (float) $sec) - $MYTIMER; //the time taken in milliseconds
16. echo 'Took ' . number_format ($MYTIMER, 4) . ' seconds.

'; flush();
17. }


StartTimer starts the timer and also prints whatever you are trying to time. The second line is a check of your IP address. This is very useful if you are doing this (temporarily) on a live website and don’t want everyone in the world to see the timing messages. Uncomment the line by removing the initial //, and replace the 127.0.0.1 with your IP address. StopTimer stops the timer and displays the time taken.

Most modern websites (especially well-programmed open-source ones) have a lot of PHP files but query the database in only a handful of places. Search through all of the PHP files for your website for mysql_db_query or mysql_query. Many software development packages such as BBEdit have functions to perform searches like this; or, if you are familiar with the Linux command line, try this:
grep mysql_query `find . -name \*php`

You may find something like this:
view sourceprint?
1 mysql_query ($sql);

For WordPress 3.0.4, this is on line 1112 of the file wp-includes/wp-db.php. You can copy and paste the functions above into the top of this file (or into any PHP file that is included by every page), and then add the timer before and after the mysql_query line. It will look like this:
1.StartTimer ($query);
2. $this->result = @mysql_query( $query, $dbh );
3. StopTimer();

Below is a partial screenshot of this being done on a brand new WordPress installation. It is running about 15 database queries in total, each taking about 0.0003 seconds (0.3 ms); so, less than 5 ms in total, which is to be expected for an empty database.
This shows and times all of the database queries that WordPress runs.
If you have found this line in other commonly used systems, please share this information by adding to the comments for this article.

You can also do other interesting things with it: you can see how fast your computer is compared to mine. Counting to 10 million takes my computer 2.9420 seconds. My Web server is a bit faster at 2.0726 seconds:
1. StartTimer ('counting to 10000000');
2. for ($i=0; $i<10000000; $i++); //count to a high number
3. StopTimer();

Notes on the Results
This technique gives you only comparative results. If your server was very busy at that moment, then all of the queries would be slower than normal. But you should have at least been able to determine how long a fast query takes on your server (maybe 1 to 5 ms), and therefore identify the slow-ish ones (200+ ms) and the really slow ones (1+ second). You can run the test a few times over the course of an hour or day (but not immediately after — see the section below about the database cache) to make sure you’re not getting a fluke.

This will also most likely severely mess up the graphical presentation of the page. It may also give you PHP warnings like “Cannot modify header information. Headers already sent by…” This is because the timing messages are interfering with cookie and session headers. As long as the page still displays below the warnings, you can ignore them. If the page does not display at all, then you may need to put the StartTimer and StopTimer around specific blocks of code, rather than around mysql_query.

This technique is essentially a quick hack to show some rough results. It should not be left on a live website.

What Else Could It Be?
If your database queries are not particularly slow, but the construction of your Web page is, then you might just have poorly written code. You can put the timer statements above around bigger and bigger blocks of code to see if and where the delay is occurring. It could be that you are looping through 10,000 full rows of product information, even if you are displaying only 20 product names.

Profiling
If you are still baffled and/or want more complete and accurate information about what’s happening in your code, you could try a debugging and profiling tool such as Xdebug, which analyzes a local copy of your website. It can even visually show where bottlenecks are occurring.

Indexing Database Tables
The experiment above may have surprised you by showing just how many database queries a page on your website is running, and hopefully, it has helped you identify particularly slow queries.

Let’s look now at some simple improvements to speed things up. To do this, you’ll need a way to run database queries on your database. Many server administration packages (like cPanel or Plesk) provide phpMyAdmin for this task. Alternatively, you could upload something like phpMiniAdmin to your website; this single PHP file enables you to look at your database and run queries. You’ll need to enter your database name, user name and password. If you don’t know these, you can usually find them in your website’s configuration file, if it has one (in WordPress, it’s wp-config.php).

Among the database queries that your page runs, you probably saw a few WHERE conditions. This is SQL’s way of filtering out results. For instance, if you are looking at an “Account history” type of page on your website, there is probably a query like this to look up all of the orders someone has placed. Something like this:
1 SELECT * FROM orders WHERE customerid = 2;

This retrieves all orders placed by the customer with the database ID 2. On my computer, with 100,000 orders in the database, running this took 0.2158 seconds.

Columns like customerid — which deal with a lot of WHERE conditions with = or < or > and have many possible values, should be indexed. This is like the index at the back of a book: it helps the database quickly retrieve indexed data. This is one of the quickest ways to speed up database queries.

What to Index
In order to know which columns to index, you need to understand a bit about how your database is being used. For example, if your website is often used to look up categories by name or events by date, then these columns should be indexed.
1 SELECT * FROM categories WHERE name = 'Books';
2 SELECT * FROM events WHERE startdate >= '2011-02-07';

Each of your database tables should already have an ID column (often called id, but sometimes ID or articleid or the like) that is listed as a PRIMARY KEY, as in the wp_posts screenshot below. These PRIMARY KEYs are automatically indexed. But you should also index any columns that refer to ID numbers in other tables, such as customerid in the example above. These are sometimes referred to as FOREIGN KEYs.
1. SELECT * FROM orders WHERE customerid = 2;
2. SELECT * FROM orderitems WHERE orderid = 231;

If a lot of text searches are being done, perhaps for descriptions of products or article content, then you can add another type of index called a FULL TEXT index. Queries using a FULL TEXT index can be done over multiple columns and are initially configured to work only with words of four or more letters. They also exclude certain common words like about and words that appear in more than 50% of the rows being searched. However, to use this type of index, you will need to change your SQL queries. Here is a typical text search, the first without and the second with a FULL TEXT index:
1. SELECT * FROM products WHERE name LIKE '%shoe%' OR description LIKE '%shoe%';
2. SELECT * FROM products WHERE MATCH(name,description) AGAINST ('shoe');

It may seem that you should go ahead and index everything. However, while indexing speeds up SELECTs, it slows down INSERTs, UPDATEs and DELETEs. So, if you have a products table that hardly ever changes, you can be more liberal with your indexing. But your orders and order items tables are probably being modified constantly, so you should be more sparing with them.

There are also cases where indexing may not help; for example, if most of the entries in a column have the same value. If you have a stock_status column that stores a value of 1 for “in stock,” and 95% of your products are in stock, then an index wouldn’t help someone search for in-stock products. Imagine if the word the was indexed at the back of a reference book: the index would list almost every page in the book.
1. SELECT * FROM products WHERE stock_status = 1;

How to Index
Using phpMyAdmin or phpMiniAdmin, you can look at the structure of each database table and see whether the relevant columns are already indexed. In phpMyAdmin, click the name of the table and browse to the bottom where it lists “Indexes.” In phpMiniAdmin, click “Show tables” at the top, and then “sct” for the table in question; this will show the database query needed to recreate the table, which will include any indices at the bottom — something like KEY 'orderidindex' ('orderid').Using phpMiniAdmin to check for indices in the WordPress wp_posts table.

If the index does not exist, then you can add it. In phpMyAdmin, below the index, it says “Create an index on 1 columns”; click “Go” here, enter a useful name for the index (like customeridindex), choose the column on the next page, and press “Save,” as seen in this screenshot:Indexing a column using phpMyAdmin.

In phpMiniAdmin, you’ll have to run the following database statement directly in the large SQL query box at the top:
1 ALTER TABLE orders ADD INDEX customeridindex (customerid);

Running the query again after indexing takes only 0.0019 seconds on my computer, 113 times faster.

Adding a FULL TEXT index is a similar process. When you run searches against this index, you must list the same columns:
1 ALTER TABLE articles ADD FULLTEXT(title,author,articletext);
2 SELECT * FROM articles WHERE MATCH(title,author,articletext) AGAINST ('mysql');

Back-Ups and Security
Before altering your database tables in any way, make a back-up of the whole database. You can do this using phpMyAdmin or phpMiniAdmin by clicking “Export.” Especially if your database contains customer information, keep the back-ups in a safe place. You can also use the command mysqldump to back up a database via SSH:
1 mysqldump --user=myuser --password=mypassword
2 --single-transaction --add-drop-table mydatabase
3 > backup`date +%Y%e%d`.sql

These scripts also represent a security risk, because they make it much easier for someone to steal all of your data. While phpMyAdmin is often provided securely though your server management software, phpMiniAdmin is a single file that is very easy to upload and forget about. So, you may want to password-protect it or remove it after usage.

Optimizing Tables
MySQL and other kinds of database software have built-in tools for optimizing their data. If your tables get modified a lot, then you can run the tools regularly to make the database tables smaller and more efficient. But they take some time to run (from a few seconds to a few minutes or more, depending on the size of the tables), and they can block other queries from running on the table during optimization, so doing this at a non-busy time is best. There’s also some debate about how often to optimize, with opinions ranging from never to once in a while to weekly.

To optimize a table, run database statements such as the following in phpMyAdmin or phpMiniAdmin:
1 OPTIMIZE TABLE orders;

For example, before I optimized my orders table with 100,000 orders, it was 31.2 MB in size and took 0.2676 seconds to run SELECT * FROM orders. After its first ever optimization, it shrunk to 30.8 MB and took only 0.0595 seconds.

The PHP function below will optimize all of the tables in your database:
1 function OptimizeAllTables() {
2 $tables = mysql_query ('SHOW TABLES'); //get all the tables
3 while ($table = mysql_fetch_array ($tables))
4 mysql_query ('OPTIMIZE TABLE ' . $table[0]); //optimize them
5 }

Before calling this function, you have to connect to your database. Most modern websites will connect for you, so you don’t need to worry about it, but the relevant MySQL calls are shown here for the sake of completeness:
1 mysql_connect (DB_HOST, DB_USER, DB_PASSWORD);
2 mysql_select_db (DB_NAME);
3 OptimizeAllTables();

Making Sure To Use The Cache
Just as a Web browser caches copies of pages you visit, database software caches popular queries. As above, the query below took 0.0019 seconds when I ran it the first time with an index:
1 SELECT * FROM orders WHERE customerid=2;

Running the same query again right away takes only 0.0004 seconds. This is because MySQL has remembered the results and can return them a second time without looking them up again.

However, many news websites and blogs might have queries like the following to ensure that articles are displayed only after their published date:
1 SELECT * FROM posts WHERE publisheddate > CURDATE();
2 SELECT * FROM articles WHERE publisheddate > NOW();

These queries cannot be cached because they depend on the current time or date. In a table with 100,000 rows, a query like the one above would take about 0.38 seconds every time I run it against an unindexed column on my computer.

If these queries are run on every page of your website, thousands of times per minute, it would speed things up considerably if they were cacheable. You can force queries to use the cache by replacing NOW or CURDATE with an actual time, like so:
1 SELECT * FROM articles WHERE publisheddate > '2011-01-17 17:00';

You can use PHP to make sure the time changes every five minutes or so:
1 $time = time();
2 $currenttime = date ('Y-m-d H:i', $time - ($time % 300));
3 mysql_query (“SELECT * FROM articles WHERE publisheddate > '$currenttime'”);

The percentage sign is the modulus operator. % 300 rounds the time down to the last 300 seconds or 5 minutes.

There are other uncacheable MySQL functions, too, like RAND.

Outgrowing Your Cache
Outgrowing your MySQL cache can also make your website appear to slow down. The more posts, pages, categories, products, articles and so on that you have on your website, the more related queries there will be. Take a look at this example:
1 SELECT * FROM articles WHERE publisheddate > '2011-01-17 17:00' AND categoryid=12

It could be that when your website had 500 categories, queries like this one all fit in the cache together and all returned in milliseconds. But with 1000 regularly visited categories, they keep knocking each other out of the cache and returning much slower. In this case, increasing the size of the cache might help. But giving more server RAM to your cache means spending less on other tasks, so consider this carefully. Plenty of advice is available about turning on and improving the efficiency of your cache by setting server variables.

When Caching Doesn’t Help
A cache is invalidated whenever a table changes. When a row is inserted, updated or deleted, all queries relying on that table are effectively cleared from the cache. So, if your articles table is updated every time someone views an article (perhaps to count the number of views), then the improvement suggested above might not help much.

In such cases, you may want to investigate an application-level cacher, such as Memcached, or read the next section for ideas on making your own ad-hoc cache. Both require much bigger programming changes than discussed up to now.

Making Your Own Cache
If a particularly viscous database query takes ages but the results don’t change often, you can cache the results yourself.

Let’s say you want to show the 20 most popular articles on your website in the last week, using an advanced formula that takes into account searches, views, saves and “Send to a friend” hits. And you want to show these on your home page in an unordered (ul) HTML list.

It might be easiest to use PHP to run the database query once an hour or once a day and save the full list to a file somewhere, which you can then include on your home page.

Once you have written the PHP to create the include file, you could take one of a couple approaches to scheduling it. You could use your server’s scheduler (in Plesk 8, go to Server → Scheduled Tasks) to call a PHP page every hour, with a command like this:
1 wget -O /dev/null -q http://www.mywebsite.co.uk/runhourly.php

Alternatively, you could get PHP to check whether the file is at least an hour old before running the query — something like this, where 3600 is the number of seconds in an hour:
1 $filestat = stat ('includes/complicatedfile.html');
2 //look up information about the file
3 if ($filestat['mtime'] < time()-3600) RecreateComplicatedIncludeFile();
4 //over 1 hour
5 readfile ('includes/complicatedfile.html');
6 //include the file into the page

Returning to the involved example above for “Customers who bought this product also bought…,” you could also cache items in a new database column (or table). Once a week or so, you could run that long set of queries for each and every product, to figure out which other products customers are buying. You could then store the resulting product ID numbers in a new database column as a comma-separated list. Then, when you want to select the other products bought by customers who bought the product with the ID 12, you can run this query:
1 SELECT * FROM products WHERE FIND_IN_SET(12,otherproductids);

Reducing The Number Of Queries By Using JOINs
Somewhere in the management and control area of your e-commerce website is probably a list of your orders with the names of the customers who made them.

This page might have a query like the following to find all completed orders (with a status value indicating whether an order has been completed):
1 SELECT * FROM orders WHERE status>1;

And for each order it comes across, it might look up the customer’s details:
1 SELECT * FROM customers WHERE id=1;
2 SELECT * FROM customers WHERE id=2;
3 SELECT * FROM customers WHERE id=3;
4 etc

If this page shows 100 orders at a time, then it has to run 101 queries. And if each of those customers looks up their delivery address in a different table, or looks for the total charge for all of their orders, then the time delay will start to add up. You can make it much faster by combining the queries into one using a JOIN. Here’s what a JOIN looks like for the queries above:
1 SELECT * FROM orders INNER JOIN customers
2 ON orders.customerid = customers.id WHERE orders.status>=1;

Here is another way to write this, without the word JOIN:
1 SELECT * FROM orders, customers
2 WHERE orders.customerid = customers.id AND orders.status>=1;

Restructuring queries to use JOINs can get complicated because it involves changing the accompanying PHP code. But if your slow page runs thousands of database statements, then it may be worth a look. For further information, Wikipedia offers a good explanation of JOINs. The columns with which you use a JOIN (customerid in this case) are also prime candidates for being INDEXed.

You could also ask MySQL to EXPLAIN a database query. This tells you which tables it will use and provides an “execution plan.” Below is a screenshot showing the EXPLAIN statement being used on one of the more complex WordPress queries from above:Using the EXPLAIN statement to explain how MySQL plans to deal with a complex query.
The screenshot shows which tables and indices are being used, the JOIN types, the number of rows analyzed, and a lot more information. A comprehensive page on the MySQL website explains what the EXPLAIN explains, and another much shorter page goes over how to use that information to optimize your queries (by adding indices, for instance).

…Or Just Cheat
Finally, returning again to the advanced example above for “Customers who bought this product also bought…,” you could also simply change the functionality to be something less complicated for starters. You could call it “Recommended products” and just return a few other products from the same category or return some hand-picked recommendation.

Conclusion
This article has shown a number of techniques for improving database performance, ranging from simple to quite complex. While all well-built websites should already incorporate most of these techniques (particularly the database indices and JOINs), the techniques do get overlooked.

There is also a lot of debate on forums around the Web about the effectiveness and reliability of some of these techniques (i.e. measuring speed, indexing, optimization, how best to use the cache, etc.), so the advice here is not definitive, but hopefully it gives you an overview of what’s available.

If your website starts to mysteriously slow down after a few months or years, you will at least have a starting point for figuring out what’s wrong. Sphere: Related Content

9/10/10

How Good C# Habits can Encourage Bad JavaScript Habits: Part 1

This is the first post in a multi-part series covering common mistakes C# developers tend to make when they first start writing JavaScript.

Introduction
Many people come to jQuery and believe that their knowledge of a previous classical language (C#, Java, etc) will help them be successful at client-side scripting. You can use your classical language skills to accomplish a large amount of functionality with jQuery. However, the more client-side code you write you will find yourself uncovering strange bugs because you didn’t take adequate time to learn JavaScript properly.
“…it turns out that if you have absolutely no idea what you’re doing in the language you can still generally make things work.” –Douglas Crockford, Yahoo!’s JavaScript Architect, Douglas on JavaScript -- Chapter 2: And Then There Was JavaScript

This article is targeted for developers that use jQuery but haven’t invested the time necessary to learn JavaScript. The intent is to help you avoid some common mistakes when moving from a classical language (Java, C#, etc) to JavaScript.

jQuery is a library that is written in JavaScript. It is important to remember that you will always be writing in JavaScript when using jQuery. It is inevitable that you will run into native JavaScript concepts that are foreign to the classical language proficient developer. Taking the time now to be proficient in JavaScript will increase your client-side code quality, efficiency, and decrease code maintenance.

Bad Practices
1. Having Variables & Functions in Global Scope
A best practice in C# is to limit the use of global variables. This doesn’t necessarily translate into a bad practice in JavaScript, but most C# developers are not aware of how easily it is to pollute the global namespace with needless variables and functions.

One way you can pollute the global namespace is to not declare your variables. In JavaScript, if you don’t declare your variables then they become globally available to the rest of the program, which is probably not what you wanted and generally a bad idea.

Coming from a C# background, you are most likely used to the concept of namespacing your classes so they won’t clash with other libraries. In the same way you should be mindful to not declare variables or functions in the global namespace to prevent clashing with other libraries, browser extensions, and also your own code.

Bad Practice
The following code shows some bad examples of how to declare variables and functions.
var iAmGlobal = "Keep these to a minimum";

iAmGlobalToo = "Bad";

function oldSchoolWay() {
//Placed in global scope when executed
iAmGlobalNooo = "Really Bad";

console.log( "Bad oldSchoolWay Function" );
}

//Bad way to prevent namespace clashing
function namespace_oldSchoolWay() {
console.log( "Worse oldSchoolWay Function" );
}

//Functions
window.oldSchoolWay();
window.namespace_oldSchoolWay();

//Global variables are available off window object
console.log( window.iAmGlobal );
console.log( window.iAmGlobalToo );
console.log( window.iAmGlobalNooo );

You can execute and modify the above code from jsFiddle.

In the above code snippet you’ll see that the iAmGlobal, iAmGlobalToo, and iAmGlobalNooo are all global variables. You would expect iAmGlobal and probably iAmGlobalToo to be global variables, but the variable inside the oldSchoolWay function is also global because it was never declared! The best way to stay out of this trouble is to be disciplined and declare all of your variables and not let JavaScript do it for you.

Klaus Komenda actually refers to the function statement as the Old School Way of function declaration. I’ve even seen C# developers trying to prefix their function names (as shown above by namespace_oldSchoolWay) to minimize collisions with other libraries, but you should not do this. That technique still clutters the global namespace.

There are many options to fix this problem, but the simplest is to create a namespace object and then declare all your functions and properties within it. See the following code snippet for a slightly better approach.

Object Literal (All Public)
An Object Literal is a convenient way to create new objects. The syntax looks very similar to what you might expect to see in JSON (JavaScript Object Notation), but they are actually quite different. For more information about this you can check out an in-depth discussion on it by Ben Alman entitled There's no such thing as a "JSON Object".

When you create an Object Literal you can provide properties and methods and they can refer to each other. You can also add additional methods and properties to the object after it has been defined. All the properties and methods that you declare in the Object Literal are publicly accessible to the rest of your JavaScript code.
//Object Literal declaring properties and methods
var skillet = {
//public property
ingredient: "Bacon Strips",

//public method
fry: function() {
console.log( "Frying " + this.ingredient );
}
};
console.log( skillet.ingredient ); //Bacon Strips
skillet.fry(); //Frying Bacon Strips

//Adding a public property to an Object Literal
skillet.quantity = "12";
console.log( skillet.quantity ); //12

//Adding a public method to an Object Literal
skillet.toString = function() {
console.log( this.quantity + " " +
this.ingredient );
};
skillet.toString(); //12 Bacon Strips​

You can execute and modify the above code from jsFiddle.

The above code declares a skillet variable and sets it to an Object Literal with one property (ingredient) and one method (fry) that are both publicly accessible off of the object. You can also add additional public methods and properties to the object after the initial declaration as shown above when adding the quantity property and the toString method. The toString method is able to access the properties of the skillet object since all of it’s parts are public.

Pros
•Cleans up the global namespace by adding a namespace to your properties and methods
•You can add functionality to the object literal at a later point
•All the properties and methods are public

Cons
•Uses the Object Literal syntax to define your properties and methods that some may find cumbersome
•There isn’t the ability to have private properties or methods

Self-Executing Anonymous Function: Part 1 (All Private)
Another common technique that you can use to protect the global namespace is to use a Self-Executing Anonymous Function. This is just a fancy term to refer to a function with no name that is immediately executed after it’s defined. The value of this technique is that you can create a wrapper around your code that protects it from the global namespace. By using the most simple version of the Self-Executing Anonymous Function you can create code that is exclusively private.
//Self-Executing Anonymous Function: Part 1 (All Private)
(function() {
//private variable
var ingredient = "Bacon Strips";

//private function
function fry() {
console.log( "Frying " + ingredient );
}

fry();
}());

//Variables not accessible
console.log( window.ingredient ); //Undefined

//Functions not accessible
try {
window.fry(); //Throws Exception
} catch( e ) {
//Object [object DOMWindow] has no method 'fry'
console.log( e.message );
}

//Can't add additional functionality

You can execute and modify the above code from jsFiddle.

The above code snippet wraps the ingredient variable and fry function with a Self-Executing Anonymous Function and right before the function ends it executes the fry method. After the Self-Executing Anonymous Function finishes executing there is no way to run the function again or reference any of the internals of the wrapped code.

Pros
•Hides all your implementation from external code
•The function runs once and only once
•The code inside doesn’t use the Object Literal notation

Cons
•All information is hidden, which may not be what you want
•Slightly more complicated technique, but not really

Revealing Module Pattern (Public & Private)
The Revealing Module Pattern actually uses the concept of the Self-Executing Anonymous Function as well, but in this case we save the result into a variable. This pattern introduces the concept of a Closure. A closure is a way to keep variables alive after a function has returned. The Mozilla Developer Network has a great page explaining closures. In it they provide a core definition:
“A closure is a special kind of object that combines two things: a function, and the environment in which that function was created. The environment consists of any local variables that were in-scope at the time that the closure was created.”

In the following example we will declare private properties and methods as well as public properties and methods.
//Revealing Module Pattern (Public & Private)
var skillet = (function() {
var pub = {},
//Private property
amountOfGrease = "1 Cup";

//Public property
pub.ingredient = "Bacon Strips";

//Public method
pub.fry = function() {
console.log( "Frying " + pub.ingredient );
};

//Private method
function privateWay() {
//Do something...
}

//Return just the public parts
return pub;
}());

//Public Properties
console.log( skillet.ingredient ); //Bacon Strips

//Public Methods
skillet.fry();

//Adding a public property to a Module
skillet.quantity = 12;
console.log( skillet.quantity ); //12

//Adding a public method to a Module
skillet.toString = function() {
console.log( skillet.quantity + " " +
skillet.ingredient + " & " +
amountOfGrease + " of Grease" );
};

try {
//Would have been successful,
//but can't access private variable
skillet.toString();
} catch( e ) {
console.log( e.message ); //amountOfGrease is not defined
}

You can execute and modify the above code from jsFiddle.

Inside of the Revealing Module Pattern we declared a pub object that will keep all the public properties and methods that you want to be exposed. At the end of the Module the pub object is returned, which is what causes the them to be public outside of the Self-Executing Anonymous Function. All the parts that were not added to the pub object remain private to the outside world; however, the public methods that were exposed have access to the private parts because of the Closure that was created.

Pros
•Allows for public and private properties and methods
•The Technique is Easy to understand
•The code inside doesn’t use the Object Literal notation

Cons
•Doesn’t allow for a way to add private properties to be used in new public methods

Self-Executing Anonymous Function: Part 2 (Public & Private)
We looked at the Self-Executing Anonymous Function earlier as a pattern you could use to keep all your code private. As it turns out, you can actually modify the pattern somewhat so that you can achieve the same benefits of the Revealing Module Pattern. Not only can we achieve public and private properties and methods, but we can also provide an easy way to extend the namespace while keeping the content protected from the global namespace. In addition, the following pattern can protect the $ from conflicting with other JavaScript libraries and also protect undefined from being redefined.

Take a look at the following example, and we will walk through the code explaining the key changes to the pattern.
//Self-Executing Anonymous Func: Part 2 (Public & Private)
(function( skillet, $, undefined ) {
//Private Property
var isHot = true;

//Public Property
skillet.ingredient = "Bacon Strips";

//Public Method
skillet.fry = function() {
var oliveOil;

addItem( "\t\n Butter \n\t" );
addItem( oliveOil );
console.log( "Frying " + skillet.ingredient );
};

//Private Method
function addItem( item ) {
if ( item !== undefined ) {
console.log( "Adding " + $.trim(item) );
}
}
}( window.skillet = window.skillet || {}, jQuery ));

//Public Properties
console.log( skillet.ingredient ); //Bacon Strips

//Public Methods
skillet.fry(); //Adding Butter & Fraying Bacon Strips

//Adding a Public Property
skillet.quantity = "12";
console.log( skillet.quantity ); //12

//Adding New Functionality to the Skillet
(function( skillet, $, undefined ) {
//Private Property
var amountOfGrease = "1 Cup";

//Public Method
skillet.toString = function() {
console.log( skillet.quantity + " " +
skillet.ingredient + " & " +
amountOfGrease + " of Grease" );
console.log( isHot ? "Hot" : "Cold" );
};
}( window.skillet = window.skillet || {}, jQuery ));

try {
//12 Bacon Strips & 1 Cup of Grease
skillet.toString(); //Throws Exception
} catch( e ) {
console.log( e.message ); //isHot is not defined
}

You can execute and modify the above code from jsFiddle.

First, since we have a Self-Executing Anonymous Function, we can actually provide some parameters to pass to it when it executes. In this case we are passing 2 arguments to the anonymous function.

The first argument looks quite strange. What is window.skillet = window.skillet || {} doing? The code checks to see if skillet exists in the global namespace (window). If it does not exist, then window.skillet is assigned an empty object literal. Using this approach we can build a library across JavaScript files. If another script uses the same technique, then it will pass in the existing instance and append logic to it. Inside the Anonymous Function, if we want something to be public, then we append it to the skillet object. Any other properties or methods will be considered private.

The second argument passed in jQuery. The benefit of this is that the named parameter is referenced as $, which allows us to refer to jQuery as $ within the Anonymous Function without having to worry that it will conflict with the $ declared in other JavaScript libraries. This is a common practice that you will most likely run across when looking at well written jQuery code.

You might notice a 3rd parameter to the Anonymous Function called undefined. Why did we add that parameter and why didn’t we pass an argument to it? In JavaScript, you can unfortunately redefine what undefined means. Imagine that some code somewhere deep in one of your 3rd party libraries redefines undefined to something strange like true. If anywhere in your code you test against undefined, then you code will most likely not behave like you intended. In JavaScript, if you have a parameter that doesn’t have a matching argument, then it’s value is set as undefined. By using this trick, it can save us from the bad code someone wrote to redefine undefined.

Pros
•Gives you the ability to have public and private properties and methods
•The code inside doesn’t use the Object Literal notation
•Keeps undefined’s value as undefined in case someone overrode the value
•Allows you to use $ inside your code without worrying about clashing with other libraries
•Allows your library to grow across files using the “window.namespace = window.namespace || {}” technique
•A common pattern that you will see in many libraries, widgets, and plugins

Cons
•Slightly more complicated pattern, but not overly difficult to understand
If you are interested in digging deeper into some of the patterns I mentioned above then I encourage you to check out Klaus Komenda’s post entitled JavaScript Programming Patterns. The article provides an insightful view of how JavaScript patterns have changed over the years.

Best Practice
In JavaScript it is very important to declare all of your variables and it is considered a best practice to limit the number of objects in the global namespace. Depending on the situation you are developing you may need one or more of the concepts listed above to organize your code and keep it from colliding with other libraries. There is no pattern that is a Silver Bullet, but rather you should assess where you are at and examine the pros and cons of each pattern to address your situation.

2. Not Declaring Arrays & Objects Correctly
JavaScript is a prototypal language and not a classical language like C#. Although JavaScript does have a new operator which looks a lot like C#, you should not use it for creating new objects or arrays.

What Not To Do
You might be tempted to use the new keyword to create your objects in JavaScript.

//Bad way to declare objects and arrays
var person = new Object(),
keys = new Array();

The new keyword was added to the language partially to appease classical languages, but in reality it tends to confuse developers more than help them. Instead, there are native ways in JavaScript to declare objects and arrays and you should use those instead.

Preferred Way
Instead of using the previous syntax, you should instead declare your objects and arrays using their literal notation.

//Preferred way to declare objects and arrays
var person = {},
keys = [];

Using this pattern you actually have a lot of expressive power using Object Literals and array initializer syntax. As we mentioned earlier, you should be careful to know that Object Literals are not JSON.
//Preferred way to declare complex objects and arrays
var person = {
firstName: "Elijah",
lastName: "Manor",
sayFullName: function() {
console.log( this.firstName + " " +
this.lastName );
}
},
keys = ["123", "676", "242", "4e3"];


Best Practice
You should declare all of your variables using their literal notation instead of using the new operation. In a similar fashion you shouldn’t use the new keyword for Boolean, Number, String, or Function. All they do is add additional bloat and slow things down.

The only real reason why you would use the new keyword is if you are creating a new object and you want it to use a constructor that you defined. For more information on this topic you can check out Douglas Crockford's post entitled JavaScript, We Hardly new Ya.

Conclusion
This article has covered some of the common issues you might address coming from C# to JavaScript. I have committed just about everyone of the above Bad Practices myself when I starting to code in JavaScript heavily.

With the recent serge of jQuery hitting the market and other great JavaScript libraries, it is important that we as developers understand the language instead of just thinking that our previous language knowledge will get us by.

This article only covers some of the things you should know. The next post in the series will be published next week. In the meantime please feel free to add comments of any other gotchas that you’ve experience coming from a C# background. I would love to hear about it. Sphere: Related Content

2/7/10

7 Personality Types of Developers Today

Developers and programmers are meticulous individuals, and developers sometimes stand out even among themselves.

We introduced you to 7 types of designers in our article 7 Personality Types of Designers Today. Developers have peculiar traits and habits of their own. This article looks at 7 types of developers today and their defining characteristics.

“The best programmers are not marginally better than merely good ones. They are an order of magnitude better, measured by whatever standard: conceptual creativity, speed, ingenuity of design or problem-solving ability.”
—Randall E. Stross

Stereotyping is generally not good practice. But we’re not trying to squeeze individuals into categories. Rather, delineating these types can help you figure out where you stand and help you understand others.

1. The Self-Help Constructor
The self-help constructor does whatever it takes to get the job done with his experience and skill, no matter how limited.

For example, he may accomplish the job by finding open-source software and other free applications and tools. His best assets are his willingness to learn what he needs to complete the job and his ability to absorb the information like a sponge. He is resourceful, working with whatever is available to him.

Not every client will be impressed. Those who don’t know any better will praise his work, but the self-help constructor does not develop applications or plug-ins himself.

He merely exploits existing tools to construct something seemingly new for clients. With the wide range of sophisticated tools available today, this is becoming easier, but much less impressive.

2. The Experienced Old Man
He may not be the hippest guy in this energetic and creative field, but the experienced old man brings something valuable to the table: a wealth of knowledge and experience.

He may appear outdated, unable to keep up with the latest tools and technology, but he is wise and knows the basics like the back of his hand.

His battle stories of bygone days will fascinate and thrill. He may not be the fastest or most technologically savvy, but slow and steady wins the race, and he delivers the goods as he always has.

He proves that the old-school style of coding may be antique but isn’t extinct. He may not be your heaviest hitter, but in times of great need, you know you can count on the experienced old man to deliver.

3. The Hardcore Geek
Workaholic doesn’t begin to describe the hardcore geek, this martyr of developers. He goes beyond the call of duty to deliver the product and takes great pride in his work.

He spends his lunch hour at his desk working frantically to finish the project ahead of time. When he allows himself a little free time, he reads books, journal articles and the like to improve himself. Very much an introvert, he feels most comfortable in the world of code and programming jargon.

The more code the hardcore geek writes, the more content he feels. As great as he is with code, he makes for a much better worker bee than a leader.

4. The Scholarly Know-It-All
The scholarly know-it-all is a walking encyclopedia on programming. He can spend hours passionately discussing the history of a programming language or dissecting imperfect code.

He is the poet of the programming world, whose code is a work of art that can be appreciated and analyzed. Recursion is his middle name, and he tweaks every block of code to perfection, regardless of timelines or readability.

He sets high standards for himself, and his work sometimes complicates matters: a task that should take only an hour to complete takes him a few months. Mind you, he’s not incompetent. On the contrary, he is highly capable; but he makes work for himself by creating new tools and libraries and even reconstructing entirely new systems, all to meet his own standards.

He feels obliged to impart his knowledge to others and share his passion for the theory and technical intricacies of coding and programming. He tries his best to explain to clients why using state-of-the-art technology is so important. Every project is his precious child.

The scholarly know-it-all is great to have on your team, but be sure you can get him to spend his energy on the important details, rather than waste time satisfying his urge to delve into every nook and cranny.

5. The Ninja
The ninja is a man of few words and keeps to himself. While similar to the hardcore geek, he has more in his life than code and work.

He is an enigma: not outright friendly or forthcoming, but he works surprisingly well on a team. Everyone notices his tireless nature but can’t figure out how he does everything so well and so quickly. There is much evidence of his work but little evidence that he did it. “Show don’t tell” describes his modus operandi best.

Never outwardly frazzled (try as you might to throw him off), he resolves problems quickly and efficiently, regardless of time or place. The ninja’s stealth sends chills down your spine, and he leaves you wondering how he managed to accomplish his feat.

A lone ranger, he gets the job done regardless of his status on the team or his relationship with other members. His motto? Don’t have doubts; just resolve the problem quickly and efficiently. This no-nonsense attitude makes him an absolute joy to work with.

6. The Clever Ambassador
The clever ambassador is the face of the team. He is outspoken and the unofficial project manager. His knowledge of software development, project workflows and code theory is adequate, but he does very little of the actual programming or work.

He is quick to pick up leads and great at communicating with clients. He is the consummate ring-master, able to please both clients (the ferocious lions) and team members (the elephants that could easily trample him if they wanted).

In his supervisory role, the clever ambassador ensures that every project meets the requirements and satisfies the client. He is the go-between, representing the development team for the client and balancing client satisfaction with practicality.

Having to walk this tight rope, he often feels that he should be better compensated, despite never doing any heavy lifting (i.e. coding). He is the model who sits pretty in front of the camera selling the product, while the rest of the team (make-up artists, hair stylists, etc.) works behind the scenes, receiving lower payment for what amounts to the same work.

7. The Half-Cup Speedster
The half-cup speedster takes on multiple projects at once. He works much faster than most, but his amazing quantity is tarnished by its quality: his speed results from cutting corners and hacking core.

He feels that optimizing and checking code takes too long. His code is messy because he does not follow best practices and never makes use of object-oriented programming (OOP).

Amazingly, despite his code looking like a minefield, the product works just as intended. Cutting corners is generally not good practice, but in an impossible crunch, the half-cup speedster might be the person for the job.

Unfortunately, much like the handwriting of physicians, his code is practically indecipherable. Should someone need to fix a problem that surfaces later, they will surely encounter difficulties. You can’t fix what you can’t read or understand.


Written exclusively for Webdesigner Depot by Aidan Huang, a freelance front-end developer and designer. He is also an editor at Onextrapixel.

As we’ve seen, there are many types of developers in the field. Which do you most closely resemble? Have you met anyone who fits any of the categories mentioned here? Share your thoughts with us in the comments below… Sphere: Related Content