5/6/10

Streaming API for XML (StAX)

As reported on InternetNews.com, BEA Systems, after two years of work on projects to optimize XML file processing, has released the first version of StAX (Streaming API for XML).

This API aims to solve the problems DOM and SAX pose for XML processing by providing pull-based access, which lets us parse only the part of the XML document we are interested in, without having to build complex tree structures of the document or parse it in its entirety.

You can find much more information on the StAX website.
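
As an illustration of the pull model, here is a minimal sketch using the StAX cursor API (javax.xml.stream); the file name and the title element are assumptions for the example, not part of the announcement:

    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class StaxTitles {
        public static void main(String[] args) throws Exception {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            try (FileInputStream in = new FileInputStream("books.xml")) {
                XMLStreamReader reader = factory.createXMLStreamReader(in);
                // The application pulls events one at a time, so it can
                // skip everything except the elements it cares about.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        System.out.println(reader.getElementText());
                    }
                }
                reader.close();
            }
        }
    }

Unlike SAX, nothing happens until the application asks for the next event; unlike DOM, no tree is ever built.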

XMLBeans from BEA: manipulating XML from Java

BEA has donated the XMLBeans code to Apache. The source code is already available in the CVS repository.
XMLBeans processes an XSD (XML Schema) to generate Java code that lets you navigate and manipulate XML while respecting the constraints imposed by that particular XSD.
More information: Getting Started with XMLBeans
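
To give a feel for the generated code, here is a minimal sketch assuming a schema that declares a top-level customer element with a name child; the package and accessor names (com.example.customer, getCustomer, getName, setName) are hypothetical, standing in for whatever XMLBeans' scomp tool would generate from your schema:

    import java.io.File;
    // Hypothetical classes generated by scomp from the schema.
    import com.example.customer.CustomerDocument;
    import com.example.customer.CustomerDocument.Customer;

    public class XmlBeansRoundTrip {
        public static void main(String[] args) throws Exception {
            // Parse an instance document into the strongly typed model.
            CustomerDocument doc =
                    CustomerDocument.Factory.parse(new File("customer.xml"));
            Customer customer = doc.getCustomer();

            // Read and modify values through schema-aware accessors.
            System.out.println(customer.getName());
            customer.setName("New name");

            // validate() re-checks the tree against the XSD's constraints.
            if (!doc.validate()) {
                System.err.println("Document no longer conforms to the schema");
            }
            doc.save(new File("customer-out.xml"));
        }
    }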

Extensive documentation on XMLBeans
With XMLBeans 2.0 just around the corner, BEA Systems has published four tutorials on object/XML mapping with its open source framework on its dev2dev developer portal.

The articles are:
XML Processing with Java Object Technology, by Scott Ryan.
Strongly Typed XML in Java with XMLBeans, by Cezar Cristian Andrei.
Leveraging Complex Schema Features in Java the XMLBeans Way, by Raj Alagumalai and Raju Subramanian.
Using XMLBeans in Web Service Clients and User Interfaces, by Steve Hanson.

For everyone who uses this superb library, they will be very useful.

Related links: The Server Side

W3C publishes the XSLT, XML Query, and XPath 2.0 specifications
The W3C has published "release candidate" versions of the XSLT, XML Query, and XPath 2.0 specifications. These specifications, which represent a major change to XSLT, XML Query, and XPath, are now mature enough for implementations to begin.

Here are links to the specifications:
XSLT and XQuery:
* XSL Transformations (XSLT) Version 2.0
* XSLT 2.0 and XQuery 1.0 Serialization
* XML Syntax for XQuery 1.0 (XQueryX)
XQuery and XPath:
* XQuery 1.0: An XML Query Language
* XML Path Language (XPath) 2.0
* XQuery 1.0 and XPath 2.0 Data Model (XDM)
* XQuery 1.0 and XPath 2.0 Functions and Operators
* XQuery 1.0 and XPath 2.0 Formal Semantics

Related links: W3C - All Standards and Drafts

Extensive XMLBeans tutorial

XMLBeans is an open source XML data binding library, donated by BEA to the Apache Software Foundation a few months ago.

It works much like JAXB: essentially, it lets us transform complete hierarchies of Java objects to XML and vice versa, using the schema or DTD of the XML document as the mapping guide.

The power of these frameworks is impressive, and they greatly speed up application development. Without proper tutorials, however, their usefulness is considerably reduced.

javaBoutique has published an extensive eight-page tutorial explaining in detail how to use XMLBeans in our applications.

It is certainly a great resource for getting started with this library. I hope you find it useful.

XML design patterns

Over the last few years, XML has gone from being an obscure technology to being part of a developer's day-to-day work. Little by little, this language has worked its way into our lives, and by now fewer and fewer projects use no XML at all, whether explicitly or implicitly through some third-party library.

This boom, and the proliferation of XML-based technologies (XQuery, XPath, XML Schema, ...), make a set of guidelines to help us in our daily work more and more necessary.

That is what XML Patterns offers: a library of design patterns that help us control the structure of our schemas and DTDs, build our XML documents correctly, and so on.

4/6/10

Mathematics Software for Linux

Mathematics Packages:

Octave
GNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with Matlab. It may also be used as a batch-oriented language.

Octave has extensive tools for solving common numerical linear algebra problems, finding the roots of nonlinear equations, integrating ordinary functions, manipulating polynomials, and integrating ordinary differential and differential-algebraic equations. It is easily extensible and customizable via user-defined functions written in Octave's own language, or using dynamically loaded modules written in C++, C, Fortran, or other languages.

R-Project
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

bc
bc is an arbitrary precision numeric processing language. Syntax is similar to C, but differs in many substantial areas. It supports interactive execution of statements. bc is a utility included in the POSIX P1003.2/D11 draft standard.

Scilab
Scilab is a scientific software package for numerical computations providing a powerful open computing environment for engineering and scientific applications. It has been developed since 1990 by researchers from INRIA and ENPC. Distributed freely via the Internet since 1994, Scilab is currently used in educational and industrial environments around the world.

Scilab includes hundreds of mathematical functions with the possibility to add interactively programs from various languages (C, Fortran...). It has sophisticated data structures (including lists, polynomials, rational functions, linear systems...), an interpreter and a high level programming language.

Yorick
Yorick is an interpreted programming language, designed for postprocessing or steering large scientific simulation codes. Smaller scientific simulations or calculations, such as the flow past an airfoil or the motion of a drumhead, can be written as standalone yorick programs. The language features a compact syntax for many common array operations, so it processes large arrays of numbers very efficiently. Unlike most interpreters, which are several hundred times slower than compiled code for number crunching, yorick can come within a factor of four or five of compiled speed for many common tasks. Superficially, yorick code resembles C code, but yorick variables are never explicitly declared and have a dynamic scoping similar to many Lisp dialects. The yorick language is designed to be typed interactively at a keyboard, as well as stored in files for later use. Yorick includes an interactive graphics package, and a binary file package capable of translating to and from the raw numeric formats of all modern computers.

Algae
Algae is an interpreted language for numerical analysis. Algae was developed because we needed a fast and versatile tool, capable of handling large problems. Algae has been applied to interesting dynamics problems in aerospace and related fields for more than a decade.

Yacas
YACAS is an easy to use, general purpose Computer Algebra System, a program for symbolic manipulation of mathematical expressions. It uses its own programming language designed for symbolic as well as arbitrary-precision numerical computations. The system has a library of scripts that implement many of the symbolic algebra operations; new algorithms can be easily added to the library. YACAS comes with extensive documentation (320+ pages) covering the scripting language, the functionality that is already implemented in the system, and the algorithms we used.

Rlab
Rlab is an interactive, interpreted scientific programming environment. Rlab is a very high level language intended to provide fast prototyping and program development, as well as easy data visualization and processing. Rlab is not a clone of languages such as those used by tools like Matlab or Matrix-X/Xmath. However, as Rlab focuses on creating a good experimental environment (or laboratory) in which to do matrix math, it can be called ``Matlab-like,'' since the programming language possesses similar operators and concepts.

Euler
EULER is a program for quickly and interactively computing with real and complex numbers and matrices, or with intervals, in the style of MatLab, Octave,... It can draw and animate your functions in two and three dimensions.

Maxima
Maxima is a descendant of DOE Macsyma, which had its origins in the late 1960s at MIT. It is the only system based on that effort still publicly available and with an active user community, thanks to its open source nature. Macsyma was the first of a new breed of computer algebra systems, leading the way for programs such as Maple and Mathematica. This particular variant of Macsyma was maintained by William Schelter from 1982 until he passed away in 2001. In 1998 he obtained permission to release the source code under GPL. It was his efforts and skill which have made the survival of Maxima possible, and we are very grateful to him for volunteering his time and skill to keep the original Macsyma code alive and well.

Since his passing, a group of users and developers has formed to keep Maxima alive and kicking. Maxima itself is reasonably feature complete at this stage, with abilities such as symbolic integration, 3D plotting, and an ODE solver, but there is a lot of work yet to be done in terms of bug fixing, cleanup, and documentation. This is not to say there will be no new features, but there is much work to be done before that stage will be reached, and for now new features are not likely to be our focus.

JACAL
JACAL is an interactive symbolic mathematics program. JACAL can manipulate and simplify equations, scalars, vectors, and matrices of single and multiple valued algebraic expressions containing numbers, variables, radicals, and algebraic, differential, and holonomic functions.


gTybalt
Symbolic calculations, carried out by computer algebra systems, have become an integral part of the daily work of scientists. Advances in algorithms and computer technology have led to remarkable progress in several areas of the natural sciences. gTybalt was developed as a tool for a certain kind of calculation, with three characteristics. First, these tend to be "long" calculations: the system needs to process large amounts of data, and performance is a priority. Second, the algorithms for the solution of the problem are usually developed and implemented by the scientists themselves. This requires that the computer algebra system support a programming language in which complex algorithms for abstract mathematical entities can be implemented; in other words, it requires support for object-oriented programming techniques. On the other hand, these calculations usually do not require the computer algebra system to provide sophisticated tools for all branches of mathematics. Third, even though these calculations process large amounts of data, the time needed to implement the algorithms usually outweighs the actual running time of the program, so convenient development tools are also important.

Symaxx
Symaxx/2 is a graphical frontend for Maxima.

Singular
SINGULAR is a Computer Algebra System for polynomial computations with special emphasis on the needs of commutative algebra, algebraic geometry, and singularity theory.

HartMath
HartMath is an experimental computer algebra system written in Java.

GiNaC
The name GiNaC is an iterated and recursive abbreviation for GiNaC is Not a CAS, where CAS stands for Computer Algebra System. It has been developed to become a replacement engine for xloops, which up to now is powered by the Maple CAS. Its design is revolutionary in the sense that, contrary to other CASs, it does not try to provide extensive algebraic capabilities and a simple programming language, but instead accepts a given language (C++) and extends it by a set of algebraic capabilities.

XLoops
The aim of this project is to provide a package that completely evaluates massive one- and two-loop Feynman diagrams, to make calculations in high-energy physics easier.

PARI-GP
PARI-GP is a software package for computer-aided number theory. It consists of a C library, libpari (with optional assembler cores for some popular architectures), and of the programmable interactive gp calculator. While you can write your own libpari-based programs, many people just start up a gp session, or have gp execute their scripts.

GRASS
GRASS GIS (Geographic Resources Analysis Support System) is an open source, Free Software Geographical Information System (GIS) with raster, topological vector, image processing, and graphics production functionality that operates on various platforms through a graphical user interface and shell in X-Windows. It is released under GNU General Public License (GPL).

Macaulay 2
Macaulay 2 is a software system devoted to supporting research in algebraic geometry and commutative algebra, whose development has been funded by the National Science Foundation.

NumExp
NumExp is a family of open-source applications for numeric computation. When it was created, the idea was to make a powerful tool like Mathematica. Now we know this is almost impossible without more open-source hackers. Meanwhile, we are trying to make, at least, a useful tool!

GtkGraph
GtkGraph is a simple graphing calculator written for X Windows using the Gtk+ widget set. It is intended as a replacement for a standalone graphing calculator, which typically costs over $80 USD, and has a tiny monochrome display driven by a CPU running at around 6 MHz with no FPU. GtkGraph can plot functions and solve arithmetic expressions using double precision arithmetic.

surf
surf is a tool to visualize some real algebraic geometry: plane algebraic curves, algebraic surfaces and hyperplane sections of surfaces. surf is script driven and has (optionally) a nifty GUI using the Gtk widget set.

The E Equational Theorem Prover
E is a purely equational theorem prover for clausal logic. That means it is a program that you can stuff a mathematical specification (in clausal logic with equality) and a hypothesis into, and which will then run forever, using up all of your machine's resources. Very occasionally it will find a proof for the hypothesis and tell you so ;-).

TISEAN
TISEAN is a free software project for the analysis of time series with methods based on the theory of nonlinear deterministic dynamical systems, or chaos theory, if you prefer.

Plotting Software
Gnuplot
gnuplot is a command-driven interactive function plotting program. It can be used to plot functions and data points in both two- and three-dimensional plots in many different formats, and will accommodate many of the needs of today's scientists for graphic data representation. gnuplot is copyrighted, but freely distributable; you don't have to pay for it.

NCAR
The NCAR Command Language (NCL) is a programming language designed specifically for the access, analysis, and visualization of data. NCL can be run in interactive mode, where each line is interpreted as it is entered at your workstation, or it can be run in batch mode as an interpreter of complete scripts.

Gri
Gri is a language for scientific graphics programming. The word "language" is important: Gri is command-driven, not point/click. Some users consider Gri similar to LaTeX, since both provide extensive power as a reward for tolerating a learning curve. Gri can make x-y graphs, contour graphs, and image graphs, in PostScript and (someday) SVG formats. Control is provided over all aspects of drawing, e.g. line widths, colors, and fonts. A TeX-like syntax provides common mathematical symbols.

PLplot
PLplot is a library of functions that are useful for making scientific plots. PLplot can be used from within compiled languages such as C, C++, FORTRAN and Java, and interactively from interpreted languages such as Octave, Python, Perl and Tcl. The PLplot library can be used to create standard x-y plots, semilog plots, log-log plots, contour plots, 3D surface plots, mesh plots, bar charts and pie charts. Multiple graphs (of the same or different sizes) may be placed on a single page with multiple lines in each graph.

PGPLOT
The PGPLOT Graphics Subroutine Library is a Fortran- or C-callable, device-independent graphics package for making simple scientific graphs. It is intended for making graphical images of publication quality with minimum effort on the part of the user. For most applications, the program can be device-independent, and the output can be directed to the appropriate device at run time.

plotutils
The GNU plotutils package contains software for both programmers and technical users. Its centerpiece is libplot, a powerful C/C++ function library for exporting 2-D vector graphics in many file formats, both vector and raster. It can also do vector graphics animations.

SciGraphica
SciGraphica is a scientific application for data analysis and technical graphics. It aims to be a clone of the popular commercial (and expensive) application "Microcal Origin". It fully supports plotting features for 2D, 3D and polar charts. The aim is to obtain a fully-featured, cross-platform, user-friendly, self-growing scientific application. It is free and open-source, released under the GPL license.

Grace
Grace is a WYSIWYG 2D plotting tool for the X Window System and M*tif.

Ptplot
Ptplot 5.2 is a 2D data plotter and histogram tool implemented in Java. Ptplot can be used as a standalone applet or application, or it can be embedded in your own applet or application.
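
Since Ptplot is an ordinary Java library, embedding a plot takes only a few lines. Here is a minimal sketch, assuming Ptplot's ptolemy.plot.Plot class is on the classpath; the plotted function is just an example:

    import javax.swing.JFrame;
    import ptolemy.plot.Plot;

    public class PtplotDemo {
        public static void main(String[] args) {
            Plot plot = new Plot();
            plot.setTitle("y = sin(x)");
            // Dataset 0; "true" connects successive points with lines.
            for (double x = 0; x <= 2 * Math.PI; x += 0.1) {
                plot.addPoint(0, x, Math.sin(x), true);
            }
            // Plot is a Swing component, so it drops into any container.
            JFrame frame = new JFrame("Ptplot example");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.getContentPane().add(plot);
            frame.pack();
            frame.setVisible(true);
        }
    }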

DISLIN
DISLIN is a high-level plotting library for displaying data as curves, polar plots, bar graphs, pie charts, 3D-color plots, surfaces, contours and maps.

ImLib3D
ImLib3D is an open source C++ library for 3D (volumetric) image processing. It contains most basic image processing algorithms, and some more sophisticated ones. It comes with an optional viewer that features multiplanar views, animations, vector field views and 3D (OpenGL) multiplanar. All image processing operators can be interactively called from the viewer as well as from the UNIX command-line. ImLib3D's goal is to provide a standard and easy to use platform for volumetric image processing research. Focus has been put on simplicity for the developer. ImLib3D has been carefully designed, using modern, standards conforming C++. It intensively uses the Standard C++ Library, including strings, containers, and iterators.

GLgraph
GLgraph visualizes mathematical functions. It can handle three unknowns (x, z, t) and can display a 4D function with three space dimensions and one time dimension.

MayaVi
MayaVi is a free, easy to use scientific data visualizer. It is written in Python and uses the amazing Visualization Toolkit (VTK) for the graphics. It provides a GUI written using Tkinter. MayaVi is free and distributed under the conditions of the BSD license. It is also cross platform and should run on any platform where both Python and VTK are available (which is almost any *nix, Mac OSX or Windows).

Graphviz
Graph Drawing Programs from AT&T Research and Lucent Bell Labs

Numerical Libraries
GNU Scientific Library
The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. It is free software under the GNU General Public License.

The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total.

SAML
The "Simple Algebraic Math Library" is a C library for computer algebra, together with some application programs: a desktop calculator, a spreadsheet (sort of) and a program to factorize integers.

Numerical Python
Numerical Python adds a fast, compact, multidimensional array language facility to Python.

VTK
The Visualization ToolKit (VTK) is an open source, freely available software system for 3D computer graphics, image processing, and visualization used by thousands of researchers and developers around the world. VTK consists of a C++ class library, and several interpreted interface layers including Tcl/Tk, Java, and Python. VTK supports a wide variety of visualization algorithms including scalar, vector, tensor, texture, and volumetric methods; and advanced modeling techniques such as implicit modelling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation. In addition, dozens of imaging algorithms have been directly integrated to allow the user to mix 2D imaging / 3D graphics algorithms and data. The design and implementation of the library has been strongly influenced by object-oriented principles.

PDL
PDL (``Perl Data Language'') gives standard Perl the ability to compactly store and speedily manipulate the large N-dimensional data arrays which are the bread and butter of scientific computing.

LAPACK
LAPACK is written in Fortran77 and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices, in both single and double precision.

Python Number Crunching
This page lists a number of packages related to numerics, number crunching, signal processing, financial modeling, linear programming, statistics, data structures, date-time processing, random number generation, and crypto.

LINPACK
LINPACK is a collection of Fortran subroutines that analyze and solve linear equations and linear least-squares problems. The package solves linear systems whose matrices are general, banded, symmetric indefinite, symmetric positive definite, triangular, and tridiagonal square. In addition, the package computes the QR and singular value decompositions of rectangular matrices and applies them to least-squares problems. LINPACK uses column-oriented algorithms to increase efficiency by preserving locality of reference.

LINPACK was designed for supercomputers in use in the 1970s and early 1980s. LINPACK has been largely superseded by LAPACK, which has been designed to run efficiently on shared-memory, vector supercomputers.

ATLAS
ATLAS stands for Automatically Tuned Linear Algebra Software. ATLAS is both a research project and a software package. This FAQ describes the software package. ATLAS's purpose is to provide portably optimal linear algebra software. The current version provides a complete BLAS API (for both C and Fortran77), and a very small subset of the LAPACK API. For all supported operations, ATLAS achieves performance on par with machine-specific tuned libraries.

CLN
CLN is a library for computations with all kinds of numbers. It has a rich set of number classes... [see web page]

Colt
This distribution provides an infrastructure for scalable scientific and technical computing in Java. It is particularly useful in the domain of High Energy Physics at CERN: It contains, among others, efficient and usable data structures and algorithms for Off-line and On-line Data Analysis, Linear Algebra, Multi-dimensional arrays, Statistics, Histogramming, Monte Carlo Simulation, Parallel & Concurrent Programming. It summons some of the best concepts, designs and implementations thought up over time by the community, ports or improves them and introduces new approaches where need arises. In overlapping areas, it is competitive or superior to toolkits such as STL, Root, HTL, CLHEP, TNT, GSL, C-RAND / WIN-RAND, (all C/C++) as well as IBM Array, JDK 1.2 Collections framework, JGL (all Java), in terms of performance (!), functionality and (re)usability.
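
As a quick taste of the library, here is a minimal sketch using Colt's linear algebra API (cern.colt.matrix); the matrix values are arbitrary:

    import cern.colt.matrix.DoubleFactory2D;
    import cern.colt.matrix.DoubleMatrix2D;
    import cern.colt.matrix.linalg.Algebra;

    public class ColtDemo {
        public static void main(String[] args) {
            // Build a dense 2x2 matrix and square it.
            DoubleMatrix2D a = DoubleFactory2D.dense.make(new double[][] {
                {1.0, 2.0},
                {3.0, 4.0}
            });
            DoubleMatrix2D squared = new Algebra().mult(a, a);
            System.out.println(squared);
        }
    }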

Programming Languages
Lush
Lush is an object-oriented programming language designed for researchers, experimenters, and engineers interested in large-scale numerical and graphic applications. Lush is designed to be used in situations where one would want to combine the flexibility of a high-level, loosely-typed interpreted language, with the efficiency of a strongly-typed, natively-compiled language, and with the easy integration of code written in C, C++, or other languages.

Nickle
Nickle is a programming language based prototyping environment with powerful programming and scripting capabilities. Nickle supports a variety of datatypes, especially arbitrary precision numbers. The programming language vaguely resembles C. Some things in C which do not translate easily are different, some design choices have been made differently, and a very few features are simply missing.

Nickle provides the functionality of UNIX bc, dc and expr in much-improved form. It is also an ideal environment for prototyping complex algorithms. Nickle's scripting capabilities make it a nice replacement for spreadsheets in some applications, and its numeric features nicely complement the limited numeric functionality of text-oriented languages such as AWK and PERL.

Open Dynamics Engine
ODE is a free, industrial quality library for simulating articulated rigid body dynamics - for example ground vehicles, legged creatures, and moving objects in VR environments. It is fast, flexible, robust and platform independent, with advanced joints, contact with friction, and built-in collision detection.

Blitz++
Blitz++ is a C++ class library for scientific computing which provides performance on par with Fortran 77/90. It uses template techniques to achieve high performance. The current versions provide dense arrays and vectors, random number generators, and small vectors and matrices.

FFTW
FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine and sine transforms, the DCT and DST). We believe that FFTW, which is free software, should become the FFT library of choice for most applications.

Our benchmarks, performed on a variety of platforms, show that FFTW's performance is typically superior to that of other publicly available FFT software, and is even competitive with vendor-tuned codes. In contrast to vendor-tuned codes, however, FFTW's performance is portable: the same program will perform well on most architectures without modification. Hence the name, "FFTW," which stands for the somewhat whimsical title of "Fastest Fourier Transform in the West."

GMP
GNU MP is a library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers. It has a rich set of functions, and the functions have a regular interface.

NURBS++
Non-Uniform Rational B-Splines (NURBS) curves and surfaces are parametric functions which can represent any type of curve or surface. This C++ library hides the basic mathematics of NURBS, allowing the user to focus on the more challenging parts of their projects. The library also offers a lot of features to help generate NURBS from data points.

SciPy
SciPy is an open source library of scientific tools for Python. SciPy supplements the popular Numeric module, gathering a variety of high level science and engineering modules together as a single package.

SciPy includes modules for graphics and plotting, optimization, integration, special functions, signal and image processing, genetic algorithms, ODE solvers, and others.

Sites of Interest
The Linux Lab Project
The Linux Lab Project is intended to help people with the development of data collection and process control software for Linux. It should be understood as a software and knowledge pool for interested people and application developers dealing with this stuff in educational or industrial environments.

Programming Systems on GNU/Linux
This page deals with links to tutorials, documents, and Linux implementations for installing Linux on a PC, getting started with Linux, and then going a step further -- optimising your PC for processing power, using multiple processors (Symmetric Multi-Processing - SMP), making a cheap, upgradeable, supercomputing Linux cluster, and finally links to software to do parallel programming on Linux.

Scientific Applications on Linux
SAL (Scientific Applications on Linux) is a collection of information and links to software that will be of interest to scientists and engineers. The broad coverage of Linux applications will also benefit the whole Linux/Unix community. There are currently 3,070 entries in SAL.

Netlib
Netlib is a collection of mathematical software, papers, and databases.

FSF Free Software Directory - Mathematics
[Collection of GPL'd and other Free Software]

Conclusions on Parallel Computing

By Asaf Shelly (21 posts) on April 9, 2010 at 11:10 am

We have been dealing with parallel computing for quite a while now. Some of the ideas we had at the start proved to be wrong, while others will only become relevant in the near future. No doubt about it: parallel computing was pushed and forced into the mainstream of computing just as Object Oriented was in the previous millennium.

Some History: Hardware

The first to deal with parallel computing were hardware developers, because the hardware supports multiple devices working at the same time, with different operation rates and response times. Hardware design is also Event Driven, because devices work independently and issue an Interrupt event when required. The computer hardware we know today is fully parallel; however, it is centralized, with a single CPU (Central Processing Unit) and multiple peripheral devices.

Some History: Kernel

The next to support parallel computing was the software infrastructure, which in modern operating systems is the Kernel. The Kernel must support multiple events arriving in the form of Hardware Interrupts and propagated upwards as Software Events. Kernels are commonly distributed in design, since several Drivers can communicate with each other. The centralized object in the system allows communication between the drivers and supports synchronization, but it is not supposed to contribute to the application's business logic in any form or way.

Some History: Network

UNIX is based on services. A Service is a way to call a function over a network. Network technologies required a distributed design in which every element is completely parallel to the next and there is no single 'processor unit' acting as the system's master. UNIX took this to the next level with technologies such as services, pipes, sockets, mailslots, Fork and more. At a time when programming was tedious work, developing an operating system to support Fork meant extensive effort. Still, UNIX had built-in support for that mechanism, which solves so many problems... Only we forgot how to use it, and I don't remember seeing a new system design that had Fork in it.

Some History: Applications

When I had just started with C programming and had just found out about threads, I tried doing things in parallel just to see how it would work. The result was, as you can imagine, far worse: the application ran much slower, there were "random bugs", and the code looked terrible. The explanation I got was that there is only one CPU and the different threads compete over it. With no Multi-Core CPU, there was no ROI (return on investment) for using multiple threads and the large effort required for a parallel design. The only reason to use a thread was when you really had to, for example when there was a need to wait for hardware or a network buffer.

Parallel Computing Today

A few years ago CPUs hit a hardware limitation beyond which they would have required special cooling. At that point the race to shrink silicon features and increase clock frequency ended. Instead of spending massive amounts of silicon on advanced algorithms to improve instruction prefetch, smaller and simpler CPUs are used, leaving room for more CPUs on the same die. We got the Multi-Core CPU, which in practice means several CPUs in the same computer.

At first, the cores of a Multi-Core CPU were simpler than the old single core. These cores also operated at a much lower frequency, which meant that an application designed for single-task operation took a massive performance hit when moving to a new computer, for the first time ever.

Parallel Computing has become mainstream. We started with a long series of lectures about parallel computing. It seemed that people wanted to know about this subject, but there was so much overhead that Parallel Computing simply scared people away. There is a huge ramp-up before you can be a good parallel programmer, just as there is for object-oriented programming. This meant that team leaders and architects were at the same level as beginner programmers, or perhaps had a very slight advantage. Add to this the fact that there are massive amounts of code already written for single-core CPUs, and real advantages can only be achieved after at least some rewriting. The last but most important reason to reject parallel computing was that it is easier and cheaper to buy another machine than to make the best of the CPU cores. This was actually a boost for Cloud Computing.

Who is doing Parallel Computing

There are several types of parallel computing. The hardware is parallel, so the Kernel is parallel. With this type of parallelism every worker is doing something else, and workers own their resources instead of sharing them. For a long while now, DSP (Digital Signal Processing) chips have been Multi-Core CPUs, so that the algorithms executed on these chips can run faster. Algorithms and DSP chips are evaluated in MIPS, the number of instructions executed per unit of time. Gaining a performance increase with an algorithm means either using fewer instructions or adding more worker CPU cores. PCs also run algorithms such as face recognition, image detection, image filtering, motion detection, and more. The transition from a single-core CPU to a Multi-Core CPU was fast and simple.

An algorithm's increase in performance is relative to the amount of computation per data item: the more computation, the more cores can be used. Image blending (fade) is an example of an algorithm which cannot benefit from more than a single core. Take an image and blend each pixel with the corresponding pixel of another image. Each pixel is read from RAM, then a simple addition and shift right are performed, and then the result is written back to RAM. The CPU can operate at a rate of 3 GHz and the RAM at 1 GHz. For each pixel in the image we: read pixel A, read pixel B, add, shift, write the result pixel. Add another core and the CPU cores will mutually block on access to the memory. This is also true for databases and database algorithms such as sort algorithms, linked lists, etc. For this reason the new Multi-Core CPUs have extensive support for parallel access to memory.
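
To make the memory-bound nature of the example concrete, here is a minimal sketch of that blend loop in Java (grayscale byte pixels and the array names are assumptions for illustration, not from the post):

    public class Blend {
        // Blend two images pixel by pixel: out = (a + b) / 2.
        static void blend(byte[] a, byte[] b, byte[] out) {
            for (int i = 0; i < out.length; i++) {
                // Read pixel A, read pixel B, add, shift right, write result.
                out[i] = (byte) (((a[i] & 0xFF) + (b[i] & 0xFF)) >> 1);
            }
        }
    }

The add and the shift cost almost nothing next to the two reads and one write per pixel, so the loop saturates memory bandwidth long before it saturates one core; a second core only adds contention on the same bus.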

Parallel Computing ROI

Parallel Computing is the new future for computers. Object Oriented is no longer the new buzzword. I keep telling people that before they make an Object Oriented Design for their systems they should make flow charts. Good OOD is based on good system flow charts, whether you write them down or do it in your head as an art.

We all used to think that User Interface is the product and OOD is the way to do it. It now looks like we were wrong:

User Experience is the product and Parallel Design is the way to do it. User Experience (UX) is not User Interface (UI). User Interface defines what the product looks like; in other words, UI defines what the product is. Object Oriented Design defines what the code looks like; in other words, OOD defines what the code is. Parallel Computing defines how the code works; in other words, Parallel Computing defines what the code does. User Experience defines how the application behaves; in other words, User Experience defines what the application does.

I am not using a C++ library because it uses linked lists. I am using that library because it can sort.

I am not buying a product because it looks the way I want it to look; for that I could buy a framed picture instead. I am buying a product because it does something I need and does not do what I do not need.

Parallel Computing is the basis for User Experience. Even if you have a single core, it is better to have a good parallel design. As customers you know this: you don't want to accidentally hit "Print" instead of "Save" and then wait out a five-second punishment for the dialog to open just so you can close it (see minute 43 of the demo video).

Today we have so many good resources and tools. Now is the time to learn how to work in parallel and produce good products with a good UX.


Comments (7)
April 14, 2010 6:55 AM PDT


Peter da Silva I was doing parallel computing on single-CPU systems back in the late '70s and early '80s, without even thinking about it. It was mainstream. It was called the "UNIX command line". The UNIX pipes-and-filters model took advantage of parallelism on a single computer by letting you exploit the parallelism inherent in the division of work between I/O and computation. A UNIX pipeline allowed programs to accumulate and buffer data as fast as the disks could provide it, so that data was available for computation as soon as the CPU-intensive components of the pipeline were ready for it. When multiple CPUs became available, this just happened automatically.

For slow and latency sensitive devices, such as tape drives, one of the earliest tools for buffering I/O was simply to run the "DD" command with a large buffer multiple times in a pipeline: "tar cvf - | dd bs=16k | dd bs=16k | dd bs=16k > /dev/rmt0h" (this was on a PDP-11, 16k was a large buffer). The output of "tar" was uneven and bursty, because it was seeking all over the disk to collect the files for the archive, but the output of the final "dd" was smooth and the tape was able to stream for many megabytes at a time.

This had nothing to do with your proposed redefinition of parallel computing as a user experience design tool; it was a more or less automatic byproduct of good factoring of the problem. It was coarse-grained and could be bottlenecked by non-streaming operations (e.g., sorts), but it was an early and effective tool. There have been similar tools created for specialized problem areas in GUI applications, such as MIDI apps that let you lay out multiple MIDI processing steps in two dimensions and hook them together by "wires", but the same kind of factoring of the problem space for GUI applications hasn't really been found.
April 14, 2010 8:34 AM PDT


Richard H. The image blending example only highlights the inherent non-parallel nature of memory-CPU bus contention. Current PCs with multi-cores aren't 100% parallel at the hardware level, i.e., the von Neumann bottleneck is still present.
Lower your expectations, or get a system that really is parallel at the bus level.
April 14, 2010 8:35 AM PDT


Yves Daoust I don't quite share the comparison of parallel computing with object oriented design. I see the latter as a small step in the art of programming, as opposed to a giant leap for the former.

Anyone can write sequential programs after a few minutes of training on any procedural language. Most people end up writing well structured programs after a few years of practice and find no difficulty switching to Object Oriented Programming.

Writing concurrent programs is of another nature. It is reserved for true experts, with a truly scientific understanding of the issues. Just think of the Dining Philosophers problem: even though the problem statement looks easy, I doubt that ordinary people can solve it correctly.

In fact, I consider that parallel programming is not within reach of ... the human brain, except in simple or symmetrical cases. As soon as there are two or three asynchronous agents, you lose control :)
April 14, 2010 1:44 PM PDT


Thierry Joubert It is true that we see nowadays about as many conferences on Parallel Programming as we saw on OOP during the early 90's. From time to time, the big actors have to convince the masses. Today, with Java and .NET, OOP has become the standard (try to give a C/C++ course to students if you are in any doubt about this). The OOP "push" came from the software industry, whose motivation was to provide efficient programming interfaces for programmable products like GUIs, databases, system services, etc. OOP was a movement towards progress.

Parallelism is one of the oldest things in computer science, as stated in the article and several comments, but the Parallel Programming "push" we see nowadays is organized by silicon vendors who failed to keep up with the Moore's Law slope. OOP was not motivated by any limitation, and I see a noticeable difference here.
April 14, 2010 4:47 PM PDT


paul clayden Parallel is a fad and won't last. It's an interim measure to something much much bigger. Pretty soon we'll have analogue computing/quantum computing which is going to rock all our worlds.
April 14, 2010 8:11 PM PDT


Lava Kafle Superb clarification. We have been using parallelism in Java, Oracle, .NET, C#, whatever, since the very beginning of the x64 architectures supported by Intel.
April 18, 2010 3:00 AM PDT

Asaf Shelly
Hi All,

I will start with thanking Peter for the extensive information. Truly something to respect.

This shows us that the basic ideas were already there and were somehow lost in time. It makes me wonder what else we have forgotten.

Back in the old days, applications and drivers usually had only a few components. These were separated by using different source files. Later we had a massive upgrade to using classes and objects as part of Object Oriented programming and design. C programmers did not have to write down the object design, whereas C++ programmers found it almost intuitive and mandatory. C programming also defines procedures. Notice the name "procedure": it means that the function is not a 3-line variable-modification snippet; rather, it is a whole procedure in the main process. The flow chart was also too often not written down, but as we can see from the names, the application was a 'Process' to perform, which had a 'main procedure' and several other 'procedures'. Old-school programming defined Procedures and Structures; we now go back to Tasks and Objects. This is why my website (where the video is found) says "Welcome to the Renaissance"...

I was slowly getting around to replying to Yves Daoust's "In fact, I consider that parallel programming is not within reach of ... the human brain". See minute 12:30 in the same video mentioned at the end of the post. Everything we do is parallel. If you work as part of a big organization then you probably do Object Oriented Design and manage the programming tasks using the SCRUM methodology. Take a look at SCRUM, copy the principles to your code, and you have a good parallel application. I quote Wikipedia ("http://en.wikipedia.org/wiki/Scrum_(development)"): "...the 'ScrumMaster', who maintains the processes...". There are also the sprint, the backlog, priorities, and the daily sync meeting, which is used to profile the operation and keep track of progress. There are also interesting things to learn from it; for example, the daily sync meeting is where you report all problems. This means that we don't raise an exception for every problem; instead we collect all the errors and report when the time is right. This might solve a few problems that parallel loops are struggling with.
The "Dining Philosophers problem" is a way to manage a proposed solution (locks); it is not a way to solve the problem. If, instead of using a set of locks, you use a service for each resource, the problem is completely different.

Is the image here http://www.9to5mac.com/intel-core-i7-mac-pro-xserve the answer to Richard's question?

Hi Thierry, I could respectfully argue that OOP was motivated by the limitations of managing large-scale projects, just as parallel programming is motivated by managing large-scale systems. OOP is for design time and parallel programming is for run time. Not that I don't agree with you. It is possible that OOP has been focused on so much for the past few years that programmers today think only in objects but find it very difficult to think in tasks.

I guess I have to say to Paul that parallel programming is ignorant of the engine. I am suggesting you use a word processor instead of a typewriter. It does not matter whether you are using MS-Office for Mac, Open-Office, or something new that will be invented 5 years from now. Quantum computing or not, my application should still know how to cancel an operation when it is no longer required.

Thanks for the comment Lava.

Regards,
Asaf