Hyper-Threading with the Intel Core i7

June 14th, 2009

I have got a new computer. As alway i build it myself. See the following photos and note the impressive size of the cpu cooler by Noctua).

   

I chose the Intel Core i7, because i was very curious about it’s technical features. It has four “real” physical cores, but provides eight “virtual” cores with hyper-threading. These “virtual” cores are shown by the operating systems in their task/process managers. See the following screenshots for Windows and Linux.

    8 cores on Linux

The question i asked myself is: How do these virtual cores perform ? How many programms can i run in parallel without hurting performance ? What is the speedup ? Is it 4 ? Is it 8 ?

So I made a test. I chose a single threaded program, the ray tracer pbrt and started this program 1, 2, 3, …, 8, 9, 10 times as a process under Linux and timed the running times. Here are the results.

Number of programms Running times Speedup Explanation
1 2 3 4 5 6 7 8 9 10
1 1:18.27 - - - - - - - - - 1  
2 1:18.57 1:18.32 - - - - - - - - 1.997  
3 1:18.69 1:18.76 1:19.18 - - - - - - - 2.97  
4 1:19.62 1:21.88 1:20.12 1:19.68 - - - - - - 3.83  
5 1:54.01 1:54.38 1:53.47 1:19.33 1:54.90 - - - - - 3.41 2 cores with 2 threads each and 1 core with 1 thread
6 1:56:13 1:22.16 1:23.09 1:54.22 1:55.41 1:54.95 - - - - 4.05 2 cores with 2 threads each and 2 core with 1 thread each
7 1:53.27 1:25.28 1:53.62 1:53.92 1:56.38 1:55.49 1:54.05 - - - 4.72 3 cores with 2 threads each and 1 core with 1 thread
8 1:59.50 1:57.72 1:55.16 1:54.96 1:58.60 1:57.72 1:58.46 1:59.62 - - 5.25 4 cores with 2 threads each
9 2:08.65 2:09.34 1:59.44 2:07.06 2:00.61 2:38.73 2:02.70 2:01.40 2:10.74 - 4.45 4 cores with 2 threads each
10 2:04.29 2:23.16 2:44.80 2:09.42 2:45.95 2:16.97 2:14.71 2:10.60 2:15.10 2:09.96 4.73 4 cores with 2 threads each

For up to four programs the Core i7 behaves like a usual four core processor. These four programs can run in parallel with the same performance of about 80 seconds. The speedup is almost linear.

When more than four programs run, the processors has to run at least two threads on one core. Then two virtual processors have to share a single physical processors and the programs take about 114 seconds.

Conclusion: Hyper-threading gives us some extra computing power here. The best speedup of 5.25 was achieved with 8 programs.

By the way: the following image was the one rendered for the benchmark. See the gallery of pbrt for more.


Parallelization with Haskell - Easy as can be

June 7th, 2009

The functional programming language Haskell provides a very easy way of parallelization. Consider the following naive implementation of the
Fibonacci function.

fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

This implementation has a bad expontential time complexity, so it should be improved, for example with caching. But this is beyond the scope of this article. We just need a function that takes a while to finish.

In Haskell there are two operators that have to be used for parallelization: par and pseq. par a b is some kind of a “fork” operation: a is started in parallel and b is returned. Keep in mind that Haskell is has a lazy evaluation strategy. a is only evaluated if it is needed The function pseq a b evaluates first a then b.

Equipped with this two operations it is very easy to parallelize fib.

parfib n
| n < 11 = fib n -- For small values of n we use the sequential version
| otherwise = f1 `par` (f2 `pseq` (f1+f2)) -- calculate f1 and f2 in parallel, return the sum as the result
where
f1 = parfib (n-1)
f2 = parfib (n-2)

The code has to be compiled with the -threaded option.

ghc -O3 -threaded --make -o parfib ParFib.hs

The number of threads is specified at runtime with the -N command line option.

./parfib +RTS -N7 -RTS

On an Intel Core i7 920 this resulted in a speedup of 4.13 for n=38. This processor has four physical cores.

So this is efficient. Haskell is still one of the best programming languages.

An excercise in parallelization with the Cell Broadband Engine

May 19th, 2009

The cell broadband engine is a multi-core processor. One of the cores, the so called PPE, is a general processor that can handle I/O, memory, etc. There are 6 so called SPEs that are spezialized to number crunching. All the cores are 128-bit SIMD .

So basically there are two ways to parallelize here.

  1. Run the ray tracer on the six SPEs and merge the results.
  2. Rewrite the ray tracer to process 4 rays simultaneously using the SIMD vectors.

At the point of writing i only implemented the first point. See my homepage for details. The following film shows the ray tracer in action. The ray tracer simply splits the screen into n parts and uses an SPE for each part.

Updated version of the Dynamic Languages Shootout Game available

August 17th, 2008

I updated my contribution to the “Dynamic Languages Shootout”.

I upgraded to Groovy 1.5.6 and to Grails 1.0.3.

New version of the Dynamic Languages Shootout Game available

February 27th, 2008

I updated my contribution to the “Dynamic Languages Shootout”.

I upgraded from Groovy 1.5.1 to Groovy 1.5.4 and from Grails 1.0RC4 to Grails 1.0.1.

First tests have shown performance improvements.

New version of the library of geometric algorithms in Haskell

February 27th, 2008

Almost 10 years after the initial release, i released an updated version of the library of geometric algorithms in Haskell. It now builds with Cabal and requires the Glasgow Haskell Compiler.

Memoization in Groovy with a Decorator

February 27th, 2008

Memoization is a well known optimization technique to avoid repeated calculations. With dynamic programming languages like Groovy it is possible to extend the behaviour of an already exisiting class at runtime. In Groovy this is accomplished with the Meta Object Protocoll and its ExpandoMetaClass.

In Groovy every class has a meta class that can be changed and extended at runtime. One method of this meta class is the invokeMethod() that has the following signature.

Object invokeMethod(Object object, String methodName, Object arguments)

This method controls the calls of methods in the class. By overwriting this method one can implement memoization easily.

class MemoizationDecorator {
	static void memoizeMethods(Class clazz, Set methods) {
		Map cache = [:]
		clazz.metaClass.invokeMethod = { String name, args ->
			def key
			def result
			if (methods.contains(name)) {
				// initialise the cache
				if (!cache[name]) cache[name] = [:]
				if (!cache[name][delegate]) cache[name][delegate] = [:]
				// is there already a memoized result?
				key = args.collect { it.hashCode().toString() }.join(’-')
				result = cache[name][delegate][key]
			}
			if (null == result) {
				// if there is no result, call the method
				def method = delegate.metaClass.getMetaMethod(name, args)
				if (method) result = method.invoke(delegate, args)
			}
			if (methods.contains(name)) {
				// store the result
				cache[name][delegate][key] = result
			}
			return result
		}
	}
}

The cache cache contains the results of the previous calls. The set methods contains the name of the methods that should get memoized.

Lets write a test for this class.

class TestClass {
    int fCalls = 0
    int gCalls = 0
    int f() { fCalls++ }
    int g() { gCalls++ }
}

def m0 = new TestClass()
MemoizationDecorator.memoizeMethods(TestClass, ['f'] as Set)
def m1 = new TestClass()

m0.f() + m0.f() + m0.f() + m0.g() + m0.g() + m0.g()
assert m0.fCalls == 3
assert m0.gCalls == 3

m1.f() + m1.f() + m1.f() + m1.g() + m1.g() + m1.g()
assert m1.fCalls == 1
assert m1.gCalls == 3

The object m0 is created before the memoization decorator was called. Therefore all the three calls to the method f were executed. For object m1 the method f was called only once.

Well the observant reader will have noticed, that the results of the computations are different for m0 and m1. This is a reminder that the correctness is preserved only for purely functional methods, e. g. methods without internal state. This is not the case in our example.

Accessing EMF models with JRuby

January 27th, 2008

JRuby provides access to Java packages, so it is possible to use packages created with the Eclipse Modeling Framework (EMF).

First we have to load the Java packages. The java packages have to be contained in the CLASSPATH.

include Java
include_class 'org.eclipse.emf.common.util.URI'
include_class 'org.eclipse.emf.ecore.xmi.impl.XMIResourceImpl'
include_class 'org.eclipse.example.library.LibraryPackage'

Then we initialize the EMF package and load a file that contains the data of my little hardboiled library.

LibraryPackage.impl
fileURI = URI.createURI('data/generated_library_std.ecore')
resource = XMIResourceImpl.new(fileURI)
resource.load(nil)
library = resource.contents[0]

The following snippet prints out all the books.

library.books.each do |b|
    puts b.title+ "\t" + b.category.to_s + "\t" + b.pages.to_s
end

We can print out all the books with less than 240 pages with the following statement.

library.books.find_all { |b| b.pages < 240 }.each do |b|
    puts b.author.name + "\t" + b.title+ "\t" + b.category.to_s + "\t" b.pages.to_s
end

Or we can print out the titles of all the books of Raymond Chandler available.

puts library.books.find_all { |b|
    b.author.name == 'Raymond Chandler' }.collect { |b| b.title }.join(", ")

Jörn Dinkla came 6th place at the Dynamic Languages Shootout at OOP 2008

January 23rd, 2008

The german magazine JavaSpektrum organized the “Dynamic Languages Shootout” contest for the OOP 2008 conference. The task was the creation of a computer game similiar to Scrabble in a dynamically typed programming language.


Joern Dinkla came 6th at the Dynamic Languages Shootout/OOP 2008

I used Groovy and Grails and came 6th place. See the web pages for further information.

Writing Eclipse plugins with Groovy

January 17th, 2008

Writing plugins for Eclipse with other languages than Java is not officially supported, but there is a way to write an Eclipse plugin with Groovy only. As prerequisites you need Eclipse with a JDT and the Groovy Eclipse plugin.

Follow the following steps. The source code is also available.

1. Create a new Plugin-in project, do not use any templates.
2. Add the Groovy Nature to the project.
3. Create a Java package under src
4. Create a Groovy class HelloGroovyWorld with the following contents

package hellogroovyworld

import org.eclipse.jface.action.IAction
import org.eclipse.jface.viewers.ISelection
import org.eclipse.ui.IWorkbenchWindow
import org.eclipse.ui.IWorkbenchWindowActionDelegate
import org.eclipse.jface.dialogs.MessageDialog

class HelloGroovyWorld implements IWorkbenchWindowActionDelegate {

	private IWorkbenchWindow window

	void run(IAction action) {
		MessageDialog.openInformation(
			window.getShell(),
			"Hellogroovyworld Plug-in",
			"Hello, Groovy world")
	}

	void selectionChanged(IAction action, ISelection selection) {}

	void dispose() {}

	void init(IWorkbenchWindow window) {
		this.window = window
	}
}

5. Edit the plugin.xml file:

5a. On the Dependencies tab add org.eclipse.ui and org.eclipse.core.runtime.
and org.codehaus.groovy.

5b. On the Runtime tab add bin-groovy to the classpath

5c. On the Extensions tab add an org.eclipse.ui.actionSets extension, set visible to true.
Add a menu and an action (Left click the ActionSet, right click and choose “new, menu and action”) and
choose HelloGroovyWorld as the class for the action. The plugin.xml is shown below.

   <extension
         point="org.eclipse.ui.actionSets">
      <actionSet
            id="hellogroovyworld.actionSet1"
            label="Groovy ActionSet"
            visible="true">
         <menu
               id="groovyMenu"
               label="Groovy Menu">
         </menu>
         <action
               class="hellogroovyworld.HelloGroovyWorld"
               id="hellogroovyworld.action2"
               label="Groovy World"
               menubarPath="groovyMenu"
               toolbarPath="groovyMenu"
               tooltip="Hello Groovy World">
         </action>
      </actionSet>
   </extension>

Now run the project as an Eclipse Application.

Hello Groovy World