Tuesday, January 3, 2012

Introduction to TPL Part 1

For many years we watched processor clock frequencies go up, but there is a practical limit to how far they can climb. So processors started shipping with multiple cores instead. What is a multicore processor? Roughly speaking, it is several processors packaged inside one; these internal processors are called cores. Depending on the architecture, each core may be able to execute two hardware threads at a time. We must not assume that these hardware threads take turns by switching: they really do execute in parallel.

What difference does this make to the developer of .NET programs? Historically, common Windows applications were written so that developers did not have to think much about the processor architecture. Programs treated the processor as a single central resource used by everything. Even where threads were used, they were used in thread-switching scenarios, with threads taking turns on one processor. Still, the multi-threading programming model is one of the most effective ways of creating flexible, scalable and responsive applications.

The same model is used in parallel programming. The threads of a process can now run on different cores of the microprocessor at the same instant, and the developer gets access to threads that truly run in parallel. Apart from the language features offered by programming languages such as C#, F# and IronPython, this is one of the biggest programming advances.

This advantage benefits not only developers but the industry as a whole. Computers built on multiple cores can run programs more efficiently than single-core machines, provided the programs are written to use the cores. The customer is king in this industry and demands more from less, and delivering that is one of the prime challenges for the software industry. The customer's question is: “Why should I pay for an application that runs less efficiently on a better machine, even after so many years of the computer revolution? Let me use my old app!”

Here developers and technical decision makers must realize that writing the same good old code is no longer sufficient or efficient enough. Developers must upgrade their skills, or recall what they learned in their engineering days. For companies that offer services, this is the right time to cut through the competition, which will be stiff even for a mid-level, mid-market company; they can top up their technical skills by training their people. Some people still think that parallel programming means supercomputing and do not concern themselves with cores at all. Others think it is just multi-threading and does not need much attention.

Let’s concentrate on how a developer can leverage parallel programming to stay ahead of the competition.

The parallel programming model is provided by .NET Framework 4.0. It is generally referred to as the Task Parallel Library.

The TPL (Task Parallel Library), together with the parallel features that shipped alongside it in .NET 4.0, consists mainly of three parts:

1. Task and Task Scheduling
2. Parallel LINQ (PLINQ)
3. Blocking Collection

Task

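A task is a unit of work that the scheduler runs on a thread, by default a thread taken from the thread pool. As a first approximation we can think of a task as a lightweight thread, although several tasks may end up sharing the same thread. Tasks can be spawned within another task. Tasks can work on their own independent data or share data across multiple tasks.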
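The two ways of handling data mentioned above can be sketched roughly as follows. This is only an illustration: the class and method names (TaskDataSketch, SumInTwoTasks, CountWithSharedCounter) are made up for this post. The first method gives each task its own slice of an array; the second lets several tasks update one shared counter, which is why it needs an atomic increment.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical names, used only for illustration.
public static class TaskDataSketch
{
    // Independent data: each task sums its own half and returns a result.
    public static long SumInTwoTasks(int[] values)
    {
        int mid = values.Length / 2;
        Task<long> left = Task.Factory.StartNew(() => SumRange(values, 0, mid));
        Task<long> right = Task.Factory.StartNew(() => SumRange(values, mid, values.Length));
        // Reading Result blocks until the corresponding task has finished.
        return left.Result + right.Result;
    }

    private static long SumRange(int[] values, int start, int end)
    {
        long sum = 0;
        for (int i = start; i < end; i++) sum += values[i];
        return sum;
    }

    // Shared data: all tasks update one counter, so the update must be atomic.
    public static int CountWithSharedCounter(int taskCount, int incrementsPerTask)
    {
        int counter = 0;
        var tasks = new Task[taskCount];
        for (int t = 0; t < taskCount; t++)
        {
            tasks[t] = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < incrementsPerTask; i++)
                    Interlocked.Increment(ref counter);
            });
        }
        Task.WaitAll(tasks);
        return counter;
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        Console.WriteLine(SumInTwoTasks(data));             // 55
        Console.WriteLine(CountWithSharedCounter(4, 1000)); // 4000
    }
}
```

Replacing Interlocked.Increment with a plain counter++ would make the second result unpredictable, which is the usual price of sharing data between tasks.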

Task Scheduling

The developer can control how tasks execute, for example whether they run mutually exclusively or concurrently. Developers can also write a custom task scheduler to suit their requirements. It is envisioned that a future TPL could have schedulers that run tasks on different computers. I read some interesting statistics about task switching: it takes around 6K to 8K processor cycles to switch tasks, about 200K cycles to create a thread and about 100K cycles to park a thread.
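To give an idea of what a custom scheduler looks like, here is a minimal sketch, not a production implementation. SingleThreadTaskScheduler and SchedulerSketch are hypothetical names invented for this post; the overridden members (QueueTask, TryExecuteTaskInline, GetScheduledTasks, MaximumConcurrencyLevel) are the ones the TaskScheduler base class exposes. The scheduler runs every task one at a time on a single dedicated thread, so the tasks are mutually exclusive.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler: executes queued tasks one by one on one thread.
public sealed class SingleThreadTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();
    private readonly Thread _worker;

    public SingleThreadTaskScheduler()
    {
        _worker = new Thread(() =>
        {
            // Blocks while the queue is empty; ends after CompleteAdding.
            foreach (Task task in _queue.GetConsumingEnumerable())
                TryExecuteTask(task);
        });
        _worker.IsBackground = true;
        _worker.Start();
    }

    protected override void QueueTask(Task task)
    {
        _queue.Add(task);
    }

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        // Inline only on the worker thread, so execution stays serialized.
        return Thread.CurrentThread == _worker && TryExecuteTask(task);
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return _queue.ToArray();
    }

    public override int MaximumConcurrencyLevel
    {
        get { return 1; }
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
    }
}

public static class SchedulerSketch
{
    public static int RunSerialized(int taskCount)
    {
        int counter = 0; // no lock: only one task runs at any moment
        using (var scheduler = new SingleThreadTaskScheduler())
        {
            var factory = new TaskFactory(scheduler);
            var tasks = new Task[taskCount];
            for (int i = 0; i < taskCount; i++)
                tasks[i] = factory.StartNew(() => { for (int j = 0; j < 1000; j++) counter++; });
            Task.WaitAll(tasks);
        }
        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(RunSerialized(8)); // 8000
    }
}
```

Because all tasks run on the same thread, the plain counter++ is safe here; with the default scheduler the same code would need synchronization.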

Blocking Collection

BlockingCollection is a thread-safe collection class that also implements the producer-consumer pattern. Its thread safety lets developers use it directly in a multi-threaded environment. The collection also supports enumeration, so languages like C# and VB.NET can use it in “for each” loops.
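A small sketch of the producer-consumer pattern with BlockingCollection might look like this. The names ProducerConsumerSketch and ProduceAndConsume, and the capacity of 5, are assumptions made for the example. One task adds numbers to the collection, another consumes them in a foreach loop over GetConsumingEnumerable.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical names, used only for illustration.
public static class ProducerConsumerSketch
{
    public static int ProduceAndConsume(int itemCount)
    {
        // Bounded to 5 items: Add blocks while the collection is full.
        using (var queue = new BlockingCollection<int>(boundedCapacity: 5))
        {
            Task producer = Task.Factory.StartNew(() =>
            {
                for (int i = 1; i <= itemCount; i++) queue.Add(i);
                // Tell the consumer that no more items will arrive.
                queue.CompleteAdding();
            });

            Task<int> consumer = Task.Factory.StartNew(() =>
            {
                int sum = 0;
                // Blocks while empty; the loop ends after CompleteAdding.
                foreach (int item in queue.GetConsumingEnumerable()) sum += item;
                return sum;
            });

            Task.WaitAll(producer, consumer);
            return consumer.Result;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ProduceAndConsume(100)); // 5050
    }
}
```

Without the call to CompleteAdding the consumer's foreach loop would wait forever for the next item.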

PLINQ – Parallel Language-Integrated Query

PLINQ is simply LINQ with parallel processing capability. There are situations where LINQ queries work over data of enormous proportions. When an aggregate function runs over such a collection with ordinary LINQ, the processing happens on a single thread, which limits efficiency. Aggregate functions are often easy to process in parallel because the work on each element is independent and predictable, so partial results can be computed separately and then combined.
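As a rough illustration of a parallel aggregate, the sketch below sums the squares of a range of numbers. PlinqSketch and SumOfSquares are hypothetical names; the only PLINQ-specific piece is the call to AsParallel, after which Select and Sum run across the available cores.

```csharp
using System;
using System.Linq;

// Hypothetical names, used only for illustration.
public static class PlinqSketch
{
    public static long SumOfSquares(int count)
    {
        return Enumerable.Range(1, count)
                         .AsParallel()             // switch from LINQ to PLINQ
                         .Select(n => (long)n * n) // runs on several threads
                         .Sum();                   // partial sums are combined
    }

    public static void Main()
    {
        // The same query without AsParallel runs on a single thread.
        long sequential = Enumerable.Range(1, 1000).Select(n => (long)n * n).Sum();
        long parallel = SumOfSquares(1000);
        Console.WriteLine(sequential == parallel); // True
    }
}
```

For a collection this small the parallel version is unlikely to be faster, since partitioning and merging have their own cost; the benefit shows up with large data or expensive per-element work.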

Code - Typical
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;

namespace TPLExample1
{
    class Program
    {
        static void Main(string[] args)
        {
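            // Start the outer task; it runs on a thread-pool thread.
            var outerTask = Task.Factory.StartNew(() =>
            {
                Console.WriteLine("Outer task starting.");

                // Start a nested task from inside the outer task.
                // It is not attached to the parent, so we wait for it explicitly.
                var childTask = Task.Factory.StartNew(() =>
                {
                    Console.WriteLine("Inner task running");
                    Thread.Sleep(5000); // simulate five seconds of work
                    Console.WriteLine("Inner task completed");
                });
                Console.WriteLine("Waiting for child task to complete...");
                childTask.Wait();
            });

            // Block the main thread until the outer task has finished.
            Console.WriteLine("Waiting for outer task to complete...");
            outerTask.Wait();
            Console.WriteLine("Outer task completed. Press Enter to exit...");
            Console.ReadLine();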
        }
    }
}