17 April 2009

Unix Programming - What is a Process?

Post 1

Today we have many programming languages and a great deal of software written in them. It is surprising to see that one operating system design still scales from small devices such as watches and washing machines to supercomputers and clusters. KISS stands for "Keep It Simple, Stupid", and the label fits any piece of software that does one thing and does it well. It is no overstatement to say that Unix is a cornerstone of computing. So much software has been produced, yet it is difficult to find anything quite like Unix. Unix is a marvelous creation, like the works of Beethoven or Michelangelo. The concept of Unix is simple - processes have life and files have space. Many operating systems have cloned the concepts and design philosophy of Unix. If your Unix/Linux system is doing something, that something is a combination of processes and files.

In this post and a few (not sure how many) subsequent posts, we will be discussing Unix/Linux programming concepts, predominantly from user space, and from kernel space as and when required. My intention is not to cover everything in a single post; rather, I want to be consistent in posting short posts. I feel this gives interested readers a sense of accomplishment after reading a few posts. It also helps retention, as you can come back to a short post and quickly go through it again. Without wasting much time, let us get into action.

What is a process? A process is a live entity; it is a program under execution. But when we settle for such a general definition, we tend to forget what a process really is. It may be an answer you can give in an interview, but not when you are trying to understand the system. It is the processes that move the system. So, apart from being a program under execution, what is a process?

A process is a black box that takes input, processes it and produces output. It is an algorithm, or a group of algorithms, running on the computer. When we say "taking input", what sort of input does it take? The input can be data from a memory location, a stream of bytes from a hard disk or a network host, or a signal from a fellow process or its parent; input resources can take any form. So now you know that you need input. The next thing is how you intend to operate on the input data. Apart from input resources, you also need logic (code, or machine instructions) that processes the data. So far we have data received from input resources and code that processes the data, and both require some space. After you manipulate the data, you need to store the result to an output resource or device. But computing is generally not that simple. Before storing the result to the output device, your algorithm may need to go through many intermediate stages.

When you connect these things, you get the process. A process is not just a program under execution but much more than that. A process is a live entity that has an address space, a context, input resources (open files, database connections, sockets), output resources (open files, database connections, sockets), state and many other things. It does not live only in user space: the process also has something in kernel space, such as the per-process area and the data structures used to access the input/output resources. Everything put together is a process. It is not just code, it is much more than that.

Assume that you have been offered a billion dollars to answer this question. If you know the "C" programming language, good; if you do not, you can pick any programming language. Can you give me a write-up (you can post it in the comments section) on how the following program is executed in a Unix/Linux system? If you can answer this question, I believe the size of the machine does not matter: you can work on supercomputers or high-performance clusters. Here you go.
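#include <stdio.h>   /* declares printf */

int main(void)
{
    char *str = "Hello, World";   /* points at a string literal */
    printf("%s", str);            /* note: no trailing newline is printed */
    return 0;
}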
In the next post, we will discuss how this program is loaded into memory and what the address space of the process looks like, and then slowly move on to how the process is executed and how "Hello, World" gets printed.

Catch you later.