19 April 2009

Unix Programming - Getting the executable

Post 2

So, you now know what a process is from a layman's point of view. A process is a live entity with a mission to complete: it uses some input and output resources and does some computing. Processes run on computers, and if you look at computers from the 1980s onward, there are many kinds: supercomputers, desktop personal computers, laptops, and today even mobile devices do a decent amount of computing. Each of these devices runs anywhere from a handful to tens of thousands of processes at the same time. These devices are quite different in what they are used for and in how they are built. For example, your desktop computer has many hardware components, including the processor, which does the computing. The way the computation is done varies from processor to processor, because each processor has its own way of getting things done: a specific set of instructions, often referred to as its instruction set, through which all computation is carried out.

Consider this scenario: you are writing a piece of software that takes a year to write, and you want it to run on as many devices as possible. For example, suppose you care a lot about time management and are writing a simple daily planner. Isn't it reasonable to want it to run on your desktop, your mobile phone and even a supercomputer? But if you write the software using the instruction set of one specific processor, it will not work on devices that have a different processor. To overcome this, computer scientists came up with high-level languages such as C and C++. A high-level language has a defined syntax made of English-like keywords, special characters and mathematical notation. A special piece of software, a compiler or an interpreter, converts code written in a high-level language into the machine language of a particular processor. So instead of writing software for one specific device, you now write it once for a whole range of devices and let a compiler or interpreter generate the machine instructions for each of them from your high-level code.

When you write a program in a high-level language, it cannot be run directly on your computer. Before it can run, it has to be compiled, linked and then loaded into memory. We have seen what compilation is. But what really is linking? Your software will perform some commonly used operations, such as getting input from the user, reading a file, writing to a file and displaying output to the user. What you do with the input and how you process it to produce output vary from program to program, but all programs tend to share a certain common denominator. Rather than writing code for this common denominator in every program you write, doesn't it make sense to write the common code once and keep it for a lifetime? It is in fact a brilliant idea, and such a collection of common code is called a library. When you write software, you will quite often call into these libraries. For example, in our "Hello, World" program, printf is a library function that writes output to standard output. Don't bother if you do not yet know what standard output is; we will uncover that in a future post. For the time being, assume that it is your display. The process of combining your compiled code with the library code it uses is called linking. Only after compiling and linking does your code become fully functional and ready to be run. We call the result a "binary", an "executable" or an "exe".

So far, we have only prepared a program and not yet executed it, which is equally exciting, if not more so. In the next post, we will look at a simple program in a little more depth. I promise that before the end of next week I will tell you about the process address space :-). I feel that it is very important to know these details in order to gain a deeper understanding. I just don't want to write the 100th textbook or the 100,001st webpage. I believe in quality rather than quantity, and quantity makes sense only when there is quality. I hope you agree with me.

Catch you later