Perl Scripting Tips

If you write Perl scripts and come from a C or C++ programming background, one of the first things you will want to know is how to write a multiline comment. It's simple: wrap the block in POD directives, as shown below.

=begin comment

<your script>

=end comment

=cut

The next thing I wanted to do was read a file or paragraph line by line and search for a string in each line. Here’s how to go about it.

# $obj holds the set of lines to be parsed
foreach my $resultline (split /^/m, $obj) {
    # $resultline holds the line currently being read; $str is the string to search for
    # index() returns -1 when $str is not found, so compare against -1
    if (index($resultline, $str) != -1) {
        <Your action>
    }
}

 


Clipboard copy

As a follow-up to the previous article on setting up a VNC server and client, I believe it would be very handy to be able to copy and paste between the client and the server windows. I found this link that can help you set up exactly that.

http://askubuntu.com/questions/41273/how-to-copy-paste-text-from-remote-system

FreeBSD, VNC, Gnome, et al.

Some of us use Windows as our work environment, while some of us use GNU/Linux. And yet others use FreeBSD. Now assuming you ssh into your remote Linux or Unix server with your account to work, the most popular option is PuTTY. However, the major disadvantage of an SSH session is that it is non-persistent. Once you restart your PC or the network changes, your PuTTY session has to be restarted, which means you get a fresh shell/terminal and you might be lost as to where you stopped the last time.

Now fret not, since there is a better way: use VNC, NoMachine or a similar tool. With these, a server runs on your remote machine (Linux/BSD in this case) and a client/viewer application runs on the machine you work on. The server transmits graphical information (i.e. each pixel of your remote session) to the client, which displays it to the user. Cool, right! It is something similar to Microsoft Remote Desktop, though I have no idea if they work in the same way!

Now normally your FreeBSD machine or VM would just have console access without any GUI. So the first step is to enable Gnome (or KDE) on your FreeBSD VM. Connect to your VM using PuTTY, and compile Xorg first.

cd /usr/ports/x11/xorg
make
make install

Once this is completed, repeat the same for the following ports under the x11 folder:

gnome2
gnome-terminal
gnome-shell
gnome-session
gnome-panel
gnome-panel-reference
gnome-menu
gnome-libs
gnome-desktop
gnome-desktop-reference
gnome-applets

Similarly, if you find any other apps missing on your FreeBSD machine, just search for the corresponding port, then compile and install it in the same way.

Ex. vim can be found at /usr/ports/editors/vim

In case the installation requires a file and is not able to fetch it from any of its online sources, you will need to copy the file yourself to the folder it points you to, which is usually

/usr/ports/distfiles

You can download the necessary file on your working PC. www.filewatcher.com is a popular portal for downloading such missing files, though there are plenty of others.

Use WinSCP to copy the file to your FreeBSD VM. Download and install WinSCP on your working PC, specify the IP address of your VM in WinSCP for an SCP session, and provide your credentials. Once WinSCP starts, you can just copy and paste files to your remote VM/machine using your mouse!

 

Then head over to your home directory on your remote machine, and edit the file

~/.vnc/xstartup

and add the below lines

export XKL_XMODMAP_DISABLE=1
/etc/X11/Xsession
/usr/local/bin/gnome-session &

Restart your remote machine/VM and proceed to start the VNC server with the following command.

vncserver -geometry <width>x<height>

The first time it is started, vncserver will prompt you to set a password (which you will need for every login). It will then print the address of the new desktop, which you can copy and paste into your local PC’s VNC viewer. And voilà, you have a GUI based remote VM/machine based on FreeBSD.

FreeBSD usually defaults to the C shell, so make any changes to the shell configuration in the file

~/.cshrc

Also

/etc/rc.conf

may need to be edited to enable a few things at system startup.

 

Nitty Gritties of variables and functions

Declaration and Definition

In simple terms, the main difference is that a declaration only states the type of a variable or the signature of a function, whereas a definition also allocates storage for the variable or provides the body of the function.

int add(int a, int b);

is a function prototype and is a function declaration. This line has not allocated any storage for the function in the code segment of the program.

int add(int a, int b)
{
    return a+b; 
}

is a function definition for which storage is allocated in the code segment of the program.

Similarly it is possible to declare or define variables.

int a = 10;

int b;

Assuming these statements appear at file scope, both are definitions, the only difference being that a is initialized with the value 10 whereas b has no explicit initial value. Both have storage allocated in the data segment of the program. However, if you want to write a variable declaration without allocating memory in the data segment, you have to prefix it with the extern keyword.

extern int c;
extern int d = 10;

If the variable is prefixed with extern and has no initializer, it is a variable declaration. However explicitly initializing the external variable with a value becomes a definition, which means storage is allocated for it. Hence in the above statements, c is a declaration and d is a definition with storage allocated.

Now comparing the function and variable declarations, you might be wondering why there is no extern prefix for a function declaration. The fact is that the extern keyword is implicitly prepended to a function declaration even if the user doesn’t specify it.

int add(int a, int b);
extern int add(int a, int b);

Hence the add declaration written by the programmer is treated by the compiler exactly as if it had the extern prefix.
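To see declarations and definitions working together in one file, here is a minimal C++ sketch. The names counter and add_to_counter are made up for illustration and are not from the text above. The point is that code can use a function or variable as soon as it has been declared, while the definition can come later (or live in another file).

```cpp
#include <cassert>

// Declaration only: tells the compiler the signature, no code is emitted for it.
int add(int a, int b);

// 'extern' is implied for function declarations, so this redeclares the same function.
extern int add(int a, int b);

// Declaration only: no storage is reserved by this line (hypothetical variable).
extern int counter;

// Both names are already declared, so they can be used here
// even though their definitions appear further down.
int add_to_counter(int value)
{
    counter = add(counter, value);
    return counter;
}

// Definitions: these create the function body and the storage.
int add(int a, int b)
{
    return a + b;
}

int counter = 10;
```

Calling add_to_counter(5) from main would update counter through the earlier declarations, even though the storage is only defined at the bottom of the file.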

Now coming to the linkage of variables and functions, the following holds by default:

  • Functions and file-scope variables have external linkage, i.e. they are visible outside the file they are declared or defined in. (They have the ‘extern’ prefix, remember.)
  • Functions and file-scope variables marked static have internal linkage, i.e. they are visible only in the file where they are defined.

const

The const keyword in C/C++ specifies that a variable/object may not be changed after it is initialized. If it’s a const variable or parameter in a function, it just means that the function may not change it. In C++, the const keyword written after a member function declaration or definition takes it a step further. It means that the member function may not modify the data members of the object it is called on.

However, in C++, const_cast can be used to cast away the const’ness of a reference or pointer and allow the object to be changed through it. This assumes the programmer knows exactly what he/she is doing: modifying an object that was itself defined as const is undefined behaviour.

If, however, a data member genuinely needs to change inside a const member function, it may be declared mutable. It simply means that this member may be changed even in a const member function.
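The three ideas above (const member functions, mutable and const_cast) can be sketched in a few lines of C++. The Sensor class and reset_through_const_ref are hypothetical names invented for this example, and the const_cast is only safe here because the object passed in was not itself defined const.

```cpp
#include <cassert>

// Hypothetical class used only to illustrate const, mutable and const_cast.
class Sensor {
public:
    explicit Sensor(int value) : value_(value), reads_(0) {}

    // const member function: may not modify value_,
    // but may still update reads_ because it is mutable.
    int read() const
    {
        ++reads_;
        return value_;
    }

    void set(int value) { value_ = value; }
    int reads() const { return reads_; }

private:
    int value_;
    mutable int reads_;
};

// const_cast strips const from the reference so set() can be called.
// Only valid when the referenced object was not defined const.
void reset_through_const_ref(const Sensor& s)
{
    const_cast<Sensor&>(s).set(0);
}
```

Removing the mutable keyword makes read() fail to compile, which is a quick way to convince yourself of what const on a member function enforces.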

Constant Folding

Constant folding is a concept in C++ where the compiler records the value of a const variable in its symbol table during compilation and substitutes that value wherever the variable is used, so the variable itself need not occupy storage. In the sense used here, this applies to C++ only. Do note that in C a const is treated as an ordinary variable that cannot be changed, so storage is allocated for it.

So what about the linkage of const variables? The following statements hold in this case:

  • In C++, const variables at file scope have internal linkage by default, which is what allows them to be folded without any storage being allocated in the data segment.
  • In C++, const variables declared with extern have external linkage, because the extern keyword makes them visible outside the program file. Hence storage has to be allocated for them (typically in a read-only data area).
  • In C, file-scope const variables have storage allocated and have external linkage by default.
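One visible consequence of this in C++ is that a const integer with a constant initializer can be used where the language demands a compile-time constant, such as an array bound. The sketch below uses made-up names (buffer_size, buffer, buffer_length). It shows that the value is known at compile time, though it does not by itself prove that no storage was allocated.

```cpp
#include <cassert>

// A const int with a constant initializer is a compile-time constant in C++.
const int buffer_size = 16;

// So it can be used as an array bound, which must be a constant expression.
int buffer[buffer_size];

// Checked entirely by the compiler: nothing here runs at program start.
static_assert(sizeof(buffer) == buffer_size * sizeof(int),
              "array is sized from the const value");

// Hypothetical helper: recovers the element count from the array itself.
int buffer_length()
{
    return static_cast<int>(sizeof(buffer) / sizeof(buffer[0]));
}
```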

static

Now when a novice programmer is asked the question, what is the use of a static variable, the reply usually is “It is used when the value of the variable has to persist across function calls”

Now when an ordinary variable is defined in a function, it is stored on the stack of that function call. So how does a variable’s value persist if the function is called repeatedly? Shouldn’t the function stack be pushed and unwound on each function call? The answer is that static variables are never part of a function stack. They are stored in the global variables area of the program, and this is how the value persists. For static class members, this is also why they can be accessed directly without a class object. The only difference from a global is that when a variable in a function is static, its location is that of global variables, but its scope is confined to the block where it is defined.

Hence the lifetime of a static variable is the lifetime of the program, not of the function call. It also means destructors of static objects are called when the program ends and not when the function in which they are defined exits!
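A small C++ sketch makes the persistence easy to see. The function names next_id and always_one are invented for this example. The first keeps its counter in a static local, while the second uses an ordinary local for contrast.

```cpp
#include <cassert>

// The static local is initialized once and keeps its value between calls.
int next_id()
{
    static int id = 0;  // lives for the whole program, visible only inside next_id()
    return ++id;
}

// For contrast: an ordinary local is re-created on every call.
int always_one()
{
    int id = 0;         // lives on the stack of this call only
    return ++id;
}
```

Successive calls to next_id() keep counting upwards, whereas always_one() starts from scratch every time.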

Reference: Thinking in C++ – Bruce Eckel

The dreaded SIGSEGV !

Segmentation Fault or SIGSEGV is the dreaded error of most C programmers. It mostly occurs due to an invalid memory access, and most of the time that access is a mistake. However, one of my friends recently asked me a question (which he had been asked in an interview): what happens if an arbitrary address is accessed?

The question was as follows:

#include <stdio.h>
int main(int argc, char *argv[])
{
    int *p;
    p = (int*)1;
    printf("%d  %d", p, *p);
    return 0;
}

Now I had never come across a scenario where you assign a value directly to a pointer variable and my first thought was, would this even compile? As a matter of fact it did, when I did compile with gcc. But on running the output file, I got the dreaded Segmentation Fault.

Now curious as to why, I stepped through the code using gdb and this is what I stumbled upon:

breakpoint 1, main () at test.c:6
6 p = (int*)1;
(gdb) p p
$1 = (int *) 0x0
(gdb) n
7 printf("%d %d", p, *p);
(gdb) p p
$2 = (int *) 0x1
(gdb) p *p
Cannot access memory at address 0x1
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x00000000004004ac in main () at test.c:7
7 printf("%d %d", p, *p);

Now let’s dissect what exactly is going on. Variable p is a pointer variable, which means it can hold an address. Since it’s an integer pointer, p can hold the address of an integer variable. When we assign (int*)1 to p, the value 1 is not assigned as a plain integer: the (int*) cast tells the compiler to treat 1 as the address of an integer. Hence p is assigned the address 0x1, which is a perfectly valid statement, and that is why the program compiles. The address itself, however, is not one our program is allowed to access. On a modern OS each process runs in its own virtual address space, and the lowest addresses are normally left unmapped so that stray accesses like this one get caught. Hence, when we dereference p to print the value at this address, the access faults and the OS delivers SIGSEGV to our program! (As an aside, a pointer should be printed with %p rather than %d.)
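For contrast, here is a minimal C++ sketch of the well-behaved version of the same idea. The function name value_through_valid_pointer is made up for the example. The pointer holds the address of a real object, so dereferencing it is fine, and the pointer itself is printed with %p.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: dereferences a pointer that refers to a live object.
int value_through_valid_pointer()
{
    int x = 42;
    int *p = &x;  // address of a real int, not an arbitrary number

    // %p expects a void*, so the pointer is converted explicitly.
    std::printf("%p %d\n", static_cast<void*>(p), *p);
    return *p;
}
```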

Data Alignment and Padding

Usually we use structures in C to combine data relevant to an object or a problem. If we intend to create many objects of this structure, we have an array of structures. But how are these structures stored in memory? Let’s take a simple structure defined as below

struct abc
{
    int i;
    char c;
};

Now assuming a 32 bit architecture, we know an integer occupies 4 bytes and a char occupies 1 byte. So the instantaneous answer as to what is the size of struct abc would be 5. But the actual size occupied by a variable of the above struct is 8. This is because of the concept of Padding.
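You can ask the compiler directly instead of guessing. The C++ sketch below repeats struct abc and adds two hypothetical helpers (abc_member_bytes and abc_padding) to compare the sum of the member sizes with what sizeof reports. On a typical platform with a 4 byte int, sizeof(abc) is 8 and the padding is 3 bytes, but the exact figures depend on the compiler and ABI.

```cpp
#include <cassert>
#include <cstddef>

struct abc
{
    int i;
    char c;
};

// Bytes needed by the members alone, ignoring any padding.
std::size_t abc_member_bytes()
{
    return sizeof(int) + sizeof(char);
}

// Bytes the compiler added on top of the members.
std::size_t abc_padding()
{
    return sizeof(abc) - abc_member_bytes();
}
```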

http://www.geeksforgeeks.org/structure-member-alignment-padding-and-data-packing/

The above link provides an excellent reference on how memory is organized in today’s computers. We can see that memory is usually divided into four banks, each 1 byte wide. Modern CPU architectures can access all four banks in a single read cycle, so reading 4 bytes of data needs a single read and not four. In a 64 bit architecture, the CPU is capable of reading 8 bytes in a single read!

So, we learn that the CPU architecture decides how many bytes can be read at a time. This is referred to as the word length, which is 4 bytes or 8 bytes depending on the architecture. Let’s stick to a 32 bit architecture, or 4 byte word length, for the rest of the article.

Now it does not mean that the CPU is not capable of reading one byte at a time. It is just that reading one byte and reading four bytes require about the same CPU effort, provided the four bytes sit in the same row of banks. Hence if we have an int and a char in a structure, the compiler would pad 3 bytes after the char. These extra bytes are called padding, and they exist to preserve Data Alignment.

So as a general rule of thumb, the starting address of a variable depends on its type. A char can start at any address. However, a short, int, long, float or double will start at even memory addresses. Further, a short will start at addresses divisible by 2, an int and a float at addresses divisible by 4, and a double (or an 8 byte long) typically at addresses divisible by 8, though the exact rules depend on the platform. Confused? Have a look at the diagram in the above link to clear things up.
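C++11 exposes these rules through the alignof operator, so the rule of thumb can be checked rather than memorized. The sketch below is only an illustration, and is_naturally_aligned is a made-up helper name. It tests whether an object's address is a multiple of the alignment its type requires on the platform you compile for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when the object sits at an address that is a multiple
// of the alignment required by its type on this platform.
template <typename T>
bool is_naturally_aligned(const T& object)
{
    return reinterpret_cast<std::uintptr_t>(&object) % alignof(T) == 0;
}
```

Printing alignof(short), alignof(int) and alignof(double) on your own machine shows the actual values your compiler uses.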

So let’s consider the following structure.

struct xyz
{
    double a;    
    int b;
    double c;
};

Now would the size of the above structure be 20 bytes (8+4+8)? Nope! It would be 24 bytes! This is because, apart from aligning each member, the compiler aligns the structure as a whole to its most strictly aligned member, which here is the double (8 bytes). Hence there would be 4 bytes of padding after variable b in the above structure.

But why is this required? Let’s say we have an array of the above structure and, for simplicity, the first element starts at address 1000, which is a multiple of 8. Without padding, c would sit at address 1012 and the second element would begin at address 1020, neither of which is a multiple of 8. Padding 4 bytes after b places c at 1016 and begins the second element at 1024, both of which are multiples of 8.

But why, you ask! This is because the emphasis is on performance. Since the cost of physical memory has reduced over time, the compiler would rather waste 4 bytes and ensure the CPU can read every double in the array quickly than save those 4 bytes. With the padding in place, on hardware that fetches 8 bytes at a time, each double sits within a single 8 byte row instead of straddling two, so the CPU never needs two reads to fetch one value.
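The layout can be inspected with offsetof and sizeof. The C++ sketch below repeats struct xyz and adds two hypothetical helpers (xyz_padding_after_b and xyz_array_stride). On a typical platform where double is aligned to 8 bytes, the padding after b comes out as 4 and the array stride as 24, but treat those numbers as an assumption about your ABI and not a guarantee.

```cpp
#include <cassert>
#include <cstddef>

struct xyz
{
    double a;
    int b;
    double c;
};

// Gap between the end of b and the start of c.
std::size_t xyz_padding_after_b()
{
    return offsetof(xyz, c) - (offsetof(xyz, b) + sizeof(int));
}

// Distance in bytes between two consecutive array elements.
std::size_t xyz_array_stride()
{
    xyz items[2] = {};
    const char* first = reinterpret_cast<const char*>(&items[0]);
    const char* second = reinterpret_cast<const char*>(&items[1]);
    return static_cast<std::size_t>(second - first);
}
```

The stride always equals sizeof(xyz), which is why the compiler has to make the structure size a multiple of its alignment.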

References:

http://www.geeksforgeeks.org/structure-member-alignment-padding-and-data-packing/

http://www.catb.org/esr/structure-packing/