Saturday, May 16

64-bit Computing - CSE

What is 64-bit Computing?

The labels "16-bit," "32-bit" or "64-bit," when applied to a microprocessor, characterize the processor's data stream. Although you may have heard the term "64-bit code," this designates code that operates on 64-bit data. 

In more specific terms, the labels "64-bit," "32-bit," etc. designate the number of bits that each of the processor's general-purpose registers (GPRs) can hold. So when someone uses the term "64-bit processor," what they mean is "a processor with GPRs that store 64-bit numbers." And in the same vein, a "64-bit instruction" is an instruction that operates on 64-bit numbers.
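To make this concrete, a 64-bit GPR can be modeled (a Python sketch, not processor code) as an integer masked to 64 bits; arithmetic wraps at 2^64 exactly as it does in the hardware register:

```python
# Model a 64-bit general-purpose register as a masked Python integer.
GPR_MASK = (1 << 64) - 1  # 0xFFFFFFFFFFFFFFFF, the largest 64-bit value

def add64(a, b):
    """Add two values the way a 64-bit GPR would: wrap around at 2**64."""
    return (a + b) & GPR_MASK

print(hex(GPR_MASK))       # the largest value one register can hold
print(add64(GPR_MASK, 1))  # wraps around to 0, like a hardware overflow
```

The mask is what distinguishes a "64-bit number" in the sense above: any result wider than 64 bits simply loses its high-order bits, just as it would in the register.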

64-bit Architectures

Let’s discuss the 64-bit architectures from the two leading processor manufacturers, AMD and Intel: AMD’s Opteron and Intel’s Itanium.

Intel 64-bit architecture (IA-64)

The IA-64 uses a technique called VLIW, for “Very Long Instruction Word”. Processors that use this technique access memory by transferring long program words, each of which packs several instructions. In the case of the IA-64, three instructions are packed into each 128-bit bundle. As each instruction takes 41 bits, 5 bits remain to indicate the kinds of instruction that were packed. Figure 1 shows the instruction packaging scheme. This packing reduces the number of memory accesses, leaving to the compiler the task of grouping the instructions so as to get the best out of the architecture.

As already mentioned, the 5-bit field, called the “template”, indicates the kinds of instructions that are packed. Those 5 bits allow 32 possible kinds of packaging which, in practice, are reduced to 24 kinds, since 8 encodings are not used. Each instruction uses one of the CPU’s execution units, which are listed below and can be identified in the figure below.

         Unit I - integer data
         Unit F - floating-point operations
         Unit M - memory access
         Unit B - branch instructions
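The bundle layout described above can be pulled apart with plain bit arithmetic. A Python sketch (the field positions assumed here follow the IA-64 bundle encoding: the 5-bit template in the low-order bits, then three 41-bit instruction slots):

```python
def decode_bundle(bundle):
    """Split a 128-bit IA-64 bundle into its template and three slots.

    Assumed layout (low bits first): bits 0-4 template, bits 5-45 slot 0,
    bits 46-86 slot 1, bits 87-127 slot 2.
    """
    slot_mask = (1 << 41) - 1                # each slot is 41 bits wide
    template = bundle & 0x1F                 # 5-bit template field
    slot0 = (bundle >> 5) & slot_mask
    slot1 = (bundle >> 46) & slot_mask
    slot2 = (bundle >> 87) & slot_mask
    return template, slot0, slot1, slot2

# Round-trip check: pack known field values, then decode them again.
t, s0, s1, s2 = 0x10, 0x123, 0x456, 0x789
bundle = t | (s0 << 5) | (s1 << 46) | (s2 << 87)
assert decode_bundle(bundle) == (t, s0, s1, s2)
assert 5 + 3 * 41 == 128  # template + three slots fill the bundle exactly
```

The last assertion is the arithmetic from the text: three 41-bit instructions plus the 5-bit template account for all 128 bits of the bundle.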

The architecture Intel proposed to execute those instructions, named Itanium, is versatile and promises performance by means of the simultaneous (parallel) execution of up to 6 instructions. The figure shows the block diagram of this architecture, which uses a 10-stage pipeline.
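As a rough illustration of why the wide pipeline pays off (an idealized best-case model, not a cycle-accurate one): once a 10-stage pipeline is full, a 6-wide machine can complete up to six instructions per cycle.

```python
import math

def ideal_cycles(n_instructions, width=6, stages=10):
    """Best-case cycle count on a `width`-issue, `stages`-deep pipeline:
    pipeline fill latency, then one full issue group per cycle."""
    groups = math.ceil(n_instructions / width)
    return stages + groups - 1

print(ideal_cycles(600))           # 6-wide: 109 cycles at best
print(ideal_cycles(600, width=1))  # a scalar pipeline needs 609
```

Real programs fall well short of this ideal, which is exactly why the architecture leans so heavily on the compiler and on speculation to keep all six slots busy.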


The Itanium can load instructions and data onto the CPU before they are actually needed, or even if they prove not to be needed, effectively using the processor itself as a cache. Presumably, this early loading is done when the processor is otherwise idle. The advantage gained by speculation is that it limits the effects of memory latency by allowing data to be loaded before it is needed, making it ready to go the moment the processor can use it.
There are two kinds of speculation: data and control. With speculation, the compiler moves an operation earlier so that its latency (time spent) is removed from the critical path. Speculation is a way of letting the compiler keep slow operations from spoiling the parallelism of the instructions. Control speculation is the execution of an operation before the branch that precedes it. Data speculation, on the other hand, is the execution of a memory load before a store operation that precedes it and with which it may be related.
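To make the control-speculation idea concrete, here is a hypothetical Python sketch (in the real machine the compiler performs this hoisting and hardware check operations catch misspeculation; here it is written out by hand): the load is started before the branch that guards it, and its result is simply discarded if the branch goes the other way.

```python
def guarded_load(use_table, table, index, default):
    """Non-speculative version: the load sits behind the branch."""
    if use_table:
        return table[index]
    return default

def speculative_load(use_table, table, index, default):
    """Control-speculated version: the (slow) load is hoisted above the
    guarding branch, hiding its latency; the value is dropped if unused."""
    hoisted = table[index]  # executed ahead of the branch
    if use_table:
        return hoisted
    return default

data = [10, 20, 30]
# Both versions produce the same result either way the branch goes.
assert guarded_load(True, data, 1, -1) == speculative_load(True, data, 1, -1)
assert guarded_load(False, data, 1, -1) == speculative_load(False, data, 1, -1)
```

The payoff is that the load's latency overlaps the work before the branch; the cost is that a wasted load must be harmless, which is what the architecture's checking machinery guarantees.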

Rotating Registers

On top of the frames, there's register rotation, a feature that helps loop unrolling more than parameter passing. With rotation, Itanium can shift up to 96 of its general-purpose registers (the first 32 are still fixed and global) by one or more apparent positions. Why? So that iterative loops that hammer on the same register(s) time after time can all be dispatched and executed at once without stepping on each other. Each instance of the loop actually targets different physical registers, allowing them all to be in flight at once. 

If this sounds a lot like register renaming, it is. Itanium's register-rotation feature is less generic than all-purpose register renaming like Athlon's, so it's easier to implement and faster to execute. Chip-wide register renaming like Athlon's adds gobs of multiplexers, adders, and routing, one of the big drawbacks of a massively out-of-order machine. On a smaller scale, ARM used this trick with its ill-fated Piccolo DSP coprocessor. At the high end, Cydrome also used this technique, a favorite feature that Cydrome alumnus and Itanium team member Bob Rau apparently brought with him. 

So IA-64 has two levels of indirection for its own registers: the logical-to-virtual mapping of the frames and the virtual-to-physical mapping of the rotation. All this means that programs usually aren't accessing the physical registers they think they are, but that's nothing new to high-end microprocessors. Arcane as it seems, this method still uses less hardware trickery than the full register renaming of Athlon, Pentium III, or P4. 
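The remapping idea behind rotation can be sketched in Python (a toy model with invented names, not the actual Itanium mechanism; only the rotating region above the 32 fixed registers is shifted):

```python
class RotatingRegisterFile:
    """Toy model: 32 fixed global registers plus a rotating region,
    loosely following the scheme described above."""

    def __init__(self, total=128, fixed=32):
        self.regs = [0] * total
        self.fixed = fixed
        self.rotation = 0  # advanced once per loop iteration

    def physical(self, logical):
        if logical < self.fixed:
            return logical  # the first 32 registers never rotate
        span = len(self.regs) - self.fixed
        return self.fixed + (logical - self.fixed + self.rotation) % span

    def rotate(self, by=1):
        self.rotation = (self.rotation + by) % (len(self.regs) - self.fixed)

rf = RotatingRegisterFile()
rf.regs[rf.physical(32)] = 111  # iteration 0 writes logical "r32"
rf.rotate()
rf.regs[rf.physical(32)] = 222  # iteration 1's "r32" lands elsewhere
assert rf.regs[32] == 111 and rf.regs[33] == 222
```

Two loop iterations both write "r32", yet they land in different physical registers, which is exactly what lets overlapping iterations be in flight at once without stepping on each other.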

AMD's 64-bit Platform 

To access an area in the computer's physical memory (RAM) to store or retrieve data, the processor needs the address of that location: an integer that identifies one byte of memory storage.

Suddenly, having 64-bit registers makes sense: while a 32-bit processor can access up to about 4.3 billion memory addresses (2^32) for a total of about 4 GB of physical memory, a 64-bit processor could conceivably access 2^64 addresses, about 16 exabytes of physical memory. This is the one area that clearly shows why 64-bit processors are the future of computing, as demanding applications such as databases have long been pressing against the 4 GB memory ceiling.
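The arithmetic behind those limits is easy to check (a quick Python sketch; each address names one byte):

```python
# Address-space arithmetic for 32-bit versus 64-bit processors.
addresses_32 = 2 ** 32
addresses_64 = 2 ** 64

GiB = 2 ** 30  # gibibyte
EiB = 2 ** 60  # exbibyte

print(addresses_32 // GiB)  # 4  -> the familiar 4 GB ceiling
print(addresses_64 // EiB)  # 16 -> 16 exabytes, ~18.4 * 10**18 bytes
```

(In practice, shipping 64-bit chips wire up fewer physical address pins than this theoretical maximum, but the ceiling moves far beyond anything a 32-bit machine can reach.)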

If you are a business with a database of a terabyte or more of information, 64-bit processors look pretty good right now. Formerly known as x86-64, the AMD64 architecture is AMD's way of implementing 64-bit processing.

