Abstract
This paper presents a study of the impact of reducing the vector register length in an out-of-order vector architecture. In traditional in-order vector architectures, long vector registers have typically been the norm. We start by presenting data showing that, even for highly vectorizable codes, only a small fraction of the elements of a long vector register are actually used. We also show that reducing the register size in a traditional vector architecture, in an attempt to reduce hardware cost and maximize register utilization, results in severe performance degradation. However, when we combine out-of-order execution with short registers, our simulations show that the performance penalty can be made very small. Moreover, this new architecture tolerates memory latency much better than a traditional machine and uses the storage space in each register more efficiently. We present results for a selection of the Specfp92 and Perfect Club codes that show speedups of the out-of-order machine over the traditional machine in the range 1.1 to 1.6. Halving the register size (from 16Kb in the out-of-order machine down to 8Kb) yields speedups around 1.3 and as high as 1.6. Even when the register length is reduced to 1/4 of the original size, speedups are still around 1.2, and with a register length of 16 elements (1/8 of the original) most programs perform very close to the traditional in-order vector machine.
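The register-utilization argument can be illustrated with a small sketch. It is not taken from the paper: it assumes a loop that is strip-mined into vector operations of at most `register_length` elements, and the function name and the trip count of 20 are hypothetical. The register lengths 128, 64, 32 and 16 follow from the abstract's statement that 16 elements is 1/8 of the original length.

```python
from math import ceil


def register_utilization(trip_count: int, register_length: int) -> float:
    """Fraction of register element slots that hold useful data when a loop
    of `trip_count` iterations is strip-mined into vector operations of at
    most `register_length` elements each (an illustrative model only)."""
    if trip_count <= 0 or register_length <= 0:
        raise ValueError("trip_count and register_length must be positive")
    strips = ceil(trip_count / register_length)  # vector operations needed
    return trip_count / (strips * register_length)


if __name__ == "__main__":
    # Hypothetical short loop of 20 iterations at each register length.
    for length in (128, 64, 32, 16):
        print(length, register_utilization(20, length))
```

Under this simplified model, a short loop leaves most of a 128-element register empty, while shorter registers waste far fewer element slots. The model says nothing about performance, which the paper evaluates by simulation.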
Original language | English |
---|---|
Pages | 37-44 |
Number of pages | 8 |
State | Published - 1998 |
Externally published | Yes |
Event | Proceedings of the 1998 International Conference on Supercomputing - Melbourne, Australia. Duration: 13 Jul 1998 → 17 Jul 1998 |
Conference
Conference | Proceedings of the 1998 International Conference on Supercomputing |
---|---|
City | Melbourne, Australia |
Period | 13/07/98 → 17/07/98 |