IA-64 and Elementary Functions: Speed and Precision

by Peter Markstein

Paperback

$64.95

Product Details

ISBN-13: 9780130183484
Publisher: Pearson Education
Publication date: 05/01/2000
Series: Hewlett-Packard Professional Books Series
Pages: 298
Product dimensions: 7.22(w) x 9.51(h) x 1.05(d)

Read an Excerpt

Preface

This book puts under one cover the details of an elementary function library, covering the underlying mathematics as well as providing implementation details, directed toward IA-64 architecture. Some of the material is difficult to find elsewhere, and some of it is scattered over a variety of conference proceedings and journals. The material should appeal to readers with interest in elementary functions, as well as readers interested in using IA-64 effectively. Part I discusses IA-64 architecture in detail, including motivation for the architecture. The description of IA-64 is illustrated with extended examples chosen from numerical calculation. Part II shows how to exploit IA-64 architecture in the domain of elementary functions. While the text emphasizes accurate computation, it also points to shortcuts in division and square root that may be of interest in graphics and other applications which heavily use short floating point types. Most of the mathematical arguments are relatively elementary and should be readable by anyone with an elementary calculus background.

This work is an outgrowth of the Precision Architecture Wide Word (PAWW) project at Hewlett-Packard Laboratories. The architecture drew from prior experiences with very long instruction word architectures, particularly those at Cydrome and Multiflow, as well as PA-RISC (Precision Architecture - Reduced Instruction Set Computer). By the time I joined the project in 1992, much of the architecture had already been solidified. My architectural contributions mainly dealt with floating point arithmetic, and I was also active in producing a prototype compiler for Wide Word, which allowed many of the architectural ideas to be tested. PAWW later developed into IA-64.

One of my colleagues, Clemens Roothaan, had produced a library of elementary function routines which exploited the software pipelining capabilities of the architecture. He was able to demonstrate routines which ran at speeds associated with vector processors, but which did not sacrifice numerical accuracy for performance. Over time, some of these algorithms were strengthened to run faster, or produce even higher precision. We refer to the software pipelined implementation as the vector library for the elementary functions.

My plan was to use the same algorithmic ideas to construct a very robust scalar elementary function library. My hope was that the fundamental algorithms could be implemented in the C language in such a manner that they would yield closed subroutines, but would also be amenable to in-lining, after which they could then be software pipelined by the compiler. Eventually, Roothaan's handcrafted functions could be matched by the compiler, which could also customize an elementary function to the particular settings where it was invoked. This notion led to the in-line assembly capability, which enables much finer control to be exercised over floating point computation than is normally present in a compiler.

Eventually, I undertook to document these algorithms, indicating clearly the methods we had used, as well as the error characteristics of our algorithms. A fascinating by-product developed almost immediately: the act of writing clarified some of the fundamental processes that we were employing. New, faster algorithms were suggested by the text, and they replaced some of our old techniques. This was especially true in the operations of division and square root, for which almost none of our 1992 algorithms survive in this text. Logarithm also was markedly improved, and, as a by-product, the precision of the power routine was improved. The trigonometric routines were enhanced with "accurate π" argument reduction, and an improved implementation of the sin and cos addition formulas which preserve additional precision.

This book describes a work in progress. Even now, new algorithms have come to my attention from colleagues at Intel, and, as the greater programming community comes to use IA-64, I expect new, innovative developments to blossom.

Peter Markstein
February 2000
Woodside, CA

Table of Contents

Foreword
Preface
Introduction
Part I: IA-64 Architecture
  New Architecture Objectives
    VLIW
    Memory Enhancements
      Data Access
      Instruction Stream Access
    Software Pipelining
    Floating Point Enhancements
    Summary
  IA-64 Instructions and Registers
    Instructions
    Register Sets
      General Purpose Registers
      Floating Point Registers
      Predicate Registers
      Branch Registers
    Accessing Memory
    Assembly Language
    Problems
  Increasing Instruction Level Parallelism
    Branching
      Predication
      Branch Instructions
      Software Pipelining
    Speculation
      Control Speculation
      Data Speculation
    Problems
  Floating Point Architecture
    Floating Point Status Register
    Precision
    Fused Multiply-Add
    Division and Square Root Assists
    Floating Comparisons
    Communication between Floating Point and General Purpose Registers
    Fixed Point Multiplication
    SIMD Arithmetic
    Problems
  Programming For IA-64
    Compiler Options
    Pragmas
    Floating Point Data Types
    In-Line Assembly
    The fenv.h Header
    Extended Examples
      In-Lining a Library Function
      Unrolling a Loop
      Software Pipelined while-Loop
    Quad Precision
      Basic Operations
      Polynomial Evaluation
    Problems
Part II: Computation of Elementary Functions
  Mathematical Preliminaries
    Floating Point
    Approximation and Error Analysis
    The Exclusion Theorem
    Ulps
    Problems
  Approximations of Functions
    Taylor Series
    Lagrangian Interpolation
    Chebychev Approximation
      Lanczos' Method
    Remez Approximation
    Practical Considerations
    Function Evaluation
    Table Construction
    Problems
  Division
    Approximations for the Reciprocal
    Computing the Quotient
      Single Precision Division
      Double Precision Division
    Division Using Only Final Precision Results
      Single Precision Division
      Double-Extended Precision Division
      Quad Precision Division
    Fast Variants of Division
      Round-to-Nearest Algorithms
    Remainder
    Integer Division
      32-Bit Division
      64-Bit Division
    An Implementation of Division
    Problems
  Square Root
    Approximations
      Square Root Approximation
      Reciprocal Square Root Approximation
    Rounding the Square Root
    Computing the Square Root
      Single Precision Square Root
      Double Precision Square Root
      Double-Extended Precision Square Root
      Quad Precision Square Root
    Calculating the Reciprocal Square Root
    An Implementation of Square Root
    Problems
  Exponential Functions
    Definitions and Formulas
    Argument Reduction
    Error Containment
    Quad Precision
    Computing the Exponential
    The Function expm1
    Problems
  Logarithmic Functions
    General Relations
    Argument Reduction
    Error Analysis
      Comparison of Methods
    The Function log1p
    Computing the Logarithm
    Problems
  The Power Function
    Definition
    Single Precision
    Double Precision
    Double-Extended Precision
    Quad Precision
    Computing the Power Function
    Problems
  Trigonometric Functions
    Formulas and Identities
    Argument Reduction
      Radian Arguments
      Degree Arguments
    Error Analysis
    Computing the Trigonometric Functions
    Problems
  Inverse Sine and Cosine
    Definitions and Formulas
    Argument Reduction
    Error Analysis
    Computing the arcsin
    Problems
  Inverse Tangent Functions
    Definitions and Formulas
    Argument Reduction
      arctan
      arctan 2
    Error Analysis
    Computing the arctan
    Problems
  Hyperbolic Functions
    Definitions and Formulas
    Argument Reduction
    Error Analysis
    Computing the Hyperbolic Functions
    Problems
  Inverse Hyperbolic Functions
    Definitions and Formulas
    arcsinh
    arccosh
    arctanh
    Problems
  Odds and Ends
    Correctly Rounded Functions
      A Probabilistic Argument
      A Strategy for Correct Rounding
    Monotonicity
    Alternative Algorithms
    Testing
    New Architectural Directions
      Elementary Function Assists
      Load Immediate - Floating Point
      Quad Precision Assists
    Problems
In-Line Assembly
Solutions to Problems
Bibliography
Subject Index
