- Shopping Bag ( 0 items )
Many textbooks exist which describe the principles of the finite element method of analysis and the wide scope of its applications to the solution of practical engineering problems. Usually, little attention is devoted to the construction of the computer programs by which the numerical results are actually produced. It is presumed that readers have access to pre-written programs (perhaps to rather complicated "packages") or can write their own. However, the gulf between understanding in principle what to do, and actually doing it, can still be large for those without years of experience in this field.
The present book bridges this gulf. Its intention is to help readers assemble their own computer programs to solve particular engineering problems by using a "building block" strategy specifically designed for computations via the finite element technique. At the heart of what will be described is not a "program" or a set of programs but rather a collection (library) of procedures or subroutines which perform certain functions analogous to the standard functions (SIN, SQRT, ABS, etc.) provided in permanent library form in all useful scientific computer languages.Because of the matrix structure of finite element formulations, most of the building block routines are concerned with manipulation of matrices.
The building blocks are then assembled in different patterns to make test programs for solving a variety of problems in engineering and science. The intention is that one of these test programs then serves as a platform from which new applications programs are developed by interested users.
The aim of the present book is to teach the reader to write intelligible programs and to use them. Both serial and parallel computing environments are addressed and the building block routines (numbering over 70) and all test programs (numbering over 50) have been verified on a wide range of computers. Efficiency is considered.
The chosen programming language is the latest dialect of FORTRAN, called Fortran 95. Later in this Chapter, a fairly full description of the features of Fortran 95 which influence the programming of the finite element method will be given. At present, all that need be said is that Fortran 95 represents a very radical improvement compared with the previous standard, FORTRAN 77 (which was used in earlier editions of this book), and that Fortran remains, overwhelmingly, the most popular language for writing large engineering and scientific programs. For parallel environments MPI has been used, although the programming strategy has been tested successfully in OpenMP as well.
In principle, any computing machine capable of compiling and running Fortran programs can execute the finite element analyses described in this book. In practice, hardware will range from personal computers for more modest analyses and teaching purposes to "super" computers, usually with parallel processing capabilities, for very large (especially nonlinear 3D) analyses. It is a powerful feature of the programming strategy proposed that the same software will run on all machine ranges. The special features of vector and parallel processors are described later (see Sections 1.4 and 1.5).
The user's choice of hardware is a matter of accessibility and of cost. Thus a job taking five minutes on one computer may take one hour on another. Which hardware is "better" clearly depends on individual circumstances. The main advice that can be tendered is against using hardware that is too weak for the task; that is the user is advised not to operate at the extremes of the hardware's capability. If this is done turn round times become too long to be of value in any design cycle. For example, in "virtual prototyping" implementations, execution time has currently to be of the order of 0.1 s to enable refresh graphics to be carried out.
1.3 Memory management
In the programs in this book it will be assumed that sufficient main random access memory (RAM) is available for the storage of data and the execution of programs. However, the arrays processed in finite element calculations might be of size, say, 100,000 by 1000. Thus a computer would need to have a main memory of [10.sup.8] words to hold this information, and while some such computers exist, they are still comparatively rare. A more typical memory size is still of the order of [10.sup.7] words.
One strategy to get round this problem is for the programmer to write "out-of-memory" routines which arrange for the processing of chunks of arrays in memory and the transfer of the appropriate chunks to and from back-up storage.
Alternatively store management is removed from the user's control and given to the system hardware and software. The programmer sees only a single level of memory of very large capacity and information is moved from secondary memory to main memory and out again by the supervisor or executive program which schedules the flow of work through the machine. This concept, namely of a very large "virtual" memory, was first introduced on the ICL ATLAS in 1961, and is now almost universal.
Clearly it is necessary for the system to be able to translate the virtual address of variables into a real address in memory. This translation usually involves a complicated bit-pattern matching called paging. The virtual store is split into segments or pages of fixed or variable size referenced by page tables, and the supervisor program tries to "learn" from the way in which the user accesses data in order to manage the store in a predictive way. However, memory management can never be totally removed from the user's control. It must always be assumed that the programmer is acting in a reasonably logical manner, accessing array elements in sequence (by rows or columns as organised by the compiler and the language). If the user accesses a virtual memory of [10.sup.8] words in a random fashion the paging requests will ensure that very little execution of the program can take place (see e.g. Willé, 1995).
In the immediate future, "large" finite element analyses, say involving more than 1 million unknowns, are likely to be processed by the vector and parallel processing hardware described in the next sections. When using such hardware there is usually a considerable time penalty if the programmer interrupts the flow of the computation to perform out-of-memory transfers or if automatic paging occurs. Therefore, in Chapter 3 of this book, special strategies are described whereby large analyses can still be processed "in-memory". However, as problem sizes increase, there is always the risk that main memory, or fast subsidiary memory ("cache") will be exceeded with consequent deterioration of performance on most machine architectures.
1.4 Vector processors
Early digital computers performed calculations "serially", that is, if a thousand operations were to be carried out, the second could not be initiated until the first had been completed, and so on. When operations are being carried out on arrays of numbers, however, it is perfectly possible to imagine that computations in which the result of an operation on two array elements has no effect on an operation on another two array elements, can be carried out simultaneously. The hardware feature by means of which this is realised in a computer is called a pipeline, and in general, all modern computers use this feature to a greater or lesser degree. Computers which consist of specialised hardware for pipelining are called vector computers. The "pipelines" are of limited length and so for operations to be carried out simultaneously it must be arranged that the relevant operands are actually in the pipeline at the right time. Furthermore, the condition that one operation does not depend on another must be respected. These two requirements (amongst others) mean that some care must be taken in writing programs so that best use is made of the vector processing capacity of many machines. It is moreover an interesting side effect that programs well structured for vector machines will tend to run better on any machine because information tends to be in the right place at the right time (e.g. in a special cache memory) and modern so-called scalar computers tend to contain some vector-type hardware. In this book, beginning at Chapter 5, programs which "vectorise" well will be illustrated.
True vector hardware tends to be expensive and at the time of writing a much more common way of increasing processing speed is to execute programs in parallel on many processors. The motivation here is that the individual processors are then "standard" and therefore cheap. However for really intensive computations, it is likely that an amalgamation of vector and parallel hardware is ideal.
1.5 Parallel processors
In this concept (of which there are many variants) there are several physically distinct processors (e.g. a few expensive ones or a lot of cheaper ones). Programs and/or data can reside on different processors which have to communicate with one another.
There are two foreseeable ways in which this communication can be organised (rather like memory management which was described earlier). Either the programmer takes control of the communication process, using a programming feature called message passing, or it is done automatically, without user control. The second strategy is of course appealing and has led to the development of "High Performance Fortran" or HPF (e.g. see Koelbel et al., 1995) which has been designed as an extension to Fortran 95. "Directives", which are treated as comments by non-HPF compilers, are inserted into the Fortran 95 programs and allow data to be mapped onto parallel processors together with the specification of the operations on such data which can be carried out in parallel. The attractive feature of this strategy is that programs are "portable", that is they can be easily transferred from computer to computer. One would also anticipate that manufacturers could produce compilers which made best use of their specific type of hardware. At the time of writing, the first implementations of HPF are just being reported.
An alternative to HPF, involving roughly the same level of user intervention, can be used on specific hardware. Manufacturers provide "directives" which can be inserted by users in programs and implemented by the compiler to parallelise sections of the code (usually associated with DO-loops). Smith (2000) shows that this approach can be quite effective for up to a modest number of parallel processors (say 10). However such programs are not portable to other machines.
A further alternative is to use OpenMP, a portable set of directives but limited to a class of parallel machines with so-called "shared memory". Although the codes in this book have been rather successfully adapted for parallel processing using OpenMP (Pettipher and Smith, 1997) the most popular strategy applicable equally to "shared memory" and "distributed memory" systems is described in Chapter 12. The programs therein have been run successfully on clusters of PCs communicating via Ethernet and on shared and distributed memory supercomputers with their much more expensive communication systems. This strategy of message passing under programmer control is realised by MPI ("message passing interface") which is a de facto standard thereby ensuring portability (MPI Web reference, 2003).
1.6 BLAS libraries
As was mentioned earlier, programs implementing the Finite Element Method make intensive use of matrix or array structures. For example a study of any of the programs in the succeeding chapters will reveal repeated use of the subroutine MATMUL described in Section 1.9. While one might hope that the writers of compilers would implement calls to MATMUL efficiently, this turns out in practice not always to be so.
Particularly on supercomputers, an alternative is to use "BLAS" or Basic Linear Algebra Subroutine Libraries (e.g. Dongarra and Walker, 1995). There are three "levels" of BLAS subroutines involving vector-vector, matrix-vector and matrix-matrix operations respectively. To improve efficiency in large calculations, it is always worth experimenting with BLAS routines if available. The calling sequence is rather cumbersome, for example the Fortran:
has to be replaced by:
CALL DGEMV('n', ntot, ntot, 1.0, km, ntot, pmul, 1, 0.0, utemp, 1)
in a typical example in Chapter 12. However, very significant gains in processing speed can be achieved; a factor of 3 times speedup is not uncommon.
1.7 MPI libraries
MPI (MPI Web reference, 2003) is itself essentially a library of routines for communication callable from Fortran. For example,
CALL MPI_BCAST(no_f, fixed_freedoms, MPI_INTEGER, npes-1, MPI_COMM_WORLD, ier)
"broadcasts" the array no_f of size fixed_freedoms to the remaining npes-1 processors on a parallel system. In the parallel programs in this book (Chapter 12) these MPI routines are mainly hidden from the user and contained within routines collected in library modules such as gather_scatter. In this way, the parallel programs can be seen to be readily derived from their serial counterparts. The detail of the new MPI library is left to Chapter 12.
1.8 Applications software
Since all computers have different hardware (instruction formats, vector capability, etc.) and different store management strategies, programs which would make the most effective use of these varying facilities would of course differ in structure from machine to machine. However, for excellent reasons of program portability and programmer training, engineering computations on all machines are usually programmed in "high level" languages which are intended to be machine-independent. The high level language is translated into the machine order code by a program called a compiler. Fortran is by far the most widely used language for programming engineering and scientific calculations and in this section the principal features of the latest standard, called Fortran 95, will be described with particular reference to features of the language which are useful in finite element computations.
Figure 1.1 shows a typical simple program written in Fortran 95 (Smith, 1995). It concerns an opinion poll survey and serves to illustrate the basic structure of the language for those used to its predecessor, FORTRAN 77, or to other languages.
It can be seen that programs are written in "free source" form. That is, statements can be arranged on the page or screen at the user's discretion. Other features to note are:
Upper and lower case characters may be mixed at will. In the present book, upper case is used to signify intrinsic routines and "key words" of Fortran 95.
Excerpted from Programming the Finite Element Method by Ian M. Smith D. Vaughan Griffiths Copyright © 2004 by John Wiley & Sons, Ltd . Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
|1||Preliminaries : computer strategies||1|
|2||Spatial discretisation by finite elements||21|
|3||Programming finite element computations||55|
|4||Static equilibrium of structures||109|
|5||Static equilibrium of linear elastic solids||165|
|7||Steady state flow||319|
|8||Transient problems : first order (uncoupled)||357|
|12||Parallel processing of finite element analyses||509|
|A||Equivalent nodal loads||577|
|B||Shape functions and element node numbering||583|
|C||Plastic stress-strain matrices and plastic potential derivatives||591|
|D||Main library subroutines||595|
|E||Geom library subroutines||605|
|F||Parallel library subroutines||609|