
Reviews in Computational Chemistry, available in Hardcover

- ISBN-10: 156081957X
- ISBN-13: 9781560819578
- Pub. Date: 11/01/1996
- Publisher: Wiley, John & Sons, Incorporated

# Reviews in Computational Chemistry


## Overview

This volume, like those prior to it, features chapters by experts in various fields of computational chemistry. Volume 23 covers linear scaling methods for quantum chemistry, variational transition state theory, coarse-grained modeling of polymers, support vector machines, conical intersections, analysis of information content using Shannon entropy, and historical insights into how computing evolved in the pharmaceutical industry.

## Product Details

| ISBN-13: | 9781560819578 |
|---|---|
| Publisher: | Wiley, John & Sons, Incorporated |
| Publication date: | 11/01/1996 |
| Product dimensions: | 6.36(w) x 9.28(h) x 1.06(d) |

## About the Author

Kenny B. Lipkowitz and Raima Larter are Professors of Chemistry at Indiana University - Purdue University at Indianapolis. Tom Cundari is Professor of Chemistry at the University of Memphis.

## Read an Excerpt

Reviews in Computational Chemistry, Volume 19

John Wiley & Sons

Copyright © 2003 Kenny B. Lipkowitz, Raima Larter, Thomas R. Cundari

All rights reserved.

ISBN: 0-471-23585-7

Chapter One

**INTRODUCTION**

This chapter is written for the reader who would like to learn how Monte Carlo methods are used to calculate thermodynamic properties of systems at the atomic level, or to determine which advanced Monte Carlo methods might work best in their particular application. There are a number of excellent books and review articles on Monte Carlo methods, which are generally focused on condensed phases, biomolecules, or electronic structure theory. The purpose of this chapter is to explain and illustrate some of the special techniques that we and our colleagues have found to be particularly well suited for simulations of nanodimensional atomic and molecular clusters. We want to help scientists and engineers who are doing their first work in this area to get off on the right foot, and also provide a pedagogical chapter for those who are doing experimental work. By including examples of simulations of some simple, yet representative systems, we provide the reader with some data for direct comparison when writing their own code from scratch.

Although a number of Monte Carlo methods in current use will be reviewed, this chapter is not meant to be comprehensive in scope. Monte Carlo is a remarkably flexible class of numerical methods. So many versions of the basic algorithms have arisen that we believe a comprehensive review would be of limited pedagogical value. Instead, we intend to provide our readers with enough information and background to allow them to navigate successfully through the many different Monte Carlo techniques in the literature. This should help our readers use existing Monte Carlo codes knowledgeably, adapt existing codes to their own purposes, or even write their own programs. We also provide a few general recommendations and guidelines for those who are just getting started with Monte Carlo methods in teaching or in research.

This chapter has been written with the goal of describing methods that are generally useful. However, many of our discussions focus on applications to atomic and molecular clusters (nanodimensional aggregates of a finite number of atoms and/or molecules). We do this for two reasons:

1. A great deal of our own research has focused on such systems, particularly the phase transitions and other structural transformations induced by changes in a cluster's temperature and size, keeping an eye on how various properties approach their bulk limits. The precise determination of thermodynamic properties (such as the heat capacity) of a cluster type as a function of temperature and size presents challenges that must be addressed when using Monte Carlo methods to study virtually any system. For example, analogous structural transitions can also occur in phenomena as disparate as the denaturation of proteins. The modeling of these transitions presents similar computational challenges to those encountered in cluster studies.

2. Although cluster systems can present some unique challenges, their study is unencumbered by many of the technical issues regarding periodic boundary conditions that arise when solids, liquids, surface adsorbates, and solvated biomolecules and polymers are studied. These issues are addressed well elsewhere, and can be thoroughly appreciated and mastered once a general background in Monte Carlo methods is obtained from this chapter.

It should be noted that "Monte Carlo" is a term used in many fields of science, engineering, statistics, and mathematics to mean entirely different things. The one (and only) thing that all Monte Carlo methods have in common is that they all use random numbers to help calculate something. What we mean by "Monte Carlo" in this chapter is the use of random-walk processes to draw samples from a desired probability function, thereby allowing one to calculate integrals of the form $\int dq\, f(q)\, \rho(q)$. The quantity $\rho(q)$ is a normalized probability density function that spans the space of a many-dimensional variable $q$, and $f(q)$ is a function whose average is of thermodynamic importance and interest. This integral, as well as all other integrals in this chapter, should be understood to be a definite integral that spans the entire domain of $q$. Finally, we note that the inclusion of quantum effects through path-integral Monte Carlo methods is not discussed in this chapter. The reader interested in including quantum effects in Monte Carlo thermodynamic calculations is referred elsewhere.
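As a concrete illustration of an integral of this form, the toy example below (our own sketch, not part of the chapter) estimates $\langle f \rangle = \int dq\, f(q)\, \rho(q)$ for $f(q) = q^2$ with $\rho(q)$ a standard normal density, for which the exact average is 1:

```python
import random

random.seed(42)  # seed the generator once, at the start

# Draw q ~ ρ(q), here a standard normal density, and average f(q) = q².
# The exact value of <f> = ∫ dq q² ρ(q) is the variance of ρ, i.e. 1.0.
n_samples = 100_000
f_avg = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n_samples)) / n_samples

print(f"<f> ≈ {f_avg:.3f}")
```

With 100,000 samples the statistical error of the estimate is on the order of $\sqrt{2/n} \approx 0.005$, so the printed value should be close to 1.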

**METROPOLIS MONTE CARLO**

Monte Carlo simulations are widely used in the fields of chemistry, biology, physics, and engineering in order to determine the structural and thermodynamic properties of complex systems at the atomic level. Thermodynamic averages of molecular properties can be determined from Monte Carlo methods, as can minimum-energy structures. Let $\langle f \rangle$ represent the average value of some coordinate-dependent property $f(x)$, with $x$ representing the $3N$ Cartesian coordinates needed to locate all of the $N$ atoms. In the canonical ensemble (fixed $N$, $V$, and $T$, with $V$ the volume and $T$ the absolute temperature), averages of molecular properties are given by an average of $f(x)$ over the Boltzmann distribution

$$\langle f \rangle = \frac{\int dx\, f(x)\, e^{-\beta U(x)}}{\int dx\, e^{-\beta U(x)}} \qquad [1]$$

where $U(x)$ is the potential energy of the system, $\beta = 1/k_B T$, and $k_B$ is the Boltzmann constant. If one can compute the thermodynamic average of $f(x)$, it is then possible to calculate various thermodynamic properties. In the canonical ensemble it is most common to calculate $E$, the internal energy, and $C_v$, the constant-volume heat capacity (although other properties can be calculated as well). For example, if we average $U(x)$ over all possible configurations according to Eq. [1], then $E$ and $C_v$ are given by

$$E = \frac{3Nk_B T}{2} + \langle U \rangle \qquad [2]$$

$$C_v = \frac{3Nk_B}{2} + \frac{\langle U^2 \rangle - \langle U \rangle^2}{k_B T^2} \qquad [3]$$

The first term in each equation represents the contribution of kinetic energy, which is analytically integrable. In the harmonic (low-temperature) limit, $E$ given by Eq. [2] will be a linear function of temperature and $C_v$ from Eq. [3] will be constant, in accordance with the Equipartition Theorem. For a small cluster of, say, 6 atoms, the integrals implicit in the calculation of Eqs. [1] and [2] are already of such high dimension that they cannot be effectively computed using Simpson's rule or other basic quadrature methods. For larger clusters, liquids, polymers, or biological molecules the dimensionality is obviously much higher, and one typically resorts to either Monte Carlo, molecular dynamics, or other related algorithms.
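To make Eqs. [1]–[3] concrete, the sketch below (our illustration, not the authors' code) draws Boltzmann-distributed potential energies for a $3N$-dimensional harmonic well, where direct Gaussian sampling stands in for a real Metropolis walk, and forms $E$ and $C_v$ from the sample moments. In reduced units ($k_B = 1$) the classical harmonic limits $E = 3Nk_BT$ and $C_v = 3Nk_B$ serve as a check:

```python
import random

random.seed(1)
kB, T, N = 1.0, 0.5, 6             # reduced units; a small 6-atom "cluster"

def sample_U():
    """One Boltzmann-distributed potential energy for a 3N-dimensional
    harmonic well U(x) = sum(x_i**2)/2: each coordinate is Gaussian with
    variance kB*T (a stand-in for a real Metropolis-sampled configuration)."""
    return sum(random.gauss(0.0, (kB * T) ** 0.5) ** 2 for _ in range(3 * N)) / 2.0

M = 50_000
U_samples = [sample_U() for _ in range(M)]
U_avg = sum(U_samples) / M
U2_avg = sum(u * u for u in U_samples) / M

E = 1.5 * N * kB * T + U_avg                               # Eq. [2]
Cv = 1.5 * N * kB + (U2_avg - U_avg ** 2) / (kB * T ** 2)  # Eq. [3]

print(f"E  ≈ {E:.2f}  (harmonic limit 3NkT = {3 * N * kB * T:.2f})")
print(f"Cv ≈ {Cv:.2f}  (harmonic limit 3Nk  = {3 * N * kB:.2f})")
```

Note that $C_v$ depends on the *fluctuation* of $U$, so it converges more slowly than $E$; this is one of the practical challenges the chapter alludes to for heat-capacity calculations.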

To calculate the desired thermodynamic averages, it is necessary to have some method available for computation of the potential energy, either explicitly (in the form of a function representing the interaction potential, as in molecular mechanics) or implicitly (in the form of direct quantum-mechanical calculations). Throughout this chapter we shall assume that $U$ is known or can be computed as needed, although this computation is typically the most computationally expensive part of the procedure (because $U$ may need to be computed many, many times). For this reason, all possible measures should be taken to assure the maximum efficiency of the method used in the computation of $U$.

Also, it should be noted that constraining potentials (which keep the cluster components from straying too far from a cluster's center of mass) are sometimes used. At finite temperature, clusters have finite vapor pressures, and particular cluster sizes are typically unstable to evaporation. Introducing a constraining potential enables one to define clusters of desired sizes. Because the constraining potential is artificial, the dependence of calculated thermodynamic properties on the form and the radius of the constraining potential must be investigated on a case-by-case basis. Rather than diverting the discussion from our main focus (Monte Carlo methods), we refer the interested reader elsewhere for more details and references on the use of constraining potentials.
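One common choice in the cluster literature is a steep power-law wall centered on the center of mass. The sketch below is our own illustration of the idea; the function name, the radius `R_c`, and the 20th-power form are assumptions made for the example, not a prescription from the chapter:

```python
def constraining_energy(coords, R_c, power=20):
    """Steep power-law wall U_c = sum((|r_i - r_cm| / R_c)**power) that keeps
    atoms within roughly R_c of the cluster's center of mass.  The functional
    form, exponent, and R_c are illustrative choices; their effect on computed
    properties must be checked case by case."""
    n = len(coords)
    cm = [sum(atom[k] for atom in coords) / n for k in range(3)]
    total = 0.0
    for atom in coords:
        r = sum((atom[k] - cm[k]) ** 2 for k in range(3)) ** 0.5
        total += (r / R_c) ** power
    return total

# Atoms well inside the wall feel almost nothing; an atom far outside
# (an "evaporated" atom) pays a steep energetic penalty.
inside = constraining_energy([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], R_c=5.0)
outside = constraining_energy([[0.0, 0.0, 0.0], [12.0, 0.0, 0.0]], R_c=5.0)
print(inside, outside)
```

Because the wall is nearly flat inside $R_c$ and rises very steeply outside it, the constrained averages are insensitive to $R_c$ for well-bound clusters, which is exactly the case-by-case check the text calls for.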

**Random-Number Generation: A Few Notes**

Because generalized Metropolis Monte Carlo methods are based on "random" sampling from probability distribution functions, it is necessary to use a high-quality random-number generator algorithm to obtain reliable results. A review of such methods is beyond the scope of this chapter, but a few general considerations merit discussion.

Random-number generators do not actually produce random numbers. Rather, they use an integer "seed" to initialize a particular "pseudorandom" sequence of real numbers that, taken as a group, have properties that leave them nearly indistinguishable from truly random numbers. These are conventionally floating-point numbers, distributed uniformly on the interval (0,1). If there is a correlation between seeds, a correlation may be introduced between the pseudorandom numbers produced by a particular generator. Thus, the generator should ideally be initialized only once (at the beginning of the random walk), and not re-initialized during the course of the walk. The seed should be supplied either by the user or generated arbitrarily by the program using, say, the number of seconds since midnight (or some other arcane formula). One should be cautious about using the "built-in" random-number generator functions that come with a compiler for Monte Carlo integration work because some of them are known to be of very poor quality. The reader should always be sure to consult the appropriate literature and obtain (and test) a high-quality random-number generator before attempting to write and debug a Monte Carlo program.
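In Python, for instance, these recommendations amount to creating one generator object and seeding it exactly once (an example of ours; the standard library's Mersenne Twister here is merely a stand-in for whatever high-quality, tested generator you adopt):

```python
import random

def stream(seed, n=5):
    """Return the first n uniform (0,1) deviates from a generator that is
    seeded exactly once, at construction, and never re-seeded."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# The same seed always reproduces the same pseudorandom sequence,
# which makes a run repeatable for debugging...
assert stream(12345) == stream(12345)
# ...while different seeds start different (ideally uncorrelated) streams.
assert stream(12345) != stream(54321)
print(stream(12345))
```

Keeping the seed as an explicit input (rather than re-seeding from the clock mid-run) is what lets you both reproduce a walk exactly and avoid the seed-to-seed correlations the text warns about.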

**The Generalized Metropolis Monte Carlo Algorithm**

The Metropolis Monte Carlo (MMC) algorithm is the single most widely used method for computing thermodynamic averages. It was originally developed by Metropolis et al. and used by them to simulate the freezing transition for a two-dimensional hard-sphere fluid. However, Monte Carlo methods can be used to estimate the values of multidimensional integrals in whatever context they may arise. Although Metropolis et al. did not present their algorithm as a general-utility method for numerical integration, it soon became apparent that it could be generalized and applied to a variety of situations. The core of the MMC algorithm is the way in which it draws samples from a desired probability distribution function. The basic strategies used in MMC can be generalized so as to apply to many kinds of probability functions and in combination with many kinds of sampling strategies. Some authors refer to the generalized MMC algorithm simply as "Metropolis sampling," while others have referred to it as the $M(RT)^2$ method in honor of the five authors of the original paper (Metropolis, the Rosenbluths, and the Tellers). We choose to call this the generalized Metropolis Monte Carlo (gMMC) method, and we will always use the term MMC to refer strictly to the combination of methods originally presented by Metropolis et al.

In the literature of numerical analysis, gMMC is classified as an importance sampling technique. Importance sampling methods generate configurations that are distributed according to a desired probability function rather than simply picking them at random from a uniform distribution. The probability function is chosen so as to obtain improved convergence of the properties of interest. gMMC is a special type of importance sampling method which asymptotically (i.e., in the limit that the number of configurations becomes large) generates states of a system according to the desired probability distribution. This probability function is usually (but not always) the actual probability distribution function for the physical system of interest. Nearly all statistical-mechanical applications of Monte Carlo techniques require the use of importance sampling, whether gMMC or another method is used (alternatively, "stratified sampling" is sometimes an effective approach).

gMMC is certainly the most widely used importance sampling method. In the gMMC algorithm successive configurations of the system are generated to build up a special kind of random walk called a *Markov chain*. The random walk visits successive configurations, where each configuration's location depends on the configuration immediately preceding it in the chain. The gMMC algorithm establishes how this can be done so as to asymptotically generate a distribution of configurations corresponding to the probability density function of interest, which we denote as $\rho(q)$.

We define $K(q_i \to q_j)$ to be the conditional probability that a configuration at $q_i$ will be brought to $q_j$ in the next step of the random walk. This conditional probability is sometimes called the "transition rate." The probability of moving from $q$ to $q'$ (where $q$ and $q'$ are arbitrarily chosen configurations somewhere in the available domain) is therefore given by $P(q \to q')$:

$$P(q \to q') = K(q \to q')\, \rho(q) \qquad [4]$$

For the system to evolve toward a unique limiting distribution, we must place a constraint on $P(q \to q')$. The gMMC algorithm achieves the desired limiting behavior by requiring that, on the average, a point is just as likely to move from $q$ to $q'$ as it is to move in the reverse direction, namely, that $P(q \to q') = P(q' \to q)$. This likelihood can be achieved only if the walk is ergodic (an ergodic walk eventually visits all configurations when started from any given configuration) and if it is aperiodic (a situation in which no single number of steps will generate a return to the initial configuration). This latter requirement is known as the "detailed balance" or the "microscopic reversibility" condition:

$$K(q \to q')\, \rho(q) = K(q' \to q)\, \rho(q') \qquad [5]$$

Satisfying the detailed balance condition ensures that the configurations generated by the gMMC algorithm will asymptotically be distributed according to $\rho(q)$.

The transition rate may be written as a product of a trial probability $\Pi$ and an acceptance probability $A$

$$K(q_i \to q_j) = \Pi(q_i \to q_j)\, A(q_i \to q_j) \qquad [6]$$

where $\Pi$ can be taken to be any normalized distribution that asymptotically spans the space of all possible configurations, and $A$ is constructed so that Eq. [5] is satisfied for a particular choice of $\Pi$. *Continues...*
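Putting Eqs. [4]–[6] together: for a *symmetric* trial distribution $\Pi$, the choice $A = \min[1, \rho(q')/\rho(q)] = \min[1, e^{-\beta \Delta U}]$ satisfies the detailed balance condition of Eq. [5]. The minimal sketch below (our own example, not the chapter's code) samples the Boltzmann density for a one-dimensional harmonic potential and checks $\langle q^2 \rangle$ against the exact value $k_B T = 1/\beta$:

```python
import math
import random

random.seed(7)
beta = 2.0                         # β = 1/(kB*T), so kB*T = 0.5

def U(q):
    return 0.5 * q * q             # 1D harmonic potential; exact <q²> = kB*T

q = 0.0
samples = []
for _ in range(200_000):
    q_trial = q + random.uniform(-1.0, 1.0)    # symmetric trial move Π
    # Metropolis acceptance A = min(1, ρ(q')/ρ(q)) = min(1, exp(-β ΔU));
    # with a symmetric Π this satisfies detailed balance, Eq. [5].
    if random.random() < min(1.0, math.exp(-beta * (U(q_trial) - U(q)))):
        q = q_trial
    samples.append(q)              # count the state whether or not it moved

avg_q2 = sum(s * s for s in samples) / len(samples)
print(f"<q^2> ≈ {avg_q2:.3f}  (exact {1.0 / beta:.3f})")
```

Note that the current configuration is counted again whenever a trial move is rejected; silently dropping rejected steps is a classic way to bias the sampled distribution.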

Excerpted from *Reviews in Computational Chemistry, Volume 19*

Copyright © 2003 by Kenny B. Lipkowitz, Raima Larter, Thomas R. Cundari.

Excerpted by permission.

All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.


## Table of Contents

**Protein Structure Classification** (Patrice Koehl) 1

- Introduction 1
- Classification and Biology 2
- The Biomolecular Revolution 2
- Basic Principles of Protein Structure 4
- Visualization 4
- Protein Building Blocks 5
- Protein Structure Hierarchy 5
- Three Types of Proteins 7
- Geometry of Globular Proteins 9
- Protein Domains 12
- Resources on Protein Structures 13
- Protein Structure Comparison 14
- Automatic Identification of Protein Structural Domains 14
- The Rigid-Body Transformation Problem 16
- Protein Structure Superposition 23
- cRMS: An Ambiguous Measure of Similarity 31
- Differential Geometry and Protein Structure Comparison 33
- Upcoming Challenges for Protein Structure Comparison 36
- Protein Structure Classification 37
- The Structure Classification of Proteins (SCOP) 40
- The CATH Classification 42
- The DALI Domain Dictionary (DDD) 43
- Comparing SCOP, CATH, and DDD 43
- Conclusions 45
- Acknowledgments 46
- Appendix 47
- References 48

**Comparative Protein Modeling** (Emilio Xavier Esposito, Dror Tobi, Jeffry D. Madura) 57

- Introduction 57
- Anatomy of a Comparative Model 60
- Searching for Related Sequences and Structures 61
- Expert Protein Analysis System (ExPASy) 62
- Blast and PSI-Blast 65
- Protein Data Bank (PDB) 68
- Sequence Alignment and Modeling System with Hidden Markov Models 70
- Threading 73
- Threader 78
- Example: Finding Related Sequences and 3-D Structures 80
- Sequence Alignment 84
- Preparing the Sequences 87
- Alignment Basics 90
- Similarity Matrices 91
- Clustal 95
- Tree-Based Consistency Objective Function for Alignment Evaluation (T-Coffee) 99
- Divide-and-Conquer Alignment (DCA) 100
- Example: Aligning Sequences 101
- Selecting Templates and Improving Alignments 104
- Selecting Templates 104
- Improving Sequence Alignments With Primary and Secondary Structure Analysis 107
- Example: Aligning the Target to the Selected Template 111
- Constructing Protein Models 111
- Satisfaction of Spatial Restraints 113
- Segment Match Modeling 115
- Multiple Template Method 118
- 3D-Jigsaw 119
- Overall Protein Model Construction Methods 121
- Example: Constructing a Protein Model 122
- Refinement of Protein Models 124
- Side-Chains with Rotamer Library (SCWRL) 125
- Energy Minimization 132
- Molecular Dynamics 133
- Molecular Dynamics with Simulated Annealing 135
- Evaluating Protein Models 138
- Procheck 138
- Verify3D 140
- Errat 141
- Protein Structure Analysis (ProSa) 142
- Protein Volume Evaluation (PROVE) 144
- Model Clustering Analysis 146
- Example: Evaluation of Protein Models 148
- Conclusions 154
- References 155

**Simulations of Protein Folding** (Joan-Emma Shea, Miriam R. Friedel, Andrij Baumketner) 169

- Introduction 169
- Theoretical Framework 172
- Energy Landscape Theory 172
- Thermodynamics and Kinetics of Folding: Two-State and Multistate Folders 175
- Protein Models 179
- Introduction and General Simulation Techniques 179
- Coarse-Grained Protein Models 181
- Fully Atomic Simulations 190
- Advanced Topics: The Transition State Ensemble for Folding 201
- Transition State and Two-State Kinetics 202
- Methods for Identifying the TSE 204
- Conclusions and Future Directions 219
- Acknowledgments 219
- References 220

**The Simulation of Ionic Charge Transport in Biological Ion Channels: An Introduction to Numerical Methods** (Marco Saraniti, Shela Aboud, Robert Eisenberg) 229

- Introduction 229
- System Components 231
- Time and Space Scale 241
- Experiments 242
- Electrostatics 243
- Long-Range Interaction 245
- Short-Range Interaction 258
- Boundary Conditions 261
- Particle-Based Simulation 263
- Implicit Solvent: Brownian Dynamics 264
- Explicit Solvent: Molecular Dynamics 267
- Flux-Based Simulation 273
- Nernst-Planck Equation 274
- The Poisson-Nernst-Planck (PNP) Method 278
- Hierarchical Simulation Schemes 282
- Future Directions and Concluding Remarks 283
- References 284

**Wavelets in Chemistry and Chemoinformatics** (C. Matthew Sundling, Nagamani Sukumar, Hongmei Zhang, Mark J. Embrechts, Curt M. Breneman) 295

- Preface 295
- Introduction to Wavelets 296
- Fourier Transform 297
- Continuous Fourier Transform 297
- Short-Time Fourier Transformation 298
- Wavelet Transform 300
- Continuous Wavelet Transform 301
- Discrete Wavelet Transform 303
- Wavelet Packet Transform 307
- Wavelets vs. Fourier Transforms: A Summary 308
- Application of Wavelets in Chemistry 309
- Smoothing and Denoising 309
- Signal Feature Isolation 312
- Signal Compression 313
- Quantum Chemistry 314
- Classification, Regression, and QSAR/QSPR 316
- Summary 321
- References 321

Author Index 331

Subject Index 349

## What People are Saying About This

**From the Publisher**

"...I recommend this book. Anyone interested in computational chemistry should browse through it and may well end up reading most of it." (*Journal of the American Chemical Society*, Vol. 125, No. 22, 2003)