Table of Contents
1 Introduction 1
1.1 Motivation and Current Practices 1
1.2 Multidimensional System Level Design Overview 4
2 Design of Image Processing Applications 9
2.1 Classification of Image Processing Algorithms 10
2.2 JPEG2000 Image Compression 11
2.3 Parallelism of Image Processing Applications 15
2.4 System Implementation 16
2.4.1 Design Gap Between Available Software Solution and Desired Hardware Implementation 17
2.4.2 Lack of Architectural Verification 17
2.4.3 Missing Possibility to Explore Consequences of Implementation Alternatives 17
2.4.4 Manual Design of Memory System 18
2.4.5 Lack to Simulate the Overall System 18
2.4.6 Inability to Precisely Predict Required Computational Effort for Both Hardware and Software 18
2.5 Requirements for System Level Design of Image Processing Applications 18
2.5.1 Representation of Global, Local, and Point Algorithms 19
2.5.2 Representation of Task, Data, and Operation Parallelism 19
2.5.3 Capability to Represent Control Flow in Multidimensional Algorithms 19
2.5.4 Tight Interaction Between Static and Data-Dependent Algorithms 19
2.5.5 Support of Data Reordering 19
2.5.6 Fast Generation of RTL Implementations for Quick Feedback During Architecture Design 20
2.5.7 High-Level Verification 20
2.5.8 High-Level Performance Evaluation 20
2.5.9 Tool-Supported Design of Memory Systems 20
2.6 Multidimensional System Level Design 20
3 Fundamentals and Related Work 23
3.1 Behavioral Specification 23
3.1.1 Modeling Approaches 23
3.1.2 Sequential Languages 25
3.1.3 One-Dimensional Data Flow 27
3.1.4 Multidimensional Data Flow 35
3.1.5 Conclusion 41
3.2 Behavioral Hardware Synthesis 42
3.2.1 Overview 43
3.2.2 SA-C 44
3.2.3 ROCCC 44
3.2.4 DEFACTO 45
3.2.5 Synfora PICO Express 51
3.2.6 MMAlpha 54
3.2.7 PARO 55
3.2.8 Conclusion 56
3.3 Memory Analysis and Optimization 57
3.3.1 Memory Analysis for One-Dimensional Data Flow Graphs 57
3.3.2 Array-Based Analysis 59
3.3.3 Conclusion 64
3.4 Communication and Memory Synthesis 64
3.4.1 Memory Mapping 65
3.4.2 Parallel Data Access 65
3.4.3 Data Reuse 66
3.4.4 Out-of-Order Communication 66
3.4.5 Conclusion 68
3.5 System Level Design 68
3.5.1 Embedded Multi-processor Software Design 68
3.5.2 Model-Based Simulation and Design 71
3.5.3 System Level Mapping and Exploration 77
3.6 Conclusion 79
4 Electronic System Level Design of Image Processing Applications with SYSTEMCODESIGNER 81
4.1 Design Flow 81
4.1.1 Actor-Oriented Model 82
4.1.2 Actor Specification 83
4.1.3 Actor and Communication Synthesis 83
4.1.4 Automatic Design Space Exploration 84
4.1.5 System Building 86
4.1.6 Extensions 86
4.2 Case Study for the Motion-JPEG Decoder 86
4.2.1 Comparison Between VPC Estimates and Real Implementation 87
4.2.2 Influence of the Input Motion-JPEG Stream 90
4.3 Conclusions 91
5 Windowed Data Flow (WDF) 93
5.1 Sliding Window Communication 94
5.1.1 WDF Graph and Token Production 94
5.1.2 Virtual Border Extension 96
5.1.3 Token Consumption 97
5.1.4 Determination of Extended Border Values 99
5.1.5 WDF Delay Elements 99
5.2 Local WDF Balance Equation 100
5.3 Communication Order 102
5.4 Communication Control 105
5.4.1 Multidimensional FIFO 105
5.4.2 Communication Finite State Machine for Multidimensional Actors 107
5.5 Windowed Synchronous Data Flow (WSDF) 108
5.6 WSDF Balance Equation 110
5.6.1 Derivation of the WSDF Balance Equation 112
5.6.2 Application to an Example Graph 115
5.7 Integration into SYSTEMCODESIGNER 117
5.8 Application Examples 118
5.8.1 Binary Morphological Reconstruction 118
5.8.2 Lifting-Based Wavelet Kernel 125
5.9 Limitations and Future Work 129
5.10 Conclusion 130
6 Memory Mapping Functions for Efficient Implementation of WDF Edges 133
6.1 Problem Formulation 134
6.2 Hierarchical Iteration Vectors 137
6.3 Memory Models 138
6.3.1 The Rectangular Memory Model 139
6.3.2 The Linearized Buffer Model 140
6.4 Simulation Results 144
6.5 Conclusion 149
7 Buffer Analysis for Complete Application Graphs 151
7.1 Problem Formulation 152
7.2 Buffer Analysis by Simulation 153
7.3 Polyhedral Representation of WSDF Edges 155
7.3.1 WSDF Lattice 156
7.3.2 Lattice Scaling 157
7.3.3 Out-of-Order Communication 159
7.3.4 Lattice Shifting Based on Dependency Vectors 164
7.3.5 Pipelined Actor Execution 172
7.4 Lattice Wraparound 173
7.4.1 Principle of Lattice Wraparound 175
7.4.2 Formal Description of the Lattice Wraparound 176
7.4.3 Lattice Shifting for Lattices with Wraparound 177
7.5 Scheduling of Complete WSDF Graphs 179
7.5.1 Lattice Scaling 180
7.5.2 Lattice Shifting 180
7.6 Buffer Size Calculation 185
7.6.1 ILP Formulation for Buffer Size Calculation 186
7.6.2 Memory Channel Splitting 190
7.7 Multirate Analysis 192
7.8 Solution Strategies 196
7.9 Results 197
7.9.1 Out-of-Order Communication 198
7.9.2 Application to Complex Graph Topologies 199
7.9.3 Memory Channel Splitting 202
7.9.4 Multirate Analysis 204
7.9.5 Limitations 204
7.10 Conclusion 206
8 Communication Synthesis 209
8.1 Problem Formulation 210
8.2 Hardware Architecture 214
8.2.1 Read and Write Order Control 215
8.2.2 Memory Partitioning 218
8.2.3 Source Address Generation 221
8.2.4 Virtual Memory Channel Mapping 226
8.2.5 Trading Throughput Against Resource Requirements 233
8.2.6 Sink Address Generation 234
8.2.7 Fill-Level Control 236
8.2.8 Elimination of Modular Dependencies 244
8.3 Determination of Channel Sizes 247
8.4 Granularity of Scheduling 248
8.4.1 Latency Impact of Coarse-Grained Scheduling 248
8.4.2 Memory Size Impact of Coarse-Grained Scheduling 250
8.4.3 Controlling the Scheduling Granularity 250
8.5 Results 253
8.5.1 Implementation Strategy for High Clock Frequencies 253
8.5.2 Out-of-Order Communication 254
8.5.3 Out-of-Order Communication with Parallel Data Access 255
8.5.4 Influence of Different Memory Channel Sizes 258
8.5.5 Combination with Data Reuse 259
8.5.6 Impact of Scheduling Granularity 261
8.6 Conclusion and Future Work 262
9 Conclusion 265
9.1 Multidimensional System Design 265
9.2 Discussed Design Steps and Their Major Benefits 266
A Buffer Analysis by Simulation 269
A.1 Efficient Buffer Parameter Determination for the Rectangular Memory Model 269
A.1.1 Monitoring of Live Data Elements 269
A.1.2 Table-Based Buffer Parameter Determination 270
A.1.3 Determination of the Minimum Tables 272
A.1.4 Determination of the Maximum Tables 274
A.1.5 Complexity 277
A.2 Efficient Buffer Parameter Determination for the Linearized Buffer Model 277
A.2.1 Tree Data Structure for Tracking of Live Data Elements 278
A.2.2 Determination of the Lexicographically Smallest Live Data Element 279
A.2.3 Tree Update 280
A.2.4 Complexity of the Algorithm 282
A.3 Stimulation by Simulation 282
B Abbreviations 285
B Formula Symbols 287
References 289
Index 307