Mastering Computer Vision with PyTorch and Machine Learning

This book, together with its accompanying Python code, is a thorough guide to mastering advanced computer vision techniques for image processing with PyTorch, the open-source machine learning framework. Known for its user-friendly interface and Pythonic programming style, PyTorch is accessible and is one of the most popular tools among researchers and practitioners in artificial intelligence.

Key Features:

  • Hands-on approach using the accompanying practical code examples
  • The code for every project in the book follows the same four-part structure: data input, data display, data processing and data output
  • Emphasis on practical application development for computer vision
  • Covers the latest computer vision technologies
  • Offers practical guidance on advanced computer vision techniques
  • Uses freely available and open-source resources like Kaggle and Google Colab
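The key features describe a four-part layout (data input, data display, data processing, data output) shared by the book's projects. The sketch below is not code from the book. It is a hypothetical illustration of that layout applied to a toy linear-regression problem solved by gradient descent, the topic of section 1.2. The function names, the synthetic data (y = 2x + 1 plus Gaussian noise) and the hyperparameters are assumptions. It is written in plain Python so it runs without PyTorch installed.

```python
import random

# --- Part 1: data input (hypothetical toy dataset) --------------------------
def load_data(n=100, seed=0):
    """Generate n samples of y = 2x + 1 with small Gaussian noise (assumed data)."""
    rng = random.Random(seed)
    xs = [i / n for i in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys

# --- Part 2: data display ----------------------------------------------------
def display_data(xs, ys, k=3):
    """Print the first k samples; a real project would usually plot them."""
    for x, y in zip(xs[:k], ys[:k]):
        print(f"x={x:.2f}  y={y:.3f}")

# --- Part 3: data processing (full-batch gradient descent) -------------------
def fit_line(xs, ys, lr=0.5, epochs=2000):
    """Fit y = w*x + b by minimizing the mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b
        dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * dw
        b -= lr * db
    return w, b

# --- Part 4: data output -------------------------------------------------------
def output_result(w, b):
    """Report the fitted parameters and return them for later use."""
    print(f"fitted line: y = {w:.3f} * x + {b:.3f}")
    return {"w": w, "b": b}

# Run the four parts in order
xs, ys = load_data()
display_data(xs, ys)
w, b = fit_line(xs, ys)
result = output_result(w, b)
```

In a PyTorch version, part 3 would typically hold w and b as tensors created with requires_grad=True and let autograd (loss.backward()) and an optimizer such as torch.optim.SGD compute and apply the gradients that are written out by hand here.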
by Caide Xiao

Product Details

ISBN-13: 9780750362443
Publisher: Institute of Physics Publishing
Publication date: 04/29/2024
Series: IOP ebooks
Sold by: Barnes & Noble
Format: eBook
Pages: 300
File size: 32 MB

About the Author

Dr. Caide Xiao was born in China and obtained his bachelor's degree in physics from Central China Normal University in 1979. He was also a lecturer in medical physics at Yunyang University. After earning his PhD in optical biosensors from Tsinghua University, he has been a research fellow and visiting scholar at the Biotechnology Research Institute in Montreal, Oakland University in Rochester, West Virginia University and the University of Calgary.

Table of Contents

Preface

Acknowledgements

Author biography

1 Mathematical tools for computer vision

1.1 Probability, entropy and Kullback–Leibler divergence

1.1.1 Probability and Shannon entropy

1.1.2 Kullback–Leibler divergence and cross entropy

1.1.3 Conditional probability and joint entropies

1.1.4 Jensen’s inequality

1.1.5 Maximum likelihood estimation and overfitting

1.1.6 Application of expectation-maximization algorithm to find a PDF

1.2 Using a gradient descent algorithm for linear regression

1.3 Automatic gradient calculations and learning rate schedulers

1.4 Dataset, dataloader, GPU and model saving

1.5 Activation functions for nonlinear regressions

References

2 Image classification by convolutional neural networks

2.1 Classification of handwritten digits in the MNIST database

2.2 Mathematical operations of a convolution

2.3 Using ResNet9 for CIFAR-10 classification

2.4 Transfer learning with ResNet for STL-10 dataset

References

3 Image generation by GANs

3.1 The GAN theory

3.1.1 Implement a GAN for quadratic curve generation

3.1.2 Using a GAN with two fully connected layers to generate MNIST images

3.2 Applications of deep convolutional GANs

3.2.1 Mathematical operations of ConvTranspose2d

3.2.2 Applications of a DCGAN for MNIST and fashion MNIST

3.2.3 Using a DCGAN to generate fake anime-faces and fake CelebA images

3.3 Conditional deep convolutional GANs

3.3.1 Applications of a cDCGAN to MNIST and fashion MNIST datasets

3.3.2 Applications of a cDCGAN to generate fake Rock Paper Scissors images

References

4 Image generation by WGANs with gradient penalty

4.1 Using a WGAN or a WGAN-GP for generation of fake quadratic curves

4.2 Using a WGAN-GP for Fashion MNIST

4.3 WGAN-GP for CelebA dataset and Anime Face dataset

4.4 Implementation of a cWGAN-GP for Rock Paper Scissors dataset

References

5 Image generation by VAEs

5.1 VAE and beta-VAE

5.2 Application of beta-VAE for fake quadratic curves

5.3 Application of beta-VAE for the MNIST dataset

5.4 Using VAE-GAN for fake images of MNIST and Fashion MNIST

References

6 Image generation by infoGANs

6.1 Using infoGAN to generate quadratic curves

6.2 Implementation of infoGAN for the MNIST dataset

6.3 infoGAN for fake Anime-face dataset images

6.4 Implementation of infoGAN to the rock paper scissors dataset

Reference

7 Object detection by YOLOv1/YOLOv3 models

7.1 Bounding boxes of Pascal VOC database for YOLOv1

7.2 Encode VOC images with bounding boxes for YOLOv1

7.2.1 VOC image augmentations with bounding boxes

7.2.2 Encoding bounding boxes to grid cells for YOLOv1 model training

7.2.3 Chess pieces dataset from Roboflow

7.3 ResNet18 model, IOU and a loss function

7.3.1 Using ResNet18 to replace YOLOv1 model

7.3.2 Intersection over union (IOU)

7.3.3 Loss function

7.4 Utility functions for model training

7.5 Applications of YOLOv3 for real-time object detection

References

8 YOLOv7 and YOLOv8 models

8.1 YOLOv7 for object detection for a custom dataset: MNIST4yolo

8.2 YOLOv7 for instance segmentation

8.3 Using YOLOv7 for human pose estimation (key point detection)

8.4 Applications of YOLOv8 models

8.4.1 Image object detection, segmentation, classification and pose estimation

8.4.2 Object counting on an image or a video frame

8.4.3 Car tracking and counting for a video file

8.4.4 Fine-tuning YOLOv8 for object detection and annotation of a custom dataset

8.5 Using YOLO-NAS models for object detection

References

9 U-Nets for image segmentation and diffusion models for image generation

9.1 Retinal vessel segmentation by a U-Net for DRIVE dataset

9.2 Using an attention U-Net diffusion model for quadratic curve generation

9.2.1 The forward process in a DDPM

9.2.2 The backward process in the DDPM

9.3 Using a pre-trained U-Net from Hugging Face to generate images

9.4 Generate photorealistic images from text prompts by stable diffusion

References

10 Applications of vision transformers

10.1 The architecture of a basic ViT model

10.2 Hugging Face ViT for CIFAR10 image classification

10.3 Zero shot image classification by OpenAI CLIP

10.4 Zero shot object detection by Hugging Face’s OWL-ViT

References

11 Knowledge distillation, DINO, SAM, MiDaS and NeRF

11.1 Knowledge distillation for neural network compression

11.2 DINO: emerging properties in self-supervised vision transformers

11.3 DINOv2 for image retrieval, classification and feature visualization

11.4 Segment anything model: SAM and FastSAM

11.5 MiDaS for image depth estimation

11.6 Neural radiance fields for synthesis of 3D scenes

11.6.1 Camera intrinsic and extrinsic matrices

11.6.2 Using MLP with Gaussian Fourier feature mapping to reconstruct images

11.6.3 The physics principle of volume density rendering in NeRF

References

Appendix
