There is a newer edition of this book titled “Murach’s Python for Data Science (2nd Edition).”

Data analysts are in demand everywhere today! And now, Murach’s Python for Data Analysis shows you how to do data analysis the way the pros do. You’ll master descriptive analysis, using Pandas to analyze the data and Seaborn to create the visualizations that let you present your findings effectively. You’ll get started with predictive analysis, using Scikit-learn with linear regression models. And you’ll be guided right from the start by 4 real-world case studies in political, environmental, social, and sports analytics…essential for learning and great perspective for applying your new skills in your own field. See for yourself how quickly and easily this book can turn you into the data analyst that employers are looking for.


Paperback
Product Details
ISBN-13: | 9781943872763 |
---|---|
Publisher: | Mike Murach and Associates, Inc. |
Publication date: | 08/30/2021 |
Pages: | 600 |
Product dimensions: | 8.00(w) x 9.90(h) x 1.40(d) |
Table of Contents
Section 1 Get off to a fast start
Chapter 1 Introduction to Python for data analysis
Introduction to data analysis
What data analysis is
The five phases of data analysis and visualization
The IDEs for Python data analysis
The Python skills that you need for data analysis
How to install and import the Python modules for data analysis
How to call and chain methods
The coding basics for Python data analysis
How to use JupyterLab as your IDE
How to start JupyterLab and work with a Notebook
How to edit and run the cells in a Notebook
How to use the Tab completion and tooltip features
How syntax and runtime errors work
How to use Markdown language
How to get reference information
Two more skills for working with JupyterLab
How to split the screen between two Notebooks
How to use Magic Commands
Introduction to the case studies
The Polling case study
The Forest Fires case study
The Social Survey case study
The Sports Analytics case study
Chapter 2 The Pandas essentials for data analysis
Introduction to the Pandas DataFrame
The DataFrame structure
Two ways to get data into a DataFrame
How to save and restore a DataFrame
How to examine the data
How to display the data in a DataFrame
How to use the attributes of a DataFrame
How to use the info(), nunique(), and describe() methods
How to access the columns and rows
How to access columns
How to access rows
How to access a subset of rows and columns
Another way to access a subset of rows and columns
How to work with the data
How to sort the data
How to use the statistical methods
How to use Python for column arithmetic
How to modify the string data in columns
How to shape the data
How to use indexes
How to pivot the data
How to melt the data
How to analyze the data
How to group the data
How to aggregate the data
How to plot the data
Chapter 3 The Pandas essentials for data visualization
Introduction to data visualization
The Python libraries for data visualization
Long vs. wide data for data visualization
How the Pandas plot() method works by default
The three basic parameters for the Pandas plot() method
How to create 8 types of plots
How to create a line plot or an area plot
How to create a scatter plot
How to create a bar plot
How to create a histogram or a density plot
How to create a box plot or a pie plot
How to enhance a plot
How to improve the appearance of a plot
How to work with subplots
How to use chaining to get the plots you want
Chapter 4 The Seaborn essentials for data visualization
Introduction to Seaborn
The Seaborn methods for plotting
The general methods vs. the specific methods
How to use the basic Seaborn parameters
How to use the Seaborn parameters for working with subplots
How to enhance and save plots
How to set the title, x label, and y label
How to set the ticks, x limits, and y limits
How to set the background style
How to work with subplots
How to save a plot
How to create relational plots
How to create a line plot
How to create a scatter plot
How to create categorical plots
How to create a bar plot
How to create a box plot
How to create distribution plots
How to create a histogram
How to create a KDE or ECDF plot
How to enhance a distribution plot
Other techniques for enhancing a plot
How to use other Axes methods to enhance a plot
How to annotate a plot
How to set the color palette
How to enhance a plot that has subplots
How to customize the titles for subplots
How to set the size of a specific plot
Section 2 The critical skills for success on the job
Chapter 5 How to get the data
How to find the data that you want to analyze
Common data sources
How to find and select the data that you want
How to import data into a DataFrame
How to import data directly into a DataFrame
How to download a file to disk before importing it
How to work with a zip file on disk
How to get database data into a DataFrame
How to run queries against a database
How to use a SQL query to import data into a DataFrame
How to work with a Stata file
How to get and explore the metadata of a Stata file
How to build DataFrames for the metadata and the data
How to work with a JSON file
How to download a JSON file to disk
How to open a JSON file in JupyterLab
How to drill down into the data
How to build a DataFrame for the data
Chapter 6 How to clean the data
Introduction to data cleaning
A general plan for cleaning the data
What the info() method can tell you
What the unique values can tell you
What the value counts can tell you
How to simplify the data
How to drop rows based on conditions
How to drop duplicate rows
How to drop columns
How to rename columns
How to find and fix missing values
How to find missing values
How to drop rows with missing values
How to fill missing values
How to fix data type problems
How to find dates and numbers that are imported as objects
How to convert date and time strings to the datetime data type
How to convert object columns to numeric data types
How to work with the category data type
How to replace invalid values and convert a column's data type
How to fix data problems when you import the data
How to find and fix outliers
How to find outliers
How to fix outliers
Chapter 7 How to prepare the data
How to add and modify columns
How to work with datetime columns
How to work with string columns
How to work with numeric columns
How to add a summary column to a DataFrame
How to apply functions and lambda expressions
How to apply functions to rows or columns
How to apply user-defined functions
How lambda expressions work with DataFrames
How to apply lambda expressions
How to work with indexes
How to set and remove an index
How to unstack indexed data
How to combine DataFrames
How to join DataFrames with an inner join
How to join DataFrames with a left or outer join
How to merge DataFrames
How to concatenate DataFrames
How to handle the SettingWithCopyWarning
What the warning is telling you
What to do when the warning is displayed
What to watch for when the warning isn't displayed
Chapter 8 How to analyze the data
How to create and plot long data
How to melt columns to create long data
How to plot melted columns
How to group and aggregate the data
How to group and apply a single aggregate method
How to work with a DataFrameGroupBy object
How to apply multiple aggregate methods
How to create and use pivot tables
How to use the pivot() method
How to use the pivot_table() method
How to work with bins
How to create bins of equal size
How to create bins with equal numbers of values
How to plot binned data
More skills for data analysis
How to select the rows with the largest values
How to calculate the percent change
How to rank rows
How to find other methods for analysis
Chapter 9 How to analyze time-series data
How to reindex time-series data
How to generate time periods
How to reindex with datetime indexes
How to reindex with a semi-month index
How a user-defined function can improve a datetime index
How reindexing with an improved index can improve plots
How to resample time-series data
How to use the resample() method
How to use the label and closed parameters when you downsample
How downsampling can improve plots
How to work with rolling windows
The concept of rolling windows
How to create rolling windows
How to plot rolling window data
How to work with running totals
How to create running totals
How to plot running totals
Section 3 An introduction to predictive analysis
Chapter 10 How to make predictions with a linear regression model
Introduction to predictive analysis
Types of predictive models
Introduction to regression analysis
How to find correlations between variables
The Housing dataset
How to identify correlations with a scatter plot
How to identify correlations with a grid of scatter plots
How to identify correlations with r-values
How to identify correlations with a heatmap
How to use Scikit-learn to work with a linear regression
A procedure for creating and using a regression model
The function and methods for linear regression models
How to create, validate, and use a linear regression model
How to plot the predicted data
How to plot the residuals
How to plot regression models with Seaborn
The lmplot() method and some of its parameters
How to plot a simple linear regression
How to plot a logistic regression
How to plot a polynomial regression
How to plot a lowess regression
How to use the residplot() method to plot the residuals
Chapter 11 How to make predictions with a multiple regression model
A simple regression model for a Cars dataset
The Cars dataset
How to create a simple regression model
How to plot the residuals of a simple regression
How to work with a multiple regression model
How to create a multiple regression model
How to plot the residuals of a multiple regression
How to work with categorical variables
How to identify categorical variables
How to review categorical variables
How to create dummy variables
How to restate the data and check the correlations
How to create a multiple regression that includes dummy variables
How to improve a multiple regression model
How to select the independent variables
How to test different combinations of variables
How to use Scikit-learn to select the variables
How to select the right number of variables
Section 4 The case studies
Chapter 12 The Polling case study
Get and display the data
Import the modules that you will need
Get the data
Display the data
Clean the data
Examine the data
Drop columns and rows
Rename columns
Fix object types
Fix data
Take an early plot with Pandas
Save the DataFrame
Prepare the data
Add columns for grouping and filtering
Create a new DataFrame in long form
Take an early plot of the long data with Seaborn
Add monthly bins to the DataFrame
Add an average percent column for each month
Save the wide and long DataFrames
Analyze the data
Plot the national and swing state polls
Plot the voter types
Plot the last two months of polling
Plot the gap changes in selected states
More preparation and analysis
Prepare the gap data for the last week of polling
Plot the gap data for the last week of polling
Prepare the weekly gap data for the swing states
Plot the weekly gap data for the swing states
Chapter 13 The Forest Fires case study
Get the data
Download and unzip the SQLite database
Connect and query the database
Import the data into a DataFrame
Clean the data
Examine the data
Improve the readability of the data
Drop unnecessary rows
Drop duplicate rows
Convert dates to datetime objects
Check for missing contain dates
Prepare the data
Add fire_month and days_burning columns
Examine the contain_date and days_burning columns
Analyze the data
Analyze the data for California
Two more plots for California fires
Rank the states by total acres burned
Prepare a DataFrame for total acres burned by year within state
Prepare a DataFrame for the top 4 states
Plot the acres burned total by year for the top 4 states
Review the 20 largest fires in California
Use GeoPandas to plot the fires on a map
Use GeoPandas to plot the California map
Use GeoPandas or Seaborn to plot the California fires on a map
Plot the fires in the continental United States
Chapter 14 The Social Survey case study
Introduction to the Social Survey
Download and unzip the zip file for the data
Build a DataFrame for the metadata
The employment data
Use the codebook and read the data that you want
Prepare the data
Plot the data and reduce the number of categories
Plot the total counts of the responses
Convert the counts to percents and plot them
The work-life balance data
Search the codebook for small question sets
Read and review the work-life data
Plot the responses for the first question
Plot the responses for the second and third questions
How to expand the scope of the analysis
Use the codebook to find related columns
Use the codebook to find follow-up questions
Select the columns for an expanded DataFrame
Bin the data for a column
How to use a hypothesis to guide your analysis
Develop and test a first hypothesis
Develop and test a second hypothesis
Develop and test a third hypothesis
Chapter 15 The Sports Analytics case study
Get the data and build the DataFrame
Get the data
Build the DataFrame
Clean the data
Locate and drop unneeded rows
Locate and drop unneeded columns
Convert the game_date column to datetime data
Prepare the data
Add a column for the season
Add a column for the shot result
Add a column for points made for each shot
Add three summary columns
Plot the summary data
Plot the points per game by season
Plot the averages of shots, shots made, and points per game by season
Plot the shot locations
Plot the shot locations for two games
Plot the shot locations for two seasons
Plot the shot density for one season
Plot the shot density for two seasons
Appendix A How to set up Windows for this book
How to install and use Anaconda
How to install Anaconda
How to use the Anaconda Prompt
How to use the Anaconda Navigator
How to install and use the files for this book
How to install the files for this book
How to make sure Anaconda is installed correctly
How to download the large data files for this book
Appendix B How to set up macOS for this book
How to install and use Anaconda
How to install Anaconda
How to run conda commands
How to use the Anaconda Navigator
How to install and use the files for this book
How to install the files for this book
How to make sure Anaconda is installed correctly
How to download the large data files for this book