Book Information

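R语言机器学习 第2版 影印版 (Machine Learning with R, 2nd Edition, Reprint Edition)
  • Author: Brett Lantz
  • Publisher: Nanjing: Southeast University Press
  • ISBN: 9787564170714
  • Publication year: 2017
  • Stated page count: 427 pages
  • File size: 61 MB
  • File page count: 448 pages
  • Subject terms: Programming languages - Program design - English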

Table of Contents

Chapter 1: Introducing Machine Learning

The origins of machine learning

Uses and abuses of machine learning

Machine learning successes

The limits of machine learning

Machine learning ethics

How machines learn

Data storage

Abstraction

Generalization

Evaluation

Machine learning in practice

Types of input data

Types of machine learning algorithms

Matching input data to algorithms

Machine learning with R

Installing R packages

Loading and unloading R packages

Summary

Chapter 2: Managing and Understanding Data

R data structures

Vectors

Factors

Lists

Data frames

Matrixes and arrays

Managing data with R

Saving, loading, and removing R data structures

Importing and saving data from CSV files

Exploring and understanding data

Exploring the structure of data

Exploring numeric variables

Measuring the central tendency - mean and median

Measuring spread - quartiles and the five-number summary

Visualizing numeric variables - boxplots

Visualizing numeric variables - histograms

Understanding numeric data - uniform and normal distributions

Measuring spread - variance and standard deviation

Exploring categorical variables

Measuring the central tendency - the mode

Exploring relationships between variables

Visualizing relationships - scatterplots

Examining relationships - two-way cross-tabulations

Summary

Chapter 3: Lazy Learning - Classification Using Nearest Neighbors

Understanding nearest neighbor classification

The k-NN algorithm

Measuring similarity with distance

Choosing an appropriate k

Preparing data for use with k-NN

Why is the k-NN algorithm lazy?

Example - diagnosing breast cancer with the k-NN algorithm

Step 1 - collecting data

Step 2 - exploring and preparing the data

Transformation - normalizing numeric data

Data preparation - creating training and test datasets

Step 3 - training a model on the data

Step 4 - evaluating model performance

Step 5 - improving model performance

Transformation - z-score standardization

Testing alternative values of k

Summary

Chapter 4: Probabilistic Learning - Classification Using Naive Bayes

Understanding Naive Bayes

Basic concepts of Bayesian methods

Understanding probability

Understanding joint probability

Computing conditional probability with Bayes' theorem

The Naive Bayes algorithm

Classification with Naive Bayes

The Laplace estimator

Using numeric features with Naive Bayes

Example - filtering mobile phone spam with the Naive Bayes algorithm

Step 1 - collecting data

Step 2 - exploring and preparing the data

Data preparation - cleaning and standardizing text data

Data preparation - splitting text documents into words

Data preparation - creating training and test datasets

Visualizing text data - word clouds

Data preparation - creating indicator features for frequent words

Step 3 - training a model on the data

Step 4 - evaluating model performance

Step 5 - improving model performance

Summary

Chapter 5: Divide and Conquer - Classification Using Decision Trees and Rules

Understanding decision trees

Divide and conquer

The C5.0 decision tree algorithm

Choosing the best split

Pruning the decision tree

Example - identifying risky bank loans using C5.0 decision trees

Step 1 - collecting data

Step 2 - exploring and preparing the data

Data preparation - creating random training and test datasets

Step 3 - training a model on the data

Step 4 - evaluating model performance

Step 5 - improving model performance

Boosting the accuracy of decision trees

Making some mistakes more costly than others

Understanding classification rules

Separate and conquer

The 1R algorithm

The RIPPER algorithm

Rules from decision trees

What makes trees and rules greedy?

Example - identifying poisonous mushrooms with rule learners

Step 1 - collecting data

Step 2 - exploring and preparing the data

Step 3 - training a model on the data

Step 4 - evaluating model performance

Step 5 - improving model performance

Summary

Chapter 6: Forecasting Numeric Data - Regression Methods

Understanding regression

Simple linear regression

Ordinary least squares estimation

Correlations

Multiple linear regression

Example - predicting medical expenses using linear regression

Step 1 - collecting data

Step 2 - exploring and preparing the data

Exploring relationships among features - the correlation matrix

Visualizing relationships among features - the scatterplot matrix

Step 3 - training a model on the data

Step 4 - evaluating model performance

Step 5 - improving model performance

Model specification - adding non-linear relationships

Transformation - converting a numeric variable to a binary indicator

Model specification - adding interaction effects

Putting it all together - an improved regression model

Understanding regression trees and model trees

Adding regression to trees

Example - estimating the quality of wines with regression trees and model trees

Step 1 - collecting data

Step 2 - exploring and preparing the data

Step 3 - training a model on the data

Visualizing decision trees

Step 4 - evaluating model performance

Measuring performance with the mean absolute error

Step 5 - improving model performance

Summary

Chapter 7: Black Box Methods - Neural Networks and Support Vector Machines

Understanding neural networks

From biological to artificial neurons

Activation functions

Network topology

The number of layers

The direction of information travel

The number of nodes in each layer

Training neural networks with backpropagation

Example - modeling the strength of concrete with ANNs

Step 1 - collecting data

Step 2 - exploring and preparing the data

Step 3 - training a model on the data

Step 4 - evaluating model performance

Step 5 - improving model performance

Understanding Support Vector Machines

Classification with hyperplanes

The case of linearly separable data

The case of nonlinearly separable data

Using kernels for non-linear spaces

Example - performing OCR with SVMs

Step 1 - collecting data

Step 2 - exploring and preparing the data

Step 3 - training a model on the data

Step 4 - evaluating model performance

Step 5 - improving model performance

Chapter 8: Finding Patterns - Market Basket Analysis Using Association Rules

Understanding association rules

The Apriori algorithm for association rule learning

Measuring rule interest - support and confidence

Building a set of rules with the Apriori principle

Example - identifying frequently purchased groceries with association rules

Step 1 - collecting data

Step 2 - exploring and preparing the data

Data preparation - creating a sparse matrix for transaction data

Visualizing item support - item frequency plots

Visualizing the transaction data - plotting the sparse matrix

Step 3 - training a model on the data

Step 4 - evaluating model performance

Step 5 - improving model performance

Sorting the set of association rules

Taking subsets of association rules

Saving association rules to a file or data frame

Summary

Chapter 9: Finding Groups of Data - Clustering with k-means

Understanding clustering

Clustering as a machine learning task

The k-means clustering algorithm

Using distance to assign and update clusters

Choosing the appropriate number of clusters

Example - finding teen market segments using k-means clustering

Step 1 - collecting data

Step 2 - exploring and preparing the data

Data preparation - dummy coding missing values

Data preparation - imputing the missing values

Step 3 - training a model on the data

Step 4 - evaluating model performance

Step 5 - improving model performance

Summary

Chapter 10: Evaluating Model Performance

Measuring performance for classification

Working with classification prediction data in R

A closer look at confusion matrices

Using confusion matrices to measure performance

Beyond accuracy - other measures of performance

The kappa statistic

Sensitivity and specificity

Precision and recall

The F-measure

Visualizing performance trade-offs

ROC curves

Estimating future performance

The holdout method

Cross-validation

Bootstrap sampling

Summary

Chapter 11: Improving Model Performance

Tuning stock models for better performance

Using caret for automated parameter tuning

Creating a simple tuned model

Customizing the tuning process

Improving model performance with meta-learning

Understanding ensembles

Bagging

Boosting

Random forests

Training random forests

Evaluating random forest performance

Summary

Chapter 12: Specialized Machine Learning Topics

Working with proprietary files and databases

Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files

Querying data in SQL databases

Working with online data and services

Downloading the complete text of web pages

Scraping data from web pages

Parsing XML documents

Parsing JSON from web APIs

Working with domain-specific data

Analyzing bioinformatics data

Analyzing and visualizing network data

Improving the performance of R

Managing very large datasets

Generalizing tabular data structures with dplyr

Making data frames faster with data.table

Creating disk-based data frames with ff

Using massive matrices with bigmemory

Learning faster with parallel computing

Measuring execution time

Working in parallel with multicore and snow

Taking advantage of parallel with foreach and doParallel

Parallel cloud computing with MapReduce and Hadoop

GPU computing

Deploying optimized learning algorithms

Building bigger regression models with biglm

Growing bigger and faster random forests with bigrf

Training and evaluating models in parallel with caret

Summary

Index
