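Book Information
Machine Learning with R (R语言机器学习), 2nd Edition, Reprint Edition
- Author: Brett Lantz
- Publisher: Southeast University Press, Nanjing
- ISBN: 9787564170714
- Publication year: 2017
- Listed page count: 427 pages
- File size: 61 MB
- File page count: 448 pages
- Subject headings: Programming languages - Program design - English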
Table of Contents
Chapter 1: Introducing Machine Learning
The origins of machine learning
Uses and abuses of machine learning
Machine learning successes
The limits of machine learning
Machine learning ethics
How machines learn
Data storage
Abstraction
Generalization
Evaluation
Machine learning in practice
Types of input data
Types of machine learning algorithms
Matching input data to algorithms
Machine learning with R
Installing R packages
Loading and unloading R packages
Summary
Chapter 2: Managing and Understanding Data
R data structures
Vectors
Factors
Lists
Data frames
Matrixes and arrays
Managing data with R
Saving, loading, and removing R data structures
Importing and saving data from CSV files
Exploring and understanding data
Exploring the structure of data
Exploring numeric variables
Measuring the central tendency - mean and median
Measuring spread - quartiles and the five-number summary
Visualizing numeric variables - boxplots
Visualizing numeric variables - histograms
Understanding numeric data - uniform and normal distributions
Measuring spread - variance and standard deviation
Exploring categorical variables
Measuring the central tendency - the mode
Exploring relationships between variables
Visualizing relationships - scatterplots
Examining relationships - two-way cross-tabulations
Summary
Chapter 3: Lazy Learning - Classification Using Nearest Neighbors
Understanding nearest neighbor classification
The k-NN algorithm
Measuring similarity with distance
Choosing an appropriate k
Preparing data for use with k-NN
Why is the k-NN algorithm lazy?
Example - diagnosing breast cancer with the k-NN algorithm
Step 1 - collecting data
Step 2 - exploring and preparing the data
Transformation - normalizing numeric data
Data preparation - creating training and test datasets
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Transformation - z-score standardization
Testing alternative values of k
Summary
Chapter 4: Probabilistic Learning - Classification Using Naive Bayes
Understanding Naive Bayes
Basic concepts of Bayesian methods
Understanding probability
Understanding joint probability
Computing conditional probability with Bayes' theorem
The Naive Bayes algorithm
Classification with Naive Bayes
The Laplace estimator
Using numeric features with Naive Bayes
Example - filtering mobile phone spam with the Naive Bayes algorithm
Step 1 - collecting data
Step 2 - exploring and preparing the data
Data preparation - cleaning and standardizing text data
Data preparation - splitting text documents into words
Data preparation - creating training and test datasets
Visualizing text data - word clouds
Data preparation - creating indicator features for frequent words
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Summary
Chapter 5: Divide and Conquer - Classification Using Decision Trees and Rules
Understanding decision trees
Divide and conquer
The C5.0 decision tree algorithm
Choosing the best split
Pruning the decision tree
Example - identifying risky bank loans using C5.0 decision trees
Step 1 - collecting data
Step 2 - exploring and preparing the data
Data preparation - creating random training and test datasets
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Boosting the accuracy of decision trees
Making some mistakes more costly than others
Understanding classification rules
Separate and conquer
The 1R algorithm
The RIPPER algorithm
Rules from decision trees
What makes trees and rules greedy?
Example - identifying poisonous mushrooms with rule learners
Step 1 - collecting data
Step 2 - exploring and preparing the data
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Summary
Chapter 6: Forecasting Numeric Data - Regression Methods
Understanding regression
Simple linear regression
Ordinary least squares estimation
Correlations
Multiple linear regression
Example - predicting medical expenses using linear regression
Step 1 - collecting data
Step 2 - exploring and preparing the data
Exploring relationships among features - the correlation matrix
Visualizing relationships among features - the scatterplot matrix
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Model specification - adding non-linear relationships
Transformation - converting a numeric variable to a binary indicator
Model specification - adding interaction effects
Putting it all together - an improved regression model
Understanding regression trees and model trees
Adding regression to trees
Example - estimating the quality of wines with regression trees and model trees
Step 1 - collecting data
Step 2 - exploring and preparing the data
Step 3 - training a model on the data
Visualizing decision trees
Step 4 - evaluating model performance
Measuring performance with the mean absolute error
Step 5 - improving model performance
Summary
Chapter 7: Black Box Methods - Neural Networks and Support Vector Machines
Understanding neural networks
From biological to artificial neurons
Activation functions
Network topology
The number of layers
The direction of information travel
The number of nodes in each layer
Training neural networks with backpropagation
Example - modeling the strength of concrete with ANNs
Step 1 - collecting data
Step 2 - exploring and preparing the data
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Understanding Support Vector Machines
Classification with hyperplanes
The case of linearly separable data
The case of nonlinearly separable data
Using kernels for non-linear spaces
Example - performing OCR with SVMs
Step 1 - collecting data
Step 2 - exploring and preparing the data
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Chapter 8: Finding Patterns - Market Basket Analysis Using Association Rules
Understanding association rules
The Apriori algorithm for association rule learning
Measuring rule interest - support and confidence
Building a set of rules with the Apriori principle
Example - identifying frequently purchased groceries with association rules
Step 1 - collecting data
Step 2 - exploring and preparing the data
Data preparation - creating a sparse matrix for transaction data
Visualizing item support - item frequency plots
Visualizing the transaction data - plotting the sparse matrix
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Sorting the set of association rules
Taking subsets of association rules
Saving association rules to a file or data frame
Summary
Chapter 9: Finding Groups of Data - Clustering with k-means
Understanding clustering
Clustering as a machine learning task
The k-means clustering algorithm
Using distance to assign and update clusters
Choosing the appropriate number of clusters
Example - finding teen market segments using k-means clustering
Step 1 - collecting data
Step 2 - exploring and preparing the data
Data preparation - dummy coding missing values
Data preparation - imputing the missing values
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Summary
Chapter 10: Evaluating Model Performance
Measuring performance for classification
Working with classification prediction data in R
A closer look at confusion matrices
Using confusion matrices to measure performance
Beyond accuracy - other measures of performance
The kappa statistic
Sensitivity and specificity
Precision and recall
The F-measure
Visualizing performance trade-offs
ROC curves
Estimating future performance
The holdout method
Cross-validation
Bootstrap sampling
Summary
Chapter 11: Improving Model Performance
Tuning stock models for better performance
Using caret for automated parameter tuning
Creating a simple tuned model
Customizing the tuning process
Improving model performance with meta-learning
Understanding ensembles
Bagging
Boosting
Random forests
Training random forests
Evaluating random forest performance
Summary
Chapter 12: Specialized Machine Learning Topics
Working with proprietary files and databases
Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
Querying data in SQL databases
Working with online data and services
Downloading the complete text of web pages
Scraping data from web pages
Parsing XML documents
Parsing JSON from web APIs
Working with domain-specific data
Analyzing bioinformatics data
Analyzing and visualizing network data
Improving the performance of R
Managing very large datasets
Generalizing tabular data structures with dplyr
Making data frames faster with data.table
Creating disk-based data frames with ff
Using massive matrices with bigmemory
Learning faster with parallel computing
Measuring execution time
Working in parallel with multicore and snow
Taking advantage of parallel with foreach and doParallel
Parallel cloud computing with MapReduce and Hadoop
GPU computing
Deploying optimized learning algorithms
Building bigger regression models with biglm
Growing bigger and faster random forests with bigrf
Training and evaluating models in parallel with caret
Summary
Index