| Syntax | Usage | Description |
| --- | --- | --- |
| model_selection.cross_val_score | Cross-validation phase | Estimate the cross-validation score | 
| model_selection.KFold | Cross-validation phase | Divide the dataset into k folds for cross validation | 
| model_selection.StratifiedKFold | Cross-validation phase | Stratified k-fold split that preserves the distribution of the classes you predict in each fold | 
| model_selection.train_test_split | Cross-validation phase | Split your data into training and test sets | 
| decomposition.PCA | Dimensionality reduction | Principal component analysis (PCA) | 
| decomposition.PCA(svd_solver='randomized') | Dimensionality reduction | Principal component analysis (PCA) using randomized SVD; replaces decomposition.RandomizedPCA, which has been removed from scikit-learn | 
| feature_extraction.FeatureHasher | Preparing your data | The hashing trick, allowing you to accommodate a large number of features in your dataset | 
| feature_extraction.text.CountVectorizer | Preparing your data | Convert text documents into a matrix of count data | 
| feature_extraction.text.HashingVectorizer | Preparing your data | Directly convert your text using the hashing trick | 
| feature_extraction.text.TfidfVectorizer | Preparing your data | Creates a dataset of TF-IDF features | 
| feature_selection.RFECV | Feature selection | Automatic feature selection | 
| model_selection.GridSearchCV | Optimization | Exhaustive search over a grid of hyperparameter values to maximize an estimator's cross-validated score | 
| linear_model.LinearRegression | Prediction | Linear regression | 
| linear_model.LogisticRegression | Prediction | Linear logistic regression | 
| metrics.accuracy_score | Solution evaluation | Accuracy classification score | 
| metrics.f1_score | Solution evaluation | Compute the F1 score, balancing precision and recall | 
| metrics.mean_absolute_error | Solution evaluation | Mean absolute error regression loss | 
| metrics.mean_squared_error | Solution evaluation | Mean squared error regression loss | 
| metrics.roc_auc_score | Solution evaluation | Compute the area under the ROC curve (ROC AUC) from prediction scores | 
| naive_bayes.MultinomialNB | Prediction | Multinomial Naïve Bayes | 
| neighbors.KNeighborsClassifier | Prediction | K-Neighbors classification | 
| preprocessing.Binarizer | Preparing your data | Create binary variables (feature values to 0 or 1) | 
| impute.SimpleImputer | Preparing your data | Missing values imputation (replaces preprocessing.Imputer, which was removed in scikit-learn 0.22) | 
| preprocessing.MinMaxScaler | Preparing your data | Create variables bound by a minimum and maximum value | 
| preprocessing.OneHotEncoder | Preparing your data | Transform categorical features into binary (one-hot) ones | 
| preprocessing.StandardScaler | Preparing your data | Variable standardization by removing the mean and scaling to unit variance | 
Scikit-learn is a focal point for data science work with Python, so it pays to know which classes and functions you need most. The table above provides a brief overview of the most important ones used for data analysis.
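
To see how several of these entries fit together, here is a minimal sketch that splits a dataset, standardizes it, cross-validates a logistic regression, and scores the held-out predictions. The Iris dataset (`datasets.load_iris`) and `pipeline.make_pipeline` are not listed in the table; they are assumptions added only to keep the example self-contained, as are the specific parameter values (`test_size`, `n_splits`, `max_iter`, `random_state`).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset, used here only for illustration (an assumption).
X, y = load_iris(return_X_y=True)

# Hold out a test set, keeping class proportions the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Chain scaling and the classifier so that, during cross-validation,
# the scaler is fit only on the training portion of each fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified 5-fold cross-validation on the training data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

# Fit on the full training set and evaluate on the held-out test set.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

test_accuracy = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred, average="macro")

print("CV accuracy per fold:", cv_scores)
print("Test accuracy:", test_accuracy)
print("Test macro F1:", test_f1)
```

Wrapping the scaler and the classifier in one pipeline is a design choice rather than a requirement: it keeps statistics from the validation folds out of the preprocessing step, which would otherwise make the cross-validation estimate optimistic.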



