Ask Ghassem - Recent questions tagged scikit-learn

Kmeans clustering in python - Giving original labels to predicted clusters

Wed, 27 Apr 2022 05:32:54 +0000

I have a dataset with 7 labels in the target variable.

X = data.drop('target', axis=1)
Y = data['target']
Y.unique()

array(['Normal_Weight', 'Overweight_Level_I', 'Overweight_Level_II',
'Obesity_Type_I', 'Insufficient_Weight', 'Obesity_Type_II',
'Obesity_Type_III'], dtype=object)

from sklearn.cluster import KMeans
import numpy as np

km = KMeans(n_clusters=7, init="k-means++", random_state=300)
km.fit_predict(X)
np.unique(km.labels_)

array([0, 1, 2, 3, 4, 5, 6])

After running the KMeans clustering algorithm with 7 clusters, the resulting clusters are labeled 0, 1, 2, 3, 4, 5, 6. How do I know which original label corresponds to each predicted cluster label?

In other words, I want to assign the original label names to the predicted cluster labels, so that the two can be compared, for example to count how many samples were clustered correctly (accuracy).
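
One common way to approach this is to build a contingency table of original labels against cluster ids and then pick the one-to-one assignment that maximizes the number of matching samples (the Hungarian algorithm, available as scipy.optimize.linear_sum_assignment). The sketch below is only an illustration under stated assumptions: the helper name map_clusters_to_labels and the toy arrays are hypothetical, and it assumes the number of clusters equals the number of original labels, as in the question.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import accuracy_score


def map_clusters_to_labels(y_true, cluster_ids):
    """Hypothetical helper: rename cluster ids with the best-matching label names.

    Assumes as many clusters as original labels (one-to-one matching).
    """
    y_true = np.asarray(y_true)
    cluster_ids = np.asarray(cluster_ids)

    # Index every sample by its class position and its cluster position.
    classes, y_idx = np.unique(y_true, return_inverse=True)
    clusters, c_idx = np.unique(cluster_ids, return_inverse=True)

    # Contingency table: rows are original labels, columns are clusters.
    table = np.zeros((len(classes), len(clusters)), dtype=int)
    np.add.at(table, (y_idx, c_idx), 1)

    # One-to-one assignment that maximizes the total number of matches.
    row_idx, col_idx = linear_sum_assignment(table, maximize=True)
    mapping = {clusters[c]: classes[r] for r, c in zip(row_idx, col_idx)}

    # Replace each cluster id with the label name it was matched to.
    named = np.array([mapping[c] for c in cluster_ids])
    return named, mapping


# Toy data (made up for illustration only).
y_true = np.array(["a", "a", "a", "b", "b", "c", "c", "c"])
cluster_ids = np.array([2, 2, 2, 0, 0, 1, 1, 0])

named, mapping = map_clusters_to_labels(y_true, cluster_ids)
acc = accuracy_score(y_true, named)
print(mapping)
print(acc)
```

With the variables from the question, the call would look like map_clusters_to_labels(Y, km.labels_). Keep in mind that the matching is chosen after seeing the true labels, so the resulting accuracy describes how well clusters align with classes rather than predictive performance.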

Can I use a single Pipeline for multiple estimators in scikit-learn?

Tue, 18 Feb 2020 14:14:30 +0000

Is there any proper way to combine multiple classifiers and their parameter grids in one Pipeline?
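
One pattern that scikit-learn supports is to treat the final pipeline step itself as a searchable parameter: GridSearchCV accepts a list of parameter grids, and each grid can replace the step with a different estimator and tune that estimator's own settings. The following is a minimal sketch, assuming the iris dataset, a step named "clf", and the two classifiers and value ranges shown (all chosen only for illustration).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The "clf" step is a placeholder; each grid below swaps in its own estimator.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# One dictionary per classifier, each with that classifier's parameters.
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [KNeighborsClassifier()], "clf__n_neighbors": [3, 5, 7]},
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(type(search.best_params_["clf"]).__name__)
print(search.best_score_)
```

Each dictionary is searched independently, so parameters of one classifier are never combined with another classifier.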

score() vs accuracy_score() in sklearn

Tue, 21 Jan 2020 21:28:11 +0000

Hi,

I am still confused about when to use score() and accuracy_score(), so I want to confirm my assumptions.
Q1: With score(), we use the held-out split to test accuracy via knn.score(X_test, y_test), to avoid the bias of testing on the same training data, right? Here knn.score(X_test, y_test) just compares the pair of test values.

Q2: accuracy_score from sklearn.metrics checks the predicted target values "y_pred" against y_test, using accuracy_score(y_test, y_pred). Does it just compare the actual and predicted target values?

Q3: My result is the same with both methods. Are they doing the same thing?

Q4: With accuracy_score(), I can also compare the training targets y_train with y_train_pred (returned from knn.predict(X_train)). Is it then OK to report accuracy with accuracy_score(y_train, y_train_pred)? Since the prediction is already done and we only compare against the original data, does the bias no longer exist?

Thanks.
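
The comparison in Q1 to Q4 can be checked directly. For scikit-learn classifiers, score(X, y) returns the mean accuracy of predict(X) against y, so it should agree with accuracy_score on the same data. The sketch below assumes the iris dataset and a 5-neighbor KNN, which are not from the question and are used only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Q1 and Q2: two routes to the accuracy on the held-out test split.
test_score = knn.score(X_test, y_test)
test_acc = accuracy_score(y_test, knn.predict(X_test))

# Q4: accuracy on the data the model was fitted on.
train_acc = accuracy_score(y_train, knn.predict(X_train))

print(test_score, test_acc, train_acc)
```

The training accuracy is still measured on samples the model has already seen, whichever function computes it, so it tends to be optimistic compared with the test accuracy.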

Python Machine Learning: Scikit-Learn Tutorial

Fri, 01 Feb 2019 02:57:07 +0000

Regarding the DataCamp tutorial "Python Machine Learning: Scikit-Learn Tutorial": the author considers the use cases that are relevant to the digits data set so that she can select an appropriate machine learning algorithm. The reader is directed to the scikit-learn machine learning map. Here is the excerpt from the tutorial:

As your use case was one for clustering, you can follow the path on the map towards “KMeans”. You’ll see the use case that you have just thought about requires you to have more than 50 samples (“check!”), to have labeled data (“check!”), to know the number of categories that you want to predict (“check!”) and to have less than 10K samples (“check!”).

However, if you follow the learning map based on the listed use cases, KMeans is not the algorithm you would arrive at. According to the map, you would only arrive at the KMeans algorithm if you do NOT have labelled data. But the digits dataset contains labels.

When KMeans does not return optimal results, the learning map suggests trying the Spectral Clustering or GMM algorithms. But the author selected SVC (which is a classification algorithm, not a clustering algorithm), when KMeans didn't work.

Did the author select the wrong algorithm or is the learning map incorrect? Should classification or clustering have been used?
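
To make the distinction concrete, here is a small sketch that runs both kinds of algorithm on the digits data. It is not the tutorial's code: the split ratio, random_state values, and the use of adjusted_rand_score to compare clusters with the known labels are assumptions made for illustration. KMeans never sees the labels during fitting, while SVC is trained on them.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, adjusted_rand_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Clustering: the labels are not used for fitting, only for evaluation afterwards.
km = KMeans(n_clusters=10, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)
ari = adjusted_rand_score(y, cluster_ids)

# Classification: the labels of the training split are used to fit the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
svc = SVC().fit(X_train, y_train)
svc_accuracy = accuracy_score(y_test, svc.predict(X_test))

print("KMeans adjusted Rand index:", ari)
print("SVC test accuracy:", svc_accuracy)
```

The two numbers are different metrics and are not directly comparable; the sketch only shows that labeled data allows a supervised model, whereas clustering ignores the labels while fitting.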

What is the difference between normalization and feature scaling

Sun, 14 Oct 2018 22:06:28 +0000

I am wondering what the difference is between normalization and feature scaling, and which of the two usually comes first when working on a machine learning project.

Also, it would be nice if somebody could post the scikit-learn library functions for normalization and feature scaling.
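
The terminology is not used consistently across sources, so the sketch below simply shows three scikit-learn preprocessing classes that are commonly meant: MinMaxScaler (rescales each feature to a range, often called normalization), StandardScaler (standardizes each feature to zero mean and unit variance), and Normalizer (rescales each sample, that is each row, to unit norm). The small array is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Toy data: 3 samples, 2 features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Per-feature (column) rescaling to the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

# Per-feature (column) standardization: mean 0, standard deviation 1.
X_std = StandardScaler().fit_transform(X)

# Per-sample (row) rescaling so that each row has L2 norm 1.
X_unit = Normalizer(norm="l2").fit_transform(X)

print(X_minmax)
print(X_std)
print(X_unit)
```

In a real project the scalers would typically be fitted on the training split only and then applied to the test split with transform().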

What is the fastest way to learn scikit-learn?

Tue, 25 Sep 2018 16:23:43 +0000

I know Python and I am looking for the fastest way, or a quick tutorial, to learn how to start using the scikit-learn library.

What is the best roadmap to choose the right estimator in scikit-learn?

Tue, 25 Sep 2018 16:03:35 +0000

I am looking for a roadmap for choosing the right estimator in scikit-learn.