<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent questions tagged scikit-learn</title>
<link>https://ask.ghassem.com/tag/scikit-learn</link>
<description>Powered by Question2Answer</description>
<item>
<title>Kmeans clustering in python - Giving original labels to predicted clusters</title>
<link>https://ask.ghassem.com/1022/kmeans-clustering-python-giving-original-predicted-clusters</link>
<description>&lt;p&gt;I have a dataset with 7 labels in the target variable.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
X = data.drop(&#039;target&#039;, axis=1)
Y = data[&#039;target&#039;]
Y.unique()&lt;/pre&gt;

&lt;p&gt;array([&#039;Normal_Weight&#039;, &#039;Overweight_Level_I&#039;, &#039;Overweight_Level_II&#039;,&lt;br&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&#039;Obesity_Type_I&#039;, &#039;Insufficient_Weight&#039;, &#039;Obesity_Type_II&#039;,&lt;br&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&#039;Obesity_Type_III&#039;], dtype=object)&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
import numpy as np
from sklearn.cluster import KMeans

km = KMeans(n_clusters=7, init=&quot;k-means++&quot;, random_state=300)
km.fit_predict(X)
np.unique(km.labels_)&lt;/pre&gt;

&lt;p&gt;array([0, 1, 2, 3, 4, 5, 6])&lt;/p&gt;

&lt;p&gt;After running the KMeans clustering algorithm with 7 clusters, the resulting clusters are labeled 0, 1, 2, 3, 4, 5, 6. But how do I know which original label corresponds to each predicted cluster label?&lt;/p&gt;

&lt;p&gt;In other words, I want to know how to assign the original label names to the predicted cluster labels, so that I can count how many samples were clustered correctly (accuracy).&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1022/kmeans-clustering-python-giving-original-predicted-clusters</guid>
<pubDate>Wed, 27 Apr 2022 05:32:54 +0000</pubDate>
</item>
<item>
<title>Can I use a single Pipeline for multiple estimators in scikit-learn?</title>
<link>https://ask.ghassem.com/819/can-use-single-pipeline-for-multiple-estimators-scikit-learn</link>
<description>Is there any proper way to combine multiple classifiers and their parameter grids in one Pipeline?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/819/can-use-single-pipeline-for-multiple-estimators-scikit-learn</guid>
<pubDate>Tue, 18 Feb 2020 14:14:30 +0000</pubDate>
</item>
<item>
<title>score() vs accuracy_score() in sklearn</title>
<link>https://ask.ghassem.com/777/score-vs-accuracyscore-in-sklearn</link>
<description>Hi,&lt;br /&gt;
&lt;br /&gt;
I am still confused about when to use score() and accuracy_score(), so I want to confirm my understanding.&lt;br /&gt;
Q1: With score(), we evaluate on the held-out split using knn.score(X_test, y_test) to avoid the bias of testing on the same data used for training, right? Here knn.score(X_test, y_test) just compares the pairs of test values.&lt;br /&gt;
&lt;br /&gt;
Q2: accuracy_score from sklearn.metrics checks the predicted target values &amp;quot;y_pred&amp;quot; against y_test, using accuracy_score(y_test, y_pred). Does it just compare the actual target values with the predicted target values?&lt;br /&gt;
&lt;br /&gt;
Q3: My result is the same with both methods. Are they doing the same thing?&lt;br /&gt;
&lt;br /&gt;
Q4: With accuracy_score(), I can also compare the training targets y_train with y_train_pred (returned from knn.predict(X_train)). Is it then OK to report the accuracy as accuracy_score(y_train, y_train_pred)? Since the prediction is already done and I am only comparing it with the original data, does the bias no longer exist?&lt;br /&gt;
&lt;br /&gt;
Thanks.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/777/score-vs-accuracyscore-in-sklearn</guid>
<pubDate>Tue, 21 Jan 2020 21:28:11 +0000</pubDate>
</item>
<item>
<title>Python Machine Learning: Scikit-Learn Tutorial</title>
<link>https://ask.ghassem.com/560/python-machine-learning-scikit-learn-tutorial</link>
<description>&lt;p&gt;In the DataCamp tutorial &quot;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.datacamp.com/community/tutorials/machine-learning-python&quot;&gt;Python Machine Learning: Scikit-Learn Tutorial&lt;/a&gt;&quot;, the author considers the use cases relevant to the digits dataset so that she can select an appropriate machine learning algorithm. The reader is directed to the &lt;a rel=&quot;nofollow&quot; href=&quot;https://scikit-learn.org/stable/tutorial/machine_learning_map/&quot;&gt;scikit-learn machine learning map&lt;/a&gt;. Here is the excerpt from the tutorial:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;As your use case was one for clustering, you can follow the path on the map towards “KMeans”. You’ll see the use case that you have just thought about requires you to have more than 50 samples (“check!”), to have labeled data (“check!”), to know the number of categories that you want to predict (“check!”) and to have less than 10K samples (“check!”).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;However, if you follow the learning map based on the listed use cases, KMeans is not the algorithm you would arrive at. According to the map, you would only arrive at the KMeans algorithm if you do NOT have labelled data. But the digits dataset contains labels.&lt;/p&gt;

&lt;p&gt;When KMeans does not return optimal results, the learning map suggests trying the Spectral Clustering or GMM algorithms. But when KMeans didn&#039;t work, the author selected SVC, which is a classification algorithm, not a clustering algorithm.&lt;/p&gt;

&lt;p&gt;Did the author select the wrong algorithm or is the learning map incorrect? Should classification or clustering have been used?&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/560/python-machine-learning-scikit-learn-tutorial</guid>
<pubDate>Fri, 01 Feb 2019 02:57:07 +0000</pubDate>
</item>
<item>
<title>What is the difference between normalization and feature scaling</title>
<link>https://ask.ghassem.com/386/what-the-difference-between-normalization-feature-scaling</link>
<description>I am wondering what the difference is between normalization and feature scaling, and which of the two usually comes first when working on a machine learning project.&lt;br /&gt;
&lt;br /&gt;
Also, it would be nice if somebody could post the scikit-learn library functions for normalization and feature scaling.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/386/what-the-difference-between-normalization-feature-scaling</guid>
<pubDate>Sun, 14 Oct 2018 22:06:28 +0000</pubDate>
</item>
<item>
<title>What is the fastest way to learn scikit-learn?</title>
<link>https://ask.ghassem.com/153/what-is-the-fastest-way-to-learn-scikit-learn</link>
<description>I know Python and I am looking for the fastest way, or a quick tutorial, to learn how to start using the scikit-learn library.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/153/what-is-the-fastest-way-to-learn-scikit-learn</guid>
<pubDate>Tue, 25 Sep 2018 16:23:43 +0000</pubDate>
</item>
<item>
<title>What is the best roadmap to choose the right estimator in scikit-learn?</title>
<link>https://ask.ghassem.com/150/what-the-best-roadmap-choose-the-right-estimator-scikit-learn</link>
<description>I am looking for a roadmap for choosing the right estimator in scikit-learn.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/150/what-the-best-roadmap-choose-the-right-estimator-scikit-learn</guid>
<pubDate>Tue, 25 Sep 2018 16:03:35 +0000</pubDate>
</item>
</channel>
</rss>