<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent questions tagged test-set</title>
<link>https://ask.ghassem.com/tag/test-set</link>
<description>Powered by Question2Answer</description>
<item>
<title>How do I know which encoder to use to convert from categorical variables to numerical?</title>
<link>https://ask.ghassem.com/1006/know-which-encoder-convert-categorical-variables-numerical</link>
<description>Say I have a column of categorical data, such as temperature levels: &amp;#039;Lukewarm&amp;#039;, &amp;#039;Hot&amp;#039;, &amp;#039;Scalding&amp;#039;, &amp;#039;Cold&amp;#039;, &amp;#039;Frostbite&amp;#039;, etc.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I know that we can use pd.get_dummies to convert the column to numerical data within the dataframe, but I also know that there are other &amp;#039;converters&amp;#039; (not sure if that&amp;#039;s the correct terminology) that we can use, e.g. OneHotEncoder from scikit-learn (for instance, I could use the pipeline module to build a pipeline and feed my dataframe through it so that my categorical data is encoded as numerical).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
How do I know which one to use? Does it matter? If it does, when does it matter most (e.g. for which types of problems, or when there are many categorical variables versus few)? If anyone can give me any pointers on this, I&amp;#039;d greatly appreciate it.</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1006/know-which-encoder-convert-categorical-variables-numerical</guid>
<pubDate>Mon, 29 Nov 2021 04:09:06 +0000</pubDate>
</item>
<item>
<title>What is the difference between cross-validation and validation set?</title>
<link>https://ask.ghassem.com/648/what-the-difference-between-cross-validation-and-validation</link>
<description>&lt;p&gt;I am confused about this figure. Is this cross-validation, or is there a fixed set of held-out examples that the model is tested on while the various folds are also being evaluated?&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://i.imgur.com/aVru1MX.png&quot;&gt;https://i.imgur.com/aVru1MX.png&lt;/a&gt;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/648/what-the-difference-between-cross-validation-and-validation</guid>
<pubDate>Wed, 19 Jun 2019 18:39:39 +0000</pubDate>
</item>
<item>
<title>How do I know when it is appropriate to use stratified sampling?</title>
<link>https://ask.ghassem.com/568/how-do-know-when-it-is-appropriate-to-use-stratified-sampling</link>
<description></description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/568/how-do-know-when-it-is-appropriate-to-use-stratified-sampling</guid>
<pubDate>Tue, 19 Feb 2019 18:52:48 +0000</pubDate>
</item>
<item>
<title>What are Training set, Validation set, Test set, and Gold set in supervised and unsupervised machine learning?</title>
<link>https://ask.ghassem.com/294/training-validation-supervised-unsupervised-machine-learning</link>
<description></description>
<category>Machine Learning Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/294/training-validation-supervised-unsupervised-machine-learning</guid>
<pubDate>Mon, 08 Oct 2018 11:48:29 +0000</pubDate>
</item>
</channel>
</rss>