<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent questions tagged imbalanced-data</title>
<link>https://ask.ghassem.com/tag/imbalanced-data</link>
<description>Powered by Question2Answer</description>
<item>
<title>How to analyse an imbalanced categorical column in a dataset</title>
<link>https://ask.ghassem.com/1042/how-to-analyse-imbalanced-categorical-colum-in-dataset</link>
<description>Hello,&lt;br /&gt;
&lt;br /&gt;
I have a dataset with a categorical column that contains three categories. One category accounts for 98% of the data, while the remaining 2% is split between the other two categories, with only a few rows (maybe around 50) in each. It is worth mentioning that the output is the same for all of these minority-category rows, which suggests that these data points may be important.&lt;br /&gt;
&lt;br /&gt;
However, the column is clearly imbalanced, and I have been unable to perform any meaningful analysis on it. Should I drop the entire column, or run a chi-square test on the data as is?</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/1042/how-to-analyse-imbalanced-categorical-colum-in-dataset</guid>
<pubDate>Sat, 24 Jun 2023 17:55:23 +0000</pubDate>
</item>
</channel>
</rss>