<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent questions tagged data-science</title>
<link>https://ask.ghassem.com/tag/data-science</link>
<description>Powered by Question2Answer</description>
<item>
<title>How to analyse an imbalanced categorical column in a dataset</title>
<link>https://ask.ghassem.com/1042/how-to-analyse-imbalanced-categorical-colum-in-dataset</link>
<description>Hello,&lt;br /&gt;
&lt;br /&gt;
I have a dataset with a categorical column that contains three categories. One of the categories represents 98% of the data, while the remaining 2% are distributed between the other two categories, with a few (maybe around 50) in each. It is worth mentioning that the output for these 50 rows is the same, which suggests that these data points may be important.&lt;br /&gt;
&lt;br /&gt;
However, the data is obviously imbalanced, and I am unable to perform any analysis. Should I drop the entire column, or perform a chi-square test on the data as-is?</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/1042/how-to-analyse-imbalanced-categorical-colum-in-dataset</guid>
<pubDate>Sat, 24 Jun 2023 17:55:23 +0000</pubDate>
</item>
<item>
<title>Can you verify the validity of this chart comparing the review scores for Marvel Phase 4?</title>
<link>https://ask.ghassem.com/1030/verify-validity-chart-comparing-review-scores-marvel-phase</link>
<description>&lt;p&gt;I have some skepticism about the validity of the charts below comparing the critic and audience reviews for Phase 4 of the MCU to the previous 3 phases. There are over 18 movies and TV shows in Phase 4 compared to the 6 movies in Phases 1 &amp;amp; 2 and the 11 movies in Phase 3. Also, there are far fewer critic reviews for the Phase 4 TV shows than the Phase 4 movies. For example, on Rotten Tomatoes there are only 40 critic reviews for The Falcon and the Winter Soldier and 452 critic reviews for Black Widow. Could this uneven and inconsistent number of reviews between TV shows and movies in Phase 4 be inaccurately making the overall averages higher than they should be? Or do you agree with the conclusions presented in the charts?&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://cdn.discordapp.com/attachments/997145183172964435/1059948060194652230/image.png&quot;&gt;https://cdn.discordapp.com/attachments/997145183172964435/1059948060194652230/image.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://cdn.discordapp.com/attachments/997145183172964435/1049356020469739520/image.png&quot;&gt;https://cdn.discordapp.com/attachments/997145183172964435/1049356020469739520/image.png&lt;/a&gt;&lt;/p&gt;</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1030/verify-validity-chart-comparing-review-scores-marvel-phase</guid>
<pubDate>Mon, 09 Jan 2023 16:29:14 +0000</pubDate>
</item>
<item>
<title>Creating tables from unstructured texts about stock market</title>
<link>https://ask.ghassem.com/1026/creating-tables-from-unstructured-texts-about-stock-market</link>
<description>&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;p&gt;I am trying to extract information such as profits, revenues, and other figures, along with their corresponding dates and quarters, from unstructured text about the stock market and convert it into a tabular report. Because the input text has no fixed format, it is hard to know which entity belongs to which date and quarter, and which value belongs to which entity. Chunking works on a few documents, but not on enough of them. Is there an unsupervised way to link entities with their corresponding dates, values, and quarters?&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1026/creating-tables-from-unstructured-texts-about-stock-market</guid>
<pubDate>Tue, 02 Aug 2022 00:47:49 +0000</pubDate>
</item>
<item>
<title>How do I compare the count of a value in each year while having a different sample size each year?</title>
<link>https://ask.ghassem.com/1025/compare-count-value-each-year-while-having-different-sanple</link>
<description>How do I accurately compare a count measured by an employee survey from year to year when both the survey response rate and the number of employees vary?&lt;br /&gt;
&lt;br /&gt;
Suppose I measure employee satisfaction over the years by surveying my employees each year, asking whether they are satisfied or not, and then comparing the number of yeses across years. The number of employees who answer is not the same each year, and the number of employees increases every year. How do I correctly compare the results across years?&lt;br /&gt;
&lt;br /&gt;
In other words, how do I remove the effect of the survey engagement rate when calculating the results?</description>
<category>general</category>
<guid isPermaLink="true">https://ask.ghassem.com/1025/compare-count-value-each-year-while-having-different-sanple</guid>
<pubDate>Wed, 08 Jun 2022 10:32:33 +0000</pubDate>
</item>
<item>
<title>Is it possible to make a forecast of a future value of Air Temperature using Fast Fourier Transform?</title>
<link>https://ask.ghassem.com/1024/possible-forecast-future-value-temperature-fourier-transform</link>
<description>Is it possible to forecast a future value of air temperature using the Fast Fourier Transform? If yes, what would the process be, or how would you do it? Thank you!</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/1024/possible-forecast-future-value-temperature-fourier-transform</guid>
<pubDate>Thu, 02 Jun 2022 16:10:26 +0000</pubDate>
</item>
<item>
<title>Forecast log-transformed fitted values for 2 years using an ARMA model</title>
<link>https://ask.ghassem.com/1023/forecast-transformed-fitted-values-years-using-arma-model</link>
<description>The input is a stock price under an exponential transformation. We are asked to forecast 2 years ahead using the ARMA results.</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1023/forecast-transformed-fitted-values-years-using-arma-model</guid>
<pubDate>Wed, 04 May 2022 20:31:44 +0000</pubDate>
</item>
<item>
<title>Bankruptcy prediction and credit card</title>
<link>https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</link>
<description>Hello everyone, newbie data scientist here.&lt;br /&gt;
I&amp;#039;m working on a project to predict companies&amp;#039; bankruptcy probability (probability of default) and to assign them a credit rating/score based on that.&lt;br /&gt;
For example, a probability below 50 is good and one above it is bad (just as an example).&lt;br /&gt;
I have a dataset containing financial ratios and a class label indicating whether the company went bankrupt or not (0 or 1).&lt;br /&gt;
I&amp;#039;m planning to use these models:&lt;br /&gt;
Logistic regression, linear discriminant analysis, decision trees, random forest, ANN, AdaBoost, SVM.&lt;br /&gt;
&lt;br /&gt;
The question is, and I know it is a dumb question:&lt;br /&gt;
Do those models return a probability that I can transform into labels? I saw that in a thesis and I&amp;#039;m not sure about it.&lt;br /&gt;
&lt;br /&gt;
Otherwise, any guidance, tips, or anything else will be appreciated.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</guid>
<pubDate>Sun, 10 Apr 2022 05:50:14 +0000</pubDate>
</item>
<item>
<title>I cannot get this code to work. Please help.</title>
<link>https://ask.ghassem.com/1018/i-cannot-get-this-code-to-work-please-help</link>
<description>&lt;p&gt;from keras.models import Sequential&amp;nbsp;&lt;br&gt;
from keras.layers import Dense&amp;nbsp;&lt;br&gt;
from keras.layers import LSTM&amp;nbsp;&lt;br&gt;
from sklearn.model_selection import train_test_split&lt;/p&gt;

&lt;p&gt;model = Sequential()&amp;nbsp;&lt;br&gt;
model.add(LSTM( 10, input_shape=(1, 1)))&amp;nbsp;&lt;br&gt;
model.add(Dense(1, activation=&quot;linear&quot;))&amp;nbsp;&lt;br&gt;
model.compile(loss=&quot;mse&quot;, optimizer=&quot;adam&quot;)&lt;/p&gt;

&lt;p&gt;X, y = get_data()&lt;/p&gt;

&lt;p&gt;X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)&lt;br&gt;
X_train_2, X_val, y_train_2, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)&lt;/p&gt;

&lt;p&gt;model.fit(X_train, y_train, epochs=800, validation_data=(X_val, y_val), shuffle=False)&lt;/p&gt;
</description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/1018/i-cannot-get-this-code-to-work-please-help</guid>
<pubDate>Mon, 21 Mar 2022 05:59:53 +0000</pubDate>
</item>
<item>
<title>Battery data projects</title>
<link>https://ask.ghassem.com/1017/battery-data-projects</link>
<description>Where can I find projects related to battery data?</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1017/battery-data-projects</guid>
<pubDate>Wed, 02 Mar 2022 18:11:57 +0000</pubDate>
</item>
<item>
<title>How can you build a dynamic pricing model with data only from rigid pricing?</title>
<link>https://ask.ghassem.com/1016/build-dynamic-pricing-model-with-data-only-from-rigid-pricing</link>
<description>I want to build a dynamic pricing model, meaning that if a product is too expensive for a client and there is a risk that we might lose them, we lower the price for them, but if a client doesn&amp;#039;t care that much about the price, we might increase it a little.&lt;br /&gt;
&lt;br /&gt;
All the articles I&amp;#039;ve seen describe some kind of A/B testing of the pricing followed by building a model.&lt;br /&gt;
&lt;br /&gt;
I want to build a model using only the existing rigid pricing data. I have the prices offered to customers, and I know who bought the product and who went to another company.&lt;br /&gt;
&lt;br /&gt;
How can I handle the price-increase part?</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1016/build-dynamic-pricing-model-with-data-only-from-rigid-pricing</guid>
<pubDate>Fri, 21 Jan 2022 06:44:31 +0000</pubDate>
</item>
<item>
<title>Do you usually collect your own data, or is there always a resource available for you? Or does it depend on the company?</title>
<link>https://ask.ghassem.com/1014/usually-collect-always-resource-available-depends-company</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/1014/usually-collect-always-resource-available-depends-company</guid>
<pubDate>Sun, 09 Jan 2022 22:13:34 +0000</pubDate>
</item>
<item>
<title>When dealing with categorical values, should the &#039;year&#039; column be encoded using OHE or OrdinalEncoder?</title>
<link>https://ask.ghassem.com/1012/dealing-categorical-values-should-encoded-ordinalencoder</link>
<description>It&amp;#039;s a car prices dataset, so I&amp;#039;m assuming that the more recent a car is, the more value it should have. The values in the &amp;#039;year&amp;#039; column simply consist of years from 1995 to 2020.&lt;br /&gt;
I am trying to predict the selling price of the car.&lt;br /&gt;
&lt;br /&gt;
I&amp;#039;m a bit new to ML and still doing my undergraduate degree, so any help or tips are appreciated. Thank you.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1012/dealing-categorical-values-should-encoded-ordinalencoder</guid>
<pubDate>Sat, 18 Dec 2021 18:46:07 +0000</pubDate>
</item>
<item>
<title>How do I know which encoder to use to convert from categorical variables to numerical?</title>
<link>https://ask.ghassem.com/1006/know-which-encoder-convert-categorical-variables-numerical</link>
<description>Say I have a column with categorical data such as temperature levels: &amp;#039;Lukewarm&amp;#039;, &amp;#039;Hot&amp;#039;, &amp;#039;Scalding&amp;#039;, &amp;#039;Cold&amp;#039;, &amp;#039;Frostbite&amp;#039;, etc.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I know that we can use pd.get_dummies to convert the column to numerical data within the dataframe, but I also know that there are other &amp;#039;converters&amp;#039; (not sure if that&amp;#039;s the correct terminology) that we can use, e.g. OneHotEncoder from scikit-learn (for example, I could use the pipeline module to build a pipeline and feed my dataframe through it to get my categorical data encoded as numerical).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
How do I know which to use? Does it matter? If it does matter, when does it matter the most (i.e. what types of problems? When there are lots of categorical variables, or few?) If anyone can give me any pointers on this type of stuff I&amp;#039;d greatly appreciate it.</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1006/know-which-encoder-convert-categorical-variables-numerical</guid>
<pubDate>Mon, 29 Nov 2021 04:09:06 +0000</pubDate>
</item>
<item>
<title>Are Text Mining, Artificial Neural Networks, Speech Processing, and Cloud Computing in DS essential for a good Data Scientist?</title>
<link>https://ask.ghassem.com/1004/artificial-networks-processing-computing-essential-scientist</link>
<description></description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1004/artificial-networks-processing-computing-essential-scientist</guid>
<pubDate>Wed, 27 Oct 2021 19:15:16 +0000</pubDate>
</item>
<item>
<title>Should I start as a data analyst and then move to data science?</title>
<link>https://ask.ghassem.com/994/should-i-start-as-a-data-analyst-then-data-science</link>
<description>Should I start as a data analyst and then move to data science?&lt;br /&gt;
&lt;br /&gt;
I am a second-year Bachelor&amp;#039;s student in Computer Science and want to become a Data Scientist.&lt;br /&gt;
&lt;br /&gt;
However, when I try to apply for internships/jobs, most of them require a Master&amp;#039;s/Ph.D.&lt;br /&gt;
&lt;br /&gt;
A Data Analyst role, on the other hand, has fewer requirements.&lt;br /&gt;
&lt;br /&gt;
Do you recommend starting off as a Data Analyst and then changing to Data Science?</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/994/should-i-start-as-a-data-analyst-then-data-science</guid>
<pubDate>Mon, 21 Jun 2021 20:31:04 +0000</pubDate>
</item>
<item>
<title>How to calculate average with deviating sensors?</title>
<link>https://ask.ghassem.com/983/how-to-calculate-average-with-deviating-sensors</link>
<description>Three sensors each report many values, and one sensor might be off. The average of the two trustworthy sensors is to be reported; the third, which needs recalibration, is to be ignored. I need an (Excel) formula that looks at three columns and, row by row, detects a significant deviation of one value from the others and calculates the average of the most trustworthy values.&lt;br /&gt;
Example:&lt;br /&gt;
48.1 ; 45.2 ; 45.4 =&amp;gt; 45.3, as sensor 1 is way off.&lt;br /&gt;
36.0 ; 37.0 ; 45.0 =&amp;gt; 36.5, as sensor 3 is way off.&lt;br /&gt;
36.0 ; 36.5 ; 37.0 =&amp;gt; 36.5, as the deviation is too small to be considered an anomaly, so all values are used for the average.&lt;br /&gt;
&lt;br /&gt;
Over long periods of time, the readings of a sensor might be trustworthy for a few weeks but defective from moment X up until now, so simply ruling out one sensor is not really an option either. What is the best way forward?&lt;br /&gt;
Please help. Highly appreciated.</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/983/how-to-calculate-average-with-deviating-sensors</guid>
<pubDate>Tue, 04 May 2021 14:39:14 +0000</pubDate>
</item>
<item>
<title>Design a computer-based system that will encourage autistic children to communicate and express themselves better.</title>
<link>https://ask.ghassem.com/982/computer-encourage-autistic-children-communicate-themselves</link>
<description>a) A company has been asked to design a computer-based system that will encourage autistic children to communicate and express themselves better.&lt;br /&gt;
&lt;br /&gt;
b) What type of interaction would be appropriate to use at the interface for this particular user group?</description>
<category>Human Computer Interaction</category>
<guid isPermaLink="true">https://ask.ghassem.com/982/computer-encourage-autistic-children-communicate-themselves</guid>
<pubDate>Thu, 01 Apr 2021 07:04:59 +0000</pubDate>
</item>
<item>
<title>Guidance on sequencing the data science courses below</title>
<link>https://ask.ghassem.com/844/guidance-on-sequencing-data-science-courses-below</link>
<description>Hello,&lt;br /&gt;
&lt;br /&gt;
My name is Lutaaya Mudathiru.&lt;br /&gt;
&lt;br /&gt;
I am planning to start online professional data science courses at Harvard University, but I don&amp;#039;t know which course I should begin with. I would like help sequencing the courses below so that I can benefit more:&lt;br /&gt;
&lt;br /&gt;
1. Principles, Statistical and Computational Tools for Reproducible Science&lt;br /&gt;
&lt;br /&gt;
2. Data Science: Inference and Modeling&lt;br /&gt;
&lt;br /&gt;
3. Data Science: Productivity Tools&lt;br /&gt;
&lt;br /&gt;
4. Data Science: Wrangling&lt;br /&gt;
&lt;br /&gt;
5. Data Science: Linear Regression&lt;br /&gt;
&lt;br /&gt;
6. Data Science: Machine Learning&lt;br /&gt;
&lt;br /&gt;
7. Data Science: Capstone&lt;br /&gt;
&lt;br /&gt;
8. Data Science: R Basics&lt;br /&gt;
&lt;br /&gt;
9. Data Science: Visualization&lt;br /&gt;
&lt;br /&gt;
10. Data Science: Probability&lt;br /&gt;
&lt;br /&gt;
11. High-Dimensional Data Analysis&lt;br /&gt;
&lt;br /&gt;
12. Introduction to Linear Models and Matrix Algebra&lt;br /&gt;
&lt;br /&gt;
13. Data Science: Statistics and R&lt;br /&gt;
&lt;br /&gt;
14. Fat Chance: Probability from the Ground Up&lt;br /&gt;
&lt;br /&gt;
15. Introduction to Probability (on edX)</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/844/guidance-on-sequencing-data-science-courses-below</guid>
<pubDate>Fri, 20 Mar 2020 13:55:49 +0000</pubDate>
</item>
<item>
<title>What are the differences among Data Science, Artificial Intelligence and Machine Learning?</title>
<link>https://ask.ghassem.com/842/differences-science-artificial-intelligence-machine-learning</link>
<description>What are the differences among Data Science, Artificial Intelligence and Machine Learning?</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/842/differences-science-artificial-intelligence-machine-learning</guid>
<pubDate>Thu, 05 Mar 2020 03:02:31 +0000</pubDate>
</item>
<item>
<title>What are the most common data types in data science?</title>
<link>https://ask.ghassem.com/834/what-are-the-most-common-data-types-in-data-science</link>
<description>What are the main data types?</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/834/what-are-the-most-common-data-types-in-data-science</guid>
<pubDate>Wed, 19 Feb 2020 17:28:54 +0000</pubDate>
</item>
<item>
<title>How can I prove that the following is a tautology (using laws of logical equivalences)?</title>
<link>https://ask.ghassem.com/818/prove-that-following-tautology-using-logical-equivalences</link>
<description>$(p → q) ∧ (q → r) → (p → r)$&lt;br /&gt;
&lt;br /&gt;
Without Truth Tables!</description>
<category>Discrete Mathematics</category>
<guid isPermaLink="true">https://ask.ghassem.com/818/prove-that-following-tautology-using-logical-equivalences</guid>
<pubDate>Sat, 15 Feb 2020 17:54:02 +0000</pubDate>
</item>
<item>
<title>How to filter a dataframe?</title>
<link>https://ask.ghassem.com/775/how-to-filter-a-dataframe</link>
<description>&lt;p&gt;Consider the Pandas DataFrame&amp;nbsp;&lt;code&gt;df&lt;/code&gt;&amp;nbsp;below. Filter it appropriately so that it produces the expected output shown.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
     gh owner language      repo  stars
0  pandas-dev   python    pandas  17800
1   tidyverse        R     dplyr   2800
2   tidyverse        R   ggplot2   3500
3      has2k1   python  plotnine   1450&lt;/pre&gt;

&lt;h2&gt;Expected Output&lt;/h2&gt;

&lt;pre class=&quot;prettyprint lang-&quot; data-pbcklang=&quot;&quot; data-pbcktabsize=&quot;&quot;&gt;
     gh owner language    repo  stars
0  pandas-dev   python  pandas  17800&lt;/pre&gt;</description>
<category>Python Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/775/how-to-filter-a-dataframe</guid>
<pubDate>Wed, 25 Dec 2019 05:56:14 +0000</pubDate>
</item>
<item>
<title>Can anyone please solve Q7 pg 329 in Induction?</title>
<link>https://ask.ghassem.com/761/can-anyone-please-solve-q7-pg-329-in-induction</link>
<description></description>
<category>Discrete Mathematics</category>
<guid isPermaLink="true">https://ask.ghassem.com/761/can-anyone-please-solve-q7-pg-329-in-induction</guid>
<pubDate>Thu, 05 Dec 2019 22:02:15 +0000</pubDate>
</item>
<item>
<title>Individual and group relative strength in a fixed pool of players: How to approach the problem?</title>
<link>https://ask.ghassem.com/751/individual-group-relative-strength-players-approach-problem</link>
<description>&lt;div&gt;I apologize in advance if my question sounds too basic to be worthy of anyone&#039;s time, but statistics is not part of my curriculum.&lt;/div&gt;

&lt;div&gt;
&lt;p&gt;I am developing a proof of concept of a web application modeling the contribution of individual soccer players with respect to the different teams they&#039;ve played with throughout their careers. In particular, I am looking into a way of &lt;em&gt;ranking&lt;/em&gt; both individuals and groups of players as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;teammates relative strength&lt;/strong&gt;: the best/worst combinations of players when playing in the same team in the same matches;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;opponents relative strength&lt;/strong&gt;: the best/worst combinations of players when playing in opposite teams in the same matches, i.e. which tuples of teammates are the best/worst against which;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I must admit I don&#039;t quite know how to approach the problem (as I said, I have no formal education in statistics or data science). I would be very grateful if anyone could give me some directions. How should I frame this particular problem, and what resources in statistics or machine learning (if indeed this is a task fit for machine learning; perhaps I am mistaken on this) would be appropriate to tackle it?&lt;/p&gt;

&lt;p&gt;I am eager to learn, so both practical examples and theoretical references (book chapters, online articles, etc.) would be very welcome.&lt;/p&gt;

&lt;p&gt;Thanks in advance!&lt;/p&gt;
&lt;/div&gt;</description>
<category>Statistics</category>
<guid isPermaLink="true">https://ask.ghassem.com/751/individual-group-relative-strength-players-approach-problem</guid>
<pubDate>Tue, 29 Oct 2019 20:00:28 +0000</pubDate>
</item>
<item>
<title>Ideas and opinions on what kind of analysis needs to be done</title>
<link>https://ask.ghassem.com/713/ideas-and-opinion-on-what-kind-of-analyses-needs-to-be-done</link>
<description>Hello Guys,&lt;br /&gt;
&lt;br /&gt;
I have been given a task where I need to analyse the impact that all the products discontinued last year had on customers and sales. I work for a retail company that has hundreds of stores in North America, and my analysis needs to be &amp;quot;Qualitatively better than the regular BI&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
I quote that because that&amp;#039;s what I was asked to do. I am struggling to come up with ideas. What kind of hypotheses should I test? Are there any Bayesian analyses that can be done? Any ideas within the realms of data science and machine learning are welcome.&lt;br /&gt;
&lt;br /&gt;
I generally do my analyses in Jupyter Notebook, and my skill set consists of SQL and Python, including ML libraries like sklearn.</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/713/ideas-and-opinion-on-what-kind-of-analyses-needs-to-be-done</guid>
<pubDate>Fri, 26 Jul 2019 18:38:55 +0000</pubDate>
</item>
<item>
<title>What are the most important Python libraries for data science?</title>
<link>https://ask.ghassem.com/677/what-are-the-most-important-python-libraries-for-data-science</link>
<description></description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/677/what-are-the-most-important-python-libraries-for-data-science</guid>
<pubDate>Mon, 08 Jul 2019 04:42:28 +0000</pubDate>
</item>
<item>
<title>Is using aggregate data to generate observation-level data statistically sound?</title>
<link>https://ask.ghassem.com/644/using-aggregate-generate-observation-level-statistically</link>
<description>&lt;p&gt;Context: I work in paid search marketing. Current reporting does not provide event-level data, only aggregate totals for different segments. I want to compare distributions and test the statistical significance of A/B test results. I did not want to assume that the data follows a normal distribution or that its standard deviation is known, so I came up with this approach.&lt;/p&gt;

&lt;p&gt;My question: I am going to use the average &quot;CPA&quot; or &quot;CTR&quot; for a date range and generate one observation per conversion based on that average. Is this a statistically sound way to generate raw data? Would I get wonky distributions because of the multiple averages? I just want a gut check on whether I&#039;m completely off base.&lt;/p&gt;

&lt;p&gt;My Aggregate data looks like below:&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:500px&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope=&quot;col&quot;&gt;Day&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Cost&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Acquisition&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;CPA or CTR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Observation data I generate looks like below:&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:500px&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope=&quot;col&quot;&gt;Day&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Acquisition&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;I really appreciate your help with this question! This is an important project for me at work.&lt;/p&gt;</description>
<category>general</category>
<guid isPermaLink="true">https://ask.ghassem.com/644/using-aggregate-generate-observation-level-statistically</guid>
<pubDate>Tue, 11 Jun 2019 22:04:01 +0000</pubDate>
</item>
<item>
<title>What are the available libraries for continuous-time hidden Markov models?</title>
<link>https://ask.ghassem.com/640/what-available-libraries-continuous-hidden-markov-models</link>
<description></description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/640/what-available-libraries-continuous-hidden-markov-models</guid>
<pubDate>Fri, 07 Jun 2019 13:27:05 +0000</pubDate>
</item>
<item>
<title>Are digital marketing and marketing internships worth it for a data science student?</title>
<link>https://ask.ghassem.com/606/digital-marketing-marketing-internships-science-student</link>
<description>It&amp;#039;s about a data science career.</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/606/digital-marketing-marketing-internships-science-student</guid>
<pubDate>Thu, 04 Apr 2019 22:58:54 +0000</pubDate>
</item>
<item>
<title>How do I reduce RMSE in a Random Forest Regressor?</title>
<link>https://ask.ghassem.com/600/how-do-i-reduce-rmse-in-a-random-forest-regressor</link>
<description>I preprocessed the data, normalized the numerical features, and one-hot encoded the categorical ones. I end up with a model with R^2 = 0.7 and an RMSE that is 15% of the range of values.&lt;br /&gt;
I&amp;#039;m okay with the accuracy, but I was wondering if there&amp;#039;s a way to reduce the RMSE to maybe ~7%?&lt;br /&gt;
&lt;br /&gt;
Please let me know.&lt;br /&gt;
&lt;br /&gt;
Thanks!</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/600/how-do-i-reduce-rmse-in-a-random-forest-regressor</guid>
<pubDate>Wed, 03 Apr 2019 12:24:09 +0000</pubDate>
</item>
<item>
<title>Determine weights on the paths that connect to the different data points in a neural network?</title>
<link>https://ask.ghassem.com/589/determine-weights-connect-different-points-neural-network</link>
<description>How do you determine the weight values that connect to the other data points when solving for our output in neural networks?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/589/determine-weights-connect-different-points-neural-network</guid>
<pubDate>Mon, 18 Mar 2019 23:35:25 +0000</pubDate>
</item>
<item>
<title>I fit many different models to my data set and I still get horrible accuracy. Any suggestions?</title>
<link>https://ask.ghassem.com/581/many-different-models-still-horrible-accuracy-suggestions</link>
<description>I&amp;#039;m trying to fit a regression on a data set that does not contain any nulls and has very few outliers. I fit a linear regression, a random forest, and a GBM model, but they all have terrible accuracy.&lt;br /&gt;
&lt;br /&gt;
Any suggestions on how to move on from this point? I feel like I&amp;#039;ve hit a roadblock.&lt;br /&gt;
&lt;br /&gt;
Thanks!</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/581/many-different-models-still-horrible-accuracy-suggestions</guid>
<pubDate>Thu, 07 Mar 2019 17:53:00 +0000</pubDate>
</item>
<item>
<title>How do you visualize/present the predicted output of a random forest model with large trees?</title>
<link>https://ask.ghassem.com/575/visualize-present-predicted-output-random-forest-model-large</link>
<description>I’ve heard that it’s hard to visualize the output of random forest models with large trees or forests, but I’m finding it hard to understand what the use case for the model is if you can’t visualize the outputs. How do you use the predictions then? Is there a way to visualize this?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/575/visualize-present-predicted-output-random-forest-model-large</guid>
<pubDate>Thu, 28 Feb 2019 17:10:03 +0000</pubDate>
</item>
<item>
<title>Do you have a cheatsheet for Data Science?!</title>
<link>https://ask.ghassem.com/573/do-you-have-a-cheatsheet-for-data-science</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/573/do-you-have-a-cheatsheet-for-data-science</guid>
<pubDate>Wed, 27 Feb 2019 05:51:00 +0000</pubDate>
</item>
<item>
<title>What are degrees of freedom when calculating a confidence interval?</title>
<link>https://ask.ghassem.com/544/what-degree-freedom-while-calculating-confidence-interval</link>
<description></description>
<category>Statistics</category>
<guid isPermaLink="true">https://ask.ghassem.com/544/what-degree-freedom-while-calculating-confidence-interval</guid>
<pubDate>Fri, 07 Dec 2018 22:27:27 +0000</pubDate>
</item>
<item>
<title>What is &#039;Degrees of Freedom&#039;?</title>
<link>https://ask.ghassem.com/527/what-is-degrees-of-freedom</link>
<description>What is &amp;#039;Degrees of Freedom&amp;#039;?</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/527/what-is-degrees-of-freedom</guid>
<pubDate>Thu, 29 Nov 2018 18:49:06 +0000</pubDate>
</item>
<item>
<title>What is the difference between univariate and multivariate analysis?</title>
<link>https://ask.ghassem.com/491/what-difference-between-univariate-multivariate-analysis</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/491/what-difference-between-univariate-multivariate-analysis</guid>
<pubDate>Tue, 30 Oct 2018 11:39:08 +0000</pubDate>
</item>
<item>
<title>How will you create a classification to identify key customer trends in unstructured data?</title>
<link>https://ask.ghassem.com/461/create-classification-identify-customer-trends-unstructured</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/461/create-classification-identify-customer-trends-unstructured</guid>
<pubDate>Sun, 28 Oct 2018 11:45:43 +0000</pubDate>
</item>
<item>
<title>Explain the typical data analysis process.</title>
<link>https://ask.ghassem.com/459/explain-the-typical-data-analysis-process</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/459/explain-the-typical-data-analysis-process</guid>
<pubDate>Sun, 28 Oct 2018 11:43:46 +0000</pubDate>
</item>
<item>
<title>What is the difference between Data Mining and Data Analysis?</title>
<link>https://ask.ghassem.com/458/what-is-the-difference-between-data-mining-and-data-analysis</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/458/what-is-the-difference-between-data-mining-and-data-analysis</guid>
<pubDate>Sun, 28 Oct 2018 11:42:45 +0000</pubDate>
</item>
<item>
<title>Which scenarios among the following are a valid reason to use regularization?</title>
<link>https://ask.ghassem.com/451/which-scenarios-among-following-valid-reason-regularization</link>
<description>A. To drop the least useful variables of a model&lt;br /&gt;
&lt;br /&gt;
B. To reduce over-fitting&lt;br /&gt;
&lt;br /&gt;
C. To reduce the bias of a model&lt;br /&gt;
&lt;br /&gt;
D. To decrease p-value</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/451/which-scenarios-among-following-valid-reason-regularization</guid>
<pubDate>Sat, 27 Oct 2018 17:31:43 +0000</pubDate>
</item>
<item>
<title>What are the basic steps for treating missing values?</title>
<link>https://ask.ghassem.com/430/what-are-basic-steps-for-treating-missing-values</link>
<description></description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/430/what-are-basic-steps-for-treating-missing-values</guid>
<pubDate>Fri, 19 Oct 2018 04:08:48 +0000</pubDate>
</item>
<item>
<title>What are the general steps in data cleaning?</title>
<link>https://ask.ghassem.com/427/what-are-the-general-steps-in-data-cleaning</link>
<description></description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/427/what-are-the-general-steps-in-data-cleaning</guid>
<pubDate>Fri, 19 Oct 2018 03:54:38 +0000</pubDate>
</item>
<item>
<title>What is the k-means algorithm and how can we select k for it?</title>
<link>https://ask.ghassem.com/394/what-is-k-means-algorithm-and-how-can-we-select-k-for-it</link>
<description></description>
<category>Machine Learning Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/394/what-is-k-means-algorithm-and-how-can-we-select-k-for-it</guid>
<pubDate>Mon, 15 Oct 2018 05:49:41 +0000</pubDate>
</item>
<item>
<title>Explain recommender systems and give an example.</title>
<link>https://ask.ghassem.com/393/explain-the-recommender-systems-and-give-an-example</link>
<description></description>
<category>Machine Learning Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/393/explain-the-recommender-systems-and-give-an-example</guid>
<pubDate>Mon, 15 Oct 2018 05:47:14 +0000</pubDate>
</item>
<item>
<title>What are the main steps in making a decision tree?</title>
<link>https://ask.ghassem.com/337/what-are-the-main-steps-in-making-a-decision-tree</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/337/what-are-the-main-steps-in-making-a-decision-tree</guid>
<pubDate>Fri, 12 Oct 2018 02:17:19 +0000</pubDate>
</item>
<item>
<title>Please explain linear regression with an example.</title>
<link>https://ask.ghassem.com/336/please-explain-linear-regression-with-an-example</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/336/please-explain-linear-regression-with-an-example</guid>
<pubDate>Fri, 12 Oct 2018 02:14:00 +0000</pubDate>
</item>
<item>
<title>How do you return the outliers given a list of numbers?</title>
<link>https://ask.ghassem.com/301/how-to-return-the-outliers-by-having-a-list-of-numbers</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/301/how-to-return-the-outliers-by-having-a-list-of-numbers</guid>
<pubDate>Mon, 08 Oct 2018 12:19:22 +0000</pubDate>
</item>
<item>
<title>What is Natural Language Processing (NLP) and what are its applications?</title>
<link>https://ask.ghassem.com/297/what-are-natural-language-processing-nlp-and-applications</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/297/what-are-natural-language-processing-nlp-and-applications</guid>
<pubDate>Mon, 08 Oct 2018 11:59:52 +0000</pubDate>
</item>
<item>
<title>What is the TF-IDF algorithm?</title>
<link>https://ask.ghassem.com/296/what-is-tf-idf-algorithm</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/296/what-is-tf-idf-algorithm</guid>
<pubDate>Mon, 08 Oct 2018 11:57:39 +0000</pubDate>
</item>
</channel>
</rss>