<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent activity</title>
<link>https://ask.ghassem.com/activity</link>
<description>Powered by Question2Answer</description>
<item>
<title>Answered: How to analyse an imbalanced categorical column in a dataset</title>
<link>https://ask.ghassem.com/1042/how-to-analyse-imbalanced-categorical-colum-in-dataset?show=1051#a1051</link>
<description>&lt;p&gt;For imbalanced categorical data, you shouldn&#039;t drop the column. Instead, you can try techniques like oversampling the minority class or using models that handle imbalanced data well, like XGBoost. This way, you can still extract useful insights without losing important information.&lt;/p&gt;</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/1042/how-to-analyse-imbalanced-categorical-colum-in-dataset?show=1051#a1051</guid>
<pubDate>Thu, 19 Feb 2026 18:40:15 +0000</pubDate>
</item>
<item>
<title>Answered: Step-by-Step Hidden State Calculation in a Recurrent Neural Network</title>
<link>https://ask.ghassem.com/1049/step-step-hidden-state-calculation-recurrent-neural-network?show=1050#a1050</link>
<description>&lt;p&gt;We compute each hidden state step-by-step using&lt;/p&gt;

&lt;p&gt;$$ h_t = \text{ReLU}(W_{ih} \cdot x_t + W_{hh} \cdot h_{t-1}). $$&lt;/p&gt;

&lt;p&gt;\( h_1 = \text{ReLU}(0.4 \cdot 3 + 0.6 \cdot 0) = 1.2 \)&lt;/p&gt;

&lt;p&gt;\( h_2 = \text{ReLU}(0.4 \cdot 3 + 0.6 \cdot 1.2) = 1.92 \)&lt;/p&gt;

&lt;p&gt;\( h_3 = \text{ReLU}(0.4 \cdot 3 + 0.6 \cdot 1.92) = 2.352 \)&lt;/p&gt;

&lt;p&gt;\( h_4 = \text{ReLU}(0.4 \cdot 3 + 0.6 \cdot 2.352) = 2.6112 \)&lt;/p&gt;
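
&lt;p&gt;The steps above can be reproduced with a short Python sketch. It assumes, as the worked steps suggest, scalar weights \( W_{ih} = 0.4 \) and \( W_{hh} = 0.6 \), an initial state \( h_0 = 0 \), and the same input \( x_t = 3 \) at every step; the function names are illustrative only.&lt;/p&gt;

```python
# Sketch of the scalar recurrence h_t = ReLU(W_ih * x_t + W_hh * h_prev).
# Assumptions taken from the worked steps: W_ih = 0.4, W_hh = 0.6,
# h_0 = 0, and a constant input x_t = 3.

def relu(value):
    """Rectified linear unit for a single number."""
    return max(0.0, value)


def hidden_states(inputs, w_ih=0.4, w_hh=0.6, h0=0.0):
    """Return the hidden states h_1..h_T, one per input."""
    states = []
    h = h0
    for x_t in inputs:
        h = relu(w_ih * x_t + w_hh * h)
        states.append(h)
    return states


states = hidden_states([3, 3, 3, 3])
# Rounded to 4 decimals to hide floating-point noise.
print([round(h, 4) for h in states])  # [1.2, 1.92, 2.352, 2.6112]
```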

&lt;p&gt;&lt;strong&gt;Final Answer: \( h_4 = 2.6112 \)&lt;/strong&gt;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1049/step-step-hidden-state-calculation-recurrent-neural-network?show=1050#a1050</guid>
<pubDate>Mon, 01 Dec 2025 18:33:19 +0000</pubDate>
</item>
<item>
<title>Answered: How to calculate feed-forward (forward-propagation) in neural network for classification?</title>
<link>https://ask.ghassem.com/1047/calculate-forward-forward-propagation-network-classification?show=1048#a1048</link>
<description>&lt;p&gt;The answer is provided below. Please comment if you have questions or find mistakes.&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://i.imgur.com/Qchg1sl.jpeg&quot;&gt;https://i.imgur.com/Qchg1sl.jpeg&lt;/a&gt;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1047/calculate-forward-forward-propagation-network-classification?show=1048#a1048</guid>
<pubDate>Wed, 09 Oct 2024 13:00:11 +0000</pubDate>
</item>
<item>
<title>Commented: How to update weights using the gradient descent algorithm?</title>
<link>https://ask.ghassem.com/596/how-to-update-weights-using-gradient-decent-algorithm?show=1046#c1046</link>
<description>Isn&amp;#039;t the derivative in 3) wrong? I got 2w-5 instead of 4w-10. I think the person who wrote the solution forgot to incorporate the 1/2.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/596/how-to-update-weights-using-gradient-decent-algorithm?show=1046#c1046</guid>
<pubDate>Wed, 17 Apr 2024 22:58:56 +0000</pubDate>
</item>
<item>
<title>Commented: How to update weights in backpropagation algorithm (a numerical example)?</title>
<link>https://ask.ghassem.com/612/update-weights-backpropagation-algorithm-numerical-example?show=1045#c1045</link>
<description>It does make a difference, because after taking the derivative it should be (target - output) but in the solution it&amp;#039;s (output - target)</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/612/update-weights-backpropagation-algorithm-numerical-example?show=1045#c1045</guid>
<pubDate>Fri, 05 Apr 2024 21:48:20 +0000</pubDate>
</item>
<item>
<title>When to one-hot encode a category and when to segment by category?</title>
<link>https://ask.ghassem.com/1034/when-to-use-one-hot-encode-category-and-when-segment-category</link>
<description>When preprocessing data for machine learning, is there any difference between using one-hot encoding to turn categorical variables into numeric variables and segmenting the data, with a separate model for each category? Say you run a multivariate regression model on data covering 5 cities. Would a single model with one variable for each city be better or worse than 5 models, one specific to each city? Or is there no difference? Or does it depend on certain factors and intuition?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1034/when-to-use-one-hot-encode-category-and-when-segment-category</guid>
<pubDate>Wed, 22 Feb 2023 20:30:38 +0000</pubDate>
</item>
<item>
<title>Answered: How to calculate the residual errors, (MSE),(MAE), and (RMSE)?</title>
<link>https://ask.ghassem.com/1031/how-to-calculate-the-residual-errors-mse-mae-and-rmse?show=1032#a1032</link>
<description>&lt;ol&gt;
&lt;li&gt;First, we need to calculate the residual errors. A residual error is the difference between the actual value and the predicted value.&lt;/li&gt;
&lt;/ol&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:500px&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;Sample&lt;/th&gt;
&lt;th&gt;Feature 1&lt;/th&gt;
&lt;th&gt;Feature 2&lt;/th&gt;
&lt;th&gt;Actual Value&lt;/th&gt;
&lt;th&gt;Predicted Value&lt;/th&gt;
&lt;th&gt;Residual Error (Actual - Predicted)&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
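
&lt;p&gt;The residuals in the table, and the metrics derived from them, can be checked with a short Python sketch. The variable names are illustrative; the actual and predicted values are copied from the table.&lt;/p&gt;

```python
import math

# Actual and predicted values for samples 1..5, copied from the table.
actual = [4, 5, 6, 7, 8]
predicted = [6, 6, 7, 8, 9]

# Residual error = actual - predicted.
residuals = [a - p for a, p in zip(actual, predicted)]

# MSE: mean of the squared residuals.
mse = sum(r ** 2 for r in residuals) / len(residuals)

# MAE: mean of the absolute residuals.
mae = sum(abs(r) for r in residuals) / len(residuals)

# RMSE: square root of the MSE.
rmse = math.sqrt(mse)

print("residuals:", residuals)
print("MSE:", mse)
print("MAE:", mae)
print("RMSE:", round(rmse, 2))
```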

&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;Next, we can calculate the MSE by taking the average of the squared residual errors.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;$MSE = ((-2)^2 + (-1)^2 + (-1)^2 + (-1)^2 + (-1)^2) / 5 = 8 / 5 = 1.6$&lt;/p&gt;

&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;To calculate the MAE, we take the average of the absolute residual errors.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;$MAE = (|-2| + |-1| + |-1| + |-1| + |-1|) / 5 = 6 / 5 = 1.2$&lt;/p&gt;

&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Finally, to calculate the RMSE, we take the square root of the MSE.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;$RMSE = \sqrt{1.6} \approx 1.26$&lt;/p&gt;

&lt;p&gt;Therefore, the residual errors are [-2, -1, -1, -1, -1], the MSE is 1.6, the MAE is 1.2, and the RMSE is approximately 1.26.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1031/how-to-calculate-the-residual-errors-mse-mae-and-rmse?show=1032#a1032</guid>
<pubDate>Fri, 27 Jan 2023 04:16:33 +0000</pubDate>
</item>
<item>
<title>Can you verify the validity of this chart comparing the review scores for Marvel Phase 4?</title>
<link>https://ask.ghassem.com/1030/verify-validity-chart-comparing-review-scores-marvel-phase</link>
<description>&lt;p&gt;I have some skepticism about the validity of the charts below comparing the critic and audience reviews for Phase 4 of the MCU to the previous 3 phases. There are over 18 movies and tv shows in Phase 4 compared to the 6 movies in Phases 1 &amp;amp; 2 and the 11 movies in Phase 3. Also, there are far fewer critic reviews for the Phase 4 tv shows than the Phase 4 movies. For example, on Rotten Tomatoes there are only 40 critic reviews for The Falcon and the Winter Soldier and 452 critic reviews for Black Widow. Could this uneven and inconsistent number of reviews between tv shows and movies in Phase 4 be inaccurately making the overall averages higher than they should be? Or do you agree with the conclusions presented in the charts?&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://cdn.discordapp.com/attachments/997145183172964435/1059948060194652230/image.png&quot;&gt;https://cdn.discordapp.com/attachments/997145183172964435/1059948060194652230/image.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://cdn.discordapp.com/attachments/997145183172964435/1049356020469739520/image.png&quot;&gt;https://cdn.discordapp.com/attachments/997145183172964435/1049356020469739520/image.png&lt;/a&gt;&lt;/p&gt;</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1030/verify-validity-chart-comparing-review-scores-marvel-phase</guid>
<pubDate>Mon, 09 Jan 2023 16:29:14 +0000</pubDate>
</item>
<item>
<title>Edited: How to use Genetic Algorithm to optimize a function?</title>
<link>https://ask.ghassem.com/1010/how-to-use-genetic-algorithm-to-optimize-a-function?show=1010#q1010</link>
<description>&lt;p&gt;Assume the function is defined as $f(x,y)=x^2+y^2-4xy$, and $1\leq x \leq 4,1\leq y \leq 4$. &amp;nbsp;The Genetic Algorithm is selected&amp;nbsp;to maximize the function. If the first population for pairs of $(x,y)$&amp;nbsp;is defined as $S=\{A=(1,2), B=(2,1), C=(2,2), D=(2,3), E=(3,1) \}$.&amp;nbsp;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;a) &lt;/strong&gt;Calculate the fitness of each of individuals (A,B,C,D,E) in population if:&amp;nbsp; &amp;nbsp;$\text{fitness function}=f(x,y)$&amp;nbsp;&lt;strong&gt;&amp;nbsp;&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;b)&lt;/strong&gt; Calculate the probability of each individual&amp;nbsp;and sort them in descending order. Which individual has the maximum fitness (probability)? $p_{i}=\frac{f_{i}}{\sum_{j=1}^{N} f_{j}}$&lt;br&gt;
&lt;strong&gt;c) &lt;/strong&gt;Draw&lt;strong&gt;&amp;nbsp;&lt;/strong&gt;the roulette wheel and calculate&amp;nbsp;the boundaries for each individual&lt;br&gt;
&lt;strong&gt;d) &lt;/strong&gt;If we use two individuals and their arithmetic mean for crossover each time,&amp;nbsp;and for mutation, we add 0.1 to x and subtract 0.1 from y for each individual created after crossover, what will be the next population with five members?&lt;br&gt;
For part (d), use the following random numbers in order whenever you need them in the selection process:&lt;br&gt;
$\text{random numbers} =&amp;nbsp; \{0.780,0.220,0.776,0.507,0.822,0.765,0.288,0.881,0.895,0.421\}$&lt;br&gt;
&amp;nbsp;&lt;/p&gt;</description>
<category>Artificial Intelligence</category>
<guid isPermaLink="true">https://ask.ghassem.com/1010/how-to-use-genetic-algorithm-to-optimize-a-function?show=1010#q1010</guid>
<pubDate>Wed, 23 Nov 2022 10:19:30 +0000</pubDate>
</item>
<item>
<title>Answered: List of free Qwiklabs labs</title>
<link>https://ask.ghassem.com/1028/list-of-free-qwiklabs-labs?show=1029#a1029</link>
<description>&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.cloudskillsboost.google/focuses/3565?catalog_rank=%7B%22rank%22%3A327%2C%22num_filters%22%3A1%2C%22has_search%22%3Afalse%7D&amp;amp;parent=catalog&quot;&gt;Big Data Analysis to a Slide Presentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.cloudskillsboost.google/focuses/49138?catalog_rank=%7B%22rank%22%3A334%2C%22num_filters%22%3A1%2C%22has_search%22%3Afalse%7D&amp;amp;parent=catalog&quot;&gt;Exploring an Ecommerce Dataset using SQL in Google BigQuery&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.cloudskillsboost.google/focuses/17740?catalog_rank=%7B%22rank%22%3A380%2C%22num_filters%22%3A1%2C%22has_search%22%3Afalse%7D&amp;amp;parent=catalog&quot;&gt;Filtering and Sorting Data in Looker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.cloudskillsboost.google/focuses/660?catalog_rank=%7B%22rank%22%3A567%2C%22num_filters%22%3A1%2C%22has_search%22%3Afalse%7D&amp;amp;parent=catalog&quot;&gt;Firebase Web&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.cloudskillsboost.google/focuses/2802?catalog_rank=%7B%22rank%22%3A268%2C%22num_filters%22%3A1%2C%22has_search%22%3Afalse%7D&amp;amp;parent=catalog&quot;&gt;Introduction to SQL for BigQuery and Cloud SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.cloudskillsboost.google/focuses/2157?catalog_rank=%7B%22rank%22%3A41%2C%22num_filters%22%3A1%2C%22has_search%22%3Afalse%7D&amp;amp;parent=catalog&quot;&gt;Getting Started with BigQuery Machine Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.cloudskillsboost.google/focuses/6268?catalog_rank=%7B%22rank%22%3A108%2C%22num_filters%22%3A1%2C%22has_search%22%3Afalse%7D&amp;amp;parent=catalog&quot;&gt;Understanding and Analyzing Your Costs with Google Cloud Billing Reports&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.cloudskillsboost.google/focuses/7115?catalog_rank=%7B%22rank%22%3A115%2C%22num_filters%22%3A1%2C%22has_search%22%3Afalse%7D&amp;amp;parent=catalog&quot;&gt;Visualizing Billing Data with Google Data Studio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.cloudskillsboost.google/focuses/3614?catalog_rank=%7B%22rank%22%3A138%2C%22num_filters%22%3A1%2C%22has_search%22%3Afalse%7D&amp;amp;parent=catalog&quot;&gt;Explore and Create Reports with Data Studio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.cloudskillsboost.google/focuses/1794?catalog_rank=%7B%22rank%22%3A157%2C%22num_filters%22%3A1%2C%22has_search%22%3Afalse%7D&amp;amp;parent=catalog&quot;&gt;Predict Visitor Purchases with a Classification Model in BQML&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.cloudskillsboost.google/focuses/39167?catalog_rank=%7B%22rank%22%3A223%2C%22num_filters%22%3A1%2C%22has_search%22%3Afalse%7D&amp;amp;parent=catalog&quot;&gt;Begin with Workspace: Essentials: Challenge Lab&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.cloudskillsboost.google/focuses/4337?catalog_rank=%7B%22rank%22%3A251%2C%22num_filters%22%3A1%2C%22has_search%22%3Afalse%7D&amp;amp;parent=catalog&quot;&gt;Bracketology with Google Machine Learning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>
<category>Cloud Computing</category>
<guid isPermaLink="true">https://ask.ghassem.com/1028/list-of-free-qwiklabs-labs?show=1029#a1029</guid>
<pubDate>Fri, 28 Oct 2022 23:51:28 +0000</pubDate>
</item>
<item>
<title>Retagged: What is the difference between a batch and an epoch in a Neural Network?</title>
<link>https://ask.ghassem.com/497/what-the-difference-between-batch-and-epoch-neural-network?show=497#q497</link>
<description>Both the batch size and the number of epochs are integer values and seem to do the same thing in stochastic gradient descent. What are these two hyper-parameters of this learning algorithm?</description>
<category>Machine Learning Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/497/what-the-difference-between-batch-and-epoch-neural-network?show=497#q497</guid>
<pubDate>Wed, 28 Sep 2022 10:40:08 +0000</pubDate>
</item>
<item>
<title>Which code has the best runtime and why (the commented-out version or the other one)?</title>
<link>https://ask.ghassem.com/1027/which-code-has-best-runtime-and-why-the-one-commented-the-other</link>
<description># for key, value in dict.items():&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if value &amp;gt;= long:&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;long = value&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;long_name = key&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if value &amp;lt; short:&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;short = value&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;short_name = key&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;long = max(dict.values())&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;long_name = max(dict, key=dict.get)&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;short = min(dict.values())&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;short_name = min(dict, key=dict.get)</description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/1027/which-code-has-best-runtime-and-why-the-one-commented-the-other</guid>
<pubDate>Fri, 02 Sep 2022 14:39:49 +0000</pubDate>
</item>
<item>
<title>Creating tables from unstructured texts about stock market</title>
<link>https://ask.ghassem.com/1026/creating-tables-from-unstructured-texts-about-stock-market</link>
<description>&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;p&gt;I am trying to extract information such as profits and revenues, along with their corresponding dates and quarters, from unstructured text about the stock market and convert it into a report in table form. Because the input text has no fixed format, it is hard to know which entity belongs to which date and quarter, and which value belongs to which entity. Chunking works on a few documents, but not on enough of them. Is there an unsupervised way to link entities with their corresponding dates, values, and quarters?&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1026/creating-tables-from-unstructured-texts-about-stock-market</guid>
<pubDate>Tue, 02 Aug 2022 00:47:49 +0000</pubDate>
</item>
<item>
<title>How do I compare the count of a value in each year when the sample size differs each year?</title>
<link>https://ask.ghassem.com/1025/compare-count-value-each-year-while-having-different-sanple</link>
<description>How do I accurately compare the count of something a survey measures from my employees each year, given a varying number of survey responses and a changing number of employees?&lt;br /&gt;
&lt;br /&gt;
Suppose I measure employee satisfaction over the years by surveying my employees each year, asking whether they are satisfied or not, and then comparing the number of yes answers across years. The number of employees who answer is not the same each year, and the number of employees increases every year. How do I compare the results correctly across years?&lt;br /&gt;
&lt;br /&gt;
In other words, how do I remove the effect of the survey engagement rate when calculating the results?</description>
<category>general</category>
<guid isPermaLink="true">https://ask.ghassem.com/1025/compare-count-value-each-year-while-having-different-sanple</guid>
<pubDate>Wed, 08 Jun 2022 10:32:33 +0000</pubDate>
</item>
<item>
<title>Is it possible to make a forecast of a future value of Air Temperature using Fast Fourier Transform?</title>
<link>https://ask.ghassem.com/1024/possible-forecast-future-value-temperature-fourier-transform</link>
<description>Is it possible to forecast a future value of air temperature using the Fast Fourier Transform? If yes, what would the process be? Thank you!</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/1024/possible-forecast-future-value-temperature-fourier-transform</guid>
<pubDate>Thu, 02 Jun 2022 16:10:26 +0000</pubDate>
</item>
<item>
<title>forecast log transformed fitted values for 2 years using ARMA model</title>
<link>https://ask.ghassem.com/1023/forecast-transformed-fitted-values-years-using-arma-model</link>
<description>Input is a stock price in exponential transformation. We are asked to forecast using ARMA results for 2 years.</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1023/forecast-transformed-fitted-values-years-using-arma-model</guid>
<pubDate>Wed, 04 May 2022 20:31:44 +0000</pubDate>
</item>
<item>
<title>Kmeans clustering in python - Giving original labels to predicted clusters</title>
<link>https://ask.ghassem.com/1022/kmeans-clustering-python-giving-original-predicted-clusters</link>
<description>&lt;p&gt;I have a dataset with 7 labels in the target variable.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
X = data.drop(&#039;target&#039;, axis=1)
Y = data[&#039;target&#039;]
Y.unique()&lt;/pre&gt;

&lt;p&gt;array([&#039;Normal_Weight&#039;, &#039;Overweight_Level_I&#039;, &#039;Overweight_Level_II&#039;,&lt;br&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&#039;Obesity_Type_I&#039;, &#039;Insufficient_Weight&#039;, &#039;Obesity_Type_II&#039;,&lt;br&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&#039;Obesity_Type_III&#039;], dtype=object)&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
import numpy as np
from sklearn.cluster import KMeans

km = KMeans(n_clusters=7, init=&quot;k-means++&quot;, random_state=300)
km.fit_predict(X)
np.unique(km.labels_)&lt;/pre&gt;

&lt;p&gt;array([0, 1, 2, 3, 4, 5, 6])&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;After running the KMeans clustering algorithm with 7 clusters, the resulting clusters are labeled 0, 1, 2, 3, 4, 5, 6. How can I tell which real label matches each predicted cluster label?&lt;/p&gt;

&lt;p&gt;In other words, I want to know how to assign the original label names to the predicted cluster labels, so that I can count how many values were clustered correctly (accuracy).&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1022/kmeans-clustering-python-giving-original-predicted-clusters</guid>
<pubDate>Wed, 27 Apr 2022 05:32:54 +0000</pubDate>
</item>
<item>
<title>Bankruptcy prediction and credit card</title>
<link>https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</link>
<description>Hello everyone, newbie data scientist here.&lt;br /&gt;
I&amp;#039;m working on a project to predict the bankruptcy probability (probability of default) of companies and to assign them a credit rating/score based on it.&lt;br /&gt;
For example, a probability below 50 is good and one above 50 is bad (just as an example).&lt;br /&gt;
I have a dataset that contains financial ratios and a class indicating whether the company went bankrupt or not (0 and 1).&lt;br /&gt;
I&amp;#039;m planning to use these models:&lt;br /&gt;
Logistic regression, linear discriminant analysis, decision trees, random forest, ANN, AdaBoost, SVM.&lt;br /&gt;
&lt;br /&gt;
The question is, and I know it is a dumb question:&lt;br /&gt;
Do these models return a probability that I can transform into labels? I saw that in a thesis and I&amp;#039;m not sure about it.&lt;br /&gt;
&lt;br /&gt;
Otherwise, any guidance or tips will be appreciated.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</guid>
<pubDate>Sun, 10 Apr 2022 05:50:14 +0000</pubDate>
</item>
<item>
<title>Answered: how to output f1-score instead of accuracy</title>
<link>https://ask.ghassem.com/1019/how-to-output-f1-score-instead-of-accuracy?show=1020#a1020</link>
<description>&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
from sklearn.metrics import f1_score
# f1_score expects (y_true, y_pred); `model` is assumed to be your fitted classifier
target_pred = model.predict(data_test)
f1_score(target_test, target_pred)&lt;/pre&gt;


</description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/1019/how-to-output-f1-score-instead-of-accuracy?show=1020#a1020</guid>
<pubDate>Sat, 02 Apr 2022 14:32:22 +0000</pubDate>
</item>
<item>
<title>I cannot get this code to work. please help.</title>
<link>https://ask.ghassem.com/1018/i-cannot-get-this-code-to-work-please-help</link>
<description>&lt;p&gt;from keras.models import Sequential&amp;nbsp;&lt;br&gt;
from keras.layers import Dense&amp;nbsp;&lt;br&gt;
from keras.layers import LSTM&amp;nbsp;&lt;br&gt;
from sklearn.model_selection import train_test_split&lt;/p&gt;

&lt;p&gt;model = Sequential()&amp;nbsp;&lt;br&gt;
model.add(LSTM( 10, input_shape=(1, 1)))&amp;nbsp;&lt;br&gt;
model.add(Dense(1, activation=&quot;linear&quot;))&amp;nbsp;&lt;br&gt;
model.compile(loss=&quot;mse&quot;, optimizer=&quot;adam&quot;)&lt;/p&gt;

&lt;p&gt;X, y = get_data()&lt;/p&gt;

&lt;p&gt;X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)&lt;br&gt;
X_train_2, X_val, y_train_2, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)&lt;/p&gt;

&lt;p&gt;model.fit(X_train, y_train, epochs=800, validation_data=(X_val, y_val), shuffle=False)&lt;/p&gt;
</description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/1018/i-cannot-get-this-code-to-work-please-help</guid>
<pubDate>Mon, 21 Mar 2022 05:59:53 +0000</pubDate>
</item>
<item>
<title>Recategorized: Battery data projects</title>
<link>https://ask.ghassem.com/1017/battery-data-projects?show=1017#q1017</link>
<description>Where can I find projects related to battery data?</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1017/battery-data-projects?show=1017#q1017</guid>
<pubDate>Thu, 03 Mar 2022 09:42:53 +0000</pubDate>
</item>
<item>
<title>Answer selected: How to activate hive interpreter for Apache Zeppelin on Dataproc on GCP?</title>
<link>https://ask.ghassem.com/791/how-activate-hive-interpreter-for-apache-zeppelin-dataproc?show=797#a797</link>
<description>&lt;p&gt;We can activate Hive interpreter by creating a new interpreter with some changes in the properties and dependencies&lt;br&gt;
&lt;br&gt;
1. Set hive.url as&lt;br&gt;
jdbc:hive2://localhost:10000&lt;br&gt;
&lt;br&gt;
2. Remove username and password (keep them empty)&lt;br&gt;
&lt;br&gt;
3. Set hive.driver property as:&lt;br&gt;
org.apache.hive.jdbc.HiveDriver&lt;br&gt;
&lt;br&gt;
4. Add two artifacts under the dependencies tab:&lt;br&gt;
org.apache.hive:hive-jdbc:0.14.0&lt;br&gt;
org.apache.hadoop:hadoop-common:2.6.0&lt;/p&gt;

&lt;p&gt;You can watch the video below for more explanation; the code is &lt;a rel=&quot;nofollow&quot; href=&quot;https://gist.github.com/tofighi/a1e25e2f065876922f8b078858b03875&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.youtube.com/watch?v=0nqhoLWVdwU&quot;&gt;https://www.youtube.com/watch?v=0nqhoLWVdwU&lt;/a&gt;&lt;/p&gt;</description>
<category>Cloud Computing</category>
<guid isPermaLink="true">https://ask.ghassem.com/791/how-activate-hive-interpreter-for-apache-zeppelin-dataproc?show=797#a797</guid>
<pubDate>Sun, 13 Feb 2022 14:14:24 +0000</pubDate>
</item>
<item>
<title>How can you build a dynamic pricing model with data only from rigid pricing?</title>
<link>https://ask.ghassem.com/1016/build-dynamic-pricing-model-with-data-only-from-rigid-pricing</link>
<description>I want to build a dynamic pricing model, which means that if the product is too expensive for a client and there is a risk that we might lose the client, we lower the price for them, but if the client doesn&amp;#039;t care that much about the price, we might increase the price a little.&lt;br /&gt;
&lt;br /&gt;
All the articles I&amp;#039;ve seen describe some kind of A/B testing for the pricing and then create a model.&lt;br /&gt;
&lt;br /&gt;
I want to build a model using only the existing rigid pricing data. I have the prices offered to customers, and I know who bought the product and who went to another company.&lt;br /&gt;
&lt;br /&gt;
How can I do the increasing price part?</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1016/build-dynamic-pricing-model-with-data-only-from-rigid-pricing</guid>
<pubDate>Fri, 21 Jan 2022 06:44:31 +0000</pubDate>
</item>
<item>
<title>What analytical software would be good for a company to use?</title>
<link>https://ask.ghassem.com/1015/what-analytical-software-would-be-good-for-a-company-to-use</link>
<description>This would be for a company that is just now looking into using a software to track data for wine making.</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/1015/what-analytical-software-would-be-good-for-a-company-to-use</guid>
<pubDate>Fri, 14 Jan 2022 16:46:38 +0000</pubDate>
</item>
<item>
<title>Do you usually collect your own data, or is there always a resource available for you? Or does it depend on the company?</title>
<link>https://ask.ghassem.com/1014/usually-collect-always-resource-available-depends-company</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/1014/usually-collect-always-resource-available-depends-company</guid>
<pubDate>Sun, 09 Jan 2022 22:13:34 +0000</pubDate>
</item>
<item>
<title>Answered: When dealing with categorical values, should the &#039;year&#039; column be encoded using OHE or OrdinalEncoder?</title>
<link>https://ask.ghassem.com/1012/dealing-categorical-values-should-encoded-ordinalencoder?show=1013#a1013</link>
<description>Ask yourself whether the order of the years affects the predicted price. It seems that it does, so OrdinalEncoder is probably the better choice. With OneHotEncoder, each year becomes an independent category and the ordering between years is lost.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1012/dealing-categorical-values-should-encoded-ordinalencoder?show=1013#a1013</guid>
<pubDate>Mon, 20 Dec 2021 18:10:13 +0000</pubDate>
</item>
<item>
<title>Answer selected: How to create a Decision Tree using the ID3 algorithm?</title>
<link>https://ask.ghassem.com/1008/how-to-create-a-decision-tree-using-the-id3-algorithm?show=1009#a1009</link>
<description>&lt;p&gt;&lt;strong&gt;a)&lt;/strong&gt; See the following figure for the ID3 decision tree:&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://i.imgur.com/kizNjoc.png&quot;&gt;https://i.imgur.com/kizNjoc.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;b)&lt;/strong&gt; Only the disjunction of conjunctions for Martians was required:&lt;/p&gt;

&lt;p&gt;$\begin{aligned}&lt;br&gt;
&amp;amp;(\text { Legs }=3) \vee \\&lt;br&gt;
&amp;amp;(\text { Legs }=2 \wedge \text { Green }=\text { Yes } \wedge \text { Height }=\text { Tall }) \vee \\&lt;br&gt;
&amp;amp;(\text { Legs }=2 \wedge \text { Green }=\text { No } \wedge \text { Height }=\text { Short } \wedge \text { Smelly }=\text { Yes })&lt;br&gt;
\end{aligned}$&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://github.com/tofighi/MachineLearning/blob/master/Decision_Tree_Example.ipynb&quot;&gt;Python Code&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Step 1: Organize the Dataset&lt;/h2&gt;

&lt;p&gt;Our data has the following features and values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Species&lt;/strong&gt;: Target variable (M or H)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Features&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Green&lt;/strong&gt;: \( N \) or \( Y \)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Legs&lt;/strong&gt;: \( 2 \) or \( 3 \)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Height&lt;/strong&gt;: \( S \) or \( T \)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Smelly&lt;/strong&gt;: \( N \) or \( Y \)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;table border=&quot;1&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;Index&lt;/th&gt;
&lt;th&gt;Species&lt;/th&gt;
&lt;th&gt;Green&lt;/th&gt;
&lt;th&gt;Legs&lt;/th&gt;
&lt;th&gt;Height&lt;/th&gt;
&lt;th&gt;Smelly&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;M&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;M&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;M&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;M&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;M&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h2&gt;Step 2: Calculate the Initial Entropy for the Target Variable (Species)&lt;/h2&gt;

&lt;p&gt;We start by calculating the entropy of the target variable, &lt;strong&gt;Species&lt;/strong&gt;, which has two classes: &lt;strong&gt;M&lt;/strong&gt; (Martian) and &lt;strong&gt;H&lt;/strong&gt; (Human).&lt;/p&gt;

&lt;h3&gt;Total Counts&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Martians (M): 5&lt;/li&gt;
&lt;li&gt;Humans (H): 5&lt;/li&gt;
&lt;li&gt;Total: 10&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Entropy Formula&lt;/h3&gt;

&lt;p&gt;The entropy \( E \) for a binary classification is calculated as:&lt;/p&gt;

&lt;p&gt;$$ E = -p_+ \log_2(p_+) - p_- \log_2(p_-) $$&lt;/p&gt;

&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;\( p_+ \): Probability of positive class (M)&lt;/li&gt;
&lt;li&gt;\( p_- \): Probability of negative class (H)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Calculation&lt;/h3&gt;

&lt;p&gt;$$ p(M) = \frac{5}{10} = 0.5 $$&lt;/p&gt;

&lt;p&gt;$$ p(H) = \frac{5}{10} = 0.5 $$&lt;/p&gt;

&lt;p&gt;$$ E(Species) = -0.5 \cdot \log_2(0.5) - 0.5 \cdot \log_2(0.5) $$&lt;/p&gt;

&lt;p&gt;$$ = -0.5 \cdot (-1) - 0.5 \cdot (-1) $$&lt;/p&gt;

&lt;p&gt;$$ = 1.0 $$&lt;/p&gt;

&lt;h2&gt;Step 3: Calculate Entropy and Information Gain for Each Feature&lt;/h2&gt;

&lt;p&gt;We’ll calculate the entropy for each feature split and determine the information gain.&lt;/p&gt;

&lt;h3&gt;Feature: Green&lt;/h3&gt;

&lt;p&gt;Green can be either &lt;strong&gt;Y&lt;/strong&gt; or &lt;strong&gt;N&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For Green = Y:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Martians (M): 3&lt;/li&gt;
&lt;li&gt;Humans (H): 1&lt;/li&gt;
&lt;li&gt;Total: 4&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Entropy:&lt;/p&gt;

&lt;p&gt;$$ E(Green = Y) = -\left(\frac{3}{4}\right) \log_2\left(\frac{3}{4}\right) - \left(\frac{1}{4}\right) \log_2\left(\frac{1}{4}\right) $$&lt;/p&gt;

&lt;p&gt;$$ = -0.75 \cdot \log_2(0.75) - 0.25 \cdot \log_2(0.25) $$&lt;/p&gt;

&lt;p&gt;$$ = -0.75 \cdot (-0.415) - 0.25 \cdot (-2) $$&lt;/p&gt;

&lt;p&gt;$$ = 0.311 + 0.5 = 0.811 $$&lt;/p&gt;

&lt;p&gt;For Green = N:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Martians (M): 2&lt;/li&gt;
&lt;li&gt;Humans (H): 4&lt;/li&gt;
&lt;li&gt;Total: 6&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Entropy:&lt;/p&gt;

&lt;p&gt;$$ E(Green = N) = -\left(\frac{2}{6}\right) \log_2\left(\frac{2}{6}\right) - \left(\frac{4}{6}\right) \log_2\left(\frac{4}{6}\right) $$&lt;/p&gt;

&lt;p&gt;$$ = -0.333 \cdot \log_2(0.333) - 0.667 \cdot \log_2(0.667) $$&lt;/p&gt;

&lt;p&gt;$$ = -0.333 \cdot (-1.585) - 0.667 \cdot (-0.585) $$&lt;/p&gt;

&lt;p&gt;$$ = 0.528 + 0.389 = 0.917 $$&lt;/p&gt;

&lt;h3&gt;Weighted Entropy for Green&lt;/h3&gt;

&lt;p&gt;$$ E(Green) = \frac{4}{10} \cdot 0.811 + \frac{6}{10} \cdot 0.917 $$&lt;/p&gt;

&lt;p&gt;$$ = 0.3244 + 0.5502 = 0.8746 $$&lt;/p&gt;

&lt;h3&gt;Information Gain for Green&lt;/h3&gt;

&lt;p&gt;$$ IG(Species, Green) = E(Species) - E(Green) $$&lt;/p&gt;

&lt;p&gt;$$ = 1.0 - 0.8746 = 0.1254 $$&lt;/p&gt;
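
&lt;p&gt;The Green calculation above can be repeated for every feature with a short Python sketch. It is only an illustration under stated assumptions, not the code from the linked notebook: the rows are copied from the table in Step 1, and the names ROWS, FEATURES, entropy and information_gain are hypothetical.&lt;/p&gt;

```python
from collections import Counter
from math import log2

# Rows copied from the table in Step 1: (Species, Green, Legs, Height, Smelly)
ROWS = [
    ("M", "N", 3, "S", "Y"),
    ("M", "Y", 2, "T", "N"),
    ("M", "Y", 3, "T", "N"),
    ("M", "N", 2, "S", "Y"),
    ("M", "Y", 3, "T", "N"),
    ("H", "N", 2, "T", "Y"),
    ("H", "N", 2, "S", "N"),
    ("H", "N", 2, "T", "N"),
    ("H", "Y", 2, "S", "N"),
    ("H", "N", 2, "T", "Y"),
]
# Hypothetical mapping from feature name to column position in ROWS
FEATURES = {"Green": 1, "Legs": 2, "Height": 3, "Smelly": 4}


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of Species minus the weighted entropy after splitting on one column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[0])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[0] for row in rows]) - weighted


gains = {name: information_gain(ROWS, col) for name, col in FEATURES.items()}
best = max(gains, key=gains.get)  # feature with the largest information gain

for name, gain in gains.items():
    print(f"IG(Species, {name}) = {gain:.4f}")
print("Root split:", best)
```

&lt;p&gt;Run on the ten rows above, the sketch selects Legs as the root split and gives a gain of zero for Height and Smelly. The gain for Green comes out near 0.1245, close to the 0.1254 obtained above from rounded intermediate values.&lt;/p&gt;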

&lt;p&gt;Continue this process to calculate the entropy and information gain for each feature (Legs, Height, and Smelly) similarly.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1008/how-to-create-a-decision-tree-using-the-id3-algorithm?show=1009#a1009</guid>
<pubDate>Wed, 01 Dec 2021 11:56:37 +0000</pubDate>
</item>
<item>
<title>Commented: How to filter a dataframe?</title>
<link>https://ask.ghassem.com/775/how-to-filter-a-dataframe?show=1007#c1007</link>
<description>Since it&amp;#039;s the first row you can also do df.head(1).</description>
<category>Python Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/775/how-to-filter-a-dataframe?show=1007#c1007</guid>
<pubDate>Mon, 29 Nov 2021 04:14:50 +0000</pubDate>
</item>
<item>
<title>How do I know which encoder to use to convert from categorical variables to numerical?</title>
<link>https://ask.ghassem.com/1006/know-which-encoder-convert-categorical-variables-numerical</link>
<description>So say I have a column with categorical data like different styles of temperature: &amp;#039;Lukewarm&amp;#039;, &amp;#039;Hot&amp;#039;, &amp;#039;Scalding&amp;#039;, &amp;#039;Cold&amp;#039;, &amp;#039;Frostbite&amp;#039;,... etc.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I know that we can use pd.get_dummies to convert the column to numerical data within the dataframe, but I also know that there are other &amp;#039;converters&amp;#039; (not sure if that&amp;#039;s the correct terminology) that we can use, e.g. OneHotEncoder from scikit-learn (like I could use the pipeline module to make a nice pipeline and feed my dataframe through the pipeline to also get my categorical data encoded to numerical).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
How do I know which to use? Does it matter? If it does matter, when does it matter the most (i.e. what types of problems? When there are lots of categorical variables, or few?) If anyone can give me any pointers on this type of stuff I&amp;#039;d greatly appreciate it.</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1006/know-which-encoder-convert-categorical-variables-numerical</guid>
<pubDate>Mon, 29 Nov 2021 04:09:06 +0000</pubDate>
</item>
<item>
<title>ValueError: Length mismatch: Expected axis has 60 elements, new values have 2935849 elements</title>
<link>https://ask.ghassem.com/1005/valueerror-length-mismatch-expected-elements-2935849-elements</link>
<description>&lt;p&gt;I&#039;m creating a new data frame&amp;nbsp;with the most used items grouped together. But I got the following error when grouping through ID and items.&amp;nbsp;ValueError: Length mismatch: Expected axis has 60 elements, new values have 2935849 elements.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
df = sales_df[sales_df[&#039;shop_id&#039;].duplicated(keep=False)]
df[&#039;Grouped&#039;] = sales_df.groupby(&#039;shop_id&#039;)[&#039;item_name&#039;].transform(lambda x: &#039;,&#039;.join(x))
df2 = df[[&#039;shop_id&#039;, &#039;Grouped&#039;]].drop_duplicates()&lt;/pre&gt;

&lt;p&gt;In the code above, I first keep the rows whose shop ID appears more than once and then join the item names per shop. My objective is to group the items that share the same shop ID.&lt;/p&gt;</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1005/valueerror-length-mismatch-expected-elements-2935849-elements</guid>
<pubDate>Fri, 26 Nov 2021 06:09:16 +0000</pubDate>
</item>
<item>
<title>Are Text Mining, Artificial Neural Networks, Speech Processing, and Cloud Computing essential for a good Data Scientist?</title>
<link>https://ask.ghassem.com/1004/artificial-networks-processing-computing-essential-scientist</link>
<description></description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1004/artificial-networks-processing-computing-essential-scientist</guid>
<pubDate>Wed, 27 Oct 2021 19:15:16 +0000</pubDate>
</item>
<item>
<title>Classification of data object might be incorrect</title>
<link>https://ask.ghassem.com/1003/classification-of-data-object-might-be-incorrect</link>
<description>&lt;p&gt;I am learning a new Salesforce product (Evergage) for the company I work for. In the program&#039;s documentation they have listed a set of data objects as an example. It appears to me that the classification might be incorrect. Their system makes a division between &#039;catalog objects&#039; and &#039;profile objects&#039;, and the example they have given is a banking institution. They classified &lt;em&gt;Customer Credit Card&lt;/em&gt; as a &lt;em&gt;profile object&lt;/em&gt; and &lt;em&gt;Credit Card Level&lt;/em&gt; as a &lt;em&gt;catalog object&lt;/em&gt;. It seems to me that it should be the other way around, i.e. &lt;em&gt;Customer Credit Card&lt;/em&gt; = &lt;em&gt;catalog object&lt;/em&gt; and &lt;em&gt;Credit Card Level&lt;/em&gt; = &lt;em&gt;profile object&lt;/em&gt;. Maybe I am not reading the context correctly?&lt;/p&gt;

&lt;p&gt;Here is a link to an image with the complete classification: &lt;a rel=&quot;nofollow&quot; href=&quot;https://drive.google.com/file/d/1nG4aX4Ty_NoHxm04AQo1Ow61m3MZ3pXm/view?usp=sharing&quot;&gt;https://drive.google.com/file/d/1nG4aX4Ty_NoHxm04AQo1Ow61m3MZ3pXm/view?usp=sharing&lt;/a&gt;&lt;/p&gt;</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1003/classification-of-data-object-might-be-incorrect</guid>
<pubDate>Mon, 25 Oct 2021 15:26:46 +0000</pubDate>
</item>
<item>
<title>Can Data Science solve this problem?</title>
<link>https://ask.ghassem.com/1002/can-data-science-solve-this-problem</link>
<description>I live in Brazil, and I have a college assignment for which I don&amp;#039;t know what data science method to use, if any. My idea is the following: we Brazilians have the Real (BRL) as our currency, and we have the dollar exchange rate to see &amp;quot;how many Reais a dollar is worth&amp;quot;. I want to research whether the national news has any influence on this price. For example, if Bolsonaro, our president, says some dumb stuff, the dollar goes up in price, and vice versa. I want to collect the dollar values and their variance over a set time interval, and use web scraping to collect the news from some economy sites. Here&amp;#039;s my question: how can I correlate the news with the dollar variance over a set time? Can data science do that? How do I preprocess this, if at all? Do I need to use bag-of-words? At least that is what I heard. Please help, and thank you for reading.</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1002/can-data-science-solve-this-problem</guid>
<pubDate>Sun, 24 Oct 2021 15:43:11 +0000</pubDate>
</item>
<item>
<title>Answer selected: How to calculate LogLoss in logistic regression?</title>
<link>https://ask.ghassem.com/588/how-to-calculate-logloss-in-logistic-regression?show=874#a874</link>
<description>&lt;p&gt;Answer#2: Total Loss of the model&lt;/p&gt;

&lt;p&gt;First we have to find the probability of each student passing the course.&lt;/p&gt;

&lt;p&gt;Let i be the sample index of the student.&lt;/p&gt;

&lt;p&gt;P1:&lt;/p&gt;

&lt;p&gt;Z=-64+(2*29)=-6&lt;/p&gt;

&lt;p&gt;P=1/(1+e^6)=0.0024&lt;/p&gt;

&lt;p&gt;P2:&lt;/p&gt;

&lt;p&gt;Z=-64+(2*15)=-34&lt;/p&gt;

&lt;p&gt;P=1/(1+e^34)≈0 (THE VALUE IS VERY SMALL)&lt;/p&gt;

&lt;p&gt;P3: ALREADY KNOW = 0.88&lt;/p&gt;

&lt;p&gt;P4:&lt;/p&gt;

&lt;p&gt;Z=-64+(2*28)=-8&lt;/p&gt;

&lt;p&gt;P=1/(1+e^8)=0.00033&lt;/p&gt;

&lt;p&gt;P5:&lt;/p&gt;

&lt;p&gt;Z=-64+(2*39)=14&lt;/p&gt;

&lt;p&gt;P=1/(1+e^-14)=0.999&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;THE TOTAL LOSS OF THE MODEL IS CALCULATED BELOW, BY USING THE FORMULA&lt;/p&gt;

&lt;p&gt;Log-loss= -(1/N)*SUM(yi*ln(Pi)+(1-yi)ln(1-Pi)), where the term for student i is yi*ln(Pi)+(1-yi)ln(1-Pi)&lt;/p&gt;

&lt;p&gt;TERM 1= -2.4E-3&lt;/p&gt;

&lt;p&gt;TERM 2= 0&lt;/p&gt;

&lt;p&gt;TERM 3= -&amp;nbsp;0.128&lt;/p&gt;

&lt;p&gt;TERM 4= -8.0164&lt;/p&gt;

&lt;p&gt;TERM 5= -0.001&lt;/p&gt;

&lt;p&gt;TOTAL LOSS OF THE MODEL= LOG-LOSS= - (1/5)(-2.4E-3+0-&amp;nbsp;0.128-8.0164-0.001) = 1.6296&lt;/p&gt;
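
&lt;p&gt;The calculation above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the study hours and the model z = -64 + 2*hours are taken from the steps above, while the pass/fail labels in PASSED are inferred from which term, ln(P) or ln(1-P), each student contributes. The names HOURS, PASSED, probability and sample_log_loss are hypothetical.&lt;/p&gt;

```python
from math import exp, log

# Study hours of the five students, as used in the steps above.
HOURS = [29, 15, 33, 28, 39]
# Assumption: labels inferred from the terms used above (1 = passed, 0 = failed).
PASSED = [0, 0, 1, 1, 1]


def probability(hours):
    """Predicted probability of passing: sigmoid of z = -64 + 2 * hours."""
    z = -64 + 2 * hours
    return 1 / (1 + exp(-z))


def sample_log_loss(y, p):
    """Log-loss of a single sample with true label y and predicted probability p."""
    return -(y * log(p) + (1 - y) * log(1 - p))


losses = [sample_log_loss(y, probability(h)) for h, y in zip(HOURS, PASSED)]
total_loss = sum(losses) / len(losses)  # mean log-loss over all students

for index, loss in enumerate(losses, start=1):
    print(f"student {index}: log-loss = {loss:.4f}")
print(f"total log-loss = {total_loss:.4f}")
```

&lt;p&gt;With unrounded probabilities the mean comes out at about 1.63, slightly below the 1.6296 obtained above from rounded intermediate values.&lt;/p&gt;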

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Answer 1: &lt;strong&gt;the loss of the model&lt;/strong&gt;&amp;nbsp;for the student who studied 33 hours&lt;/p&gt;

&lt;p&gt;Step 1: we have to find the probability of passing the course&lt;/p&gt;

&lt;p&gt;P=1/(1+e^-z)&lt;/p&gt;

&lt;p&gt;where z= log-odds= -64+(2*33)=2&lt;/p&gt;

&lt;p&gt;after putting the values... P=1/(1+e^-2)=0.88&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Now, let us calculate the log-loss of the model for that particular student, who is sample number 3 (i=3, where &quot;i&quot; is the sample index)&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Log-loss= -(yi*ln(Pi)+(1-yi)ln(1-Pi))&lt;/p&gt;

&lt;p&gt;Log-loss= -[1*ln(0.88)+(1-1)ln(1-0.88)]&lt;/p&gt;

&lt;p&gt;Answer#1: Log-loss= 0.128 is the loss of the model for this student&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/588/how-to-calculate-logloss-in-logistic-regression?show=874#a874</guid>
<pubDate>Sun, 17 Oct 2021 16:48:29 +0000</pubDate>
</item>
<item>
<title>Which algorithm is best to detect anomalies within a data set of 5k+ user-login events?</title>
<link>https://ask.ghassem.com/1000/which-algorithm-best-detect-anomalies-within-login-events</link>
<description>I am trying to build an unsupervised ML model to detect anomalies within 5000+ users&amp;#039; login data. &amp;nbsp;I selected 5 features contained within each of the user-login events (e.g. IP, hour of day, day of week, device_id, OS). &amp;nbsp;I am looking for the best algorithm to use. &amp;nbsp;I am considering using density function to determine probabilities of the feature values and whether an event is an outlier. &amp;nbsp;The problem is that feature values are only relevant to the specific user. &amp;nbsp;For example, you cannot compare login IP across users, login IP is only applicable to the user. &lt;br /&gt;
Ultimately, I want to detect events that are changes in a user login behavior, like different IP, day, hour, device_id, or OS, where the more features that have changed increase the probability of an outlier. &lt;br /&gt;
At this point, I am not sure how to build a model with data that contains multiple users, because I don&amp;#039;t know how to separate the user data so the model is trained per user and finding anomalies within the individual user&amp;#039;s features.&lt;br /&gt;
&lt;br /&gt;
I also don&amp;#039;t have any labeled data to use for testing, should I fabricate some?&lt;br /&gt;
&lt;br /&gt;
Any advice greatly appreciated.&lt;br /&gt;
&lt;br /&gt;
Thank you!</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1000/which-algorithm-best-detect-anomalies-within-login-events</guid>
<pubDate>Tue, 05 Oct 2021 17:45:38 +0000</pubDate>
</item>
<item>
<title>How do we incorporate polylines in machine learning tools?</title>
<link>https://ask.ghassem.com/999/how-we-incorporate-the-polyline-in-machine-learnning-tools</link>
<description>Suppose I have to predict the traffic on a road segment based on available data such as the number of houses and businesses along the segment. Which machine learning tool can incorporate the road segment geometry (polylines given by coordinates) among the attributes?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/999/how-we-incorporate-the-polyline-in-machine-learnning-tools</guid>
<pubDate>Wed, 29 Sep 2021 06:16:30 +0000</pubDate>
</item>
<item>
<title>Commented: How to calculate Softmax Regression probabilities?</title>
<link>https://ask.ghassem.com/591/how-to-calculate-softmax-regression-probabilities?show=998#c998</link>
<description>The question doesn&amp;#039;t make any mention of a bias, so we just assume it is 1?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/591/how-to-calculate-softmax-regression-probabilities?show=998#c998</guid>
<pubDate>Mon, 05 Jul 2021 18:14:35 +0000</pubDate>
</item>
<item>
<title>Commented: How to calculate convolutions on a CONV layer for a Convolutional Neural Network?</title>
<link>https://ask.ghassem.com/650/calculate-convolutions-layer-convolutional-neural-network?show=997#c997</link>
<description>For part a I am getting the results below for the convolved features:&lt;br /&gt;
R: 0 2 0 , 2 2 0, 3 3 2&lt;br /&gt;
G: -1 2 2 , -3 0 4 , 0 0 1&lt;br /&gt;
B: -2 -2 0, 1 2 2 , -1 1 3&lt;br /&gt;
How do we consider the bias for this section?</description>
<category>Deep Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/650/calculate-convolutions-layer-convolutional-neural-network?show=997#c997</guid>
<pubDate>Sat, 03 Jul 2021 17:34:53 +0000</pubDate>
</item>
<item>
<title>Commented: How to update the weights in backpropagation algorithm when activation function is not linear?</title>
<link>https://ask.ghassem.com/901/update-weights-backpropagation-algorithm-activation-function?show=996#c996</link>
<description>While the question says activation function for hidden layer, the solution applies the same activation function to output layer as well. Do we also need to apply activation function to output layer?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/901/update-weights-backpropagation-algorithm-activation-function?show=996#c996</guid>
<pubDate>Sat, 03 Jul 2021 14:44:08 +0000</pubDate>
</item>
<item>
<title>Commented: What are the main branches of Machine Learning?</title>
<link>https://ask.ghassem.com/13/what-are-the-main-branches-of-machine-learning?show=995#c995</link>
<description>Hi, can I use this picture as reference in my master thesis? Thank you</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/13/what-are-the-main-branches-of-machine-learning?show=995#c995</guid>
<pubDate>Mon, 28 Jun 2021 11:17:30 +0000</pubDate>
</item>
<item>
<title>Should I start as a data analyst and then move to data science?</title>
<link>https://ask.ghassem.com/994/should-i-start-as-a-data-analyst-then-data-science</link>
<description>Should I start as a data analyst and then move to data science?&lt;br /&gt;
&lt;br /&gt;
I am a second-year Bachelor&amp;#039;s student in Computer Science and want to become a Data Scientist.&lt;br /&gt;
&lt;br /&gt;
However, when I try to apply for internships/jobs, most of them require a Master&amp;#039;s/Ph.D.&lt;br /&gt;
&lt;br /&gt;
A Data Analyst role, however, has fewer requirements.&lt;br /&gt;
&lt;br /&gt;
Do you recommend starting off as a Data Analyst and then changing to Data Science?</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/994/should-i-start-as-a-data-analyst-then-data-science</guid>
<pubDate>Mon, 21 Jun 2021 20:31:04 +0000</pubDate>
</item>
<item>
<title>How many samples do we need to test image segmentation using synthetic data?</title>
<link>https://ask.ghassem.com/993/many-samples-need-test-image-segmentation-using-synthetic</link>
<description>Hello,&lt;br /&gt;
&lt;br /&gt;
I trained a CNN using synthetic data to perform a segmentation task on human faces. During the test and to evaluate the prediction of this network, I used 200 examples from the database to compute precision and recall.&lt;br /&gt;
&lt;br /&gt;
Is this number sufficient, given that I control the data generator myself and that I build the database by randomly drawing the elements from centered Gaussian distributions?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,</description>
<category>Deep Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/993/many-samples-need-test-image-segmentation-using-synthetic</guid>
<pubDate>Mon, 21 Jun 2021 12:26:32 +0000</pubDate>
</item>
<item>
<title>Answered: How best to ensure data quality?</title>
<link>https://ask.ghassem.com/990/how-best-to-ensure-data-quality?show=992#a992</link>
<description>This is really a broad question. The best quality usually comes from a good quality source of data generation.</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/990/how-best-to-ensure-data-quality?show=992#a992</guid>
<pubDate>Fri, 11 Jun 2021 18:09:27 +0000</pubDate>
</item>
<item>
<title>Answered: Can we have multiple target values in a ML problem dataset for supervised learning?</title>
<link>https://ask.ghassem.com/989/multiple-target-values-problem-dataset-supervised-learning?show=991#a991</link>
<description>&lt;p&gt;Yes. It is possible not only with neural networks whose output layer has more than one neuron, but also with &lt;a rel=&quot;nofollow&quot; href=&quot;https://scikit-learn.org/stable/modules/multiclass.html&quot;&gt;traditional machine learning algorithms&lt;/a&gt;.&lt;/p&gt;</description>
<category>Machine Learning Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/989/multiple-target-values-problem-dataset-supervised-learning?show=991#a991</guid>
<pubDate>Fri, 11 Jun 2021 16:48:41 +0000</pubDate>
</item>
<item>
<title>Searching for movie dataset containing movie synopses/plots?</title>
<link>https://ask.ghassem.com/988/searching-for-movie-dataset-containing-movie-synopses-plots</link>
<description>Hello&lt;br /&gt;
To build a hybrid recommendation system, I used the MovieLens 1M dataset for the collaborative filtering part. Now I&amp;#039;m looking for a database/dataset that contains descriptions/summaries/details/synopses/plots of movies for the content-based recommendation.&lt;br /&gt;
Could someone tell me where I can find such a dataset?&lt;br /&gt;
Thank you in advance.</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/988/searching-for-movie-dataset-containing-movie-synopses-plots</guid>
<pubDate>Thu, 27 May 2021 09:57:31 +0000</pubDate>
</item>
<item>
<title>Discrete Mathematics (Algorithm)</title>
<link>https://ask.ghassem.com/986/intermittent-mathematics-logarim</link>
<description>&lt;p&gt;&lt;strong&gt;The old telephone keypad has 10 number keys. It lets the user enter text by pressing a key several times in quick succession. You need to draw a graph of entering a text input using this keypad, and then give an algorithm for finding the length of the path needed to enter a given text.&lt;br&gt;
example&amp;nbsp;&lt;br&gt;
aaa&amp;nbsp; &amp;nbsp;--&amp;gt; 6&lt;br&gt;
aba&amp;nbsp; &amp;nbsp;--&amp;gt; 5&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;the link below shows the phone keypad&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://commons.wikimedia.org/wiki/File:Telephone-keypad.png&quot;&gt;https://commons.wikimedia.org/wiki/File:Telephone-keypad.png&lt;/a&gt;&lt;br&gt;
&amp;nbsp;&lt;/p&gt;</description>
<category>Web Development</category>
<guid isPermaLink="true">https://ask.ghassem.com/986/intermittent-mathematics-logarim</guid>
<pubDate>Wed, 05 May 2021 12:16:20 +0000</pubDate>
</item>
<item>
<title>Answered: How to calculate average with deviating sensors?</title>
<link>https://ask.ghassem.com/983/how-to-calculate-average-with-deviating-sensors?show=985#a985</link>
<description>What seems to work is simple: create 4 new columns:&lt;br /&gt;
x=average(3 values) ; y=stdev.p(3 values) ; low threshold = x-y ; high threshold = x+y&lt;br /&gt;
, then repeat the numbers if these are within the boundaries and make a &amp;#039;n/a&amp;#039; if outside of stdev.&lt;br /&gt;
With the values repeated (as within thresholds), the average can be calculated neglecting the extreme values.&lt;br /&gt;
Example:&lt;br /&gt;
10 ; 11 ; 20 : avg=13.67: stdev.p = 4.49; Low=9.17; high=18.16, so&lt;br /&gt;
10 ; 11; n/a &amp;nbsp;(as 20 &amp;nbsp;&amp;gt; 18.16)&lt;br /&gt;
this gives an average of 10.5 &amp;nbsp;&amp;nbsp;:-)&lt;br /&gt;
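The same steps can be sketched in Python. This is a minimal sketch, not a tested production filter: the function name filtered_average is hypothetical, and statistics.pstdev is used as the counterpart of the spreadsheet stdev.p.&lt;br /&gt;

```python
from statistics import mean, pstdev


def filtered_average(readings):
    """Average only the readings within one population standard deviation of the mean."""
    center = mean(readings)
    spread = pstdev(readings)  # population standard deviation, like stdev.p
    low, high = center - spread, center + spread
    # Clamping a reading to [low, high] leaves it unchanged exactly when it
    # lies inside the band, so this keeps only the in-band readings.
    kept = [r for r in readings if min(max(r, low), high) == r]
    return mean(kept)


# The example above: 20 falls outside the band and is dropped.
print(filtered_average([10, 11, 20]))
```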
&lt;br /&gt;
Agree?</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/983/how-to-calculate-average-with-deviating-sensors?show=985#a985</guid>
<pubDate>Wed, 05 May 2021 11:49:02 +0000</pubDate>
</item>
<item>
<title>The old telephone keypad has 10 numbers (10 keys)</title>
<link>https://ask.ghassem.com/984/the-old-keypad-of-the-telephone-has-sume-yypad-after-that-yout</link>
<description>This is a discrete mathematics question.&lt;br /&gt;
&lt;br /&gt;
The old telephone keypad has 10 number keys. It lets the user enter text by pressing a key several times in quick succession. You need to draw a graph of entering a text input using this keypad, and then give an algorithm for finding the length of the path needed to enter a given text.</description>
<category>Web Development</category>
<guid isPermaLink="true">https://ask.ghassem.com/984/the-old-keypad-of-the-telephone-has-sume-yypad-after-that-yout</guid>
<pubDate>Tue, 04 May 2021 14:39:49 +0000</pubDate>
</item>
<item>
<title>design a computer-based system that will encourage autistic children to communicate and express themselves better.</title>
<link>https://ask.ghassem.com/982/computer-encourage-autistic-children-communicate-themselves</link>
<description>a) A company has been asked to design a computer-based system that will encourage autistic children to communicate and express themselves better.&lt;br /&gt;
&lt;br /&gt;
b) What type of interaction would be appropriate to use at the interface for this particular user group?</description>
<category>Human Computer Interaction</category>
<guid isPermaLink="true">https://ask.ghassem.com/982/computer-encourage-autistic-children-communicate-themselves</guid>
<pubDate>Thu, 01 Apr 2021 07:04:59 +0000</pubDate>
</item>
<item>
<title>Answer selected: Terminology clarification in Spark</title>
<link>https://ask.ghassem.com/979/terminology-clarification-in-spark?show=981#a981</link>
<description>The engine is the same regardless of which interface language you use. For some tasks, such as specialized data cleaning, there may be no suitable SQL command, and we have to use Scala or Python. Using Zeppelin, you can switch back and forth among the languages the engine supports, although this is not common practice. For some specific tasks you can use pure Spark SQL, and if you want to use SQL from PySpark or Scala, there are functions that help you do so.&lt;br /&gt;
&lt;br /&gt;
I believe observing more examples will help you understand when you can use what.</description>
<category>Big Data Tools</category>
<guid isPermaLink="true">https://ask.ghassem.com/979/terminology-clarification-in-spark?show=981#a981</guid>
<pubDate>Wed, 17 Feb 2021 16:04:05 +0000</pubDate>
</item>
</channel>
</rss>