<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent questions without answers</title>
<link>https://ask.ghassem.com/unanswered</link>
<description>Powered by Question2Answer</description>
<item>
<title>When to one-hot encode a category and when to segment by category?</title>
<link>https://ask.ghassem.com/1034/when-to-use-one-hot-encode-category-and-when-segment-category</link>
<description>When preprocessing data for machine learning, is there any difference between using one-hot encoding to turn categorical variables into numeric variables and segmenting the data, with a separate model for each category? Say you run a multivariate regression model on data covering 5 cities. Would a single model with one indicator variable for each city be better or worse than having 5 models, one specific to each city? Or is there no difference? Or does it depend on certain factors and intuition?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1034/when-to-use-one-hot-encode-category-and-when-segment-category</guid>
<pubDate>Wed, 22 Feb 2023 20:30:38 +0000</pubDate>
</item>
<item>
<title>Can you verify the validity of this chart comparing the review scores for Marvel Phase 4?</title>
<link>https://ask.ghassem.com/1030/verify-validity-chart-comparing-review-scores-marvel-phase</link>
<description>&lt;p&gt;I have some skepticism about the validity of the charts below comparing the critic and audience reviews for Phase 4 of the MCU to the previous 3 phases. There are over 18 movies and tv shows in Phase 4 compared to the 6 movies in Phases 1 &amp;amp; 2 and the 11 movies in Phase 3. Also, there are far fewer critic reviews for the Phase 4 tv shows than the Phase 4 movies. For example, on Rotten Tomatoes there are only 40 critic reviews for The Falcon and the Winter Soldier and 452 critic reviews for Black Widow. Could this uneven and inconsistent number of reviews between tv shows and movies in Phase 4 be inaccurately making the overall averages higher than they should be? Or do you agree with the conclusions presented in the charts?&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://cdn.discordapp.com/attachments/997145183172964435/1059948060194652230/image.png&quot;&gt;https://cdn.discordapp.com/attachments/997145183172964435/1059948060194652230/image.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://cdn.discordapp.com/attachments/997145183172964435/1049356020469739520/image.png&quot;&gt;https://cdn.discordapp.com/attachments/997145183172964435/1049356020469739520/image.png&lt;/a&gt;&lt;/p&gt;</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1030/verify-validity-chart-comparing-review-scores-marvel-phase</guid>
<pubDate>Mon, 09 Jan 2023 16:29:14 +0000</pubDate>
</item>
<item>
<title>Which code has the best runtime and why? (The commented-out one or the other one)</title>
<link>https://ask.ghassem.com/1027/which-code-has-best-runtime-and-why-the-one-commented-the-other</link>
<description># for key, value in dict.items():&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if value &amp;gt;= long:&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;long = value&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;long_name = key&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if value &amp;lt; short:&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;short = value&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;short_name = key&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;long = max(dict.values())&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;long_name = max(dict, key=dict.get)&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;short = min(dict.values())&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;short_name = min(dict, key=dict.get)</description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/1027/which-code-has-best-runtime-and-why-the-one-commented-the-other</guid>
<pubDate>Fri, 02 Sep 2022 14:39:49 +0000</pubDate>
</item>
<item>
<title>Creating tables from unstructured texts about stock market</title>
<link>https://ask.ghassem.com/1026/creating-tables-from-unstructured-texts-about-stock-market</link>
<description>&lt;p&gt;I am trying to extract information such as profits, revenues and other figures, along with their corresponding dates and quarters, from unstructured text about the stock market and convert it into a report in table form. Because the input text has no fixed format, it is hard to know which entity belongs to which date and quarter, and which value belongs to which entity. Chunking works on a few documents, but not on enough of them. Is there any unsupervised way to link entities with their corresponding dates, values and quarters?&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1026/creating-tables-from-unstructured-texts-about-stock-market</guid>
<pubDate>Tue, 02 Aug 2022 00:47:49 +0000</pubDate>
</item>
<item>
<title>How do I compare the count of a value in each year while having a different sample size each year?</title>
<link>https://ask.ghassem.com/1025/compare-count-value-each-year-while-having-different-sanple</link>
<description>How do I accurately compare a count that a survey measures among my employees each year when both the survey engagement and the number of employees vary from year to year?&lt;br /&gt;
&lt;br /&gt;
Suppose I measure employee satisfaction over the years by surveying my employees each year, asking them whether they are satisfied or not, and then comparing the number of yeses across years. The number of employees who answer is not the same each year, and the number of employees increases every year. How do I correctly compare the results from year to year?&lt;br /&gt;
&lt;br /&gt;
In other words, how do I remove the effect of the survey engagement rate when calculating the results?</description>
<category>general</category>
<guid isPermaLink="true">https://ask.ghassem.com/1025/compare-count-value-each-year-while-having-different-sanple</guid>
<pubDate>Wed, 08 Jun 2022 10:32:33 +0000</pubDate>
</item>
<item>
<title>Is it possible to make a forecast of a future value of Air Temperature using Fast Fourier Transform?</title>
<link>https://ask.ghassem.com/1024/possible-forecast-future-value-temperature-fourier-transform</link>
<description>Is it possible to forecast a future value of air temperature using the Fast Fourier Transform? If yes, what would the process be, and how would you go about it? Thank you!</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/1024/possible-forecast-future-value-temperature-fourier-transform</guid>
<pubDate>Thu, 02 Jun 2022 16:10:26 +0000</pubDate>
</item>
<item>
<title>Forecast log-transformed fitted values for 2 years using an ARMA model</title>
<link>https://ask.ghassem.com/1023/forecast-transformed-fitted-values-years-using-arma-model</link>
<description>The input is a stock price under an exponential transformation. We are asked to forecast 2 years ahead using the ARMA results.</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1023/forecast-transformed-fitted-values-years-using-arma-model</guid>
<pubDate>Wed, 04 May 2022 20:31:44 +0000</pubDate>
</item>
<item>
<title>Kmeans clustering in python - Giving original labels to predicted clusters</title>
<link>https://ask.ghassem.com/1022/kmeans-clustering-python-giving-original-predicted-clusters</link>
<description>&lt;p&gt;I have a dataset with 7 labels in the target variable.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
X = data.drop(&#039;target&#039;, axis=1)
Y = data[&#039;target&#039;]
Y.unique()&lt;/pre&gt;

&lt;p&gt;array([&#039;Normal_Weight&#039;, &#039;Overweight_Level_I&#039;, &#039;Overweight_Level_II&#039;,&lt;br&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&#039;Obesity_Type_I&#039;, &#039;Insufficient_Weight&#039;, &#039;Obesity_Type_II&#039;,&lt;br&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&#039;Obesity_Type_III&#039;], dtype=object)&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
from sklearn.cluster import KMeans
import numpy as np
km = KMeans(n_clusters=7, init=&quot;k-means++&quot;, random_state=300)
km.fit_predict(X)
np.unique(km.labels_)&lt;/pre&gt;

&lt;p&gt;array([0, 1, 2, 3, 4, 5, 6])&lt;/p&gt;

&lt;p&gt;After running the KMeans clustering algorithm with the number of clusters set to 7, the resulting clusters are labeled 0, 1, 2, 3, 4, 5, 6. But how do I know which real label corresponds to which predicted label?&lt;/p&gt;

&lt;p&gt;In other words, I want to know how to assign the original label names to the new predicted labels, so that they can be compared, e.g. to count how many values are clustered correctly (accuracy).&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1022/kmeans-clustering-python-giving-original-predicted-clusters</guid>
<pubDate>Wed, 27 Apr 2022 05:32:54 +0000</pubDate>
</item>
<item>
<title>Bankruptcy prediction and credit card</title>
<link>https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</link>
<description>Hello everyone, newbie data scientist here.&lt;br /&gt;
I&amp;#039;m working on a project to predict companies&amp;#039; bankruptcy probability (probability of default) and to assign them a credit rating/score based on that.&lt;br /&gt;
For example, a probability below 50 is good and anything above is bad (just as an example).&lt;br /&gt;
I have a dataset that contains financial ratios and a class label indicating whether the company went bankrupt or not (0 or 1).&lt;br /&gt;
I&amp;#039;m planning to use these models:&lt;br /&gt;
Logistic regression, linear discriminant analysis, decision trees, random forest, ANN, AdaBoost, SVM.&lt;br /&gt;
&lt;br /&gt;
My question is, and I know it is a dumb question:&lt;br /&gt;
Do those models return a probability that I can transform into labels? I saw that in a thesis and I&amp;#039;m not sure about it.&lt;br /&gt;
&lt;br /&gt;
Otherwise, any guidance or tips will be appreciated.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</guid>
<pubDate>Sun, 10 Apr 2022 05:50:14 +0000</pubDate>
</item>
<item>
<title>I cannot get this code to work. Please help.</title>
<link>https://ask.ghassem.com/1018/i-cannot-get-this-code-to-work-please-help</link>
<description>&lt;p&gt;from keras.models import Sequential&amp;nbsp;&lt;br&gt;
from keras.layers import Dense&amp;nbsp;&lt;br&gt;
from keras.layers import LSTM&amp;nbsp;&lt;br&gt;
from sklearn.model_selection import train_test_split&lt;/p&gt;

&lt;p&gt;model = Sequential()&amp;nbsp;&lt;br&gt;
model.add(LSTM( 10, input_shape=(1, 1)))&amp;nbsp;&lt;br&gt;
model.add(Dense(1, activation=&quot;linear&quot;))&amp;nbsp;&lt;br&gt;
model.compile(loss=&quot;mse&quot;, optimizer=&quot;adam&quot;)&lt;/p&gt;

&lt;p&gt;X, y = get_data()&lt;/p&gt;

&lt;p&gt;X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)&lt;br&gt;
X_train_2, X_val, y_train_2, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)&lt;/p&gt;

&lt;p&gt;model.fit(X_train, y_train, epochs=800, validation_data=(X_val, y_val), shuffle=False)&lt;/p&gt;
</description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/1018/i-cannot-get-this-code-to-work-please-help</guid>
<pubDate>Mon, 21 Mar 2022 05:59:53 +0000</pubDate>
</item>
<item>
<title>Battery data projects</title>
<link>https://ask.ghassem.com/1017/battery-data-projects</link>
<description>Where can I find projects related to battery data?</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1017/battery-data-projects</guid>
<pubDate>Wed, 02 Mar 2022 18:11:57 +0000</pubDate>
</item>
<item>
<title>How can you build a dynamic pricing model with data only from rigid pricing?</title>
<link>https://ask.ghassem.com/1016/build-dynamic-pricing-model-with-data-only-from-rigid-pricing</link>
<description>I want to build a dynamic pricing model, meaning that if the product is too expensive for a client and there is a risk that we might lose the client, we lower the price for them, but if the client doesn&amp;#039;t care that much about the price, we might increase the price a little.&lt;br /&gt;
&lt;br /&gt;
All the articles I&amp;#039;ve seen describe some kind of A/B testing of the pricing followed by building a model.&lt;br /&gt;
&lt;br /&gt;
I want to build a model using only the existing rigid pricing data. I have the prices offered to customers, and I know who bought the product and who went to another company.&lt;br /&gt;
&lt;br /&gt;
How can I handle the price-increase part?</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1016/build-dynamic-pricing-model-with-data-only-from-rigid-pricing</guid>
<pubDate>Fri, 21 Jan 2022 06:44:31 +0000</pubDate>
</item>
<item>
<title>What analytical software would be good for a company to use?</title>
<link>https://ask.ghassem.com/1015/what-analytical-software-would-be-good-for-a-company-to-use</link>
<description>This would be for a company that is just now looking into using software to track data for winemaking.</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/1015/what-analytical-software-would-be-good-for-a-company-to-use</guid>
<pubDate>Fri, 14 Jan 2022 16:46:38 +0000</pubDate>
</item>
<item>
<title>Do you usually collect your own data, or is there always a resource available for you? Or does it depend on the company?</title>
<link>https://ask.ghassem.com/1014/usually-collect-always-resource-available-depends-company</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/1014/usually-collect-always-resource-available-depends-company</guid>
<pubDate>Sun, 09 Jan 2022 22:13:34 +0000</pubDate>
</item>
<item>
<title>How do I know which encoder to use to convert from categorical variables to numerical?</title>
<link>https://ask.ghassem.com/1006/know-which-encoder-convert-categorical-variables-numerical</link>
<description>So say I have a column with categorical data like different styles of temperature: &amp;#039;Lukewarm&amp;#039;, &amp;#039;Hot&amp;#039;, &amp;#039;Scalding&amp;#039;, &amp;#039;Cold&amp;#039;, &amp;#039;Frostbite&amp;#039;,... etc.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I know that we can use pd.get_dummies to convert the column to numerical data within the dataframe, but I also know that there are other &amp;#039;converters&amp;#039; (not sure if that&amp;#039;s the correct terminology) that we can use, e.g. OneHotEncoder from scikit-learn (I could use the pipeline module to build a pipeline and feed my dataframe through it to get my categorical data encoded as numerical).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
How do I know which to use? Does it matter? If it does matter, when does it matter the most (i.e. what types of problems? When there are lots of categorical variables, or few?) If anyone can give me any pointers on this type of stuff I&amp;#039;d greatly appreciate it.</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1006/know-which-encoder-convert-categorical-variables-numerical</guid>
<pubDate>Mon, 29 Nov 2021 04:09:06 +0000</pubDate>
</item>
<item>
<title>ValueError: Length mismatch: Expected axis has 60 elements, new values have 2935849 elements</title>
<link>https://ask.ghassem.com/1005/valueerror-length-mismatch-expected-elements-2935849-elements</link>
<description>&lt;p&gt;I&#039;m creating a new data frame with the most used items grouped together, but I got the following error when grouping by ID and items: ValueError: Length mismatch: Expected axis has 60 elements, new values have 2935849 elements.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
df = sales_df[sales_df[&#039;shop_id&#039;].duplicated(keep=False)]
df[&#039;Grouped&#039;] = sales_df.groupby(&#039;shop_id&#039;)[&#039;item_name&#039;].transform(lambda x: &#039;,&#039;.join(x))
df2 = df[[&#039;shop_id&#039;, &#039;Grouped&#039;]].drop_duplicates()&lt;/pre&gt;

&lt;p&gt;In the code above, I build a data frame keyed by shop ID and then group the shop items. My objective is to group items that share the same ID.&lt;/p&gt;</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1005/valueerror-length-mismatch-expected-elements-2935849-elements</guid>
<pubDate>Fri, 26 Nov 2021 06:09:16 +0000</pubDate>
</item>
<item>
<title>Are Text Mining, Artificial Neural Networks, Speech Processing, and Cloud Computing essential for a good Data Scientist?</title>
<link>https://ask.ghassem.com/1004/artificial-networks-processing-computing-essential-scientist</link>
<description></description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1004/artificial-networks-processing-computing-essential-scientist</guid>
<pubDate>Wed, 27 Oct 2021 19:15:16 +0000</pubDate>
</item>
<item>
<title>Classification of data object might be incorrect</title>
<link>https://ask.ghassem.com/1003/classification-of-data-object-might-be-incorrect</link>
<description>&lt;p&gt;I am learning a new Salesforce product (Evergage) for the company I work for. In the program&#039;s documentation they have listed a set of data objects as an example. It appears to me that the classification might be incorrect. Their system makes a division between &#039;catalog objects&#039; and &#039;profile objects&#039;, and the example they have given is a banking institution. They classified &lt;em&gt;Customer Credit Card&lt;/em&gt; as a &lt;em&gt;profile object&lt;/em&gt; and &lt;em&gt;Credit Card Level&lt;/em&gt; as a &lt;em&gt;catalog object&lt;/em&gt;. It seems to me that it should be the other way around, i.e. &lt;em&gt;Customer Credit Card&lt;/em&gt; = &lt;em&gt;catalog object&lt;/em&gt; and &lt;em&gt;Credit Card Level&lt;/em&gt; = &lt;em&gt;profile object&lt;/em&gt;. Maybe I am not reading the context correctly?&lt;/p&gt;

&lt;p&gt;Here is a link to an image with the complete classification: &lt;a rel=&quot;nofollow&quot; href=&quot;https://drive.google.com/file/d/1nG4aX4Ty_NoHxm04AQo1Ow61m3MZ3pXm/view?usp=sharing&quot;&gt;https://drive.google.com/file/d/1nG4aX4Ty_NoHxm04AQo1Ow61m3MZ3pXm/view?usp=sharing&lt;/a&gt;&lt;/p&gt;</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1003/classification-of-data-object-might-be-incorrect</guid>
<pubDate>Mon, 25 Oct 2021 15:26:46 +0000</pubDate>
</item>
<item>
<title>Can Data Science solve this problem?</title>
<link>https://ask.ghassem.com/1002/can-data-science-solve-this-problem</link>
<description>So, I live in Brazil, and I have a task for college for which I don&amp;#039;t know what data science method to use, if any, to solve it. My idea is the following: we Brazilians have the Real (BRL) as our currency, and we of course have the dollar exchange rate to see &amp;quot;how many Reais a dollar is worth&amp;quot;. What I wanted to do was to research whether the country&amp;#039;s news has any influence on this price. For example, if Bolsonaro, our president, says some dumb stuff, the dollar goes up in price, and vice versa. I wanted to collect all dollar values and their variance over a set time interval, and use web scraping to get the news from some economy sites. Here&amp;#039;s my question, then: how can I correlate the news with the dollar variance over a set time? Can data science do that? How do I preprocess this, if at all? Do I need to use bag-of-words? At least I heard so... Please help, and thank you for reading.</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1002/can-data-science-solve-this-problem</guid>
<pubDate>Sun, 24 Oct 2021 15:43:11 +0000</pubDate>
</item>
<item>
<title>Which algorithm is best to detect anomalies within a data set of 5k+ user-login events?</title>
<link>https://ask.ghassem.com/1000/which-algorithm-best-detect-anomalies-within-login-events</link>
<description>I am trying to build an unsupervised ML model to detect anomalies within 5000+ users&amp;#039; login data. I selected 5 features contained within each of the user-login events (e.g. IP, hour of day, day of week, device_id, OS). I am looking for the best algorithm to use. I am considering using a density function to determine the probabilities of the feature values and whether an event is an outlier. The problem is that feature values are only relevant to the specific user. For example, you cannot compare login IPs across users; a login IP is only meaningful for its own user.&lt;br /&gt;
Ultimately, I want to detect events that are changes in a user&amp;#039;s login behavior, like a different IP, day, hour, device_id, or OS, where the more features that have changed, the higher the probability of an outlier.&lt;br /&gt;
At this point, I am not sure how to build a model with data that contains multiple users, because I don&amp;#039;t know how to separate the user data so that the model is trained per user and finds anomalies within the individual user&amp;#039;s features.&lt;br /&gt;
&lt;br /&gt;
I also don&amp;#039;t have any labeled data to use for testing, should I fabricate some?&lt;br /&gt;
&lt;br /&gt;
Any advice greatly appreciated.&lt;br /&gt;
&lt;br /&gt;
Thank you!</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1000/which-algorithm-best-detect-anomalies-within-login-events</guid>
<pubDate>Tue, 05 Oct 2021 17:45:38 +0000</pubDate>
</item>
<item>
<title>How do we incorporate polylines in machine learning tools?</title>
<link>https://ask.ghassem.com/999/how-we-incorporate-the-polyline-in-machine-learnning-tools</link>
<description>Suppose I have to predict the traffic on a road segment based on available data such as the number of houses and businesses along the segment. Which machine learning tool can incorporate the road segments (polylines), given as coordinates, among the attributes?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/999/how-we-incorporate-the-polyline-in-machine-learnning-tools</guid>
<pubDate>Wed, 29 Sep 2021 06:16:30 +0000</pubDate>
</item>
<item>
<title>Should I start as a data analyst and then move to data science?</title>
<link>https://ask.ghassem.com/994/should-i-start-as-a-data-analyst-then-data-science</link>
<description>Should I start as a data analyst and then move to data science?&lt;br /&gt;
&lt;br /&gt;
I am a second-year Bachelor&amp;#039;s student in Computer Science and want to become a Data Scientist.&lt;br /&gt;
&lt;br /&gt;
However, when I try to apply for internships/jobs, most of them require a Master&amp;#039;s/Ph.D.&lt;br /&gt;
&lt;br /&gt;
A Data Analyst position, on the other hand, has fewer requirements.&lt;br /&gt;
&lt;br /&gt;
Do you recommend starting off as a Data Analyst and then switching to Data Science?</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/994/should-i-start-as-a-data-analyst-then-data-science</guid>
<pubDate>Mon, 21 Jun 2021 20:31:04 +0000</pubDate>
</item>
<item>
<title>How many samples do we need to test image segmentation using synthetic data?</title>
<link>https://ask.ghassem.com/993/many-samples-need-test-image-segmentation-using-synthetic</link>
<description>Hello,&lt;br /&gt;
&lt;br /&gt;
I trained a CNN using synthetic data to perform a segmentation task on human faces. To evaluate the network&amp;#039;s predictions at test time, I used 200 examples from the database to compute precision and recall.&lt;br /&gt;
&lt;br /&gt;
Is this number sufficient, given that I control the data generator myself and build the database by randomly drawing the elements from centered Gaussian distributions?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,</description>
<category>Deep Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/993/many-samples-need-test-image-segmentation-using-synthetic</guid>
<pubDate>Mon, 21 Jun 2021 12:26:32 +0000</pubDate>
</item>
<item>
<title>Searching for movie dataset containing movie synopses/plots?</title>
<link>https://ask.ghassem.com/988/searching-for-movie-dataset-containing-movie-synopses-plots</link>
<description>Hello&lt;br /&gt;
To build a hybrid recommendation system, I used the movielens 1M dataset, for the collaborative filtering part. Now, I&amp;#039;m looking for a database/dataset that contains descriptions/summaries/details/synopses/plots of movies for the content-based recommendation.&lt;br /&gt;
Is there someone who could help me and tell me where I can find such a dataset?&lt;br /&gt;
Thank you in advance.</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/988/searching-for-movie-dataset-containing-movie-synopses-plots</guid>
<pubDate>Thu, 27 May 2021 09:57:31 +0000</pubDate>
</item>
<item>
<title>Discrete Mathematics (Algorithm)</title>
<link>https://ask.ghassem.com/986/intermittent-mathematics-logarim</link>
<description>&lt;p&gt;&lt;strong&gt;The old telephone keypad has 10 numbers (10 keys). It allows the user to enter text by pressing a given key several times in succession within a short period of time. You need to draw a graph of entering text using this keypad. After that, you need an algorithm for finding the length of the path needed to enter a given text.&lt;br&gt;
Example:&lt;br&gt;
aaa&amp;nbsp; &amp;nbsp;--&amp;gt; 6&lt;br&gt;
aba&amp;nbsp; &amp;nbsp;--&amp;gt; 5&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The link below shows the phone keypad:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://commons.wikimedia.org/wiki/File:Telephone-keypad.png&quot;&gt;https://commons.wikimedia.org/wiki/File:Telephone-keypad.png&lt;/a&gt;&lt;/p&gt;</description>
<category>Web Development</category>
<guid isPermaLink="true">https://ask.ghassem.com/986/intermittent-mathematics-logarim</guid>
<pubDate>Wed, 05 May 2021 12:16:20 +0000</pubDate>
</item>
<item>
<title>The old telephone keypad with 10 keys: finding the path length for entering a text</title>
<link>https://ask.ghassem.com/984/the-old-keypad-of-the-telephone-has-sume-yypad-after-that-yout</link>
<description>This is a discrete mathematics question.&lt;br /&gt;
&lt;br /&gt;
The old telephone keypad has 10 numbers (10 keys). It allows the user to enter text by pressing a given key several times in succession within a short period of time. You need to draw a graph of entering text using this keypad. After that, you need an algorithm for finding the length of the path needed to enter a given text.</description>
<category>Web Development</category>
<guid isPermaLink="true">https://ask.ghassem.com/984/the-old-keypad-of-the-telephone-has-sume-yypad-after-that-yout</guid>
<pubDate>Tue, 04 May 2021 14:39:49 +0000</pubDate>
</item>
<item>
<title>Design a computer-based system that will encourage autistic children to communicate and express themselves better</title>
<link>https://ask.ghassem.com/982/computer-encourage-autistic-children-communicate-themselves</link>
<description>a) A company has been asked to design a computer-based system that will encourage autistic children to communicate and express themselves better.&lt;br /&gt;
&lt;br /&gt;
b) What type of interaction would be appropriate to use at the interface for this particular user group?</description>
<category>Human Computer Interaction</category>
<guid isPermaLink="true">https://ask.ghassem.com/982/computer-encourage-autistic-children-communicate-themselves</guid>
<pubDate>Thu, 01 Apr 2021 07:04:59 +0000</pubDate>
</item>
<item>
<title>Very short text classification when category text should be replaced by another category text?</title>
<link>https://ask.ghassem.com/980/classification-category-should-replaced-another-category</link>
<description>&lt;div style=&quot;max-width:800px&quot;&gt;
&lt;div style=&quot;color:#1A1A1B&quot;&gt;
&lt;p&gt;I need some tool to classify articles based on short category text which consists of two or three words separated by &#039;-&#039;. The RSS/XML tag content is for example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Foreign - News&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Football - Foreign&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I created my own categories in DB and now I need to classify categories from parsed RSS of this news source, so it fits news categories defined by me.&lt;/p&gt;

&lt;p&gt;For example, I need all articles whose category contains &quot;football&quot; to be assigned to the category &lt;em&gt;Sport&lt;/em&gt;, but sometimes the category tag is an exact match: &lt;em&gt;Foreign - News&lt;/em&gt; should map to the category I defined in the DB as &lt;em&gt;Foreign&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So far I have only used decision-tree frameworks, in another project, so I would like advice on an AI-based approach, technique or particular framework I can use to solve this problem. I am not very experienced in AI and don&#039;t want to end up in a dead end because of a poor decision of my own.&lt;/p&gt;

&lt;p&gt;While it could be solved with many ifs and a &#039;contains&#039; function, that does not seem like a very good solution to me.&lt;/p&gt;

&lt;p&gt;TL;DR: I basically need something like a &quot;clever, flexible and universal if-elseif&quot;.&lt;/p&gt;

&lt;p&gt;NOTE: I can also use the article description text if necessary, but it seems to me that the source category text is unambiguous enough for this kind of problem.&lt;/p&gt;
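&lt;p&gt;A minimal Python sketch of the mapping described above, using plain lookup tables rather than a trained model. The rule tables, the &lt;code&gt;classify&lt;/code&gt; function and the &quot;Uncategorized&quot; fallback are hypothetical names chosen for illustration; the &quot;foreign&quot; keyword rule is an assumption, and only the two example tags come from the text above.&lt;/p&gt;

```python
# Hypothetical rule tables (illustrative only, not a real library API).
# Exact matches are checked first; keyword rules are checked in priority order.
EXACT_RULES = {"foreign - news": "Foreign"}
KEYWORD_RULES = {"football": "Sport", "foreign": "Foreign"}  # "foreign" rule is an assumption


def normalize(raw):
    """Lower-case the tag and normalize spacing around the '-' separator."""
    parts = [p.strip().lower() for p in raw.split("-")]
    return " - ".join(p for p in parts if p)


def classify(raw, default="Uncategorized"):
    """Map a source category tag to one of my own categories."""
    tag = normalize(raw)
    if tag in EXACT_RULES:
        return EXACT_RULES[tag]
    tokens = tag.split(" - ")
    # dicts keep insertion order, so earlier keywords take priority
    for keyword, category in KEYWORD_RULES.items():
        if keyword in tokens:
            return category
    return default


for tag in ["Foreign - News", "Football - Foreign", "Weather"]:
    print(tag, "->", classify(tag))
```

&lt;p&gt;Keeping the rules in data rather than in nested ifs means new source tags only need a new table entry; a learned classifier would only be needed if the tags turn out to be too varied for such tables.&lt;/p&gt;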
&lt;/div&gt;
&lt;/div&gt;</description>
<category>Artificial Intelligence</category>
<guid isPermaLink="true">https://ask.ghassem.com/980/classification-category-should-replaced-another-category</guid>
<pubDate>Thu, 11 Feb 2021 12:48:47 +0000</pubDate>
</item>
<item>
<title>Binary Classification and neutral tag</title>
<link>https://ask.ghassem.com/978/binary-classification-and-neutral-tag</link>
<description>&lt;p&gt;I am trying to create a sentiment analysis model using a binary classification loss. I have a batch of tweets, some tagged as positive (labeled as 1) and some as negative (labeled as 0). I managed to gather some tweets that are tagged as neutral, but there are fewer of them than positive and negative ones. My thinking is to tag them with 0.5 to balance the classification probability. Is this legitimate?&lt;/p&gt;

</description>
<category>Deep Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/978/binary-classification-and-neutral-tag</guid>
<pubDate>Sat, 30 Jan 2021 10:08:01 +0000</pubDate>
</item>
<item>
<title>&quot;Rare words&quot; on vocabulary</title>
<link>https://ask.ghassem.com/977/rare-words-on-vocabulary</link>
<description>I am trying to create a sentiment analysis model and I have a question.&lt;br /&gt;
&lt;br /&gt;
After I preprocessed my tweets and created my vocabulary, I noticed that I have words that appear fewer than 5 times in my dataset (many of them appear only once). Many of them are real words and not gibberish. My thinking is that if I keep those words, they will get wrong &amp;quot;sentimental&amp;quot; weights and make my model worse.&lt;br /&gt;
Is my thinking right or am I missing something?&lt;br /&gt;
&lt;br /&gt;
My vocabulary size is around 40,000 words and the &amp;quot;rare&amp;quot; ones number around 10,000. Should I &amp;quot;sacrifice&amp;quot; them?&lt;br /&gt;
&lt;br /&gt;
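For illustration, the frequency cutoff described above could be sketched in Python as follows. The threshold of 5 comes from the text above; replacing rare words with a single placeholder token instead of deleting them is an assumption, and the UNK token and function names are hypothetical.&lt;br /&gt;

```python
from collections import Counter

UNK = "UNK"  # hypothetical placeholder token standing in for every rare word


def build_vocab(tokenized_tweets, min_count=5):
    """Keep only the tokens that occur at least min_count times in the corpus."""
    counts = Counter(tok for tweet in tokenized_tweets for tok in tweet)
    return {tok for tok, n in counts.items() if n >= min_count}


def replace_rare(tweet, vocab):
    """Map every out-of-vocabulary token to the shared UNK token."""
    return [tok if tok in vocab else UNK for tok in tweet]


# Tiny made-up corpus: "good" occurs 6 times, "movie" 5 times, "zygote" once.
tweets = [["good", "movie"]] * 5 + [["good", "zygote"]]
vocab = build_vocab(tweets)
print(sorted(vocab))
print(replace_rare(["good", "zygote"], vocab))
```

Comparing validation scores with and without the cutoff is one way to check whether dropping the rare words actually helps.&lt;br /&gt;
&lt;br /&gt;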
Thanks in advance.</description>
<category>Deep Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/977/rare-words-on-vocabulary</guid>
<pubDate>Sat, 30 Jan 2021 09:57:31 +0000</pubDate>
</item>
<item>
<title>My GloVe word embeddings contain sentiment?</title>
<link>https://ask.ghassem.com/972/my-glove-word-embeddings-contain-sentiment</link>
<description>&lt;p&gt;I&#039;ve been researching sentiment analysis with word embeddings. I read papers that state that word embeddings ignore sentiment information of the words in the text. One paper states that among the top 10 words that are semantically similar, around 30 percent of words have opposite polarity e.g. happy - sad.&lt;/p&gt;

&lt;p&gt;So, I computed word embeddings on my dataset (Amazon reviews) with the GloVe algorithm in R. Then, I looked at the most similar words with cosine similarity and I found that actually every word is sentimentally similar. (E.g. beautiful - lovely - gorgeous - pretty - nice - love). Therefore, I was wondering how this is possible since I expected the opposite from reading several papers. What could be the reason for my findings?&lt;/p&gt;

&lt;p&gt;Two of the many papers I read:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yu, L. C., Wang, J., Lai, K. R. &amp;amp; Zhang, X. (2017). Refining Word Embeddings Using Intensity Scores for Sentiment Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(3), 671-681.&lt;/li&gt;
&lt;li&gt;Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T. &amp;amp; Qin, B. (2014). Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 1: Long Papers, 1555-1565&lt;/li&gt;
&lt;/ul&gt;</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/972/my-glove-word-embeddings-contain-sentiment</guid>
<pubDate>Sun, 03 Jan 2021 14:09:37 +0000</pubDate>
</item>
<item>
<title>Why should I use Dynamic Time Warping over GMM for time series clustering?</title>
<link>https://ask.ghassem.com/962/why-should-dynamic-time-warping-over-timer-series-clustering</link>
<description></description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/962/why-should-dynamic-time-warping-over-timer-series-clustering</guid>
<pubDate>Fri, 04 Dec 2020 03:19:16 +0000</pubDate>
</item>
<item>
<title>is it possible to derive a new 95% CI from two separate 95% CIs?</title>
<link>https://ask.ghassem.com/961/is-it-possible-to-derive-a-new-95-ci-from-two-separate-95-cis</link>
<description></description>
<category>Statistics</category>
<guid isPermaLink="true">https://ask.ghassem.com/961/is-it-possible-to-derive-a-new-95-ci-from-two-separate-95-cis</guid>
<pubDate>Mon, 23 Nov 2020 14:45:19 +0000</pubDate>
</item>
<item>
<title>Probability of a bus arriving at its destination based on weather conditions</title>
<link>https://ask.ghassem.com/953/probability-arrived-destination-based-weather-condition</link>
<description>A bus is making its way to a destination. If the weather conditions are favorable today, the likelihood of delay is 3%. If the weather conditions are not favorable today, the likelihood of delay is 50%. The forecast predicts that it is 20% likely that the weather conditions will be favorable today.&lt;br /&gt;
&lt;br /&gt;
1. What is the likelihood that the bus will be delayed?&lt;br /&gt;
&lt;br /&gt;
2. The bus has arrived, but it was delayed. Given that the bus was delayed, what is the likelihood that the weather conditions were favorable?</description>
<category>Discrete Mathematics</category>
<guid isPermaLink="true">https://ask.ghassem.com/953/probability-arrived-destination-based-weather-condition</guid>
<pubDate>Mon, 09 Nov 2020 13:06:47 +0000</pubDate>
</item>
<item>
<title>Where can I find illustrative real-life machine learning examples (in business, work, etc.)?</title>
<link>https://ask.ghassem.com/924/where-find-illustrative-machine-learning-examples-business</link>
<description>Is there a website for finding illustrative real-life examples of using machine learning? For instance, examples of End-to-End Machine Learning, Classification, Clustering, and Unsupervised Learning.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/924/where-find-illustrative-machine-learning-examples-business</guid>
<pubDate>Tue, 22 Sep 2020 00:47:09 +0000</pubDate>
</item>
<item>
<title>Where can I find simple machine learning mathematics explained visually?</title>
<link>https://ask.ghassem.com/923/where-simple-machine-learning-mathematics-explained-visually</link>
<description>Could you please let me know where I can find simple machine learning mathematics explained visually?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/923/where-simple-machine-learning-mathematics-explained-visually</guid>
<pubDate>Mon, 21 Sep 2020 23:55:12 +0000</pubDate>
</item>
<item>
<title>Looking for program to graph a network with 2 clusters.</title>
<link>https://ask.ghassem.com/918/looking-for-program-to-graph-a-network-with-2-clusters</link>
<description>&lt;p&gt;I&#039;m a PhD student using qualitative content analysis to look at a number of documents to identify a norm cluster. The structure of the norms I&#039;m trying to look at is that &quot;ideations&quot; link to &quot;behaviors&quot;. I&#039;ve coded the documents so that I now have a spreadsheet with the coded behaviors in rows and the ideations in columns. The number in a cell then shows the overlap between a behavior and an ideation.&lt;/p&gt;

&lt;p&gt;I&#039;m looking for a program&amp;nbsp;that would allow me to graphically portray this so that the ideations appear as bubbles clustered together, size dependent upon the number of times they were coded, with all behaviors clustered together in the same way and lines (size dependent on the strength of the connection) connecting the two. A simple example of this can be found on page 28 of &lt;a rel=&quot;nofollow&quot; href=&quot;https://minerva-access.unimelb.edu.au/bitstream/handle/11343/214510/EJIR%20Norm%20structure%20and%20evolution%20for%20Minerva.pdf?sequence=1&amp;amp;isAllowed=y&quot;&gt;this&lt;/a&gt;&amp;nbsp;(problem is constant in my question).&lt;/p&gt;

&lt;p&gt;Thanks in advance for any answers.&lt;/p&gt;</description>
<category>Programming</category>
<guid isPermaLink="true">https://ask.ghassem.com/918/looking-for-program-to-graph-a-network-with-2-clusters</guid>
<pubDate>Fri, 28 Aug 2020 09:46:16 +0000</pubDate>
</item>
<item>
<title>How to print a confusion matrix when using the StratifiedKFold method?</title>
<link>https://ask.ghassem.com/894/how-to-print-confusion-matrix-using-stratifiedkfold-method</link>
<description></description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/894/how-to-print-confusion-matrix-using-stratifiedkfold-method</guid>
<pubDate>Thu, 06 Aug 2020 21:41:19 +0000</pubDate>
</item>
<item>
<title>I am facing an error after importing the torch module in Python. How can I solve it? The link to the error is given below</title>
<link>https://ask.ghassem.com/893/facing-error-after-importing-torch-module-python-solve-error</link>
<description>&lt;p&gt;&lt;img alt=&quot;&quot; src=&quot;https://ibb.co/MgXvrkL&quot;&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://ibb.co/MgXvrkL&quot;&gt;https://ibb.co/MgXvrkL&lt;/a&gt;&lt;/p&gt;</description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/893/facing-error-after-importing-torch-module-python-solve-error</guid>
<pubDate>Fri, 31 Jul 2020 14:56:42 +0000</pubDate>
</item>
<item>
<title>How to split into train and test using PKL file?</title>
<link>https://ask.ghassem.com/892/how-to-split-into-train-and-test-using-pkl-file</link>
<description></description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/892/how-to-split-into-train-and-test-using-pkl-file</guid>
<pubDate>Thu, 30 Jul 2020 22:08:47 +0000</pubDate>
</item>
<item>
<title>Suggestion For Model</title>
<link>https://ask.ghassem.com/890/suggestion-for-model</link>
<description></description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/890/suggestion-for-model</guid>
<pubDate>Sun, 19 Jul 2020 19:49:48 +0000</pubDate>
</item>
<item>
<title>How can this data be structured for MongoDB?</title>
<link>https://ask.ghassem.com/889/how-can-this-data-be-structured-for-mongodb</link>
<description>&lt;p&gt;&lt;img alt=&quot;&quot; src=&quot;https://prnt.sc/tkr2g7&quot;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://prnt.sc/tkr2g7&quot; target=&quot;_blank&quot;&gt;https://prnt.sc/tkr2g7&lt;/a&gt; Hello, I have a PFE (final-year project) about determining the risks to pedestrians, and I have to build a simulator to generate data related to this. This is my first time working on this. I would like to know how the data should be structured. I will be working with MongoDB, so I would love to see an example in JSON.&lt;/p&gt;</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/889/how-can-this-data-be-structured-for-mongodb</guid>
<pubDate>Sun, 19 Jul 2020 19:08:50 +0000</pubDate>
</item>
<item>
<title>Guidance on sequencing the data science courses below</title>
<link>https://ask.ghassem.com/844/guidance-on-sequencing-data-science-courses-below</link>
<description>Hello,&lt;br /&gt;
my name is Lutaaya Mudathiru.&lt;br /&gt;
&lt;br /&gt;
I am planning to start the online data science professional courses at Harvard University, but I don&amp;#039;t know which course I should begin with. I would like help sequencing the courses below so that I can benefit more:&lt;br /&gt;
&lt;br /&gt;
1. Principles, Statistical and Computational Tools for Reproducible Science&lt;br /&gt;
2. Data Science: Inference and Modeling&lt;br /&gt;
3. Data Science: Productivity Tools&lt;br /&gt;
4. Data Science: Wrangling&lt;br /&gt;
5. Data Science: Linear Regression&lt;br /&gt;
6. Data Science: Machine Learning&lt;br /&gt;
7. Data Science: Capstone&lt;br /&gt;
8. Data Science: R Basics&lt;br /&gt;
9. Data Science: Visualization&lt;br /&gt;
10. Data Science: Probability&lt;br /&gt;
11. High-Dimensional Data Analysis&lt;br /&gt;
12. Introduction to Linear Models and Matrix Algebra&lt;br /&gt;
13. Data Science: Statistics and R&lt;br /&gt;
14. Fat Chance: Probability from the Ground Up&lt;br /&gt;
15. Introduction to Probability (on edX)</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/844/guidance-on-sequencing-data-science-courses-below</guid>
<pubDate>Fri, 20 Mar 2020 13:55:49 +0000</pubDate>
</item>
<item>
<title>How to add a delimiter to an existing table in Hive to display the data in a proper table format?</title>
<link>https://ask.ghassem.com/802/how-delimiter-existing-table-hive-proper-table-format-data</link>
<description>When I use a SELECT statement to display my data in Hive, it displays the data in a scattered format. I want to add a delimiter so that the data is displayed in a proper table format.</description>
<category>Cloud Computing</category>
<guid isPermaLink="true">https://ask.ghassem.com/802/how-delimiter-existing-table-hive-proper-table-format-data</guid>
<pubDate>Sat, 08 Feb 2020 05:18:16 +0000</pubDate>
</item>
<item>
<title>Understanding symbolic language of problem, quantificational  logic</title>
<link>https://ask.ghassem.com/786/understanding-symbolic-language-problem-quantificational</link>
<description>&lt;p&gt;Hi, I am having trouble interpreting the information contained in the relation &lt;strong&gt;R&lt;/strong&gt;, and how it should be applied to the Ps in this problem:&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Consider the formula&amp;nbsp;&lt;/p&gt;

&lt;h3&gt;∃x∃y∃z(P(x,y)∧P(z,y)∧P(x,z)∧¬P(z,x))&lt;/h3&gt;

&lt;p&gt;Under each of these interpretations, is this formula true? In each case, R is the relation corresponding to P.&lt;/p&gt;

&lt;p&gt;(a) U = N,&amp;nbsp; &amp;nbsp;&lt;strong&gt;R = {&amp;lt;x,y&amp;gt; : x&amp;lt;y}.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;(b) U = N,&amp;nbsp; &amp;nbsp;&lt;strong&gt;R = {&amp;lt;x,x+1&amp;gt; : x≥0}.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Does &amp;lt;x,y&amp;gt; refer to the variables x,y or z in each P(a,b), and the :x&amp;lt;y refer to what the relation between these two should be?&lt;/p&gt;

&lt;p&gt;I tried something like this for (a) and got:&lt;/p&gt;

&lt;p&gt;∃x∃y∃z((x&amp;lt;y)∧(z&amp;lt;y)∧(x&amp;lt;z)∧¬(z&amp;lt;x))&lt;/p&gt;

&lt;p&gt;However I&#039;m not sure if this is correct, and I&#039;m not sure how I would do it for (b)&lt;/p&gt;</description>
<category>Discrete Mathematics</category>
<guid isPermaLink="true">https://ask.ghassem.com/786/understanding-symbolic-language-problem-quantificational</guid>
<pubDate>Sun, 26 Jan 2020 11:18:05 +0000</pubDate>
</item>
<item>
<title>Individual and group relative strength in a fixed pool of players: How to approach the problem?</title>
<link>https://ask.ghassem.com/751/individual-group-relative-strength-players-approach-problem</link>
<description>&lt;div&gt;I apologize in advance if my question sounds too basic to be worthy of anyone&#039;s time, but statistics are not part of my curriculum.&lt;/div&gt;

&lt;div&gt;
&lt;p&gt;I am developing a proof of concept of a web application modeling the contribution of individual soccer players with respect to the different teams they&#039;ve played with throughout their careers. In particular, I am looking for a way of &lt;em&gt;ranking&lt;/em&gt; both individuals and groups of players as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;teammates relative strength&lt;/strong&gt;: the best/worst combinations of players when playing in the same team in the same matches;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;opponents relative strength&lt;/strong&gt;: the best/worst combinations of players when playing in opposite teams in the same matches, i.e. which tuples of teammates are the best/worst against which;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I must admit I don&#039;t quite know how to approach the problem (as I said I have no formal education in statistics or data science). I would be very grateful&amp;nbsp; if anyone could give me some directions. How should I frame this particular problem and what resources in statistics or machine learning (if indeed this is a task fit for machine learning, perhaps I am mistaken on this) would be appropriate to tackle it?&lt;/p&gt;

&lt;p&gt;I am eager to learn, so both practical examples or theoretical references (book chapters, online articles, etc) would be very welcome.&lt;/p&gt;

&lt;p&gt;Thanks in advance!&lt;/p&gt;
&lt;/div&gt;</description>
<category>Statistics</category>
<guid isPermaLink="true">https://ask.ghassem.com/751/individual-group-relative-strength-players-approach-problem</guid>
<pubDate>Tue, 29 Oct 2019 20:00:28 +0000</pubDate>
</item>
<item>
<title>What are the best websites to find icons for our interfaces?</title>
<link>https://ask.ghassem.com/716/what-are-the-best-website-to-find-icons-for-our-interfaces</link>
<description>I am wondering if you can share a list of websites for designing interfaces.</description>
<category>Human Computer Interaction</category>
<guid isPermaLink="true">https://ask.ghassem.com/716/what-are-the-best-website-to-find-icons-for-our-interfaces</guid>
<pubDate>Wed, 18 Sep 2019 20:23:54 +0000</pubDate>
</item>
<item>
<title>Data manipulation problem study resources</title>
<link>https://ask.ghassem.com/715/data-manipulation-problem-study-resources</link>
<description>&lt;p&gt;A colleague of mine is studying for tech roles, and they&#039;re consistently asked to solve one type of problem during phone screenings: manipulating data (sets, hash tables/dictionaries, arrays/lists, strings). These questions aren’t necessarily difficult and tend to require very little logic; they are more about having a good understanding of the data types listed above. I&#039;ve provided some examples in this link:&amp;nbsp;&lt;a rel=&quot;nofollow&quot; href=&quot;https://imgur.com/a/ITVeVnr&quot;&gt;https://imgur.com/a/ITVeVnr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I&#039;m wondering if there are resources to study these questions. They aren&#039;t really Leetcode questions or the kind of thing found on Reddit daily programmer, which is where I&#039;m most often directed when I ask around. Even a textbook would be incredibly handy. And to be clear, I&#039;m not looking for a hack or golden secret, just resources for studying. Thank you for any help!&lt;/p&gt;</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/715/data-manipulation-problem-study-resources</guid>
<pubDate>Wed, 28 Aug 2019 12:53:53 +0000</pubDate>
</item>
<item>
<title>Is using aggregate data to generate observation-level data statistically sound?</title>
<link>https://ask.ghassem.com/644/using-aggregate-generate-observation-level-statistically</link>
<description>&lt;p&gt;Context: I work in paid search marketing. Current reporting does not provide event-level data, only aggregate totals with different segments. I want to compare distributions and test the statistical significance of A/B test results. I did not want to assume that the data follow a normal distribution, and I do not know the standard deviation of the data, so I came up with this approach.&lt;/p&gt;

&lt;p&gt;My question: I am going to take the average &quot;CPA&quot; or &quot;CTR&quot; for a date range and generate one observation per conversion, each set to the average for that range. Is this a statistically sound way to generate raw data? Would I get wonky distributions because of the multiple averages? I just want a gut check on whether I&#039;m completely off base.&lt;/p&gt;

&lt;p&gt;My Aggregate data looks like below:&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:500px&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope=&quot;col&quot;&gt;Day&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Cost&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Acquisition&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;CPA or CTR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Observation data I generate looks like below:&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:500px&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope=&quot;col&quot;&gt;Day&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;CPA per acquisition&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
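
&lt;p&gt;The expansion from the aggregate table to the observation table can be sketched in Python as below. The numbers are the two example days above; the variable and function names are hypothetical. The sketch also prints the spread within one day, which is zero by construction because every generated row for a day equals that day&#039;s average.&lt;/p&gt;

```python
from statistics import mean, pstdev

# Aggregate rows from the first table: (day, cost, acquisitions).
AGGREGATE = [(1, 40, 2), (2, 75, 3)]


def expand(rows):
    """Emit one (day, cpa) observation per acquisition, using that day's average CPA."""
    observations = []
    for day, cost, acquisitions in rows:
        cpa = cost / acquisitions
        observations.extend((day, cpa) for _ in range(acquisitions))
    return observations


observations = expand(AGGREGATE)
values = [cpa for _, cpa in observations]
day1 = [cpa for day, cpa in observations if day == 1]

print(observations)
print("overall mean CPA:", mean(values))       # equals total cost / total acquisitions
print("spread within day 1:", pstdev(day1))    # every row of a day is identical
```

&lt;p&gt;The generated rows reproduce the overall mean, but the only variation left is the variation between daily averages, so any test run on them sees less spread than true event-level data would have.&lt;/p&gt;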

&lt;p&gt;I really appreciate your help with this question! This is an important project for me at work.&lt;/p&gt;</description>
<category>general</category>
<guid isPermaLink="true">https://ask.ghassem.com/644/using-aggregate-generate-observation-level-statistically</guid>
<pubDate>Tue, 11 Jun 2019 22:04:01 +0000</pubDate>
</item>
<item>
<title>What loss function to use in CNN-SVM model</title>
<link>https://ask.ghassem.com/641/what-loss-function-to-use-in-cnn-svm-model</link>
<description>I am using Matlab R2018b and am trying to integrate an SVM classifier into a CNN. My plan is to use the CNN only as a feature extractor and the SVM as the classifier. I know people already implemented this a few years ago in TensorFlow and on other platforms. While implementing it, I got stuck at backward propagation: I am unsure which loss function I need to implement to compute the gradients and update the parameters.&lt;br /&gt;
&lt;br /&gt;
A few points came up during this:&lt;br /&gt;
&lt;br /&gt;
1. I think hinge loss is the right choice here, but which form of hinge loss should I implement? Should I use the second form of hinge loss for calculating the loss during backward propagation?&lt;br /&gt;
&lt;br /&gt;
2. Besides calculating the backward loss, should I also calculate the forward loss to find out the loss incurred by the model?&lt;br /&gt;
&lt;br /&gt;
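For reference, two common hinge-loss variants can be sketched in plain Python (not Matlab) as below. The sketch assumes binary labels in {-1, +1} and raw classifier scores, and it assumes that the second form mentioned in point 1 means the squared (L2) hinge loss, which is a guess; all function names are hypothetical.&lt;br /&gt;

```python
def hinge_loss(scores, labels):
    """Forward pass: mean of max(0, 1 - y * s) over the batch."""
    return sum(max(0.0, 1.0 - y * s) for s, y in zip(scores, labels)) / len(scores)


def squared_hinge_loss(scores, labels):
    """Forward pass of the squared (L2) variant: mean of max(0, 1 - y * s) ** 2."""
    return sum(max(0.0, 1.0 - y * s) ** 2 for s, y in zip(scores, labels)) / len(scores)


def hinge_subgradient(scores, labels):
    """Backward pass: d(loss)/d(score) is -y / n inside the margin and 0 elsewhere."""
    n = len(scores)
    return [-y / n if 1.0 > y * s else 0.0 for s, y in zip(scores, labels)]


# Made-up scores from the last layer and their true labels.
scores = [2.0, 0.5, -0.5, 0.0]
labels = [1, 1, 1, -1]

print(hinge_loss(scores, labels))
print(squared_hinge_loss(scores, labels))
print(hinge_subgradient(scores, labels))
```

The forward pass evaluates the loss value for monitoring; the backward pass uses its (sub)gradient with respect to the scores, which is then propagated back through the CNN layers.&lt;br /&gt;
&lt;br /&gt;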
Any advice on this CNN-SVM integration would be appreciated, as I am unable to find any such implementation in Matlab to refer to.&lt;br /&gt;
&lt;br /&gt;
Thanks.</description>
<category>Deep Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/641/what-loss-function-to-use-in-cnn-svm-model</guid>
<pubDate>Sat, 08 Jun 2019 09:24:21 +0000</pubDate>
</item>
</channel>
</rss>