<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent activity in Machine Learning</title>
<link>https://ask.ghassem.com/activity/machine-learning</link>
<description>Powered by Question2Answer</description>
<item>
<title>Answered: Step-by-Step Hidden State Calculation in a Recurrent Neural Network</title>
<link>https://ask.ghassem.com/1049/step-step-hidden-state-calculation-recurrent-neural-network?show=1050#a1050</link>
<description>&lt;p&gt;We compute each hidden state step-by-step using&lt;/p&gt;

&lt;p&gt;$$ h_t = \text{ReLU}(W_{ih} \cdot x_t + W_{hh} \cdot h_{t-1}). $$&lt;/p&gt;

&lt;p&gt;\( h_1 = \text{ReLU}(0.4 \cdot 3 + 0.6 \cdot 0) = 1.2 \)&lt;/p&gt;

&lt;p&gt;\( h_2 = \text{ReLU}(0.4 \cdot 3 + 0.6 \cdot 1.2) = 1.92 \)&lt;/p&gt;

&lt;p&gt;\( h_3 = \text{ReLU}(0.4 \cdot 3 + 0.6 \cdot 1.92) = 2.352 \)&lt;/p&gt;

&lt;p&gt;\( h_4 = \text{ReLU}(0.4 \cdot 3 + 0.6 \cdot 2.352) = 2.6112 \)&lt;/p&gt;
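
&lt;p&gt;The recursion can be double-checked with a few lines of Python (a sketch; the scalar weights W_ih = 0.4, W_hh = 0.6 and the constant input x_t = 3 are taken from the numbers above):&lt;/p&gt;

```python
# Verify the hand calculation of h_t = ReLU(W_ih * x_t + W_hh * h_{t-1})
W_ih, W_hh = 0.4, 0.6
xs = [3, 3, 3, 3]      # x_1 .. x_4, all equal to 3
h = 0.0                # initial hidden state h_0 = 0
history = []
for x in xs:
    h = max(0.0, W_ih * x + W_hh * h)   # ReLU is max(0, .) for a scalar
    history.append(round(h, 4))
print(history)         # [1.2, 1.92, 2.352, 2.6112]
```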

&lt;p&gt;&lt;strong&gt;Final Answer: \( h_4 = 2.6112 \)&lt;/strong&gt;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1049/step-step-hidden-state-calculation-recurrent-neural-network?show=1050#a1050</guid>
<pubDate>Mon, 01 Dec 2025 18:33:19 +0000</pubDate>
</item>
<item>
<title>Answered: How to calculate feed-forward (forward-propagation) in neural network for classification?</title>
<link>https://ask.ghassem.com/1047/calculate-forward-forward-propagation-network-classification?show=1048#a1048</link>
<description>&lt;p&gt;The answer is provided below. Please comment if you have questions or find mistakes.&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://i.imgur.com/Qchg1sl.jpeg&quot;&gt;https://i.imgur.com/Qchg1sl.jpeg&lt;/a&gt;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1047/calculate-forward-forward-propagation-network-classification?show=1048#a1048</guid>
<pubDate>Wed, 09 Oct 2024 13:00:11 +0000</pubDate>
</item>
<item>
<title>Commented: How to update weights using gradient descent algorithm?</title>
<link>https://ask.ghassem.com/596/how-to-update-weights-using-gradient-decent-algorithm?show=1046#c1046</link>
<description>Isn&amp;#039;t the derivative in 3) wrong? I got 2w-5 instead of 4w-10. I think the person in the solution forgot to incorporate the 1/2.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/596/how-to-update-weights-using-gradient-decent-algorithm?show=1046#c1046</guid>
<pubDate>Wed, 17 Apr 2024 22:58:56 +0000</pubDate>
</item>
<item>
<title>Commented: How to update weights in backpropagation algorithm (a numerical example)?</title>
<link>https://ask.ghassem.com/612/update-weights-backpropagation-algorithm-numerical-example?show=1045#c1045</link>
<description>It does make a difference, because after taking the derivative it should be (target - output) but in the solution it&amp;#039;s (output - target)</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/612/update-weights-backpropagation-algorithm-numerical-example?show=1045#c1045</guid>
<pubDate>Fri, 05 Apr 2024 21:48:20 +0000</pubDate>
</item>
<item>
<title>When to use one hot encode a category and when to segment by category?</title>
<link>https://ask.ghassem.com/1034/when-to-use-one-hot-encode-category-and-when-segment-category</link>
<description>When preprocessing data for machine learning, is there any difference between using one-hot encoding to turn categorical variables into numeric variables and segmenting the data (and the model being used) along the category? Say you run a multivariate regression model on data covering 5 cities. Would a single model with one variable for each city be better or worse than 5 models, one specific to each city? Or is there no difference? Or does it depend on certain factors and intuition?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1034/when-to-use-one-hot-encode-category-and-when-segment-category</guid>
<pubDate>Wed, 22 Feb 2023 20:30:38 +0000</pubDate>
</item>
<item>
<title>Answered: How to calculate the residual errors, (MSE),(MAE), and (RMSE)?</title>
<link>https://ask.ghassem.com/1031/how-to-calculate-the-residual-errors-mse-mae-and-rmse?show=1032#a1032</link>
<description>&lt;p&gt;1. First, we need to calculate the residual errors. Residual errors are the difference between the actual values and predicted values.&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:500px&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;Sample&lt;/th&gt;
&lt;th&gt;Feature 1&lt;/th&gt;
&lt;th&gt;Feature 2&lt;/th&gt;
&lt;th&gt;Actual Value&lt;/th&gt;
&lt;th&gt;Predicted Value&lt;/th&gt;
&lt;th&gt;Residual Error (Actual - Predicted)&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;Next, we can calculate the MSE by taking the average of the squared residual errors.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;$MSE = ((-2)^2 + (-1)^2 + (-1)^2 + (-1)^2 + (-1)^2) / 5 = 8 / 5 = 1.6$&lt;/p&gt;

&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;To calculate the MAE, we take the average of the absolute residual errors.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;$MAE = (|-2| + |-1| + |-1| + |-1| + |-1|) / 5 = 6 / 5 = 1.2$&lt;/p&gt;

&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Finally, to calculate the RMSE, we take the square root of the MSE.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;$RMSE = \sqrt{1.6} \approx 1.26$&lt;/p&gt;
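
&lt;p&gt;A short script reproduces all three metrics from the table (a sketch in plain Python, no libraries needed):&lt;/p&gt;

```python
# Recompute residuals, MSE, MAE and RMSE for the five samples in the table
actual    = [4, 5, 6, 7, 8]
predicted = [6, 6, 7, 8, 9]

residuals = [a - p for a, p in zip(actual, predicted)]   # [-2, -1, -1, -1, -1]
n = len(residuals)
mse  = sum(r ** 2 for r in residuals) / n   # (4 + 1 + 1 + 1 + 1) / 5 = 1.6
mae  = sum(abs(r) for r in residuals) / n   # (2 + 1 + 1 + 1 + 1) / 5 = 1.2
rmse = mse ** 0.5                           # sqrt(1.6), about 1.26
```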

&lt;p&gt;Therefore, the residual errors are [-2, -1, -1, -1, -1], the MSE is 1.6, the MAE is 1.2, and the RMSE is approximately 1.26.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1031/how-to-calculate-the-residual-errors-mse-mae-and-rmse?show=1032#a1032</guid>
<pubDate>Fri, 27 Jan 2023 04:16:33 +0000</pubDate>
</item>
<item>
<title>Creating tables from unstructured texts about stock market</title>
<link>https://ask.ghassem.com/1026/creating-tables-from-unstructured-texts-about-stock-market</link>
<description>&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;p&gt;I am trying to extract information such as profits, revenues and others, along with their corresponding dates and quarters, from unstructured text about the stock market, and convert it into a report in table form. But since the input text has no fixed format, it is hard to know which entity belongs to which date and quarter, and which value belongs to which entity. Chunking works on a few documents but is not enough. Is there any unsupervised way of linking entities with their corresponding dates, values and quarters?&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1026/creating-tables-from-unstructured-texts-about-stock-market</guid>
<pubDate>Tue, 02 Aug 2022 00:47:49 +0000</pubDate>
</item>
<item>
<title>Kmeans clustering in python - Giving original labels to predicted clusters</title>
<link>https://ask.ghassem.com/1022/kmeans-clustering-python-giving-original-predicted-clusters</link>
<description>&lt;p&gt;I have a dataset with 7 labels in the target variable.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
X = data.drop(&#039;target&#039;, axis=1)
Y = data[&#039;target&#039;]
Y.unique()&lt;/pre&gt;

&lt;p&gt;array([&#039;Normal_Weight&#039;, &#039;Overweight_Level_I&#039;, &#039;Overweight_Level_II&#039;,&lt;br&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&#039;Obesity_Type_I&#039;, &#039;Insufficient_Weight&#039;, &#039;Obesity_Type_II&#039;,&lt;br&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&#039;Obesity_Type_III&#039;], dtype=object)&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
km = KMeans(n_clusters=7, init=&quot;k-means++&quot;, random_state=300)
km.fit_predict(X)
np.unique(km.labels_)&lt;/pre&gt;

&lt;p&gt;array([0, 1, 2, 3, 4, 5, 6])&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;After performing the KMeans clustering algorithm with the number of clusters set to 7, the resulting clusters are labeled 0, 1, 2, 3, 4, 5, 6. But how do I know which real label matches which predicted label?&lt;/p&gt;
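
&lt;p&gt;One common approach (a sketch, with hypothetical toy arrays standing in for Y and km.labels_): map each cluster id to the most frequent true label among its members, then measure accuracy against the true labels.&lt;/p&gt;

```python
import numpy as np

# Toy stand-ins for the real Y (true labels) and km.labels_ (cluster ids)
y_true   = np.array(["Normal_Weight", "Normal_Weight", "Obesity_Type_I",
                     "Obesity_Type_I", "Insufficient_Weight", "Insufficient_Weight"])
clusters = np.array([2, 2, 0, 0, 1, 1])

# Majority vote: each cluster id gets the most frequent true label inside it
mapping = {}
for c in np.unique(clusters):
    labels, counts = np.unique(y_true[clusters == c], return_counts=True)
    mapping[int(c)] = labels[np.argmax(counts)]

predicted = np.array([mapping[int(c)] for c in clusters])
accuracy = float(np.mean(predicted == y_true))   # 1.0 for this toy example
```

Note that a majority vote can assign the same label to several clusters; for a strict one-to-one matching, the Hungarian algorithm (scipy.optimize.linear_sum_assignment on the confusion matrix) is the usual tool.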

&lt;p&gt;In other words, I want to know how to give the original label names to the newly predicted labels, so that they can be compared, e.g. how many values are clustered correctly (accuracy).&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1022/kmeans-clustering-python-giving-original-predicted-clusters</guid>
<pubDate>Wed, 27 Apr 2022 05:32:54 +0000</pubDate>
</item>
<item>
<title>Bankruptcy prediction and credit card</title>
<link>https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</link>
<description>Hello everyone, newbie data scientist here.&lt;br /&gt;
I&amp;#039;m working on a project to predict companies&amp;#039; bankruptcy probability (probability of default) and to assign them a credit rating/score based on that:&lt;br /&gt;
For example, below 50% probability is good and above is bad (just for the example).&lt;br /&gt;
I have a dataset containing financial ratios and a class indicating whether the company is bankrupt or not (0 and 1).&lt;br /&gt;
I&amp;#039;m planning to use these models:&lt;br /&gt;
Logistic regression, linear discriminant analysis, decision trees, random forest, ANN, AdaBoost, SVM.&lt;br /&gt;
&lt;br /&gt;
The question is (and I know it is a dumb question):&lt;br /&gt;
Do those models return a probability which I can transform into labels? I saw that in a thesis and I&amp;#039;m not sure about it.&lt;br /&gt;
&lt;br /&gt;
Otherwise, any guidance or tips would be appreciated.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</guid>
<pubDate>Sun, 10 Apr 2022 05:50:14 +0000</pubDate>
</item>
<item>
<title>Answered: When dealing with categorical values, should the &#039;year&#039; column be encoded using OHE or OrdinalEncoder?</title>
<link>https://ask.ghassem.com/1012/dealing-categorical-values-should-encoded-ordinalencoder?show=1013#a1013</link>
<description>You should ask yourself whether the order of the years has an effect on predicting the price. It seems it does, so OrdinalEncoder seems the better choice. If you use OneHotEncoder, you treat the years as unordered categories with equal weight in predicting the price.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1012/dealing-categorical-values-should-encoded-ordinalencoder?show=1013#a1013</guid>
<pubDate>Mon, 20 Dec 2021 18:10:13 +0000</pubDate>
</item>
<item>
<title>Answer selected: How to create a Decision Tree using the ID3 algorithm?</title>
<link>https://ask.ghassem.com/1008/how-to-create-a-decision-tree-using-the-id3-algorithm?show=1009#a1009</link>
<description>&lt;p&gt;&lt;strong&gt;a)&lt;/strong&gt; See the following figure for the ID3 decision tree:&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://i.imgur.com/kizNjoc.png&quot;&gt;https://i.imgur.com/kizNjoc.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;b)&lt;/strong&gt; Only the disjunction of conjunctions for Martians was required:&lt;/p&gt;

&lt;p&gt;$\begin{aligned}&lt;br&gt;
&amp;amp;(\text { Legs }=3) \vee \\&lt;br&gt;
&amp;amp;(\text { Legs }=2 \wedge \text { Green }=\text { Yes } \wedge \text { Height }=\text { Tall }) \vee \\&lt;br&gt;
&amp;amp;(\text { Legs }=2 \wedge \text { Green }=\text { No } \wedge \text { Height }=\text { Short } \wedge \text { Smelly }=\text { Yes })&lt;br&gt;
\end{aligned}$&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://github.com/tofighi/MachineLearning/blob/master/Decision_Tree_Example.ipynb&quot;&gt;Python Code&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Step 1: Organize the Dataset&lt;/h2&gt;

&lt;p&gt;Our data has the following features and values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Species&lt;/strong&gt;: Target variable (M or H)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Features&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Green&lt;/strong&gt;: \( N \) or \( Y \)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Legs&lt;/strong&gt;: \( 2 \) or \( 3 \)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Height&lt;/strong&gt;: \( S \) or \( T \)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Smelly&lt;/strong&gt;: \( N \) or \( Y \)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;table border=&quot;1&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;Index&lt;/th&gt;
&lt;th&gt;Species&lt;/th&gt;
&lt;th&gt;Green&lt;/th&gt;
&lt;th&gt;Legs&lt;/th&gt;
&lt;th&gt;Height&lt;/th&gt;
&lt;th&gt;Smelly&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;M&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;M&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;M&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;M&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;M&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h2&gt;Step 2: Calculate the Initial Entropy for the Target Variable (Species)&lt;/h2&gt;

&lt;p&gt;We start by calculating the entropy of the target variable, &lt;strong&gt;Species&lt;/strong&gt;, which has two classes: &lt;strong&gt;M&lt;/strong&gt; (Martian) and &lt;strong&gt;H&lt;/strong&gt; (Human).&lt;/p&gt;

&lt;h3&gt;Total Counts&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Martians (M): 5&lt;/li&gt;
&lt;li&gt;Humans (H): 5&lt;/li&gt;
&lt;li&gt;Total: 10&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Entropy Formula&lt;/h3&gt;

&lt;p&gt;The entropy \( E \) for a binary classification is calculated as:&lt;/p&gt;

&lt;p&gt;$$ E = -p_+ \log_2(p_+) - p_- \log_2(p_-) $$&lt;/p&gt;

&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;\( p_+ \): Probability of positive class (M)&lt;/li&gt;
&lt;li&gt;\( p_- \): Probability of negative class (H)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Calculation&lt;/h3&gt;

&lt;p&gt;$$ p(M) = \frac{5}{10} = 0.5 $$&lt;/p&gt;

&lt;p&gt;$$ p(H) = \frac{5}{10} = 0.5 $$&lt;/p&gt;

&lt;p&gt;$$ E(Species) = -0.5 \cdot \log_2(0.5) - 0.5 \cdot \log_2(0.5) $$&lt;/p&gt;

&lt;p&gt;$$ = -0.5 \cdot (-1) - 0.5 \cdot (-1) $$&lt;/p&gt;

&lt;p&gt;$$ = 1.0 $$&lt;/p&gt;

&lt;h2&gt;Step 3: Calculate Entropy and Information Gain for Each Feature&lt;/h2&gt;

&lt;p&gt;We’ll calculate the entropy for each feature split and determine the information gain.&lt;/p&gt;

&lt;h3&gt;Feature: Green&lt;/h3&gt;

&lt;p&gt;Green can be either &lt;strong&gt;Y&lt;/strong&gt; or &lt;strong&gt;N&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For Green = Y:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Martians (M): 3&lt;/li&gt;
&lt;li&gt;Humans (H): 1&lt;/li&gt;
&lt;li&gt;Total: 4&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Entropy:&lt;/p&gt;

&lt;p&gt;$$ E(Green = Y) = -\left(\frac{3}{4}\right) \log_2\left(\frac{3}{4}\right) - \left(\frac{1}{4}\right) \log_2\left(\frac{1}{4}\right) $$&lt;/p&gt;

&lt;p&gt;$$ = -0.75 \cdot \log_2(0.75) - 0.25 \cdot \log_2(0.25) $$&lt;/p&gt;

&lt;p&gt;$$ = -0.75 \cdot (-0.415) - 0.25 \cdot (-2) $$&lt;/p&gt;

&lt;p&gt;$$ = 0.311 + 0.5 = 0.811 $$&lt;/p&gt;

&lt;p&gt;For Green = N:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Martians (M): 2&lt;/li&gt;
&lt;li&gt;Humans (H): 4&lt;/li&gt;
&lt;li&gt;Total: 6&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Entropy:&lt;/p&gt;

&lt;p&gt;$$ E(Green = N) = -\left(\frac{2}{6}\right) \log_2\left(\frac{2}{6}\right) - \left(\frac{4}{6}\right) \log_2\left(\frac{4}{6}\right) $$&lt;/p&gt;

&lt;p&gt;$$ = -0.333 \cdot \log_2(0.333) - 0.667 \cdot \log_2(0.667) $$&lt;/p&gt;

&lt;p&gt;$$ = -0.333 \cdot (-1.585) - 0.667 \cdot (-0.585) $$&lt;/p&gt;

&lt;p&gt;$$ = 0.528 + 0.389 = 0.917 $$&lt;/p&gt;

&lt;h3&gt;Weighted Entropy for Green&lt;/h3&gt;

&lt;p&gt;$$ E(Green) = \frac{4}{10} \cdot 0.811 + \frac{6}{10} \cdot 0.917 $$&lt;/p&gt;

&lt;p&gt;$$ = 0.3244 + 0.5502 = 0.8746 $$&lt;/p&gt;

&lt;h3&gt;Information Gain for Green&lt;/h3&gt;

&lt;p&gt;$$ IG(Species, Green) = E(Species) - E(Green) $$&lt;/p&gt;

&lt;p&gt;$$ = 1.0 - 0.8746 = 0.1254 $$&lt;/p&gt;
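
&lt;p&gt;These numbers can be reproduced with a short script (a sketch; with exact arithmetic the gain is about 0.1245, and the 0.1254 above comes from the rounded intermediate entropies):&lt;/p&gt;

```python
from math import log2

def entropy(counts):
    """Shannon entropy of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

e_species = entropy([5, 5])    # 5 M and 5 H: entropy 1.0
e_green_y = entropy([3, 1])    # Green = Y: 3 M, 1 H, entropy about 0.811
e_green_n = entropy([2, 4])    # Green = N: 2 M, 4 H, entropy about 0.918
e_green   = 4 / 10 * e_green_y + 6 / 10 * e_green_n   # weighted entropy
ig_green  = e_species - e_green                       # information gain, about 0.1245
```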

&lt;p&gt;Continue this process to calculate the entropy and information gain for each feature (Legs, Height, and Smelly) similarly.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1008/how-to-create-a-decision-tree-using-the-id3-algorithm?show=1009#a1009</guid>
<pubDate>Wed, 01 Dec 2021 11:56:37 +0000</pubDate>
</item>
<item>
<title>Answer selected: How to calculate LogLoss in logistic regression?</title>
<link>https://ask.ghassem.com/588/how-to-calculate-logloss-in-logistic-regression?show=874#a874</link>
<description>&lt;p&gt;Answer#2: Total Loss of the model&lt;/p&gt;

&lt;p&gt;First, we have to find the probability of each student passing the course.&lt;/p&gt;

&lt;p&gt;Let i represent the sampling index of the student.&lt;/p&gt;

&lt;p&gt;P1:&lt;/p&gt;

&lt;p&gt;Z=-64+(2*29)=-6&lt;/p&gt;

&lt;p&gt;P=1/(1+e^6)=0.0024&lt;/p&gt;

&lt;p&gt;P2:&lt;/p&gt;

&lt;p&gt;Z=-64+(2*15)=-34&lt;/p&gt;

&lt;p&gt;P=1/(1+e^34)=0 (the value is vanishingly small)&lt;/p&gt;

&lt;p&gt;P3: already known = 0.88&lt;/p&gt;

&lt;p&gt;P4:&lt;/p&gt;

&lt;p&gt;Z=-64+(2*28)=-8&lt;/p&gt;

&lt;p&gt;P=1/(1+e^8)=0.00033&lt;/p&gt;

&lt;p&gt;P5:&lt;/p&gt;

&lt;p&gt;Z=-64+(2*39)=14&lt;/p&gt;

&lt;p&gt;P=1/(1+e^-14)=0.999&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;The total loss of the model is calculated below, using the formula&lt;/p&gt;

&lt;p&gt;Log-loss_i = -(y_i*ln(P_i)+(1-y_i)*ln(1-P_i))&lt;/p&gt;

&lt;p&gt;Log-loss 1 = 2.4E-3 (y_1 = 0)&lt;/p&gt;

&lt;p&gt;Log-loss 2 = 0 (y_2 = 0)&lt;/p&gt;

&lt;p&gt;Log-loss 3 = 0.128 (y_3 = 1)&lt;/p&gt;

&lt;p&gt;Log-loss 4 = 8.0164 (y_4 = 1)&lt;/p&gt;

&lt;p&gt;Log-loss 5 = 0.001 (y_5 = 1)&lt;/p&gt;

&lt;p&gt;Note that each log-loss term is nonnegative. Total loss of the model = (1/5)(2.4E-3 + 0 + 0.128 + 8.0164 + 0.001) = 1.6296&lt;/p&gt;
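
&lt;p&gt;The computation above can be reproduced in a few lines (a sketch; the study hours and the labels y = [0, 0, 1, 1, 1] are read off the worked numbers rather than stated in the answer, and exact arithmetic gives about 1.626 versus the 1.6296 obtained with rounded probabilities):&lt;/p&gt;

```python
from math import exp, log

hours = [29, 15, 33, 28, 39]   # assumed study hours behind P1..P5
y     = [0, 0, 1, 1, 1]        # assumed pass/fail labels implied by the losses above

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# P_i = sigmoid(z_i) with z_i = -64 + 2 * hours_i
probs = [sigmoid(-64 + 2 * h) for h in hours]

eps = 1e-12   # clip probabilities away from 0 and 1 to keep log() finite
losses = [-(yi * log(max(p, eps)) + (1 - yi) * log(max(1 - p, eps)))
          for yi, p in zip(y, probs)]
avg_log_loss = sum(losses) / len(losses)   # about 1.63
```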

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Answer 1: &lt;strong&gt;the loss of the model&lt;/strong&gt;&amp;nbsp;for the student who studied 33 hours&lt;/p&gt;

&lt;p&gt;Step 1: we have to find the probability of passing the course&lt;/p&gt;

&lt;p&gt;P=1/(1+e^-z)&lt;/p&gt;

&lt;p&gt;where z = log-odds = -64+(2*33) = 2&lt;/p&gt;

&lt;p&gt;after putting in the values... P=1/(1+e^-2)=0.88&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Now, let&#039;s calculate the log-loss of the model for that particular student, sample number 3, where &quot;i&quot; is the sampling index&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Log-loss = -(y_i*ln(P_i)+(1-y_i)*ln(1-P_i))&lt;/p&gt;

&lt;p&gt;Log-loss = -[1*ln(0.88)+(1-1)*ln(1-0.88)] = -ln(0.88)&lt;/p&gt;

&lt;p&gt;Answer#1: Log-loss = 0.128, the loss of the model for that student&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/588/how-to-calculate-logloss-in-logistic-regression?show=874#a874</guid>
<pubDate>Sun, 17 Oct 2021 16:48:29 +0000</pubDate>
</item>
<item>
<title>Which algorithm is best to detect anomalies within a data set of 5k+ user-login events?</title>
<link>https://ask.ghassem.com/1000/which-algorithm-best-detect-anomalies-within-login-events</link>
<description>I am trying to build an unsupervised ML model to detect anomalies within 5000+ users&amp;#039; login data. I selected 5 features contained in each of the user-login events (e.g. IP, hour of day, day of week, device_id, OS). I am looking for the best algorithm to use. I am considering using a density function to determine probabilities of the feature values and whether an event is an outlier. The problem is that feature values are only relevant to the specific user. For example, you cannot compare login IPs across users; a login IP is only applicable to its user.&lt;br /&gt;
Ultimately, I want to detect events that are changes in a user login behavior, like different IP, day, hour, device_id, or OS, where the more features that have changed increase the probability of an outlier. &lt;br /&gt;
At this point, I am not sure how to build a model with data that contains multiple users, because I don&amp;#039;t know how to separate the user data so that the model is trained per user and finds anomalies within the individual user&amp;#039;s features.&lt;br /&gt;
&lt;br /&gt;
I also don&amp;#039;t have any labeled data to use for testing, should I fabricate some?&lt;br /&gt;
&lt;br /&gt;
Any advice greatly appreciated.&lt;br /&gt;
&lt;br /&gt;
Thank you!</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1000/which-algorithm-best-detect-anomalies-within-login-events</guid>
<pubDate>Tue, 05 Oct 2021 17:45:38 +0000</pubDate>
</item>
<item>
<title>How can we incorporate polylines in machine learning tools?</title>
<link>https://ask.ghassem.com/999/how-we-incorporate-the-polyline-in-machine-learnning-tools</link>
<description>Suppose I have to predict the traffic of a road segment based on available data such as the number of houses and businesses along the road segment. Which machine learning tool would be a suitable option that can incorporate the road segment (polylines) through coordinates in the attributes?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/999/how-we-incorporate-the-polyline-in-machine-learnning-tools</guid>
<pubDate>Wed, 29 Sep 2021 06:16:30 +0000</pubDate>
</item>
<item>
<title>Commented: How to calculate Softmax Regression probabilities?</title>
<link>https://ask.ghassem.com/591/how-to-calculate-softmax-regression-probabilities?show=998#c998</link>
<description>The question doesn&amp;#039;t make any mention of a bias, so we just assume it is 1?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/591/how-to-calculate-softmax-regression-probabilities?show=998#c998</guid>
<pubDate>Mon, 05 Jul 2021 18:14:35 +0000</pubDate>
</item>
<item>
<title>Commented: How to update the weights in backpropagation algorithm when activation function in not linear?</title>
<link>https://ask.ghassem.com/901/update-weights-backpropagation-algorithm-activation-function?show=996#c996</link>
<description>While the question says the activation function is for the hidden layer, the solution applies the same activation function to the output layer as well. Do we also need to apply the activation function to the output layer?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/901/update-weights-backpropagation-algorithm-activation-function?show=996#c996</guid>
<pubDate>Sat, 03 Jul 2021 14:44:08 +0000</pubDate>
</item>
<item>
<title>Commented: What are the main branches of Machine Learning?</title>
<link>https://ask.ghassem.com/13/what-are-the-main-branches-of-machine-learning?show=995#c995</link>
<description>Hi, can I use this picture as reference in my master thesis? Thank you</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/13/what-are-the-main-branches-of-machine-learning?show=995#c995</guid>
<pubDate>Mon, 28 Jun 2021 11:17:30 +0000</pubDate>
</item>
<item>
<title>Answered: Do I need to save the standardization transformation?</title>
<link>https://ask.ghassem.com/970/do-i-need-to-save-the-standardization-transformation?show=971#a971</link>
<description>&lt;p&gt;The details of the standardization are stored in the transformer object. For example,&amp;nbsp;&lt;strong&gt;X_scaled&lt;/strong&gt;&amp;nbsp;on this &lt;a rel=&quot;nofollow&quot; href=&quot;https://scikit-learn.org/stable/modules/preprocessing.html&quot;&gt;page&lt;/a&gt; comes from a scaler that stores the mean and SD of the original vectors in the training data, and you can use it to scale new vectors.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/970/do-i-need-to-save-the-standardization-transformation?show=971#a971</guid>
<pubDate>Tue, 15 Dec 2020 14:40:42 +0000</pubDate>
</item>
<item>
<title>Why should I use Dynamic Time Warping over GMM for time series clustering?</title>
<link>https://ask.ghassem.com/962/why-should-dynamic-time-warping-over-timer-series-clustering</link>
<description></description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/962/why-should-dynamic-time-warping-over-timer-series-clustering</guid>
<pubDate>Fri, 04 Dec 2020 03:19:16 +0000</pubDate>
</item>
<item>
<title>Answered: How to predict from unseen data?</title>
<link>https://ask.ghassem.com/954/how-to-predict-from-unseen-data?show=960#a960</link>
<description>&lt;p&gt;My recommendation:&lt;/p&gt;

&lt;p&gt;Speak to, or think like, a football fan (obviously I am not that type of person :)&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Try to find out what can help us predict the &quot;next&quot; game&#039;s result, as an expert would. Collect that data to feed into your model (and/or any relevant data available).&lt;/p&gt;

&lt;p&gt;For example, all the matches played between Arsenal and Chelsea so far might have value in your model. Also, the last games each team played might have an effect on the next match&#039;s result.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;As you stated in your question, go with pre-match variables to &quot;predict&quot; the next game&#039;s score.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Another model could be:&lt;/p&gt;

&lt;p&gt;You can take the features (data) in your question for the first &lt;em&gt;t&lt;/em&gt; minutes of the match and try to predict the result. Let&#039;s say, use the data from the first half of the match to predict the second half&#039;s result.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;On the other hand, what you are doing at the moment can be helpful if you are looking for some exploratory analysis, for example which feature(s) have more impact on winning a game.&lt;/p&gt;

&lt;p&gt;Hope this helps; looking forward to seeing other answers and your analysis results.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/954/how-to-predict-from-unseen-data?show=960#a960</guid>
<pubDate>Fri, 20 Nov 2020 10:01:23 +0000</pubDate>
</item>
<item>
<title>Answer selected: How to model unknown yet data</title>
<link>https://ask.ghassem.com/943/how-to-model-unknown-yet-data?show=944#a944</link>
<description>Your answer is actually based on what we always do in machine learning: we collect datasets, split them into a training and a testing set, train using the training set, and evaluate performance on the testing set.&lt;br /&gt;
&lt;br /&gt;
Assume you have 100 matches with all the statistics and parameters you want to use in training (such as ball possession, number of shots, corners, etc). You can take 80 of these matches for training and the remaining 20 matches for evaluating the model you created from 80% of the data, simply because you already know those &amp;quot;future&amp;quot; statistics and outcomes, so you can compare them with the output of your model to check its performance.&lt;br /&gt;
&lt;br /&gt;
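A rough sketch of that 80/20 split (hypothetical data; real match statistics would replace the placeholder dicts):&lt;br /&gt;

```python
import random

# Hypothetical: 100 matches, each with some statistics and a known outcome
matches = [{"id": i, "possession": 50, "shots": 10, "outcome": i % 3}
           for i in range(100)]

random.seed(0)          # reproducible shuffle
random.shuffle(matches)
train, test = matches[:80], matches[80:]   # 80 matches to train, 20 to evaluate
```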
I hope this answers your question.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/943/how-to-model-unknown-yet-data?show=944#a944</guid>
<pubDate>Tue, 27 Oct 2020 15:04:19 +0000</pubDate>
</item>
<item>
<title>Answered: From microarray data, which tools of pattern recognition can you apply to identify the genes responsible for diseases?</title>
<link>https://ask.ghassem.com/936/microarray-pattern-recognition-identify-responsible-diseases?show=937#a937</link>
<description>&lt;p&gt;I am not sure if this is the best tool, but there is a company acquired by Nvidia that gives you access to a GPU cloud for applications such as the one you mentioned:&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.nvidia.com/en-us/healthcare/clara-parabricks/&quot;&gt;https://www.nvidia.com/en-us/healthcare/clara-parabricks/&lt;/a&gt;&lt;/p&gt;


</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/936/microarray-pattern-recognition-identify-responsible-diseases?show=937#a937</guid>
<pubDate>Thu, 15 Oct 2020 20:26:53 +0000</pubDate>
</item>
<item>
<title>Answered: Can we use a trained model to supervise the other machine learning models?</title>
<link>https://ask.ghassem.com/930/can-trained-model-supervise-other-machine-learning-models?show=931#a931</link>
<description>&lt;p&gt;If your goal is training a machine learning model using other machine learning models, it is called&amp;nbsp;&lt;strong&gt;meta-learning&lt;/strong&gt;.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;You can find more information in &lt;a rel=&quot;nofollow&quot; href=&quot;https://en.wikipedia.org/wiki/Meta_learning_(computer_science)&quot;&gt;this article&lt;/a&gt;:&amp;nbsp;&lt;strong&gt;&quot;Meta-learning&amp;nbsp;&lt;/strong&gt;is a subfield of&amp;nbsp;machine learning&amp;nbsp;where automatic learning algorithms are applied to&amp;nbsp;metadata&amp;nbsp;about machine learning experiments. As of 2017, the term had not found a standard interpretation, however, the main goal is to use such metadata to understand how automatic learning can become flexible in solving learning problems, hence to improve the performance of existing&amp;nbsp;learning algorithms&amp;nbsp;or to learn (induce) the learning algorithm itself, hence the alternative term&amp;nbsp;&lt;strong&gt;learning to learn.&quot;&lt;/strong&gt;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/930/can-trained-model-supervise-other-machine-learning-models?show=931#a931</guid>
<pubDate>Mon, 28 Sep 2020 14:47:39 +0000</pubDate>
</item>
<item>
<title>Where can I find illustrative real life machine learning examples (In business,  work. etc.)?</title>
<link>https://ask.ghassem.com/924/where-find-illustrative-machine-learning-examples-business</link>
<description>Is there a website for finding illustrative real-life examples of using machine learning? For instance: for End-to-End Machine Learning, Classification, Clustering, and Unsupervised Learning.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/924/where-find-illustrative-machine-learning-examples-business</guid>
<pubDate>Tue, 22 Sep 2020 00:47:09 +0000</pubDate>
</item>
<item>
<title>Where can I find simple machine learning mathematics explained visually?</title>
<link>https://ask.ghassem.com/923/where-simple-machine-learning-mathematics-explained-visually</link>
<description>Could you please let me know where I can find simple machine learning mathematics explained visually?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/923/where-simple-machine-learning-mathematics-explained-visually</guid>
<pubDate>Mon, 21 Sep 2020 23:55:12 +0000</pubDate>
</item>
<item>
<title>Commented: How to calculate the class probabilities and classify using Naive Bayes classifier?</title>
<link>https://ask.ghassem.com/899/calculate-class-probabilities-classify-using-classifier?show=913#c913</link>
<description>We can apply Laplace smoothing, but it still will not affect the result. It would definitely matter if we had one more feature that makes the Banana probability zero, e.g. the colour RED; in that case we have no choice but to apply Laplace smoothing.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/899/calculate-class-probabilities-classify-using-classifier?show=913#c913</guid>
<pubDate>Thu, 13 Aug 2020 06:27:31 +0000</pubDate>
</item>
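The comment above can be illustrated with a short sketch. The counts below are hypothetical (not from the original question): with add-one (Laplace) smoothing, a feature value never observed in a class, such as a RED Banana, still gets a small non-zero likelihood instead of zeroing out the whole product of probabilities.

```python
def smoothed_likelihood(count, class_total, n_values, alpha=1):
    """P(feature = value | class) with Laplace (add-alpha) smoothing.

    count:       times this feature value was seen in the class
    class_total: total examples of the class
    n_values:    number of distinct values the feature can take
    """
    return (count + alpha) / (class_total + alpha * n_values)

# Hypothetical: colour RED never observed among 10 Banana examples,
# and colour takes 3 possible values (RED, YELLOW, GREEN).
p_unsmoothed = 0 / 10                        # 0: zeroes out the whole product
p_smoothed = smoothed_likelihood(0, 10, 3)   # 1/13: small but non-zero
```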
<item>
<title>Commented: How to calculate Accuracy, Precision, Recall or F1?</title>
<link>https://ask.ghassem.com/789/how-to-calculate-accuracy-precision-recall-or-f1?show=911#c911</link>
<description>Is the F1 here the F score in the attached document?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/789/how-to-calculate-accuracy-precision-recall-or-f1?show=911#c911</guid>
<pubDate>Wed, 12 Aug 2020 16:23:54 +0000</pubDate>
</item>
<item>
<title>Comment edited: How to calculate Covariance Matrix and Principal Components for PCA?</title>
<link>https://ask.ghassem.com/652/how-calculate-covariance-matrix-and-principal-components?show=910#c910</link>
<description>Thank you so much!</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/652/how-calculate-covariance-matrix-and-principal-components?show=910#c910</guid>
<pubDate>Wed, 12 Aug 2020 15:54:22 +0000</pubDate>
</item>
<item>
<title>Commented: How to calculate feed-forward (forward-propagation) in neural network?</title>
<link>https://ask.ghassem.com/603/calculate-feed-forward-forward-propagation-neural-network?show=904#c904</link>
<description>Agreed with hamzasi; I got the same answer: y hat = 0.993, Error = 40.56</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/603/calculate-feed-forward-forward-propagation-neural-network?show=904#c904</guid>
<pubDate>Tue, 11 Aug 2020 18:48:12 +0000</pubDate>
</item>
<item>
<title>Commented: How to optimize weights in Logistic Regression?</title>
<link>https://ask.ghassem.com/639/how-to-optimize-weights-in-logistic-regression?show=882#c882</link>
<description>Thanks for clarification Wahab, really appreciate it</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/639/how-to-optimize-weights-in-logistic-regression?show=882#c882</guid>
<pubDate>Wed, 15 Jul 2020 21:39:37 +0000</pubDate>
</item>
<item>
<title>Answered: How to calculate Softmax Regression probabilities in this example?</title>
<link>https://ask.ghassem.com/605/calculate-softmax-regression-probabilities-this-example?show=880#a880</link>
<description>Class 1: Setosa&lt;br /&gt;
&lt;br /&gt;
Class 2: Versicolor&lt;br /&gt;
&lt;br /&gt;
Class 3: Virginica&lt;br /&gt;
&lt;br /&gt;
The initialized weights:&lt;br /&gt;
&lt;br /&gt;
$w_{01} = w_{11}=w_{21} = 1$&lt;br /&gt;
&lt;br /&gt;
$w_{02} = w_{12}=w_{22} = 2$&lt;br /&gt;
&lt;br /&gt;
$w_{03} = w_{13}=w_{23} = 3$&lt;br /&gt;
&lt;br /&gt;
The weight equations:&lt;br /&gt;
&lt;br /&gt;
$z_1 = x_0w_{01} + x_1 w_{11} + x_2w_{21}$&lt;br /&gt;
&lt;br /&gt;
$z_2 = x_0w_{02} + x_1 w_{12} + x_2w_{22}$&lt;br /&gt;
&lt;br /&gt;
$z_3 = x_0w_{03} + x_1 w_{13} + x_2w_{23}$&lt;br /&gt;
&lt;br /&gt;
1) $x_0 = 1 \quad x_1 = 4.6 \quad \text{and} \quad x_2 = 1.7$&lt;br /&gt;
&lt;br /&gt;
$z_1 = w_{01} + x_1 w_{11} + x_2w_{21} = 1 + 4.6 + 1.7 = 7.3$&lt;br /&gt;
&lt;br /&gt;
$z_2 = w_{02} + x_1 w_{12} + x_2w_{22} = 2 + 4.6(2) + 1.7(2) = 16.3$&lt;br /&gt;
&lt;br /&gt;
$z_3 = w_{03} + x_1 w_{13} + x_2w_{23} = 3 + 4.6(3) + 1.7(3) = 21.9$&lt;br /&gt;
&lt;br /&gt;
$e^{z_1} + e^{z_2} + e^{z_3} = e^{7.3} + e^{16.3} + e^{21.9} = 1480.3 + 11994994 + 3243763284 = 3255759758$&lt;br /&gt;
&lt;br /&gt;
$p^3 = \frac{e^{z_3}}{\sum_{i=1}^3 e^{z_i}} = \frac{3243763284}{3255759758} = 0.996315307$&lt;br /&gt;
&lt;br /&gt;
$\therefore$ the probability of classifying as Virginica is 99.6%&lt;br /&gt;
&lt;br /&gt;
2) $x_0 = 1 \quad x_1 = 4.6 \quad x_2 = 1.7 \quad x_3 = 5.5 \quad x_4 = 3$&lt;br /&gt;
$\begin{align*} z_1 &amp;amp;= w_{01}x_0 + w_{11}x_1 + w_{21}x_2 + w_{31}x_3 + w_{41}x_4 \\ &amp;amp;= 1 + 4.6 + 1.7+5.5+3\\ &amp;amp;=15.8\end{align*}$&lt;br /&gt;
&lt;br /&gt;
$\begin{align*} z_2 &amp;amp;= w_{02}x_0 + w_{12}x_1 + w_{22}x_2 + w_{32}x_3 + w_{42}x_4 \\ &amp;amp;= 2 + (2)4.6 + (2)1.7+(2)5.5+(2)3\\ &amp;amp;=31.6 \end{align*}$&lt;br /&gt;
&lt;br /&gt;
$\begin{align*} z_3 &amp;amp;= w_{03}x_0 + w_{13}x_1 + w_{23}x_2 + w_{33}x_3 + w_{43}x_4 \\ &amp;amp;= 3 + (3)4.6 + (3)1.7+(3)5.5+(3)3\\ &amp;amp;=47.4 \end{align*}$&lt;br /&gt;
&lt;br /&gt;
$\begin{align*}\sum_{i=1}^3 e^{z_i}&amp;amp;=e^{z_1} + e^{z_2} + e^{z_3}\\ &amp;amp;= e^{15.8} + e^{31.6} + e^{47.4} \\&amp;amp;= 7275332 + 5.29e13 + 3.85e20 \\&amp;amp;= 3.850866845e20\end{align*}$&lt;br /&gt;
&lt;br /&gt;
$p^3 = \frac{e^{z_3}}{\sum_{i=1}^3 e^{z_i}} = \frac{3.850866316e20}{3.850866845e20} = 0.999999863$&lt;br /&gt;
&lt;br /&gt;
$\therefore$ the probability of classifying as Virginica is 99.9%</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/605/calculate-softmax-regression-probabilities-this-example?show=880#a880</guid>
<pubDate>Tue, 14 Jul 2020 19:28:03 +0000</pubDate>
</item>
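The arithmetic in part 1 can be checked with a few lines of Python (a sketch; subtracting the maximum before exponentiating is the usual trick to avoid overflow and does not change the result):

```python
import math

def softmax(z):
    """Numerically stable softmax: shift by max(z) before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

# z values for Setosa, Versicolor, Virginica from the answer above
p = softmax([7.3, 16.3, 21.9])
# p[2] ≈ 0.9963, matching the hand calculation
```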
<item>
<title>Commented: How to calculate the probability and accuracy of a Logistic Regression classifier?</title>
<link>https://ask.ghassem.com/795/calculate-probability-accuracy-logistic-regression-classifier?show=877#c877</link>
<description>Just compare your estimated results with column 3 (&amp;quot;Sold&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Accuracy = (number of correctly estimated values) / (number of values in column 3, which is 4)&lt;br /&gt;
&lt;br /&gt;
Accuracy = 3/4 = 75% in both cases if you compare your results with column 3.&lt;br /&gt;
&lt;br /&gt;
Hope this helps</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/795/calculate-probability-accuracy-logistic-regression-classifier?show=877#c877</guid>
<pubDate>Mon, 13 Jul 2020 20:10:22 +0000</pubDate>
</item>
<item>
<title>Answered: What is difference between Support vector machine and Support Vector Classification?</title>
<link>https://ask.ghassem.com/863/difference-between-support-machine-support-classification?show=864#a864</link>
<description>&lt;p&gt;In machine learning, support-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data used for &lt;strong&gt;both classification and regression analysis&lt;/strong&gt;. When you call it Support Vector Classification, it means you are using these models for classification tasks.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/863/difference-between-support-machine-support-classification?show=864#a864</guid>
<pubDate>Sun, 17 May 2020 22:35:51 +0000</pubDate>
</item>
<item>
<title>Reshown: How to calculate k-means clustering with a numerical example?</title>
<link>https://ask.ghassem.com/656/how-to-calculate-k-means-clustering-with-numerical-example?show=656#q656</link>
<description>&lt;p&gt;Use the k-means algorithm and Euclidean distance to cluster the following 8 examples into 3 clusters:&lt;/p&gt;

&lt;p&gt;$A1=(2,10),&amp;nbsp;A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9)$.&lt;/p&gt;

&lt;p&gt;Suppose that the initial seeds (centers of each cluster) are $A1$, $A4$ and $A7$. Run the k-means algorithm for 1 epoch only. At the end of this epoch show:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;a)&lt;/strong&gt; The new clusters (i.e. the examples belonging to each cluster)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;b)&lt;/strong&gt; The centers of the new clusters&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;c)&lt;/strong&gt; Draw a 10 by 10 space with all the 8 points and show the clusters after the first epoch and the new centroids.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;d)&lt;/strong&gt; How many more iterations are needed to converge? Draw the result for each epoch&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/656/how-to-calculate-k-means-clustering-with-numerical-example?show=656#q656</guid>
<pubDate>Fri, 17 Apr 2020 18:34:37 +0000</pubDate>
</item>
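Parts (a) and (b) of the exercise above can be sketched as one epoch of k-means (assignment step, then centroid update); running it gives the clusters {A1}, {A3, A4, A5, A6, A8}, {A2, A7} and new centers (2, 10), (6, 6), (1.5, 3.5):

```python
import math

points = {'A1': (2, 10), 'A2': (2, 5), 'A3': (8, 4), 'A4': (5, 8),
          'A5': (7, 5), 'A6': (6, 4), 'A7': (1, 2), 'A8': (4, 9)}
centers = [points['A1'], points['A4'], points['A7']]  # initial seeds

# Assignment step: each point joins its nearest center (Euclidean distance).
clusters = [[] for _ in centers]
for name, p in points.items():
    nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
    clusters[nearest].append(name)

# Update step: each center moves to the mean of its cluster.
centers = [tuple(sum(points[n][d] for n in c) / len(c) for d in (0, 1))
           for c in clusters]
# clusters -> [['A1'], ['A3', 'A4', 'A5', 'A6', 'A8'], ['A2', 'A7']]
# centers  -> [(2.0, 10.0), (6.0, 6.0), (1.5, 3.5)]
```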
<item>
<title>Answered: Pre-trained word embeddings and preprocessing</title>
<link>https://ask.ghassem.com/849/pre-trainned-word-embeddings-and-preproceess?show=850#a850</link>
<description>&lt;p&gt;I believe &lt;a rel=&quot;nofollow&quot; href=&quot;https://www.guru99.com/word-embedding-word2vec.html&quot;&gt;this article&lt;/a&gt; will help you to understand where to use each of the techniques you mentioned.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/849/pre-trainned-word-embeddings-and-preproceess?show=850#a850</guid>
<pubDate>Sat, 11 Apr 2020 07:23:18 +0000</pubDate>
</item>
<item>
<title>Answered: Can PCA be used for supervised learning?</title>
<link>https://ask.ghassem.com/832/can-pca-be-used-for-supervised-learning?show=833#a833</link>
<description>PCA can be used indirectly in supervised learning tasks such as classification and regression. When you have a huge number of features, one way to reduce them and probably avoid overfitting is to use a feature-reduction method such as PCA. Therefore, PCA can be used in the preprocessing step to reduce the number of features.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/832/can-pca-be-used-for-supervised-learning?show=833#a833</guid>
<pubDate>Tue, 18 Feb 2020 22:12:16 +0000</pubDate>
</item>
<item>
<title>Answered: How to calculate residual errors for linear regression and interpret regression metrics?</title>
<link>https://ask.ghassem.com/829/calculate-residual-regression-interpret-regression-metrics?show=830#a830</link>
<description>&lt;p&gt;You can take a look at &lt;a rel=&quot;nofollow&quot; href=&quot;https://www.dataquest.io/blog/understanding-regression-error-metrics/&quot;&gt;this article&lt;/a&gt; which shows with an example linear regression equation. For example, the definition of MAE is given in the following figure:&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://i.imgur.com/tqnei6J.jpg&quot;&gt;https://i.imgur.com/tqnei6J.jpg&lt;/a&gt;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/829/calculate-residual-regression-interpret-regression-metrics?show=830#a830</guid>
<pubDate>Tue, 18 Feb 2020 18:37:34 +0000</pubDate>
</item>
<item>
<title>Answered: How can I find the &quot;State of the art&quot; approaches in Machine Learning?</title>
<link>https://ask.ghassem.com/801/how-can-find-the-sate-of-the-art-approaches-machine-learning?show=826#a826</link>
<description>&lt;p&gt;A great website, called &lt;a rel=&quot;nofollow&quot; href=&quot;https://paperswithcode.com/sota&quot;&gt;Papers with Code&lt;/a&gt;,&amp;nbsp;lists the latest approaches&amp;nbsp;with their source code on GitHub. For example, if you are looking for the latest&amp;nbsp;methods&amp;nbsp;for object detection, you can take a look at the latest Object Detection methods on the COCO test-dev&amp;nbsp;timeline &lt;a rel=&quot;nofollow&quot; href=&quot;https://paperswithcode.com/sota/object-detection-on-coco&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://i.imgur.com/kTtsmtQ.png&quot;&gt;https://i.imgur.com/kTtsmtQ.png&lt;/a&gt;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/801/how-can-find-the-sate-of-the-art-approaches-machine-learning?show=826#a826</guid>
<pubDate>Tue, 18 Feb 2020 16:48:24 +0000</pubDate>
</item>
<item>
<title>Answered: How to map (string compare) a string with 10000+ strings in DB? which is the best way to do it?</title>
<link>https://ask.ghassem.com/809/how-map-string-compare-string-with-10000-strings-which-best?show=825#a825</link>
<description>I think the problem you mentioned can be solved using a tree structure such as a Trie:&lt;br /&gt;
&lt;a href=&quot;https://www.cs.helsinki.fi/u/tpkarkka/opetus/10s/spa/lecture07.pdf&quot; rel=&quot;nofollow&quot; target=&quot;_blank&quot;&gt;https://www.cs.helsinki.fi/u/tpkarkka/opetus/10s/spa/lecture07.pdf&lt;/a&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/809/how-map-string-compare-string-with-10000-strings-which-best?show=825#a825</guid>
<pubDate>Tue, 18 Feb 2020 16:39:42 +0000</pubDate>
</item>
<item>
<title>Answered: Why does OneHotEncoder create one more column, and what is that column for?</title>
<link>https://ask.ghassem.com/814/after-applied-onehotencoder-will-create-column-whatis-column?show=824#a824</link>
<description>As you described in the comments, the additional columns are used as dummy variables.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/814/after-applied-onehotencoder-will-create-column-whatis-column?show=824#a824</guid>
<pubDate>Tue, 18 Feb 2020 16:38:41 +0000</pubDate>
</item>
<item>
<title>Answered: Can I use a single Pipeline for multiple estimators in scikit-learn?</title>
<link>https://ask.ghassem.com/819/can-use-single-pipeline-for-multiple-estimators-scikit-learn?show=820#a820</link>
<description>&lt;p&gt;Yes, it is possible. Please take a look at &lt;a rel=&quot;nofollow&quot; href=&quot;https://stackoverflow.com/questions/50285973/pipeline-multiple-classifiers?answertab=votes#tab-top&quot;&gt;this post&lt;/a&gt; on Stack Overflow.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/819/can-use-single-pipeline-for-multiple-estimators-scikit-learn?show=820#a820</guid>
<pubDate>Tue, 18 Feb 2020 14:19:54 +0000</pubDate>
</item>
<item>
<title>Answer selected: score() vs accuracy_score() in sklearn</title>
<link>https://ask.ghassem.com/777/score-vs-accuracyscore-in-sklearn?show=780#a780</link>
<description>&lt;p&gt;Q1: knn.score(X_test, y_test) calls accuracy_score of&amp;nbsp;sklearn.metrics&amp;nbsp;for classifiers. For regressors, it calls r2_score, the coefficient of determination from statistics.&lt;/p&gt;

&lt;p&gt;You can find the source code of knn.score here. It’s open source.&amp;nbsp;&lt;a rel=&quot;nofollow&quot; href=&quot;https://github.com/scikit-learn/scikit-learn/blob/a24c8b46/sklearn/base.py#L324&quot;&gt;https://github.com/scikit-learn/scikit-learn/blob/a24c8b46/sklearn/base.py#L324&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Q2: accuracy_score is not a method of knn, but a method of sklearn.metrics. If the normalize argument is true, accuracy_score(knn.predict(X_test), y_test) returns the same result as knn.score(X_test, y_test). You can check the document below for more details:&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html&quot;&gt;https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Q3: As explained above, yes, they return the same result,&amp;nbsp;but only in the given situation.&lt;/p&gt;

&lt;p&gt;Q4: If there is bias after the split, the bias still exists whichever data set is compared. Here, bias exists when the data distribution in the train set and the data distribution in the whole set are not the same. Taking the Iris dataset as an example, if the distribution of the three classes (Setosa, Versicolour, Virginica) is 50-50-50 in the 150 samples, and you make a 20-80 split, then the distribution of the three classes in the train set should be 40-40-40. If not, there&#039;s bias, because your train set differs from the population in terms of data distribution.&lt;/p&gt;

&lt;p&gt;This may be why Elon&amp;nbsp;doesn&#039;t trust the simulation and insists on using data from the real world to train the Tesla auto-pilot system.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/777/score-vs-accuracyscore-in-sklearn?show=780#a780</guid>
<pubDate>Wed, 22 Jan 2020 01:53:33 +0000</pubDate>
</item>
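A pure-Python mimic of what accuracy_score computes (a sketch, not sklearn's actual implementation) makes the normalize argument in Q2 concrete: normalize=True returns the fraction of exact matches, normalize=False the raw count.

```python
def accuracy_score(y_true, y_pred, normalize=True):
    """Fraction (or, with normalize=False, count) of predictions that match."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) if normalize else correct

y_true = [0, 1, 2, 2]
y_pred = [0, 1, 1, 2]
accuracy_score(y_true, y_pred)                   # 0.75
accuracy_score(y_true, y_pred, normalize=False)  # 3
```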
<item>
<title>Commented: Best algorithm for table reservation</title>
<link>https://ask.ghassem.com/733/best-algorithm-for-table-reservation?show=737#c737</link>
<description>Thanks for the reply. :)&lt;br /&gt;
I meant a restaurant table. You can also reframe this as how to forecast reservations for a single hotel room.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/733/best-algorithm-for-table-reservation?show=737#c737</guid>
<pubDate>Tue, 22 Oct 2019 04:22:58 +0000</pubDate>
</item>
<item>
<title>Answered: What are the types of Classification and regression algorithms in Machine learning ?</title>
<link>https://ask.ghassem.com/660/types-classification-regression-algorithms-machine-learning?show=661#a661</link>
<description>Depending on how the model and its cost function are defined, one estimator can be used for both classification and regression. Some estimators, such as k-NN or SVM, can be used for both. Logistic Regression and Softmax Regression are (normally) only useful for classification. Therefore, you need to check whether there is a version of the estimator with the same name that can be used for a different purpose.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/660/types-classification-regression-algorithms-machine-learning?show=661#a661</guid>
<pubDate>Fri, 28 Jun 2019 19:10:38 +0000</pubDate>
</item>
<item>
<title>Answer selected: How to calculate the class probabilities and classify using Naive Bayes classifier for NLP?</title>
<link>https://ask.ghassem.com/654/calculate-class-probabilities-classify-using-classifier?show=655#a655</link>
<description>&lt;p&gt;The solution is provided in this &lt;a rel=&quot;nofollow&quot; href=&quot;https://monkeylearn.com/blog/practical-explanation-naive-bayes-classifier/&quot;&gt;url&lt;/a&gt;. The classifier will assign &lt;strong&gt;Sport&lt;/strong&gt;&amp;nbsp;for the tag of &quot;&lt;em&gt;A very close game&lt;/em&gt;&quot;.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/654/calculate-class-probabilities-classify-using-classifier?show=655#a655</guid>
<pubDate>Thu, 27 Jun 2019 03:23:08 +0000</pubDate>
</item>
<item>
<title>Answer selected: How to perform a classification or regression using k-NN?</title>
<link>https://ask.ghassem.com/658/how-to-perform-a-classification-or-regression-using-k-nn?show=659#a659</link>
<description>&lt;p&gt;&lt;strong&gt;a)&lt;/strong&gt; You can calculate the distances or simply visualize it:&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://i.imgur.com/PxZn1Sp.png&quot;&gt;https://i.imgur.com/PxZn1Sp.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(The code of above visualization is available&amp;nbsp;&lt;a rel=&quot;nofollow&quot; href=&quot;https://github.com/tofighi/MachineLearning/blob/master/knn_simple_example.ipynb&quot;&gt;here&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Based on the above visualization, 3 closest neighbors&amp;nbsp;are blue points. Therefore, &lt;strong&gt;the predicted class is blue.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;b) &lt;/strong&gt;The 3 closest neighbors are data points: $(0,1,\$50)$,&amp;nbsp;$(1,0,\$30)$, and&amp;nbsp;$(1,2,\$40)$. Therefore, the estimated price is the mean of the target values $\frac{50+40+30}{3}=\$40$&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/658/how-to-perform-a-classification-or-regression-using-k-nn?show=659#a659</guid>
<pubDate>Thu, 27 Jun 2019 03:22:07 +0000</pubDate>
</item>
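Part (b) above can be sketched in a few lines of Python. The three labelled points are taken from the answer; the fourth point and the query are made up for illustration.

```python
import math

def knn_regress(data, query, k=3):
    """k-NN regression: average the targets of the k nearest points."""
    nearest = sorted(data, key=lambda item: math.dist(item[0], query))[:k]
    return sum(target for _, target in nearest) / k

data = [((0, 1), 50.0), ((1, 0), 30.0), ((1, 2), 40.0),
        ((9, 9), 500.0)]   # last point: hypothetical far-away outlier
knn_regress(data, (1, 1))  # (50 + 30 + 40) / 3 = 40.0
```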
<item>
<title>Answered: What is the difference between cross-validation and validation set?</title>
<link>https://ask.ghassem.com/648/what-the-difference-between-cross-validation-and-validation?show=649#a649</link>
<description>There are several variations of the cross-validation algorithm. In k-fold cross-validation, we divide the training set into k folds and train k times, each time validating on a different held-out fold. In some cases, usually when we are running machine learning diagnostic tests to see whether our problem suffers from high variance or high bias, we split the dataset into 3 different splits of train, validation, and test, and measure the model trained on the train set against all 3 splits while adding more data points each time.&lt;br /&gt;
&lt;br /&gt;
Another reason for having a separate validation set is when we have a complex model that takes a long time to train, or when we deal with big data. In either of these cases, such as when we are training deep neural networks, k-fold cross-validation is too expensive. Instead, in each epoch we validate the trained model on the validation set and, based on those results, continue to update the hyper-parameters. We also compare the results against a test set that we never used during training, to make sure the model generalizes.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/648/what-the-difference-between-cross-validation-and-validation?show=649#a649</guid>
<pubDate>Wed, 19 Jun 2019 19:06:12 +0000</pubDate>
</item>
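The three-way split described above can be sketched as follows (the fractions and seed are arbitrary choices, not part of the original answer):

```python
import random

def three_way_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle, then carve off test and validation sets; the rest is train."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = three_way_split(range(100))
# len(train), len(val), len(test) -> 60, 20, 20
```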
<item>
<title>Answered: In DBSCAN algorithm, how should we choose optimal eps and minimum points?</title>
<link>https://ask.ghassem.com/646/dbscan-algorithm-how-should-choose-optimal-minimum-points?show=647#a647</link>
<description>&lt;p&gt;There is no general way of choosing minPts. It depends on the context of the problem and what you are looking for. As in other unsupervised learning problems, the results could be totally wrong even if you choose optimal values for the hyperparameters. However, we can mention some useful facts:&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rule of Thumb values for minPts and eps:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://en.wikipedia.org/wiki/DBSCAN#Parameter_estimation&quot;&gt;On this page&lt;/a&gt;, the rule-of-thumb values are discussed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;minPts:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A low minPts will build clusters out of outliers or noise, so we&amp;nbsp;&lt;strong&gt;don&#039;t choose too small a value for it.&amp;nbsp;&lt;/strong&gt;minPts is best set by a domain expert who understands the data well. Unfortunately, in many cases we don&#039;t have that domain knowledge, especially after the data is normalized. One heuristic is to use&amp;nbsp;$\ln(n)$, where&amp;nbsp;$n$&amp;nbsp;is the total number of points to be clustered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;epsilon (eps):&lt;/strong&gt;&lt;br&gt;
For epsilon, there are various aspects. It again boils down to choosing whatever works on&amp;nbsp;&lt;em&gt;this&lt;/em&gt;&amp;nbsp;data set and&amp;nbsp;&lt;em&gt;this&lt;/em&gt;&amp;nbsp;minPts and&amp;nbsp;&lt;em&gt;this&lt;/em&gt;&amp;nbsp;distance function and&amp;nbsp;&lt;em&gt;this&lt;/em&gt;&amp;nbsp;normalization. You can try to do a kNN distance (k-distance plot) histogram for your dataset and choose a &quot;&lt;em&gt;knee&lt;/em&gt;&quot; there, but there might be no visible one, or multiple.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Basically, you compute the k-nearest neighbors (k-NN) for each data point, for different k, to understand the density distribution of your data; k-NN is handy because it is a non-parametric method. Once you choose minPts&amp;nbsp;(which again strongly depends on your data), you fix k to that value. Then you use as epsilon the k-distance corresponding to the region of the k-distance plot (for your fixed k) with a low slope. The method consists of computing the&amp;nbsp;&lt;strong&gt;k-nearest neighbor distances&lt;/strong&gt;&amp;nbsp;in a matrix of points: calculate the distance of every point to its k nearest neighbors, where the value of $k$ is specified by the user and corresponds to minPts. Next, these k-distances are plotted in ascending order. The aim is to determine the &quot;knee&quot;, which corresponds to the optimal&amp;nbsp;&lt;strong&gt;eps&lt;/strong&gt;&amp;nbsp;parameter: a threshold where a sharp change occurs along the k-distance curve.&amp;nbsp;In the figure below, the optimal&amp;nbsp;&lt;strong&gt;eps&lt;/strong&gt;&amp;nbsp;value is around a distance of 0.15.&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://i.imgur.com/2Om1mD8.png&quot;&gt;https://i.imgur.com/2Om1mD8.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OPTICS and other extensions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some extensions have been created on top of DBSCAN, such as OPTICS. OPTICS produces hierarchical clusters, from which we can extract significant flat clusters by visual inspection; an OPTICS implementation is available in the Python module&amp;nbsp;&lt;a rel=&quot;nofollow&quot; href=&quot;https://codedocs.xyz/annoviko/pyclustering/classpyclustering_1_1cluster_1_1optics_1_1optics.html#afd44312d254b38fc1161be290e077eec&quot;&gt;pyclustering&lt;/a&gt;. One of the original authors of DBSCAN and OPTICS also proposed an automatic way to extract flat clusters that requires no human intervention; for more information you can read&amp;nbsp;&lt;a rel=&quot;nofollow&quot; href=&quot;https://pdfs.semanticscholar.org/a426/67d0b7f8ed0a97e6d7e2881c6a35c8b23616.pdf&quot;&gt;this paper&lt;/a&gt;. There are also other popular extensions, such as HDBSCAN, which can be found &lt;a rel=&quot;nofollow&quot; href=&quot;https://en.wikipedia.org/wiki/OPTICS_algorithm#Extensions&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;At the end of the day, as with other clustering algorithms, it is hard to get reliable results because we do not have labels, and this remains an area needing improvement.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/646/dbscan-algorithm-how-should-choose-optimal-minimum-points?show=647#a647</guid>
<pubDate>Thu, 13 Jun 2019 17:54:32 +0000</pubDate>
</item>
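The k-distance procedure described above can be sketched as follows (a toy sketch; in practice you would plot the resulting curve and look for the knee):

```python
import math

def k_distances(points, k):
    """For each point, the distance to its k-th nearest neighbour, sorted
    ascending. Plotting this curve and finding its 'knee' suggests eps."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Three tightly packed points and one outlier: the jump at the end of the
# curve marks roughly where eps should be cut off.
curve = k_distances([(0, 0), (0, 1), (1, 0), (10, 10)], k=1)
# curve -> [1.0, 1.0, 1.0, 13.45...]
```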
<item>
<title>Answered: How do I Plot the linear classifier calculated with LIBLINEAR using sklearn?</title>
<link>https://ask.ghassem.com/629/plot-linear-classifier-calculated-liblinear-using-sklearn?show=638#a638</link>
<description>&lt;p&gt;I think &lt;a rel=&quot;nofollow&quot; href=&quot;https://python-graph-gallery.com/43-use-categorical-variable-to-color-scatterplot-seaborn/&quot;&gt;this article&lt;/a&gt; shows you how to achieve your goal by showing some examples of using a categorical variable to color scatterplot.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/629/plot-linear-classifier-calculated-liblinear-using-sklearn?show=638#a638</guid>
<pubDate>Mon, 20 May 2019 16:25:15 +0000</pubDate>
</item>
<item>
<title>Answered: Could you please explain math symbols behind Machine Learning equations?</title>
<link>https://ask.ghassem.com/631/please-explain-symbols-behind-machine-learning-equations?show=632#a632</link>
<description>&lt;p&gt;The following figure shows the symbols which are common in machine learning:&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://i.imgur.com/nJVT8cN.png&quot;&gt;https://i.imgur.com/nJVT8cN.png&lt;/a&gt;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/631/please-explain-symbols-behind-machine-learning-equations?show=632#a632</guid>
<pubDate>Sat, 18 May 2019 20:00:59 +0000</pubDate>
</item>
</channel>
</rss>