<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent questions tagged cluster</title>
<link>https://ask.ghassem.com/tag/cluster</link>
<description>Powered by Question2Answer</description>
<item>
<title>Which algorithm is best to detect anomalies within a data set of 5k+ user-login events?</title>
<link>https://ask.ghassem.com/1000/which-algorithm-best-detect-anomalies-within-login-events</link>
<description>I am trying to build an unsupervised ML model to detect anomalies within 5000+ users&amp;#039; login data. I selected 5 features contained within each of the user-login events (IP, hour of day, day of week, device_id, OS). I am looking for the best algorithm to use. I am considering using a density function to estimate the probabilities of the feature values and decide whether an event is an outlier. The problem is that feature values are only meaningful relative to the specific user. For example, you cannot compare login IPs across users; a login IP is only informative for the user it belongs to. &lt;br /&gt;
Ultimately, I want to detect events that deviate from a user&amp;#039;s usual login behavior, such as a different IP, day, hour, device_id, or OS, where the more features that have changed, the higher the probability that the event is an outlier. &lt;br /&gt;
At this point, I am not sure how to build a model with data that contains multiple users, because I don&amp;#039;t know how to separate the user data so that the model is trained per user and finds anomalies within each individual user&amp;#039;s features.&lt;br /&gt;
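For concreteness, here is a minimal sketch of the per-user idea described above. It assumes each event is a dict with a hypothetical `user` key plus the five features (all field names are placeholders, not taken from real data). It treats every feature, including the hour, as categorical and uses Laplace-smoothed per-user frequencies as a simple stand-in for a density function, so it is an illustration of the approach rather than a recommendation of the best algorithm.&lt;br /&gt;

```python
import math
from collections import Counter, defaultdict

# Hypothetical field names for the five login features.
FEATURES = ("ip", "hour", "day_of_week", "device_id", "os")


def fit_user_profiles(events):
    """Count how often each feature value appears in each user's own history."""
    profiles = defaultdict(lambda: {f: Counter() for f in FEATURES})
    for event in events:
        for feature in FEATURES:
            profiles[event["user"]][feature][event[feature]] += 1
    # Return a plain dict so that scoring an unknown user raises KeyError.
    return dict(profiles)


def anomaly_score(profiles, event, alpha=1.0):
    """Sum of negative log-probabilities of the event's feature values,
    estimated only from that user's history. Each unfamiliar value adds
    a large term, so more changed features give a higher score."""
    counters = profiles[event["user"]]
    score = 0.0
    for feature in FEATURES:
        counts = counters[feature]
        total = sum(counts.values())
        distinct = len(counts) + 1  # one extra slot for values never seen
        # Laplace-smoothed relative frequency; alpha is an assumed constant.
        prob = (counts[event[feature]] + alpha) / (total + alpha * distinct)
        score += -math.log(prob)
    return score
```

Because the score is a sum over features, each additional unfamiliar value raises it, which matches the requirement that more changed features make an event more likely to be an outlier. The smoothing constant `alpha` and any threshold applied to the score are assumptions that would need tuning.&lt;br /&gt;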
&lt;br /&gt;
I also don&amp;#039;t have any labeled data to use for testing. Should I fabricate some?&lt;br /&gt;
&lt;br /&gt;
Any advice is greatly appreciated.&lt;br /&gt;
&lt;br /&gt;
Thank you!</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1000/which-algorithm-best-detect-anomalies-within-login-events</guid>
<pubDate>Tue, 05 Oct 2021 17:45:38 +0000</pubDate>
</item>
</channel>
</rss>