Cyber Crime — Confusion Matrix

Priya Singh
7 min read · Jun 4, 2021

What is a Confusion Matrix?

A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. It is used to measure the performance of a classification model by deriving metrics such as accuracy, precision, and recall. Confusion matrices are widely used because they give a better picture of a model’s performance than classification accuracy alone does.

A binary confusion matrix covers the following cases:

  • True Negative (TN): The model predicted No, and the actual value was also No.
  • True Positive (TP): The model predicted Yes, and the actual value was also Yes.
  • False Negative (FN): The model predicted No, but the actual value was Yes. This is also called a Type II error.
  • False Positive (FP): The model predicted Yes, but the actual value was No. This is also called a Type I error.
  • The target variable has two values: Positive or Negative.
  • The columns represent the actual values of the target variable.
  • The rows represent the predicted values of the target variable.

Hence, a confusion matrix is a table with two rows and two columns that reports the number of false positives, false negatives, true positives, and true negatives. This allows for more detailed analysis than the mere proportion of correct classifications (accuracy). Accuracy yields misleading results if the data set is unbalanced, that is, when the numbers of observations in the different classes vary greatly.
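As a minimal sketch of how these four cells can be computed in practice (assuming scikit-learn is available and labels are encoded as 0 = No, 1 = Yes; the data below is purely hypothetical), note that scikit-learn’s confusion_matrix puts actual values on the rows and predictions on the columns, the transpose of the layout described above:

```python
# A minimal sketch, assuming scikit-learn is installed and labels are 0 = No, 1 = Yes.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]  # actual values (hypothetical data)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]  # model predictions (hypothetical data)

# For binary labels, ravel() flattens the 2x2 matrix into (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=4, FP=1, FN=2, TP=3
```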

What are the Performance Metrics?

Recall :

Recall is defined as the ratio of the number of correctly classified positive cases to the total number of actual positive cases. In other words: out of all the actual positive cases, how many did we predict correctly? Recall should be high (ideally 1).

“Recall is a useful metric in cases where a False Negative is a higher concern than a False Positive”

Example: Recall is important in medical cases, where it doesn’t matter much if we raise a false alarm, but the actual positive cases should not go undetected!

Recall would be the better metric here because we don’t want to accidentally discharge an infected person and let them mix with the healthy population, thereby spreading a contagious virus. Now you can understand why accuracy alone is a bad metric for such a model.
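In terms of the confusion-matrix cells, recall is TP / (TP + FN). A minimal sketch, reusing the hypothetical y_true, y_pred, tp, and fn values from the example above:

```python
# A minimal sketch, reusing tp/fn and y_true/y_pred from the example above.
recall = tp / (tp + fn)  # out of all actual positives, the fraction we caught
print(f"Recall = {recall:.2f}")

# Equivalent convenience function from scikit-learn:
from sklearn.metrics import recall_score
print(recall_score(y_true, y_pred))
```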

Precision :

Precision is defined as the ratio of the number of correctly classified positive cases to the total number of predicted positive cases. In other words: out of all the cases predicted as positive, how many did we predict correctly? Precision should be high (ideally 1).

“Precision is a useful metric in cases where a False Positive is a higher concern than a False Negative”
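In terms of the confusion-matrix cells, precision is TP / (TP + FP). A minimal sketch, continuing the same hypothetical example:

```python
# A minimal sketch, reusing tp/fp and y_true/y_pred from the example above.
precision = tp / (tp + fp)  # out of predicted positives, the fraction that were truly positive
print(f"Precision = {precision:.2f}")

# Equivalent convenience function from scikit-learn:
from sklearn.metrics import precision_score
print(precision_score(y_true, y_pred))
```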

Accuracy :

Accuracy simply measures how often the classifier makes the correct prediction. It’s the ratio between the number of correct predictions and the total number of predictions. The accuracy metric is not suited for unbalanced classes.

Accuracy has its own disadvantages: on imbalanced data, a model that predicts the majority class label for every point will still achieve a high accuracy, even though it is not actually a useful model.
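A minimal sketch of this pitfall, using a hypothetical data set with 95 negatives and 5 positives and a “model” that always predicts the majority class:

```python
# A minimal sketch: on imbalanced data, always predicting the majority class
# still yields high accuracy while catching none of the positive cases.
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 95 + [1] * 5   # 95 negatives, 5 positives (hypothetical imbalance)
y_pred = [0] * 100            # always predict the majority class (No)

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0  -- every positive case is missed
```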

Errors in Confusion Matrix :

Confusion matrices have two types of errors: Type I and Type II.

Type I Error — A False Positive is called a Type I error. It means that our model has given a wrong answer in a positive sense, i.e., it reports that something is true when it actually is not. This can be the most dangerous kind of error. For example, in the security domain, if security engineers rely on an ML model that is not 100% accurate and the model wrongly reports that the environment is safe, the engineers will not be notified about the 20 or 30% of intruders who have actually penetrated the organization’s environment. Because no considerable action is taken in time, the systems may be cracked or hacked by the intruders, which may result in a huge loss to the organization.

Type II Error — A False Negative is called a Type II error. This error is also dangerous, as it means that our model has given a wrong answer in a negative sense, i.e., it reports that something is false when it is actually true. For example, suppose the model predicts which students passed an exam and it marks 50 students as having failed, but in reality only 40 of them actually failed; the remaining 10 students, who actually passed, fall under this kind of error and are False Negatives.

What is Cybercrime?

Cybercrime is criminal activity that either targets or uses a computer, a computer network or a networked device. Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations. Some cybercriminals are organized, use advanced techniques and are highly technically skilled. Others are novice hackers.

Rarely, cybercrime aims to damage computers for reasons other than profit. These could be political or personal.

An Overview of False Positives and False Negatives

Understanding the differences between false positives and false negatives, and how they relate to cybersecurity, is important for anyone working in information security. Why? Investigating false positives is a waste of time and resources and distracts your team from focusing on real cyber incidents (alerts) originating from your SIEM.

What Are False Positives?

False positives are mislabeled security alerts, indicating there is a threat when in actuality there isn’t. These false/non-malicious alerts (SIEM events) increase noise for already overworked security teams and can be triggered by software bugs, poorly written software, or unrecognized network traffic.

By default, most security teams are conditioned to ignore false positives. Unfortunately, this practice of ignoring security alerts, no matter how trivial they may seem, can create alert fatigue and cause your team to miss actual, important alerts related to real/malicious cyber threats (as was the case with the Target data breach).

These false alarms account for roughly 40% of the alerts cybersecurity teams receive on a daily basis, and at large organizations they can be overwhelming and a huge waste of time.

What Are False Negatives?

False negatives are uncaught cyber threats, overlooked by security tooling because they’re dormant, highly sophisticated (e.g., fileless or capable of lateral movement), or because the security infrastructure in place lacks the technological ability to detect these attacks.

These advanced/hidden cyber threats are capable of evading prevention technologies, like next-gen firewalls, antivirus software, and endpoint detection and response (EDR) platforms trained to look for “known” attacks and malware.

No cybersecurity or data breach prevention technology can block 100% of the threats it encounters. False negatives are the (roughly) 1% of malicious malware and cyber threats that most methods of prevention are prone to miss.

Strengthening Your Cybersecurity Posture

The existence of both false positives and false negatives begs the question: does your cybersecurity strategy include proactive measures? Most security programs rely on preventative and reactive components, establishing strong defenses against the attacks those tools know exist. Proactive security measures, on the other hand, include implementing incident response policies and procedures and proactively hunting for hidden/unknown attacks.

Rules to help govern your approach to cybersecurity with a proactive mindset:

  • Assume you’re breached and begin your offensive (proactive) initiatives with the goal of finding those breaches. By doing so, you’ll seek to validate the strength of your defensive/prevention tools with the understanding that none of them are 100% effective.
  • Use asset discovery tools to discover the hosts, systems, servers, and applications within your network environment, because you can’t protect what you don’t know exists.
  • Execute regular compromise assessments (we recommend at least once a week) and inspect every asset residing on your network.
  • Define security policies and procedures, and implement educational/training requirements so your entire team knows what to do in the event you discover a hidden breach, or worse, fall victim to a data breach.
  • Time is your most valuable asset, so implementing tools/technology to improve your speed of detection and time to respond is key and can help your security team prevent a data breach.

If your team lacks the resources to proactively detect and respond to advanced persistent threats, consider outsourcing your security services to a Managed Detection and Response (MDR) provider. MDR companies independently advise and alert you of immediate threats and provide assistance in responding to and eliminating those threats.

Thank you for reading this article 🥰!!

— Priya Singh
