Abstract
This work is motivated by research on user behavior analysis for authentication. We take a behavioral approach to distinguishing malicious users from legitimate ones. In this paper, we explain how we applied big data analytics to application-layer logs and predicted malicious users with a machine learning algorithm, based on metrics described later in the paper. The algorithm produces a list of IP addresses or user identification tokens (UITs), deduced from live data, that are performing or are suspected of performing malicious activity based on their browsing behavior. For this purpose, we created an e-commerce web application and intentionally introduced vulnerabilities into it. We hosted our setup on a LAMP [1] stack running on the AWS cloud [2]. This method has considerable potential, since any organization can apply it to monitor probable attackers and thereby focus its efforts on safeguarding its infrastructure. The idea rests on the observation that the browsing pattern and the access pattern of a genuine user differ widely from those of a hacker. These patterns are used to sort the incoming traffic and to list the IP addresses and UITs that are the most probable sources of hacking attempts.
1. Introduction
Applying big data analytics and machine learning to data obtained from application-layer logs yields a list of probable candidates for malicious attempts. Plenty of work has been done in the fields of cyber security and data analytics; in this paper, we propose a new approach to predicting a list of probable hackers. This approach combines big data analytics with machine learning.
Abramson and Aha [3] proposed identifying users based on their web browsing behavior. Their approach not only identifies users but also differentiates between them on the basis of that behavior. Shi et al. [4] introduced implicit authentication, in which users are authenticated based on their behavior patterns. Al-Khazzar and Savage [5] proposed performing user authentication with information collected from a user's behavior in reaction to a 3D graphical maze. The identification rests on the idea that each user reacts to the maze in a unique way.