Job Post Scams

Building a Logistic Regression Model to Identify Scam Job Posts

Many job listings these days are ghost listings, spam, or scammers trying to get your information. I analyzed a dataset of 500 job applications, each labeled as spam or not spam, and built a logistic regression model to predict whether a given job application is a genuine opportunity or just a scam. The dataset used in this analysis was synthetically generated.

Of the 500 listings, 85 (17%) were spam or scams. Over 80% of listings that responded to our application by text were found to be spam. Every listing that responded by phone call was legitimate. About 67% of interview offers that requested a chatbox meeting turned out to be spam.

Of the 85 spam cases, 51.8% responded via email and 48.2% responded via text.

Among the spam listings, 49.4% did not offer an interview, 7.1% offered an in-person interview, 10.6% offered a video call, and 32.9% offered a chatbox meeting.
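
As a quick sanity check on the two breakdowns above, the quoted percentages can be converted back into whole listing counts. This is only a sketch: the counts it prints are derived from the rounded percentages, not taken from the dataset, and the helper name `shares_to_counts` is made up for illustration.

```python
# Total number of spam listings reported in the post.
SPAM_TOTAL = 85

# Percentage shares quoted in the post.
response_share = {"email": 51.8, "text": 48.2}
interview_share = {
    "none": 49.4,
    "in-person": 7.1,
    "video call": 10.6,
    "chatbox": 32.9,
}


def shares_to_counts(shares, total):
    """Convert percentage shares into the nearest whole-number counts."""
    return {label: round(total * pct / 100) for label, pct in shares.items()}


# Derived counts (an inference from rounded percentages, not source data).
response_counts = shares_to_counts(response_share, SPAM_TOTAL)
interview_counts = shares_to_counts(interview_share, SPAM_TOTAL)

print(response_counts)
print(interview_counts)
```

If the percentages are internally consistent, each set of counts should add back up to 85.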

After training the model and running it on the test data, we found that it correctly classified 90% of the test listings (accuracy). When it labeled a listing as non-spam, it was right 95% of the time (non-spam precision); when it labeled a listing as spam, it was right 68% of the time (spam precision). It missed 24% of the actual spam cases, which corresponds to a spam recall of 76%.
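
To make these four numbers concrete, here is a minimal sketch of how they fall out of a confusion matrix. The post does not give the test-set size or the raw counts, so the counts below are an assumption: a hypothetical 100-listing test set chosen only because it is consistent with the reported percentages.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the metrics discussed above from confusion-matrix counts.

    tp: spam correctly flagged as spam
    fp: non-spam wrongly flagged as spam
    fn: spam wrongly passed as non-spam
    tn: non-spam correctly passed as non-spam
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "spam_precision": tp / (tp + fp),
        "spam_recall": tp / (tp + fn),
        "non_spam_precision": tn / (tn + fn),
    }


# ASSUMED counts for a hypothetical 100-listing test set.
# They are not from the notebook; they merely match the reported figures.
metrics = classification_metrics(tp=13, fp=6, fn=4, tn=77)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts, the metrics round to 90% accuracy, 68% spam precision, 76% spam recall, and 95% non-spam precision. The 24% of spam that was missed is one minus the recall.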

We then took a look at the parameter coefficients to see the impact each had on the model.


Response_type, Scam_keywords, and interview_type were the strongest predictors in the model. Email length and job duration had no impact. On further analysis, we found that the means and standard deviations of these two parameters were similar for spam and non-spam cases, which explains why the model could not use them to separate the classes.
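
The per-class comparison described above can be sketched with the standard library alone. The rows here are made-up illustrative values, not the post's data, and the `(email_length, is_spam)` layout is an assumption about how the feature might be stored.

```python
from statistics import mean, stdev

# Made-up illustrative rows (NOT the post's data): (email_length, is_spam)
rows = [
    (120, 0), (95, 0), (150, 0), (110, 0),
    (118, 1), (101, 1), (145, 1), (108, 1),
]


def summarize_by_class(rows):
    """Return {label: (mean, sample standard deviation)} for one feature."""
    groups = {}
    for value, label in rows:
        groups.setdefault(label, []).append(value)
    return {
        label: (mean(values), stdev(values))
        for label, values in groups.items()
    }


summary = summarize_by_class(rows)
for label, (avg, spread) in sorted(summary.items()):
    print(f"is_spam={label}: mean={avg:.1f}, std={spread:.1f}")
```

When the two classes show nearly the same mean and spread for a feature, as in this toy example, a logistic regression has little signal to work with and tends to assign that feature a coefficient near zero.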

Predicting New Job Data

Next, we gave the model a new listing to see whether it would correctly detect spam. The listing included scam keywords, had been posted for 60 days, had a response time of 2 hours, a response type of 'text', and a 'chatbox' interview offer.

**The model predicted it to be a SCAM with a 99% probability.**
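
For readers curious how a logistic regression turns a listing like this into a probability, here is a rough sketch. The coefficients, intercept, and one-hot feature names below are hypothetical placeholders and not the fitted values from the notebook, so the printed probability will not match the 99% reported above.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_scam_probability(features, coefficients, intercept):
    """Weighted sum of the features plus intercept, passed through the sigmoid."""
    z = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return sigmoid(z)


# HYPOTHETICAL coefficients, for illustration only (not the fitted model).
coefficients = {
    "scam_keywords": 2.0,
    "response_type_text": 2.5,
    "interview_type_chatbox": 2.0,
    "job_duration_days": 0.0,     # the post reports no impact for job duration
    "response_time_hours": -0.1,  # assumed small effect
}
intercept = -3.0

# The new listing from the post, encoded under an assumed one-hot scheme.
listing = {
    "scam_keywords": 1,
    "response_type_text": 1,
    "interview_type_chatbox": 1,
    "job_duration_days": 60,
    "response_time_hours": 2,
}

probability = predict_scam_probability(listing, coefficients, intercept)
print(f"Scam probability: {probability:.1%}")
```

The point of the sketch is the mechanism: several strong positive coefficients stacking up push the score well above zero, which the sigmoid maps to a probability close to 1.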

For more detail on this model, check out the Jupyter notebook below.

Logistic Regression on Job Listings