Got insanely good metric scores for your classification or regression model? Chances are you have data leakage.

In this post, you will learn:

- What data leakage is
- How to detect it
- How to avoid it

You were presented with a challenging problem. As a driven, gritty, aspiring data scientist, you used every tool within your reach.

You gathered a reasonable amount of data and had a considerable number of features. You were even able to come up with many additional features through feature engineering.

You used the fanciest machine learning model available. You made sure your model didn't overfit. You properly split your dataset into training and test sets. You even used K-fold cross-validation.

You had been racking your brain for some time, and it seemed that you finally had that "aha" moment.

Chances are data leakage got the better of you.

You were able to get an impressive 99% AUC (Area Under the ROC Curve) score for your classification problem. Your model delivered outstanding results on your test set, labeling almost every example correctly: plenty of True Positives and True Negatives, hardly any False Positives or False Negatives.
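To see how a model can post a near-perfect AUC while learning nothing, here is a minimal, stdlib-only Python sketch of one common way leakage slips past even K-fold cross-validation: selecting features on the full dataset before splitting it. The data is pure noise, so an honest estimate should hover around an AUC of 0.5. All function names, the simple centroid-difference scorer, and the sizes (100 rows, 1,000 features, top 20 kept, 5 folds) are illustrative assumptions, not anything prescribed by the post or a library.

```python
import random


def auc(labels, scores):
    """ROC AUC as the Mann-Whitney probability P(score_pos > score_neg); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_diffs(X, y, rows):
    """Per-feature difference between positive and negative class means, using only `rows`."""
    pos = [r for r in rows if y[r] == 1]
    neg = [r for r in rows if y[r] == 0]
    return [
        sum(X[r][j] for r in pos) / len(pos) - sum(X[r][j] for r in neg) / len(neg)
        for j in range(len(X[0]))
    ]


def top_k(diffs, k):
    """Indices of the k features whose class means differ the most."""
    return sorted(range(len(diffs)), key=lambda j: abs(diffs[j]), reverse=True)[:k]


def cross_validated_auc(X, y, k_features, n_folds, leaky, seed=0):
    """Pooled out-of-fold AUC of a simple centroid-difference scorer (hypothetical helper)."""
    rows = list(range(len(X)))
    random.Random(seed).shuffle(rows)
    folds = [rows[i::n_folds] for i in range(n_folds)]

    # LEAKY: features are ranked once on ALL rows, including every future test row.
    global_choice = top_k(mean_diffs(X, y, rows), k_features) if leaky else None

    scores = [0.0] * len(X)
    for test in folds:
        test_set = set(test)
        train = [r for r in rows if r not in test_set]
        diffs = mean_diffs(X, y, train)  # the scorer itself is fitted on training rows only
        # Leaky: reuse the global choice. Honest: redo feature selection inside this fold.
        chosen = global_choice if leaky else top_k(diffs, k_features)
        for r in test:
            # Project the test row onto the difference between the class centroids.
            scores[r] = sum(X[r][j] * diffs[j] for j in chosen)
    return auc(y, scores)


def make_noise_data(n_samples=100, n_features=1000, seed=42):
    """Random Gaussian features and balanced labels that are unrelated to them."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


if __name__ == "__main__":
    X, y = make_noise_data()
    print("leaky  AUC:", round(cross_validated_auc(X, y, 20, 5, leaky=True), 3))
    print("honest AUC:", round(cross_validated_auc(X, y, 20, 5, leaky=False), 3))
```

Because the leaky variant chooses its features after peeking at the test rows, its out-of-fold AUC typically comes out far above 0.5 on data that contains no signal at all, while the honest variant stays near chance. The general remedy is to fit every data-dependent step (feature selection, scaling, imputation, target encoding) inside each training fold only; in scikit-learn, wrapping those steps in a `Pipeline` and passing it to `cross_val_score` does exactly that.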



