The Seven Sins of Machine Learning

Machine learning is a great tool that is revolutionizing our world right now. There are many applications in which machine learning, and deep learning in particular, has been shown to be far superior to traditional methods. From AlexNet for image classification to U-Net for image segmentation, we see great successes in computer vision and medical image processing. Still, I see machine learning methods fail every day. In many of these situations, people fell for one of the seven sins of machine learning.

While all of them are severe and lead to wrong conclusions, some are worse than others, and even machine learning experts may commit such sins in their excitement about their own work. Many of these sins are hard to spot, even for other experts, because you need to look at the code and the experimental setup in detail to uncover them. In particular, if your results seem too good to be true, you may want to use this blog post as a checklist to avoid wrong conclusions about your work. Only if you are absolutely sure that you did not fall for any of these fallacies should you go ahead and report your results to colleagues or the general public.

Sin #1: Data and Model Abuse

Overfitting produces models that perfectly explain training data, but typically do not generalize to new observations. Image under CC BY 4.0 from the Deep Learning Lecture.

This sin is often committed by beginners in deep learning. In its most frequent form, the experimental design is flawed, e.g. the training data is reused as test data. With simple classifiers such as the nearest neighbor, this immediately leads to a 100% recognition rate for most problems. With more sophisticated and deep models, the accuracy may not reach 100%, but 98–99%. Hence, if you obtain such high numbers in your first shot, you should double-check your experimental setup. As soon as you move to new data, your model will break down completely, and it may even produce results that are worse than random guessing, i.e. accuracies below 1/K, where K is the number of classes, e.g. less than 50% in a two-class problem. Along the same lines, you can also easily overfit your model by increasing the number of parameters until it completely memorizes the training data set. Another variant is using a training set that is too small to be representative of your application. All these models are likely to break on new data, i.e. when they are deployed in a real application scenario.
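
The nearest-neighbor effect is easy to reproduce. The following sketch uses only the Python standard library and hypothetical helper names: it trains a 1-nearest-neighbor classifier on pure noise (labels drawn independently of the features) and evaluates it once on its own training data and once on held-out data.

```python
import random


def one_nn_predict(train_x, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def accuracy(train_x, train_y, eval_x, eval_y):
    """Fraction of evaluation samples whose 1-NN prediction matches the label."""
    hits = sum(one_nn_predict(train_x, train_y, x) == y for x, y in zip(eval_x, eval_y))
    return hits / len(eval_y)


rng = random.Random(42)
# Pure noise: the labels are independent of the features, so nothing can beat chance.
features = [[rng.random() for _ in range(5)] for _ in range(200)]
labels = [rng.randint(0, 1) for _ in range(200)]
train_x, train_y = features[:100], labels[:100]
test_x, test_y = features[100:], labels[100:]

# Flawed setup: every training point is its own nearest neighbor (distance 0).
train_acc = accuracy(train_x, train_y, train_x, train_y)
# Correct setup: evaluate on samples the model has never seen.
test_acc = accuracy(train_x, train_y, test_x, test_y)
print(f"training data as test data: {train_acc:.2f}")  # 1.00
print(f"held-out test data:         {test_acc:.2f}")
```

On the held-out half, the accuracy should typically land near 0.5, the chance level 1/K for K = 2, although the very same model scored a perfect 1.00 on its training data.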

Sin #2: The Unfair Comparison

Don’t be unfair in your comparisons. You may get the results you want, but they may not be reproducible on other data. Image by Elias Sch. from Pixabay.

Even experts in machine learning may fall for this sin. It is typically committed when you want to demonstrate that your new method is better than the state of the art. Research papers in particular often succumb to it in order to convince reviewers of the superiority of their method. In the simplest case, you download a model from a public repository and, without fine-tuning or an appropriate hyperparameter search, compare it to a model that was developed specifically for the problem at hand and whose parameters you tweaked to get optimal performance on your test data. There are numerous instances of this sin in the literature. A recent example is exposed by Isensee et al. in their no-new-Net (nnU-Net) paper, in which they demonstrate that the original U-Net outperforms virtually all suggested improvements of the method since 2015 on ten different problems. Hence, you should tune the baseline methods with the same care and the same budget as your own method.
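
A fair protocol gives every method, including the baseline, the same data splits and the same tuning budget, selects hyperparameters on a validation split only, and touches the test data once per method. The sketch below illustrates this on hypothetical 1-D toy data with two stand-in models, a k-NN baseline and a simple threshold classifier; all function names, grids, and data are assumptions for illustration, not a prescribed benchmark.

```python
import random


def knn_fit_predict(train, xs, k):
    """Baseline: k-nearest-neighbor majority vote on 1-D features (k odd)."""
    preds = []
    for x in xs:
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        preds.append(1 if 2 * votes > k else 0)
    return preds


def threshold_fit_predict(train, xs, t):
    """Stand-in 'proposed' model: predicts 1 above threshold t (train is unused)."""
    return [1 if x > t else 0 for x in xs]


def accuracy(preds, data):
    return sum(p == y for p, (_, y) in zip(preds, data)) / len(data)


def tune_then_test(fit_predict, grid, train, val, test):
    """Choose the hyperparameter on the validation split only, then touch test once."""
    val_x = [x for x, _ in val]
    best = max(grid, key=lambda h: accuracy(fit_predict(train, val_x, h), val))
    test_x = [x for x, _ in test]
    return best, accuracy(fit_predict(train, test_x, best), test)


rng = random.Random(1)


def sample(n, noise=0.2):
    """Toy 1-D data: label is (x > 0), flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = int(x > 0)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


train, val, test = sample(200), sample(100), sample(100)

# Fair: identical splits and an identical budget of five settings per model.
results = {
    "baseline (k-NN)": tune_then_test(knn_fit_predict, [1, 5, 15, 31, 51], train, val, test),
    "proposed (threshold)": tune_then_test(
        threshold_fit_predict, [-0.2, -0.1, 0.0, 0.1, 0.2], train, val, test
    ),
}
# Unfair: the baseline run once with an arbitrary default (k = 1) and no tuning.
untuned_baseline = accuracy(knn_fit_predict(train, [x for x, _ in test], 1), test)
for name, (best, acc) in results.items():
    print(f"{name}: best setting {best}, test accuracy {acc:.2f}")
print(f"untuned baseline (k=1): test accuracy {untuned_baseline:.2f}")
```

With 20% label noise, k = 1 memorizes the flipped labels, so the untuned baseline will usually look noticeably worse than the tuned one. That gap is exactly what an unfair comparison would wrongly credit to the proposed method.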

Sin #3: The Insignificant Improvement

Significance testing makes sure that you are not just reporting on a drop in the ocean. Image by FelixMittermeier from Pixabay.

After doing all the experiments, you finally found a model that produces better results than the state-of-the-art models. However, even at this point, you are not done yet. Everything in machine learning is inexact, and your experiments are influenced by many random factors due to the probabilistic nature of the learning process. To take this randomness into account, you need to perform statistical testing. This is typically done by running your experiments multiple times with different random seeds. This way, you can report an average performance and a standard deviation for all of your experiments. Using a significance test, such as the t-test, you can then determine the probability of observing an improvement at least this large if it were merely due to chance. This probability (the p-value) should be below 5%, or even 1%, for you to deem your results significant. To do so, you do not have to be an expert statistician. There are even online tools to compute these tests, e.g. for recognition rate comparison or correlation comparison. If you run many experiments on the same data, make sure that you also apply the Bonferroni correction, i.e. you divide the required significance level by the number of tests performed on that data. For more details on statistical testing, you should check this video of our Deep Learning Lecture.
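
The text names the t-test; the sketch below instead swaps in an exact permutation test, a distribution-free alternative that needs only the Python standard library, combined with a Bonferroni-corrected significance level. The per-seed accuracies are hypothetical, and "three comparisons" is an assumed example of how many tests were run on the same data. If SciPy is available, `scipy.stats.ttest_ind` is the usual way to run the t-test itself.

```python
import itertools
import statistics


def permutation_test(scores_a, scores_b):
    """Exact two-sided permutation test on the difference of mean scores.

    Returns the fraction of all relabelings of the pooled scores whose
    absolute mean difference is at least as large as the observed one.
    """
    observed = abs(statistics.mean(scores_a) - statistics.mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = total = 0
    for idx in itertools.combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(statistics.mean(group_a) - statistics.mean(group_b))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total


def bonferroni_alpha(alpha, num_tests):
    """Significance level that each individual test must meet."""
    return alpha / num_tests


# Hypothetical test accuracies of five runs with different random seeds.
baseline = [0.912, 0.905, 0.918, 0.909, 0.914]
proposed = [0.922, 0.919, 0.927, 0.920, 0.925]

print(f"baseline: {statistics.mean(baseline):.4f} +/- {statistics.stdev(baseline):.4f}")
print(f"proposed: {statistics.mean(proposed):.4f} +/- {statistics.stdev(proposed):.4f}")

p_value = permutation_test(baseline, proposed)
alpha = bonferroni_alpha(0.05, num_tests=3)  # e.g. three comparisons on the same data
print(f"p = {p_value:.4f}, corrected alpha = {alpha:.4f}, significant: {p_value < alpha}")
# p = 0.0079, corrected alpha = 0.0167, significant: True
```

Note that with five runs per model, this exact test cannot produce a p-value below 2/252 ≈ 0.008, so stricter corrected levels require more runs.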

Sin #4: Confounders and Bad Data

51 speakers recorded with two different microphones, after dimensionality reduction. Each dot represents one recording. The dominant factor of variation in this data is the difference between the microphones. Image under CC BY 4.0 from the Deep Learning Lecture.

Data quality is one of the greatest pitfalls of machine learning. It may induce critical biases and even result in racist AI. The problem, however, does not lie in the training algorithm but in the data itself. As an example, the figure above shows dimensionality-reduced recordings of 51 speakers made with two different microphones. Because we recorded the same speakers, the two recordings of each speaker should actually be projected onto the same spot given appropriate feature extraction. However, we can observe that the recordings form two separate clusters, one per microphone. In fact, one microphone was located directly at the mouth of the speaker, and the other was located approximately 2.5 meters away on a video camera recording the scene. Similar effects can already be created by using two microphones from two different vendors or, in the context of medical imaging, by using two different scanners. If you now recorded all pathologic patients on scanner A and all control subjects on scanner B, your machine learning method would likely learn to differentiate the scanners instead of the actual pathology. You would be very pleased with the experimental results, yielding a close-to-perfect recognition rate. Your model, however, would fail completely in practice. Hence, please check your data and your acquisition setup for such confounders, e.g. by recording patients and controls with the same devices.
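
Before training, it is worth checking how strongly the class labels are tied to the acquisition setup. The sketch below assumes that you logged the device (scanner or microphone) for every sample and computes the phi coefficient, the correlation between a binary label and a binary device; the function name and the toy metadata are hypothetical.

```python
import math


def phi_coefficient(labels, devices):
    """Absolute phi coefficient between a binary label and a binary acquisition device.

    0.0 means the device tells you nothing about the class; 1.0 means the
    device alone perfectly predicts the class (complete confounding).
    """
    classes, sites = sorted(set(labels)), sorted(set(devices))
    if len(classes) != 2 or len(sites) != 2:
        raise ValueError("this sketch handles two classes and two devices only")
    counts = {(c, s): 0 for c in classes for s in sites}
    for c, s in zip(labels, devices):
        counts[(c, s)] += 1
    (c0, c1), (s0, s1) = classes, sites
    numerator = counts[(c1, s1)] * counts[(c0, s0)] - counts[(c1, s0)] * counts[(c0, s1)]
    row0 = counts[(c0, s0)] + counts[(c0, s1)]
    row1 = counts[(c1, s0)] + counts[(c1, s1)]
    col0 = counts[(c0, s0)] + counts[(c1, s0)]
    col1 = counts[(c0, s1)] + counts[(c1, s1)]
    return abs(numerator) / math.sqrt(row0 * row1 * col0 * col1)


labels = ["patient"] * 4 + ["control"] * 4
confounded = ["scanner A"] * 4 + ["scanner B"] * 4  # patients on A, controls on B
balanced = ["scanner A", "scanner B"] * 4           # both groups on both scanners

print(phi_coefficient(labels, confounded))  # 1.0
print(phi_coefficient(labels, balanced))    # 0.0
```

A value close to 1 means that a classifier can reach a near-perfect recognition rate by detecting the device alone, which is exactly the scanner A/scanner B trap described above.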

Sin #5: Inappropriate Labels

One label for each training instance is often not enough to understand the complexity of the problem. Some instances receive many different labels when shown to multiple raters (blue curve), while others receive the same label from all raters (red curve). Image by Markéta Machová from Pixabay.

Protagoras already knew: “Of all things the measure is man.” This also applies to the labels or ground truth of many classification problems. We train machine learning models to reflect man-made categories. In many problems, we think the classes are clear at the moment we define them. As soon as we look into the data, however, we see that it often also contains ambiguous cases, e.g. an image showing two objects instead of one in the ImageNet Challenge. It gets even more difficult for complex phenomena such as emotion recognition. Here, we realize that in many real-life observations, emotions cannot be assessed clearly even by humans. To get correct labels, we hence need to ask multiple raters and obtain a label distribution. We depicted this in the figure above: the red curve shows a sharply peaked distribution of a clear case, a so-called prototype. The blue curve shows a broad distribution of an ambiguous case. Here, not only the machine but also human raters are likely to end up with conflicting interpretations. If you used only one rater to create your ground truth, you will not even be aware of the problem, which then typically gives rise to discussions on label noise and how to deal with it effectively. If you have access to the true label distributions (which are, of course, expensive to obtain), you can even demonstrate that removing ambiguous cases dramatically increases your system’s performance, as we have seen, for example, in emotion recognition on acted versus real-life emotions. This gain, however, may not carry over to your real application, in which ambiguous cases do occur although your system has never seen one. Hence, you should use multiple raters and keep the ambiguous cases in your data.
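
With several ratings per sample, the label distribution and its spread can be computed directly. The sketch below uses hypothetical emotion classes and five hypothetical raters per recording; it reports the normalized entropy of each distribution, which is 0 for a prototype on which all raters agree and approaches 1 as the votes spread evenly over all classes.

```python
import math
from collections import Counter


def label_distribution(ratings, classes):
    """Turn the labels of several raters into a probability distribution over classes."""
    counts = Counter(ratings)
    return {c: counts[c] / len(ratings) for c in classes}


def normalized_entropy(dist):
    """Entropy divided by its maximum log(K): 0 = unanimous, 1 = uniform over all K classes."""
    h = sum(p * math.log(1.0 / p) for p in dist.values() if p > 0)
    return h / math.log(len(dist))


classes = ["angry", "happy", "neutral", "sad"]
# Hypothetical ratings of two recordings by five raters each.
prototype = ["angry"] * 5
ambiguous = ["angry", "neutral", "sad", "neutral", "happy"]

for name, ratings in [("prototype", prototype), ("ambiguous", ambiguous)]:
    dist = label_distribution(ratings, classes)
    print(name, dist, f"normalized entropy = {normalized_entropy(dist):.2f}")
```

Such distributions can also be kept as soft training targets instead of being collapsed into a single hard label, so that ambiguity remains visible to the model and to you.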

Sin #6: Cross-validation Chaos

Don’t use the same data to select your model and features that you are also using for evaluation. Image by mcmurryjulie from Pixabay.

This is almost the same sin as Sin #1, but it comes in disguise, and I have seen it happen even in Ph.D. theses that were about to be submitted. So even experts can fall for this one. The typical setting is that you perform model, architecture, or feature selection in a first step. Because you have only a few data samples, you decide to use cross-validation to evaluate each step. So you split the data into N folds, select the features/model on N-1 folds, and evaluate on the N-th fold. After repeating this N times, you compute the average performance and pick the features with the best performance. Now that you know which features are best, you go ahead and select the best parameters for your machine learning model, again using cross-validation.

This seems correct, right? No! It is flawed, because in the first step you already saw all the test data and averaged over all observations. As such, information from all the data is conveyed to the next step, and you can even get excellent results from completely random data. To avoid this, you need to follow a nested procedure that nests the first step inside the second cross-validation loop. Of course, this is very costly and produces a lot of experimental runs. Note that, because you conduct such a large number of experiments on the same data, you are also likely to obtain a good result purely by chance. As such, statistical testing and the Bonferroni correction are again mandatory (cf. Sin #3). I would generally try to collect enough data to be able to work with a simple train/validation/test split.
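
The claim that this leak produces excellent results on random data can be checked with a small experiment. The sketch below, standard library only, generates pure noise (40 samples, 1,000 random features, labels independent of the features), selects 10 features by class-mean difference, and evaluates a nearest-centroid classifier with 5-fold cross-validation: once with the selection done on all data beforehand, and once with the selection repeated inside every training fold. All names, sizes, and the choice of classifier are illustrative assumptions. Moving the selection inside the folds is the core of the nested procedure; a full nested cross-validation would additionally tune hyperparameters, such as the number of features, in an inner loop.

```python
import random
import statistics


def make_noise_data(n_samples=40, n_features=1000, seed=0):
    """Pure noise: features and labels are independent by construction."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced binary labels
    rng.shuffle(y)
    return X, y


def select_features(X, y, rows, k):
    """Keep the k features with the largest class-mean difference, using only `rows`."""
    def score(j):
        pos = [X[i][j] for i in rows if y[i] == 1]
        neg = [X[i][j] for i in rows if y[i] == 0]
        return abs(statistics.fmean(pos) - statistics.fmean(neg))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def nearest_centroid_accuracy(X, y, train, test, feats):
    """Fit class centroids on `train` (selected features only) and score on `test`."""
    centroids = {
        c: [statistics.fmean(X[i][j] for i in train if y[i] == c) for j in feats]
        for c in (0, 1)
    }
    correct = 0
    for i in test:
        dists = {
            c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, centroids[c]))
            for c in (0, 1)
        }
        correct += min(dists, key=dists.get) == y[i]
    return correct / len(test)


def k_folds(n, k):
    """Interleaved fold assignment (the labels are already shuffled)."""
    return [list(range(f, n, k)) for f in range(k)]


def cross_validate(X, y, n_feats=10, n_folds=5, leaky=False):
    all_rows = list(range(len(X)))
    # Flawed: features chosen once on ALL data, so every test fold influenced the choice.
    global_feats = select_features(X, y, all_rows, n_feats) if leaky else None
    accs = []
    for test in k_folds(len(X), n_folds):
        held_out = set(test)
        train = [i for i in all_rows if i not in held_out]
        # Correct: repeat the selection inside each training fold only.
        feats = global_feats if leaky else select_features(X, y, train, n_feats)
        accs.append(nearest_centroid_accuracy(X, y, train, test, feats))
    return statistics.fmean(accs)


X, y = make_noise_data()
print(f"selection outside CV (leaky): {cross_validate(X, y, leaky=True):.2f}")
print(f"selection inside each fold:   {cross_validate(X, y, leaky=False):.2f}")
```

On data like this, the leaky variant typically reports accuracies far above 50% although there is nothing to learn, while the nested variant stays near chance level.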

Sin #7: Overinterpretation of Results

Let others celebrate your work, don’t do it yourself. Photo by Rakicevic Nenad from Pexels.

Aside from all the previous sins, I think the greatest sin that we often commit in machine learning, right now in the current hype phase, is that we overinterpret and overstate our own results. Of course, everybody is happy with the successful solutions created with machine learning, and you have every right to be proud of them. However, you should not extrapolate beyond your results, e.g. by claiming that your method solves a general problem just because you have tackled two different problems with the same method.

Also, claims of super-human performance raise doubts because of the observations we made in Sin #5. How would you outperform the source of your labels? Of course, you can beat one human with respect to fatigue and concentration, but outperform humanity in general on man-made classes? You want to be careful with this claim.

You can speculate on the universal applicability of your method, but to actually claim it, you have to provide either experimental or theoretical evidence. Right now, it is hard to get your method the visibility that you think it deserves, and stating big claims will of course help to popularize it. Still, I recommend staying on the ground and sticking to the evidence. Otherwise, we might very quickly end up in the next AI winter and the general suspicion of artificial intelligence that we already had in previous years. Let’s avoid this in the current cycle and stick to what we are really able to demonstrate.

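Of course, most of you already knew these pitfalls. However, you may want to have a look at the seven sins of machine learning every now and then, just to make sure that you are still on the ground and have not fallen for them:

Sin #1: Data and Model Abuse
Sin #2: The Unfair Comparison
Sin #3: The Insignificant Improvement
Sin #4: Confounders and Bad Data
Sin #5: Inappropriate Labels
Sin #6: Cross-validation Chaos
Sin #7: Overinterpretation of Results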

If you liked this essay, you can find more essays here, more educational material on Machine Learning here, or have a look at my Deep Learning Lecture. I would also appreciate a clap or a follow on YouTube, Twitter, Facebook, or LinkedIn in case you want to be informed about more essays, videos, and research in the future. This article is released under the Creative Commons 4.0 Attribution License and can be reprinted and modified if referenced.

Original post: https://towardsdatascience.com/the-seven-sins-of-machine-learning-54dbf63fd71d
