The team obtained the answers from the test set by cheating, and in order to make the scores look more realistic, they only used part of the answers.
Editor’s note: This article comes from the WeChat public account “Heart of the Machine” (ID: almosthuman2014).
Contributors: Zhang Qian, Egg Sauce
A discussion page for a Kaggle competition revealed that the winning team, “Bestpetting”, had been disqualified by Kaggle for cheating. One of the team members was a Grandmaster, who has been permanently banned because there was evidence that he played a key role in the cheating.
Many companies and government research institutions post their data on Kaggle, open it to contestants around the world, and ask them to help build models. To make the competitions more attractive, they also offer prize money to the teams whose models are most accurate; in some competitions the prizes reach millions of dollars.
As a result, Kaggle keeps attracting thousands of developers to its competitions, and many data scientists invest a great deal of time and energy there. Competitions on tasks such as airport security and satellite data analysis draw strong teams with decades of experience.
For machine learning enthusiasts, Kaggle is something of a beacon.
To encourage contestants to keep taking on new challenges, Kaggle has a ranking system that divides participants into five tiers: “Novice”, “Contributor”, “Expert”, “Master”, and “Grandmaster”.
“Novice” and “Contributor” are the lowest tiers. You become a Novice simply by registering, and you can move up to Contributor by completing your profile, exploring Kaggle, and engaging with the community. From “Expert” upward, however, contestants have to earn their rank with results. To become a competitions Grandmaster, you need at least five gold medals.
Cheating that betrayed the competition’s charitable purpose
The cheating team was competing in a challenge to predict how quickly pets are adopted. Studies have shown that adoption speed correlates with information such as a pet’s photos and online description. The participants’ task was to find this relationship, so that pet adoption agencies could optimize pets’ online profiles to make them look more “cute”, speed up adoption, and reduce the number of animals that are euthanized.
The competition ran for three months, with a total prize pool of $25,000; the winning team received $10,000. According to Kaggle, the team cheated in the following ways:
1. They illegitimately obtained the adoption-speed answers for the private test data (possibly by scraping them from a website);
2. They encoded and obfuscated the data and answers, then hid them in an ID field disguised as part of an external dataset named “cute-cats-and-dogs-from-pixabaycom”;
3. During data processing, their code decoded the hidden ID field so that the answers could be retrieved at prediction time;
4. They used only some of the encoded answers, so that their scores would look more “realistic”;
5. The code that did this was carefully hidden and obfuscated under many layers of nested functions, and was deliberately written to be hard to read and inconspicuous.
Kaggle user Benjamin Minixhofer was the first to notice that something was wrong. While trying to turn several of the competition’s top solutions into a production system, he found that the winning solution did not add up, so he reported the violation to the Kaggle organizers, who immediately launched an investigation.
Afterwards, the whistleblower wrote a detailed post explaining the team’s cheating: https://www.kaggle.com/c/quora-insincere-questions-classification/discussion/80665
He wrote: “This incident undermines the fairness of Kaggle competitions. I spent a long time trying to turn their solution into a production system, only to find out that it was cheating. Kaggle may not want me to publish this post, since it exposes a lot of private test data, but I hope participants can learn something from it.”
Former Kaggle Grandmaster banned for life
After confirming the cheating, the Kaggle organizers disqualified the winning team and have since revised the leaderboard. However, the $10,000 prize awarded to the team at the time had already been paid out.