To win, competitors must understand their data better than anyone else. Besides knowing which type of algorithm to use for modeling, being able to find insights such as hidden correlations among the variables can mean the difference between being a winning contender and being just another entry.
The game starts with the release of a dataset and an attached problem summary. Competitors have anywhere from weeks to months to build their predictive models. Teams can make multiple submissions; each one returns a score that can be compared against the current scoreboard. It's a dash to the best results.
The mix of team scores and submission times reveals a fascinating story. Some teams submit often, each time making small improvements. Other teams get stuck: after a run of improved submissions, their score levels off and no further improvements are made. Brilliant flashes of insight can happen at any time, as shown by huge jumps in score from one submission to the next. Many teams make only one or two submissions, preferring to wait until they have the perfect model before placing their bets.
This week we are recreating the Kaggle leaderboard using Polychart JS for the Leaping Leaderboard Leapfrogs challenge. The old scoreboard is a simple ranking that fails to capture the spirit of the competitions. We are visualizing the struggle to be the best data scientist, the accumulation of thousands of hours of hard work. The contest we are visualizing is the Predict HIV Progression Challenge, where contestants aim to find markers in the HIV sequence that predict a change in the severity of the infection.