Commit a0585b6c90804f77900c70efad80809001a1e9f0

Authored by Olivier
1 parent 88459a0a86
Exists in master

Results updated: section Hypothesis 4 (challenges) completed. Section What makes a good player updated. Conclusion updated. Acknowledgments updated

Showing 3 changed files with 65 additions and 41 deletions

CHIpaper/Figs/totalXP_session.pdf

CHIpaper/MarketPaper.pdf

CHIpaper/MarketPaper.tex
@@ -403,8 +403,8 @@
403 403 \includegraphics[width=\halfWidth]{Figs/averageSeqLength.pdf}
404 404 \vspace{0cm}
405 405 \caption{Average sequence length for every game session, both without and with the super circles taken into account (e.g., a super circle
406   - of level 2 in a sequence represents 10 circles in the solution). A, A-2 and A-3 represent the tests with all the features on; NS, NS-2 and NS-3 represent the
407   - tests without the skills; NM, NM-2 and NM-3 represent the tests without the market; NC, NC-2 and NC-3 represent the tests without the challenges.
  406 + of level 2 in a sequence represents 10 circles in the solution). `A', `A-2' and `A-3' represent the tests with all the features on; `NS', `NS-2' and `NS-3' represent the
  407 + tests without the skills; `NM', `NM-2' and `NM-3' represent the tests without the market; `NC', `NC-2' and `NC-3' represent the tests without the challenges.
408 408 }\label{fig_averageSeqLength}
409 409 \end{center}
410 410 \end{figure}
411 411  
@@ -655,13 +655,13 @@
655 655 complete this type of challenge without changing anything in their normal behavior. This challenge was simply too easy, because most of the players
656 656 routinely sell or buy (through the bids) at least 2 or 4 circles every five minutes (the length of a challenge).
657 657  
658   -\subsubsection{Buyout}
  658 +\subsubsection{Buyout challenge}
659 659  
660 660 The {\em Buyout challenge} appeared only once in total across the three game sessions with both challenges and the market. Thus, we do not have enough
661 661 data to analyze the effect of this challenge. This challenge almost never appeared because players were constantly using the
662 662 buyout, which greatly reduced the probability of the challenge being shown.
663 663  
664   -\subsubsection{Specific colors in common}
  664 +\subsubsection{Specific colors in common challenge}
665 665  
666 666 The {\em Specific colors in common challenge} is also difficult to analyze because it was completed only 8 times in total during the nine sessions with challenges, despite
667 667 appearing 11 times throughout those nine experiments.
668 668  
@@ -669,58 +669,66 @@
669 669 doing actions that are not specific to a certain subset of colors. Even though the market should help players find circles with the required
670 670 subset of colors, it seems very likely that players found this type of challenge too hard and almost never tried to complete it.
671 671  
672   -\subsection{Testing hypothesis 4: relationship between total experience and percentage solved}
673   -%Coming back on the 4 tests, total game xp vs percentage of problem solved
674   -As mentioned in the Experiments section, the initial plan was to measure the impact of each feature by analyzing how much of the problem can be solved
675   -by the players in each of the game sessions. Interestingly, we observed a larger than expected variance in the participants' skills which made it practically
676   -impossible to compare one game session with an other. Indeed, some players quickly understood all the rules of the game and how to maximize their score,
677   -while others struggled to make points during the whole session, even with the help of the authors who were monitoring the session.
  672 +\subsection{Testing hypothesis 4: percentage of the problem solved as a measure of the importance of different game features}
  673 + One of the research goals was to measure the impact of each feature by analyzing how much of the problem could be solved
  674 + by the players in each game session. Our initial hypothesis was that players with access to all the game features would
  675 + be able to solve more of the problem.
678 676  
  677 + Interestingly, we observed a larger-than-expected variance in the participants' skills, which sometimes made it
  678 + difficult to compare one game session with another in terms of the percentage of the problem that was solved.
  679 + Indeed, some players quickly understood all the rules of the game and how to maximize their score,
  680 + while others struggled to score points during the whole session, even with our help.
  681 +
679 682 \begin{figure}[htbp]
680 683 \begin{center}
681 684 \includegraphics[width=\halfWidth]{Figs/totalXP_session.pdf}
682 685 \vspace{0cm}
683   - \caption{Total game experience and percentage of the problem solved for each of the 5 game sessions. 'XP' represents experience points. 'All' and 'All (2)'
684   - represent the two tests with all the features on, 'No skills' represents the test without the skills, 'No market' represents the test without the
685   - market and 'No chal.' represents the test without the challenges.
  686 + \caption{Total game experience and percentage of the problem solved for each of the 12 game sessions. `XP' represents experience points.
  687 + `A', `A-2' and `A-3' represent the tests with all the features on; `NS', `NS-2' and `NS-3' represent the
  688 + tests without the skills; `NM', `NM-2' and `NM-3' represent the tests without the market; `NC', `NC-2' and `NC-3' represent the tests without the challenges.
686 689 }\label{fig_totalXP}
687 690 \end{center}
688 691 \end{figure}
689 692  
690   -As shown in Figure~\ref{fig_totalXP}, the percentage of the problem that was solved is nearly identical for all the tests (around $60\%$) except for the
691   -first test with all the features and the test with no challenges, in which the players in general performed worse (as indicated by the total experience
692   -points for those game sessions). In particular, the comparison of the first game session with all the features with the second one ('All' and 'All (2)')
693   -demonstrates that we cannot simply use the percentage of the solution found as a way to measure the impact of a feature. Even with the exact same game
694   -conditions, there is a big difference in the total experience and percentage of solutions found.
  693 + As shown in Figure~\ref{fig_totalXP}, the percentage of the problem that was solved ranges from 48\% to 75\% across all the experiments.
  694 + In particular, the differences observed between experiments with exactly the same game conditions (up to 18\% in some cases)
  695 + demonstrate that we cannot simply use the percentage of the exact solution found as a way to measure the impact of a feature.
  696 + Moreover, the top five sessions in terms of percentage solved (all with more than 65\%) span all four game conditions.
695 697  
696   -Notice that a game session with many good players combining for a high total of experience points does not guarantee that
697   -a bigger percentage of the solution will be found by the players. This is due to the fact that, in the current state of the game, players can be selling
  698 + We used linear regression to test whether the percentage of the problem solved is, to some extent, linearly related to the total experience points accumulated
  699 + by all the players during a session (graph not shown). The fitted line had a correlation coefficient of $r = 0.89$ and a coefficient of determination
  700 + $r^2=0.79$, which indicates a strong positive correlation. The different game conditions presumably account for part of the remaining variance.
  701 +%Notice that a game session with many good players combining for a high total of experience points does not guarantee that
  702 +%a bigger percentage of the solution will be found by the players.
  703 + Another source of variance is that, in the current version of the game, players can sell
698 704 sequences that correspond to a solution that was already found. While it would be possible to lower the score of a solution (sequence) that already
699 705 exists, it would be hard to explain to inexperienced players why one sequence is worth less than another with exactly the same length and number of colors
700 706 in common. That is why we decided not to take already found solutions into account in the scoring function.
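The regression described in this hunk can be written out explicitly. The following is a sketch, assuming ordinary least squares with an intercept fitted over the 12 sessions (the fitting procedure is not stated in the text); the symbols $x_i$ (total experience points of session $i$), $y_i$ (percentage of the problem solved in session $i$), $\hat{y}_i$, $\beta_0$ and $\beta_1$ are illustrative notation, not the paper's. With a single predictor and an intercept, the coefficient of determination equals the square of Pearson's $r$, which matches the reported values.

```latex
% Sketch only: x_i, y_i, \hat{y}_i, \beta_0, \beta_1 are illustrative symbols.
% x_i = total XP of session i, y_i = percentage solved in session i, n = 12 sessions.
\[
  \hat{y}_i = \beta_0 + \beta_1 x_i, \qquad
  \beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                 {\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
  \beta_0 = \bar{y} - \beta_1 \bar{x}
\]
% Pearson correlation and coefficient of determination of the fitted line:
\[
  r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
           {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
            \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
  \qquad
  r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
                 {\sum_{i=1}^{n} (y_i - \bar{y})^2}
      = 0.89^2 \approx 0.79
\]
```

Strict proportionality, as opposed to a linear association, would additionally require $\beta_0 = 0$; a high $r$ alone does not establish it.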
701 707  
702   -In the following sections, we show the impact of each feature based on different metrics.
  708 +%In the following sections, we show the impact of each feature based on different metrics.
703 709  
704 710 \subsection{Understanding what makes a good player}
705 711  
706 712 Based on the questionnaire filled out by the players before playing the game, and on the global leaderboard combining all the players from all the sessions,
707   -we tried to find similarities between the top players. Table~\ref{tab_playerStats} shows the most interesting differences between the top six players
  713 +we tried to find similarities between the top players. Table~\ref{tab_playerStats} shows the most interesting differences between the top 12 players
708 714 and the rest of the players. In the questionnaires, players had to indicate their age category (between 21 and 25 for example), their own evaluation
709   -of their puzzle solving abilities and a range of hours of time spent playing video games every week. The mean age of the two groups of players
710   -was calculated by taking the middle point of the age categories. The average age of the top 6 players was about 5 years younger than the one of
  715 + of their puzzle-solving abilities, and the range of hours they spend playing video games every week.
  716 +
  717 + The average age of each group of players
  718 + was calculated using the midpoint of each age category. On average, the top 12 players were about $2.5$ years younger than
711 719 the other players. For the puzzle-solving self-evaluation, players could choose a level between 1 and 5 (5 being the strongest). The average
712   -level of the top 6 players was 3.83, compared to 2.81 for the others. As with the age categories, we computed averages of time spent playing
713   -video games every week using the middle point of the categories. The top six players were playing roughly 3 times more every week than the
  720 + level of the top 12 players was 3.67, compared to 2.90 for the others. As with the age categories, we computed the average weekly time spent playing
  721 + video games using the midpoint of each category. The top 12 players played roughly $2.5$ times as many hours per week as the
714 722 rest of the players.
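The midpoint averaging used for the age and game-time categories can be sketched as follows, assuming that $n_k$ players of a group chose category $k$ with bounds $[a_k, b_k]$ (illustrative symbols, not the paper's notation). For example, each player who chose the age category between 21 and 25 contributes $23$ to the age sum. The second display rechecks the rounded comparisons in the text against the table values.

```latex
% Sketch only: n_k, a_k, b_k, K and N are illustrative symbols.
% Group mean of a binned answer, using the midpoint of each category [a_k, b_k]:
\[
  \bar{v} = \frac{1}{N} \sum_{k=1}^{K} n_k \, \frac{a_k + b_k}{2},
  \qquad N = \sum_{k=1}^{K} n_k
\]
% Rechecking the rounded statements against the table values:
\[
  25.99 - 23.42 = 2.57 \approx 2.5 \mbox{ years}, \qquad
  \frac{10.00}{4.11} \approx 2.43 \approx 2.5
\]
```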
715 723  
716 724 \begin{table}[h]
717   -\caption{Average statistics on the top six players vs the others}\label{tab_playerStats}
  725 + \caption{Average statistics for the top 12 players vs.\ the others}\label{tab_playerStats}
718 726 \begin{center}
719 727 \begin{tabular}{ccc}\hline
720   - & Top 6 players & Others\\
721   -Age & 25.50 & 30.33\\
722   -Self evaluation & 3.83 & 2.81\\
723   -Game time & 10.42 & 3.20\\\hline
  728 + & Top 12 players & Others\\\hline
  729 + Age (years) & 23.42 & 25.99\\
  730 + Self-evaluation (1--5) & 3.67 & 2.90\\
  731 + Game time (hours/week) & 10.00 & 4.11\\\hline
724 732 \end{tabular}
725 733 \end{center}
726 734 \end{table}
727 735  
@@ -729,19 +737,35 @@
729 737  
730 738 We implemented a human computing game that uses a market, skills and challenges in order to solve a problem collaboratively. The problem that is solved
731 739 by the players in our game is a graph problem that can be easily translated into a color matching game. The total number of colors used in the tests was small
732   -enough so that we were able to compute an exact solution and evaluate the performance of the players. We organized five game sessions of 10 players with
733   -different game conditions and to our surprise, the great variability in the participants' skills made it impossible to make direct comparisons between the tests
734   -in regards to the percentage of the solutions found. However, our tests showed that the market is a useful tool to help players build better solutions
735   -(longer sequences, in our case). Our
736   -results also show that skills and challenges systems are helpful tools to inform, influence and guide the players in doing specific actions that are
737   -beneficial to the system and other players.
738   -Finally, based on the game sessions that we organized, it seems that younger players who play video games on a regular basis are able to understand the rules
  740 + enough that we could compute an exact solution and evaluate the performance of the players. We organized 12 game sessions of 10 players each,
  741 + covering four different game conditions (three sessions per condition).
  742 +
  743 + Our tests clearly showed that the market is a useful tool to help players build longer solutions (sequences, in our case). The market
  744 + also makes the game much more dynamic, and players mentioned that they really enjoyed this aspect of the game.
  745 +
  746 + Our results also showed that skills in general help influence and guide players toward specific actions that are
  747 + beneficial to the system and other players. We found that skills are more effective at guiding players when
  748 + they are not directly related to the main goal of the game: for example, the {\em Color Expert} skill did not affect the proportion of
  749 + multicolored sequences built by the players.
  750 +
  751 + The results on the challenges indicate that they can be useful for promoting a given action in the game (e.g., the {\em Minimum number of colors in common} challenge), but
  752 + to be effective, their difficulty needs to be well balanced. Challenges that are too easy (e.g., the {\em Sell/buy challenge}) or
  753 + too hard (e.g., the {\em Specific colors in common challenge}) do not significantly affect the game.
  754 +
  755 + Although the great variability in the participants' skills made it very difficult to compare the different game conditions directly
  756 + with regard to the percentage of the solution found, we showed that the percentage solved is strongly correlated with the total experience gained
  757 + by all players during a game session ($r = 0.89$).
  758 +
  759 + Finally, it seems that younger players who play video games on a regular basis and
  760 + rate their own puzzle-solving skills highly are able to understand the rules
739 761 of the game and find winning strategies faster than the average participant.
740 762  
741 763 \section{Acknowledgments}
742 764  
743   -The authors would like to thank Jean-Fran\c{c}ois Bourbeau, Mathieu Blanchette, Derek Ruths and Edward Newell for their help with the initial design of the game.
744   -The authors would also like to thank Silvia Juliana Leon Mantilla for her help with the organization of the game sessions and the recruitment of participants.
  765 +First and foremost, the authors wish to thank all the players who made this study possible.
  766 +The authors would also like to thank Jean-Fran\c{c}ois Bourbeau, Mathieu Blanchette, Derek Ruths and Edward Newell for their help with the initial design of the game,
  767 +and Alexandre Leblanc for his helpful advice on the statistical tests.
  768 +Finally, the authors wish to thank Silvia Juliana Leon Mantilla and Shu Hayakawa for their help with the organization of the game sessions and the recruitment of participants.
745 769  
746 770 % REFERENCES FORMAT
747 771 % References must be the same font size as other body text.