Commit 8938da05e00ed4987b456ab0befb5d0db6c963cf

Authored by waldispuhl
Exists in master

Merge branch 'master' of jwgitlab.cs.mcgill.ca:jerome/market-game

merge 24/05 10pm

Showing 6 changed files

CHIpaper/Figs/minNbCols.pdf (no preview)
CHIpaper/Figs/minSeqLength.pdf (no preview)
CHIpaper/Figs/sellBuySNP.pdf (no preview)
CHIpaper/Figs/totalXP_session.pdf (no preview)
CHIpaper/MarketPaper.pdf (no preview)

CHIpaper/MarketPaper.tex
... ... @@ -411,8 +411,8 @@
411 411 \includegraphics[width=\halfWidth]{Figs/averageSeqLength.pdf}
412 412 \vspace{0cm}
413 413 \caption{Average sequence length for every game session, not considering the super circles and considering the super circles (e.g. a super circle
414   - of level 2 in a sequence represents 10 circles in the solution). A, A-2 and A-3 represent the tests with all the features on; NS, NS-2 and NS-3 represent the
415   - tests without the skills; NM, NM-2 and NM-3 represent the tests without the market; NC, NC-2 and NC-3 represent the tests without the challenges.
  414 + of level 2 in a sequence represents 10 circles in the solution). 'A', 'A-2' and 'A-3' represent the tests with all the features on; 'NS', 'NS-2' and 'NS-3' represent the
  415 + tests without the skills; 'NM', 'NM-2' and 'NM-3' represent the tests without the market; 'NC', 'NC-2' and 'NC-3' represent the tests without the challenges.
416 416 }\label{fig_averageSeqLength}
417 417 \end{center}
418 418 \end{figure}
... ... @@ -427,7 +427,7 @@
427 427 lengths for all the sequences sold to the system during a game session do not follow a normal distribution, we used a non-parametric test (Kruskal-Wallis) to
428 428 test whether the sequence lengths of the different game sessions come from the same distribution.
429 429 The Kruskal-Wallis test revealed a significant effect of the game conditions on the sequence lengths without considering super circles
430   -(${\chi}^2(2) = 1391.7$, $p < 2.2E-16$) and also when considering super circles (${\chi}^2(2) = 1388.4$, $p < 2.2E-16$).
  430 +(${\chi}^2(11) = 1391.7$, $p < 2.2 \times 10^{-16}$) and also when considering super circles (${\chi}^2(11) = 1388.4$, $p < 2.2 \times 10^{-16}$).
431 431  
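The change from 2 to 11 degrees of freedom in this hunk is consistent with how the Kruskal-Wallis statistic is compared to a chi-squared distribution: with $k$ groups, the approximation has $k-1$ degrees of freedom. A short worked check follows, assuming that the test compares all 12 game sessions listed in the figure caption and ignoring the tie correction. The symbols $k$, $N$, $n_i$ and $R_i$ are notation introduced here, not taken from the paper.

```latex
% Kruskal-Wallis statistic for k groups (no tie correction):
%   N   = total number of observations
%   n_i = number of observations in group i
%   R_i = sum of the ranks in group i
\[
  H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N+1),
  \qquad
  H \sim \chi^{2}(k-1) \mbox{ approximately, under the null hypothesis.}
\]
% Assumption: k = 12 game sessions are compared, hence
\[
  \mathrm{df} = k - 1 = 12 - 1 = 11 .
\]
```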
432 432 We then ran a post hoc test (Dunn's test) for pairwise comparisons between all the groups. With or without considering super circles, all the game conditions
433 433 were shown to be significantly different ($p < 0.01$), except a few shown in Table~\ref{tab_Dunn}. Note that the strongest similarities are found between
... ... @@ -586,139 +586,157 @@
586 586 The challenge system was implemented to analyze the current state of the game and guide the players towards actions that are currently needed. As mentioned
587 587 previously, five different challenge types were implemented in the game (see Section Challenge system for the complete list). In order to analyze the effect
588 588 of the challenges on the way the participants were playing, for each challenge type, we compared the relevant statistics of the game during the challenge
589   -with the rest of the game session (when a different challenge was on).
  589 +with the rest of the game session (when a different challenge was available).
590 590  
591   -Note that we are considering only the four sessions in which the challenges were on and that the Sell/buy and Buyout challenges were disabled during the
  591 +Note that we are considering here only the nine sessions in which the challenges were present and that the Sell/buy and Buyout challenges were disabled during the
592 592 session without the market.
593 593  
594   -\subsubsection{Sell/buy challenge}
595   -
596   -For the Sell/buy challenge, we were interested in comparing the number of individual circles sold on the market per minute when the challenge was active
597   -and when it was not. The results, presented in Figure~\ref{fig_sellBuySNP}, show that players were selling more circles during the challenge in all
598   -the experiments except the second one with all the features. However, the surprisingly high rate of circles sold during the time when the challenge
599   -was not active for the experiment 'All (2)' can be easily explained. During that test, the Sell/buy challenge appeared only twice (in the first
600   -25 minutes of the game session) and we had a player who put six skill points in the {\em Master Trader} skill and was selling more and more circles
601   -as the session went on (selling an impressive total of 288 circles during the session).
602   -
603   -\begin{figure*}[htbp]
604   - \begin{center}
605   - \includegraphics[width=\halfWidth]{Figs/sellBuySNP.pdf}
606   - \vspace{0cm}
607   - \caption{Number of individual circles sold on the market per minute with and without the Sell/buy challenge active. 'All' and 'All (2)'
608   - represent the two tests with all the features on, and 'No skills' represents the test without the skills.
609   - }\label{fig_sellBuySNP}
610   - \end{center}
611   -\end{figure*}
612   -
613 594 \subsubsection{Minimum number of colors challenge}
614 595  
615   -To measure the effect of the Minimum number of colors challenge on the game, we compared the average number of colors of the sequences sold by the players
616   -when the challenge was active and when it was not. The results are presented in Figure~\ref{fig_minNbCols}. In all the game sessions except the one
617   -without the market, the average number of colors in common is higher when the challenge is active. Interestingly, the biggest difference
618   -in the averages occurred during the test with no skills. One possible explanation could be that without the skills, the players have to rely more
619   -on completing the challenges to get bonuses and go up in the leaderboard. In the session without the market, the challenge did not make a significant
620   -difference on the average number of colors in common. This tends to confirm that the market is a tool that can help the players acquiring
621   -circles with more colors.
  596 +To measure the effect of the {\em Minimum number of colors challenge} on the game, we compared the average number of colors of the sequences built by the players
  597 +when the challenge was active and when it was not. The different averages for each game session are presented in Figure~\ref{fig_minNbCols}.
  598 +In all the game sessions except A-3 and NM, the average number of colors in common is higher when the challenge is active.
622 599  
623   -\begin{figure*}[htbp]
  600 +\begin{figure}[htbp]
624 601 \begin{center}
625 602 \includegraphics[width=\halfWidth]{Figs/minNbCols.pdf}
626 603 \vspace{0cm}
627   - \caption{Average number of colors in the sequences with and without the Minimum number of colors challenge active. 'All' and 'All (2)'
628   - represent the two tests with all the features on, 'No skills' represents the test without the skills, and 'No market' represents the test without the
629   - market.
  604 + \caption{Average number of colors in the sequences with and without the {\em Minimum number of colors challenge} active. 'A', 'A-2' and 'A-3'
  605 + represent the tests with all the features present, 'NS', 'NS-2' and 'NS-3' represent the tests without the skills, and 'NM', 'NM-2' and 'NM-3' represent
  606 + the tests without the market.
630 607 }\label{fig_minNbCols}
631 608 \end{center}
632   -\end{figure*}
  609 +\end{figure}
633 610  
  611 +The distribution of the averages of the number of colors in common for all the game sessions considered here does not deviate significantly from normality (Shapiro-Wilk $p = 0.79$),
  612 +which allowed us to use Welch's t-test to compare the means of the two groups, {\em i.e.} 1.96 colors in common during the challenge and 1.76 during
  613 +the rest of the time. The test confirmed a significant effect of the presence of the challenge on the average number of colors in common
  614 +($t(16)=2.19$, $p=0.04$, Cohen's $d = 1.03$).
  615 +
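The reported effect size can be cross-checked against the reported $t$ statistic. The sketch below assumes two groups of $n = 9$ session averages each (the nine sessions with challenges, measured during and outside the challenge) and a definition of Cohen's $d$ based on the pooled standard deviation. Neither assumption is stated explicitly in the text, and the notation is introduced here for illustration.

```latex
% Notation (introduced here): \bar{x}_1, \bar{x}_2 are the group means,
% s_1, s_2 the group standard deviations, n the number of values per group.
% With equal group sizes, the Welch and pooled t statistics coincide.
\[
  t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{(s_1^{2} + s_2^{2})/n}},
  \qquad
  d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{(s_1^{2} + s_2^{2})/2}}
  \quad\Longrightarrow\quad
  d = t\,\sqrt{\frac{2}{n}} .
\]
% With the reported t = 2.19 and the assumed n = 9:
\[
  d \approx 2.19 \times \sqrt{2/9} \approx 2.19 \times 0.471 \approx 1.03 .
\]
```

Under these assumptions the reported $t$ and $d$ values agree with each other.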
634 616 \subsubsection{Minimum sequence length challenge}
635 617  
636   -In order to analyze the effect that the Minimum sequence length challenge had on the game, we compared the average sequence length during the challenge
637   -and when a different challenge was active. As shown in Figure~\ref{fig_minSeqLength}, this challenge is the one for which we observe the smallest effect.
638   -The Minimum sequence length challenge does not seem to significantly change the players' game plan, except in the experiment without the skills. As we mentionned
639   -in the analysis of the previous challenge, it seems that when the skills are not present, the players give a lot more attention to the challenges.
640   -%As for the two experiments with all the features on, the average sequence length is a little bit lower during the challenge, which is surprising and hard to explain.
641   -As with the previous challenge, we can observe that the Minimum sequence length challenge does not seem to have affected the session without the market.
642   -This seems to show that the market can help the players to build longer sequences, but in the two experiments with all the features on, the average
643   -sequence length is a little bit lower during the challenge, which is contradictory.
644   -%%%WHAT ELSE CAN WE SAY???
  618 +In order to analyze the effect that the {\em Minimum sequence length challenge} had on the game, we compared the average sequence length during the challenge
  619 +and when a different challenge was active for all the game sessions. As shown in Figure~\ref{fig_minSeqLength}, the presence of this challenge increased
  620 +the average sequence length in all the game sessions except the three sessions with all the features.
645 621  
646   -\begin{figure*}[htbp]
  622 +\begin{figure}[htbp]
647 623 \begin{center}
648 624 \includegraphics[width=\halfWidth]{Figs/minSeqLength.pdf}
649 625 \vspace{0cm}
650   - \caption{Average sequence length with and without the Minimum sequence length challenge active. 'All' and 'All (2)'
651   - represent the two tests with all the features on, 'No skills' represents the test without the skills, and 'No market' represents the test without the
652   - market.
  626 + \caption{Average sequence length with and without the {\em Minimum sequence length challenge} active. 'A', 'A-2' and 'A-3'
  627 + represent the tests with all the features present, 'NS', 'NS-2' and 'NS-3' represent the tests without the skills, and 'NM', 'NM-2' and 'NM-3' represent
  628 + the tests without the market.
653 629 }\label{fig_minSeqLength}
654 630 \end{center}
655   -\end{figure*}
  631 +\end{figure}
656 632  
657   -\subsubsection{Buyout}
  633 +The means of the average sequence lengths during the challenge and for the rest of the time are 5.38 and 5.08 respectively. Since the distribution
  634 +of the averages of sequence lengths does not deviate significantly from normality (Shapiro-Wilk $p = 0.27$), we used Welch's t-test to compare those means, but the test did not
  635 +find a significant difference between them ($t(16)=0.79$, $p = 0.44$).
658 636  
659   -The buyout challenge appeared only once in total in all the three gaming session with challenges and with the market. Thus, we don't have a significant
  637 +Although the difference between the two groups is not statistically significant, we can see a small positive effect in six of the nine sessions with
  638 +challenges. The opposite effect observed in the three game sessions with all the features is surprising and hard to explain. One possible
  639 +explanation is that when all the features are present, the players have more to think about and check the challenges less often.
  640 +
  641 +\subsubsection{Sell/buy challenge}
  642 +
  643 +For the {\em Sell/buy challenge}, we were interested in comparing the number of individual circles sold on the market per minute when the challenge was active
  644 +and when it was not. The results, presented in Figure~\ref{fig_sellBuySNP}, do not show a clear trend: in half of the game sessions, the
  645 +number of circles sold per minute is higher during the challenge, while the opposite holds for the other half.
  646 +
  647 +\begin{figure}[htbp]
  648 + \begin{center}
  649 + \includegraphics[width=\halfWidth]{Figs/sellBuySNP.pdf}
  650 + \vspace{0cm}
  651 + \caption{Number of individual circles sold on the market per minute with and without the Sell/buy challenge active. 'A', 'A-2' and 'A-3'
  652 + represent the tests with all the features present, 'NS', 'NS-2' and 'NS-3' represent the tests without the skills, and 'NM', 'NM-2' and 'NM-3' represent
  653 + the tests without the market.
  654 + }\label{fig_sellBuySNP}
  655 + \end{center}
  656 +\end{figure}
  657 +
  658 +Once again, the numbers of circles sold per minute in the six different game sessions do not deviate significantly from normality (Shapiro-Wilk $p = 0.26$), so we
  659 +used Welch's t-test to compare the means of both groups, which are 13.18 during the challenge and 12.73 during the rest of the time. The t-test
  660 +failed to reject the null hypothesis that both means are the same ($t(10)=0.11$, $p = 0.91$).
  661 +
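For reference, Welch's test obtains its degrees of freedom from the Welch-Satterthwaite approximation, which is generally not an integer. The sketch below assumes two groups of six session values each. The symbols $s_i$, $n_i$ and $\nu$ are notation introduced here.

```latex
% s_i = standard deviation of group i, n_i = size of group i,
% \nu = approximate degrees of freedom of Welch's t statistic.
\[
  \nu \approx
  \frac{\left( \frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2} \right)^{2}}
       {\frac{(s_1^{2}/n_1)^{2}}{n_1 - 1} + \frac{(s_2^{2}/n_2)^{2}}{n_2 - 1}},
  \qquad
  \min(n_1, n_2) - 1 \;\le\; \nu \;\le\; n_1 + n_2 - 2 .
\]
% Assumption: n_1 = n_2 = 6, so the upper bound is 6 + 6 - 2 = 10.
% With equal group sizes, this bound is attained only when s_1 = s_2.
```

The reported $t(10)$ matches the upper bound $n_1 + n_2 - 2$ under this assumption. The text does not state how the reported degrees of freedom were obtained.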
  662 +We believe that the main reason why there is no apparent difference between the two groups is that most people were able to
  663 +complete this type of challenge without changing their normal behavior. This challenge was simply too easy, because most of the players
  664 +were already selling or buying (through the bids) at least 2 or 4 circles every five minutes (the length of a challenge).
  665 +
  666 +\subsubsection{Buyout challenge}
  667 +
  668 +The {\em Buyout challenge} appeared only once in total across the three game sessions with challenges and with the market. Thus, we do not have a sufficient
660 669 amount of data to analyze the effect of this challenge. This challenge almost never appeared because players were always using the
661   -buyout, which greatly reduced the probability of showing this challenge.
  670 +buyout, which greatly reduced the probability of showing this challenge.
662 671  
663   -\subsubsection{Specific colors in common}
  672 +\subsubsection{Specific colors in common challenge}
664 673  
665   -This challenge also cannot be analyzed because it was never completed by any player, despite appearing a total of five times in all the game sessions.
  674 +The {\em Specific colors in common challenge} is also difficult to analyze because it was completed only 8 times in total during the nine sessions with challenges, despite
  675 +appearing 11 times throughout those nine experiments.
666 676 This can be explained by the fact that it was the hardest challenge. All the other challenges are more general and can be completed by
667 677 doing actions that are not specific to a certain subset of colors. Even if the market should be helpful in finding circles with the required
668   -subset of colors, it seems highly probable that the players felt that this type of challenge was too hard and never tried to complete it.
  678 +subset of colors, it seems highly probable that the players felt that this type of challenge was too hard and almost never tried to complete it.
669 679  
670   -\subsection{Testing hypothesis 4: relationship between total experience and percentage solved}
671   -%Coming back on the 4 tests, total game xp vs percentage of problem solved
672   -As mentioned in the Experiments section, the initial plan was to measure the impact of each feature by analyzing how much of the problem can be solved
673   -by the players in each of the game sessions. Interestingly, we observed a larger than expected variance in the participants' skills which made it practically
674   -impossible to compare one game session with an other. Indeed, some players quickly understood all the rules of the game and how to maximize their score,
675   -while others struggled to make points during the whole session, even with the help of the authors who were monitoring the session.
  680 +\subsection{Testing hypothesis 4: percentage of the problem solved as a measure of the importance of different game features}
  681 +One of the research goals was to measure the impact of each feature by analyzing how much of the problem can be solved
  682 +by the players in each of the game sessions. Our initial hypothesis was that players who have access to all the game features should
  683 +be able to solve more of the problem.
676 684  
677   -\begin{figure*}[htbp]
  685 +Interestingly, we observed a larger-than-expected variance in the participants' skills, which sometimes made it
  686 +difficult to compare one game session with another in terms of the percentage of the problem that was solved.
  687 +Indeed, some players quickly understood all the rules of the game and how to maximize their score,
  688 +while others struggled to score points during the whole session, even with our help.
  689 +
  690 +\begin{figure}[htbp]
678 691 \begin{center}
679 692 \includegraphics[width=\halfWidth]{Figs/totalXP_session.pdf}
680 693 \vspace{0cm}
681   - \caption{Total game experience and percentage of the problem solved for each of the 5 game sessions. 'XP' represents experience points. 'All' and 'All (2)'
682   - represent the two tests with all the features on, 'No skills' represents the test without the skills, 'No market' represents the test without the
683   - market and 'No chal.' represents the test without the challenges.
  694 + \caption{Total game experience and percentage of the problem solved for each of the 12 game sessions. 'XP' represents experience points.
  695 + 'A', 'A-2' and 'A-3' represent the tests with all the features on; 'NS', 'NS-2' and 'NS-3' represent the
  696 + tests without the skills; 'NM', 'NM-2' and 'NM-3' represent the tests without the market; 'NC', 'NC-2' and 'NC-3' represent the tests without the challenges.
684 697 }\label{fig_totalXP}
685 698 \end{center}
686   -\end{figure*}
  699 +\end{figure}
687 700  
688   -As shown in Figure~\ref{fig_totalXP}, the percentage of the problem that was solved is nearly identical for all the tests (around $60\%$) except for the
689   -first test with all the features and the test with no challenges, in which the players in general performed worse (as indicated by the total experience
690   -points for those game sessions). In particular, the comparison of the first game session with all the features with the second one ('All' and 'All (2)')
691   -demonstrates that we cannot simply use the percentage of the solution found as a way to measure the impact of a feature. Even with the exact same game
692   -conditions, there is a big difference in the total experience and percentage of solutions found.
  701 +As shown in Figure~\ref{fig_totalXP}, the percentage of the problem that was solved varies from 48\% to 75\% across the different experiments.
  702 +In particular, the differences observed for experiments with the exact same game conditions (sometimes up to an 18\% difference)
  703 +demonstrate that we cannot simply use the percentage of the exact solution found as a way to measure the impact of a feature.
  704 +Moreover, the top five sessions in terms of percentage solved (all sessions with more than 65\%) come from the four different game conditions.
693 705  
694   -Notice that a game session with many good players combining for a high total of experience points does not guarantee that
695   -a bigger percentage of the solution will be found by the players. This is due to the fact that, in the current state of the game, players can be selling
  706 +We used linear regression to test whether the percentage of the problem solved is, to some extent, proportional to the total experience points accumulated
  707 +by all the players during a session (graph not shown). The fit had a correlation coefficient $r = 0.89$ and a coefficient of determination
  708 +$r^2=0.79$, which indicates a strong positive correlation. The different game conditions likely account for some of the unexplained variance.
  709 +%Notice that a game session with many good players combining for a high total of experience points does not guarantee that
  710 +%a bigger percentage of the solution will be found by the players.
  711 +Another reason for the variance is the fact that, in the current state of the game, players can be selling
696 712 sequences that correspond to a solution that was already found earlier. While it would be possible to lower the score of a solution (sequence) that already
697 713 exists, it would be hard to explain to inexperienced players why one sequence is worth less than another with exactly the same length and number of colors
698 714 in common. That is why we decided to not take into account the existing ({\em i.e.} already found) solutions in the scoring function.
699 715  
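In a simple linear regression with a single predictor, the coefficient of determination is the square of the Pearson correlation coefficient, so the two reported values can be checked against each other:

```latex
\[
  r^{2} = (0.89)^{2} = 0.7921 \approx 0.79 .
\]
% In this linear fit, about 79% of the variance in the percentage solved
% is accounted for by the total experience points; about 21% is left to
% other sources of variation.
```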
700   -In the following sections, we show the impact of each feature based on different metrics.
  716 +%In the following sections, we show the impact of each feature based on different metrics.
701 717  
702 718 \subsection{Understanding what makes a good player}
703 719  
704 720 Based on the questionnaire filled out by the players before playing the game, and the global leaderboard of all the players from all the sessions put together,
705   -we tried to find similarities between the top players. Table~\ref{tab_playerStats} shows the most interesting differences between the top six players
  721 +we tried to find similarities between the top players. Table~\ref{tab_playerStats} shows the most interesting differences between the top 12 players
706 722 and the rest of the players. In the questionnaires, players had to indicate their age category (between 21 and 25 for example), their own evaluation
707   -of their puzzle solving abilities and a range of hours of time spent playing video games every week. The mean age of the two groups of players
708   -was calculated by taking the middle point of the age categories. The average age of the top 6 players was about 5 years younger than the one of
  723 +of their puzzle solving abilities and a range of hours of time spent playing video games every week.
  724 +
  725 +The average age of the two groups of players
  726 +was calculated by taking the middle point of the age categories. The average age of the top 12 players was about $2.5$ years younger than that of
709 727 the other players. For the puzzle solving self evaluation, the players could choose a level between 1 and 5 (5 being the strongest). The average
710   -level of the top 6 players was 3.83, compared to 2.81 for the others. As with the age categories, we computed averages of time spent playing
711   -video games every week using the middle point of the categories. The top six players were playing roughly 3 times more every week than the
  728 +level of the top 12 players was 3.67, compared to 2.90 for the others. As we did with the age categories, we computed averages of time spent playing
  729 +video games every week using the middle point of the categories. The top 12 players were playing roughly $2.5$ times more every week than the
712 730 rest of the players.
713 731  
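The category-midpoint averaging described in this paragraph can be written as a weighted mean. In the sketch below, the notation $m_j$ and $c_j$ is introduced here, and the only midpoint shown is for the one category the text gives as an example. The last two lines check the rounded comparisons quoted in the text against the values in Table~\ref{tab_playerStats}.

```latex
% m_j = midpoint of category j, c_j = number of players in category j
\[
  \bar{x} = \frac{\sum_{j} c_j\, m_j}{\sum_{j} c_j},
  \qquad
  \mbox{e.g. the category ``between 21 and 25'' has } m_j = \frac{21 + 25}{2} = 23 .
\]
% Age difference between the two groups (Others minus Top 12):
\[
  25.99 - 23.42 = 2.57 \approx 2.5 \mbox{ years,}
\]
% Ratio of weekly game time (Top 12 over Others):
\[
  \frac{10.00}{4.11} \approx 2.43 \approx 2.5 .
\]
```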
714 732 \begin{table}[h]
715   -\caption{Average statistics on the top six players vs the others}\label{tab_playerStats}
  733 +\caption{Average statistics on the top 12 players vs the others}\label{tab_playerStats}
716 734 \begin{center}
717 735 \begin{tabular}{ccc}\hline
718   - & Top 6 players & Others\\
719   -Age & 25.50 & 30.33\\
720   -Self evaluation & 3.83 & 2.81\\
721   -Game time & 10.42 & 3.20\\\hline
  736 + & Top 12 players & Others\\
  737 +Age & 23.42 & 25.99\\
  738 +Self evaluation & 3.67 & 2.90\\
  739 +Game time & 10.00 & 4.11\\\hline
722 740 \end{tabular}
723 741 \end{center}
724 742 \end{table}
725 743  
... ... @@ -727,19 +745,35 @@
727 745  
728 746 We implemented a human computing game that uses a market, skills and challenges in order to solve a problem collaboratively. The problem that is solved
729 747 by the players in our game is a graph problem that can be easily translated into a color matching game. The total number of colors used in the tests was small
730   -enough so that we were able to compute an exact solution and evaluate the performance of the players. We organized five game sessions of 10 players with
731   -different game conditions and to our surprise, the great variability in the participants' skills made it impossible to make direct comparisons between the tests
732   -in regards to the percentage of the solutions found. However, our tests showed that the market is a useful tool to help players build better solutions
733   -(longer sequences, in our case). Our
734   -results also show that skills and challenges systems are helpful tools to inform, influence and guide the players in doing specific actions that are
735   -beneficial to the system and other players.
736   -Finally, based on the game sessions that we organized, it seems that younger players who play video games on a regular basis are able to understand the rules
  748 +enough so that we were able to compute an exact solution and evaluate the performance of the players. We organized 12 game sessions of 10 players with
  749 +four different game conditions (3 times each).
  750 +
  751 +Our tests clearly showed that the market is a useful tool to help players build longer solutions (sequences, in our case). In addition,
  752 +it makes the game much more dynamic, and players mentioned that they really enjoyed this aspect of the game.
  753 +
  754 +Our results also showed that skills in general are helpful to influence and guide the players into doing specific actions that are
  755 +beneficial to the system and other players. We have found that skills are more efficient in their role of guiding the players if
  756 +they are not directly related to the main goal of the game: the {\em Color Expert} skill for example did not affect the proportion of
  757 +multicolored sequences built by the players.
  758 +
  759 +The results on the challenges indicate that they can be useful to promote an action in the game ({\em Minimum number of colors in common} for example), but
  760 +in order to be effective, the difficulty needs to be well-balanced. Challenges that are too easy ({\em Sell/buy challenge} for example) or
  761 +too hard ({\em Specific colors in common challenge} for example) do not affect the game significantly.
  762 +
  763 +Although the great variability in the participants' skills made it very difficult to make direct comparisons between the different game conditions
  764 +with regard to the percentage of the solutions found, we showed that the percentage solved is to a certain extent proportional to the total experience gained
  765 +by all players during a game session.
  766 +
  767 +Finally, it seems that younger players who play video games on a regular basis and
  768 +have a strong self evaluation of their puzzle solving skills are able to understand the rules
737 769 of the game and find winning strategies faster than the average participant.
738 770  
739 771 \section{Acknowledgments}
740 772  
741   -The authors would like to thank Jean-Fran\c{c}ois Bourbeau, Mathieu Blanchette, Derek Ruths and Edward Newell for their help with the initial design of the game.
742   -The authors would also like to thank Silvia Juliana Leon Mantilla for her help with the organization of the game sessions and the recruitment of participants.
  773 +First and foremost, the authors wish to thank all the players who made this study possible.
  774 +The authors would also like to thank Jean-Fran\c{c}ois Bourbeau, Mathieu Blanchette, Derek Ruths and Edward Newell for their help with the initial design of the game,
  775 +and Alexandre Leblanc for his helpful advice on the statistical tests.
  776 +Finally, the authors wish to thank Silvia Juliana Leon Mantilla and Shu Hayakawa for their help with the organization of the game sessions and the recruitment of participants.
743 777  
744 778 % REFERENCES FORMAT
745 779 % References must be the same font size as other body text.