Commit eb881a5bd093498d359b58320ddaa10701051ca3

Authored by Olivier
1 parent 93cc8c4804
Exists in master

Abstract, Introduction and Experiments sections were updated

Showing 2 changed files with 110 additions and 65 deletions

CHIpaper/MarketPaper.pdf

No preview for this file type

CHIpaper/MarketPaper.tex
... ... @@ -124,11 +124,11 @@
124 124  
125 125 \begin{abstract}
126 126 Using a human computing game to solve a problem that has a large search space is not straightforward. The difficulty of using such an approach
127   -comes from the following facts: (1) it would be overwhelming for a player to show him or her the complete search space and at the same time,
  127 +comes from the following facts: (1) it would be overwhelming for a single player to show him or her the complete search space and at the same time,
128 128 (2) it is impossible to find an optimal solution without considering all the available data. In this paper, we present a human computing
129 129 game that uses a market, skills and a challenge system to help the players solve a graph problem in a collaborative manner. The results obtained
130   -during five game sessions of 10 players show that the market helps players to build larger solutions. We also show that a skill and a
131   -challenge system can be used to influence and guide the players towards producing better solutions.
  130 +during 12 game sessions of 10 players show that the market helps players to build larger solutions. We also show that a skill system and, to a lesser extent, a
  131 +challenge system can be used to influence and guide the players towards producing better solutions.
132 132 \end{abstract}
133 133  
134 134 \keywords{Human computing; Game; Graph problem; Market; Skills; Challenges}
135 135  
... ... @@ -142,8 +142,20 @@
142 142  
143 143 Historically, computation on graphs has proven to be a good model to study the performance of humans in solving complex combinatorial problems \cite{Kearns:2006aa}. Experiments have been conducted to evaluate the dynamics of crowds collaborating at solving graph problems \cite{DBLP:journals/cacm/Kearns12} but still, little is known about the efficiency of various modes of interaction.
144 144  
145   -In this paper, we propose a formal framework to study human collaborative solving. We design a market system coupled with skills and a challenge system to help the players solve combinatorial graph problems. In order to prevent any bias, we implement this system as a game that makes abstraction the graphical nature of the underlying problem.
  145 +In this paper, we propose a formal framework to study human collaborative solving. We design a market system coupled with skills and a challenge system to help the players solve combinatorial graph problems. In order to prevent any bias, we implement this system as a game that makes abstraction of the graphical nature of the underlying problem.
146 146  
  147 +\subsection{Hypotheses}
  148 +
  149 +The development of the game with its three main features, {\em i.e.} the market, the skills and the challenge system, was based on those four hypotheses:
  150 +\begin{enumerate}
  151 + \item A market system will help the players build better solutions.
  152 + \item A skill system is useful to orient the players into doing specific actions that are beneficial to the game and other players.
  153 + \item A challenge system is effective in encouraging the players to do a specific action in the game.
  154 + \item The collected solutions are better when all the 3 features are present in a game session, independently of the players' skills.
  155 +\end{enumerate}
  156 +
  157 +The goal of the work presented in this paper is to verify if those hypotheses are valid.
  158 +
147 159 \section{Problem}
148 160  
149 161 The game was implemented to solve a graph problem, which is the problem of finding maximal cliques in a multigraph.
... ... @@ -151,7 +163,9 @@
151 163 between the vertices $v$ and $u$ for every color in $c(v) \cap c(u)$ ({\em i.e.}, one for every color that they have in common). In other words, there is
152 164 no colored edge between two vertices $v$ and $u$ for which $c(v) \cap c(u) = \emptyset$. Let $|C|$ be the total number of colors in the graph.
153 165 The problem is then the one of finding maximal cliques for each possible $n$ number of colors (where $1 \leq n \leq |C|$), {\em i.e.} cliques in which all
154   -the edges (and vertices) have the same $n$ colors. This problem has a worst time complexity of $O(|V|2^{|C|})$.
  166 +the edges (and vertices) have the same $n$ colors.
  167 +%This problem has a worst time complexity of $O(|V|2^{|C|})$.
  168 +A simple exact algorithm can solve the problem in $O(|V|2^{|C|})$. We make the conjecture that it is also the worst time complexity of the problem.
155 169  
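The paper does not spell out the simple exact algorithm mentioned above. One plausible reading, offered here only as a sketch, is to enumerate every non-empty subset of colors and collect the vertices whose color sets contain that subset: any two such vertices share all the colors of the subset, so together they form the largest clique for it. The function name and the dictionary-based input format are assumptions made for illustration. The loop visits $2^{|C|} - 1$ subsets and scans $|V|$ vertices for each, which matches the stated bound up to the cost of each subset test.

```python
from itertools import combinations


def largest_groups_by_color_subset(vertex_colors, all_colors):
    """Map each non-empty color subset to the vertices that carry all of its colors.

    vertex_colors: dict mapping a vertex id to a set of colors (assumed format).
    all_colors: iterable with every color present in the graph.
    """
    result = {}
    colors = sorted(all_colors)
    for n in range(1, len(colors) + 1):
        for subset in combinations(colors, n):
            required = frozenset(subset)
            # A vertex belongs to the group if it has every required color.
            members = [v for v, cs in vertex_colors.items() if required <= cs]
            if members:
                result[required] = members
    return result


example = {
    "a": {"red", "blue"},
    "b": {"red"},
    "c": {"red", "blue", "green"},
}
groups = largest_groups_by_color_subset(example, {"red", "blue", "green"})
```

With 6 colors, as in the experiments, this is only 63 subsets, which is why the exact solution can be computed quickly and used as a reference for the percentage solved.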
156 170 This problem was chosen for two reasons. First, it can be solved quickly by a computer when the number of colors is small, thus making it possible to compute the exact
157 171 solution and measure the percentage of the solution that is found by the players in a game session. Second, this problem can easily be translated into a color
... ... @@ -159,6 +173,8 @@
159 173 colors of the vertices, it is possible to show the players only the colored vertices. To solve the problem, the players have to find the largest sets
160 174 of circles with colors in common, for all possible subsets of colors.
161 175  
  176 +
  177 +
162 178 \section{Presentation of the game}
163 179  
164 180 \subsection{Goal of the game}
165 181  
166 182  
167 183  
168 184  
169 185  
170 186  
171 187  
172 188  
173 189  
174 190  
175 191  
176 192  
177 193  
... ... @@ -297,81 +313,78 @@
297 313  
298 314 Basically, the system continuously monitors the activities of the players and decreases or increases the probabilities of each challenge type.
299 315 The next challenge is then selected using a multinomial sampling on these probabilities. The number of times $T$ that the challenge-related action must be
300   -completed is selected randomly between 3 and 5. The prize that is awarded for completing the challenge is equal to $1500 * T$.
  316 +completed is selected randomly between 2 and 4. The prize that is awarded for completing the challenge is equal to $1500 * T$.
301 317  
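The selection step described in the updated paragraph can be sketched as follows. This is a minimal illustration, not the game's actual code: the challenge type names are hypothetical, and it assumes that $T$ is drawn uniformly from the integers 2 to 4 inclusive. How the probabilities are raised or lowered from player activity is not shown in this hunk and is left out.

```python
import random


def draw_challenge(type_probabilities, rng=random):
    """Pick the next challenge type, its repetition count T and its prize.

    type_probabilities: dict mapping a challenge type to its current probability.
    rng: the random module or a random.Random instance (for reproducibility).
    """
    types = list(type_probabilities)
    weights = [type_probabilities[t] for t in types]
    # A single multinomial draw over the current challenge probabilities.
    challenge_type = rng.choices(types, weights=weights, k=1)[0]
    # Assumption: T is a uniform integer in [2, 4], both ends included.
    repetitions = rng.randint(2, 4)
    prize = 1500 * repetitions
    return challenge_type, repetitions, prize


# Hypothetical challenge types and probabilities, for illustration only.
probabilities = {"sell_circle": 0.5, "buy_sequence": 0.3, "build_sequence": 0.2}
challenge = draw_challenge(probabilities, random.Random(0))
```

Under these assumptions the prize ranges from 3000 to 6000 points.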
302 318 \section{Experiments}
303 319  
304   -We recruited 50 people in total to test our game. We divided the participants into groups of 10 and made each of the following four tests with a different group.
305   -With the fifth group, we decided to do Experiment 1 a second time. Note that for every test session, we had to deal with one or two (maximum) last minute cancellation(s).
306   -In those cases, we replaced the missing player(s) by a lab member, who had played the game before.
307   -Each participant was playing the game for the first time (except for the replacement(s)). This was important in order to make sure that there was no bias
308   -coming from the experience gained by the players if they played a second time. Before starting the game session, the players were shown a document explaining
309   -the rules of the game and the interface. They were also asked to fill a questionnaire so that we could get information on the participants, such as their age,
310   -their abilities at puzzle solving and their experience with video games for example. For all the experiments, the game sessions lasted 45 minutes.
311   -The idea behind those different experiments was to evaluate the importance of the game features by removing them one at a time and evaluating
  320 +\subsection{Independent and dependent variables}
  321 +
  322 +In the context of this study, there were three independent variables: the market (present; not present), the skills (present; not present) and the
  323 +challenges (present; not present). Instead of trying all 8 possible combinations of independent variables, we decided to focus on four game conditions:
  324 +\begin{enumerate}
  325 +\item All features present (or A)
  326 +\item Everything except the market, hereafter referred to as ``No Market'' (or NM)
  327 +\item Everything except the skills, hereafter referred to as ``No Skills'' (or NS)
  328 +\item Everything except the challenges, hereafter referred to as ``No challenges'' (or NC)
  329 +\end{enumerate}
  330 +Focusing on those four playing conditions allowed us to repeat each experiment more times with different groups of players.
  331 +Moreover, the goal was to evaluate the importance of every game feature by removing them one at a time and evaluating
312 332 the effect on the results obtained by the players.
313 333  
314   -\subsection{Experiment 1: all features}
  334 +As for the dependent variables, we were interested in measuring the following:
  335 +\begin{enumerate}
  336 +\item Percentage of the problem solved
  337 +\item Total experience points earned by the players
  338 +\item Average sequence length of the sequences created by the players
  339 +\item Average number of colors in common of the sequences created by the players
  340 +\item Proportion of sequences of more than one color in common created by the players
  341 +\item Number of circles sold individually to another player
  342 +\item Number of sequences bought from other players (buyouts)
  343 +\end{enumerate}
315 344  
316   -In the first experiment, the participants played the game with all the feature available to them, {\em i.e.} the market, the challenge system and the skills. This experiment
317   -serves as the control.
  345 +\subsection{Game sessions}
318 346  
319   -\subsection{Experiment 2: no market}
  347 +We recruited 120 people in total to test our game. We divided the participants into groups of 10 and repeated three times each of the four
  348 +game conditions presented in the previous subsection.
  349 +%Note that for every test session, we had to deal with one or two (maximum) last minute cancellation(s).
  350 +%In those cases, we replaced the missing player(s) by a lab member, who had played the game before.
  351 +Each participant was playing the game for the first time, except for some people that were invited as replacements to deal with last minute cancellations.
  352 +%(except for the replacement(s)).
  353 +%Having mostly unexperienced players was important in order to make sure that there was no bias
  354 +%coming from the experience gained by the players if they played a second time.
  355 +Before starting each game session, the players were shown a document explaining
  356 +the rules of the game and the interface. They were also asked to fill in a questionnaire so that we could get information on the participants, such as their age,
  357 +their abilities at puzzle solving and their experience with video games for example. For all the experiments, the game session lasted 45 minutes.
320 358  
321   -In order to evaluate the effect of the market on the quality of the solutions produced by the players, the market was completely removed for this experiment. The players were not
322   -able to trade SNPs nor sequences of SNPs. The other features (the skills and challenges) were available.
323   -
324   -\subsection{Experiment 3: no challenges}
325   -
326   -For this experiment, the challenge system was removed from the game to evaluate its usefulness in guiding the players. The other features (the market and the skills) were available.
327   -
328   -\subsection{Experiment 4: no skills}
329   -
330   -In the fourth experiment, the skills were completely removed during the game sessions. The goal of this last experiment was to analyze the effect on the results when the players
331   -did not have the ability to choose one or many specializations and the bonuses attached to them.
332   -
333   -\section{Results and Discussion}
334   -
335 359 \subsection{Generating the graph}
336   -We generated one random colored multigraph that we used for all the 5 tests. Since the edges in the graph depend entirely on the colors of the vertices, it is
  360 +We generated one random colored multigraph that we used for all the 12 tests. Since the edges in the graph depend entirely on the colors of the vertices, it is
337 361 sufficient to generate only the colored vertices. For the tests, a graph containing 300 vertices and 6 different colors was generated. To randomly select the number
338 362 of colors for each vertex, a geometric distribution of parameter $p = 0.5$ was used, so that the vertices with a lot of colors are rarer. Once the number of colors was
339 363 selected for the vertex, the set of colors was selected uniformly.
340 364  
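The generation procedure above can be sketched as follows. The paper does not say what happens when the geometric draw exceeds the number of available colors, so this sketch assumes the count is capped at 6. The function name, the integer color labels and the `seed` parameter are also assumptions made for illustration.

```python
import random


def generate_colored_vertices(num_vertices=300, num_colors=6, p=0.5, seed=None):
    """Return one frozenset of colors per vertex."""
    rng = random.Random(seed)
    palette = list(range(num_colors))
    vertices = []
    for _ in range(num_vertices):
        # Geometric draw on {1, 2, ...}: count trials until the first success.
        # Assumption: the count is capped at num_colors.
        k = 1
        while k < num_colors and rng.random() >= p:
            k += 1
        # Choose the k colors uniformly among all subsets of size k.
        vertices.append(frozenset(rng.sample(palette, k)))
    return vertices


vertices = generate_colored_vertices(seed=42)
```

With $p = 0.5$, about half of the vertices receive a single color and each additional color is half as likely, so vertices with many colors are rare, as intended.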
341   -\subsection{Analysis of the 5 game sessions}
342   -%Coming back on the 4 tests, total game xp vs percentage of problem solved
343   -As mentioned in the Experiments section, the initial plan was to measure the impact of each feature by analyzing how much of the problem can be solved
344   -by the players in each of the game sessions. Interestingly, we observed a larger than expected variance in the participants' skills which made it practically
345   -impossible to compare one game session with another. Indeed, some players quickly understood all the rules of the game and how to maximize their score,
346   -while others struggled to make points during the whole session, even with the help of the authors who were monitoring the session.
  365 +%\subsection{Experiment 1: all features}
347 366  
348   -\begin{figure*}[htbp]
349   - \begin{center}
350   - \includegraphics[width=\halfWidth]{Figs/totalXP_session.pdf}
351   - \vspace{0cm}
352   - \caption{Total game experience and percentage of the problem solved for each of the 5 game sessions. 'XP' represents experience points. 'All' and 'All (2)'
353   - represent the two tests with all the features on, 'No skills' represents the test without the skills, 'No market' represents the test without the
354   - market and 'No chal.' represents the test without the challenges.
355   - }\label{fig_totalXP}
356   - \end{center}
357   -\end{figure*}
  367 +%In the first experiment, the participants played the game with all the feature available to them, {\em i.e.} the market, the challenge system and the skills. This experiment
  368 +%serves as the control.
358 369  
359   -As shown in Figure~\ref{fig_totalXP}, the percentage of the problem that was solved is nearly identical for all the tests (around $60\%$) except for the
360   -first test with all the features and the test with no challenges, in which the players in general performed worse (as indicated by the total experience
361   -points for those game sessions). In particular, the comparison of the first game session with all the features with the second one ('All' and 'All (2)')
362   -demonstrates that we cannot simply use the percentage of the solution found as a way to measure the impact of a feature. Even with the exact same game
363   -conditions, there is a big difference in the total experience and percentage of solutions found.
  370 +%\subsection{Experiment 2: no market}
364 371  
365   -Notice that a game session with many good players combining for a high total of experience points does not guarantee that
366   -a bigger percentage of the solution will be found by the players. This is due to the fact that, in the current state of the game, players can be selling
367   -sequences that correspond to a solution that was already found earlier. While it would be possible to lower the score of a solution (sequence) that already
368   -exists, it would be hard to explain to inexperienced players why one sequence is worth less than another with exactly the same length and number of colors
369   -in common. That is why we decided to not take into account the existing ({\em i.e.} already found) solutions in the scoring function.
  372 +%In order to evaluate the effect of the market on the quality of the solutions produced by the players, the market was completely removed for this experiment. The players were not
  373 +%able to trade SNPs nor sequences of SNPs. The other features (the skills and challenges) were available.
370 374  
371   -In the following sections, we show the impact of each feature based on different metrics.
  375 +%\subsection{Experiment 3: no challenges}
372 376  
373   -\subsection{The efficiency of the market}
  377 +%For this experiment, the challenge system was removed from the game to evaluate its usefulness in guiding the players. The other features (the market and the skills) were available.
374 378  
  379 +%\subsection{Experiment 4: no skills}
  380 +
  381 +%In the fourth experiment, the skills were completely removed during the game sessions. The goal of this last experiment was to analyze the effect on the results when the players
  382 +%did not have the ability to choose one or many specializations and the bonuses attached to them.
  383 +
  384 +\section{Results and Discussion}
  385 +
  386 +\subsection{Testing hypothesis 1: the efficiency of the market}
  387 +
375 388 The market system we implemented in the game allows the players to exchange circles and partial solutions (sequences). The main goal of the market
376 389 is to help the players in building longer sequences.
377 390  
... ... @@ -402,7 +415,7 @@
402 415 the two sessions for which we had the smallest total experience (see Figure~\ref{fig_totalXP}), both averages of sequence lengths were larger than the averages
403 416 of the game session without the market. Those observations confirm that the market is helping the players in the creation of longer sequences.
404 417  
405   -\subsection{The benefits of using a skill system}
  418 +\subsection{Testing hypothesis 2: the benefits of a skill system}
406 419  
407 420 We implemented the skill system for two reasons: (1) to encourage the players to level-up, because the reward is a skill point, and (2) to influence indirectly
408 421 the players into doing actions that are either improving the solutions collected by the system or helpful to the other players (which in the end will also
... ... @@ -534,7 +547,7 @@
534 547 \end{center}
535 548 \end{table}
536 549  
537   -\subsection{The usefulness of the challenge system}
  550 +\subsection{Testing hypothesis 3: the usefulness of the challenge system}
538 551  
539 552 The challenge system was implemented to analyze the current state of the game and guide the players towards doing actions that are currently needed. As mentioned
540 553 previously, five different challenge types were implemented in the game (see Section Challenge system for the complete list). In order to analyze the effect
... ... @@ -619,6 +632,38 @@
619 632 This can be explained by the fact that it was the hardest challenge. All the other challenges are more general and can be completed by
620 633 doing actions that are not specific to a certain subset of colors. Even if the market should be helpful in finding circles with the required
621 634 subset of colors, it seems highly probable that the players felt that this type of challenge was too hard and never tried to complete it.
  635 +
  636 +\subsection{Testing hypothesis 4: relationship between total experience and percentage solved}
  637 +%Coming back on the 4 tests, total game xp vs percentage of problem solved
  638 +As mentioned in the Experiments section, the initial plan was to measure the impact of each feature by analyzing how much of the problem can be solved
  639 +by the players in each of the game sessions. Interestingly, we observed a larger than expected variance in the participants' skills which made it practically
  640 +impossible to compare one game session with another. Indeed, some players quickly understood all the rules of the game and how to maximize their score,
  641 +while others struggled to make points during the whole session, even with the help of the authors who were monitoring the session.
  642 +
  643 +\begin{figure*}[htbp]
  644 + \begin{center}
  645 + \includegraphics[width=\halfWidth]{Figs/totalXP_session.pdf}
  646 + \vspace{0cm}
  647 + \caption{Total game experience and percentage of the problem solved for each of the 5 game sessions. 'XP' represents experience points. 'All' and 'All (2)'
  648 + represent the two tests with all the features on, 'No skills' represents the test without the skills, 'No market' represents the test without the
  649 + market and 'No chal.' represents the test without the challenges.
  650 + }\label{fig_totalXP}
  651 + \end{center}
  652 +\end{figure*}
  653 +
  654 +As shown in Figure~\ref{fig_totalXP}, the percentage of the problem that was solved is nearly identical for all the tests (around $60\%$) except for the
  655 +first test with all the features and the test with no challenges, in which the players in general performed worse (as indicated by the total experience
  656 +points for those game sessions). In particular, the comparison of the first game session with all the features with the second one ('All' and 'All (2)')
  657 +demonstrates that we cannot simply use the percentage of the solution found as a way to measure the impact of a feature. Even with the exact same game
  658 +conditions, there is a big difference in the total experience and percentage of solutions found.
  659 +
  660 +Notice that a game session with many good players combining for a high total of experience points does not guarantee that
  661 +a bigger percentage of the solution will be found by the players. This is due to the fact that, in the current state of the game, players can be selling
  662 +sequences that correspond to a solution that was already found earlier. While it would be possible to lower the score of a solution (sequence) that already
  663 +exists, it would be hard to explain to inexperienced players why one sequence is worth less than another with exactly the same length and number of colors
  664 +in common. That is why we decided to not take into account the existing ({\em i.e.} already found) solutions in the scoring function.
  665 +
  666 +In the following sections, we show the impact of each feature based on different metrics.
622 667  
623 668 \subsection{Understanding what makes a good player}
624 669