Commit eb881a5bd093498d359b58320ddaa10701051ca3

1 parent: 93cc8c4804

Exists in: master

### Abstract, Introduction and Experiments sections were updated

Showing **2 changed files** with **110 additions** and **65 deletions**

CHIpaper/MarketPaper.pdf

CHIpaper/MarketPaper.tex

... | ... | @@ -124,11 +124,11 @@ |

124 | 124 | |

125 | 125 | \begin{abstract} |

126 | 126 | Using a human computing game to solve a problem that has a large search space is not straightforward. The difficulty of using such an approach |

127 | -comes from the following facts: (1) it would be overwhelming for a player to show him or her the complete search space and at the same time, | |

127 | +comes from the following facts: (1) showing the complete search space to a single player would be overwhelming and, at the same time, | |

128 | 128 | (2) it is impossible to find an optimal solution without considering all the available data. In this paper, we present a human computing |

129 | 129 | game that uses a market, skills and a challenge system to help the players solve a graph problem in a collaborative manner. The results obtained |

130 | -during five game sessions of 10 players show that the market helps players to build larger solutions. We also show that a skill and a | |

131 | -challenge system can be used to influence and guide the players towards producing better solutions. | |

130 | +during 12 game sessions of 10 players show that the market helps players to build larger solutions. We also show that a skill system and, to a lesser extent, a | |

131 | +challenge system can be used to influence and guide the players towards producing better solutions. | |

132 | 132 | \end{abstract} |

133 | 133 | |

134 | 134 | \keywords{Human computing; Game; Graph problem; Market; Skills; Challenges} |

135 | 135 | |

... | ... | @@ -142,8 +142,20 @@ |

142 | 142 | |

143 | 143 | Historically, computation on graphs has proven to be a good model to study the performance of humans in solving complex combinatorial problems \cite{Kearns:2006aa}. Experiments have been conducted to evaluate the dynamics of crowds collaborating at solving graph problems \cite{DBLP:journals/cacm/Kearns12} but still, little is known about the efficiency of various modes of interaction. |

144 | 144 | |

145 | -In this paper, we propose a formal framework to study human collaborative solving. We design a market system coupled with skills and a challenge system to help the players solve combinatorial graph problems. In order to prevent any bias, we implement this system as a game that makes abstraction the graphical nature of the underlying problem. | |

145 | +In this paper, we propose a formal framework to study human collaborative solving. We design a market system coupled with skills and a challenge system to help the players solve combinatorial graph problems. In order to prevent any bias, we implement this system as a game that makes abstraction of the graphical nature of the underlying problem. | |

146 | 146 | |

147 | +\subsection{Hypotheses} | |

148 | + | |

149 | +The development of the game with its three main features, {\em i.e.} the market, the skills and the challenge system, was based on the following four hypotheses: | |

150 | +\begin{enumerate} | |

151 | + \item A market system will help the players build better solutions. | |

152 | + \item A skill system is useful to orient the players into doing specific actions that are beneficial to the game and other players. | |

153 | + \item A challenge system is effective in encouraging the players to do a specific action in the game. | |

154 | + \item The collected solutions are better when all three features are present in a game session, independently of the players' skills. | |

155 | +\end{enumerate} | |

156 | + | |

157 | +The goal of the work presented in this paper is to verify whether these hypotheses hold. | |

158 | + | |

147 | 159 | \section{Problem} |

148 | 160 | |

149 | 161 | The game was implemented to solve a graph problem, which is the problem of finding maximal cliques in a multigraph. |

... | ... | @@ -151,7 +163,9 @@ |

151 | 163 | between the vertices $v$ and $u$ for every color in $c(v) \cap c(u)$ ({\em i.e.}, one for every color that they have in common). In other words, there is |

152 | 164 | no colored edge between two vertices $v$ and $u$ for which $c(v) \cap c(u) = \emptyset$. Let $|C|$ be the total number of colors in the graph. |

153 | 165 | The problem is then the one of finding maximal cliques for each possible $n$ number of colors (where $1 \leq n \leq |C|$), {\em i.e.} cliques in which all |

154 | -the edges (and vertices) have the same $n$ colors. This problem has a worst time complexity of $O(|V|2^{|C|})$. | |

166 | +the edges (and vertices) have the same $n$ colors. | |

167 | +%This problem has a worst time complexity of $O(|V|2^{|C|})$. | |

168 | +A simple exact algorithm can solve the problem in $O(|V|2^{|C|})$ time. We conjecture that this is also the worst-case time complexity of the problem. | |

155 | 169 | |
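The paper text in this hunk mentions a simple exact algorithm running in $O(|V|2^{|C|})$ but does not spell it out. The following is a minimal Python sketch of one plausible reading, not the authors' implementation: for every non-empty subset of colors, collect all vertices whose color set contains that subset. The function name and the bitmask encoding of color sets are assumptions made for illustration.

```python
from typing import Dict, List


def maximal_cliques_by_color_subset(vertex_colors: List[int],
                                    num_colors: int) -> Dict[int, List[int]]:
    """Return, for each non-empty color subset, the vertices sharing all its colors.

    vertex_colors[v] is a bitmask (assumed encoding): bit c is set when
    vertex v has color c. Keys of the result are color-subset bitmasks.
    """
    cliques: Dict[int, List[int]] = {}
    # Enumerate the 2^|C| - 1 non-empty subsets of colors.
    for subset in range(1, 1 << num_colors):
        # A vertex belongs to the clique when it carries every color in the subset.
        members = [v for v, mask in enumerate(vertex_colors)
                   if (mask & subset) == subset]
        if members:
            cliques[subset] = members
    return cliques


# Tiny illustration with 3 colors and 4 vertices.
example = maximal_cliques_by_color_subset([0b011, 0b001, 0b111, 0b100], 3)
```

Each subset requires one pass over the vertices, and the containment test is a constant-time bit operation, which gives the $O(|V|2^{|C|})$ bound under this encoding.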

156 | 170 | This problem was chosen for two reasons. First, it can be solved quickly by a computer when the number of colors is small, thus making it possible to compute the exact |

157 | 171 | solution and measure the percentage of the solution that is found by the players in a game session. Second, this problem can easily be translated into a color |

... | ... | @@ -159,6 +173,8 @@ |

159 | 173 | colors of the vertices, it is possible to show the players only the colored vertices. To solve the problem, the players have to find the largest sets |

160 | 174 | of circles with colors in common, for all possible subsets of colors. |

161 | 175 | |

176 | + | |

177 | + | |

162 | 178 | \section{Presentation of the game} |

163 | 179 | |

164 | 180 | \subsection{Goal of the game} |

165 | 181 | |

166 | 182 | |

167 | 183 | |

168 | 184 | |

169 | 185 | |

170 | 186 | |

171 | 187 | |

172 | 188 | |

173 | 189 | |

174 | 190 | |

175 | 191 | |

176 | 192 | |

177 | 193 | |

... | ... | @@ -297,81 +313,78 @@ |

297 | 313 | |

298 | 314 | Basically, the system continuously monitors the activities of the players and decreases or increases the probabilities of each challenge type. |

299 | 315 | The next challenge is then selected using a multinomial sampling on these probabilities. The number of times $T$ that the challenge-related action must be |

300 | -completed is selected randomly between 3 and 5. The prize that is awarded for completing the challenge is equal to $1500 * T$. | |

316 | +completed is selected randomly between 2 and 4. The prize that is awarded for completing the challenge is equal to $1500 * T$. | |

301 | 317 | |
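The challenge selection described in this hunk can be sketched in Python as follows. This is an illustrative sketch only: the challenge type names are hypothetical, the monitoring step that adjusts the probabilities is left out, and the draw of $T$ is assumed to be uniform over the integers 2 to 4 inclusive, which the paper does not state explicitly.

```python
import random
from typing import Dict, Optional, Tuple

PRIZE_PER_REPETITION = 1500  # prize is 1500 * T according to the paper


def draw_challenge(probabilities: Dict[str, float],
                   rng: Optional[random.Random] = None) -> Tuple[str, int, int]:
    """Sample the next challenge type, its repetition count T, and its prize."""
    rng = rng or random.Random()
    types = list(probabilities)
    weights = [probabilities[t] for t in types]
    # Multinomial sampling of a single challenge type from the current probabilities.
    challenge_type = rng.choices(types, weights=weights, k=1)[0]
    # Assumption: T is uniform over {2, 3, 4}.
    repetitions = rng.randint(2, 4)
    return challenge_type, repetitions, PRIZE_PER_REPETITION * repetitions


# Hypothetical challenge types and probabilities, for illustration only.
current = {"sell_circles": 0.5, "buy_sequences": 0.3, "build_long_sequence": 0.2}
challenge = draw_challenge(current, random.Random(1))
```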

302 | 318 | \section{Experiments} |

303 | 319 | |

304 | -We recruited 50 people in total to test our game. We divided the participants into groups of 10 and made each of the following four tests with a different group. | |

305 | -With the fifth group, we decided to do Experiment 1 a second time. Note that for every test session, we had to deal with one or two (maximum) last minute cancellation(s). | |

306 | -In those cases, we replaced the missing player(s) by a lab member, who had played the game before. | |

307 | -Each participant was playing the game for the first time (except for the replacement(s)). This was important in order to make sure that there was no bias | |

308 | -coming from the experience gained by the players if they played a second time. Before starting the game session, the players were shown a document explaining | |

309 | -the rules of the game and the interface. They were also asked to fill a questionnaire so that we could get information on the participants, such as their age, | |

310 | -their abilities at puzzle solving and their experience with video games for example. For all the experiments, the game sessions lasted 45 minutes. | |

311 | -The idea behind those different experiments was to evaluate the importance of the game features by removing them one at a time and evaluating | |

320 | +\subsection{Independent and dependent variables} | |

321 | + | |

322 | +In the context of this study, there were three independent variables: the market (present; not present), the skills (present; not present) and the | |

323 | +challenges (present; not present). Instead of trying all 8 possible combinations of independent variables, we decided to focus on four game conditions: | |

324 | +\begin{enumerate} | |

325 | +\item All features present (or A) | |

326 | +\item Everything except the market, hereafter referred to as ``No Market'' (or NM) | |

327 | +\item Everything except the skills, hereafter referred to as ``No Skills'' (or NS) | |

328 | +\item Everything except the challenges, hereafter referred to as ``No Challenges'' (or NC) | |

329 | +\end{enumerate} | |

330 | +Focusing on these four game conditions allowed us to repeat each one more often with different groups of players. | |

331 | +Moreover, the goal was to evaluate the importance of every game feature by removing them one at a time and evaluating | |

312 | 332 | the effect on the results obtained by the players. |

313 | 333 | |

314 | -\subsection{Experiment 1: all features} | |

334 | +As for the dependent variables, we were interested in measuring the following: | |

335 | +\begin{enumerate} | |

336 | +\item Percentage of the problem solved | |

337 | +\item Total experience points earned by the players | |

338 | +\item Average length of the sequences created by the players | |

339 | +\item Average number of colors in common of the sequences created by the players | |

340 | +\item Proportion of sequences of more than one color in common created by the players | |

341 | +\item Number of circles sold individually to another player | |

342 | +\item Number of sequences bought from other players (buyouts) | |

343 | +\end{enumerate} | |

315 | 344 | |

316 | -In the first experiment, the participants played the game with all the feature available to them, {\em i.e.} the market, the challenge system and the skills. This experiment | |

317 | -serves as the control. | |

345 | +\subsection{Game sessions} | |

318 | 346 | |

319 | -\subsection{Experiment 2: no market} | |

347 | +We recruited 120 people in total to test our game. We divided the participants into groups of 10 and ran each of the four | |

348 | +game conditions presented in the previous subsection three times. | |

349 | +%Note that for every test session, we had to deal with one or two (maximum) last minute cancellation(s). | |

350 | +%In those cases, we replaced the missing player(s) by a lab member, who had played the game before. | |

351 | +Each participant was playing the game for the first time, except for some people who were invited as replacements for last-minute cancellations. | |

352 | +%(except for the replacement(s)). | |

353 | +%Having mostly unexperienced players was important in order to make sure that there was no bias | |

354 | +%coming from the experience gained by the players if they played a second time. | |

355 | +Before starting each game session, the players were shown a document explaining | |

356 | +the rules of the game and the interface. They were also asked to fill in a questionnaire so that we could get information on the participants, such as their age, | |

357 | +their puzzle-solving abilities and their experience with video games. In all experiments, each game session lasted 45 minutes. | |

320 | 358 | |

321 | -In order to evaluate the effect of the market on the quality of the solutions produced by the players, the market was completely removed for this experiment. The players were not | |

322 | -able to trade SNPs nor sequences of SNPs. The other features (the skills and challenges) were available. | |

323 | - | |

324 | -\subsection{Experiment 3: no challenges} | |

325 | - | |

326 | -For this experiment, the challenge system was removed from the game to evaluate its usefulness in guiding the players. The other features (the market and the skills) were available. | |

327 | - | |

328 | -\subsection{Experiment 4: no skills} | |

329 | - | |

330 | -In the fourth experiment, the skills were completely removed during the game sessions. The goal of this last experiment was to analyze the effect on the results when the players | |

331 | -did not have the ability to choose one or many specializations and the bonuses attached to them. | |

332 | - | |

333 | -\section{Results and Discussion} | |

334 | - | |

335 | 359 | \subsection{Generating the graph} |

336 | -We generated one random colored multigraph that we used for all the 5 tests. Since the edges in the graph depend entirely on the colors of the vertices, it is | |

360 | +We generated one random colored multigraph that we used for all the 12 tests. Since the edges in the graph depend entirely on the colors of the vertices, it is | |

337 | 361 | sufficient to generate only the colored vertices. For the tests, a graph containing 300 vertices and 6 different colors was generated. To randomly select the number |

338 | 362 | of colors for each vertex, a geometric distribution of parameter $p = 0.5$ was used, so that the vertices with a lot of colors are rarer. Once the number of colors was |

339 | 363 | selected for the vertex, the set of colors was selected uniformly. |

340 | 364 | |
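The graph generation procedure in the paragraph above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' generator: the function name is hypothetical, and because a geometric draw can exceed the number of available colors, the sketch caps it at `num_colors`, a choice the paper does not specify.

```python
import random
from typing import List, Optional, Set


def generate_colored_vertices(num_vertices: int = 300,
                              num_colors: int = 6,
                              p: float = 0.5,
                              rng: Optional[random.Random] = None) -> List[Set[int]]:
    """Generate the color set of each vertex; edges follow from shared colors."""
    rng = rng or random.Random()
    vertices: List[Set[int]] = []
    for _ in range(num_vertices):
        # Geometric draw: number of Bernoulli(p) trials up to the first success.
        k = 1
        while rng.random() >= p:
            k += 1
        # Assumption: cap the draw at the number of available colors.
        k = min(k, num_colors)
        # Choose the k colors uniformly without replacement.
        vertices.append(set(rng.sample(range(num_colors), k)))
    return vertices


graph = generate_colored_vertices(rng=random.Random(0))
```

With $p = 0.5$, about half of the vertices receive a single color and each additional color is half as likely, so vertices with many colors are rare, as the paper intends.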

341 | -\subsection{Analysis of the 5 game sessions} | |

342 | -%Coming back on the 4 tests, total game xp vs percentage of problem solved | |

343 | -As mentioned in the Experiments section, the initial plan was to measure the impact of each feature by analyzing how much of the problem can be solved | |

344 | -by the players in each of the game sessions. Interestingly, we observed a larger than expected variance in the participants' skills which made it practically | |

345 | -impossible to compare one game session with an other. Indeed, some players quickly understood all the rules of the game and how to maximize their score, | |

346 | -while others struggled to make points during the whole session, even with the help of the authors who were monitoring the session. | |

365 | +%\subsection{Experiment 1: all features} | |

347 | 366 | |

348 | -\begin{figure*}[htbp] | |

349 | - \begin{center} | |

350 | - \includegraphics[width=\halfWidth]{Figs/totalXP_session.pdf} | |

351 | - \vspace{0cm} | |

352 | - \caption{Total game experience and percentage of the problem solved for each of the 5 game sessions. 'XP' represents experience points. 'All' and 'All (2)' | |

353 | - represent the two tests with all the features on, 'No skills' represents the test without the skills, 'No market' represents the test without the | |

354 | - market and 'No chal.' represents the test without the challenges. | |

355 | - }\label{fig_totalXP} | |

356 | - \end{center} | |

357 | -\end{figure*} | |

367 | +%In the first experiment, the participants played the game with all the feature available to them, {\em i.e.} the market, the challenge system and the skills. This experiment | |

368 | +%serves as the control. | |

358 | 369 | |

359 | -As shown in Figure~\ref{fig_totalXP}, the percentage of the problem that was solved is nearly identical for all the tests (around $60\%$) except for the | |

360 | -first test with all the features and the test with no challenges, in which the players in general performed worse (as indicated by the total experience | |

361 | -points for those game sessions). In particular, the comparison of the first game session with all the features with the second one ('All' and 'All (2)') | |

362 | -demonstrates that we cannot simply use the percentage of the solution found as a way to measure the impact of a feature. Even with the exact same game | |

363 | -conditions, there is a big difference in the total experience and percentage of solutions found. | |

370 | +%\subsection{Experiment 2: no market} | |

364 | 371 | |

365 | -Notice that a game session with many good players combining for a high total of experience points does not guarantee that | |

366 | -a bigger percentage of the solution will be found by the players. This is due to the fact that, in the current state of the game, players can be selling | |

367 | -sequences that correspond to a solution that was already found earlier. While it would be possible to lower the score of a solution (sequence) that already | |

368 | -exists, it would be hard to explain to unexperienced players why one sequence is worth less than another with exactly the same length and number of colors | |

369 | -in common. That is why we decided to not take into account the existing ({\em i.e.} already found) solutions in the scoring function. | |

372 | +%In order to evaluate the effect of the market on the quality of the solutions produced by the players, the market was completely removed for this experiment. The players were not | |

373 | +%able to trade SNPs nor sequences of SNPs. The other features (the skills and challenges) were available. | |

370 | 374 | |

371 | -In the following sections, we show the impact of each feature based on different metrics. | |

375 | +%\subsection{Experiment 3: no challenges} | |

372 | 376 | |

373 | -\subsection{The efficiency of the market} | |

377 | +%For this experiment, the challenge system was removed from the game to evaluate its usefulness in guiding the players. The other features (the market and the skills) were available. | |

374 | 378 | |

379 | +%\subsection{Experiment 4: no skills} | |

380 | + | |

381 | +%In the fourth experiment, the skills were completely removed during the game sessions. The goal of this last experiment was to analyze the effect on the results when the players | |

382 | +%did not have the ability to choose one or many specializations and the bonuses attached to them. | |

383 | + | |

384 | +\section{Results and Discussion} | |

385 | + | |

386 | +\subsection{Testing hypothesis 1: the efficiency of the market} | |

387 | + | |

375 | 388 | The market system we implemented in the game allows the players to exchange circles and partial solutions (sequences). The main goal of the market |

376 | 389 | is to help the players in building longer sequences. |

377 | 390 | |

... | ... | @@ -402,7 +415,7 @@ |

402 | 415 | the two sessions for which we had the smallest total experience (see Figure~\ref{fig_totalXP}), both averages of sequence lengths were larger than the averages |

403 | 416 | of the game session without the market. Those observations confirm that the market is helping the players in the creation of longer sequences. |

404 | 417 | |

405 | -\subsection{The benefits of using a skill system} | |

418 | +\subsection{Testing hypothesis 2: the benefits of a skill system} | |

406 | 419 | |

407 | 420 | We implemented the skill system for two reasons: (1) to encourage the players to level-up, because the reward is a skill point, and (2) to influence indirectly |

408 | 421 | the players into doing actions that are either improving the solutions collected by the system or helpful to the other players (which in the end will also |

... | ... | @@ -534,7 +547,7 @@ |

534 | 547 | \end{center} |

535 | 548 | \end{table} |

536 | 549 | |

537 | -\subsection{The usefulness of the challenge system} | |

550 | +\subsection{Testing hypothesis 3: the usefulness of the challenge system} | |

538 | 551 | |

539 | 552 | The challenge system was implemented to analyze the current state of the game and guide the players towards doing actions that are currently needed. As mentioned |

540 | 553 | previously, five different challenge types were implemented in the game (see Section Challenge system for the complete list). In order to analyze the effect |

... | ... | @@ -619,6 +632,38 @@ |

619 | 632 | This can be explained by the fact that it was the hardest challenge. All the other challenges are more general and can be completed by |

620 | 633 | doing actions that are not specific to a certain subset of colors. Even if the market should be helpful in finding circles with the required |

621 | 634 | subset of colors, it seems highly probable that the players felt that this type of challenge was too hard and never tried to complete it. |

635 | + | |

636 | +\subsection{Testing hypothesis 4: relationship between total experience and percentage solved} | |

637 | +%Coming back on the 4 tests, total game xp vs percentage of problem solved | |

638 | +As mentioned in the Experiments section, the initial plan was to measure the impact of each feature by analyzing how much of the problem can be solved | |

639 | +by the players in each of the game sessions. Interestingly, we observed a larger-than-expected variance in the participants' skills, which made it practically | |

640 | +impossible to compare one game session with another. Indeed, some players quickly understood all the rules of the game and how to maximize their score, | |

641 | +while others struggled to make points during the whole session, even with the help of the authors who were monitoring the session. | |

642 | + | |

643 | +\begin{figure*}[htbp] | |

644 | + \begin{center} | |

645 | + \includegraphics[width=\halfWidth]{Figs/totalXP_session.pdf} | |

646 | + \vspace{0cm} | |

647 | + \caption{Total game experience and percentage of the problem solved for each of the 5 game sessions. 'XP' represents experience points. 'All' and 'All (2)' | |

648 | + represent the two tests with all the features on, 'No skills' represents the test without the skills, 'No market' represents the test without the | |

649 | + market and 'No chal.' represents the test without the challenges. | |

650 | + }\label{fig_totalXP} | |

651 | + \end{center} | |

652 | +\end{figure*} | |

653 | + | |

654 | +As shown in Figure~\ref{fig_totalXP}, the percentage of the problem that was solved is nearly identical for all the tests (around $60\%$) except for the | |

655 | +first test with all the features and the test with no challenges, in which the players in general performed worse (as indicated by the total experience | |

656 | +points for those game sessions). In particular, the comparison of the first game session with all the features with the second one ('All' and 'All (2)') | |

657 | +demonstrates that we cannot simply use the percentage of the solution found as a way to measure the impact of a feature. Even with the exact same game | |

658 | +conditions, there is a big difference in the total experience and percentage of solutions found. | |

659 | + | |

660 | +Notice that a game session with many good players combining for a high total of experience points does not guarantee that | |

661 | +a bigger percentage of the solution will be found by the players. This is due to the fact that, in the current state of the game, players can be selling | |

662 | +sequences that correspond to a solution that was already found earlier. While it would be possible to lower the score of a solution (sequence) that already | |

663 | +exists, it would be hard to explain to inexperienced players why one sequence is worth less than another with exactly the same length and number of colors | |

664 | +in common. That is why we decided not to take into account the existing ({\em i.e.} already found) solutions in the scoring function. | |

665 | + | |

666 | +In the following sections, we show the impact of each feature based on different metrics. | |

622 | 667 | |

623 | 668 | \subsection{Understanding what makes a good player} |

624 | 669 |