Austin

1601 E. 5th St. #109

Austin, Texas 78702

United States

Coimbatore

Module 002/2, Ground Floor, Tidel Park

Elcosez, Aerodome Post

Coimbatore, Tamil Nadu 641014 India

Coonoor

138G Grays Hill

Opp. BSNL GM Office, Sims Park

Coonoor, Tamil Nadu 643101 India

Laguna

Block 7, Lot 5,

Camella Homes Bermuda,

Phase 2B, Brgy. Banlic,

City of Cabuyao, Laguna,

Philippines

San Jose

Escazu Village

Calle 118B, San Rafael

San Jose, SJ 10203

Costa Rica

News & Insights

Crowdsourcing 102: Narrowing the Worker Pool

There are two primary ways to use managed crowdsourcing to collect data: result consensus analysis, and the creation of semi-private crowds.

Consensus Analysis

For clearly defined, “only one answer” questions—any piece of data where the format is always the same, usually a URL, an email, or a phone number—looping each task three or more times to achieve a single consensus answer is a common, useful method of verifying data. The format must be clearly defined in the data collection template. If two or more workers find the exact same answer, the confidence in the accuracy of the data point is high enough that the answer can be accepted as correct.
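As a rough illustration of the looping idea, the sketch below accepts an answer only when enough workers returned exactly the same value. The function name `consensus` and the `min_matches` parameter are hypothetical, chosen for this example; they are not taken from any particular crowdsourcing platform.

```python
from collections import Counter


def consensus(answers, min_matches=2):
    """Return the most common answer if at least `min_matches` workers
    gave exactly that value; otherwise return None (no consensus yet)."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count >= min_matches else None


# Three workers answered the same task; two of them agree.
accepted = consensus(["555-0100", "555-0100", "555-0199"])

# No two workers agree, so the task would go out for another loop.
unresolved = consensus(["555-0100", "555-0101", "555-0199"])
```

Here `accepted` holds the agreed value and `unresolved` is `None`, which a campaign could treat as the signal to send the task out again.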

There are two drawbacks to the consensus method:

  1. if you define consensus as two out of three values matching, then when two or more workers enter the same wrong answer, it will be accepted; and,
  2. since the computer merely matches strings of text to determine whether they are identical, a single unmatched character, such as a trailing slash, will send the question back out for another consensus loop, unnecessarily increasing costs.

These issues can be mitigated by data validation, which can strip out extraneous characters (like trailing slashes) and require others (e.g., an “@” sign in an email address), and by adjusting the definition of consensus.
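A minimal sketch of that kind of validation might look like the following. The helper names `normalize_url` and `has_at_sign` are assumptions made for illustration, and a real template would likely apply stricter checks than these.

```python
from collections import Counter


def normalize_url(raw):
    """Strip surrounding whitespace and trailing slashes so that
    otherwise identical URLs compare as equal."""
    return raw.strip().rstrip("/")


def has_at_sign(raw):
    """Require an "@" with text on both sides, a deliberately loose
    check rather than full email validation."""
    local, sep, domain = raw.strip().partition("@")
    return bool(local and sep and domain)


answers = ["http://example.com/", "http://example.com", "http://example.org"]

# Without cleanup, no two raw strings match.
raw_top = Counter(answers).most_common(1)[0]

# After cleanup, the first two answers count as the same value.
clean_top = Counter(normalize_url(a) for a in answers).most_common(1)[0]
```

In this example the trailing slash alone would have forced another loop; after normalization, two of the three answers agree.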

Semi-Private Crowds

“Only one answer” questions can also be answered by “master” workers, those with extremely high answer acceptance rates. These workers are often called “categorization masters” or “moderation masters.” The statistical reputation of the worker is strong enough to require only one pass, without looping. The pay rate for these workers is substantially higher, but confidence in their work quality justifies the additional expense. In these cases, the data set must be seeded with questions whose answers are already known, called “gold data.” As the campaign progresses, the project manager eliminates the workers with poor “gold” accuracy rates from the pool while rewarding those with high “gold” accuracy rates with bonuses and encouragement, via personal messaging, to keep up the good work.
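The gold-data check can be sketched roughly as follows. The data shapes, the function names, and both thresholds (`remove_below`, `bonus_at`) are hypothetical values picked for the example; the article does not specify cutoffs, and a project manager would set them per campaign.

```python
from collections import defaultdict


def gold_accuracy(responses, gold):
    """Compute each worker's accuracy on gold questions only.

    responses: iterable of (worker_id, question_id, answer)
    gold: dict mapping question_id -> known correct answer
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker_id, question_id, answer in responses:
        if question_id not in gold:
            continue  # ordinary question, no known answer to score against
        seen[worker_id] += 1
        if answer == gold[question_id]:
            correct[worker_id] += 1
    return {worker: correct[worker] / seen[worker] for worker in seen}


def split_pool(accuracy, remove_below=0.7, bonus_at=0.95):
    """Return (workers to remove, workers to reward), using assumed cutoffs."""
    removed = sorted(w for w, a in accuracy.items() if a < remove_below)
    bonus = sorted(w for w, a in accuracy.items() if a >= bonus_at)
    return removed, bonus


gold = {"q1": "a", "q2": "b"}
responses = [
    ("w1", "q1", "a"), ("w1", "q2", "b"), ("w1", "q3", "anything"),
    ("w2", "q1", "x"), ("w2", "q2", "b"),
]
accuracy = gold_accuracy(responses, gold)
removed, bonus = split_pool(accuracy)
```

Only answers to gold questions affect a worker's score; here `q3` has no known answer, so it is ignored when scoring `w1`.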

These kinds of cherry-picked crowds can also be effective in collecting open-ended data, where there is no single correct answer, but rather varying degrees of correctness. With no firm known answers to provide a baseline, this type of project is best tackled by master workers, and requires even more hands-on management up front. It’s best to start with a relatively small run of records and evaluate the returns as they come in; evaluating all the returns at once can be overwhelming. During the evaluation, each data point must be rated as excellent, good, needs improvement, or unacceptable and linked to a specific worker. While it is cumbersome to evaluate the first few hundred responses one by one, this process narrows the pool to only the best workers, who can be trusted to deliver consistent, accurate data. More hard work up front leads to much higher quality as the project progresses.
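One way to turn those per-response ratings into a narrowed pool is sketched below. The numeric scores assigned to each rating and the `min_mean` cutoff are assumptions for illustration only; the four rating labels come from the text above.

```python
from collections import defaultdict

# Assumed numeric scale for the four ratings; the weights are illustrative.
SCORES = {
    "excellent": 3,
    "good": 2,
    "needs improvement": 1,
    "unacceptable": 0,
}


def mean_rating_by_worker(evaluations):
    """evaluations: iterable of (worker_id, rating) pairs,
    one pair per evaluated data point."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for worker_id, rating in evaluations:
        totals[worker_id] += SCORES[rating]
        counts[worker_id] += 1
    return {worker: totals[worker] / counts[worker] for worker in totals}


def trusted_workers(evaluations, min_mean=2.0):
    """Keep workers whose average rating is at least `min_mean`."""
    means = mean_rating_by_worker(evaluations)
    return sorted(w for w, m in means.items() if m >= min_mean)


evaluations = [
    ("w1", "excellent"), ("w1", "good"),
    ("w2", "needs improvement"), ("w2", "unacceptable"),
]
pool = trusted_workers(evaluations)
```

Averaging is only one possible rule; a manager might instead drop any worker with a single "unacceptable" response, for example.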

Keep on top of the information industry 
with our ‘Data Content Best Practices’ newsletter:
