Discussion in 'Requesters' started by Bobby, Jun 8, 2018.
Hey y'all, just a personal note from me.
Everyone knows I'm pretty split on both sides of the platform these days. I Turk a ton, I produce tools as much as I can, teach as much as I can, etc etc. I also do a LOT of work for various Requesters on the other side of the platform & I can say with some certainty that I totally understand where GeoHIVEs is coming from on this one.
What @Bobby is bringing to y'all is about as nice & serious a discussion opportunity as you'll ever get about what is actually a pretty big concern. This is legitimately meaningful discussion as much as he's a goofball & puts a fun spin on it. I'm going to open this up to other communities to come participate as well because it's legit, businesses need to know/understand how we're able to fly through some tasks while still doing them legitimately (especially because a lot of people fly through them WITHOUT doing them legitimately), and you'll probably never get a better shot at providing a clear proof of concept documentation on it all than this thread.
So, go ham. But, as always, be respectful. When someone on the other side of the table goes "yo, we don't get it, it looks like bad data to us, can you help us understand?" you gotta take the olive branch & work with it so please no TO level content.
Loved X1000000000 !
I don't have the qualification but maybe open up the qualification to more people; allow each one to do only X number of HITs, which might stop people from racing through them to complete as many as they can. Unfortunately, that's the nature of work which pays per assignment -- when it pays terribly or just average, people need to do them quick to make them worthwhile. If something pays above average or excellent, people want to capitalize on it before the batch is gone.
How do the others feel about this suggestion? Thanks for the thoughtful response.
The GH team is also discussing modifying the value of your qual, should they feel you are gaming, instead of revoking a qual altogether. So instead of being revoked, you'd keep the qual, but not be able to take the HIT. I like this idea, as it won't reflect in your records as a revoke.
This isn't a bad idea. I've definitely stepped up my game when I've gotten the email about my value being reduced before.
The problem with the proof is that the actual batch is different than the one we've been rejected for.
I have attached proof of my rejection rate. I had 6 rejections (for dumb reasons like the requester not finding my worker id) in 2 years and 19 from the Geohive batch in a day. I don't think I've done them so fast and even the first hit was rejected so I was definitely not fast on that one. Most of us are working on multiple monitors so while a hit loads on one monitor we work the hit from the other one. I sometimes miss the submit button on one monitor and then find myself submitting 2 hits at the same time which might be telling your system that I worked too fast? And please don't forget about detailed instructions.
I don't mind y'all "hiring" new workers personally if it's what it takes. Like someone else said though, at some point there becomes a weird "race condition" where taking the time to read/really absorb the instructions feels "wrong" to a worker because the batch will be dropping at a rate too fast to make learning the instructions worth it. That's 100% wrong on the worker's part, I'm not defending it, just trying to paint a little nuance to the picture because I know it happens. Folks feel pressured to get out some HITs because they're straight up not paid to read instructions/etc and too many requesters let them get away with that behavior, lol. It's a crap sandwich on both ends of the sub.
FWIW I don't think there is a substantial difference but the communication might help if you're not doing just a binary "no longer qual'd" adjustment. A qual revocation doesn't hurt a worker's standing like a rejection (or block) does AFAIK.
I've got thoughts on data quality I'm trying to jot down for y'all, but one thing that would be helpful to know is if you're using metrics other than just speed to determine quality? It seems to be the main focus of the video but I've always just assumed you guys would use golden maps / known answer maps embedded which is pretty standard for most heavy batch work but idk. Speed is generally seen by workers as the worst metric to use outside of majority rules (which I doubt y'all can even use for these current hits up but I haven't looked at them much?)
From the worker perspective, most of the frustration from revoking quals or rejecting HITs comes from the lack of explanation/communication usually accompanied by that. Honestly, if you were like "yo we rejected this worker because he failed 80% of the golden questions we gave him" I don't think many people would be upset (well, that dude might be, but no one else is gonna blame you tbh).
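For anyone who wants to picture the golden question idea, here's a rough sketch of a known-answer check that also produces the explanation workers keep asking for. None of this is GeoHIVE's actual system: the `GoldenResult` record, the `golden_feedback` function and the 80% pass mark are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class GoldenResult:
    """Hypothetical record of how one worker did on embedded known-answer images."""
    worker_id: str
    answered: int  # golden questions the worker saw
    correct: int   # golden questions the worker got right


def golden_feedback(result: GoldenResult, min_accuracy: float = 0.8):
    """Return (passed, message) so a rejection or qual change always comes with a reason.

    min_accuracy is an assumed pass mark, not a known GeoHIVE threshold.
    """
    if result.answered == 0:
        # Nothing to judge yet, so don't penalize the worker.
        return True, "No known-answer questions seen yet."
    accuracy = result.correct / result.answered
    passed = accuracy >= min_accuracy
    message = (
        f"You got {result.correct}/{result.answered} "
        f"known-answer questions right ({accuracy:.0%})."
    )
    if not passed:
        message += f" We need at least {min_accuracy:.0%} to accept the work."
    return passed, message


if __name__ == "__main__":
    ok, note = golden_feedback(GoldenResult("WORKER_A", answered=10, correct=2))
    print(ok, note)
```

The point of returning the message alongside the pass/fail flag is that the requester never has a decision without a sentence they can paste into the rejection or qual email.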
This is definitely a good approach. Honestly, it really all boils down to the communication. I know you can't be here to do that forever now, but so much of this stuff is reliant on it haha.
A message saying the qual has dropped with an explanation on why and the opportunity to fix/raise it again would probably be a good system to look into implementing.
Basically if I'm doing something wrong, I'd love to get some feedback on WHAT it is I'm doing wrong and maybe give me a chance to do it right. I don't have a problem with rejections since you couldn't use the data. I mean, it would be ideal if you didn't reject, but...it's totally within your rights to do so. But I think you'll get better work and more community involvement if you share feedback on why you rejected and don't just pull the qual with nothing else. Get a dialog going.
Suggestion is maybe a 2 stage qual, with value 90 meaning you haven't done anything wrong yet. Then drop it to 80 with a message giving some feedback as to why it got dropped, if the situation warrants it. And then if things still don't improve, drop the qual to below 80 and make that the cutoff point for the work.
I mean, if somebody is obviously just hitting 1 1 1 1 1 over and over again or whatever, then sure, nuke 'em. I'm just trying to save the turker who means well, but misread something and is making an honest mistake while trying to provide meaningful data.
Easy solution: qual test with submitting an example of the work. Quals/qual tests are basically Pokemon "gotta catch 'em all!" and everyone shuts up and listens when there's a qual, even unpaid (though paid will get a better reception). Learn the instructions from the qual and you don't have to waste money or time.
Thanks for this @Bobby, it's really awesome to be treated as a human instead of faceless button pushers like we feel some requesters think of us as, and it's equally awesome to see the requester as a human too. Both sides of the table lose sight of that which I think accounts for a lot of poor quality which would be avoided when you actually share a human interaction with each other to work together exactly like this. I've said the word human so many times I think I just convinced myself I actually am a robot pretending to fit in.
I agree with what @ChrisTurk and @ThisPoorGuy said. A qual that also gave feedback at the same time would be nice. I think 3 stages gives more room for improvement: 100 -> 90 (1st feedback) -> 80 (final warning) -> 70 unqualified, but that also gives more room for errors in your data. I don't know if that's a qual per batch type, or overall ability to work on geo's hits.
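To make the staged idea concrete, here's a minimal sketch of the 100 -> 90 -> 80 -> 70 ladder with a feedback message at every drop. It's only an illustration of the suggestion in this thread: the `STAGES` list, the `demote` and `can_work` helpers and the message wording are all hypothetical, and it assumes the HITs would require a qual value of 80 or higher.

```python
# Qual values from the 3-stage suggestion; 70 means unqualified.
STAGES = [100, 90, 80, 70]
CUTOFF = 80  # assumed: HITs require a qual value >= 80

STAGE_NOTES = {
    90: "First feedback",
    80: "Final warning",
    70: "No longer qualified",
}


def can_work(qual_value: int) -> bool:
    """True if a worker at this qual value could still accept the HITs."""
    return qual_value >= CUTOFF


def demote(current_value: int, reason: str):
    """Drop a worker one stage and return (new_value, message).

    Raises ValueError if current_value is not one of STAGES.
    A worker already at the bottom stage stays there.
    """
    idx = STAGES.index(current_value)
    new_value = STAGES[min(idx + 1, len(STAGES) - 1)]
    message = f"{STAGE_NOTES[new_value]}: your qual is now {new_value}. Reason: {reason}"
    return new_value, message


if __name__ == "__main__":
    value = 100
    for why in ("missed buildings in 3 images", "same issue after feedback"):
        value, note = demote(value, why)
        print(note, "| can still work:", can_work(value))
```

Whether the ladder is per batch type or for all of geo's HITs is still an open question; the sketch works the same either way, you'd just keep one value per worker per qual.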
Someone had mentioned limiting workers to a max amount of hits per batch, but that seems to punish workers that submit good data while working quickly. Just reject/qual block the workers giving bad data, then newer workers or other good workers can fill the void.
I would love to know my geohive stats so I could tweak my workflow to give good data while trying to maintain a high hourly.
This 100%. Idc about people legit scamming, they're not even a consideration, block them too while you're at it.
If you're going to reject work for y'alls tasks it really needs to be well done / reasoned. GeoHIVEs shoves a LOT of work into a single HIT. If a worker does 7/10 slides correctly and GeoHIVEs rejects the entire thing that's kinda crappy. Amazon does that and it's led to me not working on their tasks anymore. They'd shove 4 pieces of work into 1 HIT and if you got 1 wrong they rejected the whole thing --- but wait, I could have done 75% of the work correctly but because they set up their work in a crappy manner I'm in an all-or-nothing gamble which isn't a good situation to be in given the inherent risk of rejections.
If someone screws up a script & submits an entire HIT blank, by all means reject it and append a "Sorry, we have a policy of not paying for completely blank work, please be more careful in the future" or something.
But IMO it'd be really, really ideal if the tasks could be split up a little more if there's going to be an in-place rejection policy. I know that splitting up HITs sucks for workers who don't know how to multi-tab but at the same time the current pagination in the HITs up right now is equally as yucky.
Sometimes you get out what you put in:
I'm splitting this into its own thing, because the above made me go look at the HITs currently up.
To be honest, the current tasks up right now look lazy & sloppy and give off, to me, the impression that the GeoHIVE team didn't put any work into making them. There is no on-page formatting, no keybinds, nothing to help workers help the team. Frankly, that's on the GeoHIVE team, at some point it can't always be up to workers to produce community scripts that fix the lackluster (I'm being nice here) effort put in on the part of the Req to make the HITs a desirable work interface to interact with.
You ever work in an office with a total slob? Throws his crap everywhere on the desk, etc etc? That's my honest first impression of the HITs currently up (Example)
So much of that can be fixed with HTML/CSS/JS. A few hours of concerted effort by a GeoHIVEs dev to consider what that work experience is like would pay off huge dividends in improving the "QoL" of the worker, which is going to attract better workers. I don't have time in my day to spend 1-3hrs reformatting the entire HIT for the GeoHIVEs team, I personally will just choose to go work on something else.
In my opinion when no up-front effort is put into creating the HIT, some of the quality workers the team would hope to attract are just going to go elsewhere, and the availability of the HITs to people who are just going to speed-run them is going to increase. Good workers don't spend their time unpaid fixing mistakes on the other end, they go do work that is set up correctly from the get-go.
Now, all of that is not going to solve your problem of folks speed-running the tasks, but it'll at least help the folks who won't do that interact with the work available. It sort of ties into the $/hr of the tasks without having to just say "increase the pay" which, I understand, is not helpful lol.
But if I wanted to show an example of how quickly I could do these, I'd have to spend a minimum of an hour, but probably 2 or 3, completely rewriting the HIT for GeoHIVEs to show just how fast I could do these - that's not really workable haha.
Agree that requester feedback really goes a long way toward helping a turker understand if what they're submitting is considered good or bad data. A simple message like "keep up the good work" or "you missed a couple of buildings", something to help convey to the user how the work they are submitting is going. Leave feedback on the hit itself.
This won't be a popular opinion but how about adding a default timer to each image before you can advance to the next image? Something like a ten to twenty second delay before being able to move on to the next image. This should weed out the "let's submit hits in ten to fifteen seconds" turkers. If you're still getting bad data after that, then you kind of know that the turker is just blindly pressing 1 without looking at the image, and then qual revokes and rejections should be handed out.
Use golden maps like @ChrisTurk said: maps where you already know there are "x" buildings or that "y" bridge is in the image. Other requesters do use this on their hits and there have been some turkers who have had their accounts suspended for just blindly rushing through hits thinking that no one will check what they submit.
A tiered qual score on the hits would be useful. If your qual score drops then you know that you need some work on completing the hits correctly; at the same time, if your qual score has gone up then you know that you're on the right track with the hits. Ycharts is a requester that is known to do this.
If bobby doesn't respond TL;DR; to this post I'm going to be seriously disappointed.
But yes. This.
I went to go try & speedrun the HITs they have up.. looked and was like "no, sorry, can't do it" LOL. I miss the old TomNod interface for everything. It was a little janky, and I definitely reformatted a lot of TomNod to speed things up, but the stuff they have up right now is just unworkable.
The HITs currently up are pretty bad in both content and style, and they aren't something I would consider doing after trying a few. The $.10 batches over the past week or so look similar, but flowed better and ended up being pretty decent with a relatively simple script... I get what you're saying about geo making them easier for workers, but I'm okay with them driving people away with hits that look like they were cool in the Windows 3.1 days, because I'll keep on keeping on.
Honestly, I used to love GeoHives. These most recent ones though are just awful so I haven't been doing them. There's no way to zoom in on the photograph, so it's quite hard to tell at a quick glance if something is a road/house/etc. There's just too much ambiguity in them for the pay, so I haven't been touching them.
10-20 seconds per image would bring the hourly to $1.80 - $3.60/hr on the latest $.10 batches. Not really a reasonable rate.
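Quick sanity check on that math, as a sketch. The one thing assumed here is 10 images per HIT, which isn't stated in the post but matches the "7/10 slides" mentioned earlier and is what makes the quoted range work out. It also ignores load and submit time, so real numbers would be a bit lower. The `hourly_rate` helper is just for illustration.

```python
def hourly_rate(pay_per_hit: float, images_per_hit: int, seconds_per_image: float) -> float:
    """Best-case hourly earnings if every image takes exactly seconds_per_image."""
    seconds_per_hit = images_per_hit * seconds_per_image
    hits_per_hour = 3600 / seconds_per_hit
    return pay_per_hit * hits_per_hour


if __name__ == "__main__":
    PAY = 0.10    # the $.10 batches from the thread
    IMAGES = 10   # assumed images per HIT, not confirmed by the requester
    for delay in (10, 20):
        print(f"{delay}s per image -> ${hourly_rate(PAY, IMAGES, delay):.2f}/hr")
```

With those inputs a 10 second delay caps out at $3.60/hr and a 20 second delay at $1.80/hr, which lines up with the range above.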
Right, I know you will, but that's totally unhelpful for him lol.
They have to learn to reformat their HITs and put out a good workspace if they're going to continue uploading these kinds of things, or accept speedrunners (you & I) who are honestly probably submitting a higher % of bunk stuff because idk about you but I'm a human being and if I'm being super honest my eyes glaze over after 15 minutes or so of uninterrupted button mashing. The boys gotta learn how to fly now that @Bobby isn't there to hold them up haha
I think if workers are honest with themselves, anyone speed running HITs (Pins, Geos, PFAs) can probably admit their data quality tanks pretty fast unless they're doing that IV line of meth/caffeine/addy while Turking, lol. There's just only so much focus willpower can sustain for a relatively mind-numbing task. It's not even about malicious intent, I'm not saying folks are bad for trying to max their gains, it's just the nature of needing to do things at as fast a pace as possible to maximize your personal gains + the effects of fatigue on the ol' noggin.
ETA: I can't see their back end data, but this seems to be the crux of his issue. They're having problems w/ speed-runners who are marginal at best data-quality, and those are the only people even doing the tasks because those are the folks who can make them worthwhile. Otherwise they're left with people desperate/bored enough to work for $3/hr, which I cannot fathom being a particularly exceptional group of workers --- but again I'm championing this viewpoint blind as a bat & just going off what I've seen in other dashboards.
Generally, to be honest, you can get away w/ questionable pay if you make the tasks half decent to do IME. Sucks, but it do be what it do be, there are plenty of workers doing this as a side-gig who don't mind $5/hr on top of their $10/hr desk job, sitting at home w/ the kiddo bored of Soaps, blahdiblahdiblah. The people doing this FT demanding $20/hr for clicking 1 & 2 are in the minority.