Agreed. And for that part, we can leave it at that.
But… this comment thread is getting a bit deep, but I have one more thing to discuss that I think is interesting: how would you measure the issue? You said you would like to get a data foundation and then measure the quality, compared to my position that it is a subjective impression anyway and therefore perfectly fine to base reports on the subjective impressions of visible people using the system. I even agree that it would be nice to have that data foundation. But how would one get it?
In my opinion, one can't simply measure comment length. Even swearwords are a possible, but not a sufficient, indicator of spam. My first idea was to measure the use of the spam button, but given that the G+ integration might change the basis for that heavily (comments lived on G+ before, where circles change the dynamic), that might not be a fair comparison. So where do you see the possibility of getting that objective foundation for your line of reasoning?
Well, while discussing this issue on HN I was thinking about the task of obtaining data to reason objectively about the pros and cons, so I'm happy to share my thoughts. What could be done?
1. Query lots (100,000s) of YouTube videos and store the comments associated with each video.
2. Repeat the operation after six months, when the Google+ integration is in full effect and there are enough G+ comments.
3. Label an initial set of comments (100,000s) as spam, non-spam, hateful, sexist, neutral, etc. using Mechanical Turk.
4. Use a supervised learning ML algorithm on training and testing datasets to understand performance and error rate.
5. Iterate as needed.
6. Run the algorithm on the whole corpus.
7. Compare the results.
8. Publish the results on HN and discuss the issue based on data.
Obviously, this requires lots of resources, so one could try to reduce the input dataset and see if it is still possible to draw conclusions. What do you think?
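To make the labeling-and-classification step concrete: a minimal sketch of the train/evaluate loop, using a toy naive Bayes text classifier written from scratch. This is purely illustrative — a real experiment would use a proper ML library, far more data, and richer features than bag-of-words; the tiny training set below is invented for the example.

```python
import math
from collections import Counter, defaultdict

def train(labeled_comments):
    """labeled_comments: list of (text, label) pairs, e.g. from Mechanical Turk."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter()             # label -> number of comments
    for text, label in labeled_comments:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label maximizing log prior + log likelihood (add-one smoothing)."""
    total = sum(label_counts.values())
    vocab = {w for counts in word_counts.values() for w in counts}
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training data standing in for Turk-labeled comments:
training = [
    ("buy cheap pills now", "spam"),
    ("free money click here now", "spam"),
    ("great video thanks for sharing", "ok"),
    ("really enjoyed this explanation", "ok"),
]
wc, lc = train(training)
print(classify("free pills now", wc, lc))            # spam
print(classify("thanks great explanation", wc, lc))  # ok
```

On a held-out test set, comparing predicted labels against the Turk labels gives the performance and error-rate numbers the plan calls for.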
Ok, interesting. I was about to dismiss that approach, but instead took some time to think about it, and it might just work.
The problems I see:
> 1. Query lots (100,000s) of YouTube videos and store the comments associated with each video.
One would have to do it as early as possible, before the comments change too much, as the thesis is that they have already changed. Though I would be surprised if there weren't some studies that used comparable data; maybe something like that is already available?
> 2. Repeat the operation after six months, when the Google+ integration is in full effect and there are enough G+ comments.
Is there a next step of the integration? If not, one wouldn't have to wait that long.
> 3. Label an initial set of comments (100,000s) as spam, non-spam, hateful, sexist, neutral, etc. using Mechanical Turk.
That is the main sticking point. I'm not convinced that the new set of comments is easily detectable as offensive, given that context seems to be used more readily by the trolls; "First!" and rickrolling are things of the past. Besides, even given the low prices there, rating 100k comments would cost a lot…
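A back-of-the-envelope calculation supports the cost worry. The per-label price and the number of raters per comment below are assumptions for illustration — actual Mechanical Turk rates vary.

```python
def labeling_cost(n_comments, raters_per_comment, price_per_label):
    """Total cost when each comment is independently rated by several workers."""
    return n_comments * raters_per_comment * price_per_label

# Hypothetical rates: 100,000 comments, 3 independent raters each,
# $0.02 per rating (majority vote then decides the label):
cost = labeling_cost(100_000, 3, 0.02)
print(f"${cost:,.0f}")  # $6,000
```

Even under these optimistic assumptions the labeling alone runs into thousands of dollars, which is why shrinking the labeled subset (and checking whether conclusions still hold) matters.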
But still. Even something like "they changed a lot and are hard to compare" would be an interesting result.
The algorithm is of course the next question: is something like that easily doable, given the nature of the comments?
Hm. Is that something you're seriously considering doing? It could be an interesting experiment; it would surely make an HN-worthy article, and if you are in academia it might even be worthy of a publication (maybe something like "a study of the effect of de-anonymization on commenters on an internet platform"), or at least a few credit points. Is there a working API to get those comments?
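On the API question: YouTube does expose comments through its Data API. A minimal sketch of the fetching step, assuming the v3 `commentThreads` endpoint and a valid API key (endpoint name, parameters, and response shape are assumptions to verify against the current documentation — and note the quota limits before querying videos at the 100,000s scale):

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"

def build_request_url(video_id, api_key, page_token=None):
    """Assemble the query string for one page of comment threads."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,   # assumed per-page maximum
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return API_URL + "?" + urllib.parse.urlencode(params)

def fetch_comments(video_id, api_key):
    """Yield top-level comment texts for a video, following pagination."""
    page_token = None
    while True:
        url = build_request_url(video_id, api_key, page_token)
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        for item in data.get("items", []):
            yield item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        page_token = data.get("nextPageToken")
        if not page_token:
            break
```

Storing each page's raw JSON alongside the video ID and a timestamp would make the before/after comparison six months later reproducible.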