Hacker News | past | comments | ask | show | jobs | submit

> I guess it is a complete coincidence that popular "we are now hacker news" thread appeared on 4chan and four posts against real name policy on Youtube are now on the front page of HN. All submitted within 24 hours. /s

Possible. But on the other hand, that topic has been relevant here over the last few days anyway.

> Is the bit about 80 n-words in a comment relevant?

Yes, it is. Not on its own, but as part of the argument that YouTube comments got even worse because of this change. Note that I can't judge that myself; YouTube isn't working for me at the moment. But she lays out that argument and combines it with Youtubers leaving the platform, at least as commenters. Failing even basic spam checks is relevant and surprising (especially for Google, since Gmail's spam filter is great).

> Put in perspective, this is absolutely nothing

I wouldn't say that. Even if it's only a small but vocal, technically minded minority complaining: ignoring them is not a good idea for a tech company in the long run. Look at Microsoft…



> Yes, it is. Not on its own, but as part of the argument that YouTube comments got even worse because of this change.

No, it is not. You can't cherry-pick examples to prove your point. Arguing that YouTube comments got worse by supplying a single data point (or 10, 100, or 10,000 of them) is preposterous. One should do a comprehensive scraping of comments across many videos, many countries, and many categories, analyze them for quality, and compare with previous data (without Google+ integration). Then you have a solid foundation for drawing conclusions. Anything else is just venturing into fantasy land.


This is not a scientific journal… Sure, we could apply those principles to everything, but I doubt it's necessary here. Looking at the opinions of the people hosting those channels, who are directly in contact with those comments, observing their reactions, and basing a first conclusion on that seems perfectly fine to me.


Well, obviously we have very different criteria for judging arguments and data.


Seems that way :)

I should try to make this clear: I think the article is fine. And I think it is alright to base a first conclusion on the reactions of the top channels, and to report on those reactions given the context. The example given is just an example and doesn't claim to be more, but it is a convincing one, especially combined with the other reports claiming the same thing: that the way this G+ integration promotes discussed (= controversial = trolly) comments, as it seems to work now, is bad for YouTube comments.

And isn't it true that Google said they will try to improve that? That could be taken as an additional point in favor of the observation.

Note that I agree this is not final: to be totally convinced, we would indeed have to look at more data (and give it more time). Maybe Google will provide an analysis along those lines. Though this is a hard problem to analyze objectively anyway; in the end, the subjective impression is what matters.

PS: I don't think your argumentation warrants downvotes!


We won't be able to find any common ground on this matter. I believe that the subjective impressions of random journalists or channel owners don't matter at all. So we'll agree to disagree.


> So we'll agree to disagree

Agreed. And for that part, we can leave it at that.

But… this comment thread is getting a bit too deep; still, I have one more thing to discuss that I think is interesting: how would you measure the issue? You said you would like to build a data foundation and then measure quality, whereas my position is that it is a subjective impression anyway, and therefore perfectly fine to base reports on the subjective impressions of visible people using the system. I even agree that it would be nice to have that data foundation. But how would you actually build it?

In my opinion, one can't simply measure comment length. Even swearwords are a possible, but not a sufficient, indicator of spam. My first idea was to measure use of the spam button, but given that the G+ integration might change the basis for that heavily (comments lived on G+ before, where circles change the dynamic), that might not be a fair comparison. So where do you see the possibility of getting that objective foundation for your line of reasoning?
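To make the objection above concrete, here is a toy sketch (the swearword list and scoring weights are purely illustrative, not any real system) showing why naive surface metrics like length and swearword counts are insufficient: a substantive comment can easily score "worse" than a classic low-effort one.

```python
# Toy spam heuristic: shorter comments and more swearwords score higher.
# The word list and weights are illustrative placeholders only.
SWEARWORDS = {"damn", "crap"}

def naive_spam_score(comment):
    words = comment.lower().split()
    swears = sum(1 for w in words if w in SWEARWORDS)
    # +2 per swearword, +1 if the comment is very short
    return swears * 2 + (1 if len(words) < 3 else 0)

# A classic low-effort comment scores LOWER than a substantive one,
# which is exactly the weakness of this kind of metric:
print(naive_spam_score("first"))                             # -> 1
print(naive_spam_score("damn good analysis of the topic"))   # -> 2
```

The point: any usable quality measure would need context, not just surface features.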


Well, while discussing this issue on HN I was thinking about the task of obtaining data to reason objectively about the pros and cons, so I'll be happy to share my thoughts. What could be done?

1. Query lots (100,000s) of YouTube videos and store the comments associated with each video.

2. Repeat the operation after 6 months, when the Google+ integration is in full effect and there are enough G+ comments.

3. Label an initial set of comments (100,000s) as spam, non-spam, hateful, sexist, neutral, etc. using Mechanical Turk.

4. Train a supervised ML algorithm on a training set and evaluate it on a test set to understand performance and error rate.

5. Iterate as needed.

6. Run the algorithm on the whole corpus.

7. Compare the results.

8. Publish the results on HN and discuss the issue based on data.
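The labeling-and-classification steps above could be sketched roughly like this; a minimal pure-Python naive Bayes classifier stands in for the supervised learning component (a real study would use a proper ML library, careful feature engineering, and far more labeled data — the tiny training set here is invented for illustration).

```python
# Minimal naive Bayes sketch for the "label, then classify" steps.
# Training data is a hypothetical hand-labeled set of (text, label) pairs.
import math
from collections import Counter, defaultdict

def train(labeled_comments):
    """Count word frequencies per label and label frequencies overall."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter()
    for text, label in labeled_comments:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the most probable label under a Laplace-smoothed NB model."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)          # log prior
        denom = sum(word_counts[label].values()) + len(vocab)  # smoothing
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("buy cheap followers now", "spam"),
    ("click here for free stuff", "spam"),
    ("great video thanks for sharing", "ok"),
    ("really enjoyed this explanation", "ok"),
]
wc, lc = train(training)
print(classify("free followers click here", wc, lc))   # -> spam
```

Running the trained classifier over both the pre- and post-integration corpora would then give comparable spam rates for the final comparison step.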

Obviously, this requires a lot of resources, so one could try to reduce the input dataset and see whether it is still possible to draw any conclusions. What do you think?


Ok, interesting. I was about to dismiss that approach, but instead took some time to think about it, and it might just work.

The problems I see:

> 1. Query lots (100,000s) of YouTube videos and store the comments associated with each video.

One would have to do it as early as possible, before the comments change too much, since the thesis is that they have already changed. Though I would be surprised if there weren't some studies that used comparable data; maybe something like that is available?

> 2. Repeat the operation after 6 months, when the Google+ integration is in full effect and there are enough G+ comments.

Is there a next step of the integration? If not, one wouldn't have to wait that long.

> 3. Label an initial set of comments (100,000s) as spam, non-spam, hateful, sexist, neutral, etc. using Mechanical Turk.

That is the main sticking point. I'm not convinced that the new set of comments is easily detectable as offensive, given that context seems to be used more readily by the trolls; "First!" and rickrolling are things of the past. Besides, even given the low prices there, rating 100k comments would cost a lot…

But still. Even something like "they changed a lot and are hard to compare" would be an interesting result.

The algorithm is of course the next question: is something like that easily doable given the nature of the comments?

Hm. Is that something you seriously consider doing? It could be an interesting experiment; it would surely make an HN-worthy article, and if you are in academia, it might even be worth a publication (maybe something like "a study of the effect of de-anonymization on commenters on an internet platform") or at least a few credit points. Is there a working API to get those comments?
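On the API question: the YouTube Data API does expose comment threads. A minimal sketch of building such a request, assuming the v3 `commentThreads` endpoint and a hypothetical API key (the video ID and key below are placeholders):

```python
# Sketch: construct a YouTube Data API v3 commentThreads request URL.
# The API key is a hypothetical placeholder; a real key comes from the
# Google developer console.
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3/commentThreads"

def comment_request_url(video_id, api_key, max_results=100):
    """Build the GET URL for fetching a video's top-level comment threads."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": max_results,
        "key": api_key,
    }
    return API_BASE + "?" + urlencode(params)

url = comment_request_url("dQw4w9WgXcQ", "YOUR_API_KEY")
print(url)
```

Fetching that URL (e.g. with `urllib.request`) returns JSON pages of comments; paging through the results across many videos would produce the corpus the plan above requires.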


This is why the liberal arts need to survive. Because since at least the Greeks, man has known that persuasion involves more than simply a structuring of facts into aesthetically pleasing syllogisms. But there are some folks who forget that logic begins and ends with axioms, and if your axioms don't map to anything about the world, all your arguments are navel-gazing. One goal of argumentation is to convince other people to accept your axioms, which will not be achieved when you scoff at them for being so illogical by not accepting -- a priori -- your arbitrary and private set of rules.

tl;dr -- Logical operations are universal, but not the axioms.


You are absolutely correct. I am at a loss, however, as to why your comment is directed at me, not at my opponents, who are trying to persuade the community at large that the G+/YT integration is evil.



