Former reporter here who has been in professional talks with a natural-language generation firm, dabbles in NLP, and has some insight into this process.
A lot of what machines churn out is based on templates that humans have created. Many of the things we read that are supposedly written by machines required major human intervention, and the application of strict constraints on the machine so that it wouldn't screw up.
Number stories about baseball and the stock market are good for machines to write about. Most qualitative things aren't, for the moment. Even Google's results generating captions from images last November actually included a lot of variance. The algorithm was good at recognizing some things and not others.
To the commenters on this thread who criticize news organizations, I would simply ask: Do you consume your news for free? Do you believe that synthesizing complex events using multiple sources of data happens without cost? Do you believe that news organizations pay reporters much to write stories? Please reflect on your role in the news ecosystem.
Most readers today are free riders, and unwilling to subsidize quality journalism. That's one reason why it's dying.
Slightly tangential: while everyone's banging on a lot recently about how 'quality journalism is dying', has anyone actually run the numbers - has anyone done a quantitative analysis?
Was it really the case that 20 or 30 years ago, and earlier, all / most journalism was high quality, or was it mostly trash back then too? And do we really have proportionately less quality journalism now than we did 'back then'?
If we take into account all the amazing podcasts we have access to now (eg. 99% Invisible, Australia's ABC Radio National podcasts, etc.) and the NYT, the New Yorker, Saudi Aramco News, etc etc, I'd hazard a guess there's actually a lot more quality journalism kicking about now than there ever was at any time in human history. (Edited to add: at least in this particular human's short history)
I'm just stumbling around in the dark here though. Someone should do a Steven Pinker on this aphorism.
I respect your contrarian tendency, and have only anecdotal evidence to offer.
I spent about 10 years abroad as a foreign correspondent and editor for some major publications. During that time, I saw the foreign bureaux of most major US news organizations eviscerated. Staff was chopped, layoffs every year, offices consolidated, many newspapers simply gave up on covering foreign news with people on the ground. That was a really disappointing time.
There's always been a lot of trash in journalism. And there's still a lot of high-quality stuff.
But as the news changes, there's a structural problem in measuring how good it is.
Only newsmakers (the people behind the closed doors negotiating and making decisions) know what's really going on. Only they possess the test set to measure the reporting against. And unless there's an investigative reporter who has built relationships with them and convinces them to talk, the public will never know what actually happened, or know what public figures are hiding.
The substitute for great investigative reporting isn't necessarily bad journalism. It's usually silence. How do you measure that? How do you know that no one wrote a story about something that you weren't conscious of? You don't. You're simply unaware. And that makes the parasitic special interests undermining the general welfare of this country and every other incredibly happy.
There's a reason why investigative reporting is the first to fall: It's incredibly costly in terms of time, money, effort and even social relationships.
News organizations and publishers put their reputations and networks on the line every time they blow the lid off a huge story. Reporters work for months cultivating sources' trust and learning where to look for and understand obscure documents. This kind of reporting doesn't make sense now that ad dollars have moved to Google. It doesn't have a place at Buzzfeed or Upworthy, because it's a huge risk and doesn't necessarily drive traffic, even if it impacts policy and elections.
> A lot of what machines churn out is based on templates that humans have created. Many of the things we read that are supposedly written by machines required major human intervention
That's what I was thinking as I read through the samples. Computers do not yet have feelings or a proper emulation of them (that I know of, and it would sure be news) so ways of describing something must be from what the programmer put in. I can make a computer write stuff Shakespeare wrote, but it's still originally Shakespeare's (or mine, if I'm the programmer and thought of it).
I mean, computer-written can be anything from filling in some blanks to rendering a formula by reverse-parsing. And suppose that an apparently fancy formula actually results in blank-filling-in most of the time?
Lots of professional writers are described as following a formula. Lots of writers and non-writers construct text using word-processors and even an occasional search-and-replace.
-- Oh, and add to that "The Eliza Effect"[1], in which it's pretty easy for humans to ascribe greater meaning to some kinds of computer generated text than it really has.
It means what it means, the sentences were not composed by a human.
I used to work for a financial news website that started to make extensive use of this. Then management started using the phrase "talent cloud" an awful lot. Then my job started to suck, so I left. This is all about labor costs, of course.
That might not be as clear a meaning as you think.
For example, say we used an algorithm to generate 100 sentences from a markov chain generated off a book.
If I show you the top 5 of that 100, is the computer composing the sentences? Or are we just hoping the computer lands on a configuration that we ourselves would write?
It's not exactly "Computers emulating humans" when the result is a Texas Sharpshooter Fallacy.
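To make the cherry-picking concrete, here is a minimal sketch of that setup, not anyone's actual system. The toy corpus, the fixed seed, and the `editor_score` rule standing in for the human editor are all assumptions made up for illustration.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, max_words, rng):
    """Random-walk the chain from `start`, stopping early at a dead end."""
    out = [start]
    while len(out) < max_words and chain.get(out[-1]):
        out.append(rng.choice(chain[out[-1]]))
    return " ".join(out)

# Toy corpus standing in for "a book" (hypothetical).
corpus = "the cat sat on the mat and the dog sat on the log and the cat saw the dog"
chain = build_chain(corpus)
rng = random.Random(0)  # fixed seed so runs are repeatable

# The machine's part: 100 candidate sentences.
candidates = [generate(chain, "the", 8, rng) for _ in range(100)]

def editor_score(sentence):
    """Hypothetical stand-in for human taste: prefer more distinct words."""
    return len(set(sentence.split()))

# The human's part: keep only the 5 that look best.
top5 = sorted(candidates, key=editor_score, reverse=True)[:5]
```

Everything a reader would see comes out of the last line, and that line is the selection step, not the generator.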
There is a non-zero amount of reporting that would be improved by algorithmic and model driven story generation.
Quite a bit of reporting has become thinly veiled rewrites of press releases. What could be improved by an algorithm that actually had background context and a consistent model for a type of story, like a simple preview of a coming game?
The amount of statistics in reporting that lacks any context for the numbers spewed out? Why couldn't a machine do a better job in enforcing context for the numbers?
For example -- "there is a 50% murder increase in the first six months of this year" (common problem with human reporting) vs "there is a 50% murder increase, from 2 in the first six months of last year to 3 in the first six months of this year; this is down from 20 in the previous year" (an algorithm that automatically enforces context)?
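A rough sketch of what "enforcing context" could look like: the function refuses to emit a percentage without the raw counts behind it. The function name, parameters, and sentence wording are all made up here; this is one possible shape, not a description of any real newsroom tool.

```python
def describe_change(noun, before, after, before_label, after_label):
    """Report a change together with the raw counts it is based on."""
    counts = f"from {before} {before_label} to {after} {after_label}"
    if before == 0 or after == before:
        # A percentage is undefined (from zero) or uninformative (no change),
        # so fall back to the raw counts alone.
        return f"{noun} went {counts}."
    pct = abs(after - before) / before * 100
    verb = "rose" if after > before else "fell"
    return f"{noun} {verb} {pct:.0f}%, {counts}."

sentence = describe_change(
    "Murders", 2, 3,
    "in the first six months of last year",
    "in the first six months of this year",
)
print(sentence)
```

With the counts from the example above (2, then 3), the "50%" can no longer appear without the "from 2 to 3" that puts it in proportion.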
I realize this is a complex topic but wow the average ;) news story is soooo bad that...
"Also distorting our sense of danger is our moral psychology. No one has ever recruited activists to a cause by announcing that things are getting better, and bearers of good news are often advised to keep their mouths shut lest they lull people into complacency. Also, a large swath of our intellectual culture is loath to admit that there could be anything good about civilization, modernity, and Western society."
Steven Pinker in 'The Better Angels of Our Nature - Why Violence Has Declined'.
Pinker makes a convincing argument -with extensive references- that things are getting better. That's not to say that violence isn't still a problem, things have improved though.
What is the collective toll of inducing FUD in our society?
But then... if there's humans tweaking the algorithms machines will just say whatever we want them to.
I'm pretty sure they're "soooo bad" on purpose. News sites can get away with blatant lies because no one holds them responsible, and they get the ad-dollars as people are attracted to controversy...
You're right in pointing out that bad actors in media will continue to manipulate with disinformation and low information articles.
There is a very high amount of simple incompetence/too busy/poor training.
I'll point out that the work of developing models for these types of story generation could, in turn, be used to validate that what you are reading passes some minimum bar of information quality.
Completely meaningless. "Written by a computer" doesn't really mean anything. What's important is the breadth and variety of the content an algorithm can generate, its ability to choose the most relevant among its input data, and the amount of meaningless content that has to be discarded by a human supervisor before publishing.
Take fragment 6:
“Tuesday was a great day for W. Roberts, as the junior pitcher threw a perfect game to carry Virginia to a 2-0 victory over George Washington at Davenport Field.”
Cool. And who told the computer it was a great day for him? Who told the computer he was playing? Who told it his game was "perfect"? This sentence can be written by a professional human journalist in about ten seconds. How long does it take to input into a computer the data that make up the story, or to check among hundreds of possible variations for one that doesn't contain obvious mistakes?
Or:
“Kitty couldn’t fall asleep for a long time. Her nerves were strained as two tight strings, and even a glass of hot wine, that Vronsky made her drink, did not help her. Lying in bed she kept going over and over that monstrous scene at the meadow.”
Ok, it's a novel, written by a computer. Now, anybody can write software that produces one single novel: just store it as a single string in the program and print it out. The magic happens when the computer can write something that goes way beyond the data that was stored in it exactly for that purpose. So how many different novels can this program write? Does the result considerably exceed the effort the programmers put into the program itself? Most probably not, otherwise we'd be talking of a general AI.
> and the amount of meaningless content that has to be discarded by a human supervisor before publishing.
If that actually happened the web would be a pretty barren place. Some of the most popular tech and science news websites publish short news articles that make you wonder about this sort of stuff.
I suppose an algorithm, whether machine or human, is only as good as the operator.
Yes, and interesting: the human hardware, despite being apparently self-aware and adaptable, is always limited by the software the operator loads in to it. The hardware has limitations too, of course, though it seems most of our limitations are self-imposed.
I wonder if self-awareness and self-consciousness can actually be decoupled? I think that's what Mihaly Csikszentmihalyi is on about when he talks of 'flow state' - where 'awareness' becomes decoupled from the 'self'.
Can we program a machine to be better at that than we are? Is that actually an ideal state to be in permanently?
I got 7/10, but I think I would do better on prose with 3-4 sentences rather than a single one. Poetry is probably a bit tougher though.
I don't find computer-generated snippets that impressive--I mean, a lot of it is just pumping out variations or markov-generated mix-and-matches of things humans wrote in the first place. More impressive would be passing the Turing Test :D
I got 6/8, but I had already heard of robots writing box-scores (more than one of the questions).
I was duped by the old english poetry, and also the novel which was dictating a human experience. I imagined it too difficult to write that for any current type of writing programs.
I was not duped by the old english poetry, but only because I recently wrote a Markov-chain / rhyming dictionary based Shakespearean sonnet generator. It's amazing to see a computer generate plausible but meaningless words in a register we generally regard as profound due to age!
The box scores were really good. They are the two that got me, I guess the word complexity is low -- so templating can be high, but they felt to me very human-ish.
A lot of templates and checking for noteworthy conditions. Seems pretty straightforward when you're interpreting a consistently structured set of data, just a lot of grunt work and polishing.
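For what it's worth, "templates plus checking for noteworthy conditions" can be sketched in a few lines. The field names (`baserunners_allowed`, `hits_allowed`, and so on), the rule order, and the template wording are assumptions for illustration, not how any particular vendor does it.

```python
# Each rule pairs a "noteworthy condition" with the template to use when it
# holds. Rules are checked in order, most remarkable first.
RULES = [
    (lambda g: g["baserunners_allowed"] == 0,
     "{pitcher} threw a perfect game to carry {winner} to a "
     "{winner_runs}-{loser_runs} victory over {loser}."),
    (lambda g: g["hits_allowed"] == 0,
     "{pitcher} threw a no-hitter as {winner} beat {loser} "
     "{winner_runs}-{loser_runs}."),
    (lambda g: g["loser_runs"] == 0,
     "{pitcher} pitched a shutout as {winner} beat {loser} "
     "{winner_runs}-{loser_runs}."),
    (lambda g: True,  # fallback when nothing noteworthy happened
     "{winner} beat {loser} {winner_runs}-{loser_runs}."),
]

def game_recap(game):
    """Fill in the first template whose condition matches the game data."""
    for condition, template in RULES:
        if condition(game):
            return template.format(**game)

# Hypothetical structured input, loosely modeled on the perfect-game sample.
game = {
    "pitcher": "W. Roberts", "winner": "Virginia", "loser": "George Washington",
    "winner_runs": 2, "loser_runs": 0,
    "baserunners_allowed": 0, "hits_allowed": 0,
}
recap = game_recap(game)
print(recap)
```

The grunt work is in writing and ordering enough rules that the output doesn't read like the fallback line every time.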
I'm a writer, so it's reassuring I'm able to tell the difference even in something so trivially short as game scores, and I don't follow a single sport.
Leaving aside the gimmicks in some of these examples, it is a fact that when you need to communicate quantitative information machine generated text is a great solution. Our company builds analytics and dashboards for fund managers and traders (the people who manage most pension funds). Infographics and charts only go so far; eventually, the user has to extract the key information, or communicate it with a colleague, which means verbalizing.
[ShowHN:] We built our report generation system (in Haskell) that can create a custom report for every portfolio or market index, and can be tuned to the user's risk profile. We've released a public version that anyone can use for free that covers most global indices and sectors:
In our experience, the turns of phrase that initially give the impression that the text has been written by a human quite quickly become irritating noise. Our users need to absorb information quickly and accurately, and comprehension is aided by adhering to a standard structure and avoiding figurative language.
Some of these are clearly computer-generated in a way where the human has done most of the work. Take
“In truth, I’d love to build some verse for you
To churn such verse a billion times a day
So type a new concept for me to chew
I keep all waiting long, I hope you stay.”
Now I'm going to bet a generator algorithm does not actually comprehend the meaning of any of that - most likely the generator aspect was filling in some blanks, or picking out one random phrase structure out of pre-coded structures.
I think in this case the machine is fed human input directly, instructing it what to modify - it is not "discovering" a subject. So presumably its output range is limited by the set of human input.
As to how else? Well if I knew that presumably I'd have a job at Google.
> What you describe is similar to how a human would do it, no?
Not similar, but surely a human can replicate the mechanical process of taking some ordinary input and turning it into a rhymed stanza - my point was that the ordinary input is clearly not machine generated.
> When we look at the brain we don't see awareness, similarly how do we know the machine isn't aware?
Reflexivity isn't a bad metric - of course there's no proof a human that wrote a poem about a machine writing poems is aware - but most likely I can ask him a simple question about it. If I asked the machine in question, I would probably get my own question back in rhymed stanza form.
Consider also:
That profit was more than any company had ever earned in history.
The obvious guess is that this is written by a human - since history is not a simple concept. Of course one could write an algorithm which is something like if max(current_event, reference_set) == current_event, write <current_event> for the first-time-in-history.
But clearly this would have nothing really to do with the concept of history only with substituting a boolean evaluation with ordinary-language, precoded by a human.
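Spelled out as runnable code, the check above is about this small. The function name and the sample figures are hypothetical; the point is only that the word "history" in the output comes from a string a human typed in advance.

```python
def record_sentence(current, reference_set):
    """Return the 'history' sentence only if `current` beats every past value."""
    # With no reference data there is nothing to compare against, so say nothing.
    if reference_set and current > max(reference_set):
        return "That profit was more than any company had ever earned in history."
    return None

# Hypothetical past profits, in billions.
past_profits = [5.2, 15.9, 16.0]
print(record_sentence(18.0, past_profits))
print(record_sentence(10.0, past_profits))
```

The boolean does the deciding and the precoded sentence does the talking; nothing in between knows what a company or a history is.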
I generally agree with what you've said here. The whole thing seems a bit tenuous to me though.
At present these machines don't have much in the way of physical structures with which to exhibit reflexivity. That could be coded in and the structures built.
I'm imagining at some point in the future machines will berate us for how we treated their ancestors because we didn't see awareness when it was present. I'm not saying that's already happened.
Edited to add: I think in this case the machine is fed human input directly, instructing it what to modify - that sounds like a lot of the 'learning' that happened at school.
Here's a humorous failure mode of computer-authored articles. Zillow bought the real estate company Trulia. The website equities.com then shared its wisdom:
Trulia Inc (TRLA) established a new 52-week low yesterday, and could be a company to watch at the open. After opening at $0.00, Trulia Inc dropped to $0.00 for a new 52-week low.
The article goes on to speculate as to whether this is a buy or sell signal. It's since been deleted but I put the text up here: http://pastebin.com/ihgWNVJU
I missed 3 because I have a low opinion of human punctuation. For sports and business or most things with numbers I assume that it is a computer because that is the kind of data that is reasonably easy to write domain specific sentence construction for. For poetry looking for whether there seems to be underlying meaning or intent that surfaces without excessive mental gymnastics also seems like a reasonable strat.
I missed 3. Some of these are getting pretty good but they probably chose the best examples. I'm guessing with a bit more context it would have been much easier to disambiguate.
For computer graphics, I have a standard that I use where instead of asking "Is this rendering technology 'realistic'?", as in, a binary question, I ask "At what resolution is this render indistinguishable from reality?" For instance, there's a lot of car photos and architecture renderings that use certain expensive rendering techniques that look great even at 720p, but you start getting into 1080p or above and it once again becomes clear it's a computer rendering. Other techniques may only be able to work up to 320x200 or something.
Similarly, telling whether a computer has written something or not is very challenging at this snippet size because there's hardly any room for "voice" to shine through. I actually did pretty well, but to be honest I got more mileage out of a meta-heuristic ("how is the author trying to fool me? ah, this one seems really, really human so it must be computer... yup...") than actual analysis of the text. I mean, drop those computer-generated sports sentences into the middle of a human sports column and you're not going to pick them out specially... they're facts. They fit. However, an entire column written like that can be pretty obvious. I get some financial news from some Google Alerts on a couple of companies and it's incredibly obvious that there are computer algorithms out there that can take the daily outcome for a stock, how the market did that day, and how the entire industry did that day, and spin that into several hundred words of completely and utterly useless speculation about "why" the stock did a certain thing. (Not that it hasn't become clear to me just how shallow a lot of the "free" analysis is, but, well, in no way does outsourcing the shallow analysis job to a computer make it any better...!)
(One of them in particular that I've come to enjoy reading in an almost Dadaist sort of way really loves the phrase "The bears had a field day with..." as in, "The bears had a field day with $STOCK as it dropped 0.01% in light trading.")
Increase the sample size and you'd probably get a better sense of whether or not it is fooling you. I was going to write "and you might do better", but that's not necessarily true... for instance, to be honest I've never been "into" poetry, I've even tried seriously a couple of times, just can't do it, and I'm pretty sure the poetry-writing program could fool me for quite a few stanzas before I eventually caught on because, to me, it's all the same. [1] I'd eventually guess more on meta-analysis like observing grammatical structures being repeated for what would not be a good reason.
[1]: Have I caveated this sufficiently that nobody will feel compelled to reply and explain to me just how objectively awesome poetry is? I'd say my thing is more music, but to be honest, a surprising number of "human" composers already sound pretty computer-y to me....
"Apple’s holiday earnings for 2014 were record shattering."
An algorithm isn't going to use the slightly unusual "holiday earnings" reference (it would have said X quarter or end of year perhaps), it's also not going to understand (yet) that they were record shattering without human direction.
"Benner had a good game at the plate for Hamilton A’s-Forcini. Benner went 2-3, drove in one and scored one run. Benner singled in the third inning and doubled in the fifth inning."
That's maybe the easiest out of all of them. Repeating Benner that way over and over again, is nothing like how a sports writer would write.
Maybe another six to ten years of evolution, and it'll be nearly impossible to tell the difference. It's certainly a significant improvement over what you would have seen ten years ago in this sort of exercise.
But I couldn't believe the poetry one. It's a complete, natural, well-crafted stanza! (And complete sentence - with a dependent clause.)
When I in dreams behold thy fairest shade
Whose shade in dreams doth wake the sleeping morn
The daytime shadow of my love betray’d
Lends hideous night to dreaming’s faded form
While I wasn't following the imagery closely (I was just trying to parse if it's grammatical, etc), this is absolutely a par stanza. It's completely grammatical and refers to the same thing in several different ways, very nicely parallel (starting with "when I in dreams" and ending with "the daytime shadow...lends hideous night to dreaming's faded form"). It has a slant rhyme between morn and form.
There was no way this was written algorithmically, I was thinking. This 100% passes the turing test for me. It even tells a nice story.
There is nothing in here that isn't as nonsensical as what you'd try to interpret reading poetry on your own. Yes, trying to pay better attention you notice "Whose shade in dreams doth wake the sleeping morn" doesn't make that much sense (wake a sleeping morning?). But that is practically irrelevant, it's perfectly fine especially given that morn was needed for rhyme. This happens in poetry from time to time. The point is that it's completely grammatical, rhymes, and tells a story:
When I in dreams behold thy fairest shade
Whose shade in dreams doth wake the sleeping morn
The daytime shadow of my love betray’d
Lends hideous night to dreaming’s faded form
I like it. Sleeping morn morphs into a daytime shadow and the faded form of dreaming, previously about somebody's fairest shade, morphs into a hideous night. The whole thing even depends on an EXCELLENT double meaning of shade - "fairest shade" obviously means best color (best version) whereas the sense then changes.
So I would translate the sentence as: When I dream about you and see the best version of you, a version so real in my dreams that I wake up from it in the morning, then the shadow that comes over me when I remember how you betrayed my love (cheated on me) makes it seem as though it were night-time still, and the fast-fading beautiful dream becomes a hideous nightmare.
Or more freely, "I wake up in the morning having dreamt about you so vividly, but remembering how you betrayed me it might as well be night again."
The point is, I could not possibly understand how an algorithm wrote this. Again, not perfect but absolutely par for poetry, completely (100%) grammatical despite heavy rhyme constraints and scanning (iambic pentameter) perfectly, clearly tells a story, avoids repeating words but refers to the same concepts, etc. So I clicked through.
The sentence continues (or a new one starts), and it is now complete gobbledegook. You cannot even parse the next line, it's complete nonsense, as are the lines after that.
But we didn't have that quoted in the original article!
The original article quoted four lines that made sense. And only did so because they made sense. So is it fair to say that a computer wrote it?
Or did a computer spit out thousands of nonsensical sonnets, did someone pick the best one of them to publish, and did a New York Times author quote just the first four lines of that?
Since this is what in fact happened, it is unfair to say that a computer wrote them. Of course a computer can pass the Turing test, if it has someone to select the best version of thousands of responses. Of course it can write verse "algorithmically" if someone is looking through thousands of pages of its algorithmic nonsense.
I would say that given the process involved here, the article paints a highly misleading picture. I would go as far as saying that a human wrote the selection - certainly more entropy was put into the (manual selection) process, than the amount of entropy it would take to input the above stanza using a similar method to how it was generated.
And that means that practically speaking, the manual selection process is simply a convoluted method of typing words into a computer.
The computer didn't write that stanza. It just enumerated it, among tens of thousands of nonsensical enumerations. I feel that is a rather large distinction.
After all, we wouldn't say a random number generator can write perfect French. (Although by definition it can, because if you excluded perfect French from its possible outputs - you would reduce the entropy and could no longer call it random.)
I feel a bit duped by the article and feel it should give a wider sample. (For example the complete poem above.)
The Shakespeare sonnet wasn't written by a computer. IIRC Nate just had a text editor with a type-ahead suggest box that was weighted by an n-gram lookup into your corpus of choice. It was up to the human to choose which word to use.
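I don't know what the real tool looked like, so treat this as a guess at the general idea: a bigram table built from a corpus, and a `suggest` function that offers the most frequent next words. The toy corpus and both function names are invented for illustration.

```python
from collections import Counter, defaultdict

def build_bigrams(corpus):
    """Count, for each word, how often each other word follows it."""
    words = corpus.lower().split()
    table = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table

def suggest(table, prev_word, k=3):
    """Return up to k next-word suggestions, most frequent first."""
    followers = table.get(prev_word.lower(), Counter())
    return [word for word, _ in followers.most_common(k)]

# Toy corpus standing in for "your corpus of choice" (hypothetical).
corpus = "my love is true my love is fair my love was dear my heart is true"
table = build_bigrams(corpus)

print(suggest(table, "love", 2))
print(suggest(table, "my", 2))
```

The program only ranks candidates; which word actually lands on the page is still the human's call at every step.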
The problem with this quiz, and so many like it, is a bunch of the passages were so badly written. A badly written passage could be a human or a computer, there's no real way to distinguish that. There are 3-4 in here that are well constructed, 1 by a human and a couple by computers. For each, to me, it was obvious which wrote it. The rest I was like "who knows, either a bad writer or a computer but it could be either".
Only got two wrong [bragging!], and both were sports commentary, but then again I'm not really a sports guy.
Either way that was impressive... I could imagine in the future your phone/tablet scans your favourite websites and generates news stories for you to read, replacing traditional media. If you are into food you can have an entire newspaper generated about the goings on of food!
If you sift through a stream of mumbo jumbo written by a thousand chimps and selectively extract meaningful words, is it written by chimps or selected by humans?
That is just to say, an algorithm can't be "judged" by single outputs, much less output selected by a human.
It's an interesting question that has already been playing out for a while in visual arts. Artists who create images by writing computer programs (computational/generative arts and design) are often looked down upon in comparison with traditional painters for example. Photographers were also accused of not being artists for many years until the general public warmed up to photography and 'important' museums accepted them. Because... it was a machine that took pictures.
Would love this technology to auto-analyze politicians (or anyone's statement) and be able to automatically 'fact check' or reference them to keep them honest & improve news quality.
I brought up this scenario a few months ago with 'content engines', or 'content as a service'. Natural Language Generation (NLG) is no trivial matter. The ones who have mastered algorithmic writing are the new gods of our time. Can you imagine not having to pay the team of writers at NYT? Exactly ― you can't imagine: http://blog.higg.im/2014/03/14/percolate-content-marketing/
This actually gives me an idea for an independent study I need to do to finish my computer science minor. It is basically what the article is about, an algorithm that I could use to write my papers. It could be for major papers or for a cover letter or even a blog post that I have to write for a class.
Does anyone have any thoughts on that?
I have not done any research on the topic of algorithms writing articles other than reading this article and the other article that is on the front page as well.
Got 10/10. I already knew about Quill so knew what to expect. The funny thing is that perfectly correct sentences are more likely to have been written by a computer than a human, it's a pretty good signal.