By John Bates
Noam Chomsky has infamously stated, "There is no such thing as the probability of a sentence." For that, he is roundly mocked by computational linguists everywhere, whose statistical models have delivered tremendous development in the practical analysis of spoken and written speech.
His argument, though, is worth considering, and it is much more subtle than it sounds. "Colorless green ideas sleep furiously" is only the starting point of his critique: a sentence that is syntactically correct, and which can be assigned a (low) probability under statistical models, but which is semantically meaningless. (Of course, by virtue of its becoming a standard example in linguistics, it has become both more probable and more meaningful, although not in a way that is easily amenable to standard analysis.) Some computational linguists will argue that because that string of words is clearly possible, it is open to consideration, and that various approaches will weed out semantically meaningless cruft, much as the human mind, fundamentally a statistical engine, will puzzle over and then discard nonsense.
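To make "can be assigned a (low) probability" concrete, here is a minimal sketch of how a statistical model might do it. It assumes a bigram model with add-one smoothing, trained on a four-sentence toy corpus invented purely for this illustration. The names `CORPUS`, `train_bigrams`, and `sentence_probability` are hypothetical, and real systems use vastly larger corpora and better smoothing.

```python
from collections import Counter

# Toy corpus, invented for illustration only (an assumption, not real data).
CORPUS = [
    "green ideas are new ideas",
    "colorless gases are dangerous",
    "tired children sleep soundly",
    "angry crowds shout furiously",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams, padding each sentence with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sentence_probability(sentence, unigrams, bigrams):
    """Probability of a sentence under a bigram model with add-one smoothing."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing: unseen word pairs get a small, nonzero probability.
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob


if __name__ == "__main__":
    uni, bi = train_bigrams(CORPUS)
    for s in (
        "tired children sleep soundly",
        "colorless green ideas sleep furiously",
        "furiously sleep ideas green colorless",
    ):
        print(f"{sentence_probability(s, uni, bi):.3e}  {s}")
```

In this toy setup the Chomsky sentence comes out less probable than a sentence the model has seen, but still nonzero, and more probable than the same words in reverse order, because a few of its word pairs happen to occur in the corpus. Nothing in that number says anything about whether the sentence means something.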
Chomsky's argument goes beyond the mere syntactic/semantic split, though. Computational linguists generally start with a set of utterances, text, or whole corpora, examining word and feature frequencies. By using these starting points, they build after-the-fact models of language. They can map those models to search for or summarize both syntactic and semantic features, e.g. named entity recognition, topic selection, or bias detection. They regard words and combinations of words as mere (ha!) random events, sequences of which can be assigned specific meanings. Chomsky dismisses that randomness. In his view, words have meaning. Words and sentences are created by people.
That's not to say that Noam Chomsky is some kind of starry-eyed mystic claiming special status for the human soul. Instead, his point (or rather, my interpretation of his point) is that words are objects of communication between two deliberative entities, and that the meaning of a sentence is therefore only interpretable taken in that context. As I write this sentence, it is the product of the sum total of all of my experiences and decisions up to this point, as well as the physical apparatus that I am using to deliver it. As you read it, I am making physical changes in your brain. Right now! You can't stop me! Ha!
Statistical models tend to strip away this context. His point might be better taken as something closer to: you can't predict what I'm going to say next, and I can't predict how you'll interpret what I say. Except, of course, for the context that we share. If I stand on the corner and talk about the coming apocalypse, you can probably guess quite a bit about me and my current mental state. If I babble instead about colorless green ideas, you're bound to be lost, and I cease to be communicating. Regardless, the likelihood of the words that I use has little bearing on the act of communication.
As someone who dabbles in both computational linguistics and statistical processes, I can take issue with this argument. I can point to numerous techniques that incorporate broader contexts into analyses. I can illustrate problem after problem that has fallen to our tools, and I cannot disparage anybody for attempting new challenges. But ACQUINE clearly demonstrates the shortcomings articulated by Chomsky. Photography, like writing, is fundamentally an act of communication, requiring the full context of a human's knowledge to go beyond the shallowest parse.
The frame is a perfect example. Subconsciously, we clearly do prefer images with boundaries. Consciously, or possibly semantically, we are able to eliminate them from our consideration of an image, or to view them as an extraneous feature. ACQUINE's developers must have forgotten about frames, probably because they are so easily elided from our perception. But on the social networks, people posting simple snapshots rarely frame, while people who are trying to be "artists" do. Artists post in black-and-white (why? context, again...) but parents dump their flash memory cards to the web.
Obviously (at least, to me) the ACQUINE developers are not idiots. They know all this better than I do. They are attempting a grand thing: analyzing the affective perception of visual content. And there are some pretty cool refinements that are clearly possible: for example, clustering the responses of the raters can lead to social filtering, so that all of us who are Callahan fans can be distinguished from the infrared HDR cat lovers.
But "collective intelligence" and social filtering are at best a highly refined popularity contest, and fail at the one task that is most interesting: finding and evaluating truly original content. For, if it is original, it is, by definition, low probability, and also incomparable. How can you rate that which has never before been seen, if you can't understand it?
Featured Comment by Seinberg: "Johns Hopkins had natural language processing (i.e. computational linguistics) talks every Tuesday this past semester. I attended many of them, as someone whose research is not in natural language processing but who takes a serious interest in it and has dabbled with it in the past. The first guest lecturer was Martin Kay, one of the fathers of computational linguistics. Someone asked him point-blank whether the standard approach taken today by the Big Names like Google and the professors at the most celebrated computational linguistics university on the planet (Hopkins) is enough to eventually pass the Turing Test. The standard approach is statistical analysis. His response was immediate and he said, 'No. I think it will be a new method not in use today.'
"He did eventually back-track a bit and admit that statistical analysis is helpful for several very specific processes. For instance, part of speech tagging, for which statistical analysis is 97 or 98% accurate given a good training corpus (far better than the majority of humans do).
"Where statistical analysis falls down, however, tends to be precisely where Chomsky seems to be going with Green Photographs: understanding context. Reference resolution is still a wide open problem. If several people are discussed in previous sentences, resolving pronouns like 'he,' 'she,' 'it,' etc. is very difficult. In parsing the semantic content of sentences, these pronouns should be linked to the entities to which they refer. Humans do this automatically, but current computational methods do it poorly. Statistical analysis does this very very poorly, particularly with complicated language. The current best we can do (more or less) is 'sentiment extraction,' where analysis tries to extract small parts of the meaning to get the general idea of what people are saying e.g. in blogs. For instance, 'does this blogger support Obama or McCain?' But that's easy.
"Then there's the problem of the pink elephant in the room. Imagine the date is September 12th in 2001. Everyone who talks to each other says things like, 'How could it have happened?' 'Isn't it terrible?' 'How are you and your family?' Every person in the US—and probably most in the world—knew precisely what had happened. But a statistical parser would be at a complete loss, because it wouldn't have any context in which to understand the Pink Elephant of 9/11 that was in the air.
"So statistical analysis needs to be augmented (or: should be used to augment) some other method, and that other method needs to include knowledge in its analysis. Then, of course, statistics might be used to determine with 95% probability that on 9/12 the Pink Elephant is the events collectively known as 9/11 (or 'the terrorist attacks' or whatever).
"One final link. A professor at my university is trying to incorporate knowledge into natural language processing, although perhaps to his detriment: he uses very little statistical processing, claiming the whole venture is for the birds."
Mike comments: Where I come down on ACQUINE is that it wouldn't work even if it worked.
One thing I've been threatening to do for a while now is to take a single photograph and map different "modes of approach" to it—that is, specify some of the various ways (formalistic, coloristic, epistemological, factual, connotational, emotional, etc., etc.) different viewers might choose to think about it, giving concrete examples of each mode of approach so people might actually understand what the heck I was talking about. I might still do it, too...another of those "one day" projects. At the very least I think it might help illuminate why there can't be a single standard for aesthetics, even if we don't define aesthetics as personal to begin with—which, incidentally, I do.
Featured Comment by Paul Pomeroy: "In Claude Shannon's seminal 1948 paper, 'A Mathematical Theory of Communication,' he says up front that, 'The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem.'
"I think most people overlook the full significance of what he was saying. In part, he was saying that 'information' doesn't exist in the middle. What goes across the wire is without meaning and 'meaningless information' is an oxymoron. Information, true to its -ation suffix, is a process. It's not a noun, it's a happening and for people it happens mostly under the level of consciousness.
"The history of computational intelligence is riddled with examples of seemingly brilliant people who have made the mistake of thinking the meaning was in the message. This has led to all sorts of naive optimism and costly errors. As an example of the latter, consider the 1999 Mars probe that crashed because one device was 'sending information' using imperial units (pounds) to a device that was 'receiving information' in metric units.
"My favorite example of naive optimism involves an Artificial Intelligence project funded by the military. The goal was a machine that could look at a photograph (or real-time image) and locate any hidden enemy tanks. This was to be done using neural networks which were 'trained' by showing them thousands of actual photographs, some including tanks that were hidden (some not very well, others so well that experts couldn't find them) and others with no tanks at all.
"After a surprisingly short training period the neural networks suddenly started getting really good at indicating which photos had the hidden tanks. In fact, they were perfect. So the military started trying them out in real life situations and...wait for it...they were completely useless (actually worse than purely random guessing).
"It took quite a while to finally figure out what had happened. It turns out that the original set of photos had been taken over a two day period. They first photographed locations without any tanks and then they brought in the tanks, hid them and took another set of photographs. No one took notice of the fact that on the second day the sky was overcast while it had been sunny during the first shoot. No one, that is, except for the trained neural network. So, for about a million dollars the military got a device that could tell them whether or not a photo was taken on a sunny day.
"There haven't really been a whole lot of advances since then. And maybe I'm mistaken, but it seems to me that finding aesthetic quality in a photograph is a bit tougher. Unless, of course, it turns out that aesthetics is just a fancy word for tanks on an overcast day."
"As you read it, I am making physical changes in your brain. Right now! You can't stop me! Ha!"
Make brain hurt. Oww!
Dense, but interesting, and being a practitioner of the "orphan of the arts", I agree about framing.
Bron
Posted by: Bron Janulis | Saturday, 23 May 2009 at 10:57 AM
So the upshot here seems to be that machines, even extraordinarily complex computational machines, cannot really mimic subjective human aesthetic judgments.
Can I say "duh" without sounding too improbable?
Posted by: Jeff Glass | Saturday, 23 May 2009 at 11:31 AM
I would very much like to see an infrared HDR photograph of a cat. That sounds... amusing, if nothing else.
Posted by: Micheal Leuchtenburg | Saturday, 23 May 2009 at 12:00 PM
This is an excellent (and well researched) commentary on true creativity and uniqueness; both disconcerting and inspiring.
Posted by: Robert Fisher | Saturday, 23 May 2009 at 12:09 PM
"How can you rate that which has never before been seen, if you can't understand it?"
One limits the matrix and redefines the quality being assessed. At least that seems to be the method as evidenced by the caveats on their About page. But I may be reading something into it.
Posted by: Robert Howell | Saturday, 23 May 2009 at 12:15 PM
As you read it, I am making physical changes in your brain. Right now! You can't stop me! Ha!
These lines have made this my favorite explanation of Chomsky's ideas.
It is interesting thinking of photography in terms of this sort of communication, and wondering if visual images have grammar and syntax in the same way. We're pretty clearly hardwired for verbal communication, and obviously we're also visual animals - but are we hardwired for visual communication as well? Or is it piggybacked on the basic template for communication, and thus must be taught?
You've given me things to think about on today's long drive.
Posted by: Rana | Saturday, 23 May 2009 at 12:24 PM
Is today Saturday?
Posted by: Stan B. | Saturday, 23 May 2009 at 12:35 PM
Thank you, John Bates, for a most thoughtful (and very nicely worded) perspective. It is a welcome diversion from our usual, more narrow, focus here.
Then again, going from rating cameras to rating pictures may not be that big a leap.
People are funny: always looking outside themselves for 'valid' measurements on which to base their own feelings. How else would we measure the value of art if not for critics and pricetags?
Cheers!
Posted by: Tyler Monson | Saturday, 23 May 2009 at 01:01 PM
Wow, Chomsky! Wow, linguistics! Wow, semantics! I haven't read anything about it since I graduated and now I find the topic here on TOP. Thank you, Mr. Bates. And you, of course, Mike.
I'll just add something that probably strengthens your thesis about context. That is, "I can't predict how you'll interpret what I say. Except, of course, for the context that we share."
Have you ever considered the Internet? Particularly the Internet from about ten years ago and specifically the Usenet, the news groups. The text-only medium removes the non-verbal (dare I say, emotional?) content and so a good part of context, increasing the probability of misinterpretation. That's why the emoticons were invented, after all, although they do such an imperfect job.
Cold equations cannot replace the context. If a computer algorithm could really have an emotional response as the developers claim, it wouldn't really be a computer algorithm but a real artificial intelligence. And we are still far from that.
Posted by: erlik | Saturday, 23 May 2009 at 01:17 PM
This whole thing strikes me more as funny (if not cynical) than anything, but I would be interested in knowing where they got the algorithm for the work.
A long time ago (15 years) a couple of guys -- Russian immigrant painters -- set out to find out what the perfect painting would be in the U.S., and so they hired a public polling firm to find out. I read the book ("Painting by Numbers") when it came out, and found it to be both interesting and hilarious, but not interesting or hilarious enough to actually keep. I looked around on the web and found a review from 1998 in the New York Times.
The two painters interviewed 1,001 Americans, according to the review, by Luc Sante, and found that "Sixty-seven percent of respondents liked a painting that was large, but not too large -- about the size of a dishwasher (options ranged from "paperback book" to "full wall"). A whopping 88 percent favored a landscape, optimally featuring water, a taste echoed by the majority color preference, blue being No. 1 and green No. 2. Respondents also inclined toward realistic treatment, visible brushstrokes, blended colors, soft curves. They liked the idea of wild animals appearing, as well as people -- famous or not -- fully clothed and at leisure."
They then took their survey abroad, and found that a lot of these characteristics were actually reflective of international tastes, from all kinds of different cultures.
They then went on to paint some pictures that essentially ridiculed the whole concept of the information (the "American" favorite painting included George Washington and a hippopotamus -- the wild animal -- in a watery landscape). Even so, the numbers were actually relevant, and the research methods were actually technically correct, as much as they can be in this kind of study.
It seems to me that any such algorithm for a program that would judge "art" would have to be based on similar studies, and, of course, would inevitably lead to hippopotami in the Hudson River.
If you actually apply the above numbers without any wit, it seems to me, the perfect American painting or art photo would be a hunting or hiking scene along a river, with both the hunter (hiker) and the wild animal in sight, with a blue sky reflected in the water-- which may explain why so many people love Winslow Homer, and why we see a million calendar photographs of the same thing.
JC
Posted by: John Camp | Saturday, 23 May 2009 at 02:04 PM
JC, an interesting story, but how do you explain the popularity of the bales of hay then?
Posted by: erlik | Saturday, 23 May 2009 at 03:52 PM
John Camp, have you seen the episode of the Dilbert show with the "Blue Duck" painting? Recommended. They do a study group and create an image which takes over the art world.
Anyway, this article may be over my head, because I don't really get the connection between the *probability* of a sentence or picture, and the *quality* of a sentence or picture.
Posted by: Eolake Stobblehouse | Saturday, 23 May 2009 at 04:10 PM
Interesting comments. I have been experimenting with still life photos modeled on Egyptian friezes, an aesthetic that predates Raphael and the invention of linear perspective in painting. The Egyptian frieze is a totally different set of rules for beauty.
So what I found interesting is that my conventional western photos scored as high as 99.2 (had seven in the high nineties) yet my friezes scored seven.
So either my friezes just suck, or the Acquine doesn't understand pre-renaissance aesthetics.
Posted by: paul bailey | Saturday, 23 May 2009 at 04:30 PM
"so either my friezes just suck, or the Acquine doesn't understand pre-renaissance aesthetics"
Or ACQUINE is clueless. I'm not saying it is, and I'm not saying it isn't, just that you shouldn't gloss over that possibility.
Mike
Posted by: Mike Johnston | Saturday, 23 May 2009 at 04:32 PM
To Micheal Leuchtenburg:
Here's an infrared HDR picture of a cat:
http://www.flickr.com/photos/pancier/2929129548/
Google couldn't find anything, but there are a few on Flickr.
Posted by: Chris Allen | Saturday, 23 May 2009 at 04:35 PM
"JC, an interesting story, but how do you explain the popularity of the bales of hay then?" --erlik
The answer to that question (which I assume is tongue-in-cheek) is (NOT tongue-in-cheek) a book in itself. I suspect that certain reactions to art, that can only occur with forms of art, are hard-wired; and among these are a sense of pleasure when we see fecund fields, sources of water, and signs of human activity. That's why Constable's warm country farm fields are so much more loved than American paintings of glorious wilderness.
Ergo, Haystacks.
One of the most famous paintings ever made is Pieter Bruegel's "Hunters in the Snow" (sometimes called "Hunter's Return"), which is essentially (according to the statisticians) a perfect painting: clothed people, animals, water (though frozen), warmth, blue, etc. And for you Canadian readers, who will talk at the drop of a hat about how Canada invented hockey, you will see in this mid-sixteenth century painting, people playing pond hockey...
I'm not good at links, but the painting can be seen here.
JC
Posted by: John Camp | Saturday, 23 May 2009 at 04:36 PM
A person walks into a gallery and is perplexed and disappointed at the photographs on display. They appear to be colourless green photographs - without artistic merit or meaning. A gallery employee advises the visitor the photographs are examples of the exploration of idea X, which critics agree is significant and important. The photographs failed by not including the general viewer in the conversation.
Art evolves and progresses (not sure if that's always the right word) with the introduction of new ideas that change the frame of reference and introduce new ways of having visual conversations. Think Braque and Picasso with cubism - a new way of looking at the world - or how photographs of child labour and third world sweatshops changed perceptions and started new conversations about brand fashion labels and western consumerism.
If the visual conversations become too far removed from the general collective understanding they are perceived as colourless green ideas sleeping furiously, and a gallery of photographs that are only understood by the cognoscenti. Context and shared understanding are part of the visual conversation.
It's the quality of the ideas and how effectively they are translated that matter. Executed well, photographs can result in the broader acceptance of new ideas and start new conversations in the collective consciousness.
As an earlier post mentioned, we are visual creatures and the language of imagery is still developing and not well understood. Never before have so many people on the planet created so many images. At first, there will be a lot of noise. But visual languages and our understanding of them are probably still in their early stages of evolution.
Posted by: Lynn_B | Saturday, 23 May 2009 at 08:14 PM
This sort of posting is why I sent you that book, brother!
Posted by: Knot | Saturday, 23 May 2009 at 08:36 PM
Actually, I like taking purrty pictures of my cat with a VGA camera phone. None of this fancy HDR stuff. The new language is "miaow". I just need a gallery prepared to print the cat-alogue.
Posted by: Lynn_B | Saturday, 23 May 2009 at 09:39 PM
"One of the most famous paintings ever made is Pieter Bruegel's "Hunters in the Snow" (sometimes called "Hunter's Return"), which is essentially (according to the statisticians) a perfect painting: clothed people, animals, water (though frozen), warmth, blue, etc."
And all of that doesn't really take into account the emotional content of the painting. By the postures of the hunters and dogs in Breughel's painting, they are dead-tired. But at the moment of the painting, they are finally entering their village and will be home soon.
They are also successful - as far as I can see on the dark painting - the leftmost hunter carries a dead animal on his back.
The skaters on the frozen pond also mean that the village is prosperous because they have leisure time to play.
The fire on the left, beside warmth, shows that the peasants are either singeing a hog or rendering fat. Prosperity again.
We see all of that*. Computers don't.
BTW, there is the theory that Breughel's landscapes from that period, including this one, show the Little Ice Age. Also something that a statistical analysis wouldn't really understand.
* Of course, those of us who still know what you do with a slaughtered hog. :-)
Posted by: erlik | Sunday, 24 May 2009 at 02:10 AM
I was discussing this very topic with the guy who does our lawn just the other day, and bam!, it shows up in TOP. Amazing!
Posted by: John Roberts | Sunday, 24 May 2009 at 05:16 AM
Dear Folks,
Seems to me a lot of people are presupposing the answer (and worse, the import of the answer) before the experiment is complete.
Possibly semi-passable pontificating but extremely poor scientific method.
I certainly have no idea what fraction of aesthetic is psychophysical and what is cultural. I'm plausibly convinced it's not 100% of either, but if experimentation like ACQUINE eventually showed it was 90-10 either way, I'd not end up being shocked beyond words. Or 50-50. Or whatever.
What would shock me is if anyone came up with a truly plausible answer absent experiments like ACQUINE.
pax / Ctein
Posted by: ctein | Sunday, 24 May 2009 at 04:11 PM
I like it and that's all I have to say without typing for days.
Posted by: tony | Sunday, 24 May 2009 at 09:35 PM
Creationists, especially the "intelligent designer" kind, really want there to be some arbitrary measure on information that takes *meaning* into account. It is hard to explain to them that according to Shannon, a random string contains the *most* information since you need every bit to receive it properly.
Posted by: KeithB | Sunday, 24 May 2009 at 10:16 PM
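KeithB's point about Shannon can be checked with a few lines of arithmetic. The sketch below estimates entropy in bits per character from single-character frequencies only, which is a simplifying assumption (it ignores dependencies between neighboring characters); the function name `entropy_bits_per_symbol` and the sample strings are made up for illustration.

```python
import math
import random
import string
from collections import Counter


def entropy_bits_per_symbol(text):
    """Shannon entropy of the character frequencies in `text`, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    rng = random.Random(0)
    # A string of uniformly random lowercase letters: no meaning at all.
    noise = "".join(rng.choice(string.ascii_lowercase) for _ in range(10_000))
    sentence = "colorless green ideas sleep furiously"
    print("random letters:", entropy_bits_per_symbol(noise))
    print("sentence:      ", entropy_bits_per_symbol(sentence))
```

By this measure the meaningless random string carries more "information" per character than the English sentence, which is exactly why Shannon's quantity cannot serve as a measure of meaning.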
More like this please! (Parse that without context.)
I'm pretty sure that John Camp is referring to Komar & Melamid: The Most Wanted Paintings project which can be seen here.
http://www.diacenter.org/km/painting.html
Komar & Melamid are also responsible for
American Most Unwanted Song and American Most Wanted Song.
http://musicology.typepad.com/dialm/2008/04/you-want-postmo.html
Quote from the above "the opening soprano rap is especially arresting: lyrics are simultaneously rap and cowboy-related, while the vocal line is atonal and the bass is provided by a tuba. Note the bagpipe breaks."
Posted by: hugh crawford | Monday, 25 May 2009 at 04:17 AM
"What would shock me is if anyone came up with a truly plausible answer absent experiments like ACQUINE."
Plausibility in the results of the Acquine experiment would only be within the limited parameters set out by the experimenters, which are incomplete, and their presumptions, which are faulty. This is no different from the experience of studying Aesthetics in art or philosophy. What would be shocking is if, after they came up with their one plausible answer, anyone were to employ it and it alone. Humans! Always upsetting the apple cart.
Posted by: Robert Howell | Monday, 25 May 2009 at 11:57 AM
This is late because I had been contemplating just biting my tongue but...
The objective of Acquine is neither to succeed nor to fail. It is all about the information acquired trying to create such a system.
Similar for the various computational linguistics initiatives and the "Komar & Melamid" work.
The ONLY beneficiaries of this information will be marketers and propagandists. All of us regular folk will be the losers.
Posted by: Jeff Hartge | Tuesday, 26 May 2009 at 10:31 AM