Richard J. Senghas
Kasparov: Bxe7 [Black bishop takes White Queen at e7]
Deep Blue: c4 [White pawn to c4]
[Kasparov becomes the first human world champion to lose a regulation match to an artificial intelligence, IBM’s Deep Blue.]
Trebek: The category is 19th Century Novelists, and here is the clue: William Wilkinson’s “An Account of the Principalities of Wallachia and Moldavia” inspired this author’s most famous novel….
. . .
Watson: Who is Bram Stoker?
[IBM’s Watson computer thus wins the final match against two human Jeopardy champions.]
Forstall: Do I need a raincoat today?
iPhone Siri: It sure looks like rain.
The October release of Apple’s iPhone 4S carries with it Siri, a significant achievement in applied linguistics, natural language processing, and product design and marketing. As far back as Pygmalion, humans have contemplated the godlike ability to produce creations to which we can speak and that would, in turn, speak back to us intelligibly and helpfully. Of course, such aspirations carry with them their own nightmares, as played out by Shelley’s Frankenstein and Kubrick’s HAL 9000. So far, the most visible contributions of language-processing programs have been limited to text-to-speech (à la Stephen Hawking), increasingly reliable dictation-transcription software, and some text-translation applications. The majority of us are not yet using such tools in our daily lives. With the rapid and widespread adoption of Siri, however, we are crossing a line and need to address a range of issues that warrant careful thought.
Last February’s Watson Jeopardy victory received international attention, not least because analysts were as impressed with the complex linguistic aspect of the game (producing the right question to match a given answer) as with Watson’s ability to mine enormous databanks of facts. The victory immediately spawned a flurry of commentary in the popular media. Unsurprisingly, popular attention soon faded as we looked around for the Next Shiny Object to distract us, relatively unconcerned with the full implications of the Shiny Object Still Before Us. Parsing natural language input to determine what we humans actually want is difficult, largely because so many contextual cues are buried within linguistic forms that are often quite indirect. So, for those of us who analyze language cross-culturally, a key issue now is framing the usefulness and effects of our research to inform natural language processing.
In the examples of human/computer interactions above, we see a trend from narrow to increasingly broad applications, with correspondingly widening implications. Restricted chess moves and strategies can be computed by recursive analysis of potential moves, supplemented by some pattern recognition of strategies and tactics from prior games. These are simpler to program than what was expected of Watson, which had to parse the natural (English) discourse of the game’s host and its fellow competitors. The cultural milieu and linguistics of Jeopardy are complex, though, as demonstrated in February, not insurmountable.
However, the algorithms Watson uses to determine its best responses do not necessarily follow human patterns of processing, as discussed in a May 2011 Scientific American interview with Stephen Baker. In fact, among the long-planned first commercial applications is WellPoint’s “Dr Watson” for medical diagnostics, intended to complement rather than replace human performance, with interesting implications both positive and negative. The tasks delegated to Dr Watson are those with which humans have trouble: recall of extensive details, complex computations, and problematic interactions of medications. Dr Watson frees caregivers to focus on the contributions best provided by humans, among them patient/caregiver interactions and intuitive genius when encountering novel situations for diagnosis and treatment. Clearly, a second issue we must explore is the effect of Dr Watson-type tools on human interactions.
Yet, compared to Jeopardy, daily life is far more subtle and nuanced, and its linguistic environment is significantly more convoluted than games or even clinical settings, as any ethnographer would attest. So how is a mobile phone supposed to cope? It isn’t. At least, not by itself. The paradigm behind Apple’s Siri interface does involve data provided by the phone: which language is being used, the user’s geographical location, a list of known contacts (some tagged with traits such as home or work, or relationships such as spouse, child, or manager), calendar events with times and locations, and lists of reminders. These idiosyncratic data are combined with some sound-processing software to capture the user’s audio input. And, most important, all of these are bundled up and instantaneously transmitted to centralized servers much like Watson (more on this later), with access to serious processing power and extensive databases, including maps, details of businesses, and other information that trade secrecy will prevent me from ever knowing. The processing doesn’t happen in the phone. Among the many issues such a paradigm handles better than Watson (in its current form) ever could are answers to questions such as “Will I need a raincoat today?” These require knowing where the person is at the time of the query, the expected weather conditions in the nearby area on that given date, and many other cultural assumptions (raincoats and umbrellas are not universal!).
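For readers curious about what that division of labor looks like concretely, the client/server paradigm can be sketched in miniature. Everything here is hypothetical illustration (the payload field names, the forecast_for stub, and the single toy rule are my inventions), not Apple’s actual protocol:

```python
# Illustrative sketch of a Siri-style client/server exchange.
# All field names and logic are hypothetical, not Apple's protocol.

def bundle_request(audio_bytes, device_state):
    """Client side: pair the raw audio with contextual data the phone
    already holds, so the server can resolve 'today' and 'here'."""
    return {
        "audio": audio_bytes,                  # the captured utterance
        "language": device_state["language"],  # e.g. "en-US"
        "location": device_state["location"],  # (latitude, longitude)
        "contacts": device_state["contacts"],  # tagged relationships
        "calendar": device_state["calendar"],  # events with times/places
    }

def answer_on_server(request, forecast_for):
    """Server side: the heavy processing happens here, not on the phone.
    A single toy rule answers only the raincoat question."""
    # (A real system would first transcribe request["audio"] to text.)
    utterance = request["audio"].decode()
    if "raincoat" in utterance.lower():
        chance = forecast_for(request["location"])  # % chance of rain
        return "It sure looks like rain." if chance > 50 else "No rain expected."
    return "I don't understand."

# Usage: a stubbed forecast service stands in for the weather database.
request = bundle_request(
    b"Do I need a raincoat today?",
    {"language": "en-US", "location": (38.3, -122.7),
     "contacts": {}, "calendar": []},
)
print(answer_on_server(request, forecast_for=lambda loc: 80))
```

The point of the sketch is the division of labor: the phone only bundles context around the audio; interpreting the utterance and looking up the weather happen server-side.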
Among the first user complaints about Siri are the problems it has with those who speak English with “non-standard” accents. These initial problems are not surprising, but given that Siri must already address the phonology of input in order to handle the expected “normal” range of Standard English, I suspect optimization will be relatively simple, especially if the devices can carry codes to cue the servers as to which phonological algorithms have already proved most effective with the user of a given phone. Likewise, when other languages besides English are encountered (French and German are coming next), sentence word order and lexical items should be relatively easy to accommodate on Siri’s servers. (Ironically, I doubt that Siri will ever handle the endangered Nigerian language Siri…. What are the implications of leaving certain languages behind?) However, figuring out what is actually meant by particular pronouns, here/there, this/that (that is, deixis, indexicality, and anaphora) doesn’t always map easily across different languages. Similarly, we might see issues regarding the gendered and otherwise socially constructed nature of what is expected in a “personal assistant.”
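The per-user cueing speculated about above might amount to no more than the device remembering which recognition model has served this speaker best. A minimal sketch, with model names and confidence scores invented purely for illustration:

```python
# Hypothetical sketch: a device-side cue telling the server which
# phonological model has recognized this particular user best so far.

def pick_accent_model(history, default="standard-en"):
    """history maps model name -> past recognition-confidence scores
    for this user; return the best-performing model on average."""
    scored = {m: sum(s) / len(s) for m, s in history.items() if s}
    if not scored:
        return default  # no history yet: fall back to the default model
    return max(scored, key=scored.get)

# Usage: for this speaker, the "scottish-en" model has worked best.
history = {"standard-en": [0.61, 0.58], "scottish-en": [0.84, 0.91]}
print(pick_accent_model(history))  # scottish-en
```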
The tethering of mobile devices to large, centralized servers via wireless networking poses new problems as well as new opportunities. Such configurations do open up risks of system failure when connections go down (which occurred following the initial release), but they also allow the servers to gather and compare a wide range of discourse, complete with supporting data that might help unpack issues of deixis, cultural discourse frames and metaphors, and speech events. Certainly, the ethical issues associated with gathering such discourse are as complex as the linguistic ones (imagine the Human Subjects Review aspects alone!). Yet the potential linguistic corpora open entirely new horizons. Time for more fieldwork to help analyze those corpora….
Senghas: Who are you?
iPhone Siri: I am your humble virtual assistant.
Richard J. Senghas is a professor of anthropology at Sonoma State University in Northern California. His research follows the emergence of a sign language and Deaf communities in Nicaragua.
Editors of Language and Culture Column: Leila Monaghan, leila.monaghan (at) gmail.com; Jacqueline Messing, jmessing (at) usf.edu; Richard Senghas, richard.senghas (at) sonoma.edu