Wednesday, 20 June 2012

I, robot

From the Arnold Schwarzenegger character in the Terminator series to Lieutenant-Commander Data in Star Trek: The Next Generation, from R2-D2 and C-3PO in the Star Wars series to Marvin the paranoid android in The Hitchhiker’s Guide to the Galaxy, we have become accustomed to thinking of robots as machines with personalities. We take them for granted and rarely pause to ponder how difficult it might be to construct such a machine. If we did stop to think, we might begin to realize how perfectly suited for its purpose is the human brain.

In fact, the gap between imagination and reality is exceptionally wide. Although it would be a foolhardy prophet who predicted that this gap will never be closed, the human race is not on the threshold of replicating itself in machinery. To imagine otherwise is to underestimate the complexity of the mental processes that humans employ as a matter of routine. The truly awesome aspect of this complexity is not the intellectual output of a William Shakespeare, a Ludwig van Beethoven or an Albert Einstein but the routine calculations performed by the mind of a five-year-old child. They are no less marvellous because they happen below the horizon of consciousness.

Take vision. The cinematic convention for the view through a robot’s eyes is either that of a distorted fish-eye lens or the cross-hairs of a gunsight. We, as viewers, understand the convention because we have brains that have evolved in order to interpret what our eyes see, but such a view would be useless to a robot. The first requirement of any visual system is that it be able to determine where an object ends and the background begins. However, the world is not like a child’s colouring book, with comforting black lines to delineate the boundaries, and the light falling on a human retina does not produce a ready-made picture. It merely generates a series of electrical impulses, which to be of any use must then be interpreted by the brain.

The best analogy for what a robot would ‘see’ is a rectilinear grid of numbers in which each number represents the brightness of a small portion of the visual field, with larger values indicating brighter cells within the grid. We may reasonably expect the robotic brain to be able to interpret any significant difference in the values of adjacent cells as indicative of a boundary, but this is where the process of interpretation becomes vastly more complicated. A large number next to a small number could be the result of a light object against a dark background, a dark object against a light background, a dark and a light object touching each other, the edge of a shadow, two different shades of the same colour on the same surface, and many other combinations, all of which the human brain can distinguish so easily that we fail to appreciate how difficult an achievement this is.
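The grid-of-numbers analogy can be made concrete with a minimal sketch. The grid values, the function name find_boundaries and the threshold of 50 are all illustrative assumptions, not a description of any real vision system. The code flags every pair of adjacent cells whose brightness differs sharply, which is the easy part; it has no way of telling which of the many possible causes produced the difference.

```python
def find_boundaries(grid, threshold=50):
    """Return pairs of adjacent cells whose brightness differs by more than threshold."""
    edges = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            # Compare each cell with its right-hand neighbour.
            if c + 1 < len(row) and abs(value - row[c + 1]) > threshold:
                edges.append(((r, c), (r, c + 1)))
            # Compare each cell with the neighbour directly below it.
            if r + 1 < len(grid) and abs(value - grid[r + 1][c]) > threshold:
                edges.append(((r, c), (r + 1, c)))
    return edges


# Hypothetical brightness values: larger numbers mean brighter cells.
grid = [
    [12, 15, 14, 201, 198],
    [10, 13, 16, 205, 199],
    [11, 14, 15, 203, 202],
]

boundaries = find_boundaries(grid)
for first, second in boundaries:
    print(first, "|", second)
```

The output is only a list of cell pairs. Whether the jump between the third and fourth columns marks the edge of an object, the edge of a shadow or a change of paint on a single surface is exactly the question the sketch cannot answer.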

And the difficulties are just beginning. Once the objects and background in the visual field have been delineated and distinguished, a robotic brain then needs to identify these objects, which means that it needs to identify fundamental properties such as colour and composition. At first glance, the problem appears to be a trivial one. Compare a lump of coal and a snowball, one black, the other white. If larger numbers represent brighter regions of the visual field, then large numbers must, intuitively, indicate the presence of a light material such as snow (and small numbers the presence of a dark material such as coal). Not always. More light bounces off a lump of coal outdoors than off a snowball in a typical indoor setting, because the brightness of a given object is merely a measure of the amount of light it reflects. Nevertheless, the human visual system can see a bright outdoor object as black and a dark indoor object as white. Unlike a robotic system, it is not easily fooled by trompe l’oeil confabulations, perceiving the world as it is rather than the world as it is represented on the surface of the retina.
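The coal and snowball comparison comes down to simple arithmetic: the light reaching the eye is roughly the light falling on a surface multiplied by the fraction that the surface reflects. The sketch below uses rough, assumed figures (coal reflecting about 5 per cent of incident light, snow about 90 per cent, 50,000 lux outdoors and 300 lux indoors); they are illustrative orders of magnitude rather than measurements.

```python
def reflected_light(illuminance_lux, reflectance):
    """Approximate the light leaving a surface as incident light times reflectance."""
    return illuminance_lux * reflectance


# All four figures below are assumptions chosen for illustration only.
COAL_REFLECTANCE = 0.05   # coal reflects very little of the light that hits it
SNOW_REFLECTANCE = 0.90   # snow reflects most of it
OUTDOOR_LUX = 50_000      # a bright day outdoors
INDOOR_LUX = 300          # a typical lit room

coal_outdoors = reflected_light(OUTDOOR_LUX, COAL_REFLECTANCE)
snow_indoors = reflected_light(INDOOR_LUX, SNOW_REFLECTANCE)

print("coal outdoors:", coal_outdoors)
print("snow indoors:", snow_indoors)
print("coal is the brighter cell:", coal_outdoors > snow_indoors)
```

Under these assumptions the coal produces the larger number by a wide margin, so a system that equated large values with light materials would label the coal as snow.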

The next problem is the estimation of depth and relative size. In the absence of clues, even the human system can fail here, unless there is a basis for comparison. A postage stamp in your hand and a building on the horizon that is the same shape will produce the same effect on the retina, and it is only by experience that we learn which is which. In my own experience, when working in the barren, featureless landscape of the Sahara Desert, I found it almost impossible to decide whether a distant object was a stick driven into the sand a quarter of a mile away or an oil rig ten miles away.
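The stick and the oil rig can be put into numbers. The angle an object occupies in the visual field depends only on the ratio of its size to its distance, so an object forty times taller and forty times farther away fills exactly the same angle. In the sketch below the heights (a 1-metre stick and a 40-metre rig) are hypothetical values chosen to match the distances in the anecdote, and angular_size is an illustrative helper name.

```python
import math

METRES_PER_MILE = 1609.344


def angular_size(height_m, distance_m):
    """Angle in radians subtended by an object of a given height at a given distance."""
    return 2 * math.atan(height_m / (2 * distance_m))


# Hypothetical heights; the distances are those mentioned in the text.
stick = angular_size(1.0, 0.25 * METRES_PER_MILE)
oil_rig = angular_size(40.0, 10 * METRES_PER_MILE)

print("stick:  ", math.degrees(stick), "degrees")
print("oil rig:", math.degrees(oil_rig), "degrees")
print("indistinguishable by angle alone:", math.isclose(stick, oil_rig))
```

With nothing else in the scene to provide a sense of scale, the two angles are the same, which is why the image alone cannot settle the question.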

Finally, having solved the problems of shape, brightness, colour and size, our artificial vision module will also need to assign identifying tags (names) to the objects it detects and be aware of their purpose. This is difficult enough for simple geometric shapes and letters of the alphabet, but it is an almost intractable problem to construct any kind of artificial template for the recognition of human faces. Yet, unless a person has sustained damage to one or more parts of their brain that deal with vision, they will still be able to recognize people whom they haven’t seen for twenty years or more, despite the subtle changes that will have taken place in the intervening period. The template would also have to take into account what a face looks like under a huge range of different lighting conditions, a compensation that a human viewer performs effortlessly.

In other words, a seeing robot simply cannot be built with just the fish-eye viewfinder of movie convention, and it should not come as a surprise to learn that the human visual system is not built this way either. And there is one more factor to consider: an early evolutionary adaptation of our sensory input systems ensures that we pay attention only to signals that are changing. This adaptation is most obvious in relation to our senses of hearing and smell—how often have you totally forgotten an annoying racket emanating from a nearby construction site, or a nauseating smell that made you feel sick when you first encountered it—but even our vision shuts down if absolutely nothing is happening in the visual field, to give our video-processing circuitry a break. There is more to seeing than meets the eye.

Once we have mastered the problems associated with artificial vision, the next challenge is locomotion. The optimum engineering solution to the difficulties that arise in moving an object around is to set that object on wheels, especially if it is heavy. For this reason, it is tempting to think that a robot would be much better off with wheels than with legs, except that wheels are of limited use on rough or uneven terrain.

However, a mere two legs is not the intuitive choice. As one leg moves, the other has to maintain the body’s balance, which involves constant monitoring and feedback in order to make instant fine adjustments. All of this requires processing power. While four legs may be a better technical solution—creatures with four legs can move far more quickly, and far less effort is required to maintain balance—there is an unexpected advantage in bipedalism that offsets these assets.

The human hand, even more than the hand of other primates, is a machine exquisitely adapted to its purpose. And the evolution of that hand, over the last million years, has driven the evolution of the brain, to the extent that a significant amount of grey matter is now dedicated to the operation of the hand. While the human brain has not increased much in size during this period, there has been a considerable increase in the size of the cerebral cortex, especially the frontal lobes, which are associated with such so-called ‘executive functions’ as self-control, planning, reasoning and abstract thought. However, like every part of human behaviour that we take for granted, building a robot with these functions presents challenging engineering problems.

We often hear on the news about the latest in robotic hands, and what they are capable of doing, but if you think about the range of grips that the hand has available, you will quickly realize just how versatile is the human hand, and how much further researchers in the field have to go to design a robotic version of equivalent versatility. There is the grip between thumb, index finger and middle finger used to hold a pen, the grip between thumb and index finger for turning a key in a lock, the grip between two fingers to hold a cigarette or a spliff, the grip that employs all five fingertips to pick up a flat disc like a beermat, and the way we grip a glass of beer or a hammer.

In this last case, the grip is the same, but the amount of pressure applied by the fingers differs considerably, illustrating another characteristic of the human hand: the ability to vary the amount of pressure applied depending on the object to be picked up. And then there is the dazzling dexterity required to manipulate a pair of chopsticks effectively, a skill at which, incidentally, most Chinese demonstrate only moderate competence. The point to bear in mind here is not that the human hand is a masterpiece of engineering, which it is, but that achieving this level of dexterity requires a colossal amount of processing power.

Imagine that we have finally succeeded in designing suitable visual, locomotor and manipulation systems. Another problem is still lurking on the horizon. An intelligent system cannot treat every object that it encounters as a unique entity unlike anything else it has ever seen. It has to have some means of deciding whether a new object belongs in a previously seen category or whether it should be assigned to a new category, and in making that distinction it has to have some way of distinguishing between essential and incidental properties. At the risk of repetition, this is another skill that humans are good at, but designing a comparable artificial system is a massive engineering challenge.

One of the most interesting commentaries on robotics is Isaac Asimov’s short-story collection I, Robot, in particular the book’s three laws of robotics:
• A robot may not injure a human being or, through inaction, allow a human being to come to harm.
• A robot must obey orders given it by human beings except where such orders conflict with the first law.
• A robot must protect its own existence as long as such protection does not conflict with the first or second law.

Asimov showed remarkable insight by including the third law, because self-preservation is not an automatic property of an intelligent system. However, with the first and second laws, the author fell into the trap of echoing the ancient fear, illustrated, inter alia, by the rampaging golem of Jewish legend, Faust’s bargain with the devil, the sorcerer’s apprentice, Frankenstein’s monster, and the computer in 2001: A Space Odyssey, that artificially intelligent systems would one day become so smart and so powerful that they would turn on their creators.

Unfortunately, Asimov was unable to step outside his own thought processes and recognize them as artifacts of his mind rather than universal and scientifically verifiable laws. The human capacity for evil is never far from our thoughts, and it is disarmingly simple to imagine evil to be an inescapable aspect of our existence, just as it is almost instinctive to think that a self-aware system must possess an ego, as envisaged by the title of the book (and ego, or intention, is a necessary component of evil).

On the other hand, although machines built originally by humans are unlikely to turn on their creators, we have no way of knowing whether other civilizations on other planets have developed machines that are programmed to kill. Even now, it is possible that a civilization in a not too distant star system, having picked up I Love Lucy on its radio telescopes, many years after the original broadcasts, has dispatched a fleet of murderous robots, like the Cylons in Battlestar Galactica, to exterminate the perpetrators of this outrage.

Friday, 15 June 2012

peculiar pronouncements

You say eether and I say eyether,
You say neether and I say nyther,
Eether, eyether, neether, nyther,
Let’s call the whole thing off!

Ira Gershwin, Let’s Call the Whole Thing Off, from Shall We Dance, 1937.

The catalyst for this post was the BBC’s reporting of the conflict in Syria, in particular reports of the bombardment of the city of Homs by government forces. It is of merely incidental concern that there has been no consensus on how this beleaguered city’s name should be pronounced, but it does point to a general malaise within the BBC that was not present thirty years ago, when a newsreader, faced with an unfamiliar place name, was expected to consult the corporation’s Pronunciation Unit for guidance before broadcasting.

Whether this unit still exists I have no idea, although it does seem like the kind of outfit whose services might be dispensed with if the BBC is required to cut its costs. Certainly, the variety of pronunciations heard on the BBC nowadays suggests that it is no longer in business, and individuals are thus left to make their own decisions.

So what is the correct pronunciation of ‘Homs’? I have no idea, although I can make an informed guess. However, it should be noted that ‘correct’ in this context does not have an objective meaning, and in using the word I am merely reflecting the way in which residents of the city might refer to it. With this in mind, I have noticed that the majority of reporters and newsreaders say ‘Homz’; very few refer to the city as ‘Homss’, which I believe to be the ‘correct’ pronunciation.

There are two pieces of evidence for this view. First, I cannot think of a single English word in which a voiced consonant is followed by an unvoiced sibilant (‘-ss-’), so it is natural for a native English speaker to voice the sibilant in any unfamiliar word (we say ‘bedz’ and ‘dogz’, for example, not ‘bedss’ and ‘dogss’). Second, most Muslims pronounce words like ‘Islam’ and ‘Muslim’ with the sibilant unvoiced, so I deduce that they would do the same with ‘Homs’, although a counter-argument here might be that in ‘Islam’ and ‘Muslim’ the sibilant follows a vowel, and therefore the analogy is not valid.

There are times when I wonder whether there is a conspiracy not to agree on a specific pronunciation, because some words may be pronounced in three or four different ways. When the existence of al-Qaeda first registered with the general public, after the attacks on New York and Washington in 2001, President Bush pronounced ‘al-Qaeda’ to rhyme with ‘raider’. Later, a pronunciation that rhymed ‘al-Qaeda’ with ‘rider’ became popular, and hapless former British Home Secretary John Reid muddied the waters by rhyming it with ‘reader’. There has always been the odd journalist or newsreader who eschewed all three of these options and instead pronounced the two internal vowels of ‘al-Qaeda’ separately. I suspect that this is closest to how it should be pronounced, and that the other three versions are the result of the natural tendency of English speakers to seek to pronounce two adjacent vowels as a single vowel or diphthong.

That English speakers might have trouble pronouncing Arabic names is predictable, but you wouldn’t expect those same speakers to have problems with their own place names. Even Americans know that Gloucester, Leicester and Worcester are not pronounced the way they read, but there are hundreds of English villages that also retain their mediæval spellings but have acquired a modern pronunciation. If you think that you can work out that modern pronunciation from the name, I invite you to try the following quiz, the source for which is a booklet produced by the BBC Advisory Committee on Spoken English in 1936: Broadcast English II: Recommendations to Announcers Regarding the Pronunciation of Some English Place-Names.

It may seem that this exercise is already out of date, but the trend is clearly towards a simplification of pronunciation, and long names will inevitably be shortened, usually by elision of their middle syllables. For example, the village of Faldingworth in Lincolnshire was pronounced as it read in 1936, but by now it may well be most commonly rendered as ‘Falworth’, or something similar. With these caveats in mind, how do you pronounce the following place names?

1. Almondbury (a village in Yorkshire).
2. Barugh (a village in Yorkshire).
3. Brougham (a village in Cumbria).
4. Caldmore (a village in Staffordshire).
5. Cholmondeley (a village in Cheshire).
6. Cholmondeston (a village in Cheshire).
7. Congresbury (a village in Somerset).
8. Happisburgh (a village in Norfolk).
9. Hardenhuish (a village in Wiltshire).
10. Puncknowle (a village in Dorset).
11. Trowse (a village in Norfolk).
12. Wyrardisbury (a village in Buckinghamshire).

To give some indication of how difficult this is, Garboldisham in Norfolk is pronounced ‘Gaarblshəm’ (where ‘ə’ is the indeterminate, unstressed vowel heard in the second syllable of ‘often’), while Maugersbury in Gloucestershire is pronounced ‘Mawzbəry’. There are also regional variants to consider: the initial letter of Gillingham in Kent is pronounced as in ‘Jack and Jill’, while the same letter in the Gillinghams in Dorset and Norfolk is pronounced like the gills of fish; the Houghton in Hampshire is pronounced ‘Hotən’, the Houghton in Lancashire is rendered as ‘Hawtən’, and the Houghton in Norfolk is known locally as ‘Howtən’.

Local pronunciation is clearly the guiding principle of the BBC’s booklet, mainly because it is unlikely that anyone living more than 50 miles away from the place in question will have heard of it. Newcomers to a town or village are unlikely to listen carefully to what the locals call their new home, so they make up their own versions. My own home town (Penrith, in Cumbria) ought to pose no problems with regard to pronunciation, but locals pronounce it with the stress on the first syllable, and newcomers invariably place the stress on the second syllable, especially if they came originally from the south of England.

If you want to know how well you have done, the ‘correct’ pronunciations are listed in a comment below. When I’ve tried similar quizzes in the past, even natives of England struggled to get more than three or four correct, so this quiz is really for amusement rather than a genuine attempt to assess your knowledge.

Sunday, 3 June 2012

a hard question

I shall shortly be embarking on my annual pilgrimage to my home town in the UK, and while I’m there I shall be tuning in to one of my favourite programs, Radio 4’s Round Britain Quiz, which is unlike any quiz show airing on mainstream media. It pits two teams drawn from the UK’s regions against each other, and in the space of 30 minutes only eight questions will be asked.

You might suspect that the questions will be fiendishly difficult, and you would be right. Each question requires that the two people in the team being quizzed piece together a series of convoluted connections. A correct answer is worth six points, and each clue provided by the questionmaster entails a one-point deduction. This method of playing the game is made necessary by the time constraints of the program, but the game is far more interesting, and challenging, if the time allowed is unlimited and no clues are given.

With this in mind, I offer the following conundrum:
How might a Greek letter, a saddle, a member of the human race and three points in the English Premier League lead you to the Christian name of an American president?
I could have posed one of the many questions used on Round Britain Quiz, but then the answer might be available either online or in one of the books published to accompany the program, so this is one of my own devising. In terms of difficulty, I think that it is on a par with the program’s usual fare, although the answer should be screamingly obvious once it has been explained.

As noted above, I will not provide any additional clues, and because I will be travelling, it may be several days before I am able to respond to any comments. Correct answers will be acknowledged but not published until the end of the month. Please note that in order to gain six points, all components of the question must be identified, but points will be awarded for partial answers.