Intermedia

Peter Gärdenfors

In Cognitive Science and Information Technology, ed. by Y. Wearn, Swedish Science Press, Uppsala, 1995, pp. 68-72.

1. People are amazing machines. We are tremendously effective at processing information. Our senses and our language abundantly provide us with enormous amounts of information that we normally have no problem interpreting. Furthermore, we can simultaneously handle numerous information codes and translate between them without even noticing it. For example, first-graders have no problem with describing the content of a picture in a magazine or drawing a picture that illustrates a story. With little effort, we can convert information from pictures to words and back again. Within this context, one can ask whether it is possible to get computers to do the same thing.

Within modern computer technology, primarily within the area of multimedia, people handle different information codes in parallel. On a computer screen, people can simultaneously be presented with a text document, graphics that complement the text, a film that illustrates the text, and sound that accompanies the film. This technology is used in modern electronic encyclopedias. For example, an entry on a bird can contain, in addition to the text and picture found in traditional reference books, a film sequence that shows the bird in flight together with the call of the bird. For each of the different media, there are also powerful programs that can be used to work with the material.

The construction of the information "highways" contributes to making multimedia presentations easily accessible. Anyone with a modem can sit down and access multimedia databases the world over in a matter of seconds via the World-Wide Web or other such programs.

If one wants to mince words, the term "multimedia," as used in the context of computers, is a bit erroneous. The medium is rather "mono," in the form of a computer screen with a multiple coding of the information that is presented. To the extent that sound is incorporated, the loudspeakers constitute an additional medium. Despite this distinction, the term "multimedia" is well established, and I will not propose that it be changed.

2. A considerable limitation with multimedia is that the programs work with parallel codes that do not co-operate in any way. The different media forms are like Leibniz' monads - their worlds run side-by-side but cannot influence one another.

Multimedia programs provide the user with a great amount of information, much like the videos shown on MTV. It is left to the user to make the relevant connections and structure the information from the different media. This presents a problem because the largest portion of the perception process consists of filtering out irrelevant information rather than absorbing as much information as possible.

It is a myth, which is tied to all the talk about cyberspace and infobahns, that more information is always better. The real information first appears when the user interprets the stream of bits. The human brain is not like a hard disk that passively accepts the flow of information. It is rather the case that we actively seek that which is meaningful.

Interpretation consists of sifting through the incoming flow and making it resonate with earlier experiences and knowledge. The greater the number of bits that are let through, the more the user needs to struggle to keep up and make sense of a message. Narrow information channels that have intelligently sifted through the stream of bits would be of greater help to the human user than highways where everything rushes by indiscriminately.

Imagine having programs that could translate between different media or information codes! One example would be a program that could produce an illustration of a story, or a gadget that could read street signs for the blind. I will coin the concept intermedia to refer to programs that can interpret and establish connections between different information codes. In order to create order in the overflow of information that modern technology offers, we need intermedia rather than multimedia.

For certain special applications, intermedia already exists. Programs for synthetic speech can convert a given text to sounds that alarmingly resemble human speech. Swedish research is in the forefront of this area, due largely to the pioneering work of Gunnar Fant at the Royal Institute of Technology in Stockholm. Numerous large computer companies are working on the opposite problem of converting sound to text, primarily in order to allow computers to accept spoken commands.

As long as one uses the vocabulary that the operating system requires, this works quite well. There are also commercially available programs that let one dictate a letter or an article and have the text written out directly by the computer.

3. One can say that communication between humans and computers has gone through three phases. The first is the phase of the magic formula, where the user gave the machine cryptic commands in a secret language. The smallest spelling error was punished by a complete breakdown in communication. The second phase, which still dominates, is on the level of point-and-say books. One can point to pictures (icons), and by clicking on them, one can get new pages to click on. With the help of speech recognition, the third phase will be one of military commands. Given short bursts of commands, the computer will be able to carry out some of the operations of which it is capable. At this point, however, the computer is completely insensitive - a flattering tone of voice can in no way change what happens.

Commands are really not a very advanced form of communication. A future fourth phase should be based on human dialogue. A true dialogue assumes that the person (or thing) being addressed can interject comments, objections and further questions. Within the computer world such abilities are, as yet, missing, even though there are some computer games that make a primitive attempt at it.

We can look forward to word processing programs that offer smart stylistic and content-related advice about the text that one is writing. But we will not get anything that can match a trusted secretary until the computer understands the written text.

Understanding a text involves the ability to make the connection between it and what the words mean. One of the assumptions in so-called cognitive semantics is that the meaning of words is closely connected to the information that we get through perception and memory. Additionally, this information can be represented in the form of image schemas.

The meaning of a word thereby consists of a sort of code that is related to the image code. The understanding of a text is achieved by combining the image schemas produced by the words in the text and then by putting them together in an internal scene. What happens here is a performance, in the theatrical sense, of what the text is about.

4. A so-called OCR system, which can read a page of a book and convert it to computer-coded text (for example ASCII code), is also a form of intermedia. The page is fed into the computer as an image and comes out as computer text, which can then be corrected and edited in a word processing program. Another variation is Apple's Newton, which after some training can interpret handwritten text with a varying degree of success.

Although these programs do quite well at converting images to words, they are limited to the narrow domain of the world of letters. There are almost no existing methods for interpreting more general images that give text as the output. However, I would like to mention a couple of interesting attempts by Bengt Sigurd at the Department of Linguistics in Lund. Already in the early 1980s, together with Jan Fornell, he developed a program that was capable of commenting on what was happening in a microworld where a couple of figures moved in relation to a doorway represented on the computer screen. The program was hooked up to a speech synthesizer, which meant that the comments were produced in spoken form. It could say, for example, "Eva is near the doorway, but Adam is not."

One problem in developing the program was to describe precisely what words like "near" really mean. Another problem was determining what the program should say. In principle, the program could generate any number of sentences. But if old information kept being repeated, the output became rather tiresome, even if the information was phrased differently each time.
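The two problems just described can be made concrete with a minimal sketch. It is not a reconstruction of Sigurd and Fornell's program: the distance cutoff, the class names and the rule of speaking only when something changes are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical cutoff: the essay does not say how "near" was actually defined.
NEAR_THRESHOLD = 2.0


@dataclass
class Figure:
    name: str
    x: float
    y: float


def is_near(figure, doorway, threshold=NEAR_THRESHOLD):
    """Treat 'near' as: straight-line distance to the doorway is within a cutoff."""
    distance = math.hypot(figure.x - doorway[0], figure.y - doorway[1])
    return distance <= threshold


class Commentator:
    """Produces a sentence only when a figure's nearness status has changed."""

    def __init__(self, doorway):
        self.doorway = doorway
        self.last_status = {}  # figure name -> last reported nearness

    def comment(self, figures):
        sentences = []
        for fig in figures:
            near = is_near(fig, self.doorway)
            # Skip old information: stay silent if nothing changed.
            if self.last_status.get(fig.name) != near:
                self.last_status[fig.name] = near
                if near:
                    sentences.append(f"{fig.name} is near the doorway.")
                else:
                    sentences.append(f"{fig.name} is not near the doorway.")
        return sentences


commentator = Commentator(doorway=(0.0, 0.0))
print(commentator.comment([Figure("Eva", 1.0, 1.0), Figure("Adam", 5.0, 0.0)]))
# Eva moves slightly but stays near, so there is nothing new to say.
print(commentator.comment([Figure("Eva", 1.0, 1.2), Figure("Adam", 5.0, 0.0)]))
```

A fixed cutoff is the crudest possible reading of "near". What counts as near for people depends on the size of the objects and the scene, which is exactly why the word is hard to pin down.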

More recently, Bengt Sigurd and his colleagues have been working on a program that can produce comments about weather maps. A map of Sweden with the usual meteorological symbols for rain, low pressure systems, etc. is fed into the computer, and the program generates a spoken description of the weather. One can choose between different reporting styles, from a telegraph-like style to a more conversational one. To a certain extent, the program can also be run in reverse, i.e., one can give it a spoken weather report and have it produce a weather map.
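As a rough illustration of what translating in both directions between map symbols and sentences might involve, here is a toy sketch. The symbol names, the phrasings and the two style labels are invented for the example and say nothing about how the Lund program actually works.

```python
# Hypothetical symbol vocabulary; the real program's map format is not
# described in the essay.
PHRASES = {
    "rain": "rain",
    "sun": "sunshine",
    "low_pressure": "a low pressure system",
}


def describe(region, symbol, style="telegraphic"):
    """Turn one map symbol placed over a region into a sentence."""
    phrase = PHRASES[symbol]
    if style == "telegraphic":
        return f"{region}: {phrase}."
    if style == "conversational":
        return f"In {region} you can expect {phrase}."
    raise ValueError(f"unknown style: {style}")


def parse_telegraphic(sentence):
    """Reverse direction: recover (region, symbol) from a telegraphic sentence."""
    region, phrase = sentence.rstrip(".").split(": ", 1)
    symbol = next(s for s, p in PHRASES.items() if p == phrase)
    return region, symbol


print(describe("Lund", "rain"))
print(describe("Uppsala", "low_pressure", style="conversational"))
print(parse_telegraphic(describe("Lund", "rain")))
```

The reverse direction is easy here only because the telegraphic style is rigid. Free conversational input would require real language understanding, which is the hard part of the problem.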

5. A modern method for building models of cognitive processes is to employ neural networks. One of the strengths of such networks is that they have a more flexible learning ability than traditional models. At the University of California at Berkeley, artificial neural networks have been created that take moving images as their input - the network learns to associate the input with the spoken expressions that describe how the objects in the scene are spatially related to one another.

At the Department of Cognitive Science in Lund, Lars Kopp has developed a related system in which the simulated eye movements of a hypothetical observer of the scene are used to generate the spoken descriptions. The function of this program is quite similar to Sigurd's program, but the method of using artificial neural networks provides a better basis for learning and greater generality.

The programs described here are some of the existing examples of intermedia. What remains to be developed, above all, are programs with a more general ability to translate between images and words. Obviously one cannot literally "translate" an image into words - there are always aspects of an image that cannot be captured in words. This is also true in reverse: for a given text, there is no single image that has the same content as the words. We humans, however, can easily translate information from one code to the other. But we hardly have any knowledge about how we do it, at least not on a level that can be exploited for developing intermedia programs. The fundamental problem is that we do not know very much about how images are processed in our heads.

Speech synthesis and speech recognition tend to work well because there is a theory of phonetics that provides the programmer with suitable variables for sound analysis. Without fundamental knowledge about the mechanisms that underlie how people produce and interpret spoken language, there would be no possibility of creating computer programs that solve the relevant tasks. In order to reach a degree of success concerning the general problem of translating between images and words, a related theory is required for how people interpret images and store them in their heads.

There are some rough drafts of such a theory within cognitive science. But there is still quite a way to go before we can talk about what people actually do when they understand images if this is to be implemented in a computer program.

Take an apparently simple problem like putting a name to a face. How does one describe for the computer the relevant features that allow one to recognize a face, even after several decades? In solving this problem, the experience of artists and passport control personnel may be of greater importance than the current methods used by image processing engineers.

Consequently, intermedia requires basic research on human cognitive processes. Modern information technology has been dominated by methods for transferring information between computers and other machines. The most interesting direction in the transfer of information is, nevertheless, the link between humans and machines.

6. As icing on the cake, research on intermedia can provide us with further aids for handicapped individuals. The deaf and the blind each lack one medium. Through intermedia, these sensory deficits can hopefully be remedied. If we achieve better programs for speech recognition, for example, deafness can be compensated for. But it is rather doubtful that the written word is the best way of representing speech to the deaf. Text loses much of the emphasis, rhythm and pauses that make spoken language a lot richer than written language. In addition to translating spoken sound into text, it may also be beneficial to translate words into colors and forms and to present them to the deaf as well.

Currently, the blind have some tools to aid them in converting written text to a tactile form which can be read off with the tips of their fingers. Using this method, they can read newspapers or instruction manuals. The problem of converting images into tactile form, however, remains. In this context, Swedish research is again in the forefront. Gunnar Jansson at the Department of Psychology in Uppsala is working on methods that will enable the blind to "read" maps. Technologically there is no problem with converting the image from, for example, a video camera to a surface that can be felt with one's fingertips. The problem is rather that the camera image lacks a "form" that can easily be interpreted by the blind. Generally, the image contains too much information and must be filtered in order to highlight the relevant features.
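To give a feeling for what such filtering could mean in the simplest case, the following sketch reduces a grayscale image to a coarse grid of raised and flat cells, raising a cell only where brightness changes sharply inside it. The function name, the block size and the contrast threshold are assumptions chosen for the example; this is not a description of Jansson's methods.

```python
def tactile_grid(image, block=2, threshold=50):
    """Reduce a grayscale image to a coarse grid of raised (1) and flat (0) cells.

    `image` is a list of rows of brightness values from 0 to 255. A cell is
    raised when brightness varies by at least `threshold` inside its block,
    i.e. when the block probably contains a contour.
    """
    rows, cols = len(image), len(image[0])
    grid = []
    for r in range(0, rows - block + 1, block):
        row_out = []
        for c in range(0, cols - block + 1, block):
            values = [
                image[r + dr][c + dc]
                for dr in range(block)
                for dc in range(block)
            ]
            contrast = max(values) - min(values)
            row_out.append(1 if contrast >= threshold else 0)
        grid.append(row_out)
    return grid


# A 4x4 image: dark everywhere except a bright strip along the right edge.
image = [
    [0, 0, 0, 255],
    [0, 0, 0, 255],
    [0, 0, 0, 255],
    [0, 0, 0, 255],
]
print(tactile_grid(image))
```

Sixteen brightness values are reduced to four cells, and only the cells containing the contour are raised. The sketch misses contours that happen to fall exactly between two blocks, and it has no notion of which contours matter to the reader, which is the real difficulty the text points to.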

This process is closely related to the problem of image comprehension discussed above. We have thus returned to the basic question of how humans understand and remember information about objects and spatial structure.

7. In various forms, intermedia is well on its way and will lead to a revolution within many areas. One central requirement for the development of new kinds of intermedia is more knowledge about how humans process information. Above all, we must focus on research regarding how we interpret images and convert them into language.

Swedish research is well prepared to pursue the development of intermedia, and internationally it could take a leading role in the area. This would, however, demand a form of co-operation that is not very common in this country. The IT-commission, the research councils, and various other authorities like NUTEK and The Swedish Work Environment Fund ought to support this type of research at the universities.

For once, research concerning human abilities can provide us with the possibility of quick technological applications. Put together people from the humanities with people from psychology and the technological areas so that they will be forced to break down the traditional walls that have separated them in the past! The human faculties co-operate without significant problems, and the same should apply to the academic faculties as well.

