Opinion: African ‘clicks’ outwit artificial intelligence

Taking African languages into the digital and the fourth industrial ages is our responsibility. We cannot just import technology, such as speech recognition machines, but we should adjust them to our particular environments, writes Professor Tshilidzi Marwala.

The Vice-Chancellor and Principal of the University of Johannesburg (UJ), as well as the author of the book Artificial Intelligence for Rational Decision Making, Prof Marwala recently penned an opinion piece, ‘African ‘clicks’ outwit artificial intelligence’, published by the Sunday Independent on 13 May 2018.

IsiXhosa is an interesting language that has over 9 million speakers. It is a language often associated with clicks. Our famous musician, the late Mama Africa, Miriam Makeba, made isiXhosa famous by introducing the Click Song, also called Qongqothwane, to the world.

Despite the stereotype, isiXhosa is not a clicking language but a Bantu language. (“Bantu” here is a linguistic classification of a group of languages and should not be confused with the apartheid definition of the word.)

Joseph Greenberg, the US linguist, classified African languages into four stocks, one of which, Niger-Congo, includes the Bantu languages spoken from Tanzania to South Africa.

Bantu languages belong to the Niger-Congo group of languages. For those of us who speak a Bantu language, in my case Venda, isiXhosa is completely comprehensible.

Malume in Venda is malume in isiXhosa meaning “uncle”, makazi in isiXhosa is makhadzi in Venda meaning “aunt”, while iza apha in isiXhosa is ida hafha in Venda meaning “come here”.

Even though isiXhosa is not that difficult for Bantu language speakers, it is very difficult for artificial intelligence machines.

Artificial intelligence (AI) is taking over the world. President Vladimir Putin of Russia has stated that AI is the new arms race. There are many types of AI and one of these is machine learning. One approach to machine learning is the neural network.

The neural network is inspired by how the brain works. Deep learning uses large neural networks with many layers.

Deep learning is able to do complex tasks such as taking spoken words and translating them into another language. Deep learning can take words spoken in Chinese and translate them into English, and vice versa, and it can do this well. IsiXhosa, however, is difficult for these machines to understand.

For those of us who have been working in AI for over 20 years, this is an interesting challenge that needs to be tackled.

Why is isiXhosa such a difficult language for machines? One needs to understand the classification of the Xhosa language.

The Xhosa are a mixture of a Bantu group, generally called Nguni, and the Khoisan. The Khoisan are the people who spoke the click languages that use the four-click Khoisan system.

A study conducted by the Human Sciences Research Council (HSRC) found that, in terms of genealogy, the Xhosa carry a higher proportion of Khoisan genes than any other African ethnic group in South Africa. It turns out that the Xhosa language also has the highest incidence of clicks among African languages in South Africa.

This correlation between the Xhosa and the Khoisan means that, among Africans in South Africa, cross-pollination with the first nations, the Khoisan, was greater for the Xhosa, both linguistically and genetically, than for any other ethnic group.

Despite this maximum interaction between the Khoisan and the Xhosa, the Xhosa language is not generally a clicking language. When AI translates spoken words from one language to another, it takes the spoken words, which are in the form of signals, and decodes them.

Therefore, the EFF Commander in Chief, Julius Malema, was right at Mama Winnie Madikizela-Mandela’s funeral, as all spoken words are signals!

These signals need to be deconstructed so that the machine can understand them.

The French mathematician Joseph Fourier in 1822 was the first person to come up with a method of understanding signals. The University of Johannesburg has a course called “Signals and Systems” that does the trick.

Fourier understood that all signals can be represented as a combination of cycles, which in the language of maths are called sinusoidal functions.

Karl Marx also observed the idea of a signal being represented as a combination of cycles in his critical work Capital. A word spoken in isiXhosa is deconstructed using Fourier’s method so that it is broken down into cycles that the AI machine can understand.
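The idea of breaking a signal into cycles can be sketched in a few lines of Python. This is only an illustration, not the method used by any particular speech system: the function names `dft` and `dominant_cycles` and the synthetic two-cycle test signal are assumptions made for the example. It builds a signal from two sinusoids and uses a plain discrete Fourier transform to recover which cycles it contains.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform: one complex coefficient per cycle count k."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def dominant_cycles(samples, count):
    """Return the `count` strongest cycle counts below the Nyquist limit, sorted."""
    n = len(samples)
    spectrum = dft(samples)
    magnitudes = [(abs(spectrum[k]), k) for k in range(1, n // 2)]
    strongest = sorted(magnitudes, reverse=True)[:count]
    return sorted(k for _, k in strongest)


# Hypothetical test signal: 3 cycles per frame plus a weaker 10 cycles per frame.
N = 64
signal = [
    math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 10 * t / N)
    for t in range(N)
]

cycles = dominant_cycles(signal, 2)
print(cycles)  # [3, 10]
```

A real recording would contain many more cycles than two, but the principle is the same: the transform reports how strongly each cycle is present in the signal.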

To convert the spoken words into cycles, the signals are first put through a window, which makes sure that out-of-the-ordinary characteristics are eliminated. This is where the difficulty of the Xhosa language is encountered!

Actually, Xhosa is only 15% clicks and 85% Bantu language. Therefore, the window technique thinks the clicks are not part of the language but background noise and thus eliminates them.
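One way a window can suppress a click is easy to demonstrate. The sketch below is an assumption-laden illustration rather than a description of any real speech recogniser: it uses the common Hann window, models a click as a single-sample spike, and the helper names `hann`, `click_frame` and `windowed_energy` are invented for the example. It compares how much of the click's energy survives when the click falls near the edge of a frame versus at its centre.

```python
import math


def hann(n):
    """Hann window of length n: close to 0 at the frame edges, close to 1 in the middle."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def click_frame(length, position):
    """A silent frame containing a single unit spike, standing in for a click."""
    frame = [0.0] * length
    frame[position] = 1.0
    return frame


def windowed_energy(frame):
    """Energy left in the frame after multiplying it by the Hann window."""
    window = hann(len(frame))
    return sum((x * w) ** 2 for x, w in zip(frame, window))


N = 64
edge_energy = windowed_energy(click_frame(N, 2))     # click near the frame edge
centre_energy = windowed_energy(click_frame(N, 32))  # click at the frame centre

print(edge_energy < 0.001, centre_energy > 0.99)  # True True
```

The same click keeps almost all of its energy in the middle of the frame and loses almost all of it near the edge, so a short, sharp sound can be weakened simply by where it lands relative to the window.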

Nevertheless, isiXhosa is an important language locked out of the fourth industrial revolution. We ought to develop a new window that will not treat these clicks as noise but as an integral part of the language.

The other way to handle this situation is to develop a new version of Fourier’s method, which can take the signals directly and not disregard clicks as noise.

Another technique is to discover new types of AI machines that take the spoken words raw, without any pre-processing, and do not treat the clicks as noise. We will have to discover new forms of algorithms that are decolonised and that take into account the uniqueness of our languages, such as isiXhosa.

What are some of the ideas that need to be explored? In psychology, there is a problem called the cocktail party problem – first described by Colin Cherry in 1953.

Cherry observed that a person in the middle of a noisy room is still able to hear the words of the person they are talking to. Therefore, the human ear is able to filter out the noise.

With the Xhosa language one should turn the cocktail party problem on its head and not filter out the clicks, which are conventionally deemed as noise by AI machines, but be able to take them into account so that the artificial intelligence machine can hear them, understand them and transmit them.

A great deal of work has been done to make artificial intelligence machines understand the identity of the African people.

One example, at the University of Johannesburg (UJ), is that of Gugulethu Mabuza-Hocquet, who completed her doctorate on designing algorithms that take into account the fact that the difference between the pupil and the iris of the eye is sharper among people of European descent than among people of African descent.

These algorithms, therefore, allow biometric systems based on the iris of the eye not to implicitly discriminate against Africans in favour of Europeans.

The next step should be to develop better algorithms that understand the Xhosa language. Taking our languages into the digital and the fourth industrial ages is our responsibility.

We cannot just import technology, such as speech recognition machines, but we should adjust them to our particular environments.

If adaptation is not an option, we should discover our own versions of the Fourier theory. This will require our funding agencies, such as the National Research Foundation, to sponsor projects that are rich in local content rather than us solving the problems of other nations and thus subsidising them.

This would require a new sense of confidence and a realisation that the African market is big enough to define its own technological problems and solutions. Any other way will reinforce colonial economic, political, social and technological systems.

The views expressed in this article are those of the author/s and do not necessarily reflect those of the University of Johannesburg.