Why did you create Cortical.io, what were your motivations?
As part of my research work, I was handling data and statistics. Over time, I found I was struggling to understand what doctors were saying about patients, and I wanted to work out how we could extract the exact meaning from the texts they wrote. Everything is clear with numbers, but far less clear when it comes to text. I therefore started to specialize in text analysis. When I was developing search engine technologies, I worked with numerous large companies to create applications, notably in the field of patents. I soon realized that this sector had very little access to high-performance technologies, and that very little progress was being made on technologies to process complex information such as technological and legal language.
In 2005, I therefore decided to start a new company specifically in this field. We had been working with research corporations and universities to study natural language processing in great detail, but their approach was too far removed from what people were actually doing. We consequently decided to sell Matrixware Information Services to a US partner.
I began looking for an alternative, statistics-free approach to Natural Language Processing (NLP). My conclusion was that the only system with proper Natural Language Processing capabilities is the human brain, so our approach needed to be as similar as possible to it. That was how Cortical.io came about.
Where did your inspiration come from? Did any work in particular inspire you?
My main inspiration is, of course, how the brain processes information. I found key inspiration in this field in 2005 with Jeff Hawkins’ theory about how the neocortex works, and began my own experimental work in 2010 to see whether his theoretical framework could be applied to language. Luckily, we were able to get a research grant from the Austrian Research Promotion Agency (FFG). This grant enabled us to hire a team of computer scientists to develop a prototype. Halfway through the experimentation phase, the results were so astonishing that we had to accelerate the process and expose our findings to the market to test them. We found an angel investor in 2012 and set out on the journey of being a start-up in Natural Language Processing.
How does The Retina work? What is a fingerprint, exactly?
We have developed a new machine learning approach inspired by the latest findings on the way the brain processes information. Our approach uses similarities as a foundation for intelligence. By mimicking the understanding process of the brain, we benefit from millions of years of evolutionary engineering to help us solve today’s hottest NLP challenges.
Our system learns in a way similar to how the brain works when we perceive words. When a child learns a new word, they store every possible usage on a mental map where similar meanings are organized in close proximity. For example, the different meanings associated with the word ‘organ’ (music, church, liver...) are each stored in specific places on this mental map. Depending on the context, our brain automatically associates the word ‘organ’ with the correct semantic cluster – this is what we try to reproduce with our Retina Engine.
Cortical.io’s Retina Engine learns about a specific language by processing relevant text content via unsupervised learning. It converts words into semantic fingerprints, a data format similar to the one used by the brain that captures the meaning behind natural language. The Retina can generate semantic fingerprints for different language elements such as words, sentences and entire documents.
A semantic fingerprint is structured like a map where you can visualize the overlap between words or sentences. This type of data representation makes it very easy to compare any two words, sentences or even whole documents, because you can calculate their similarity and measure their semantic overlap.
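As an illustration only (this is a toy model, not Cortical.io's implementation), a fingerprint can be thought of as the set of active positions on the semantic map; overlap and similarity then reduce to simple set operations:

```python
def overlap(fp_a: set, fp_b: set) -> int:
    """Number of active positions shared by two fingerprints."""
    return len(fp_a & fp_b)

def jaccard_similarity(fp_a: set, fp_b: set) -> float:
    """Shared active positions, normalized by all distinct active positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

# Two toy fingerprints: each is the set of active positions on the map.
# The positions themselves are made up for this example.
organ_music = {3, 17, 42, 88, 120}
organ_liver = {3, 17, 55, 90, 121}

print(overlap(organ_music, organ_liver))             # 2 shared positions
print(jaccard_similarity(organ_music, organ_liver))  # 0.25
```

The same two functions work unchanged whether the fingerprints came from single words, sentences or whole documents, which is what makes this representation convenient to compare at any granularity.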
A concrete example would be a wildlife journalist who is looking for information on the Web about jaguars. No matter how many newsfeeds and keywords he has selected, he will always receive masses of irrelevant information about Jaguar the car, rather than solely information about the animal. This is because computers only look for keywords in an article, without understanding the meaning behind them. In this example, Cortical.io’s Retina Engine enables an intelligent news filter to be created based on semantic fingerprints. First of all, the journalist’s interests are captured in a filter fingerprint. Then, every article is converted into a semantic fingerprint. Finally, the Retina compares each article’s fingerprint with the filter fingerprint, and the system forwards only the articles that are highly similar to it.
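The three steps of the news filter can be sketched as follows. Note that `toy_fingerprint` is a deliberately crude stand-in for the Retina (it just hashes words to positions, whereas the real engine learns positions from context), and the 0.3 threshold is an arbitrary choice for this example:

```python
import zlib
from typing import Callable

def toy_fingerprint(text: str, size: int = 1024) -> set:
    """Stand-in for the Retina: hash each word to a position on a small map."""
    return {zlib.crc32(word.encode()) % size for word in text.lower().split()}

def similarity(fp_a: set, fp_b: set) -> float:
    """Jaccard similarity: shared active positions over all active positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def make_news_filter(interest_text: str,
                     threshold: float = 0.3) -> Callable[[str], bool]:
    """Step 1: capture the journalist's interests as a filter fingerprint.
    Steps 2-3: fingerprint each article and keep it only if similar enough."""
    filter_fp = toy_fingerprint(interest_text)
    return lambda article: similarity(filter_fp, toy_fingerprint(article)) >= threshold

keep = make_news_filter("jaguar wildlife animal rainforest predator")
print(keep("the jaguar is a wildlife predator of the rainforest"))  # True
print(keep("the new jaguar car has leather seats"))                 # False
```

The car article shares only the keyword "jaguar" with the filter, so its overall overlap stays below the threshold even though a pure keyword match would have let it through.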
What is very interesting is that, with our approach, semantic spaces are stable across languages: the semantic fingerprints of the word ‘philosophy’ in Chinese, French, English or German (for example) all look very similar. This means that the Retina Engine makes it possible to directly compare documents that have been written in different languages, or, for example, to search a database of Chinese documents using English words.
What are the different possible uses of Semantic Folding?
Semantic Folding can have various applications and be used in different fields, including:
Social media: the semantic fingerprint of the description of a person can be compared with another profile, for instance. Semantic Folding can also be used for social content streaming or to detect abnormal behaviour in social media.
Banking and compliance monitoring: banks have to monitor their communications for things such as insider trading. Previously, no satisfactory email-checking solution existed, because metaphors could be used in emails to cover up potential fraud. Converting the messages into fingerprints has allowed banks to detect more fraud and to reduce false positives.
Content management: when you create a website, its Google ranking depends on the quality of its information. Your content has to be interesting to be ranked among the top websites. With the Retina Engine, you can create fingerprints of the 20 top-ranked Google pages in your particular category. By overlapping those fingerprints with the fingerprint of your own content, you can see which parts of your text to adjust to improve your website’s visibility.
Author detection: say you want to automatically associate new publications with the corresponding author. With the Retina Engine, you simply need to create a fingerprint of one of the author’s most typical articles and then compare the overlap of any new publication with that reference fingerprint to associate it with the corresponding author. This is particularly useful for digital publishers.
There are many examples of other uses, too, ranging from the analysis of TV captions to determine the best topics for shows to terrorism prevention and reputation management…
How do things currently stand as regards artificial intelligence? Do you think businesses are ready for it?
Austria is definitely not ready for this kind of technology, but Silicon Valley and New York have already thoroughly embraced it. Generally speaking, lots of customers say they have been trying Machine Learning solutions but that nothing suits their business activity and that it isn’t really working. They say that big software providers could not handle their requirements. As a matter of fact, most businesses are not aware of the solutions that are out there; everybody is talking about Machine Learning, but not about Natural Language Processing. There is a big gap between academic research and what is actually happening in companies, even for big players. Companies are still gathering terabytes of data without having any idea what to do with them.
Our advantage is that Cortical.io is not a black box, and people understand our approach quite easily.
What do you think the future holds for Artificial Intelligence?
In my opinion, AI is not going to take over the world. Theoretically this could happen, but considering the path of our history, I think it is highly unlikely.
The brain is a specific size and the size of the neocortex is limited, so I believe it will all be about adding intelligence to it. I believe we will continuously create larger patches and extend our individual cortex. We will be able to add a cortex if we want to detect messages in Japanese, for instance. People will extend their personality with a kind of added exoskeleton.
I don’t feel threatened by autonomous Artificial Intelligence so much as by people extending their capabilities, and by the question of the ethical authority of the person wearing the extension.
However, high-level autonomous AIs are not the first thing we are going to see; we will begin by empowering individuals.
Francisco De Sousa Webber first took an interest in Information Technology as a medical student specializing in genetics and serology at the University of Vienna. He participated in various research projects at the Vienna Serological Institute and was heavily involved in medical data processing. He was also involved in numerous projects, including establishing and organizing Austria’s dialysis register database and creating a patient documentation system for the university clinic.
In the mid-1990s, he worked alongside Konrad Becker to found Vienna’s Institute for New Culture Technologies and Public Netbase - Austria’s only free public-access Internet server at the time - thus establishing an international competency platform for the critical use of information and communication technologies.
In 2005, Francisco founded Matrixware Information Services, a company that developed the first standardized database of patents under the name of Alexandria, and where he acted as CEO. He also initiated the foundation of the Information Retrieval Facility, a not-for-profit research institute, with the goal of reducing the gap between science and industry.
He currently heads up Cortical.io, a start-up he co-founded in 2011 that develops and commercializes Natural Language Processing (NLP) solutions based on Semantic Folding, a theory that offers a fundamentally new approach to handling Big Text data.