Why did you create Datafloq?
The observation I made was that there was little understanding of Big Data and what it can bring to companies, and that lots of organizations were still finding it difficult to get the most out of their Big Data strategies. That was why 4 years ago I decided to set up Datafloq. It was initially called BigData-Startups.com, but we rebranded the firm 2 years ago. The main idea is to connect stakeholders and actors within the Big Data ecosystem, thereby helping them face the global Big Data demand. The objective is to educate the market to drive innovation and economic growth. There are over 150 bloggers creating high-quality content on Big Data trends, privacy, the Internet of Things, technology and offering organizational advice. 5,000 vendors and suppliers are listed and referenced on the website, along with their contact details and locations. The aim is to make them easier to find and to connect with. A recruitment platform has been launched, too. 400 job offers have already been published, and there are many more to come. Last but not least, the platform also advertises key events in the Big Data field. Our aim is to attract all the industry stakeholders and bring them together.
What is the future of data science and Big Data? Smart Data, regardless of its size?
That is a very broad question. To me, the term ‘Big Data’ is largely misunderstood. Lots of people think it means they need a lot of data, so many companies ask themselves whether Big Data is really for them, as they do not always have large volumes of data.
That’s why I like to call it Mixed Data, which indicates that it is about combining different data sources, regardless of size. We generate data every day, external and internal data, structured and unstructured data, public data, Twitter data... Combining data sources from a variety of angles is the real challenge and offers the most insights. Companies therefore need to arm themselves with better algorithms to combine this data and derive action from it. Gartner called it the algorithmic business. Algorithms are now driving insights from the massive amounts and varying sources of data being generated.
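As a minimal sketch of what combining data sources can look like in practice, the Python snippet below attaches external, unstructured text to internal, structured records through a shared key. All names and values (`crm_records`, `social_posts`, `customer_id`, `combine_sources`) are hypothetical and chosen for illustration only; they do not come from the interview.

```python
from collections import defaultdict

# Hypothetical internal, structured data: one record per customer.
crm_records = [
    {"customer_id": 1, "name": "Alice", "lifetime_spend": 1200.0},
    {"customer_id": 2, "name": "Bob", "lifetime_spend": 300.0},
]

# Hypothetical external, unstructured data: free-text social media posts.
social_posts = [
    {"customer_id": 1, "text": "Love the new store layout"},
    {"customer_id": 1, "text": "Delivery was late again"},
    {"customer_id": 3, "text": "Never heard of them"},
]


def combine_sources(records, posts):
    """Attach each customer's posts to a copy of their structured record."""
    # Group the unstructured posts by the key both sources share.
    posts_by_customer = defaultdict(list)
    for post in posts:
        posts_by_customer[post["customer_id"]].append(post["text"])

    combined = []
    for record in records:
        merged = dict(record)  # copy, so the original source stays untouched
        merged["posts"] = posts_by_customer.get(record["customer_id"], [])
        combined.append(merged)
    return combined


combined_view = combine_sources(crm_records, social_posts)
```

The size of either source does not matter here. The combined view is useful because it puts two different kinds of data about the same customer side by side.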
What are the boundaries between Big Data, machine learning and deep learning?
I think almost all Big Data vendors are using machine learning, and I am seeing more and more algorithms appearing. Companies can even buy algorithms: Algorithmia, for instance, is a marketplace where algorithms are already on sale.
But artificial intelligence is still basic. We saw Google’s AI win against Go master Lee Se-dol, but in a very specific situation it had been trained for. We still don’t have a more generic AI version that could be used in a variety of situations. There is still a long way to go, but progress is being made quite rapidly as we have ever-better computers and processing combined with smarter algorithms.
WHAT IS ALGORITHMIA?
This startup, founded in 2015 by Diego M. Oppenheimer and Kenny Daniel, gives developers the ability to turn algorithms into scalable web services with a single click. Application developers can integrate the algorithm into their own applications with under 10 lines of code. Algorithmia hosts the web services, makes them discoverable and enables algorithm developers to get paid for usage.
Algorithm developers can host their work on the site and charge a per-use fee to developers who integrate the algorithm into their own work. The platform encourages further additions to its library through a bounty system, which lets users request algorithms that researchers familiar with the field can contribute from their work or develop from scratch for a fee.
More than 800 algorithms are already available on the marketplace, providing the smarts needed to do various tasks in the fields of machine learning, audio and visual processing, and even computer vision.
Are we heading towards an increasing amount of automation in Data Science and statistical modelling?
Algorithms will eventually take over many of our jobs - multiple studies indicate that in a few decades’ time, 50% of jobs will disappear. But, of course, humans will always be needed to build the algorithm and the IT infrastructure. Machine learning needs human interaction.
How do you see the future role of Big Data Scientists or CIOs, for instance?
The role of executives will change, because data is becoming more and more important and companies need to become data-driven organizations. The Chief Data Officer will most definitely have to be present in the boardroom alongside the Chief Security Officer and the Chief Information Officer.
There is no doubt whatsoever that data governance and data creation must be dealt with in the boardroom, something we can already see happening in the Fortune 500 companies.
What conditions still need to be met for a successful Big Data Strategy?
I think what most companies are still missing is a real data-driven company culture. It is a problem related to people. You can have the latest insights, but without people to drive them into the business, they are of little use. Companies need to introduce cultural change management to move to a data-driven and data-centric organization. All employees should have a good understanding of what Big Data is. I am a big supporter of Big Data and coding in the classroom - teaching future generations about Big Data is absolutely vital. The subject should already be on the curriculum in primary schools - it is just as important as mathematics or learning another language. Estonia, for instance, has already introduced programs to teach its pupils IT development skills.
Data governance is another aspect that organizations don’t focus on enough. They should do whatever it takes to make their data secure, because if they fail, they will be hacked and their data will be breached. This can lead to multiple bankruptcies. Four ethical guidelines should be followed:
Transparency: you have to communicate in a highly transparent fashion;
Simplicity: everyone should be able to understand what is being done with their data, both now and in the future;
Privacy: at all levels – everyone should be building trust through transparency;
Security: every organization will be hacked. If you are not, it means you are not that important.
Ethics is truly important; organizations have to treat their data as they would like to be treated themselves. Duckduckgo.com is a search engine that does not store any data about you - it is the opposite of Google. More generally, organizations need to be aware that they must use data correctly, or customers will simply switch to their competitors.
What would you say are the best examples of successful Big Data Strategies or the most forward-looking companies?
WalMart is among the best examples. They were already doing Big Data when most of us were not even considering doing analytics. They collect 40 petabytes of data each day, combining different Big Data approaches to offer the right customer the right price at the right time and via the right channel.
The health sector is very interesting as well. With electronic health records, England is moving towards harvesting all its medical and health data. There may be privacy issues, but it certainly brings huge opportunities. The Aurora Health Care centre is another successful use case. They have just completed Smart Chart, a $200 million record system that has accumulated all the data collected in the past 10 years into a single data warehouse. The data comes from 1.2 million customers, 15 hospitals, 185 clinics, more than 80 community pharmacies and over 30,000 employees, including over 6,300 registered nurses and nearly 1,500 employed physicians. The not-for-profit Aurora Health Care system has decided to put that wealth of data to good use in order to improve decision-making and make the organization more information-centric. Using electronic and medical data, doctors and DNA data, they have generated a bigger picture of the patient to be able to recommend the right treatment. As a result, admission rates have dropped.
What about the landscape of providers?
Seeing Big Data services offered in the cloud is nothing new. Over the past few years, we have seen many Big Data vendors create Big Data solutions that can be accessed via the web to crunch and analyse your data. More recently, however, we have witnessed the rise of a new type of offering: Big Data-as-a-Service solutions. These solutions differ from Software-as-a-Service solutions or Infrastructure-as-a-Service solutions, as they are more or less a combination of the two. This results in a complete package for companies keen to start working with Big Data. Big Data-as-a-Service basically brings together data analytics services for analysing large data sets over the web, while also hosting all that data on scalable cloud hosting services, such as Amazon Web Services. It is therefore a complete Big Data solution, accessible over the web, which doesn’t require an in-house solution or a lot of Big Data expertise, thereby enabling small organizations to also benefit from Big Data. We will see more and more BDaaS that enable small companies to plug and play. Bigger companies will require more personalization, but it can help them start proof of concept without needing to invest too much money.
APIs and applications are another trend in the Big Data solutions landscape. An Application Programming Interface is a set of routines, protocols and tools for building software and applications. An API specifies how software components should interact, and is also used when programming graphical user interface components. A good API makes it easier to develop a program by providing all the building blocks, which a programmer then puts together. APIs will become more and more important, especially for data sets you don’t want to own but need to use. The problem is that some major API providers, like Twitter, are trying to restrict access to their APIs.
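To illustrate the idea of an API as a set of building blocks, here is a minimal Python sketch, not any real provider's interface. The names `DatasetAPI`, `InMemoryDataset` and `count_mentions` are hypothetical. The caller relies only on the agreed `search` operation and never owns or sees the underlying data set, which stays with the provider.

```python
from typing import List, Protocol


class DatasetAPI(Protocol):
    """Hypothetical contract: how a caller may interact with a data set."""

    def search(self, keyword: str) -> List[str]:
        ...


class InMemoryDataset:
    """Stand-in provider; a real one might sit behind a web service."""

    def __init__(self, documents: List[str]) -> None:
        self._documents = list(documents)  # private to the provider

    def search(self, keyword: str) -> List[str]:
        # Case-insensitive substring match over the provider's documents.
        needle = keyword.lower()
        return [doc for doc in self._documents if needle in doc.lower()]


def count_mentions(api: DatasetAPI, keyword: str) -> int:
    """Caller code: depends only on the API, not on how data is stored."""
    return len(api.search(keyword))


provider = InMemoryDataset([
    "Big Data is everywhere",
    "Small data matters too",
    "APIs expose big data sets",
])
mentions = count_mentions(provider, "big data")
```

Because `count_mentions` depends only on the contract, the in-memory provider could in principle be swapped for a remote one without changing the caller. The same dependency means a provider that restricts access to its API directly limits what callers can build.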
Finally, Data Visualization will start taking up more space in the landscape, as it allows us to understand what the data is actually telling us. Augmented and Virtual visualization will give the data a whole new meaning. It will immerse us in the data, for instance with a 360° screen to play around with the data.
Mark van Rijmenam is the founder of Datafloq, the one-stop source for Big Data information and a platform connecting stakeholders within the global Big Data market. He is an entrepreneur, a Big Data strategist and a highly sought-after keynote speaker. He is the author of the best-selling book Think Bigger - Developing a Successful Big Data Strategy for Your Business, and has been named a global top 10 Big Data influencer. He is currently a PhD candidate studying Big Data and Strategic Innovation at the University of Technology, Sydney.