Report of the IREP Workshop on Big Data Big Analytics
Paris, May 30th 2013
In March 2011, McKinsey published a report entitled “Big data: The next frontier for innovation, competition, and productivity”. Since then, not a day has gone by without an article, an announcement, or a book release on Big Data appearing on the Web. All industries and businesses are looking at the phenomenon. Big Data has become the Holy Grail of technology that will change everything, turn everything upside down. A new job (Data Scientist) and new training programs (Data Science) are announced in the press, even in the most established newspapers.
It’s time to demystify the phenomenon and separate realities from promises! It is in this context that the members of IREP started to ask questions about the impact of Big Data on the upstream and downstream activities of market research and decided to organize a dedicated workshop on May 30th 2013. The main objective of the workshop was not to redefine the movement or the underlying technologies of Big Data; conferences, exhibitions, and general seminars are regularly organized on this theme. The goal was to respond to questions posed by industry players in advertising, marketing, and media research. Questions such as: Is Big Data just another wave of software-vendor hype, as were ERP, CRM, and e-business before it? What differentiates Big Data from the mere Data we already use for our research and survey projects? What is the impact on our business, our knowledge, our processes, our methods, and our IT systems?
To answer these questions and many others, IREP invited leading French experts. Most had already faced Big Data well before the name became one of the most sought-after, most indexed, and most documented topics since the advent of Web 2.0 and social networks.
This document reports the key points of each speaker’s presentation and summarizes the lessons learned from the workshop, including the points the participants raised during the Q&A sessions.
Philip Tassi, Executive Vice President of Médiamétrie and a member of the IREP Scientific Council, gave a brief yet rich history of the concept of Data. In the 19th century, exhaustiveness ruled the world of data: government organizations collected all the data one could collect on all individuals, and the results came years later. It was the age of manual processing, of paper and pencil. But it was already Big Data. The difference from the Big Data of the 21st century lies in the frequency of data collection, the speed of data processing, and the sophistication of data visualization. The 20th century brought the reign of sampling, a true innovation at the time: with only a portion of the population, we could obtain results for the general population, and in record time compared to the exhaustive approach. Isn’t Big Data just the rebirth of the exhaustive approach? Which of these two approaches should be chosen in future studies? For Philip Tassi, each complements the other, and he therefore proposes to combine both whenever possible. To support this with facts, he cited two projects carried out by Médiamétrie for measuring Web traffic on fixed and mobile Internet: Médiamétrie used a Big Data approach centered on sites and complemented it with a panel approach centered on users. The results were much better.
Arnaud Laroche, President and CEO of Bluestone and Board Member of ENSAI, presented the underlying technologies of Big Data, initially developed by Google and Yahoo and since adopted by other leading websites such as Facebook, LinkedIn, and Twitter. The initial idea of Google and Yahoo was to parallelize data storage and processing across thousands of commodity servers; branded servers at the time were too expensive to achieve this level of parallelism for both storage and processing. For Arnaud Laroche, Big Data does not create new mathematical or statistical models for analyzing data. Big Data is an increased capacity to collect, store, and process data. And because of this, it is the ultimate application domain for sophisticated data mining and machine learning algorithms that previously lacked the critical mass of data needed to reach their full potential.
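The parallelization idea Arnaud Laroche described can be sketched in miniature. The toy Python example below (data and names invented for illustration) simulates the two MapReduce phases in-process, with a worker pool standing in for the commodity servers: each worker counts words in its own shard of the data, and the partial counts are then merged into a global result.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def map_phase(chunk):
    """Each simulated 'server' counts words in its own shard of the data."""
    return Counter(word for line in chunk for word in line.split())

def reduce_phase(a, b):
    """Partial counts are merged pairwise into a global result."""
    return a + b

def word_count(lines, workers=4):
    # Split the data into shards, one per (simulated) server.
    shards = [lines[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(map_phase, shards)
    return reduce(reduce_phase, partials, Counter())

if __name__ == "__main__":
    logs = ["big data big analytics", "big data"] * 1000
    print(word_count(logs)["big"])  # 3000
```

In a real cluster, the shards live on the servers’ local disks and the framework moves the computation to the data rather than the reverse; the in-process pool here only illustrates the division of labor.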
While Philip Tassi addressed the changing nature of Data, Gilles Santini, President and Director of Hippo IREP, addressed the question of whether we need to change, extend, or abandon our traditional models of probabilistic analysis. Do we keep, extend, or abandon Benzécri’s correspondence analysis, Tukey’s exploratory data analysis, Gauss’s linear regression, Fienberg’s response patterns, Hartigan’s classification methods, and time-series analysis? In short, do we need to change, extend, or abandon the mathematical and statistical foundation built over centuries? Like Philip Tassi, Gilles Santini proposes to combine the two approaches mentioned above. For Gilles Santini there is the modeling level, which seeks meaningful relationships within a group of individuals, and the “probabilization” level, which seeks to assign each individual a probability of action. Big Data can be a tremendous asset for modeling the group for targeting purposes (which group of individuals buys BMW cars in general?). Probabilization uses the results of Big Data modeling for finer targeting (within that group, who are the most likely buyers?). Gilles Santini concluded his presentation with a warning: we have the technical means to process massive data in every way, but we must keep our common sense. Do all the pieces of the collected data have a meaning? For Gilles Santini, always trying to make sense of the available data is the real business challenge Big Data must meet to gain acceptance. Otherwise, we run the risk of what IT professionals call Garbage In, Garbage Out.
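The two levels Gilles Santini distinguishes can be illustrated with a toy sketch. In the hypothetical Python below, a simple rule stands in for the group model and a hand-set logistic score stands in for probabilization; all fields, weights, and thresholds are invented for illustration, not taken from any real model.

```python
import math

# Level 1 - modeling: from mass data, a profile of the group that acts
# (here, a toy rule standing in for a model learned from Big Data).
def in_target_group(person):
    return person["visits"] >= 3 and person["income"] >= 50

# Level 2 - probabilization: within the group, each individual gets a
# probability of action (a toy logistic score with illustrative weights).
def purchase_probability(person):
    z = 0.4 * person["visits"] + 0.02 * person["income"] - 3.0
    return 1 / (1 + math.exp(-z))

population = [
    {"id": 1, "visits": 5, "income": 80},
    {"id": 2, "visits": 1, "income": 90},
    {"id": 3, "visits": 4, "income": 55},
]

group = [p for p in population if in_target_group(p)]           # who, in general?
ranked = sorted(group, key=purchase_probability, reverse=True)  # who, most likely?
for p in ranked:
    print(p["id"], round(purchase_probability(p), 2))
```

The modeling level answers the coarse question (individuals 1 and 3 fit the profile, individual 2 does not); the probabilization level then ranks those who remain for finer targeting.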
For Gilbert Grenie and Zouhir Guedri, respectively Partner and Director at PwC, the Big Data movement comes with challenges of a different order than the technical or the business: regulatory, legal, and security challenges. They cite the reporting requirements to the CNIL on personal data and its use, the obligation to give people access to their data so they can correct or delete it, the prohibition in principle on transferring personal data of European Union residents outside Europe, etc. Add to these constraints intellectual property and copyright protection. All of this will make it difficult to use, distribute, and/or monetize data collected from social networks or information-sharing platforms. Unlike what we read on the Web in general and in technical and scientific publications in particular, their presentation enlarges the definition of a Big Data project to encompass these and other legislative aspects that can slow the deployment or increase the cost of the original project. From now on, we must consider any Big Data project, especially one focused on consumers, as at once a business, a technical, and a legal project.
Jean-Charles Cointot, Telecom Industry Leader at IBM France, presented the results of a survey of 1,100 IT and business professionals worldwide. The study showed that 28% of respondents had implemented at least one Big Data pilot project, 47% planned at least one pilot project, and 24% had not yet started. As these figures indicate, the opportunities and challenges are still ahead; at the business level, we are still in the experimental phase. This is common: as with any major innovation, there is first the vision, then the experimentation which, if successful, becomes a deployment and later a standard, in the sense that it is no longer seen as an innovation. Other results of the study show the importance of the expected benefits. As with ERP, CRM, and the Web, it is the efficiency and effectiveness of the customer relationship that companies seek most (49% of Big Data project objectives). Another important point of Jean-Charles Cointot’s presentation is the use of the underlying Big Data technologies on data already in house: a new Big Data project does not automatically mean new data or online data, but rather new ways to get more from old, already available data.
For Bruno Walther, Co-founder of Captain Dash, Big Data is both a revolution and an evolution. It is an exponential expansion of the variety and complexity of the concept of Data, now composed of text, numbers, charts, sounds, images, and videos from various sources such as the Web, RFID chips, databases, etc. But it is also a natural evolution of data processing, just more sophisticated. Big Data is the logical continuation of the Web, with its product catalogs, price comparisons, and visit logs; the Web is itself a continuation of CRM, with its customer segmentation, contacts, and promotional offers; and CRM itself followed ERP, with its receipts, purchase records, and payment records. In agreement with Gilles Santini, Bruno Walther holds that the goal of Big Data is not to understand the Why but the How of relationships. With Big Data, it is not causality that matters but correlation. It is the reign of business rules technology, heuristics, and machine learning, known more generally as Artificial Intelligence. Like Gilbert Grenie and Zouhir Guedri above, Bruno Walther states that the current rules that provided privacy protection, consent, opt-out, and anonymity are no longer fully guaranteed. If you use all the Google services on all devices, Google knows everything about you! To gain acceptance from end consumers, Big Data projects must evaluate the reuse of collected data and the impact of that reuse on consumers. The latter have to find their own benefit in handing over part of their privacy to advertisers and their suppliers.
Michel Bellanger, Head of Marketing at Carrefour Médias, presented the history of Big Data at Carrefour. Big Data started with the opening of the first hypermarket in 1963, 50 years ago. Today, Carrefour has 13.7 million loyalty cardholders representing 76% of sales. Each purchase is logged for 24 months, with details down to the individual product reference. The example of Carrefour France is proof that Big Data was not born two years ago, as is often thought. It is also evidence that Big Data does not come only from the Web: if you total all transactions across all of Carrefour’s physical outlets (2,108 shopping centers), you arrive at the gigantic figure of 957 million transactions per year, or more than 2.6 million transactions per day. The example also shows that Carrefour’s Big Data is not based on Hadoop and its derived technologies. Instead, Carrefour uses technologies from the 1990s to collect, process, and analyze the massive data from the physical outlets mentioned above, plus 5 websites totaling 6 million unique visitors per month.
Lisa Labatut, Head of Traffic Generation at Bouygues Telecom, and Alain Levy, Chairman of Weborama, presented the results of a study they conducted using data collected from online users to increase the rate at which Web ad impressions convert into visits to points of sale. 27 million users were scored against Bouygues Telecom’s offering portfolio. Compared to CRM data alone, the conversion rate was multiplied by 3.1. The use of Big Data can significantly increase the performance of an advertising campaign by bringing it more precision and more detail. Here, the Big Data collected by Weborama enriched the CRM of Bouygues Telecom: further proof that Big Data (external data) does not replace but complements the Data already available in the business (internal data).
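The enrichment pattern described here, external behavioral scores complementing internal CRM records, can be sketched as follows. The user ids, segments, score values, and threshold in this Python fragment are all invented for illustration; they do not reflect the actual Weborama or Bouygues Telecom data.

```python
# Internal CRM records, keyed by user id (hypothetical data).
crm = {
    "u1": {"segment": "prepaid"},
    "u2": {"segment": "postpaid"},
    "u3": {"segment": "prepaid"},
}

# External behavioral scores keyed by the same user ids
# (standing in for the scores computed against the offering portfolio).
external_scores = {"u1": 0.82, "u2": 0.15, "u3": 0.67}

# Enrichment: the external data complements, not replaces, the internal data.
for uid, record in crm.items():
    record["score"] = external_scores.get(uid, 0.0)

# The campaign then targets only the users above a chosen score threshold.
targets = [uid for uid, record in crm.items() if record["score"] >= 0.5]
print(sorted(targets))  # ['u1', 'u3']
```

The precision gain comes from this final filtering step: instead of addressing the whole CRM base, the campaign concentrates on the prospects the external data marks as most receptive.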
Presentations by different stakeholders show that at present, the concept of Big Data is not formally defined. Each speaker gave his or her definition according to his or her activity: a research firm, a software company, a consultancy firm, and an FMCG distributor. But there is a common finding in all presentations: Big Data is not new.
Big Data is a revival of the exhaustive approach that was the norm before the advent of sampling in the late 19th and early 20th centuries. New IT, with its capacity for collecting, processing, and analyzing masses of data, has made the exhaustive approach economically feasible again, at lower cost and in less time. Nor was Big Data born with the Web: it has been present and in use for decades by consumer goods distribution channels, airlines, car rental and hotel companies, telecom operators, banks and insurance companies, payment card networks, etc.
Big Data will not replace sampling and panels. Big Data can precisely capture the behavior and actions of consumers, but not their demographics, such as age, gender, socio-professional level, income, etc. Big Data is centered on devices (PCs, set-top boxes, laptops, smartphones, tablets…) while panels are centered on people (men, women, baby boomers, teenagers…). For a complete analysis and richer information, we need to combine the two approaches: Big Data for behavioral data, panels for intentional data. This leads us to wonder whether Big Data is not, after all, the digitization of all our searching, selection, comparison, and purchasing of products and services, both in physical outlets and on the Web.
Big Data is not a new technology, even though it started with Google and Yahoo and their development of Hadoop, MapReduce, BigTable, etc. The technologies of the 1990s are still used and will continue to be used with great success. Big Data will not replace the ERPs, CRMs, and e-business sites already operational. Big Data will be an evolution of the systems already in production at B2C, B2B, and government organizations.
Big Data projects are not projects of a new type, even if the legal aspect is more important. Conventional approaches, in particular the waterfall and agile methods, remain valid. All that is needed is the inclusion of a legal expert in the project team.
Like any technological innovation, Big Data was initiated by a few pioneers for internal purposes: to parallelize the storage and processing of massive data at very high velocity but at lower cost. Then came the experimenters with pilot projects. Gradually, the number of these pilot projects will increase; some will give birth to larger projects, others will fade away. In a few years, Big Data will become an ordinary innovation, in turn overtaken by a new innovation seeking to improve on or even replace it. The force of creative destruction (defined by Joseph Schumpeter in 1942, rediscovered first by Everett Rogers in 1962, a second time by Norbert Alter in 1985, a third time by Geoffrey Moore in 1991, and a fourth time by Clayton Christensen in 1995) continues its work until the next innovation…