Big data can enable fast, comprehensive analysis, but the secret is knowing where the value lies, and how to extract it safely and reliably. Sarah Williams speaks to Halliburton’s chief data scientist Dr Satyam Priyadarshy about integrated platforms, scalability and data governance.


With the uncertainty of low oil prices compounded by the expense of venturing into ever more difficult-to-access sites, the need to improve efficiency wherever possible is greater than ever for the oil and gas industry. Comprehensive data and the ability to analyse it in real time are vital to achieving this – in other words, big data.

While the scale of information available to companies is indeed huge, the value of such data lies not in its vastness but in analysts’ ability to identify and understand its patterns, often within a short timescale.

Time-dependent data strategy

For Dr Satyam Priyadarshy, chief data scientist for Halliburton, building an effective big data strategy relies upon working towards the three primary benefits that data can bring.

"First, if you are running a business, then it’s about creating a process for faster and better decision-making, that’s number one," Priyadarshy says. "The second is how do we change business strategy to remain competitive in the global environment? And the third point is innovation, which is what big data brings to any industry, including oil and gas."

Specifically, this innovation is the ability to identify new patterns by applying data-driven approaches to aggregate – and previously separate – data sets. These findings can then be applied to inform strategy, solve problems and improve performance.

"Within the oil and gas industry there are, in principal, many challenges across the whole oil-well life cycle," Priyadarshy says. "So whether it is exploration, drilling, reservoir management or any of the processes involved in these phases, for each of them big data brings a different kind of value from a different perspective."

The ‘digital oilfield’ (the network of technologies used to process different measurement data) must now handle increasingly complex capabilities such as measurement while drilling, down-hole multiphase sensors and seismic surveys carried out to an ever higher degree of accuracy. Managing the huge volumes of data is essential but, in addition, the requirements of the individual case must be met.

"With exploration, the industry collects a few petabytes of data per square kilometre a day," Priyadarshy says. "And if you look at seismic surveys, the industry is moving towards collecting five petabytes per square kilometre of survey data. So the volume part becomes very important.

"But when you look at drilling, we deploy lots and lots of sensors on the field and although the data may not be huge in petabytes, it is coming in real time and a lot of this data has to be analysed in real time too. So depending on which phase we are talking about, the value in the data is also time-dependent."

With surface data sensors allowing factors such as down-hole pressure and RPM to be monitored, live analysis of drilling helps improve performance by allowing operators to make time-sensitive decisions. It also allows automatic alarm systems to flag deviations before more serious errors occur.
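To make this concrete, a stripped-down version of such threshold-based alarm logic might look like the following sketch. The sensor names, safe limits and simulated readings are illustrative assumptions, not details of any Halliburton system.

```python
from dataclasses import dataclass

# Illustrative alarm limits -- real bands depend on the well and equipment.
LIMITS = {
    "downhole_pressure_psi": (1500.0, 9000.0),
    "rotary_rpm": (0.0, 220.0),
}

@dataclass
class Reading:
    sensor: str
    value: float
    timestamp: float

def check_reading(reading: Reading) -> str | None:
    """Return an alarm message if the reading falls outside its safe band."""
    low, high = LIMITS[reading.sensor]
    if not low <= reading.value <= high:
        return (f"ALARM t={reading.timestamp}: {reading.sensor}="
                f"{reading.value} outside [{low}, {high}]")
    return None

# In production this loop would consume a live sensor stream;
# here we simulate two readings, one normal and one out of range.
for r in [Reading("rotary_rpm", 180.0, 0.0),
          Reading("downhole_pressure_psi", 9350.0, 1.0)]:
    alarm = check_reading(r)
    if alarm:
        print(alarm)
```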

An integrated platform for unstructured data

For seismic surveys, by contrast, real-time analysis is not the priority. Here, the evidence gathered will be used to make the case for exploration and influence where exploratory wells are located; it is therefore not speed but accuracy that is prized.

Of course, for an upstream company like Halliburton, data gathering not only encompasses a huge range of operational functions but also large-scale internal considerations such as HR management.


Almost as varied as the data itself is the form in which it arrives, Priyadarshy points out: "One challenge within the drilling world is that we get lots of what we call ‘unstructured data’. For example, we have daily reports that may be coming in paper form or a standard online form; and many companies are now leveraging a mobile strategy, and as a result we will start receiving forms filled out on mobile devices."

The mission of leading data scientists like Priyadarshy has therefore been to design an agile approach that can work from the data as it arrives – as opposed to the older, labour-intensive method of building a database or data model for each data set.

"With big-data technology, one can actually store the raw data as it is in whatever shape and form it arrives and then leverage the compute power to create data models on the fly," Priyadarshy says. "This means that if tomorrow you added more sensors or a different kind of data collection process, you’re not then sitting there trying to move raw data and recreate your tables."

The answer, Priyadarshy says, is to build an integrated and modular platform; in Halliburton’s case the DecisionSpace Integration Server, operated by the company’s R&D division. This allows the relevant data to be accessed without the effort, high cost and impracticality associated with actually moving the data.

"’Integrated’ means that we are not trying to separate the seismic domain, the drilling domain, the reservoir domain and the ‘long-term’ data, which would be your supply chain," Priyadarshy says. "In principle, all of these should be available through one platform.

"And ‘modular’ means you can analyse the data for whatever purpose you need. For example, if you wanted to analyse supply chain data, in the past you would have needed to connect to a CRM to access the different data sets separately: the basic information of where the material has to go; where the supplier is; and the chemicals you are shipping. Now, because you can access these things in a modular fashion, you are able to connect these data sets easily and optimise whatever process you are running."

Within a system like Halliburton’s, users connect via the integration layer to their desired data sources, whether structured or unstructured. The application layer then lets them work with this data through big-data technologies such as Apache Hadoop, or stream-processing frameworks like Kafka and Spark, as required.
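On the streaming side, a consumer pulling drilling telemetry from a Kafka topic might be as simple as the sketch below. The topic name, broker address and record fields are assumptions; the kafka-python client is used here, but any Kafka client would do.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Subscribe to a hypothetical topic carrying real-time drilling telemetry.
consumer = KafkaConsumer(
    "drilling-telemetry",                       # assumed topic name
    bootstrap_servers="broker.example.com:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Each record is handed to downstream analytics (models, dashboards, alarms).
for message in consumer:
    record = message.value
    print(record["well"], record.get("downhole_pressure_psi"))
```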

Specific models, whether empirical or predictive, can then be built for a specific phase of the oil-well life cycle and represented through dashboards and other visualisations. What Priyadarshy calls an "actionable insight platform" allows users to identify patterns and act upon them.

Modular platforms of this kind can be used to operate a well and to optimise processes using real-time analytics. In this way, Halliburton’s system takes in the full sweep of data, from building a model, to planning best practice, and finally live, predictive and cognitive analysis of operations.

"The value comes in what I call ‘opening up’ the data," Priyadarshy says. "If we just keep our data in multiple silos and don’t connect it, then you may have ‘big data’ in terms of stored data but it has no value because you are not able to create new patterns and look at hidden inefficiencies. In limited data sets you can look at things that you already know about, but you cannot actually find new patterns."

For example, if a company wanted to investigate issues of non-productive time within the E&P domain, looking at a limited data set of a particular unproductive site would not provide sufficient insight to pinpoint cause and effect, Priyadarshy says. However, by looking at historical data for hundreds of wells and analysing the aggregate reasons recorded for non-productive time, patterns can begin to be traced.
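That aggregate analysis is, at heart, a group-and-rank over historical records. A toy version, with invented wells, causes and hours, could look like this:

```python
import pandas as pd

# Historical non-productive-time (NPT) records across many wells (toy data).
npt = pd.DataFrame({
    "well":  ["W-01", "W-02", "W-03", "W-04", "W-05"],
    "cause": ["stuck pipe", "weather", "stuck pipe",
              "equipment failure", "stuck pipe"],
    "hours": [36.0, 12.0, 48.0, 20.0, 30.0],
})

# Aggregating across hundreds of wells surfaces patterns that no single
# unproductive site would reveal on its own.
by_cause = (npt.groupby("cause")["hours"]
               .agg(["sum", "count"])
               .sort_values("sum", ascending=False))
print(by_cause)
```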

The cyber-threat

Of course, with any initiative to integrate a company’s far-reaching data for staff access, security is a key concern. While the scale may seem daunting, the principles of data governance remain unchanged and Priyadarshy is keen to highlight the ways in which big-data technology can itself provide a helping hand.

"In my personal opinion, big data actually allows you to perform data governance even better now because the ability to monitor the activities of a user is much easier," he says. "So while lots of people can access a data layer, at the same time we can log that access very effectively, and that means that we can flag terms-of-service violations in real time, which was not the case 15 or 20 years ago."

Predictive analytics can also be applied to great effect within cybersecurity. While Priyadarshy keeps judiciously quiet on Halliburton’s precise strategies, he explains how big data in general can help companies overcome the growing threat of data breaches. Growing, he says, because the sheer volume of data is, too.

"Data is everywhere. We are generating it from our phones, from almost every device. When you have data stored in so many places, the biggest challenge arises when you forget where some of this is and you have no insight into it. The technical term for this is ‘dark data’, and many industries have this problem.

"Where big data can help is, again, in creating new patterns. You can look at previous attacks and profile them, so you can build predictive models of a certain behaviour, which you can then look out for in realtime."

While such modelling techniques have existed for many years, Priyadarshy says, the ability today to apply them to much larger and more complex data sets allows outlier events to be identified much faster.

What’s more, the ability to carry out predictive analytics on a consistent, ongoing basis is a huge advantage, as much within cybersecurity as in the wider uses of data-driven approaches across all stages of the oil-well life cycle.

"When you do these things on an ongoing basis," Priyadarshy explains, "your predictive models can become better and better."