Subsurface Salt Interpretation Automation — Enhancing ML workflow for semantic segmentation

Branda Huang
17 min read · Dec 29, 2021


By Branda Huang, Brandt Green, Jessie Lee, Meha Mehta, Shengxiang Wu

Link to implementation code on Github: https://github.com/brandahuang/Subsurface-Salt-Interpretation-Automation

Abstract

Accurately identifying subsurface salt deposits from seismic images is an important yet time-consuming step in the exploration and development of oil and gas resources in the energy industry. This process is undertaken to discover new resources and to avoid hazards in certain basins around the world. Each year, a large number of employee hours is spent interpreting salt bodies in 3D seismic volumes. In this article, we propose an alternative, machine-learning-driven approach to automating the process.

The current model we have developed is built on 4,000 original 2D seismic images, sliced from 3D seismic volumes, each paired with a mask image created from a human expert's interpretation. Due to the limited size of the learning set, image augmentation was performed to generate a larger training set. We then use a fully convolutional network with the U-Net architecture that takes an input image and outputs a 2D "salt mask" identifying where salt is located in the input image. This output image is then compared with the original mask generated by human interpreters to calculate error metrics. Our final model, evaluated on the hold-out set, achieved an average pixel-wise accuracy of 93.4% and a mean IoU of 72.4%.

Statement of Purpose

Why are we interested in salt?

Salt was deposited as oceans began to open millions of years ago and later rose through the earth's strata, due to its plasticity and relatively low density, to form variably shaped bodies such as domes, canopies, and diapirs. In places such as West Africa, the Gulf of Mexico, and the North Sea, billions of barrels of hydrocarbon resources (oil and gas) were trapped above and around salt because of its very low permeability. As a result, salt-related structures are hot targets for oil and gas exploration: often, finding salt can mean finding oil (Fig. 1).

Figure 1. Salt Trap for Oil

Aside from facilitating oil discovery, knowing the location of salt deposits also matters for drilling safety. If salt sits too shallow and traps gas, it can become a drilling hazard that, if not properly planned for and handled, could cause explosions. Accurately identifying salt is therefore critical for drilling safety.

Clearly, identifying and accurately mapping salt deposits is critical in the oil and gas industry. Unfortunately, human interpretation of salt bodies on seismic data is tedious, time-consuming, and often subjective. By applying machine learning algorithms, we hope to build the first piece of an ML workflow that automatically and accurately identifies salt bodies in 2D seismic images. While we do not tackle the challenge here, the next phase of the complete workflow would involve piecing the 2D interpretations back together into 3D seismic volumes. The new process would take a few hours to a few days, a drastic improvement over current methodologies. This efficiency gain removes a bottleneck in the project timeline, which would ultimately create substantial economic value on high-capex projects.

Image characteristics of salt

How exactly are these seismic images obtained? Subsurface salt bodies are almost exclusively imaged using a technique called reflection seismology. In offshore environments, this entails air guns shooting sound waves toward the ocean bottom (Fig. 2), which are then reflected by the earth's strata. The reflected waves are recorded by hydrophones and the data is processed to form a 3D seismic volume from which geoscientists interpret various subsurface features, e.g., salt bodies (Fig. 3). The 3D volume is then sliced into a series of 2D images for salt interpretation, which can later be reassembled to represent a 3D body.

Figure 2. Seismic Acquisition (source)
Figure 3. Seismic Data with Salt in Depth

The strength of a seismic reflection at any boundary is defined by the reflection coefficient:

RC = (ρ2v2 − ρ1v1) / (ρ2v2 + ρ1v1)

source: MIT OpenCourseWare

where RC is the reflection coefficient for compressional waves, and ρ2, v2 and ρ1, v1 are the density and compressional-wave velocity in the underlying medium 2 and the overlying medium 1, respectively. Because the velocity (v2) of compressional waves in salt (~4400 m/s) is high compared with the surrounding sediment, the (ρ2v2 − ρ1v1) term is large, producing a strong positive reflection at the top of salt, while the base of salt is characterized by a strong negative reflection.
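To make this concrete, here is a quick back-of-the-envelope calculation of RC at the top of salt. The salt velocity comes from the paragraph above; the densities and the sediment velocity are representative values assumed for illustration, not measurements from our dataset:

```python
# Back-of-the-envelope reflection coefficient at the top of salt.
# Salt velocity is from the text above; other values are assumed, typical figures.
rho_salt, v_salt = 2.16, 4400.0   # density (g/cc) and velocity (m/s) of salt (medium 2)
rho_sed, v_sed = 2.40, 2800.0     # assumed density and velocity of overlying sediment (medium 1)

z_salt = rho_salt * v_salt        # acoustic impedance of salt
z_sed = rho_sed * v_sed           # acoustic impedance of sediment

rc = (z_salt - z_sed) / (z_salt + z_sed)
print(f"RC at top of salt ~ {rc:.2f}")   # ~0.17, i.e., a strong positive reflection
```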

When human experts (geoscientists) examine the images, there are a few characteristics that help them identify the areas containing salt:

  1. Top of salt is normally a strong peak (a convoluted positive reflection), while the base is a strong trough (negative reflection).
  2. The interior of a salt body is relatively homogeneous and lacks the layered structure of the surrounding sediments. However, enclaves and impurities trapped inside the salt can cause disturbances and develop a more complex texture.

When the salt-sediment boundary is close to vertical, sound waves coming from above have an incident angle nearly parallel to the boundary, resulting in refraction and diffraction of energy that complicates imaging. When salt bodies are stacked, the upper salt can attenuate (weaken) the seismic energy, so the image of the lower salt can be blurred.

In summary, the characteristics of salt in seismic images present several challenges for human experts, and we can expect some of these same challenges to cause trouble for our ML models. In this study, we try to leverage our domain knowledge and train the model purposefully to address those challenges while making the most of salt's distinguishing characteristics.

Dataset

Data Description

The data for this project came straight from a Kaggle Competition: TGS Salt Identification Challenge. From this challenge, we were able to download 4,000 image pairs where each pair consists of a 101x101 pixel seismic image of the earth’s subsurface, and its corresponding 101x101 pixel “salt mask”, where white pixels represent the human interpretation of salt bodies. Here’s a sample of the first 6 image pairs in the data set (Fig 4):

Figure 4. Original training set image pairs (seismic image and mask)
Note: Salt is indicated by the white shading.

The images we have are simply cross-sectional slices of the 3D volume obtained from reflection seismology as explained above. These images are stored in grayscale format, meaning each pixel has a single intensity value between 0 and 255. This contrasts with a typical color image, where each pixel position holds three values, one for each of the RGB channels. The "salt mask" is created from an expert's manual labeling and is treated as the ground truth. We took 3,500 of these pairs as our training data and set aside 500 as our final validation set.
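As a quick illustration of the data format, loading and normalizing one image pair might look like the sketch below (the file name is a placeholder and the folder layout follows the Kaggle download; see the repository for the exact preprocessing we used):

```python
# Hypothetical example of reading one seismic image / mask pair from the Kaggle data.
import numpy as np
from PIL import Image

image = np.array(Image.open("train/images/example_id.png").convert("L")) / 255.0  # grayscale, scaled to [0, 1]
mask = np.array(Image.open("train/masks/example_id.png").convert("L")) > 0        # boolean salt mask

print(image.shape)   # (101, 101)
print(mask.mean())   # fraction of pixels labeled as salt
```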

Data Processing & Exploration

Depth

Depth information for the imaged salt bodies was provided in the dataset. However, based on our domain knowledge of salt bodies in geoscience, the chance of salt occurrence is normally uncorrelated with the depth at which a body is located. To test this assertion, we conducted a brief exploratory analysis to see whether we were missing any patterns, but as the graphs below show, depth does not add predictive value to this problem (Fig. 5).

Figure 5. Salt probability vs. depth (left: probability of salt occurrence within each image vs. depth; right: proportion of images with salt vs. without salt vs. depth)

Data Alchemy

We stumbled upon a technique called data augmentation that allows you to synthetically increase the size of your training dataset. Data augmentation works by creating new data that are slightly modified, yet still realistic, versions of the original dataset. For our dataset, we applied three augmentation methods: up-down flips, left-right flips, and a 90-degree rotation. Each transformation was applied to every training-set image, resulting in a revised training set of 14,000 images (the 3,500 originals plus three transformed copies of each). Examples of the transformations for 3 images are depicted below (Fig. 6):

Figure 6. Data Augmentation (Up-Down Flip, Left-Right Flip, 90 Degree Rotation)

Data augmentation can be a powerful tool because it makes the model more robust by ensuring the model does not overfit to certain irrelevant positional characteristics.
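For reference, the three transformations are straightforward to express in NumPy. The snippet below is an illustrative sketch rather than the exact augmentation code in our repository; the key point is that the image and its mask must receive the same transformation so the pair stays aligned:

```python
import numpy as np

def augment_pair(image, mask):
    # Return the three augmented copies used to grow the training set.
    return [
        (np.flipud(image), np.flipud(mask)),   # up-down flip
        (np.fliplr(image), np.fliplr(mask)),   # left-right flip
        (np.rot90(image), np.rot90(mask)),     # 90 degree rotation
    ]
```

Applying this to all 3,500 training pairs and keeping the originals yields the 14,000-image training set described above.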

Common ML Image Tasks & Model Overview

Images and Machine Learning

Before diving into which models are available and which one we chose to solve this problem, it is helpful to examine, at a high level, some of the common image-related problems in the world of machine learning. The image below gives a good overview of what four of the big tasks are trying to accomplish:

Figure 7. Image segmentation illustration
  • Image Recognition: Image Recognition basically means to predict what is in the image. This takes the mathematical form of estimating probabilities for each of the potential classes.
  • Object Detection: This involves not only estimating probabilities for what an object may be, but actually predicting a bounding box that encloses the object. This is typically expressed as a regression task by predicting the coordinates of the pixel located at the center of the object and then predicting the object’s width and height. When put together, these components represent the box containing the object.
  • Semantic Segmentation: Drawing a box around objects in the image is neat, but there are certain problems where a box may be too coarse and imprecise to truly address the task at hand. This is where semantic segmentation comes in. Semantic segmentation seeks to predict on a pixel by pixel basis the exact shape of the object. As can be seen in the image above, the pixels representing sheep are all identified as one class and the pixels corresponding to the dog are represented by the next class. This is exactly what our project entails, only instead of predicting the pixel masks for sheep and dogs, we’ll be outputting predicted probabilities for salt or not-salt.
  • Instance Segmentation: Finally, there is one additional extension for many projects that could be implemented. Semantic segmentation lumps all pixels corresponding to the same class into one chunk which is often fine, but in certain situations you may need the extra precision of actually identifying that one blob of pixels representing “sheep” is different from the next blob of sheep pixels. When this distinction is necessary, the problem being solved is instance segmentation, but it is beyond the scope of our project because we do not need to distinguish between different clumps of salt in our images.

Choosing a Model

We now have a pretty clear picture of our goal: solve a semantic segmentation problem where the input is a seismic image and the output is another image that highlights the sections of the input containing salt. To accomplish this goal, we explored several models; our thought process proceeded as follows.

Multi-Layer Perceptron (MLP): This is typically the "starter" model for anyone learning about neural networks, and it would be quite a win if it also worked on image-related problems. Unfortunately, there are two thorny problems with MLPs when applied to images. First, the number of trainable parameters explodes for anything other than a trivially small image. This issue arises because the input to the MLP is a flattened array of every pixel in the image, and with multiple fully connected layers, the parameter count rapidly grows unwieldy. Second, MLPs suffer from a related problem: overfitting the training data. MLPs have no built-in notion of positional (translation) invariance, meaning they struggle to learn that the same object can appear in different areas of the image. For example, if your training data only contained images where sheep were on the right-hand side, the model would struggle to recognize a sheep on the left-hand side of a test image.

Convolutional Neural Network (CNN): Using a convolutional neural network solves many of the problems associated with MLPs. The two key components of CNNs, convolutional filters and pooling, work together to address them. A useful characteristic of the filters is that, for each filter in a convolutional layer, all positions share the same trainable parameters. This both reduces complexity, by cutting the parameter count, and improves the model's robustness to positional variation, because the same filter is applied to every position in the image. The pooling layer also contributes to complexity reduction, since its output shrinks the number of inputs fed to the next layer. These operations help the model understand the "what" of the image, which is all a standard classification task needs, but the model as a whole tends to fall short when a probability prediction is needed for every single input pixel. This deficiency is mainly due to the repeated application of max-pooling layers: with each downsampling, some of the precise localized information is lost. On top of that, a CNN is typically topped off with a fully connected dense layer, which works well for predicting class probabilities but not for predicting probabilities for each pixel. [6,7]

Fully Convolutional Network (FCN): Fully convolutional networks were introduced as an extension to CNNs with semantic segmentation tasks in mind. [8] The main change is that the final fully connected layer of a CNN is chopped off and replaced by another convolutional layer with one filter and one channel. With this change, the output is a 2D feature map of the same size as the original image, and we can interpret the output numbers as class probabilities. An added benefit of removing the fully connected layer is that the model can now accept images of different sizes, because none of the layers requires a specific input size.

Using an FCN seems appropriate for our challenge, but an FCN is really a general methodology that does not tell us how to structure the model. Fortunately, we can build on the shoulders of the giants before us by taking an already successful FCN architecture off the rack: U-Net is the model we utilized.

U-Net Implementation

What is U-Net?

U-Net is a fully convolutional network architecture that was introduced in 2015 for semantic segmentation of microscopic cell images. The original architecture is displayed below (Fig. 8):

Figure 8. The original U-Net architecture

The U-Net model is built with the same familiar building blocks found in a CNN: convolutional filters and max pooling. The big idea in U-Net is to supplement the traditional contracting path of an FCN with a mirror path containing upsampling operations instead of pooling. The downsampling and upsampling paths are called the "encoding" and "decoding" paths respectively, and the sequence of these two mini-structures, when visualized as above, creates the U-like shape that gives U-Net its name. The arrows in the image indicate that the mirror layers of the two paths are concatenated together. A final point worth noting is that this model retains a large number of filters in the decoding phase, which allows it to propagate big-picture context information to the later layers. [9,10]

The reason this structure works so well for semantic segmentation tasks is because during the encoding step, the model is learning a lot of the big picture, macro information embedded in the image, and the decoding layers allow the model to retain the precise, localized information necessary to make fine-grained pixel predictions.

Base U-Net model Implementation:

To implement the U-Net model, we utilized the Keras library in Python because of its intuitive, easy-to-understand API. The full code specifying our model structure is in the GitHub repository linked above; a condensed sketch of the architecture is shown below:
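The filter counts, the 128x128 input size (our 101x101 images need padding or resizing so that four pooling steps divide evenly), and other details in this sketch are illustrative assumptions; the configuration we actually trained is in the repository.

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions per encoding/decoding step.
    x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
    return x

def build_unet(input_shape=(128, 128, 1), base_filters=8):
    inputs = layers.Input(input_shape)

    # Encoding path: conv block + max pooling (halves the resolution each step).
    skips, x = [], inputs
    for i in range(4):
        x = conv_block(x, base_filters * 2 ** i)
        skips.append(x)                      # saved for the skip connection
        x = layers.MaxPooling2D(2)(x)

    x = conv_block(x, base_filters * 16)     # bottleneck

    # Decoding path: transpose convolution (doubles the resolution) + concatenation.
    for i in reversed(range(4)):
        x = layers.Conv2DTranspose(base_filters * 2 ** i, 2, strides=2, padding="same")(x)
        x = layers.concatenate([x, skips[i]])
        x = conv_block(x, base_filters * 2 ** i)

    # Single-channel 1x1 convolution with a sigmoid: per-pixel salt probability.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```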

In our implementation, there are four encoding steps and four decoding steps. Each encoding step consists of two convolutional layers followed by a max-pooling step, which halves the image resolution; each step receives the max-pooled output of the previous one. The four decoding steps that follow operate similarly, except that the max-pooling layer is replaced with a transpose convolutional layer. This layer has the opposite effect of max pooling: it doubles the image size, which works the output dimensions back up to the size of the original input. The result of the transpose convolutional layer is concatenated with the output of the same-size complementary layer on the other side of the 'U'. Finally, the model is topped off with a single-channel convolutional filter that uses the sigmoid activation function. This outputs a 2D feature map, and the sigmoid transformation lets us interpret the resulting numbers as probabilities that each output pixel is salt or not!

This results in what we call our “Base U-Net model”, which has approximately 500,000 trainable parameters. We trained this model on both the original data set and our augmented data set. The results are presented in the next section.

Enhancements to the Base U-Net Model:

Due to the nature of salt in seismic images, we hypothesized that we could improve our results by using edge-enhanced pictures. For this, we used Sobel edge-enhancing convolution. An example of this can be seen in Fig. 10.

Figure 10. Sobel Convolution for edge detection illustration
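A minimal version of this preprocessing step, assuming grayscale arrays scaled to [0, 1], could look like the following sketch (the exact filter pipeline we used is in the repository):

```python
import numpy as np
from scipy import ndimage

def sobel_edges(image):
    # Horizontal and vertical Sobel responses, combined into a gradient magnitude.
    gx = ndimage.sobel(image, axis=1)
    gy = ndimage.sobel(image, axis=0)
    magnitude = np.hypot(gx, gy)
    return magnitude / (magnitude.max() + 1e-8)   # rescale back to [0, 1]
```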

Model Training & Results

Hyper-parameter Selection

  • Batch size: 10
  • Epochs: initially set to 30, with an early-stopping callback that halts training if the error has not decreased for 5 consecutive epochs.
  • Loss function: binary cross-entropy
  • Optimizer: Adam
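Putting these settings together, a minimal training call might look like the sketch below. The variable names (X_train, y_train, X_val, y_val) are placeholders for the augmented image and mask arrays, and the exact callback arguments we used may differ; see the repository for the full training script:

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

history = model.fit(
    X_train, y_train,                    # augmented seismic images and salt masks
    validation_data=(X_val, y_val),
    batch_size=10,
    epochs=30,
    callbacks=[early_stop],
)
```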

Metrics

We used the following two metrics: pixel accuracy and IoU.

Pixel accuracy is straightforward: it is simply the percentage of correctly classified pixels,

pixel accuracy = (number of correctly classified pixels) / (total number of pixels)

The accuracy for the entire model is the mean accuracy over all images.

IoU stands for Intersection-over-Union and is another intuitive way to judge how well a predicted mask matches the ground truth for an entire image (Fig. 11). It is calculated as the area of the overlap between the predicted mask and the ground-truth mask, divided by the combined area of both masks. It's best understood pictorially:

Figure 11. IoU explanation

The formula for calculating IoU is:

IoU = |A ∩ B| / |A ∪ B|, where A is the predicted mask and B is the ground-truth mask.
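Both metrics are simple to compute on binary masks. The NumPy sketch below is illustrative rather than the exact evaluation code we ran (pred and truth are boolean arrays of shape (H, W)):

```python
import numpy as np

def pixel_accuracy(pred, truth):
    # Fraction of pixels where prediction and ground truth agree.
    return np.mean(pred == truth)

def iou(pred, truth):
    # Intersection-over-Union of the two salt masks.
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return intersection / union if union > 0 else 1.0   # both masks empty: perfect match
```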

Results

The five separate model trainings and their results on the validation set are displayed in the chart below. The main difference between the models is the data fed into them: the first three use successively more data augmentation, and the final two were fed Sobel-filtered data.

Our best model is the one trained on the augmented data (14,000 images) without the Sobel edge-detection filter. This suggests that increasing the training data size improves model performance, although it also increases the training time of each epoch. In addition, because a Sobel filter applied before the U-Net removes some of the information contained in each image, it did not improve our model's accuracy.

Below you can see the performance of our best model throughout the training process (Fig. 12). Both of our evaluation metrics appear to plateau after about 7–8 epochs.

Figure 12. Comparison of training IoU evolution with epoch (upper: training set IoU; lower: validation set IoU)

Model at Work

To visualize how the model performs at predicting salt, we selected two outputs and compared them with the 'true' salt bodies interpreted by human experts (Fig. 13). Except for local disturbances that are ambiguous even to the human eye, e.g., the bottom-left corner of the upper image (Fig. 13), most of the interpretations are impressively accurate.

Figure 13. Model result illustration (column 1: original seismic image; column 2: predicted masks; column 3: 'true' masks)

Key Takeaways

From the results, the solution with the highest accuracy (93.4%) and IoU (72.5%) was based on the original U-Net architecture with the three data augmentation techniques applied. We can conclude that data augmentation, including up-down flips, left-right flips, and 90-degree rotations, effectively improved model performance. A larger number of epochs also helped. On the other hand, an additional Sobel edge-detection layer before the U-Net and an additional input channel to the U-Net did not significantly enhance the model.

Future work

The following is a short list of potential ideas and next steps we would like to explore to escalate our solution to the next level:

Increasing Training Dataset Size

Increasing the training set could capture not only the morphologic and textural variation tied to the location of the salt (e.g., salt from the North Sea vs. West Africa), but could also train the model to handle variations in seismic frequency, angle stacks, and other pre- and post-stack processing. There are many existing salt interpretation products in the industry, and making even a small fraction of them available for training would have a high potential of improving the model. In the ideal application, we would build a global training set while adding local training data for specific applications using the corresponding local salt image data.

More Data Augmentation

Would our model perform even better if we added in more image alterations? Should we add 180 degree rotations? Image blurs? Elastic deformations? Which alterations work best for our data set? How many alterations are too many? Certainly an intriguing topic to dig into.

Pseudo Labeling

Pseudo-labeling is a simple semi-supervised learning algorithm that attempts to incorporate unlabeled data into the model training process. It works by using a partially trained model to predict labels for the unlabeled data and then feeding those predictions back to the model for further training. This is another one of those techniques that, at first pass, seems a bit like data wizardry, but the Kaggle challenge also provides roughly 18,000 unlabeled images, and we would love to find a way to use this untapped data.
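Conceptually, one round of pseudo-labeling could look like the sketch below. This is purely illustrative (we have not run it), and the variable names are placeholders:

```python
import numpy as np

# Use the partially trained model to label the unlabeled images...
pseudo_masks = (model.predict(X_unlabeled) > 0.5).astype("float32")

# ...then fold those predictions back into the training set and keep training.
X_combined = np.concatenate([X_train, X_unlabeled])
y_combined = np.concatenate([y_train, pseudo_masks])
model.fit(X_combined, y_combined, batch_size=10, epochs=5)
```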

Different Segmentation Models

Though U-Net is seen as a stable solution with proven success for this type of semantic segmentation problem, there are several extensions to U-Net worth considering (e.g., ResNet-UNet), along with other architectures (e.g., Mask R-CNN, FPN) that, if implemented, could provide performance improvements.

Incorporate More Evaluation Metrics

We can also include other common evaluation metrics for semantic segmentation to get a more complete picture of model performance. The Dice Coefficient is another popular metric, similar to IoU, with the main difference being that double weight is given to positive co-occurrences:

Dice = 2 |A ∩ B| / (|A| + |B|), where A is the predicted mask and B is the ground-truth mask.
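Reusing the boolean-mask convention from the metrics sketch above, the Dice Coefficient is equally simple to compute (illustrative only):

```python
import numpy as np

def dice(pred, truth):
    # Dice coefficient: overlapping (positive) pixels are counted twice.
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2 * intersection / total if total > 0 else 1.0
```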

Transfer learning

Transfer learning involves taking a model that has been pre-trained for another task, modifying it slightly as needed, and applying it to your own problem. There has been a lot of success with this approach. We do have some reservations, because the vast majority of open-source pre-trained models are trained on datasets drastically different from our seismic imaging data; most are trained on ImageNet, which consists largely of everyday sights and objects such as trees, people, and cars. That being said, we still think it is worth trying, given the success others have had with this approach.

Concluding Thoughts

Seismic imaging and salt identification play an important role in oil and gas discovery, and a large amount of interpretation work is needed. Commercializing any application would ultimately require the model to scale to large volumes with large image sizes and eventually to apply directly to 3D volumes instead of 2D images. Salt interpretation automation is still an ongoing research topic that could have a significant economic impact by accelerating project timelines and reducing labor costs.

References

[1] https://glossary.oilfield.slb.com/Terms/s/seismic_acquisition.aspx
[2] https://en.wikipedia.org/wiki/Reflection_seismology
[3] https://ocw.mit.edu/courses/earth-atmospheric-and-planetary-sciences/12-510-introduction-to-seismology-spring-2010/lecture-notes/lec9.pdf
[4] https://en.wikipedia.org/wiki/Acoustic_impedance
[5] https://www.kaggle.com/c/tgs-salt-identification-challenge/data
[6] https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
[7] https://cs231n.github.io/convolutional-networks/
[8] https://www.jeremyjordan.me/semantic-segmentation/
[9] https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
[10] https://arxiv.org/pdf/1505.04597.pdf
[11] https://towardsdatascience.com/pseudo-labeling-to-deal-with-small-datasets-what-why-how-fd6f903213af
[12] https://towardsdatascience.com/metrics-to-evaluate-your-semantic-segmentation-model-6bcb99639aa2
[13] https://www.youtube.com/watch?v=azM57JuQpQI
[14] https://www.tensorflow.org/tutorials/images/segmentation

ABOUT ME

Thank you so much for reading my article! You are welcome to follow me and give the article some claps if you find it inspiring :) I am Branda, a student in the MSc Business Analytics program at UT Austin. Don't hesitate to email me at branda.huang@utexas.edu or connect with me on LinkedIn (https://www.linkedin.com/in/branda-yi-ting-huang/) to discuss more interesting ideas!
