2D Image Data Correlation Analysis

This post describes a few different methods to quantify how "close" two different 2D images/datasets are to each other.

We define the two 2D datasets in this case to be:

z1(x,y) and z2(x,y)

This could be an image in a 2D plane or some other 2D dataset. The obvious first step is simply to look at the two images; after that, one would want to quantify how nearly identical they are.

1. Slope of M1 vs M2

Flatten ("unzip") each 2D image array into a 1D array and plot one against the other (make sure to flatten both in the same order).
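A minimal sketch of why the flattening order matters, assuming the images are NumPy arrays. The names `a`, `b` and the tiny 2x3 arrays are illustrative only and are not from the script in this post. Flattening both arrays in the same order keeps element i of each 1D array pointing at the same pixel; mixing row-major and column-major order silently breaks the pairing.

```python
import numpy as np

# Hypothetical 2x3 example arrays; b is an exact copy of a
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = a.copy()

# Same order on both sides: element i of each 1D array is the same pixel
a_flat = a.ravel(order="C")
b_flat = b.ravel(order="C")
same_order_ok = np.array_equal(a_flat, b_flat)

# Mixed orders: row-major vs column-major scrambles the pairing
b_mixed = b.ravel(order="F")
mixed_order_ok = np.array_equal(a_flat, b_mixed)

print("same order pairs correctly: ", same_order_ok)
print("mixed order pairs correctly:", mixed_order_ok)
```

With mismatched orders, two identical images would look poorly correlated even though nothing is wrong with the data.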

Check for:
(a) R2 should be close to 1
(b) The slope should be close to 1
(c) The offset should be close to 0

Make sure the uncertainty is correctly defined for each of the three quantities; they can be fairly independent of each other.
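One possible way to get the slope, the offset and R2 together with standard errors is `scipy.stats.linregress`. This is a sketch under assumptions, not the method used in the script further down (which uses `curve_fit`): the arrays `m1` and `m2` are synthetic stand-ins for the two flattened measurements, and the `intercept_stderr` attribute assumes a reasonably recent SciPy (1.6 or later). No standard error is reported for R2 here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, only so the illustration is reproducible

# Hypothetical stand-ins for the two flattened measurements
m1 = np.linspace(0.0, 8.0, 400)
m2 = m1 + rng.normal(0.0, 0.2, size=m1.size)

# Ordinary least-squares fit of m2 = slope * m1 + intercept
fit = stats.linregress(m1, m2)

slope, slope_err = fit.slope, fit.stderr
intercept, intercept_err = fit.intercept, fit.intercept_stderr
r_squared = fit.rvalue ** 2

print(f"slope     = {slope:.4f} +/- {slope_err:.4f}")
print(f"intercept = {intercept:.4f} +/- {intercept_err:.4f}")
print(f"R2        = {r_squared:.4f}")
```

Comparing `slope - 1` and `intercept` against their standard errors gives a rough idea of whether a deviation from the ideal values 1 and 0 is larger than the fit uncertainty.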

2. Difference plot

Plot the difference of the two images as an image to find any blind spots or anything that looks odd. If nothing stands out, also flatten the 2D difference array and plot it as a series to look for outliers and to check that each point lies within the spread of the series. If there are outliers, examine them and work out why the difference is so high.
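Once an outlier shows up in the flattened series, it helps to map it back to its pixel in the image. The following is a sketch with synthetic data: the arrays `m1` and `m2`, the planted bad pixel at (7, 12), and the threshold of 5 scaled median absolute deviations are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 20x20 measurements that agree except for small noise
base = rng.random((20, 20))
m1 = base + rng.normal(0.0, 0.01, size=base.shape)
m2 = base + rng.normal(0.0, 0.01, size=base.shape)
m2[7, 12] += 1.0  # plant one bad pixel so there is something to find

diff = m2 - m1
flat = diff.ravel()

# Robust threshold: distance from the median in units of the scaled MAD
# (the factor 5 is an arbitrary choice for this sketch)
med = np.median(flat)
mad = 1.4826 * np.median(np.abs(flat - med))
outlier_idx = np.flatnonzero(np.abs(flat - med) > 5 * mad)

# Map positions in the 1D series back to (row, column) in the image
rows, cols = np.unravel_index(outlier_idx, diff.shape)
for r, c in zip(rows, cols):
    print(f"outlier at row {r}, col {c}: diff = {diff[r, c]:.3f}")
```

`np.unravel_index` assumes the series was produced by a row-major `ravel()`, which is the NumPy default.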

3. Ratio plot

Plot the ratio of the flattened (2D to 1D) arrays of the two measurements as a series. This should be as close to 1 as possible. Look for individual outliers and identify the reasons for them. If necessary, also make a box plot, but don't look only at the mean and SD!
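A short sketch of the numbers a box plot is built from, as an alternative to summarising the ratio by mean and SD alone. The data here is synthetic and the names `m1`, `m2` are placeholders; the 1.5 x IQR whisker rule is the usual box-plot convention, not something specified in this post. Ratios become unstable wherever the denominator is close to zero, so the division is guarded.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical flattened measurements, strictly positive so the ratio is defined
m1 = rng.uniform(1.0, 5.0, size=400)
m2 = m1 * rng.normal(1.0, 0.02, size=m1.size)

# Guard against division by zero: undefined ratios stay NaN and are skipped below
ratio = np.full(m1.shape, np.nan)
np.divide(m1, m2, out=ratio, where=(m2 != 0))

# Quartiles and the standard box-plot whisker limits (1.5 * IQR)
q1, median, q3 = np.nanpercentile(ratio, [25, 50, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
n_outliers = int(np.sum((ratio < low) | (ratio > high)))

print(f"median ratio = {median:.4f}, IQR = {iqr:.4f}")
print(f"points outside the whiskers: {n_outliers}")
```

The median and IQR are much less sensitive to a handful of extreme ratios than the mean and SD, which is why they are worth checking alongside the series plot.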

CODE:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Jun 8 10:42:08 2018

@author: pranjal.bordia
"""

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy import stats
from scipy.optimize import curve_fit


def f(x, A, B):
    """Straight line y = A*x + B used for the fit."""
    return A * x + B


# Two synthetic 20x20 "measurements": the same surface plus independent noise
x = np.arange(0, 2, 0.1)
y = np.arange(0, 2, 0.1)
xx, yy = np.meshgrid(x, y)

z1 = (xx**2 + yy**2) + 0.6 * np.random.rand(20, 20)
z2 = (xx**2 + yy**2) + 0.6 * np.random.rand(20, 20)

#### Metric 1, Slope
# Flatten both arrays in the same (row-major) order so the points stay paired
z1array = z1.ravel()
z2array = z2.ravel()


def r2(x, y):
    """Square of the Pearson correlation coefficient."""
    return stats.pearsonr(x, y)[0] ** 2


# Newer seaborn releases no longer accept stat_func in jointplot,
# so R2 is written into the text annotation below instead.
g = sns.jointplot(x=z1array, y=z2array, kind="reg", scatter_kws={'s': 10}, ci=95)
g.set_axis_labels('Measurement 1', 'Measurement 2')

A, B = curve_fit(f, z1array, z2array)[0]
R2 = r2(z1array, z2array)

g.fig.text(0.15, 0.65, ' Slope = ' + str(round(A, 5))
           + '\nIntercept=' + str(round(B, 5))
           + '\nR2=' + str(round(R2, 5)))
g.fig.tight_layout()
g.fig.savefig('M1M2Slope.png', dpi=600)

### Metric 2, Difference Plot
z3 = z2 - z1
plt.figure()
sns.heatmap(z3, cmap='coolwarm', vmin=-2.6, vmax=2.6)
plt.xlabel('X')
plt.ylabel('Y')
plt.savefig('2DDiffPlot.png', dpi=300)

z3array = z3.ravel()
plt.figure()
plt.plot(z3array)
plt.xlabel('Some point')
plt.ylabel('Difference in two measurements')
plt.savefig('2DDiffPlotSeries.png', dpi=300)

### Metric 3, Ratio Plot
z3ratio = z1array / z2array
plt.figure()
plt.plot(z3ratio, 'o', ms=4)
plt.xlabel('Some point')
plt.ylabel('Ratio in two measurements')
plt.savefig('2DRatioPlotSeries.png', dpi=300)
