Oraple

Programming Project #2 (proj2)

COMPSCI 180 Intro to Computer Vision and Computational Photography

Chuyan Zhou

This webpage is rendered from Markdown using the Typora Newsprint theme.

Part 1: Fun with Filters

1.1 Finite Difference Operator

Here we have the horizontal and vertical finite difference operators

$$D_x = \begin{bmatrix} 1 & -1 \end{bmatrix}, \qquad D_y = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$

To compute the gradients, given by

$$g_x(i,j) = A(i+1,j) - A(i,j), \qquad g_y(i,j) = A(i,j+1) - A(i,j),$$

where $A$ is the original image, we can simply convolve the image with the two operators:

$$g_x = D_x * A, \qquad g_y = D_y * A,$$

where the origin of $D_x$ is its right entry and the origin of $D_y$ is its bottom entry. In practice (in the code), we use scipy.signal.convolve2d with mode 'same', so the result keeps the same size as the original image $A$.
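A minimal sketch of this computation (assuming grayscale float images; scikit-image's sample camera image stands in for cameraman.png):

```python
import numpy as np
from scipy.signal import convolve2d
from skimage import data  # only used here for a sample image

A = data.camera() / 255.0        # cameraman image as floats in [0, 1]

D_x = np.array([[1.0, -1.0]])    # horizontal finite difference
D_y = np.array([[1.0], [-1.0]])  # vertical finite difference

# mode='same' keeps the output the size of A; up to a one-pixel shift
# from the boundary cropping, this realizes the differences above.
g_x = convolve2d(A, D_x, mode='same')
g_y = convolve2d(A, D_y, mode='same')
```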

To find the magnitude from the gradients, we use

$$|g|(i,j) = \sqrt{g_x^2(i,j) + g_y^2(i,j)},$$

written elementwise (pixelwise) as

$$|g| = \sqrt{g_x^2 + g_y^2}.$$

To show the gradient or magnitude images, we should first normalize them. In this project, we treat all images as floats by default, so the normalized pixel range should be $[0,1]$, which is then mapped to $[0,255] \cap \mathbb{Z}$ for display.

Denoting the normalized $g_x$ as $g_x^N$, the normalized $g_y$ as $g_y^N$, and the normalized $|g|$ as $|g|^N$, we have

$$g_x^N = \tfrac{1}{2} g_x + \tfrac{1}{2}, \qquad g_y^N = \tfrac{1}{2} g_y + \tfrac{1}{2}, \qquad |g|^N = \frac{|g|}{\sqrt{2}}.$$

The magnitude image of the gradients is essentially a rough edge-detection result, and we can further binarize it with a threshold to suppress noise and keep only the real edges. We choose a threshold of 0.2 here: if a pixel in the magnitude image is greater than 0.2, it is kept as part of an edge and set to 1; otherwise, it is considered not part of an edge and set to 0.
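Continuing the sketch above, the normalization and thresholding might look like:

```python
# Gradients lie in [-1, 1] and the magnitude in [0, sqrt(2)],
# so map each range to [0, 1] for display.
g_x_n = 0.5 * g_x + 0.5
g_y_n = 0.5 * g_y + 0.5
mag = np.sqrt(g_x**2 + g_y**2)
mag_n = mag / np.sqrt(2)

# Binarize with the threshold 0.2 (applied to the normalized magnitude).
edges = (mag_n > 0.2).astype(float)
```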

Shown below are the original image, the gradient images (normalized), the magnitude image, and the binarized magnitude image.

The original cameraman.png image


The x-axis gradient (normalized)
The y-axis gradient (normalized)


Magnitude image (normalized)
Binarized normalized magnitude image

1.2 Derivative of Gaussian (DoG) Filter

1.2.1 First Gaussian then Gradient

By applying a Gaussian filter $G$, built as the outer product of the 1D Gaussian kernel given by cv2.getGaussianKernel(), to the original image, i.e.

$$\hat{A} := G * A,$$

we obtain the blurred image $\hat{A}$:

The cameraman.png image filtered (blurred) with Gaussian kernel
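Continuing the sketch from Part 1.1, the kernel construction and blur might look like this (with the σ = 1, 7×7 kernel used in this section):

```python
import cv2

ksize, sigma = 7, 1.0
g1 = cv2.getGaussianKernel(ksize, sigma)  # (7, 1) column vector
G = g1 @ g1.T                             # 7x7 kernel via outer product

A_hat = convolve2d(A, G, mode='same')     # blurred image
```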

The gradients of the blurred image are

$$\hat{g}_x = D_x * (G * A), \qquad \hat{g}_y = D_y * (G * A).$$

In this section and Section 1.2.2, we choose a Gaussian kernel with σ = 1 and kernel size 7×7, following the 3σ convention, so that the kernel covers the peak of the 2D Gaussian and avoids truncation artifacts.

We follow the same procedure as Part 1.1, but choose a lower threshold of 0.07, because much of the noise has already been removed by the Gaussian filter.

The x-axis gradient (normalized)
The y-axis gradient (normalized)
Magnitude image (normalized)
Binarized normalized magnitude image

1.2.2 Derivative of Gaussian Filter

We can also combine the derivative operators with the Gaussian kernel and use the combined operators, i.e. DoG filters (one per axis), to convolve the image. That is, $\hat{g}_x$ and $\hat{g}_y$, the gradients of the Gaussian-blurred image, can also be derived from the associativity of convolution:

$$\hat{g}_x = (D_x * G) * A = D_x * (G * A), \qquad \hat{g}_y = (D_y * G) * A = D_y * (G * A).$$
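Continuing the running sketch, the DoG filters are formed by convolving the difference operators with the Gaussian once, so each gradient then costs only a single convolution of the image:

```python
DoG_x = convolve2d(G, D_x)  # 7x8 combined filter (default mode='full')
DoG_y = convolve2d(G, D_y)  # 8x7

g_x_hat = convolve2d(A, DoG_x, mode='same')
g_y_hat = convolve2d(A, DoG_y, mode='same')
```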

Choosing the same threshold as in 1.2.1, the results are identical to those above:

The x-axis gradient (normalized)
The y-axis gradient (normalized)
Magnitude image (normalized)
Binarized normalized magnitude image

Part 2: Fun with Frequencies

2.1 Image "Sharpening"

In this section, all Gaussian filters use the same kernel as before (size 7×7, σ = 1).

2.1.1 Sharpening Progression

We obtain sharpened images by adding back the high-frequency components scaled by a coefficient $\alpha$, where the high-frequency components come from subtracting the Gaussian-blurred image from the original. By the properties of convolution, we have

$$A + \alpha (A - A * G) = A * \big[(1+\alpha) I - \alpha G\big],$$

which means we can sharpen the image with a single filter, called the unsharp mask filter, given by $(1+\alpha) I - \alpha G$, where $I$ is the identity operator (a unit impulse), controlled by the parameter $\alpha$.
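A minimal sketch of this filter, continuing the running example (the helper name unsharp_filter is mine):

```python
def unsharp_filter(G, alpha):
    """Build (1 + alpha) * I - alpha * G as a single convolution kernel."""
    I = np.zeros_like(G)
    I[G.shape[0] // 2, G.shape[1] // 2] = 1.0  # unit impulse at the center
    return (1 + alpha) * I - alpha * G

A_sharp = np.clip(convolve2d(A, unsharp_filter(G, alpha=2.0), mode='same'),
                  0, 1)
```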

We choose a set of $\alpha$'s and show the sharpening progression below:

Taj Mahal (original, alpha=0)
Taj Mahal (alpha=0.5)
Taj Mahal (alpha=0.9)
Taj Mahal (alpha=1.5)
Taj Mahal (alpha=2)
Taj Mahal (alpha=6)
Taj Mahal (alpha=10)

We can see the edge patterns of this beautiful architecture become more and more pronounced as $\alpha$ increases.

The same progression for my self-chosen image of a famous wall, photographed in Shimo-Kitazawa, Tokyo:

Wall (original, alpha=0)
Wall (alpha=0.5)
Wall (alpha=0.9)
Wall (alpha=1.5)
Wall (alpha=2)
Wall (alpha=6)
Wall (alpha=10)

2.1.2 Sharpening after Blurring

For the Taj Mahal image, we first blur the image, then sharpen the blurred result using a filter with α = 3.

Taj Mahal (original)
Taj Mahal (blurred)
Taj Mahal (sharpened after blurred, alpha=1.5)

After being blurred, the image looks veiled, and re-sharpening removes this effect; however, the result appears lower-resolution than the original, because the high-frequency components (which carry the fine detail) were removed by the low-pass filter, and sharpening can only amplify the frequencies that remain, not reconstruct those that were lost.

2.2 Hybrid Images

The general approach for creating a hybrid image involves these steps:

  1. Align two images by rescaling, rotating and shifting;

  2. Get the low-frequency part of one image using an LP filter (we choose a Gaussian here);

  3. Get the low-frequency part of the other image using another LP filter (possibly different from the one in step 2), and get the high-frequency part by subtracting this low-frequency part from the original image; this can also be achieved with a Laplacian filter. Adding the low-frequency part from step 2 to this high-frequency part then gives the hybrid image, as in the sketch below.

For all cases below, each Gaussian filter uses a square kernel whose width/height is 7 times its sigma.
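A sketch of this pipeline under the stated conventions (the names gaussian2d and hybrid are mine; images are assumed aligned grayscale floats, with color handled per channel):

```python
import cv2
import numpy as np
from scipy.signal import convolve2d

def gaussian2d(sigma):
    k = int(7 * sigma)
    k += 1 - k % 2  # keep the kernel size odd
    g = cv2.getGaussianKernel(k, sigma)
    return g @ g.T

def hybrid(A_lo, B_hi, sigma_lo, sigma_hi):
    """Low frequencies of A_lo plus high frequencies of B_hi."""
    lo = convolve2d(A_lo, gaussian2d(sigma_lo), mode='same')
    hi = B_hi - convolve2d(B_hi, gaussian2d(sigma_hi), mode='same')
    return np.clip(lo + hi, 0, 1)
```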

We show two successful cases of creating hybrid images, and one failure.

2.2.1 Result 1

The sigma for the low-frequency image is sigma_lo = 5; the sigma for the high-frequency image is sigma_hi = 3.

yajuusteak/steak.jpg
Minecraft Steak (for low frequency)
yajuusteak/yajuu.jpg
A shouting man (for high frequency)
Hybrid image: Steak shouting


2.2.2 Result 2


emoji/savor.jpg
Savoring Emoji (for low frequency)
emoji/fear.jpg
Fearing Emoji (for high frequency)
Hybrid image

At first sight I see the fear; when I focus less, the joy hidden behind emerges.

We apply the Fourier transform to the aligned originals, the filtered images, and the hybrid image.
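The visualizations use the log-magnitude spectrum; a minimal sketch (the small epsilon guarding log(0) is my addition):

```python
import numpy as np

def log_spectrum(im):
    """Log magnitude of the centered 2D FFT, for visualization."""
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(im))) + 1e-8)
```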

The FT visualization for savor.jpg
The FT visualization for fear.jpg
The FT visualization for savor.jpg low frequency components (filtered)
The FT visualization for fear.jpg high frequency components (filtered)
The FT visualization for the hybrid image

2.2.3 Result 3 (Failed)

cat/explo.jpg
Exploding Cat (for low frequency)
cat/eat.jpg
Eating Cat (for high frequency)
Hybrid image, which I consider a failure.

Though the two images mix well, we perceive both features at the same time, which is why I consider this a failure. I think the cause is that the original images are low-resolution, and the main feature (the cat) overlaps too perfectly, leaving no distinct features to trade off at different viewing distances, which is what would make this a true hybrid image.

2.3 Gaussian and Laplacian Stacks

In a Gaussian stack, the first image is the original image, and every subsequent image is the previous image in the stack convolved with a Gaussian filter. In a Laplacian stack, every image with index $i$ except the last is the difference between image $i$ and image $i+1$ of the Gaussian stack, and the last image equals the last image of the Gaussian stack, so that the whole stack sums up to the original image.

With the help of Gaussian and Laplacian stacks, we can blend parts of two images smoothly, instead of plain alpha blending (interpolating the pixel values of two images with a single weight). In detail, for images $A$ and $B$ we build Gaussian stacks $G_A$ and $G_B$ and Laplacian stacks $L_A$ and $L_B$; for the mask $M$ (a binary image of the same size as $A$ and $B$) we also build a Gaussian stack $G_M$. The blended image $C$ is obtained by interpolating each pair of Laplacian layers with the mask taken from the same layer of $G_M$, then collapsing (summing) over the layers:

$$C = \sum_i \left[ G_M^i \odot L_A^i + (1 - G_M^i) \odot L_B^i \right],$$

where $\odot$ denotes the elementwise product.
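A compact sketch of the stacks and the blend under the same assumptions as before (function names are mine; G is a 2D Gaussian kernel such as the one from the gaussian2d sketch above):

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_stack(im, levels, G):
    stack = [im]
    for _ in range(levels - 1):
        stack.append(convolve2d(stack[-1], G, mode='same'))
    return stack

def laplacian_stack(gs):
    # Differences of consecutive Gaussian levels, plus the last Gaussian
    # level, so the whole stack sums back to the original image.
    return [a - b for a, b in zip(gs, gs[1:])] + [gs[-1]]

def blend(A, B, M, G, levels=6):
    LA = laplacian_stack(gaussian_stack(A, levels, G))
    LB = laplacian_stack(gaussian_stack(B, levels, G))
    GM = gaussian_stack(M.astype(float), levels, G)
    return sum(m * la + (1 - m) * lb for m, la, lb in zip(GM, LA, LB))
```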

In our implementation, the Oraple uses 6-level stacks. All sigmas are 5, and the kernel size is 7 times the sigma.

Level 0
Level 1
Level 2
Level 3
Level 4
Level 5
Sum-up
The oraple


2.4 Multi-resolution Blending

Given the blending approach introduced in 2.3, we can blend any pair of images given suitable masks. The sigma, kernel, and level-number settings remain as in 2.3.

When generating masks, I used Segment-Anything by Meta to cut out a transparent-background sub-image (the feature of interest) from a photo I had taken, then shifted and resized it by hand.

2.4.1 Statue of Hachiko the Squirrel

squirrel_original.jpg
squirrel.png (Cut out by Segment-anything)
hachiko.jpg
squirrel_changed.jpg (Shifted and Resized)


Mask
Blended Image

2.4.2 Extra Chashu to My Ramen

(but only one of the two pieces is real)

chashu_original.jpg
chashu.png (Cut out by Segment-anything)
ramen.jpg
chashu_changed.jpg (Shifted and Resized)
Mask
Blended Image

The chashu blends smoothly into the noodles: a shadow appears at its upper part (which is absent from both the cut-out image and the black-background image), and its lower part is colored by the soup.

The Laplacian/Gaussian stacks before and after masking are shown below.

Level 0
Level 1
Level 2
Level 3
Level 4
Level 5
Level 6
Level 7
Sum Up