Image Recognition Series Part 1:
Eigenvalues and SVDs
Updated Feb 17, 2025
Project Duration: Nov 2022 - Feb 2023
Introduction
Disclaimer: Although several issues had to be fixed to reach the project's current state, I did not document my steps and thought process at the time, as I did not realize I would later include this project in this portfolio. My original plan was to link directly to my GitHub repository holding the code and commit history, which I now realize is not easy to follow without a written explanation.
Context and Plan for Project
This project aimed to build on a previous class assignment and to practice machine learning and facial recognition. The goal was to see how accurately one could reconstruct and recognize images using the default methods provided in MATLAB. For this project, I converted the code from MATLAB to Python, as Python is more accessible to others and more commonly used within the data analytics field.
To expand on the assignment, I plan to research and apply various methods of recognition, analyzing both accuracy and efficiency. Although I currently have only one method of reconstruction and recognition implemented, I plan to calculate runtimes to include in the final analysis. The assignment originally asked students to focus on reconstructing a specific image; however, I wanted to reconstruct my entire training set for comparison. I also incorporated three difficulty levels to test image comparison, where each level's testing set contains images meant to be progressively harder to distinguish from the training set.
Getting Started
To run these scripts on your local machine, please take note of the following for proper installation.
First, make sure that matplotlib is installed:
pip install matplotlib
If you are struggling to run the files properly due to "No such file or directory" or similar errors, you may have to modify the paths in the following files:
run_files.py
image_to_csv.py
image_to_csv_E.py
image_to_csv_M.py
image_to_csv_H.py
reconstruct_images.py
If you want to use similar code in the reconstruction file, note that the genfromtxt method will not work if your images do not match the assumed dimensions. The given code assumes pictures of size 300x300, so if you have not sized your images as such, you will have to manually change the dimension variables h, w in the following files:
image_to_csv.py
image_to_csv_E.py
image_to_csv_M.py
image_to_csv_H.py
reconstruct_eigen.py
Completing the Small Test
First, I wanted to check that the code ran smoothly on a sample dataset before attempting to run it with a larger dataset. This would allow me to identify any problems early on as there would be fewer rows in the resulting CSVs to investigate. To do so, I searched for only six images of official art of a single character to use as my training set.
Preparing the Data
The class assignment provided the image data in a CSV file representing a 2D matrix, where each row represented a single image and each of the 90000 columns corresponded to one of its 300x300 greyscale pixels. This project started with the images themselves, so the first step was to convert the images into the same CSV format, making the transition from the assignment to this project easier. To begin, each image needed to be read in as a numpy array using the imread function in matplotlib's image module, which returns a 3D array:
import matplotlib.image as im

img = im.imread(full_path)
My first attempt to turn the images into data failed. I had wanted to keep the RGB values because the images would appear more visually appealing; however, this meant the data would remain a 3D matrix rather than 2D, which would cause problems later in the reconstruction and recognition steps when attempting to read the resulting CSV.
After much trial and error and plenty of research, I realized these images would have to be converted to greyscale for the data to be usable. When writing the conversion code, I started from the 3D array holding the RGB values, reshaped it so each row held one pixel's RGB triple, then applied a formula to calculate the corresponding greyscale value for each pixel. The result is a 1D array representing a single image as one row, such that appending the rows for the other images creates the final 2D matrix.
import numpy as np

# reshape from 3D to 2D so each row holds one pixel's RGB values;
# converting each pixel to grey then flattens the image from 2D to 1D
tmp_reshaped = np.reshape(img_fp, (90000, 3))
img_reshaped = []
for j in range(len(tmp_reshaped)):
    pixels = tmp_reshaped[j]
    # standard luminosity weights for RGB-to-greyscale conversion
    rgb_gray = (0.2989*pixels[0]) + (0.5870*pixels[1]) + (0.1140*pixels[2])
    img_reshaped.append(rgb_gray)
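As a side note, this per-pixel loop could also be written as a single vectorized matrix product; a minimal sketch, not the code used in the project:

# weight the three color channels in one step; equivalent to the loop above
weights = np.array([0.2989, 0.5870, 0.1140])
img_reshaped = np.reshape(img_fp, (90000, 3)) @ weights  # shape (90000,)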
Once the code was written, I exported the image pixel data as a CSV along with the greyscale versions of the images. However, when attempting to view the greyscale images to make sure the conversion was complete, the images did not show up properly; instead, I received an empty square plot. After discussing with a friend, I learned that the conversion expected floating point values between 0.0 and 1.0, but my arrays contained RGB values between 0 and 255 (imread returns 8-bit integers in that range for JPEG files, while PNGs are already read in as floats between 0 and 1). The solution was to add a line dividing the arrays by 255, placed after reading in the image data and before converting. These floating point arrays were then reshaped into the 2D arrays used in the loop above to convert the RGB values to greyscale.
img_fp = img/255 # to floating point between 0 and 1
As an example, this is the fourth picture that was converted:

4th Training Picture, Color

4th Training Picture, Grey
Reconstruction
Once the image conversion was complete, the reconstruction and recognition code was simpler to write, as I mainly needed to convert it from MATLAB to Python; fortunately, I still had access to both the MATLAB and Python versions of the class assignment materials. This meant I could use the provided Python code to write my own scripts while referring to my submitted assignment to understand why each piece of code was used.
To begin with image reconstruction, we first need the average of the data. Fortunately, numpy has a built-in function to calculate the mean of a matrix. This average is saved as a reshaped image for viewing, while the original flattened average is kept for use later in the code. We then subtract the average from the data to see how the pictures deviate from it, and study the result by finding its reduced SVD, also a built-in function:
# find the average of the data (one mean value per pixel)
avg = np.mean(data_reshaped, axis=0)

# uncomment to view
#plt.imshow(np.reshape(avg, (h, w)), cmap='gray')
#plt.show()

# reshape inline rather than overwriting avg,
# since the original flattened shape is used again below
plt.imsave(path+'/average.jpg', np.reshape(avg, (h, w)), cmap='gray')

"""
observe how pictures deviate from the average;
study the data by finding the reduced SVD of (data - average)
"""
# subtract the average from every row of the data
X = data_reshaped - np.ones((6, 1)) @ avg.reshape((1, -1))

# reduced SVD
U, S, VT = np.linalg.svd(X, full_matrices=False)
V = VT.T
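As a quick sanity check, the reduced SVD factors for this six-image test have the following shapes, assuming the 6 x 90000 data matrix described earlier:

# X is 6 x 90000, so the reduced SVD gives:
#   U:  (6, 6)      one row of weights per image
#   S:  (6,)        singular values, in decreasing order
#   VT: (6, 90000)  one "eigenimage" per row
print(U.shape, S.shape, VT.shape)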
The average of all the data is quite jumbled, but that is to be expected:

Picture Created by Average of Data
Next, let us retrieve the minimum number of singular values that hold at least 90% and 99% of the information in the training set. To do so, we rescale the squared singular values into cumulative energies, E_k = (S_1² + ... + S_k²) / (S_1² + ... + S_n²), then use a loop to find the values needed. In other words, we will find the smallest k such that E_k is greater than 0.9 and 0.99, respectively:
E = np.cumsum(S**2) / np.sum(S**2)

k_90 = 0
for i in range(len(E)):
    if E[i] > 0.9:
        k_90 = i
        break

k_99 = 0
for i in range(len(E)):
    if E[i] > 0.99:
        k_99 = i
        break
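As a design note, numpy can find these thresholds without an explicit loop; a sketch that should behave identically on this data:

# argmax returns the index of the first True entry in a boolean array
k_90 = int(np.argmax(E > 0.9))
k_99 = int(np.argmax(E > 0.99))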
Taking into account that Python uses 0-indexing, we note that index 3, or the fourth singular value, is the first where the cumulative energy exceeds 0.9, and index 4, or the fifth singular value, is the first where it exceeds 0.99. Finally, these singular values will be used to reconstruct the training set, such that the resulting images should be 90% and 99% accurate to the originals. This means slicing the factored matrices from the reduced SVD and calculating the scores for the training images using only the singular values found above:
# keep the first k_90 + 1 = 4 singular values (> 90% of the information)
U_3 = U[:, 0:4]
S_3 = np.diag(S[0:4])
scores_3 = U_3 @ S_3
V_3 = V[:, 0:4]
reconstructed_3 = scores_3 @ V_3.T
for i in range(6):
    # add the average back in to recover the full image
    img = reconstructed_3[i, :] + avg
    img = np.reshape(img, (h, w))

# keep the first k_99 + 1 = 5 singular values (> 99% of the information)
U_4 = U[:, 0:5]
S_4 = np.diag(S[0:5])
scores_4 = U_4 @ S_4
V_4 = V[:, 0:5]
reconstructed_4 = scores_4 @ V_4.T
for i in range(6):
    img = reconstructed_4[i, :] + avg
    img = np.reshape(img, (h, w))
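The loops above only compute each reconstructed image; in the actual scripts they are presumably saved or displayed afterwards. A minimal sketch of saving them, with a placeholder output path:

# e.g. appended inside either loop above (the path here is hypothetical)
plt.imsave(path+'/reconstructed_90_'+str(i)+'.jpg', img, cmap='gray')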
Here is a comparison between the previously shown fourth picture and its reconstructions:

4th Training Picture, Grey

3rd Singular Value

4th Singular Value
Here, we can see that the middle picture, reconstructed through the third singular value (index 3), is jumbled and unclear, as it only contains > 90% of the original information. On the other hand, the image on the far right, reconstructed through the fourth singular value (index 4), is almost indistinguishable from the original on the far left, because it contains > 99%. If the images used RGB values, there might be slight differences in color; however, that is not an option for now.
Beginning the Finalized Version
Once the reconstruction method was complete in the small test, I decided to use what I had written to begin the official version of the project. Aside from copying the code, I also added functions to the conversion and reconstruction files so they could work with several different directories rather than a single directory or value. This was because I would be including three difficulty levels for image comparison, meaning I would have three different testing datasets and directories in addition to the training set. Another addition was a file named run_files.py, used to run everything in one click rather than running each of the different files one at a time. In the future, this will allow me to calculate the runtimes of different methods of reconstruction and recognition.
There were a couple of issues that arose when running the final conversion and reconstruction files. At times, the image data would change if run_files.py was run after running the image conversion file. It took a lot of commenting out lines to realize there wasn't a solution that allowed all of the images to be converted in one Python script. As a result, I had to separate the data that kept being affected: instead of one image conversion file, there were three - one for the training set and medium-difficulty test set, one for the easy difficulty, and one for the hard.
Another issue I ran into concerned the converted data. I included far more than 6 pictures in the training and testing sets of the final version, meaning the converted files were far too large for GitHub to handle normally. Fortunately, GitHub supports Git Large File Storage (LFS); however, this needs to be set up before committing and attempting to push a large amount of data, and any time the data changes it must be tracked and committed specifically through LFS.
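For reference, the basic LFS setup looks roughly like the following; the tracked pattern is an assumption based on this project's CSV data:

git lfs install                # one-time setup per machine
git lfs track "*.csv"          # route large CSVs through LFS
git add .gitattributes         # commit the tracking rule itself
git add data.csv               # placeholder file name
git commit -m "Track converted image data with LFS"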
This time, I plotted the singular values and their scaled energies:

Singular Values and Scaled Energies
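The plotting code itself is short; here is a minimal sketch of how such a figure could be produced with matplotlib, reusing S and E from earlier (the exact styling used in the project is an assumption):

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.semilogy(S, 'o-')   # singular values on a log scale
ax1.set_title('Singular Values')
ax2.plot(E, 'o-')       # cumulative scaled energies
ax2.set_title('Scaled Energies')
plt.show()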
As a final addition, I also included a method that calculates the average accuracy of reconstruction by comparing the reconstructed images directly to the originals. This value will be manually entered into a separate CSV as more reconstruction methods are tested.
def accuracy(original, reconstructed, count):
    h = w = 300
    err = []
    avg = []
    # calculate the per-pixel relative error (without percent)
    for i in range(count):
        o = original[i]
        r = reconstructed[i]
        err.append(abs(o-r)/r)
    # calculate accuracy
    err = np.array(err)
    acc = np.ones((count, h*w)) - err
    # calculate average accuracy per image
    for i in range(count):
        tmp = acc[i]
        avg.append(np.sum(tmp)/len(tmp))
    avg = np.array(avg)
    return avg
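As a usage sketch, reusing names from the small test's reconstruction step (the exact call in the scripts is an assumption):

# compare the 6 training images against their > 99% reconstructions;
# the average is added back so both arrays are on the same scale
avg_acc = accuracy(data_reshaped, reconstructed_4 + avg, 6)
print(np.mean(avg_acc))  # overall average reconstruction accuracy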
Reconstruction Observations
There is a noticeable difference in the reconstructed fourth photo, due to having many more images affecting the SVD rather than only a handful:

4th Training Picture, Grey

4th Picture Official Reconstruct > 90

4th Picture Official Reconstruct > 99
Comparing the small test's reconstructions to the official version, we can see that there is much less clarity in the official versions, even though both sets retain either > 90% or > 99% of their dataset's information. It appears that the size of the dataset greatly impacts the reconstruction of the images.

Small 4th Reconstruction > 90

Small 4th Reconstruction > 99

4th Picture Official Reconstruct > 90

4th Picture Official Reconstruct > 99
Recognition
The last step of this project was to test recognition between the training and testing sets. The training set contains 60 images, where the first 30 are of the first character used in the small set, and the rest are evenly split among 6 unrelated characters. All three testing sets are composed similarly: the first 10 images are of the first character, and the next 10 are of two of the unrelated characters, chosen based on the difficulty of the testing set. The easy difficulty contains two characters meant to be easily distinguishable from the first character, while the medium and hard difficulties contain characters meant to be increasingly difficult for the computer to distinguish from the first character.
To implement the recognition method, we first find the scores of the testing data by projecting it onto the V matrix from the training SVD. These are then compared with the scores from the training data: for each test image, we find the training row with the smallest distance between scores, with the hope that the smallest distance identifies the correct character.
def recognition(test_data, scores, V):
    # per-pixel average; axis=0 matches the averaging in the training step
    avg = np.mean(test_data, axis=0)
    Y = test_data - np.ones((len(test_data), 1)) @ avg.reshape((1, -1))
    # project the centered test data onto the training set's V to get scores
    scores_test = Y @ V
    scores_len = len(scores)
    scores_test_len = len(scores_test)
    min_ind = np.zeros(scores_test_len)
    dist = np.zeros(scores_len)
    """
    find the distance between each row of scores_test and scores;
    the smallest distance should identify the correct character
    """
    for i in range(scores_test_len):
        for j in range(scores_len):
            dist[j] = np.linalg.norm(scores_test[i] - scores[j], 2)
        ind = np.argmin(dist)
        min_ind[i] = ind
    return min_ind
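A hypothetical call tying this back to the earlier names, where testE_data stands for the easy-difficulty test matrix (an assumed name) and the scores and V come from the training set's SVD step, sliced to the same number of columns:

# e.g. using the > 99% slices from the reconstruction step
testE_reco = recognition(testE_data, scores_4, V_4)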
To conduct the comparison, I wrote a quick helper function that groups the expected index ranges as tuples and uses loops to pass them to the accuracy check method.
def check_tuple(testE_reco, testM_reco, testH_reco):
    # initial character
    e_init = (testE_reco, 0, 30, 0, 10)
    m_init = (testM_reco, 0, 30, 0, 10)
    h_init = (testH_reco, 0, 30, 0, 10)
    init = [e_init, m_init, h_init]
    init_actual = []

    for i in range(len(init)):
        init_actual.append(accu(init[i]))

    # unrelated characters, e/m/h
    e_1 = (testE_reco, 30, 35, 10, 15)
    e_2 = (testE_reco, 35, 40, 15, 20)
    m_1 = (testM_reco, 40, 45, 10, 15)
    m_2 = (testM_reco, 45, 50, 15, 20)
    h_1 = (testH_reco, 50, 55, 10, 15)
    h_2 = (testH_reco, 55, 60, 15, 20)
    unrelated = [e_1, e_2, m_1, m_2, h_1, h_2]
    unrel_actual = []
    for j in range(len(unrelated)):
        unrel_actual.append(accu(unrelated[j]))
    return [init_actual, unrel_actual]
After saving the shortest-distance indices as an array, I compared the expected and actual results of the recognition method as a percentage by passing in the starting and ending indices corresponding to the expected values, as well as the starting and ending indices corresponding to the results. This was done several times to check the accuracy for each character. (The expected range is mainly for printing out results for viewing.)
def accu(temp):
    indicies = temp[0]
    start = temp[1]
    end = temp[2]
    in_start = temp[3]
    in_end = temp[4]
    actual = 0
    # count how many recognized indices fall within the expected range
    for i in range(in_start, in_end):
        if indicies[i] >= start and indicies[i] < end:
            actual += 1
    accuracy = actual / len(indicies[in_start:in_end]) * 100
    return accuracy
Future Steps
With the conclusion of this recognition method, we have our first piece of data to eventually use for analysis. As previously mentioned, the plan is to test further methods of reconstruction and recognition, analyzing results and comparing efficiency. In the future, there will be more data to include alongside the singular values that contain > 90% and > 99% of the training set's information, as well as the comparison of expected versus actual images recognized.
Click here to view the data and scripts.