Building a mass college surveillance system using Google’s FaceNet + Faiss + Postgres.

Consider this part 3 of the series, in which I (1) scraped a college’s vulnerable library database and (2) made a Facemash/hot||not clone. For this third part, I am going to deploy a SOTA facial recognition model on all the 8000+ student faces I scraped from the library’s site.

If you, like me, haven’t been following the state of the art in facial recognition, you wouldn’t be aware of how GOOD these models have gotten over the years.

Experiments show that human beings have 97.53% accuracy on facial recognition tasks, whereas some models like Google’s FaceNet have already reached and passed that accuracy level. (FaceNet has 99.65% accuracy on the Labeled Faces in the Wild benchmark.)

serengil has made an awesome wrapper for all the popular SOTA models like FaceNet, VGG-Face etc., called deepface (not to be confused with Facebook’s DeepFace model).

The first order of business is getting all the images from our database along with their enums.

We do so by using this simple command >

Next we’ll need to convert these images to vector embeddings generated by the FaceNet model. We’ll use deepface’s DeepFace.represent() method to do this. It can take a base64-encoded image as its input, which is perfect for our use case.

Then we package this data into a 1D vector and push it into a representation list. The Faiss library we’ll use later requires a 2D array, so we’ll use a simple for loop to build an array of shape (8000, 512) (Facenet512 produces an embedding of 512 numbers per face).
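The packaging step can be sketched roughly as follows. This is a minimal illustration and not the notebook’s actual code: the `records` list, its enum strings, and the 4-number vectors are made-up stand-ins for real 512-number embeddings. The only point is that the enum list and the 2D structure stay in the same row order.

```python
# Hypothetical stand-in data: (enum, embedding) pairs. Real Facenet512
# embeddings would hold 512 floats each; 4 are used here for brevity.
records = [
    ("enum-001", [0.1, 0.2, 0.3, 0.4]),
    ("enum-002", [0.5, 0.1, 0.0, 0.2]),
    ("enum-003", [0.9, 0.8, 0.7, 0.6]),
]

enums = []       # enums[i] identifies the face stored in row i
embeddings = []  # 2D structure: one row per face

for enum, vector in records:
    # Every row must have the same length, or the rows cannot form a 2D array.
    if embeddings and len(vector) != len(embeddings[0]):
        raise ValueError(f"embedding for {enum} has the wrong length")
    enums.append(enum)
    embeddings.append([float(x) for x in vector])

shape = (len(embeddings), len(embeddings[0]))
print(shape)  # (3, 4) here; (8000, 512) for the full dataset
```

Calling `numpy.asarray(embeddings, dtype="float32")` on a list like this gives the float32 matrix that Faiss works with.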

Then we encode everything into a NumPy array and save it to disk. This way we won’t have to generate the embeddings again when we deploy it to production later.

Next we pass this NumPy array to the Faiss library to index. Without Faiss, we would have to calculate the cosine distance between the query and every stored vector each time we search for a match for a face. Faiss is pretty neat overall and allows us to run a nearest neighbour search that finishes in milliseconds even on gigantic multi-million vector datasets. Check it out here.
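For comparison, here is a rough sketch of the brute-force search an index saves us from. It is plain Python with toy vectors, not Faiss code, and the function names are made up for illustration. Each query costs one distance computation per stored face, which is what gets slow as the dataset grows.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def brute_force_search(query, vectors, k):
    """Compare the query with every stored vector and keep the k closest."""
    scored = [(cosine_distance(query, v), i) for i, v in enumerate(vectors)]
    scored.sort()  # smallest distance first
    return scored[:k]

# Toy 3-number vectors standing in for real embeddings.
stored = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
nearest = brute_force_search([1.0, 0.0, 0.0], stored, k=2)
print([i for _, i in nearest])  # [0, 2]
```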

To search for a match for an image, we first convert the image to its vector representation, then wrap it in a 2D NumPy array of shape (1, 512) and use the index’s search method to find the k most similar faces in the database.

Faiss returns the index of each matching vector; we can use this index to look up our representation list and get the enum of the face.
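The lookup can be sketched like this. It is a hedged illustration with made-up values: `enums`, `distances`, and `indices` are hypothetical, and the sketch assumes the enum list was built in the same order the vectors were added to the index. A Faiss search returns one row per query and one column per neighbour, and uses -1 as the index when it has fewer than k results to give.

```python
# Hypothetical lookup list: enums[i] belongs to the i-th vector added.
enums = ["enum-001", "enum-002", "enum-003"]

# Made-up search output for a single query with k=4:
# one row per query, one column per neighbour, -1 for an empty slot.
distances = [[0.12, 0.48, 0.95, float("inf")]]
indices = [[2, 0, 1, -1]]

def lookup_matches(indices_row, distances_row, enums):
    """Pair each valid neighbour index with its enum and distance."""
    matches = []
    for idx, dist in zip(indices_row, distances_row):
        if idx < 0:  # no neighbour was found for this slot
            continue
        matches.append((enums[idx], dist))
    return matches

matches = lookup_matches(indices[0], distances[0], enums)
print(matches[0])  # ('enum-003', 0.12)
```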

And voilà! We have made ourselves a super scalable surveillance system. Now, to actually deploy this, we’ll first have to get over some trivial hurdles.

Here’s the quick and dirty jupyter notebook showcasing what I described above.


Connecting DB and Making it deployable to ☁️