[ GitHub ] - Apr 2022
This work was done while I was a Graduate Data Science Researcher for Viasat (The Data Mine). Super-resolution (SR) models aim to convert low-resolution (LR) images into high-resolution (HR) images. Training practical SR models is typically challenging because they tend to become biased toward the training data distribution, that is, the type of degradation applied to the LR images. As a result, traditional SR models don’t generalize well to real-world images with unknown degradations at test time. The blind image SR task aims to super-resolve LR images that result from an unknown degradation process and generate HR images. Blind SR models are therefore more robust, ideally becoming invariant to the degradation process of the LR input. In this project, I carry out a detailed comparative analysis of two recent state-of-the-art blind SR methods and one older but prominent non-blind SR method. All three methods were originally trained by their authors in different training and testing environments, so the pre-trained models cannot be compared directly; this is the primary motivation of the project. To compare them fairly, I train and evaluate all three models in a common training and testing setting.
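As a minimal sketch of what a common testing setting can look like, the snippet below scores every model's output against the same HR reference with the same metric and ranks the results. PSNR is used here only as an illustrative assumption; the `psnr` and `rank_models` helpers, the model names, and the toy pixel values are hypothetical and are not taken from the project.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale images.

    Images are given as lists of rows of pixel intensities.
    """
    ref = [p for row in reference for p in row]
    est = [p for row in estimate for p in row]
    if len(ref) != len(est):
        raise ValueError("images must have the same size")
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def rank_models(reference, outputs):
    """Score every model output against the same reference, best first."""
    scores = {name: psnr(reference, image) for name, image in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 2x2 "images": one HR reference and two hypothetical model outputs.
hr = [[10, 20], [30, 40]]
outputs = {
    "model_a": [[10, 20], [30, 44]],
    "model_b": [[12, 18], [33, 36]],
}
ranking = rank_models(hr, outputs)
```

Because every output is compared with the same reference and the same metric, differences in the ranking reflect the models rather than the evaluation setup.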
[ GitHub ] [ Youtube ] - May 2021
This was my Bachelor's final-year project. It was an attempt to build a system like Alexa, Google Home, and Siri with a knowledge database and basic face animation. Although designed for general use cases, the project was built around the specific use case of airport and flight-based customer service. It was developed using open-source libraries: NVIDIA’s Flowtron for text-to-speech generation (with hundreds of voices), a pre-trained Joint-BERT model for intent classification and slot tagging, and the SpeechRecognition library for speech recognition. All of these were integrated into a single system backed by an SQL database that held the knowledge about flights, destinations, seats, bookings, and so on. An automatic face animation generator was also developed; it uses the frequencies of the output sound to render the face animation in real time. The current system can reliably identify and assist users with around five types of queries.
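The hand-off from intent classification to the database can be sketched as follows. This is only an illustration under stated assumptions: the table schema, the intent names (`flights_to_destination`, `seat_availability`), the slot keys, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for the project's SQL database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical flights table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flights (flight_no TEXT, destination TEXT, seats_left INTEGER)"
    )
    conn.executemany(
        "INSERT INTO flights VALUES (?, ?, ?)",
        [("AI101", "Mumbai", 12), ("AI202", "Delhi", 0)],
    )
    conn.commit()
    return conn


def answer(conn, intent, slots):
    """Map a predicted intent and its tagged slots to a parameterized query."""
    if intent == "flights_to_destination":
        rows = conn.execute(
            "SELECT flight_no FROM flights WHERE destination = ? ORDER BY flight_no",
            (slots["destination"],),
        ).fetchall()
        names = [row[0] for row in rows]
        return "Flights: " + ", ".join(names) if names else "No flights found."
    if intent == "seat_availability":
        row = conn.execute(
            "SELECT seats_left FROM flights WHERE flight_no = ?",
            (slots["flight_no"],),
        ).fetchone()
        return "Unknown flight." if row is None else f"{row[0]} seats left."
    # Any intent outside the supported set falls through to a default reply.
    return "Sorry, I cannot help with that yet."


conn = build_demo_db()
reply = answer(conn, "flights_to_destination", {"destination": "Mumbai"})
```

The text reply would then be passed to the speech generation stage. Using query parameters instead of string formatting keeps recognized speech from being interpreted as SQL.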
[ GitHub ] [ Videos ] - Aug 2020
This project idea was selected for the Smart India Hackathon 2020 grand finale. It is a dynamic human authentication system for an arbitrary building gate, based on three orthogonally placed cameras (front, top, and side). People passing through the gate are authenticated using features such as their face, gait, and pose. The front camera recognizes the face and body measurements, the side camera tracks the gait, and the top camera detects piggybacking attacks. State-of-the-art algorithms for face detection, face landmarking, face recognition, face anti-spoofing, gait recognition, and pose estimation were implemented and benchmarked. The system was deployed as a web application using Django, SQLite, and AJAX. Its features include dynamic user registration, automated video saving and later retrieval, activity logging, a user database with profiling, and real-time monitoring. A Python-based library for easy integration of, and experimentation with, various state-of-the-art face detection, landmarking, and recognition models was also developed.
[ GitHub ] [ Paper ] [ Youtube ] - May 2020
This work was published at the CVPR 2020 conference. CascadeTabNet is an automatic table recognition method that converts tabular data in document images into a digital format. It is an end-to-end approach that solves both problems, table detection and structure recognition, using a single Convolutional Neural Network (CNN). The model is a Cascade Mask R-CNN HRNet-based image segmentation model that predicts the table regions and recognizes the structural body cells of the detected tables in a single inference. The model's cell predictions are then corrected by a robust rule-based algorithm. The model was trained on publicly available datasets, and we improved the training by using novel image augmentation techniques. At the time of publication, it attained state-of-the-art accuracy for table detection and structure recognition on public datasets. The open-source MMDetection library was used to develop this project.
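To give a flavor of what rule-based post-processing of predicted cells can involve, the sketch below groups cell bounding boxes into rows by their vertical centers. It is an illustrative assumption, not the algorithm from the paper: the `group_cells_into_rows` helper, the `tolerance` parameter, and the sample boxes are hypothetical.

```python
def group_cells_into_rows(cells, tolerance=0.5):
    """Group cell boxes (x1, y1, x2, y2) into rows by vertical center.

    A cell joins the current row when its center lies within
    `tolerance` times its own height of the row's mean center.
    Rows are returned top to bottom, cells left to right.
    """
    rows = []
    for cell in sorted(cells, key=lambda c: (c[1] + c[3]) / 2):
        center_y = (cell[1] + cell[3]) / 2
        height = cell[3] - cell[1]
        if rows:
            last = rows[-1]
            row_center = sum((c[1] + c[3]) / 2 for c in last) / len(last)
            if abs(center_y - row_center) <= tolerance * height:
                last.append(cell)
                continue
        rows.append([cell])
    return [sorted(row, key=lambda c: c[0]) for row in rows]


# Four slightly misaligned boxes forming a 2x2 table.
cells = [
    (100, 12, 180, 30),
    (10, 10, 90, 30),
    (10, 40, 90, 60),
    (100, 41, 180, 59),
]
table = group_cells_into_rows(cells)
```

Once cells are assigned to rows (and, analogously, to columns), simple rules can flag gaps where a cell prediction is missing.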
[ GitHub ] [ Paper ] [ Youtube1 ] [ Youtube2 ] - Sep 2019
This work was published at the IEEE PuneCon 2019 conference. The project involved building an intelligent traffic junction that estimates the traffic present on each of its four approaches and controls the traffic lights accordingly. We used four IP cameras (one for each lane) and developed a novel algorithm that estimates traffic density using image processing and machine learning techniques. A Raspberry Pi board reads the camera feeds over a local wireless network. The Pi is also connected to the traffic signal lights through its I/O pins and relay switches. It controls the traffic based on the density computed by the algorithm, increasing or decreasing the green-light time for each lane as appropriate. The Pi is also connected to the internet and to a remote server, on which a web application is deployed. The application monitors the status of the junction and lets the user control it remotely. Real-time logging also makes historical analysis easy.
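One simple way to turn per-lane density estimates into green-light durations is a proportional split of the signal cycle, clamped to safe bounds. The sketch below assumes this policy purely for illustration; it is not the published algorithm, and the `allocate_green_times` helper, the cycle length, and the minimum and maximum green times are hypothetical values.

```python
def allocate_green_times(densities, cycle_seconds=120, min_green=10, max_green=60):
    """Split a signal cycle across lanes in proportion to traffic density.

    Each lane's share is clamped to [min_green, max_green], so the
    clamped times may not add up exactly to cycle_seconds.
    """
    total = sum(densities)
    if total == 0:
        # No traffic detected anywhere: fall back to an equal split.
        share = cycle_seconds / len(densities)
        return [min(max(share, min_green), max_green)] * len(densities)
    times = []
    for density in densities:
        proportional = cycle_seconds * density / total
        times.append(min(max(proportional, min_green), max_green))
    return times


# Density estimates for the four lanes (arbitrary units).
green = allocate_green_times([0.5, 0.25, 0.125, 0.125])
```

The minimum green time keeps an almost empty lane from being starved, and the maximum keeps one congested lane from holding the junction indefinitely.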
[ GitHub ] - Apr 2019
This project was selected for the Smart India Hackathon 2019 grand finale. Tobacco classification is a difficult problem, and only highly experienced experts can correctly tell the grade of tobacco. This project was an effort to automate the process with an AI-based system. Our method predicts the grade of tobacco leaves from images, based on features such as color, texture, and ripeness. We extract these features from the images using classical descriptors such as Local Binary Patterns, Haralick features, Gabor filters, and color channels, and train machine learning models such as SVMs on them. This approach achieves over 90% accuracy. Deep learning-based CNN models struggle and do not exceed 60% accuracy because of the scarcity of data. The system was deployed as a local desktop application with a PyQt-based user interface. This project was built for ITC Ltd.
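As an illustration of one of the texture descriptors mentioned above, here is a minimal pure-Python sketch of a basic 3x3 Local Binary Pattern histogram. It is a simplified variant written for clarity, not the project's implementation; the `lbp_histogram` helper, the neighbor ordering, and the toy image are assumptions.

```python
def lbp_histogram(image):
    """Normalized 256-bin histogram of basic 3x3 LBP codes.

    `image` is a grayscale image given as a list of rows. Border
    pixels are skipped because they lack a full 3x3 neighborhood.
    """
    # Eight neighbors, clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbor is at least as bright.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# A bright pixel surrounded by darker ones yields LBP code 0.
peak = [[1, 1, 1],
        [1, 9, 1],
        [1, 1, 1]]
features = lbp_histogram(peak)
```

A vector like `features` could be concatenated with the other descriptors and used as the input to an SVM classifier.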