Portfolio

About Me 🧑🏻‍🎓

Passionate, self-taught Data Scientist.
I have been working in the field of data science since 2016 and am always engaged in learning and self-improvement.
Part-time Quoran who enjoys spending some time watching TV series. A Marvel fan.

Work Experience 🖥

Data Scientist (NLP)
Shaw Academy (January 2021 - Present)

As an NLP enthusiast, I got the wonderful opportunity to work as an NLP expert at Shaw Academy, one of the world's largest EdTech leaders.
At Shaw:
- We are continuously working on enhancing the student learning and interaction experience through Morpheus, a chatbot with QnA capabilities.
- It is trained to answer general FAQs and the technical questions students have during online lessons on various topics.
- Working on a self-improving chat experience through deep reinforcement learning.
- Morpheus is powered by state-of-the-art NLP models such as T5 and BERT and can generate question-answer pairs from unstructured text.
- Apart from Morpheus, I am currently developing an interactive analytics dashboard for webinar chat and mining important hidden trends and topics from student-agent conversations.
- Enhancing user search with Elasticsearch suggesters.

Lead Data Scientist
Societe Generale Global Solution Centre (April 2020 - Present)

Got the opportunity to lead my team, with responsibility for conducting and delivering Proofs of Concept after carefully evaluating their return on investment. As part of the Innovation Team, I have contributed to different projects and Proofs of Concept. This involved interacting with different business groups to understand their existing processes and presenting the value propositions of process transformation. I have been working on state-of-the-art NLP techniques, Image Processing/Computer Vision and Deep Learning architectures.

Data Scientist
Societe Generale Global Solution Centre (June 2017 - March 2020)

Joined SocGen as a Data Scientist. Worked on multiple projects involving OCR quality enhancement and text extraction from scanned documents. Implemented state-of-the-art Object Detection and Natural Language Processing (NER) modules for various projects. Travelled to Paris for one month to study and build a solution for entity extraction from scanned documents of French clients. Led the Auto-ML platform during its pilot phase. Engaged in continuous improvement by adding different state-of-the-art Machine Learning and Deep Learning pipelines to the platform.

Data Science Intern
Delhivery Logistics (Jan 2017 - April 2017)

Joined Delhivery as a Data Science Intern. Delhivery is a leading logistics partner in India. Developed a CNN-based image recognition system for scanned goods at their delivery centers.

Industry Projects

Smart Inbox

Developed Smart Inbox for a business line that deals with client on-boarding and verification. An Outlook connector listens to the mailbox and is triggered when an email is received from a client. The subject and content of the email are used to classify the mail, and the attached documents/images are downloaded and passed through a recognition and extraction pipeline. A final report is generated and sent to an analyst for final verification.
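
The flow above (classify the mail from its subject and body, then collect the attachments for the downstream pipeline) can be sketched roughly as follows. This is a minimal illustration using only Python's standard `email` package, not the production connector: the keyword rules in `CATEGORY_KEYWORDS` and the function names `classify_email` and `process_email` are hypothetical, and the real system presumably used a trained classifier rather than keywords.

```python
from email.message import EmailMessage

# Hypothetical keyword rules; the actual classification method is not described.
CATEGORY_KEYWORDS = {
    "onboarding": ("onboarding", "on-boarding", "kyc", "new account"),
    "verification": ("verification", "verify", "proof of"),
}


def classify_email(subject, body):
    """Return the first category whose keywords appear in subject or body."""
    text = f"{subject} {body}".lower()
    for label, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return label
    return "other"


def process_email(msg):
    """Classify a parsed message and collect its attachments as (name, payload)."""
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    attachments = [
        (part.get_filename(), part.get_content())
        for part in msg.iter_attachments()
    ]
    return {
        "label": classify_email(msg["Subject"] or "", body),
        "attachments": attachments,  # would be handed to recognition/extraction
    }
```

In a real deployment the returned attachments would be passed on to the recognition and extraction stages; here they are simply returned so the flow can be inspected.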

Find out more

KYC Engine

Worked on building a KYC Engine for French clients. The pipeline involved image pre-processing, document recognition, text extraction, named entity extraction and face recognition. All documents for a particular client are passed through this pipeline, and the consolidated results are displayed on an editable dashboard. OCR-related inaccuracies are handled according to the sensitivity of each field.

Find out more

Load Document Verification

Designed and developed an event-driven pipeline for document classification, signature detection and verification, with state-of-the-art Named Entity Recognition. All documents (payslips, self-certifications, ID cards) were scanned and written in French. We trained all models from scratch using French embeddings. For document recognition we used Inception-v3 with transfer learning and fine-tuning, RetinaNet for signature detection and the Flair NLP framework for NER.

Find out more

Auto ML Platform

Core member of the platform design and development team. Built for conducting AI/ML-related experiments. On-the-fly OCR functionality with the option to choose the Tesseract (open source) engine or the ABBYY engine. Ability to create multiple projects with different versions and to tag documents with fields on the platform itself for recognition and extraction purposes. Developed template-based extraction algorithms for documents with a fixed template and for semi-structured/unstructured documents.
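
As a rough illustration of template-based extraction for fixed-template documents, the sketch below maps each field of a template to a regular expression applied to OCR text. It is an assumption-laden example, not the platform's algorithm: the template `INVOICE_TEMPLATE`, its field names and patterns, and the function `extract_fields` are all hypothetical, and the real platform may well have relied on tagged regions rather than text patterns.

```python
import re

# Hypothetical template: field name -> regex with one capture group.
INVOICE_TEMPLATE = {
    "invoice_number": r"Invoice\s*No[.:]?\s*(\S+)",
    "date": r"Date[.:]?\s*(\d{2}/\d{2}/\d{4})",
    "total": r"Total[.:]?\s*([\d.,]+)",
}


def extract_fields(ocr_text, template):
    """Apply each field pattern to the OCR text; missing fields map to None."""
    result = {}
    for field, pattern in template.items():
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        result[field] = match.group(1) if match else None
    return result
```

Returning None for a field that is not found makes it easy to flag incomplete extractions for manual review.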

Find out more

Research and Experiments

DWT-DCT based blind audio watermarking using Arnold scrambling and Cyclic codes Research

A novel audio watermarking algorithm is proposed using the Discrete Wavelet Transform (DWT) and the Discrete Cosine Transform (DCT). Furthermore, the Arnold transform and an error correction technique are used to improve the performance of the proposed algorithm. Performance is measured using the Bit Error Rate (BER), Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) between the extracted watermark and the original watermark.
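
Two of the building blocks named above, Arnold scrambling of the watermark and the Bit Error Rate, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses the standard Arnold cat map (x, y) -> (x + y, x + 2y) mod N on a square binary watermark, the number of rounds is an arbitrary choice, and the function names are hypothetical. The DWT-DCT embedding and the cyclic code stage from the paper are not shown.

```python
def arnold_scramble(image, rounds=1):
    """Scramble a square N x N matrix with the Arnold cat map."""
    n = len(image)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                out[(x + y) % n][(x + 2 * y) % n] = image[x][y]
        image = out
    return image


def arnold_unscramble(image, rounds=1):
    """Invert arnold_scramble using the inverse map (2x - y, y - x) mod N."""
    n = len(image)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                out[(2 * x - y) % n][(y - x) % n] = image[x][y]
        image = out
    return image


def bit_error_rate(original, extracted):
    """Fraction of watermark bits that differ between two equal-sized matrices."""
    pairs = [
        (a, b)
        for row_a, row_b in zip(original, extracted)
        for a, b in zip(row_a, row_b)
    ]
    return sum(a != b for a, b in pairs) / len(pairs)
```

Scrambling spreads neighbouring watermark bits apart before embedding, so that a localized distortion of the host audio turns into scattered bit errors after unscrambling, which is the kind of error pattern an error-correcting code handles well.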

Download Research paper

Sound Source Localization using GCC-PHAT with TDOA Estimation Research

This paper focuses on localizing a sound source in a 2-D plane using the Time Difference of Arrival (TDOA) of signals at the respective microphones. Time delay estimation (TDE) between replicas of a signal is intrinsic to many signal processing applications, and direction of arrival (DOA) estimation of acoustic signals using a set of spatially separated microphones has many practical applications in everyday life. For example, the DOA estimated from a set of microphones can be used to automatically steer cameras towards the speaker in a conference room.
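
A minimal sketch of the GCC-PHAT idea for one microphone pair is given below. It is written in plain Python with a naive DFT so that it stays self-contained, which is far slower than an FFT-based version and is not the paper's implementation. The function names, the far-field two-microphone angle formula and the default speed of sound (343 m/s) are assumptions made for illustration; full 2-D localization would combine several such pairwise delays.

```python
import cmath
import math
import random


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse divides by n)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def gcc_phat_delay(sig, ref, max_lag):
    """Estimate by how many samples `sig` lags `ref` using GCC-PHAT."""
    n = len(sig) + len(ref)  # zero-pad so the correlation does not wrap around
    spec_sig = dft(list(sig) + [0.0] * (n - len(sig)))
    spec_ref = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = [a * b.conjugate() for a, b in zip(spec_sig, spec_ref)]
    # PHAT weighting: keep only the phase of the cross-spectrum.
    weighted = [c / abs(c) if abs(c) > 1e-12 else 0.0 for c in cross]
    correlation = [v.real for v in dft(weighted, inverse=True)]
    return max(range(-max_lag, max_lag + 1), key=lambda lag: correlation[lag % n])


def doa_angle(delay_samples, sample_rate, mic_distance, speed_of_sound=343.0):
    """Far-field arrival angle in degrees for a two-microphone pair."""
    tau = delay_samples / sample_rate
    ratio = max(-1.0, min(1.0, speed_of_sound * tau / mic_distance))
    return math.degrees(math.asin(ratio))


# Synthetic check: the second channel is the first one delayed by 3 samples.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(32)]
sig = [0.0] * 3 + ref
delay = gcc_phat_delay(sig, ref, max_lag=10)
angle = doa_angle(delay, sample_rate=16000, mic_distance=0.2)
```

The PHAT weighting discards magnitude and keeps only phase, which sharpens the correlation peak and is the usual reason this variant is preferred in reverberant rooms.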

Download Research paper

Text Summarization Experiment

A text summarizer API built with Flask-RESTPlus. It has two endpoints: a BertSummarizer endpoint, which uses a GMM-based approach to generate a summary of the input text blob, and a Sentence Similarity Summarizer, which builds a sentence similarity matrix from word vectors and then uses the PageRank algorithm to rank the most important sentences.
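
The similarity-matrix-plus-PageRank approach can be sketched as below. To keep the example dependency-free, sentences are represented by bag-of-word counts rather than the word vectors the project used, and PageRank is a plain power iteration; the function names, damping factor and iteration count are illustrative assumptions, not the repository's code.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(sentences):
    """Pairwise sentence similarity with a zero diagonal."""
    vectors = [Counter(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    n = len(vectors)
    return [
        [0.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


def pagerank(matrix, damping=0.85, iterations=50):
    """Power-iteration PageRank over a weighted similarity graph."""
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if row_sums[j] > 0:
                    rank += matrix[j][i] / row_sums[j] * scores[j]
                else:
                    rank += scores[j] / n  # isolated sentence: spread evenly
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def summarize(sentences, k):
    """Return the k top-ranked sentences in their original order."""
    scores = pagerank(similarity_matrix(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Sentences that are similar to many other sentences accumulate rank, so the summary favours sentences that are central to the text, while a sentence sharing no words with the rest ends up at the bottom.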

Miscellaneous Experiments

This repository contains all the ML/DL modelling and data analysis tasks that I performed for the different startups and companies I recently interviewed with.

Subir Verma

Data Scientist (NLP)👨🏼‍💻
