Arnav Arnav

I am a Data Scientist and Software Engineer with a Master's in Data Science from Indiana University, Bloomington, and a Bachelor's degree in Computer Science and Engineering from Tezpur University, Assam. I am passionate about data science and have worked on forecasting, machine learning, deep learning, NLP, and data analysis projects in my academic and professional career. I also have experience building web and mobile applications and working with various big data technologies. I am excited about advances in applying machine learning models to signal processing and computer vision, and I like to explore new research in graphical models, reinforcement learning, and deep learning. I love playing my guitar and going on hikes on my vacations. I speak four languages and would love to learn more.

Email | CV | LinkedIn | Github

Skills

Machine Learning Topics: Classification, Regression, Clustering, Recommendation System, Bayesian Inference, Natural Language Understanding, Computer Vision
Machine Learning Models: Linear/Logistic Regression, Generalized Linear Models, Decision Trees, Random Forest, SVM, Naive Bayes, Ensemble Methods, K-Means, DBSCAN, Agglomerative & Divisive Clustering, KNN, Bag-of-Words, Collaborative Filtering, Deep Neural Networks (CNN, RNN, LSTM, VAE), Boltzmann Machines
Programming Languages: Scala, Java, SQL, Python, R, SAS, C++, MATLAB
Libraries and Software: NumPy, Pandas, Scikit-Learn, SciPy, PyTorch, TensorFlow, Keras, NLTK, spaCy, Stanford CoreNLP, PySpark, NetworkX
Data Visualization: Seaborn, Matplotlib, ggplot, Kibana, Altair, plotly, Tableau
Web Development: HTML, CSS, JavaScript, CoffeeScript, Bootstrap, Materialize CSS, Django, Ruby on Rails, Ember.js
Software Dev Tools: Swagger Codegen, Docker CE, Kubernetes, Travis CI, VirtualBox, DevStack, MQTT
Version Control: Git
Databases: PostgreSQL, MySQL, MongoDB, Neo4j
Preferred Operating Systems: Linux (Ubuntu, Fedora, CentOS, Mint), Windows
Preferred Editors: Vim, GNU Emacs, Jupyter Notebook, VSCode

Recent Projects

	Enhancing Predictions on High Variance Time Series Data (Decision Science Internship, The Walt Disney Company, Orlando, August '19 - April '20): With high-variance data, mean forecasts track trends well but can be far from the actual behavior of individual items, which makes them less informative. Implemented a state space approach in Python and SAS that dynamically adjusts regression-based demand forecasts to match actual behavior using recently observed data points. The approach improved 70% of Disney Cruise Line booking forecasts, with a mean improvement of 35% in absolute error.
	Deep Gaussian Processes for Representation Learning (Python, Sklearn, GPy, GPFlow): Studied the use of variational inference along with hierarchical graphical models for representation learning, and implemented a hierarchical Gaussian Process Latent Variable model for supervised and unsupervised tasks. Tested the learned representations on image reconstruction and classification tasks on the oil flow, MNIST handwritten characters, and Frey faces datasets.
	Speaker Identification and Verification from Audio (Python, PyTorch, Librosa, Audio Processing, Deep Learning): Trained a convolutional Siamese network with contrastive loss on the STFT representations of audio from a subset of the VoxCeleb dataset on AWS, to uniquely identify a small subset of speakers out of a very large pool of speakers in a text-independent manner. Evaluated the network on precision and recall, achieving 0.78 precision and 0.84 recall on the dataset. Used t-SNE based dimensionality reduction on the learned embeddings as a sanity check for class separation after training, and developed a terminal application for speaker identification and verification.
	Open Domain Information Extraction (Python, NLTK, Stanford CoreNLP, spaCy, XMLRPC, Knowledge Graphs, Ontologies): Extracted subject-predicate-object relations from text data and stored them in a Neo4j knowledge graph, using Stanford CoreNLP and spaCy to generate parse trees and extract relations. Used existing large knowledge graphs such as MS Concept Graph and DBpedia to get a set of the closest possible hypernyms (types) for each entity, and used Word2vec word embeddings with a cosine similarity based SSE function to disambiguate the extracted hypernyms (types). Enabled semantic search on this knowledge graph via Neo4j.
	Flask App for Image Captioning using Deep Learning (Python, Flask, Keras, VGG16, VGG19, ResNet50, LSTM, Flickr8K): Extracted image features from different CNN object detection models and trained a multi-input sequence-to-sequence LSTM model to learn image-to-caption mappings. Compared performance on the image captioning task across the features extracted from the different CNN models. Performed hyperparameter tuning (learning rate, LSTM size, embedding size, dropout). Evaluated the model using BLEU-1 to BLEU-4 scores on the Flickr8K dataset. Built a Flask application to caption images using the trained model.
	Swagger Service for OpenStack (Python, Swagger API, OpenStack, Docker): Used Swagger Codegen and Python to create REST endpoints for OpenStack services (start, stop, delete, create, and list VMs) from Swagger YAML specifications, potentially providing an additional on-demand layer for user privilege and access management. Deployed the application with Docker and tested it on Chameleon Cloud.
	IoT Monitoring Application using MQTT and Raspberry Pi (Python, MQTT, IoT, Raspberry Pi): Used the lightweight MQTT protocol to remotely control a robot car over Wi-Fi to aid monitoring and surveying applications. Used MQTT to stream image frames in real time from the Raspberry Pi onboard camera to aid navigation.

Credits: Jon Barron