Arnav Arnav

I am a Data Scientist and a Software Engineer with a Master's in Data Science from Indiana University, Bloomington, and a Bachelor's degree in Computer Science and Engineering from Tezpur University, Assam. I am passionate about data science and have worked on forecasting, machine learning, deep learning, NLP and data analysis projects in my academic and professional career. I also have experience building web and mobile applications and working with various big data technologies. I am excited about advances in applying machine learning models to signal processing and computer vision, and I like to explore new research in graphical models, reinforcement learning and deep learning. I love playing my guitar and going on hikes on my vacations. I speak four languages and would love to learn more.

Email  |  CV  |  LinkedIn  |  Github


Skills
  • Machine Learning Topics: Classification, Regression, Clustering, Recommendation System, Bayesian Inference, Natural Language Understanding, Computer Vision
  • Machine Learning Models: Linear/Logistic Regression, Generalized Linear Models, Decision Trees, Random Forest, SVM, Naive Bayes, Ensemble Methods, K-Means, DBSCAN, Agglomerative & Divisive Clustering, KNN, Bag-of-Words, Collaborative Filtering, Deep Neural Networks (CNN, RNN, LSTM, VAE), Boltzmann Machines
  • Programming Languages: Scala, Java, SQL, Python, R, SAS, C++, MATLAB
  • Libraries and Software: NumPy, Pandas, Scikit-Learn, SciPy, PyTorch, TensorFlow, Keras, NLTK, spaCy, Stanford CoreNLP, PySpark, networkx
  • Data Visualization: Seaborn, Matplotlib, ggplot, Kibana, Altair, plotly, Tableau
  • Web Development: HTML, CSS, JavaScript, CoffeeScript, Bootstrap, Materialize CSS, Django, Ruby on Rails, Ember.js
  • Software Dev Tools: Swagger Codegen, Docker CE, Kubernetes, Travis-CI, VirtualBox, DevStack, MQTT
  • Version Control: Git
  • Databases: PostgreSQL, MySQL, MongoDB, Neo4j
  • Preferred Operating Systems: Linux (Ubuntu, Fedora, CentOS, Mint), Windows
  • Preferred Editors: Vim, GNU Emacs, Jupyter Notebook, VSCode

Recent Projects

Enhancing Predictions on High Variance Time Series Data
Decision Science Internship, The Walt Disney Company, Orlando, August'19 - April'20

With high-variance data, mean forecasts track trends well but can be far from actual behavior for individual items, making them less informative. Implemented a state space approach in Python and SAS that dynamically adjusts regression-based demand forecasts toward actual behavior using recently observed data points. It improved 70% of Disney Cruise Line booking forecasts, with a mean improvement of 35% in absolute error.
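
The adjustment idea can be sketched as a local-level Kalman filter on the forecast error. This is only an illustrative sketch, not the model used in the project: the function name, the random-walk assumption on the bias, the prior, and the variance values are all hypothetical.

```python
def adjust_forecasts(base_forecasts, actuals, process_var=1.0, obs_var=4.0):
    """Adjust base forecasts using a local-level Kalman filter on the error.

    The hidden state is the bias (actual - base forecast). Each adjusted
    forecast is base + the bias estimated from earlier observations only.
    """
    bias, bias_var = 0.0, 1.0  # hypothetical prior on the bias
    adjusted = []
    for base, actual in zip(base_forecasts, actuals):
        # Predict: the bias follows a random walk, so its uncertainty grows.
        bias_var += process_var
        adjusted.append(base + bias)
        # Update: blend the newly observed error into the bias estimate.
        gain = bias_var / (bias_var + obs_var)
        bias += gain * ((actual - base) - bias)
        bias_var *= 1.0 - gain
    return adjusted


# Toy data: the base forecast is consistently 20 units too low.
base = [100.0] * 6
actual = [120.0] * 6
print(adjust_forecasts(base, actual))
```

On this toy series the first adjusted forecast equals the base forecast, and later ones move steadily toward the actuals as errors are observed.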

Deep Gaussian Processes for Representation Learning
Python, Sklearn, GPy, GPFlow

Studied the use of variational inference along with hierarchical graphical models for representation learning, and implemented a hierarchical Gaussian Process Latent Variable Model for supervised and unsupervised tasks. Tested the learned representations on image reconstruction and classification tasks using the oil flow, MNIST handwritten digits and Frey faces datasets.

Speaker Identification and Verification from Audio
Python, PyTorch, Librosa, Audio Processing, Deep Learning

Trained a convolutional Siamese network with contrastive loss on STFT representations of audio from a subset of the VoxCeleb dataset on AWS, to uniquely identify a small set of speakers out of a very large speaker pool in a text-independent manner. The network achieved 0.78 precision and 0.84 recall on the dataset. Used t-SNE dimensionality reduction on the learned embeddings as a sanity check for class separation after training, and developed a terminal application for speaker identification and verification.
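
For reference, the contrastive loss for a single pair of embeddings can be written in a few lines of plain Python. This is the generic textbook form with an assumed margin of 1.0 and made-up embeddings, not the PyTorch training code from the project.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    Same-speaker pairs are pulled together (loss grows with distance);
    different-speaker pairs are pushed apart until they are at least
    `margin` away, after which they contribute no loss.
    """
    d = euclidean(emb_a, emb_b)
    if same_speaker:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings for illustration only.
print(contrastive_loss([0.0, 0.0], [0.3, 0.4], same_speaker=True))
print(contrastive_loss([0.0, 0.0], [0.3, 0.4], same_speaker=False))
```

The margin controls how far apart embeddings of different speakers must be before the pair stops contributing to the loss.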

Open Domain Information Extraction
Python, NLTK, Stanford CoreNLP, spaCy, XMLRPC, Knowledge Graphs, Ontologies

Extracted subject-predicate-object relations from text and stored them in a Neo4j knowledge graph, using Stanford CoreNLP and spaCy to generate parse trees and extract relations. Used existing large knowledge graphs such as MS Concept Graph and DBpedia to get a set of candidate closest hypernyms (types) for each entity, and used Word2vec embeddings with a cosine-similarity-based SSE function to disambiguate the extracted hypernyms. Enabled semantic search on this knowledge graph via Neo4j.
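
The disambiguation step relies on cosine similarity between embeddings. Below is a minimal sketch with made-up 3-dimensional vectors standing in for Word2vec embeddings; the helper names and vectors are hypothetical, and the SSE-based scoring used in the project is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_hypernym(context_vec, candidates):
    """Return the candidate hypernym whose vector is closest to the context."""
    return max(
        candidates,
        key=lambda name: cosine_similarity(context_vec, candidates[name]),
    )


# Toy vectors: candidate types for an ambiguous entity such as "Apple".
candidates = {
    "fruit": [0.9, 0.1, 0.0],
    "company": [0.1, 0.9, 0.2],
}
context = [0.2, 0.8, 0.1]  # stands in for the averaged sentence embedding
print(pick_hypernym(context, candidates))
```

In practice the context vector and candidate vectors would come from a trained embedding model rather than being written by hand.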

Flask App for Image Captioning using Deep Learning
Python, Flask, Keras, VGG16, VGG19, ResNet50, LSTM, Flickr8K

Extracted image features from different CNN object detection models and trained a multi-input sequence-to-sequence LSTM model to learn image-to-caption mappings, comparing captioning performance across the different feature extractors. Tuned hyperparameters (learning rate, LSTM size, embedding size, dropout). Evaluated the model using BLEU-1 to BLEU-4 scores on the Flickr8K dataset. Built a Flask application to caption images using the trained model.

Swagger Service for OpenStack
Python, Swagger API, OpenStack, Docker

Used Swagger Codegen and Python to create REST endpoints for OpenStack services (start, stop, delete, create, and list VMs) from Swagger YAML specifications, with the potential to provide an on-demand additional layer for user privilege and access management. Deployed the application with Docker and tested it on Chameleon Cloud.

IoT Monitoring Application using MQTT and Raspberry Pi
Python, MQTT, IoT, Raspberry Pi

Used the lightweight MQTT protocol to remotely control a robot car over Wi-Fi for monitoring and surveying applications. Used MQTT to stream image frames in real time from the Raspberry Pi onboard camera to aid navigation.


Credits: Jon Barron