Arnav Arnav
I am a Data Scientist and a Software Engineer with a Master's in Data Science from
Indiana University, Bloomington, and a Bachelor's degree in Computer Science and Engineering
from Tezpur University, Assam. I am passionate about data science and I have
experience working on various forecasting, machine learning, deep learning, NLP and data analysis projects in my academic and professional career.
I also have experience building web and mobile applications and working with various big data technologies.
I am excited about the advances in applications of machine learning models for signal processing and computer vision, and
I like to explore new research in graphical models, reinforcement learning and deep learning. I love playing my guitar and
going on hikes on my vacations. I speak four languages and would love to learn more.
Email  | 
CV  | 
LinkedIn  | 
Github
|
|
Skills
- Machine Learning Topics: Classification, Regression, Clustering, Recommendation System, Bayesian Inference, Natural Language Understanding, Computer Vision
- Machine Learning Models: Linear/Logistic Regression, Generalized Linear Models, Decision Trees, Random Forest, SVM, Naive Bayes, Ensemble Methods, K-Means, DBSCAN, Agglomerative & Divisive Clustering, KNN, Bag-of-Words, Collaborative Filtering, Deep Neural Networks (CNN, RNN, LSTM, VAE), Boltzmann Machines
- Programming Languages: Scala, Java, SQL, Python, R, SAS, C++, MATLAB
- Libraries and Software: NumPy, Pandas, Scikit-Learn, SciPy, Pytorch, TensorFlow, Keras, NLTK, Spacy, Stanford CoreNLP, PySpark, networkx
- Data Visualization: Seaborn, Matplotlib, ggplot, Kibana, Altair, plotly, Tableau
- Web Development: HTML, CSS, JavaScript, CoffeeScript, Bootstrap, Materialize CSS, Django, Ruby on Rails, Ember.js
- Software Dev Tools: Swagger Codegen, Docker CE, Kubernetes, Travis-CI, Virtualbox, Devstack, MQTT
- Version Control: Git
- Databases: PostgreSQL, MySQL, MongoDB, Neo4j
- Preferred Operating Systems: Linux (Ubuntu, Fedora, CentOS, Mint), Windows
- Preferred Editors: Vim, GNU Emacs, Jupyter Notebook, VSCode
|
|
Enhancing Predictions on High Variance Time Series Data
Decision Science Internship, The Walt Disney Company, Orlando, August'19 - April'20
With high variance data, the mean forecasts are good at tracking trends but
can be far from actual behavior for individual items, making them less informative.
Implemented a state space approach, using Python and SAS, to dynamically
adjust regression-based demand forecasts to match actual
behavior based on recently observed data points.
This improved 70% of Disney Cruise Line booking forecasts,
with a mean improvement of 35% in absolute error.
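To illustrate the general idea of a state space adjustment (this is a minimal sketch, not the model used in the project), the snippet below treats the forecast bias as a hidden random-walk state and tracks it with a scalar Kalman filter. The function name `adjust_forecasts` and the variance parameters `process_var` and `obs_var` are illustrative assumptions.

```python
def adjust_forecasts(base_forecasts, actuals, process_var=1.0, obs_var=4.0):
    """Correct base forecasts with a Kalman-filtered estimate of their bias.

    Hidden state: the current bias (actual - base forecast), modeled as a
    random walk. The adjusted forecast at step t uses only observations
    made before step t.
    """
    bias, bias_var = 0.0, 1.0  # prior mean and variance of the bias
    adjusted = []
    for forecast, actual in zip(base_forecasts, actuals):
        # Predict step: a random walk keeps the mean and inflates the variance.
        bias_var += process_var
        adjusted.append(forecast + bias)

        # Update step: blend the newly observed residual into the bias estimate.
        residual = actual - forecast
        gain = bias_var / (bias_var + obs_var)
        bias += gain * (residual - bias)
        bias_var *= 1.0 - gain
    return adjusted


# Toy data: the regression forecast is consistently 20 units too low.
base = [100.0] * 6
observed = [120.0] * 6
print(adjust_forecasts(base, observed))
```

A larger `process_var` relative to `obs_var` makes the adjustment react faster to recent data points, at the cost of following noise more closely.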
|
 |
Deep Gaussian Processes for Representation Learning
Python, Sklearn, GPy, GPFlow
Studied the use of variational inference along with hierarchical graphical models for representation learning, and implemented a hierarchical Gaussian Process Latent Variable model for representation learning in supervised and unsupervised tasks.
Tested the performance of the learned representations on image reconstruction and classification tasks on oil flow, MNIST handwritten characters and Frey faces datasets.
|
 |
Speaker Identification and Verification from Audio
Python, Pytorch, Librosa, Audio Processing, Deep Learning
Trained a convolutional Siamese network with contrastive loss on the STFT representations of audio from a subset of the VoxCeleb dataset on AWS
to uniquely identify a small subset of speakers out of a very large pool of speakers in a text-independent manner.
Evaluated the network on precision and recall, achieving 0.78 precision and 0.84 recall on the dataset.
Used t-SNE based dimensionality reduction on the learned embeddings as a sanity check for class separation after training the model, and
developed a terminal application for speaker identification and verification.
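For reference, one common form of the contrastive loss for a pair of embeddings is sketched below in plain Python. This is only an illustration of the loss itself, not the training code from the project; the function names, the 0.5 scaling and the default `margin` are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    Pairs from the same speaker are pulled together (loss grows with
    distance); pairs from different speakers are pushed apart until they
    are at least `margin` away from each other.
    """
    distance = euclidean(emb_a, emb_b)
    if same_speaker:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


# A different-speaker pair that is already far apart contributes no loss.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_speaker=False))
```

At verification time the same distance can be compared against a threshold to decide whether two clips come from the same speaker.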
|
 |
Open Domain Information Extraction
Python, NLTK, Stanford CoreNLP, Spacy, XMLRPC, Knowledge Graphs, Ontologies
Extracted subject-predicate-object relations from text data and stored them in a Neo4j knowledge graph,
using Stanford CoreNLP and Spacy to generate parse
trees and extract relations.
Used existing large knowledge graphs like MS Concept
Graph and DBpedia to get a set of candidate closest
hypernyms (types) for each of the entities, and
used Word2vec word embeddings and a cosine similarity
based SSE function to disambiguate the extracted hypernyms (types).
Enabled semantic search on this knowledge graph via Neo4j.
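The disambiguation step can be illustrated roughly as follows: given a vector for the context in which an entity appears, pick the candidate hypernym whose embedding is most similar. This is a simplified sketch that uses plain cosine similarity only; the three-dimensional vectors are made-up toy values standing in for Word2vec embeddings, and `pick_hypernym` is a hypothetical helper name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_hypernym(context_vec, candidates):
    """Return the candidate type whose vector is closest to the context."""
    return max(candidates, key=lambda name: cosine(context_vec, candidates[name]))


# Toy stand-ins for embeddings of two candidate types of the entity "Apple".
toy_vectors = {
    "fruit": [0.9, 0.1, 0.0],
    "company": [0.1, 0.9, 0.2],
}
# Toy stand-in for the averaged embedding of the surrounding words.
context = [0.2, 0.8, 0.1]
best = pick_hypernym(context, toy_vectors)
print(best)
```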
|
 |
Flask App for Image Captioning using Deep Learning
Python, Flask, Keras, VGG16, VGG19, ResNet50, LSTM, Flickr8K
Extracted image features from different CNN object
detection models, and trained a multi-input sequence-to-sequence LSTM model
to learn image-to-caption mappings.
Trained the model with image features extracted from
different object detection CNN models and compared performance on the
image captioning task.
Performed hyperparameter tuning
(learning rate, LSTM size, embedding size,
dropout).
Evaluated performance of the model using BLEU-1 to
BLEU-4 scores on the Flickr8K dataset.
Built a Flask application to caption images
using the trained model.
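As a reminder of what the BLEU-1 to BLEU-4 scores measure, here is a minimal, unsmoothed sentence-level BLEU against a single reference caption, written in plain Python. It is a sketch for illustration only, not the evaluation code used in the project; the function names and the example captions are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n):
    """Unsmoothed BLEU-max_n of one candidate against one reference.

    Geometric mean of clipped n-gram precisions for n = 1..max_n,
    multiplied by a brevity penalty for candidates shorter than the
    reference. Returns 0.0 if any n-gram order has no match.
    """
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(sum(log_precisions) / max_n)


generated = "a dog runs on the grass".split()
truth = "a dog runs across the grass".split()
for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(generated, truth, n):.3f}")
```

Real caption evaluation usually averages over a corpus with several reference captions per image and applies smoothing, which this sketch leaves out.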
|
 |
Swagger Service for OpenStack
Python, Swagger API, OpenStack, Docker
Used Swagger Codegen and Python to create REST endpoints
for OpenStack services (start, stop, delete, create, and
list VMs) from Swagger YAML specifications, potentially providing an
on-demand additional layer for
user privilege and access management.
Deployed the application with Docker and tested it on
Chameleon Cloud.
|
 |
IoT monitoring application using MQTT and Raspberry Pi
Python, MQTT, IoT, Raspberry Pi
Used the lightweight MQTT protocol to remotely control a robot car over Wi-Fi to
aid monitoring and surveying applications.
Used MQTT to stream image frames in real time from the
Raspberry Pi onboard camera to aid navigation.
|
|