About Me
"I don't fear failure; I fear not trying."
I am an aspiring data scientist, currently working toward a specialization in Data Science from the University of Washington. Building on my background in programming, I apply my Python, R, shell, and UNIX command-line skills to consolidate, clean, and transform data. I perform exploratory data analysis to find patterns, and I apply statistical and computational models to draw inferences from data. My next step is to build automated methods of data analysis using predictive and descriptive machine learning. I am also learning the Hadoop and Spark frameworks to analyze and derive insight from large-scale, heterogeneous data, and I use cloud services such as AWS for scaling and distributed computing.
I have built up enough skills to showcase my Big Data project portfolio. Please have a look and leave any comments. Thank you.
Projects
- Machine Learning
- Twitter Analysis
- Crime Incident Analysis - Seattle
Using the Boston housing price data from the UCI Machine Learning Repository, I applied several machine learning techniques: K-Nearest Neighbors, gradient descent, Naive Bayes, and PCA. I implemented K-Nearest Neighbors and gradient descent from scratch to understand how the algorithms work underneath. Naive Bayes and PCA were implemented using the scikit-learn Python library. Regression is a powerful tool for uncovering patterns in data and making predictions. Follow the Supervised Machine Learning link to see how these algorithms perform on the Boston housing data.
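To give a flavor of what "from scratch" can look like, here is a minimal sketch in plain Python, not the project code itself. The function names (`knn_predict`, `gradient_descent`), the single-feature setup, and the toy numbers are all assumptions made for illustration; the real project works on the full Boston housing features.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a value as the mean target of the k nearest training rows."""
    # Euclidean distance from the query to every training row
    distances = [
        (math.dist(row, query), target)
        for row, target in zip(train_X, train_y)
    ]
    # Keep the k closest rows and average their targets
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


def gradient_descent(xs, ys, lr=0.1, epochs=500):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data, made up for illustration (not the Boston housing data)
rooms = [[4.0], [5.0], [6.0], [7.0], [8.0]]
prices = [14.0, 18.0, 22.0, 26.0, 30.0]

# K-Nearest Neighbors: average the prices of the 3 closest homes
knn_price = knn_predict(rooms, prices, [6.4], k=3)

# Gradient descent: center the feature first so a fixed learning rate converges
mean_rooms = sum(r[0] for r in rooms) / len(rooms)
centered = [r[0] - mean_rooms for r in rooms]
w, b = gradient_descent(centered, prices)
gd_price = w * (6.4 - mean_rooms) + b
```

Centering the feature is a design choice in this sketch: without it, the same learning rate would overshoot on this data, which is one of the things implementing gradient descent by hand makes visible.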
Skills Used: Machine Learning, Python Programming, UNIX, Exploratory Data Analysis
In this project, I used my data analysis and Python programming skills to explore trends in tweets about a reality TV series. I collected live tweets through the Twitter REST APIs by writing a Twitter listener in Python, which takes keywords and a Twitter handle and collects the related tweets. The next steps were data cleansing, transformation, and consolidation, followed by finding patterns in the data. In-depth details on the project and the GitHub code can be found here: Twitter-Analysis
Skills Used: Python Programming, UNIX, MongoDB, AWS, Exploratory Data Analysis, Pattern Matching
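The filter, clean, and count steps described above might look roughly like the following. This is a small sketch under stated assumptions, not the project's listener: it skips the Twitter API entirely and works on already-collected tweet text, and the helper names (`matches`, `clean`, `hashtag_counts`), the handle `@example_show`, and the sample tweets are hypothetical.

```python
import re
from collections import Counter


def matches(tweet_text, keywords, handle):
    """True if the tweet mentions the handle or any keyword (case-insensitive)."""
    text = tweet_text.lower()
    return handle.lower() in text or any(k.lower() in text for k in keywords)


def clean(tweet_text):
    """Strip URLs and collapse runs of whitespace."""
    no_urls = re.sub(r"https?://\S+", "", tweet_text)
    return re.sub(r"\s+", " ", no_urls).strip()


def hashtag_counts(tweets):
    """Count hashtags across tweets, ignoring case."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in re.findall(r"#(\w+)", text))
    return counts


# Made-up sample tweets for illustration
raw = [
    "Loved tonight's episode! #Finale #TeamA https://example.com/x",
    "@example_show   who got voted off?? #finale",
    "Unrelated tweet about lunch",
]

# Keep only tweets related to the show, then clean them
kept = [clean(t) for t in raw if matches(t, ["episode"], "@example_show")]

# Find the most frequent hashtags among the kept tweets
top = hashtag_counts(kept).most_common(2)
```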
This project uses data from the [data.seattle.gov] website on crime incidents reported to the Seattle police. I performed exploratory data analysis in R and plotted several visualizations to spot crime trends in the greater Seattle area. The final visualization shows how crime varies across the hours of the day. Please follow the link Crime Incidents Analysis - Seattle to see the in-depth analysis and crime trends in Seattle.
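The hour-of-day breakdown behind that final visualization can be sketched as a simple count per hour. The project itself was done in R; the snippet below uses Python only for illustration, and the record layout (`offense`, `occurred`), the timestamp format, and the sample rows are assumptions rather than the actual data set's schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records; the real data set's columns and timestamp format may differ
incidents = [
    {"offense": "BURGLARY", "occurred": "2014-06-01 02:15"},
    {"offense": "VEHICLE THEFT", "occurred": "2014-06-01 02:40"},
    {"offense": "ASSAULT", "occurred": "2014-06-01 23:05"},
    {"offense": "BURGLARY", "occurred": "2014-06-02 14:30"},
]


def incidents_by_hour(rows):
    """Return a 24-element list: number of incidents in each hour of the day."""
    counts = Counter(
        datetime.strptime(row["occurred"], "%Y-%m-%d %H:%M").hour
        for row in rows
    )
    return [counts.get(hour, 0) for hour in range(24)]


by_hour = incidents_by_hour(incidents)

# Hour of the day with the most reported incidents in the sample
peak_hour = max(range(24), key=lambda hour: by_hour[hour])
```

A list like `by_hour` is the kind of input a bar chart of incidents per hour would be drawn from.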
Skills Used: R Programming, Exploratory Data Analysis
Support or Contact
Sumeet Sharma
[sumeets@uw.edu]