Work Journal
Learning journey and work stuff. Updated weekly.
Week of June 7 - June 13, 2021
💻 Work
- I did some more writing on my master's thesis.
📚 Learnings
- I have implemented Linear and Logistic Regression algorithms from scratch. Link
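The linked implementation isn't reproduced here. As a rough illustration of what "from scratch" can mean, the sketch below trains both models with batch gradient descent in plain Python. It is a hypothetical version, not the author's code: `fit`, `linear_score`, `lr` and `epochs` are illustrative names, and gradient descent is one assumed training method among several (linear regression also has a closed-form least-squares solution).

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(weights, bias, x):
    """Weighted sum w . x + b for one sample x."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def fit(X, y, logistic=False, lr=0.1, epochs=2000):
    """Hypothetical batch gradient descent for linear (half-MSE) or logistic (cross-entropy) regression."""
    m, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            z = linear_score(weights, bias, xi)
            # Prediction is the raw score for linear regression, a probability for logistic.
            error = (sigmoid(z) if logistic else z) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        # Both losses share the gradient form mean((prediction - y) * x).
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


if __name__ == "__main__":
    # Points on the line y = 2x + 1; the fit should recover slope 2 and intercept 1.
    w, b = fit([[0], [1], [2], [3]], [1, 3, 5, 7])
    print(w, b)
    # A tiny separable binary problem for the logistic variant.
    w, b = fit([[-2], [-1], [1], [2]], [0, 0, 1, 1], logistic=True)
    print(sigmoid(linear_score(w, b, [1.5])))  # estimated probability of class 1
```

Sharing one `fit` function works because the gradient of half the mean squared error (identity output) and of the mean cross-entropy (sigmoid output) both reduce to the average of (prediction - target) times the input.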
💡 Interesting stuff
- Combining rule engines and machine learning. Link
- 6 Secrets of Early Data Science Career Success. Link
Week of May 31 - June 6, 2021
💻 Work
- I published a blog post covering the k-Nearest Neighbours algorithm.
💡 Interesting stuff
- Are Data Scientists Really Being Automated? Link
Week of May 24 - May 30, 2021
💡 Interesting stuff
- Analytics is a mess. Link
- Good Data Scientist, Bad Data Scientist. Link
- DataTalks.Club Podcast: How to Market Yourself (without Being a Celebrity). Link
- skorch: A scikit-learn compatible neural network library that wraps PyTorch. Link
Week of May 17 - May 23, 2021
💻 Work
- I did some more writing on my master's thesis.
- I created a new notebook on Kaggle about Credit Card Fraud Detection. Link
- I published a blog post about Imbalanced Classification.
💡 Interesting stuff
- Cheat Sheets for Machine Learning and Data Science. Link
- Collection of resources for Data Science interview preparation. Link
- Data science interview questions and answers. Link
Week of May 10 - May 16, 2021
💻 Work
- I did some more writing on my master's thesis.
- I published a blog post about building a Second Brain.
💡 Interesting stuff
- DataTalks.Club Podcast: What I Learned After Interviewing 300 Data Scientists. Link
Week of May 3 - May 9, 2021
💻 Work
- I created this work journal page! I realized it is an excellent complement to my #66DaysofData tweets. Link
- I did some writing on my master's thesis to keep up with the schedule.
- I published a blog post about the Titanic dataset.
📚 Learnings
- I have been learning about different methods of comparing datasets. A few methods are commonly used, but no single one fits every kind of data distribution. In my case, I have two groups of non-normally distributed data with continuous numerical features. After some research, I selected the Wilcoxon and Mann-Whitney tests, which are non-parametric, rank-based tests that do not assume normality (the Wilcoxon signed-rank test applies to paired samples, the Mann-Whitney U test to independent ones). I will also use Spearman's rank correlation coefficient and relative entropy (Kullback-Leibler divergence).
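To make the rank-based idea concrete, here is a minimal standard-library sketch of three of the measures mentioned above. It is an illustration under stated assumptions, not the analysis used in the thesis: the Mann-Whitney p-value uses the normal approximation without tie correction, Spearman's coefficient assumes paired observations of equal length, and relative entropy is estimated from histograms whose `bins` and `eps` smoothing values are arbitrary choices. The Wilcoxon signed-rank test is left out for brevity. In practice, `scipy.stats` provides `mannwhitneyu`, `wilcoxon`, `spearmanr` and `entropy` (which returns relative entropy when given a second distribution).

```python
import math
import random
from statistics import NormalDist


def rankdata(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = rankdata(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p_value


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation computed on the ranks of paired data."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def kl_divergence(x, y, bins=10, eps=1e-9):
    """Relative entropy D_KL(P || Q) between histograms of x and y over shared bins."""
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are identical

    def histogram(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        # eps smoothing keeps empty bins from producing log(0) or division by zero.
        total = len(values) + bins * eps
        return [(c + eps) / total for c in counts]

    p, q = histogram(x), histogram(y)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    # Two synthetic, skewed (non-normal) groups; the exponential rates are arbitrary.
    rng = random.Random(42)
    group_a = [rng.expovariate(1.0) for _ in range(200)]
    group_b = [rng.expovariate(0.8) for _ in range(200)]
    u, p = mann_whitney_u(group_a, group_b)
    print(f"Mann-Whitney U = {u:.1f}, p = {p:.4f}")
    print(f"KL divergence (A || B) = {kl_divergence(group_a, group_b):.4f}")
```

Note that relative entropy is asymmetric, so `kl_divergence(a, b)` and `kl_divergence(b, a)` generally differ; the histogram estimate is also sensitive to the number of bins.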
💡 Interesting stuff
- Kaggle released a new feature that allows sharing of notebooks. Link
- Emerging Architectures for Modern Data Infrastructure. Link
- Repository about Data Science in Production. Link