Work Journal

Learning journey and work stuff. Updated weekly.

Week of June 7 - June 13, 2021

💻 Work

  • I did some more writing on my master's thesis.

📚 Learnings

  • I have implemented Linear and Logistic Regression algorithms from scratch. Link

💡 Interesting stuff

  • Combining rule engines and machine learning. Link
  • 6 Secrets of Early Data Science Career Success. Link

Week of May 31 - June 6, 2021

💻 Work

  • I published a blog post covering the k-Nearest Neighbours algorithm.

💡 Interesting stuff

  • Are Data Scientists Really Being Automated? Link

Week of May 24 - May 30, 2021

💡 Interesting stuff

  • Analytics is a mess. Link
  • Good Data Scientist, Bad Data Scientist. Link
  • DataTalks.Club Podcast: How to Market Yourself (without Being a Celebrity). Link
  • skorch: A scikit-learn compatible neural network library that wraps PyTorch. Link

Week of May 17 - May 23, 2021

💻 Work

  • I did some more writing on my master's thesis.
  • I created a new notebook on Kaggle about Credit Card Fraud Detection. Link
  • I published a blog post about Imbalanced Classification.

💡 Interesting stuff

  • Cheat Sheets for Machine Learning and Data Science. Link
  • Collection of resources for Data Science interview preparation. Link
  • Data science interview questions and answers. Link

Week of May 10 - May 16, 2021

💻 Work

  • I did some more writing on my master's thesis.
  • I published a blog post about building a Second Brain.

💡 Interesting stuff

  • DataTalks.Club Podcast: What I Learned After Interviewing 300 Data Scientists. Link

Week of May 3 - May 9, 2021

💻 Work

  • Created this work journal page! I realized it is an excellent complement to my #66DaysofData tweets. Link
  • I did some writing on my master's thesis to keep up with the schedule.
  • I published a blog post about the Titanic dataset.

📚 Learnings

  • I have been learning about different methods for comparing datasets. A few are more commonly used than others, but no single method fits every kind of data distribution. In my case, I have two groups of non-normally distributed data with continuous numerical features. After some research, I selected the Wilcoxon and Mann-Whitney tests, and I will also use Spearman's rank correlation coefficient and relative entropy.
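
A rough sketch of three of these measures in plain Python, to make the idea concrete. It is not the code used for the thesis: the function names, the equal-width binning, and the smoothing constant `eps` are all assumptions made for illustration. It computes only the statistics (no p-values) and leaves out the Wilcoxon signed-rank test, which needs paired samples.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistic of sample a against sample b (independent samples)."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = math.sqrt(sum((p - mx) ** 2 for p in rx))
    sy = math.sqrt(sum((q - my) ** 2 for q in ry))
    return cov / (sx * sy)


def relative_entropy(a, b, bins=5, eps=1e-9):
    """Relative entropy D(a || b) in nats over a shared histogram.

    Assumption: continuous values are discretised into equal-width bins,
    and eps is added to every bin so that empty bins do not divide by zero.
    """
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values match

    def hist(sample):
        counts = Counter(min(int((v - lo) / width), bins - 1) for v in sample)
        return [counts.get(i, 0) / len(sample) + eps for i in range(bins)]

    p, q = hist(a), hist(b)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Calling `mann_whitney_u(group_a, group_b)` or `relative_entropy(group_a, group_b)` on two lists of numbers returns a single statistic; `spearman_rho` expects two lists of equal length. For real analysis, a statistics library that also reports p-values is the safer choice.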

💡 Interesting stuff

  • Kaggle released a new feature that allows sharing of notebooks. Link
  • Emerging Architectures for Modern Data Infrastructure. Link
  • Repository about Data Science in Production. Link