I recently looked into how to implement a deterministic rule-based model on batches of data in Python and was surprised by the complexity of the potential solutions I found. I want to implement a set of rules that, when not obeyed, will trigger an alert. It is basically a framework for applying a glorified set of if-else/switch […]
The next coding frontier: comparing Julia, Go & Rust with Python
Currently, Python is the dominant programming language of data science and machine learning and is popular for more general scripting. It’s pretty awesome compared to its predecessors like C/C++ and FORTRAN due to its ease of use, flexibility and readability. Python also has an active and robust library culture after over 30 years of existence. However, […]
Integrating Both Python & R into Data Science Workflows
These days, I strongly prefer coding in Python over other languages that I previously used, like Matlab or R. However, I have always wondered when data science teams should use one programming language over another for certain tasks. If all team members know R and Python equally well and need to train a […]
Top 10 mistakes to avoid when using Hive/Impala on Hadoop
I recently took a deep dive into Hadoop for a project where I needed to automate the population of tables using JSONs and CSVs. Inevitably, I made some mistakes along the way and would like to share the lessons learned. By sharing them, I hope to save you some time! Here are 10 mistakes to […]
Software Engineering as a Data Scientist
Many of us in Data Science come from math, biology, chemistry, engineering or other non-Computer Science backgrounds, which may mean that we don’t have much experience writing and maintaining large code bases. Recently, I found myself getting frustrated with the structure of some of my code and searching for a better way to structure […]
Using Decorators in Python
In Python, decorators allow Data Scientists to extend and modify callables, such as functions, methods and classes, without explicitly changing the callable. Using decorators can improve the readability of your code as well as its flexibility and modularity. In this article, we’ll discuss why we would use decorators, how to implement decorators and give a few […]
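As a rough illustration of the idea in this teaser (not taken from the article itself), here is a minimal sketch of a decorator that extends a function without editing its body. The names `log_calls` and `mean` are hypothetical and chosen only for this example.

```python
import functools


def log_calls(func):
    """Hypothetical decorator: record every call made to `func`."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        # Extra behaviour added around the original callable.
        wrapper.calls.append((func.__name__, args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = []  # call history stored on the wrapper itself
    return wrapper


@log_calls
def mean(values):
    """Arithmetic mean of a non-empty sequence (illustrative function)."""
    return sum(values) / len(values)


result = mean([1, 2, 3])
```

The body of `mean` is unchanged, yet each call is now recorded in `mean.calls`; removing the `@log_calls` line restores the original behaviour.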
Binder & Repl.it
I recently discovered two great tools for easily creating interactive coding environments without installing a thing. These tools facilitate sharing of code in multiple languages and are wonderful resources for demonstrating programming concepts when teaching a course. Binder The first tool, Binder, is open-source and was released in 2017. It is awesome because […]
Creating Projects from Cookiecutter Templates
Ever want to generate a new repo based on a predefined template? Now you can, using Cookiecutter! I will show you how to easily spin up a fresh Cookiecutter repo for your latest data science project in Python. Cookiecutter is an awesome command-line tool and Python package that creates projects (aka populates repo folders) based […]
Best 2021 Resources for Learning about AI/ML
For upskilling on AI/ML, I prefer taking a top-down approach, i.e. starting with high-level concepts then proceeding to more foundational topics (read: delving more into the theory). I liked taking the breadth-first approach (rather than a depth-first approach) to initially understand AI/ML. Once I had a solid foundation, I easily pivoted to learning […]
PyCon 2020
Hey Folks! I finally got around to watching a bunch of the talks and found several of them useful for improving my Python coding skills in general and/or in the context of doing Data Science. Here are some interesting talks from PyCon 2020: Beautiful Python Refactoring video. The talk was simple but powerful in demonstrating […]