Picture this: generating web or phone apps is no longer a daunting task: you can simply describe your desired functionality in plain English and watch as lines of high-quality code are generated before your eyes. The ability to understand, learn and create code using cutting-edge Generative AI (GenAI) coding tools has far-reaching implications, such […]
A Machine Learning Engineer’s Top 5 Predictions for the Future of Generative AI
What is GenAI? Generative AI (GenAI) empowers end-users to generate content, such as images and text, quickly and easily. Entrepreneurs are taking advantage of this technology to create a growing number of startups that utilize GenAI models for various aspects of content creation. In the coming year, we can expect to see a proliferation of […]
Considerations for building a rules engine in Python
I recently looked into how to implement a deterministic rule-based model on batches of data in Python and was surprised by the complexity of the potential solutions I found. I want to implement a set of rules that, when not obeyed, will trigger an alert. It is basically a framework for applying a glorified set of if-else/switch […]
The next coding frontier: comparing Julia, Go & Rust with Python
Currently, Python is the dominant programming language of data science and machine learning and is popular for more general scripting. It’s pretty awesome compared to its predecessors like C/C++ and FORTRAN, thanks to its ease of use, flexibility and readability. Python also has an active and robust library culture after over 30 years of existence. However, […]
Integrating Both Python & R into Data Science Workflows
These days, I strongly prefer coding in Python over other languages that I previously used, like Matlab or R. However, I have always wondered when data science teams should use one programming language over another for certain tasks. If all team members know R and Python equally well and need to train a […]
Top 10 mistakes to avoid when using Hive/Impala on Hadoop
I recently took a deep dive into Hadoop for a project where I needed to automate the population of tables using JSONs and CSVs. Inevitably, I made some mistakes along the way and would like to share the lessons learned. By sharing them, I hope to save you some time! Here are 10 mistakes to […]
Software Engineering as a Data Scientist
Many of us in Data Science come from math, biology, chemistry, engineering or other non-Computer Science backgrounds, which may mean that we don’t have much experience writing and maintaining large code bases. Recently, I found myself getting frustrated with the structure of some of my code and searching for a better way to structure […]
Using Decorators in Python
In Python, decorators allow Data Scientists to extend and modify callables, such as functions, methods and classes, without explicitly changing the callable. Using decorators can improve the readability of your code as well as its flexibility and modularity. In this article, we’ll discuss why we would use decorators, how to implement decorators and give a few […]
Creating Projects from Cookiecutter Templates
Ever wanted to generate a new repo based on a predefined template? Now you can, using Cookiecutter! I will show you how to easily spin up a fresh Cookiecutter repo for your latest data science project in Python. Cookiecutter is an awesome command-line tool and Python package that creates projects (aka populates repo folders) based […]
Best 2021 Resources for Learning about AI/ML
For upskilling on AI/ML, I prefer taking a top-down approach, i.e. starting with high-level concepts and then proceeding to more foundational topics (read: delving more into the theory). I liked taking the breadth-first approach (rather than a depth-first approach) to initially understand AI/ML. Once I had a solid foundation, I easily pivoted to learning […]