These days, I strongly prefer coding in Python over languages I used previously, such as Matlab or R. However, I have always wondered when a data science team should choose one programming language over another for a given task. If all team members know R and Python equally well and need to train a […]
Top 10 mistakes to avoid when using Hive/Impala on Hadoop
I recently took a deep dive into Hadoop for a project where I needed to automate the population of tables from JSON and CSV files. Inevitably, I made some mistakes along the way, and I would like to share the lessons learned in the hope of saving you some time. Here are 10 mistakes to […]