August 27, 2018

Correlation And Causation By Example


What should you know, quickly?

Given observations of two features A and B, we say they are correlated when we see a pattern in which their values change together. When the values of A and B increase or decrease together, they are positively correlated. When the value of B decreases as the value of A increases, and vice versa, they are negatively correlated.

Correlation is something we can often identify visually by plotting the values of the features on a graph and comparing their trends for patterns. Correlation alone, however, does not tell us what is causing what. In other words, you cannot claim that one feature is causing the other.

When a change in one feature results in a change in the other, we call it causation. When we find correlated features, we dig around the domain, doing more research and homework to build up our domain knowledge, before we can claim that one feature is causing the other.
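To make the distinction concrete, here is a minimal sketch in plain Python. The `pearson` helper and the numbers are made up for illustration and are not taken from the original post. Features `a` and `b` are both computed from a hidden driver `c`, so they come out strongly correlated even though neither one causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: c is a hidden driver that we pretend not to observe.
c = [1, 2, 3, 4, 5]
a = [2 * v + 1 for v in c]    # A depends only on c
b = [3 * v - 2 for v in c]    # B depends only on c, not on A
b_down = [10 - v for v in a]  # a feature that falls as A rises

r_pos = pearson(a, b)        # close to +1: positively correlated
r_neg = pearson(a, b_down)   # close to -1: negatively correlated
print("A vs B:", round(r_pos, 3))
print("A vs B_down:", round(r_neg, 3))
```

A coefficient near +1 or -1 only says that the two features move together. In this sketch the real cause of both is `c`, which is exactly the kind of fact that domain knowledge, and not the correlation itself, has to supply.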

August 21, 2018

Using PySpark 2 to read CSV having HTML source code

When a CSV file has a field containing the HTML source code of a web page, reading it becomes a real pain, and even more so with PySpark in a Jupyter Notebook.

Advice from a Patient to a Doctor

What should a Doctor understand?

The source of this photograph is unknown. But I guess this is the best piece of advice that a Doctor can get, and one that every Doctor should read at least once every day until it becomes second nature in their practice.

August 15, 2018

10 secrets to becoming an Engineering Leader — quickly!

1. Wear a blazer and torn jeans. You can boost it further by wearing spectacles. All this with a big fat belly gives that desired first impression.
2. Add your certifications as suffixes to your name. Don’t be shy about it. You have, after all, p̶u̶r̶c̶h̶a̶s̶e̶d̶ earned them.

August 9, 2018

Quick Lessons in Data Science practice

  1. Basic statistics and mathematics are your compass in the field of Data Science. You must have them to have a sense of direction.
  2. You are better off choosing one of R, Python, Matlab or any other platform and mastering it than attempting to get your hands dirty with all of them.
  3. Right from the start, version-control your work with a tool like Git, hosted on a service such as GitHub or Bitbucket. It can save you from a lot of headaches.
  4. Don’t underestimate the importance of understanding the problem domain. Domain knowledge is the secret weapon for data manipulation.
  5. Don’t underestimate data cleansing. It pays to cleanse your data.