Dear Aspiring Data Scientist,
Before you start using ‘low code’ or ‘drag & drop’ data science tools, please learn the fundamentals.
Why aspire to be a ‘Citizen Data Scientist’ when you can truly become a ‘Data Scientist’?
Don’t be swayed by fancy titles like ‘Citizen Data Scientist.’ It is funny how much hard selling is happening in data science.
I mean, just because we know how to use a thermometer or operate a BP machine, should we start calling ourselves ‘Citizen Doctors’?
Image credit: KDnuggets.com
Strategy: downplay the difficulty of doing data science!
Downplaying the difficulty of data science is not healthy. Yet it is the strategy of choice for many sellers of ‘become a data scientist in 1 month’ courses and ‘low code data science solutions.’
Sellers of low code/no-code solutions often argue that one can gain intuition by *doing* things. The counter-argument is that using a low code/no-code solution is like using a calculator: before one can use a calculator meaningfully, one needs numeracy skills. Learning the fundamentals of data science is the equivalent of acquiring those numeracy skills.
Image credit: https://www.sciencenewsforstudents.org/article/animals-can-do-almost-math
Why do 85% of Data Science projects fail? (Hint: no skin in the game)
85% of Data Science projects in the enterprise fail because people think data science is easy and end up doing it wrong. The realization often comes too late.
Many fall for ‘become a data scientist in 1 month / 6 months’ type courses and then wonder why they are not getting hired.
The market is the ultimate truth-teller.
It somehow knows who the good players are and filters remarkably well. The reason is that the market is made up of companies that have ‘skin in the game.’
Companies with ‘skin in the game’ don’t gamble; they hire genuine talent. There is a simple ‘skin in the game’ test you can apply to your own work. Ask yourself: would I use this machine learning classifier myself?
I came across a LinkedIn post in which someone built a heart disease prediction model using one of the low code libraries. The real question is whether that person would use the model on their own kith and kin.
Also, the real value of heart disease or earthquake prediction lies not in saying that an event will happen with x% certainty, but in saying WHEN it will happen.
This ‘temporal’ part is something no model can predict accurately.
Doing Data Science is easy. Or is it?
One reason data science seems *easy to do* is that many algorithms can be fit in 2–3 lines of code. There is simply no intellectual pain.
Compare this to programming. A programmer has to think about syntax, design patterns, and logic. When things go astray, there are multiple checkpoints in the form of syntax errors, compiler errors, and runtime errors. Programmers get an immediate reality check on how good or bad they are. As a result, nobody goes around calling themselves a ‘citizen software engineer.’
On the flip side, data science has no equivalent of a runtime or syntax error. No warning sign tells you that a particular algorithm should not be applied to your data. There is no immediate reality check in data science.
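To make the ‘no runtime error’ point concrete, here is a minimal sketch in plain Python. The helpers fit_line and r_squared are hypothetical, written only for this illustration rather than taken from any library, and the toy data (y equals x squared for x from -5 to 5) is made up. The code fits an ordinary least-squares line in a few lines and runs without a single error or warning, yet the fitted slope is 0 and R² is 0: read naively, the model says x has no bearing on y, even though y is completely determined by x.

```python
from statistics import mean


def fit_line(xs, ys):
    """Hypothetical helper: ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Hypothetical helper: fraction of the variance in ys explained by the line."""
    my = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up toy data: y is fully determined by x, but not linearly.
xs = list(range(-5, 6))
ys = [x ** 2 for x in xs]

# The fit itself is two lines, and nothing complains.
slope, intercept = fit_line(xs, ys)
print(f"slope={slope}, intercept={intercept}, R^2={r_squared(xs, ys, slope, intercept)}")
# prints: slope=0.0, intercept=10.0, R^2=0.0
```

Nothing in the interpreter flags that a straight line was the wrong model for this data. Plotting the data, or knowing that a linear fit on x alone only captures linear relationships, catches the problem at once; that knowledge is exactly the kind of fundamentals this letter argues for.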
This is one reason people who claim that ‘learning the fundamentals is not important’ get away with it, and why fancy but harmful titles like ‘Citizen Data Scientist’ arise.
The criticism above might sound rude or bitter, but it comes from the hope that one day we can all say 85% of Data Science projects succeed rather than fail.
I would also encourage readers to read the articles below:
https://www.kdnuggets.com/2016/03/mirage-citizen-data-scientist.html
https://medium.com/@luis.moreira.matias/zero-stack-data-scientist-part-i-beginnings-1691afa2b510
Your comments and feedback are welcome.