Data science is a very popular term these days, and it gets applied to so many things that its meaning has become very vague. So I would like to start this lecture by giving you the definition that I use. I have found that this one gets right to the heart of what sets it apart from other disciplines. Here goes: data science means doing analytics work that, for one reason or another, requires a substantial amount of software engineering skills.
Sometimes the final deliverable is the kind of thing a statistician or business analyst might provide, but achieving that goal demands software skills that your typical analyst simply doesn't have. For example, a dataset might be so large that you need to use distributed computing to analyze it, or so convoluted in its format that many lines of code are required to parse it. In many cases, data scientists also have to write big chunks of production software that implement their analytics ideas in real time.
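To make "many lines of code are required to parse it" a little more concrete, here is a minimal sketch. It assumes the raw data is a web server log in the Apache Common Log Format; the format, the sample line, and the helper name `parse_line` are illustrative choices for this example, not something from the lecture. It turns each raw text line into typed fields and skips lines that don't match.

```python
import re
from datetime import datetime

# Regex for the Apache Common Log Format (an assumption for this sketch):
# host ident authuser [timestamp] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Hypothetical helper: turn one raw log line into a dict of typed fields, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed line: skip it instead of crashing
    fields = match.groupdict()
    # Convert the bracketed timestamp into a timezone-aware datetime.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # In this format "-" means no bytes were sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


raw_lines = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    'this line is garbage',
]
parsed = (parse_line(line) for line in raw_lines)
records = [record for record in parsed if record is not None]
print(records[0]["path"], records[0]["status"], records[0]["size"])
```

Skipping the malformed line rather than raising an error is a deliberate design choice: raw logs routinely contain lines that don't match the expected format.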
In practice, there are usually other differences as well.
For example, data scientists usually have to extract features from raw data, which means that they tackle very open-ended problems such as how to quantify the "spamminess" of an email.
Aren't data scientists just overpaid statisticians? Nate Silver, a statistician famous for accurate forecasting of US elections, once famously said, "I think data scientist is a sexed-up term for a statistician." He has a point, but what he said is only partly true. The discipline of statistics deals mostly with rigorous mathematical methods for solving well-defined problems. Data scientists spend most of their time getting data into a form where statistical methods can even be applied.
This involves making sure that the analytics problem is a good match to business objectives, extracting meaningful features from the raw data, and coping with any pathologies of the data or weird edge cases. Once that heavy lifting is done, you can apply statistical tools to get the final results, although in practice you often don't even need them. Professional statisticians need to do a certain amount of preprocessing themselves, but there is a massive difference in degree.
Historically, data science emerged as a field independently from statistics.
Most of the first data scientists were computer programmers or machine learning experts who were working on big data problems. They were analyzing datasets of the kind that statisticians don't touch: HTML pages, image files, emails, raw output logs of web servers, and so on. These datasets don't fit the mold of relational databases or statistical tools, so for decades they were just piling up without being analyzed. Data science came into being as a way to finally milk them for insights.
Why is it all in Python, anyway? The example code in this lecture is all in Python, except for a few domain-specific languages such as SQL.
My goal isn't to push you to use Python. There are lots of good tools out there, and you can use whichever ones you want. However, I wanted to use one language for all of my examples.
This keeps the lecture readable, and it also lets you follow the whole lecture while only knowing one language. Of the various languages available, there are two reasons why I chose Python. Number one: Python is the most popular language for data scientists. R is its only major competitor, at least when it comes to free tools.
I have used both extensively, and I think that Python is flat-out better.
Number two: I like to say that for any task, Python is the second-best language.
It's a jack-of-all-trades. If you only need to worry about, say, statistics or numerical computation or web parsing, then there are better options out there. But if you need to do all of these things within a single project, then Python is your best option.
Since data science is so inherently multidisciplinary, this makes it a perfect fit.
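As a small illustration of the "all of these things within a single project" point, here is a sketch of one short Python script that does a bit of web parsing, some numerical feature computation, and some statistics, applied to a toy version of the email "spamminess" problem. The HTML snippets, the feature definitions, and the word list `SPAMMY_WORDS` are hypothetical choices for this example and are not a real spam filter; only the standard library (`html.parser`, `re`, `statistics`) is used.

```python
import re
import statistics
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Web parsing: collect the text content of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join(parser.chunks)


# Hypothetical toy word list, not a real spam lexicon.
SPAMMY_WORDS = {"free", "winner", "urgent", "cash"}


def spamminess_features(text):
    """Numerical computation: toy features that might correlate with spam."""
    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    words = re.findall(r"[a-z]+", text.lower())
    return {
        "upper_ratio": upper_ratio,
        "exclamations": text.count("!"),
        "spammy_words": sum(word in SPAMMY_WORDS for word in words),
    }


emails = [
    "<html><body><p>Hi team, the meeting moved to 3pm.</p></body></html>",
    "<html><body><h1>WINNER!!!</h1><p>Claim your FREE cash now!</p></body></html>",
    "<p>Lunch tomorrow?</p>",
]

features = [spamminess_features(html_to_text(email)) for email in emails]
exclamations = [f["exclamations"] for f in features]

# Statistics: summarize one feature across all messages.
print(statistics.mean(exclamations), statistics.stdev(exclamations))
```

In a real project you would probably swap in specialized third-party libraries for each step; the point of the sketch is only that all three kinds of work fit comfortably in one language.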
Example code and datasets. This lecture is rich in example code, in fairly long chunks.
This was done for two reasons. Number one: as a data scientist, you need to be able to read longish pieces of code. This is a non-optional skill, and if you aren't used to it, this will give you a chance to practice. Number two: I wanted to make it easier for you to work with the code from this lecture, if you feel so inclined.
Thank you.