Data Science and Analytics Strategy
Kailash Awati, 2023
Interpretability in Deep Learning
Ayush Somani, 2023
Cheatsheet on NumPy and pandas for easy viewing 👀
Amazing Hackathon Solved Data Science/ML Project Collection
⭐️ 167
https://github.com/analyticsindiamagazine/MachineHack/tree/master/Hackathon_Solutions
𝗘𝗡𝗝𝗢𝗬 𝗟𝗘𝗔𝗥𝗡𝗜𝗡𝗚 👍👍
Mastering Ubuntu Server
Jay LaCroix, 2022
Analytic SQL in SQL Server 2014/2016
Riadh Ghlala, 2019
🤓 Technical Python concepts tested in data science job interviews:
- Data types.
- Built-in data structures.
- User-defined data structures.
- Built-in functions.
- Loops and conditionals.
- External libraries (Pandas).
Source Article: https://www.kdnuggets.com/2021/07/top-python-data-science-interview-questions.html
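A quick, hedged illustration of a few of these concepts (the names and values below are invented for demonstration; pandas is assumed to be installed):

import pandas as pd

# Built-in data structures: list and dict
scores = [72, 88, 95, 61]
student = {"name": "Asha", "scores": scores}

# Loops and conditionals: keep only passing scores
passing = [s for s in student["scores"] if s >= 70]

# External libraries (pandas): the same filter on a DataFrame
df = pd.DataFrame({"name": ["Asha", "Ben", "Chen", "Dia"], "score": scores})
passing_df = df[df["score"] >= 70]

print(passing)      # [72, 88, 95]
print(passing_df)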
What are decision trees?
This is a type of supervised learning algorithm that is mostly used for classification problems; it works for both categorical and continuous dependent variables.
In this algorithm, we split the population into two or more homogeneous sets. This is done based on the most significant attributes/independent variables, to make the groups as distinct as possible.
A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a value for the target variable.
The splits can be chosen with various techniques, such as Gini impurity, information gain (entropy), and chi-square.
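As a rough sketch of how this looks in practice, here is a minimal decision tree fit with scikit-learn on its bundled iris dataset (scikit-learn is an assumption here, not something the post specifies):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# criterion="gini" or "entropy" picks the splitting technique mentioned above
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))  # accuracy on the held-out split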
Quiz Explanation
Supervised Learning: All data is labeled and the algorithms learn to predict the output from the input data.
Unsupervised Learning: All data is unlabeled and the algorithms learn the inherent structure from the input data.
Semi-supervised Learning: Some data is labeled but most of it is unlabeled, and a mixture of supervised and unsupervised techniques can be used to solve the problem.
Unsupervised learning problems can be further grouped into clustering and association problems.
Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy A also tend to buy B.
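For instance, the customer-grouping example could be sketched with k-means from scikit-learn; the spending figures below are purely made up for illustration:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical purchasing behavior: [monthly_spend, visits_per_month]
customers = np.array([[500, 12], [480, 10], [60, 2], [75, 3], [300, 6], [320, 7]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)
print(labels)  # cluster assignment for each customer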
🎓 Introduction to Deep Learning (by MIT) 🎓
This is one of the best free courses for learning the foundations of deep learning.
All lectures have been uploaded. 100% Free!
https://youtube.com/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI
What does L2 regularization look like in a linear model?
L2 regularization adds a penalty term to our cost function equal to the sum of the squared model coefficients, multiplied by a lambda hyperparameter.
This technique pushes the coefficients toward zero and is widely used when we have many features that may be correlated with each other.
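A minimal sketch with scikit-learn's Ridge on synthetic data (alpha plays the role of the lambda hyperparameter; the data and coefficient values are invented):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 4] = X[:, 3] + 0.01 * rng.normal(size=100)            # two strongly correlated features
y = X @ np.array([1.0, 2.0, 0.0, 3.0, 3.0]) + rng.normal(size=100)

print(LinearRegression().fit(X, y).coef_)  # coefficients on the correlated pair tend to be unstable
print(Ridge(alpha=10.0).fit(X, y).coef_)   # shrunk, more stable coefficients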
Machine Learning for Everyone, in simple words
https://vas3k.com/blog/machine_learning/
What’s the difference between random forest and gradient boosting?
Random Forests build each tree independently, while Gradient Boosting builds one tree at a time, with each new tree correcting the errors of the ones before it.
Random Forests combine results at the end of the process (by averaging or majority vote), while Gradient Boosting combines results along the way.
What is the area under the PR curve? Is it a useful metric?
The Precision-Recall AUC is just like the ROC AUC in that it summarizes the curve across a range of threshold values as a single score.
A high area under the curve represents both high recall and high precision, where high precision relates to a low false positive rate, and high recall relates to a low false negative rate.
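A small sketch of computing it with scikit-learn (the labels and scores below are made up):

from sklearn.metrics import auc, average_precision_score, precision_recall_curve

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5]

precision, recall, _ = precision_recall_curve(y_true, y_scores)
print(auc(recall, precision))                      # area under the PR curve
print(average_precision_score(y_true, y_scores))   # a closely related summary score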
Harvard University offers a ton of FREE online courses.
From Computer Science to Artificial Intelligence.
Here are 10 FREE courses you don't want to miss
1. Introduction to Computer Science
An introduction to the intellectual enterprises of computer science and the art of programming.
Check here 👇
https://pll.harvard.edu/course/cs50-introduction-computer-science?delta=0
2. Web Programming with Python and JavaScript
This course takes you deeply into the design and implementation of web apps with Python, JavaScript, and SQL using frameworks like Django, React, and Bootstrap.
Check here 👇
https://pll.harvard.edu/course/cs50s-web-programming-python-and-javascript?delta=0
3. Introduction to Programming with Scratch
A gentle introduction to programming that prepares you for subsequent courses in coding.
Check here 👇
https://pll.harvard.edu/course/cs50s-introduction-programming-scratch?delta=0
4. Introduction to Programming with Python
An introduction to programming using Python, a popular language for general-purpose programming, data science, web programming, and more.
Check here 👇
https://edx.org/course/cs50s-introduction-to-programming-with-python
5. Understanding Technology
This is CS50’s introduction to technology for students who don’t (yet!) consider themselves computer persons.
Check here 👇
https://pll.harvard.edu/course/cs50s-understanding-technology-0?delta=0
6. Introduction to Artificial Intelligence with Python
Learn to use machine learning in Python in this introductory course on artificial intelligence.
Check here 👇
https://pll.harvard.edu/course/cs50s-introduction-artificial-intelligence-python?delta=0
7. Introduction to Game Development
Learn about the development of 2D and 3D interactive games in this hands-on course, as you explore the design of games such as Super Mario Bros., Pokémon, Angry Birds, and more.
Check here 👇
https://pll.harvard.edu/course/cs50s-introduction-game-development?delta=0
8. CS50's Computer Science for Business Professionals
This is CS50’s introduction to computer science for business professionals.
Check here 👇
https://pll.harvard.edu/course/cs50s-computer-science-business-professionals-0?delta=0
9. Mobile App Development with React Native
Learn about mobile app development with React Native, a popular framework maintained by Facebook that enables cross-platform native apps using JavaScript without Java or Swift.
Check here 👇
https://pll.harvard.edu/course/cs50s-mobile-app-development-react-native?delta=0
10. Introduction to Data Science with Python
Join Harvard University instructor Pavlos Protopapas in this online course to learn how to use Python to harness and analyze data.
Check here 👇
https://pll.harvard.edu/course/introduction-data-science-python?delta=0
Python and R for the Modern Data Scientist
Rick Scavetta, 2021
FREE DATASETS FOR BUILDING YOUR PORTFOLIO ⭐
1. Supermarket Sales - https://lnkd.in/e86UpCMv
2. Credit Card Fraud Detection - https://lnkd.in/eFTsZDCW
3. FIFA 22 complete player dataset - https://lnkd.in/eDScdUUM
4. Walmart Store Sales Forecasting - https://lnkd.in/eVT6h-CT
5. Netflix Movies and TV Shows - https://lnkd.in/eZ3cduwK
6. LinkedIn Data Analyst jobs listings - https://lnkd.in/ezqxcmrE
7. Top 50 Fast-Food Chains in USA - https://lnkd.in/esBjf5u4
8. Amazon and Best Buy Electronics - https://lnkd.in/e4fBZvJ3
9. Forecasting Book Sales - https://lnkd.in/eXHN2XsQ
10. Real / Fake Job Posting Prediction - https://lnkd.in/e5SDDW9G
1. What is the difference between a shallow copy and a deep copy in Python?
A deep copy creates a new object and recursively copies the child objects of the original object, so changes to the original (including its nested objects) are not reflected in the copy; copy.deepcopy() creates a deep copy. A shallow copy creates a new object but populates it with references to the child objects of the original object, so changes to mutable child objects in the original are reflected in the copy; copy.copy() creates a shallow copy.
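A tiny demonstration using the standard library copy module:

import copy

original = {"nums": [1, 2, 3]}

shallow = copy.copy(original)    # copies the dict, shares the inner list
deep = copy.deepcopy(original)   # recursively copies the inner list too

original["nums"].append(4)

print(shallow["nums"])  # [1, 2, 3, 4] - the change is visible through the shared reference
print(deep["nums"])     # [1, 2, 3]    - the deep copy is unaffected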
2. How can you remove duplicate values in a range of cells (in Excel)?
1. First highlight the duplicates using Conditional Formatting in the Home tab, then select the highlighted cells and press the Delete key. After deleting the values, go back to the ‘Conditional Formatting’ option in the Home tab and choose ‘Clear Rules’ to remove the highlighting rules from the sheet.
2. You can also delete duplicate values by selecting the ‘Remove Duplicates’ option under Data Tools present in the Data tab.
3. Define shelves and sets in Tableau.
Shelves: Every worksheet in Tableau will have shelves such as columns, rows, marks, filters, pages, and more. By placing filters on shelves we can build our own visualization structure. We can control the marks by including or excluding data.
Sets: Sets are used to define a condition on which the data will be grouped; records meeting the condition are grouped together. Fields responsible for the grouping are known as sets. For example – students having grades of more than 70%.
4. Given a table Employee having columns empName and empId, what will be the result of the SQL query below?
select empName from Employee order by 2 asc;
“ORDER BY 2” is valid only when at least 2 columns are used in the SELECT statement, because it sorts by the second column of the SELECT list. Here the query will throw an error because only one column is used in the SELECT statement. For example, select empId, empName from Employee order by 2 asc; would be valid and would sort the result by empName.
ENJOY LEARNING 👍👍
Adventures of a Computational Explorer
Stephen Wolfram, 2019
Some interview questions related to data science
1. What is the difference between structured data and unstructured data?
2. What is multicollinearity, and how do you remove it?
3. Which algorithms do you use to find the most correlated features in a dataset?
4. Define entropy.
5. What is the workflow of principal component analysis?
6. What are the applications of principal component analysis other than dimensionality reduction?
7. What is a convolutional neural network? Explain how it works.
What are the benefits of a single decision tree compared to more complex models?
easy to implement
fast training
fast inference
good explainability
What is feature selection? Why do we need it?
Feature selection is a method used to select the relevant features for the model to train on. We need feature selection to remove irrelevant features, which can cause the model to underperform.
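One common approach (among many) is univariate selection with scikit-learn's SelectKBest; a rough sketch on the bundled iris data:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the strongest ANOVA F-score against the target
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
print(selector.get_support())  # boolean mask of the selected features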
What are the main parameters of the random forest model?
max_depth: The longest path between the root node and a leaf
min_samples_split: The minimum number of observations needed to split a given node
max_leaf_nodes: Limits the number of leaf nodes and hence the growth of the tree
min_samples_leaf: The minimum number of samples required at a leaf node
n_estimators: Number of trees
max_samples: Fraction of the original dataset given to any individual tree
max_features: Limits the number of features considered when looking for the best split
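These names follow the scikit-learn API; a minimal sketch of setting them, with values chosen arbitrarily for illustration:

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=2,
    max_leaf_nodes=50,
    max_features="sqrt",
    max_samples=0.8,   # fraction of the dataset given to each tree (requires bootstrap=True, the default)
    random_state=42,
)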
What are the main parameters in the gradient boosting model?
There are many parameters, but below are a few key defaults.
learning_rate=0.1 (shrinkage).
n_estimators=100 (number of trees).
max_depth=3.
min_samples_split=2.
min_samples_leaf=1.
subsample=1.0.
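These are the defaults of scikit-learn's GradientBoostingClassifier; spelled out explicitly they look like this:

from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(
    learning_rate=0.1,    # shrinkage applied to each tree's contribution
    n_estimators=100,     # number of boosting stages (trees)
    max_depth=3,
    min_samples_split=2,
    min_samples_leaf=1,
    subsample=1.0,        # values below 1.0 give stochastic gradient boosting
)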
Which regularization techniques do you know?
There are mainly two types of regularization:
L1 Regularization (Lasso regularization) - adds the sum of the absolute values of the coefficients to the cost function.
L2 Regularization (Ridge regularization) - adds the sum of the squares of the coefficients to the cost function.
In both cases, a lambda hyperparameter determines the amount of regularization.
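A quick contrast of the two on synthetic data (alpha is scikit-learn's name for the lambda hyperparameter; the data is invented):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = X[:, 0] * 3 + X[:, 1] * 2 + rng.normal(size=200)  # only 2 features truly matter

print(Lasso(alpha=0.1).fit(X, y).coef_)  # L1 typically drives irrelevant coefficients to exactly zero
print(Ridge(alpha=1.0).fit(X, y).coef_)  # L2 shrinks them toward zero but keeps them nonzero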
Everything you need to know about TensorFlow 2.0
Keras-APIs, SavedModels, TensorBoard, Keras-Tuner and more.
https://hackernoon.com/everything-you-need-to-know-about-tensorflow-2-0-b0856960c074?
What happens to our linear regression model if we have three columns in our data: x, y, z — and z is a sum of x and y?
We would not be able to perform the regression reliably. Because z is linearly dependent on x and y, the feature matrix is rank-deficient, so the matrix we need to invert when fitting ordinary least squares (X^T X) is singular (not invertible).
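You can see the problem directly with NumPy on synthetic data where z = x + y:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)
z = x + y                                    # exact linear dependence

X = np.column_stack([np.ones(50), x, y, z])  # intercept plus the three columns
print(np.linalg.matrix_rank(X.T @ X))        # 3, not 4 - the matrix is rank-deficient
# Inverting X.T @ X here is singular in exact arithmetic (and numerically unstable at best)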
What do we do with categorical variables?
Categorical variables must be encoded before they can be used as features to train a machine learning model. There are various encoding techniques, including:
One-hot encoding
Label encoding
Ordinal encoding
Target encoding
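For example, a minimal one-hot encoding sketch with pandas (the toy column is made up):

import pandas as pd

df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Delhi", "Chennai"]})

# One-hot encoding: one binary column per category
encoded = pd.get_dummies(df, columns=["city"])
print(encoded)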
What is the PR (precision-recall) curve?
A precision-recall curve (or PR Curve) is a plot of the precision (y-axis) and the recall (x-axis) for different probability thresholds. Precision-recall curves (PR curves) are recommended for highly skewed domains where ROC curves may provide an excessively optimistic view of the performance.