datascientology | Education

Telegram channel datascientology - Data Scientology

1073

Hot data science related posts every hour. Chat: https://telegram.me/r_channels

Data Scientology

Red hair frequency in Europe

/r/MapPorn
https://redd.it/zks8gw

Data Scientology

Air traffic control zones in the USA

/r/MapPorn
https://redd.it/zk8mng

Data Scientology

[Discussion] Amazon's AutoML vs. open source statistical methods

>TL;DR: We paid $800 USD and spent 4 hours in the AWS Forecast console so you don't have to.

In this reproducible experiment, we compare Amazon Forecast with StatsForecast, an open-source Python library for statistical methods.

Since AWS Forecast specializes in demand forecasting, we selected the M5 competition dataset as a benchmark; the dataset contains 30,490 series of daily Walmart sales.

We found that Amazon Forecast is 60% less accurate and 669 times more expensive than running an open-source alternative in a simple cloud server.

We also provide a step-by-step guide to reproduce the results.

### Results

Amazon Forecast:

- achieved 1.617 in error (measured in wRMSSE, the official evaluation metric used in the competition),
- took 4.1 hours to run,
- and cost 803.53 USD.

An ensemble of statistical methods trained on a c5d.24xlarge EC2 instance:

- achieved 0.669 in error (wRMSSE),
- took 14.5 minutes to run,
- and cost only 1.2 USD.

For this data set, we show, therefore, that:

- Amazon Forecast is 60% less accurate and 669 times more expensive than running an open-source alternative in a simple cloud server.
- Classical methods outperform Machine Learning methods in terms of speed, accuracy, and cost.
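The ratios quoted above can be recomputed from the reported figures. The sketch below is only an arithmetic check and not part of the original experiment; the variable names are illustrative. It gives a cost ratio of about 669.6, a runtime ratio of about 17, and an error for the statistical ensemble that is roughly 59% lower than Amazon Forecast's, which is presumably what the post rounds to 60%.

```python
# Figures as reported in the post (variable names are illustrative).
aws_cost_usd = 803.53
oss_cost_usd = 1.2
aws_error = 1.617          # wRMSSE, Amazon Forecast
oss_error = 0.669          # wRMSSE, statistical ensemble
aws_minutes = 4.1 * 60     # 4.1 hours expressed in minutes
oss_minutes = 14.5

# How many times more expensive and slower Amazon Forecast was.
cost_ratio = aws_cost_usd / oss_cost_usd
speed_ratio = aws_minutes / oss_minutes

# Two ways to express the accuracy gap.
error_ratio = aws_error / oss_error          # AWS error as a multiple of the ensemble's
error_reduction = 1 - oss_error / aws_error  # fraction by which the ensemble's error is lower

print(f"cost ratio:      {cost_ratio:.1f}x")
print(f"runtime ratio:   {speed_ratio:.1f}x")
print(f"error ratio:     {error_ratio:.2f}x")
print(f"error reduction: {error_reduction:.1%}")
```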

Although using StatsForecast requires some basic knowledge of Python and cloud computing, the results are better for this dataset.
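For readers unfamiliar with the metric, here is a minimal sketch of how a weighted RMSSE can be computed. It is a simplification under stated assumptions, not the official M5 scoring code: the weights are supplied by the caller and assumed to sum to 1, and the hierarchy aggregation and sales-based weighting used in the competition are omitted. The function names `rmsse` and `wrmsse` are hypothetical.

```python
from math import sqrt


def rmsse(train, actual, forecast):
    """Root mean squared scaled error for a single series.

    The forecast MSE is scaled by the in-sample MSE of a one-step
    naive forecast (each value predicted by the previous one).
    """
    if len(train) < 2:
        raise ValueError("need at least two training observations")
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal in length")

    # Mean squared one-step naive error on the training data.
    scale = sum(
        (train[t] - train[t - 1]) ** 2 for t in range(1, len(train))
    ) / (len(train) - 1)
    if scale == 0:
        raise ValueError("training series is constant; scale is undefined")

    # Mean squared error of the forecast over the horizon.
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return sqrt(mse / scale)


def wrmsse(series, weights):
    """Weighted sum of per-series RMSSE values.

    `series` is a list of (train, actual, forecast) tuples and
    `weights` is a matching list assumed to sum to 1.
    """
    if len(series) != len(weights):
        raise ValueError("series and weights must have the same length")
    return sum(w * rmsse(*s) for s, w in zip(series, weights))


# Toy usage with made-up numbers (not M5 data).
toy = [
    ([1, 2, 3, 4], [5, 6], [5, 8]),
    ([0, 2, 0, 2], [2, 2], [0, 0]),
]
score = wrmsse(toy, [0.5, 0.5])
```

A score below 1 means the forecast beats the in-sample naive benchmark on average, which is why 0.669 versus 1.617 is a meaningful gap.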


Table

https://preview.redd.it/vt9ru0149i5a1.png?width=1274&format=png&auto=webp&s=64e6d4519f5934d56d25d76d17a58e6d03d70512

/r/MachineLearning
https://redd.it/zk6h8q

Data Scientology

(OC) Nine ways to divide Argentina

/r/MapPorn
https://redd.it/zk2a7n

Data Scientology

[OC] Geospatial density of the biggest fast food chains in the USA

/r/dataisbeautiful
https://redd.it/zkercv

Data Scientology

Countries with mandated paid maternity leave

/r/MapPorn
https://redd.it/zkbvoz

Data Scientology

Are rule-based algorithms like PRISM/Ripper... competitive?

Hi!

Our professor at school likes Weka and makes us use it for training on the algorithms.

He also has slides on classification rules and talks about rule-based algorithms like PRISM, Ripper and DTNB and is asking us to use them on some dataset.

As I'm not finding much information about these algorithms, I was wondering whether they're outdated or no longer competitive. Have you ever used them in a professional setting?

Thanks.

/r/datascience
https://redd.it/zjvf8t

Data Scientology

In which data science jobs/careers is the Agile/Scrum philosophy NOT used? Just wondering.

Hi everyone! I was just wondering: In which data science jobs/careers is the Agile/Scrum philosophy NOT used?

(Btw, I would appreciate it if we could please avoid becoming distracted by the pros and cons of Agile/Scrum. That is not my intent.

I am just curious which data science jobs/careers do NOT use it. Thanks!)

/r/datascience
https://redd.it/zj1i5g

Data Scientology

[OC] I was bored this week so I made this map about our world's fisheries, today's fish consumption, what we consume and where it comes from!

/r/dataisbeautiful
https://redd.it/zk3hu8

Data Scientology

3D graphs are helping me to visualise data across multiple dimensions

/r/datascience
https://redd.it/zjvnuw

Data Scientology

Please help: Data Scientist stuck in Consulting. How do I get out?

Profile - Canadian. 29y/o. Masters in Data Science.

Early 2019 - Data Scientist at the start of my career. Loved every second of it.

2020 - Became a Finance consultant for a $90k/year package (only for money) where I only use Excel.

2022 - 3 years on, I'm still stuck in this consulting firm with a $100k/year package and no promotion. I seem to have lost my Data Science skills.

My skillset includes Data Science technologies and bullshit consulting projects where I help settle bank books. I have a gift for bullshitting in interviews and at this point I've fabricated my entire resume with consulting projects turned into data science projects from the last 2 years.

I've been busting my head for the last 6 months on LinkedIn trying to find jobs where I can get back into the data science industry, but I can't find anything where I can leverage the last 3 years of my life. I'm desperate to progress in my career. My peers who started on 60k three years ago have reached 100k, but I'm still stuck on the same rung of my career ladder. I feel like I've wasted the last 3 years of my professional career.

But what should my next step be? I can't seem to find any niche or opportunity to develop in. How do I progress towards a better job? Should I stay in consulting or leave and go back to Data Science?

/r/datascience
https://redd.it/zjkj2i

Data Scientology

Population distribution in New York City by race/ethnicity of residents, circa 2015.

/r/MapPorn
https://redd.it/zjjhvz

Data Scientology

[Q] Where to get simple health-related datasets?

I am trying to do a final project in my class, and I need a good publicly available dataset, such as one that contains cholesterol levels after using different drugs, or the effect of drug concentration on tumor size. It doesn't need to be specific, but the measure (Y-value) needs to be numerical. I've been looking through Kaggle, where most of the datasets have boolean or nominal values as the measure. Does anyone have any guidance as to where I can go? I looked through some governmental and university websites, but the datasets are extremely complex, with thousands of columns that would take a lot of time to figure out, and I don't have enough time for that.

Thanks, I would appreciate any help!

/r/statistics
https://redd.it/zj54fu

Data Scientology

If US land were divided like US wealth

/r/MapPorn
https://redd.it/ziqi0q

Data Scientology

US states that enforce seatbelts

/r/MapPorn
https://redd.it/zj9s2b

Data Scientology

[OC] UK housing most unaffordable since Victorian times

/r/dataisbeautiful
https://redd.it/zktc6r

Data Scientology

The US advises to stay away from the Middle East [OC]

/r/dataisbeautiful
https://redd.it/zk99sf

Data Scientology

Can you recommend a Python textbook to replace "An Introduction to Statistical Learning with Applications in R", Witten, J. et. al. E

I am migrating a course from R to Python, and am looking to replace this textbook with one that is as similar as possible, but uses Python as the application language.

There is a github which converts all the R to Python from this book, and that is very nice, but not quite as convenient as a new book.

/r/statistics
https://redd.it/zk8rbr

Data Scientology

[OC] Visualising Pfizer's latest income statement. Pharmaceutical profit margins are notoriously higher than most other industries

/r/dataisbeautiful
https://redd.it/zju4mr

Data Scientology

[OC] Average Sold Home Price in Canada, Q4 2022 in $USD/$CAD

/r/dataisbeautiful
https://redd.it/zk65kp

Data Scientology

Yet another 2022 Wrapped: Chat Messages per Day [OC]

/r/dataisbeautiful
https://redd.it/zk785k

Data Scientology

Benefits of walking in daily life

/r/Infographics
https://redd.it/zjqygk

Data Scientology

Programmatically create presentation slides with data visualisation graphs in Python

Hi all,

I am currently working on a project where I use Python’s data science libraries to generate graphs and various visualisations of data (e.g. using Pandas, Seaborn, etc.). Ultimately, I’m looking to put all of these graphs and models into a PowerPoint-like presentation in a way that 1) the graphs are linked to a database, 2) the graphs get updated automatically if anything changes in the database, and 3) I have a clean layout of text, pictures and models all together.

I am hence looking at tools that can help me achieve that. I see that Google Slides integrates with Python through the gslides library, but I haven’t found many examples of what it can generate. Jupyter Notebook is another option, but I’m not sure how a PowerPoint-like presentation can be created in it (so far I’ve only really used Jupyter Notebook for reporting purposes). Are there any tools I could look at?

Thanks, any help is much appreciated!

/r/datascience
https://redd.it/zjyleu

Data Scientology

World Heritage Sites by Country

/r/MapPorn
https://redd.it/zjt920

Data Scientology

Which country has the most attractive men according to Europe

/r/MapPorn
https://redd.it/zjuqlz

Data Scientology

I attempted to draw Europe with 1 hexagon representing 1 million people

/r/MapPorn
https://redd.it/zjpj9u

Data Scientology

[OC] 2022 World Cup - Probabilities of final victory according to the bookmakers

/r/dataisbeautiful
https://redd.it/zjeouy

Data Scientology

[OC] Places of birth of all the Morocco players who played and beat Portugal in the World Cup quarter-finals. A truly diverse team!

/r/dataisbeautiful
https://redd.it/ziv1hm

Data Scientology

Areas ISIS wanted to capture by 2020

/r/MapPorn
https://redd.it/zijd5z

Data Scientology

[OC] Yearly Average Temperature in the UK, 1884 - 2021

/r/dataisbeautiful
https://redd.it/ziwu4m
