[Discussion] Amazon's AutoML vs. open-source statistical methods
>TL;DR: We paid $800 USD and spent 4 hours in the AWS Forecast console so you don't have to.
In this reproducible experiment, we compare Amazon Forecast and StatsForecast, an open-source Python library for statistical methods.
Since AWS Forecast specializes in demand forecasting, we selected the M5 competition dataset as a benchmark; the dataset contains 30,490 series of daily Walmart sales.
We found that Amazon Forecast is 60% less accurate and 669 times more expensive than running an open-source alternative in a simple cloud server.
We also provide a step-by-step guide to reproduce the results.
### Results
Amazon Forecast:
achieved 1.617 in error (measured in wRMSSE, the official evaluation metric used in the competition),
took 4.1 hours to run,
and cost 803.53 USD.
An ensemble of statistical methods trained on a c5d.24xlarge EC2 instance:
achieved 0.669 in error (wRMSSE),
took 14.5 minutes to run,
and cost only 1.2 USD.
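For readers unfamiliar with the metric, here is a minimal sketch of how a weighted RMSSE can be computed. It is a simplification, not the competition's scoring code: the official M5 metric aggregates over several hierarchy levels and derives the weights from sales, whereas the function names, the toy data, and the equal weights below are assumptions made for illustration.

```python
import math

def rmsse(y_train, y_true, y_pred):
    """Root mean squared scaled error for a single series."""
    # Scale: mean squared error of the one-step naive forecast on the history.
    # Assumes the history is not constant (otherwise the scale is zero).
    naive_sq = [(b - a) ** 2 for a, b in zip(y_train, y_train[1:])]
    scale = sum(naive_sq) / len(naive_sq)
    # Mean squared error of the forecast over the horizon.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse / scale)

def wrmsse(series, weights):
    """Weighted sum of per-series RMSSE; weights are assumed to sum to 1."""
    return sum(w * rmsse(*s) for s, w in zip(series, weights))

# Toy data (history, actuals, forecast); these are not M5 values.
series = [
    ([1, 2, 3, 4], [5, 6], [5, 8]),
    ([2, 4, 6, 8], [10, 12], [10, 12]),
]
score = wrmsse(series, [0.5, 0.5])
print(round(score, 4))  # 0.7071
```

A score below 1 means the forecast beats the in-sample naive forecast on average, which is why lower values are better.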
For this dataset, we therefore show that:
Amazon Forecast is 60% less accurate and 669 times more expensive than running an open-source alternative in a simple cloud server.
Classical methods outperform Machine Learning methods in terms of speed, accuracy, and cost.
Although using StatsForecast requires some basic knowledge of Python and cloud computing, the results are better for this dataset.
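The headline ratios can be checked directly from the figures reported above. This is only a quick arithmetic sketch; reading "60% less accurate" as the relative gap in wRMSSE is an assumption about how the number was rounded, and the variable names are ours.

```python
# Figures as reported in the results above.
aws = {"wrmsse": 1.617, "hours": 4.1, "usd": 803.53}
ens = {"wrmsse": 0.669, "hours": 14.5 / 60, "usd": 1.2}

cost_ratio = aws["usd"] / ens["usd"]                 # how many times more expensive
time_ratio = aws["hours"] / ens["hours"]             # how many times slower
error_reduction = 1 - ens["wrmsse"] / aws["wrmsse"]  # relative drop in wRMSSE

print(f"cost ratio: {cost_ratio:.1f}x")            # cost ratio: 669.6x
print(f"time ratio: {time_ratio:.1f}x")            # time ratio: 17.0x
print(f"error reduction: {error_reduction:.1%}")   # error reduction: 58.6%
```

Under that reading, the ensemble's error is about 59% lower than Amazon Forecast's, which rounds to the quoted 60%.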
Table
https://preview.redd.it/vt9ru0149i5a1.png?width=1274&format=png&auto=webp&s=64e6d4519f5934d56d25d76d17a58e6d03d70512
/r/MachineLearning
https://redd.it/zk6h8q
[OC] Geospatial density of the biggest fast food chains in the USA
/r/dataisbeautiful
https://redd.it/zkercv
Are rule-based algorithms like PRISM/Ripper... competitive?
Hi!
Our professor at school likes Weka and makes us use it for training on the algorithms.
He also has slides on classification rules and talks about rule-based algorithms like PRISM, Ripper and DTNB and is asking us to use them on some dataset.
I'm not finding much information about these algorithms, so I was wondering whether they're outdated or no longer competitive. Have you ever used them in a professional setting?
Thanks.
/r/datascience
https://redd.it/zjvf8t
In which data science jobs/careers is the Agile/Scrum philosophy NOT used? Just wondering.
Hi everyone! I was just wondering: In which data science jobs/careers is the Agile/Scrum philosophy NOT used?
(Btw, I would appreciate it if we could please avoid becoming distracted by the pros and cons of Agile/Scrum. That is not my intent.
I am just curious which data science jobs/careers do NOT use it. Thanks!)
/r/datascience
https://redd.it/zj1i5g
[OC] I was bored this week so I made this map about our world's fisheries, today's fish consumption, what we consume and where it comes from!
/r/dataisbeautiful
https://redd.it/zk3hu8
3D graphs are helping me to visualise data across multiple dimensions
/r/datascience
https://redd.it/zjvnuw
Please help: Data Scientist stuck in Consulting. How do I get out?
Profile - Canadian. 29y/o. Masters in Data Science.
Early 2019 - Data Scientist at the start of my career. Loved every second of it.
2020 - Became a Finance consultant for a $90k/year package (only for money) where I only use Excel.
2022 - 3 years on, I'm still stuck in this consulting firm with a $100k/year package and no promotion. I seem to have lost my Data Science skills.
My skillset includes Data Science technologies and bullshit consulting projects where I help settle bank books. I have a gift for bullshitting in interviews and at this point I've fabricated my entire resume with consulting projects turned into data science projects from the last 2 years.
I've been busting my head for the last 6 months on LinkedIn trying to find jobs where I can get back into the data science industry, but I can't find anything where I can leverage the last 3 years of my life. I'm desperate to progress in my career. My peers who started on 60k 3 years back have reached 100k, but I'm still stuck at the same rung on my career ladder. I feel like I've wasted the last 3 years of my professional career.
But what should be my next step? I can't seem to find any niche/opportunities to develop in. How do I progress towards a better job? Should I stay in consulting or leave and go back to Data Science?
/r/datascience
https://redd.it/zjkj2i
Population distribution in New York City by race/ethnicity of residents, circa 2015.
/r/MapPorn
https://redd.it/zjjhvz
[Q] Where to get simple health-related datasets?
I am trying to do a final project for my class, and I need a good publicly available dataset, such as one that contains cholesterol levels after using different drugs, or the effect of drug concentration on tumor size. It doesn't need to be specific, but the measure (Y-value) needs to be numerical. I've been looking through Kaggle, where most of the datasets have boolean/nominal values as the measure. Does anyone have any guidance as to where I can go? I looked through some governmental and university websites, but the datasets are extremely complex, with thousands of columns, which would take a lot of time to figure out, and I don't have enough time for that.
Thanks, I would appreciate any help!
/r/statistics
https://redd.it/zj54fu
[OC] UK housing most unaffordable since Victorian times
/r/dataisbeautiful
https://redd.it/zktc6r
The US advises to stay away from the Middle East [OC]
/r/dataisbeautiful
https://redd.it/zk99sf
Can you recommend a Python textbook to replace "An Introduction to Statistical Learning with Applications in R", Witten, J. et al. E
I am migrating a course from R to Python, and am looking to replace this textbook with one that is as similar as possible, but uses Python as the application language.
There is a GitHub repository that converts all the R code from this book to Python, and that is very nice, but not quite as convenient as a new book.
/r/statistics
https://redd.it/zk8rbr
[OC] Visualising Pfizer's latest income statement. Pharmaceutical profit margins are notoriously higher than most other industries
/r/dataisbeautiful
https://redd.it/zju4mr
[OC] Average Sold Home Price in Canada, Q4 2022 in $USD/$CAD
/r/dataisbeautiful
https://redd.it/zk65kp
Yet another 2022 Wrapped: Chat Messages per Day [OC]
/r/dataisbeautiful
https://redd.it/zk785k
Programmatically create presentation slides with data visualisation graphs in Python
Hi all,
I am currently working on a project where I use Python’s data science libraries to generate graphs and various visualisations on data (e.g. using Pandas, Seaborn, etc.). Ultimately, I’m looking to put all of these graphs and models into a PowerPoint-like presentation in a way that 1) the graphs are linked to a database, 2) the graphs get updated automatically if anything changes in the database, 3) I have a clean layout of text, pictures and models all together.
I am hence looking at tools that can help me achieve that. I see that Google Slides integrates with Python through the gslides library, but I haven’t found many examples of what it can generate. Jupyter Notebook is another option, but I’m not sure how a presentation like PowerPoint can be created in it (so far I’ve only really used Jupyter Notebook for reporting purposes). Are there any tools I could look at?
Thanks, any help is much appreciated!
/r/datascience
https://redd.it/zjyleu
Which country has the most attractive men according to Europe
/r/MapPorn
https://redd.it/zjuqlz
I attempted to draw Europe with 1 hexagon representing 1 million people
/r/MapPorn
https://redd.it/zjpj9u
[OC] 2022 World Cup - Probabilities of final victory according to the bookmakers
/r/dataisbeautiful
https://redd.it/zjeouy
[OC] Places of birth of all the Morocco players who played and beat Portugal in the World Cup quarter-finals. A truly diverse team!
/r/dataisbeautiful
https://redd.it/ziv1hm
[OC] Yearly Average Temperature in the UK, 1884 - 2021
/r/dataisbeautiful
https://redd.it/ziwu4m