[OC] US states sorted by life expectancy, colored by Biden's share of the 2020 Presidential Election
/r/dataisbeautiful
https://redd.it/zslrnq
These data have been displayed much better than this
/r/dataisugly
https://redd.it/zq0bvx
Rural Equivalent Of New York City, Los Angeles And Chicago
/r/MapPorn
https://redd.it/zs1ptz
pandas 1.5.0 and later include an optional copy-on-write (CoW) mode that removes indexing inconsistencies and speeds up many operations.
https://towardsdatascience.com/a-solution-for-inconsistencies-in-indexing-operations-in-pandas-b76e10719744
/r/datascience
https://redd.it/zsbxov
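A minimal sketch of the copy-on-write behaviour mentioned in the pandas post above, assuming pandas 1.5 or later is installed. The DataFrame and the column names are made up for illustration; only the `mode.copy_on_write` option comes from pandas itself.

```python
import pandas as pd

# Opt in to copy-on-write (assumes pandas >= 1.5).
pd.set_option("mode.copy_on_write", True)

# Hypothetical toy data, not taken from the linked article.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Selecting a column gives an object that shares data with df
# until one of the two is modified.
subset = df["a"]

# With CoW enabled, this write triggers a copy, so df stays untouched.
subset.iloc[0] = 100

print(df["a"].iloc[0])   # value in the original frame
print(subset.iloc[0])    # value in the modified selection
```

With CoW enabled, a write through one object should never silently change another, which is the inconsistency the post refers to.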
[D] Using "duplicates" during training?
I have collected experimental data for various conditions. To ensure repeatability, each test is replicated 5 times, which means the same input but slightly different outputs due to experimental variability.
If you were to build a machine learning model, would you use all 5 data points for each given test, hoping that the model will learn to converge towards the mean response? Or is it advisable to pre-compute the means and feed only these to the model (so that each input has exactly one output)?
I can see pros and cons to both approaches and would welcome feedback. Thank you.
/r/MachineLearning
https://redd.it/zsbivc
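One way to explore the replicate question above is a small experiment. The sketch below assumes a plain least-squares line fit, synthetic data, and the same number of replicates for every input; none of these come from the post. Under those assumptions, fitting on all replicates and fitting on the per-input means give the same coefficients, because the within-replicate scatter does not depend on the model parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 8 test conditions, each replicated 5 times.
x = np.linspace(0.0, 1.0, 8)
n_rep = 5
noise = rng.normal(0.0, 0.1, size=(x.size, n_rep))
y = 2.0 * x[:, None] + 1.0 + noise   # shape (8, 5), one row per condition


def fit_line(inputs, outputs):
    """Ordinary least-squares fit of outputs = slope * inputs + intercept."""
    design = np.column_stack([inputs, np.ones_like(inputs)])
    coef, *_ = np.linalg.lstsq(design, outputs, rcond=None)
    return coef


# Option 1: use every replicate as its own training point.
coef_all = fit_line(np.repeat(x, n_rep), y.ravel())

# Option 2: average the replicates first, then fit on the means.
coef_mean = fit_line(x, y.mean(axis=1))

print(coef_all, coef_mean)
```

The equivalence is specific to squared-error loss with equal replicate counts. With other losses, unequal counts, or a model that also estimates noise, the two options can differ, and keeping the replicates preserves information about the variability.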
'It is a duty towards society to have children' % that agree
/r/MapPorn
https://redd.it/zs1zbl
[OC] Among Big Tech, Amazon spends the most on R&D
/r/dataisbeautiful
https://redd.it/zrsfko
Petition to bring back images in posts.
Look at the all-time top-rated posts. We like images, and some of them are actually really good. Statistics need visualization.
You can also vote in this survey
/r/SampleSize
https://redd.it/zrbz8f
[OC] Top googled games in Europe, December 2022
/r/dataisbeautiful
https://redd.it/zqpdgy
Some Images from a Treatise on the *Standing Accretion Shock Instability* Department of the Art of Core-Collapse Supernova Simulation
/r/mathpics
https://redd.it/zqw2vc
In 2021 I created these infographics to show the vast difference between one and a trillion. I never really did anything with them so I wanted to share them here to get some feedback.
https://redd.it/zs1bez
@datascientology
Map of percentages of respondents who say they would fight for their country
/r/MapPorn
https://redd.it/zshlqt
[OC] English Words of Spanish Origin and the Number of Mentions in Wikipedia
/r/dataisbeautiful
https://redd.it/zs48my
Are data science jobs affected by the tech bubble burst?
I teach a probability and statistics course to mostly computer science students, and I like to start the semester by talking about all the awesome job opportunities they'll have when they graduate.
I search Indeed for "data scientist" and share the number of active positions. Last semester there were substantially more than there are now: the count fell from roughly 24,000 to around 14,000.
I imagine some students may worry that job opportunities are dwindling because of the supposedly bursting tech bubble, but I'm not sure whether this mainly affects pure programming jobs (my background is not in data science).
I'd love to hear from people in the field, especially if you've been on the job hunt lately - any words of encouragement to current students?
Thanks for any info!
/r/datascience
https://redd.it/zrc1b5
[OC] I made these posters in late 2021 to illustrate the vast difference between one and a trillion. They helped me better understand how massive some numbers really are. I never shared them before, so I wanted to do so before the data become too out of date.
https://redd.it/zs1kow
[Q] Getting a bachelor's in statistics as a female over 30 years old!
Hey y’all! My wife is considering getting a statistics degree. She really likes statistics and even passed college statistics with an A while most of her classmates had to retake the class.
Our questions are:
Is getting a degree in statistics as a female in her early 30s a good idea?
Is the ROI there?
Will employers overlook her due to her age and/or being female?
All replies and advice are welcome.
Thanks!
/r/statistics
https://redd.it/zs0mgi
Is it normal to be quite forgetful of techniques/methods in data science?
I’m currently working as a Data Analyst. My background is in Physics, so whilst I have a strong mathematical background and I’m used to remembering and working with a lot of equations, I’ve never had any “formal” statistics/data science training.
In my work, I’ve found myself using a range of analytical techniques. There’s the stuff I do every day, like computing basic summary statistics since I work mainly with categorical data, but also linear regression, various significance tests (t-test, chi-squared), more “complicated” techniques such as decision trees, and even forecasting.
However, every time I spend a few weeks away from one of these things (like decision trees), I completely forget how they work. I can remember that there are nodes and branches and that splits are made based on entropy, but beyond that it’s like I’ve forgotten everything I’ve read. Same with forecasting: I know that ARIMA models exist and that there are different terms that take trend and seasonality into account, but beyond that I’ve forgotten.
Is this normal?
/r/datascience
https://redd.it/zrtzf4
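As a quick refresher on the "splits based on entropy" idea from the post above, here is a minimal sketch of Shannon entropy and information gain for one candidate split. The function names and the toy labels are made up for illustration, and real decision-tree libraries may use other criteria such as Gini impurity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(parent, left, right):
    """Reduction in entropy from splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical node with a 50/50 class mix.
parent = ["yes"] * 4 + ["no"] * 4

# A split that separates the classes perfectly.
gain_perfect = information_gain(parent, ["yes"] * 4, ["no"] * 4)

# A split that leaves both children as mixed as the parent.
mixed = ["yes", "yes", "no", "no"]
gain_useless = information_gain(parent, mixed, mixed)

print(gain_perfect, gain_useless)
```

A tree grown with this criterion would pick, at each node, the candidate split with the largest information gain.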
[OC] Mexico now leads the OG European beer countries in exports
/r/dataisbeautiful
https://redd.it/zrw8bb
[N] Point-E: a new DALL-E-like model that generates 3D point clouds from prompts
It's only been a month since OpenAI released ChatGPT, and yesterday they launched Point-E, a new DALL-E-like model that generates 3D point clouds from complex prompts. As someone who is always interested in the latest advancements in machine learning, I was really excited to dig into this paper and see what it had to offer.
One of the key features of Point-E is its use of diffusion models to generate synthetic views and 3D point clouds. These models use text input to generate an image, which is then used as a reference for generating the 3D point cloud. This process takes only 1-2 minutes on a single GPU, making it much faster than previous state-of-the-art methods.
While the quality of the samples produced by Point-E may be lower than those produced by other methods, the speed of generation makes it a practical option for certain use cases.
If you're interested in learning more about this new model and how it was developed, I highly recommend giving the full paper a read. But if you'd rather read the gist of it, I've added a link to an overview blog post I published about it.
The blog: https://dagshub.com/blog/overview-of-point-e/
The paper: https://arxiv.org/abs/2212.08751
I'm sure I didn't capture every insight while writing the blog post, and I'd love to hear your thoughts about the model and how OpenAI developed it.
/r/MachineLearning
https://redd.it/zrfy75
Got my first Data Science job!!!
I just graduated with a master's in Data Science last Friday and I got my first job in my degree field. I applied for the position on December 1st, and after 2 interviews I got the call this afternoon. My best advice is don’t get hung up on the job title; look at the description. Mine was listed as a programmer, but it involves working with SQL, Python and Tableau. I wouldn’t have found it based on the title.
/r/datascience
https://redd.it/zr3xli