datascientology | Education

Telegram channel datascientology - Data Scientology

1073

Hot data science related posts every hour. Chat: https://telegram.me/r_channels


Data Scientology

Prostitution status world map

/r/MapPorn
https://redd.it/znsj6j


[OC] saw this at the local vet. pets’ ages in human years

/r/dataisbeautiful
https://redd.it/zny2h4


[OC] Top 23* FIFA rankings over the last 25 years. Is there a correlation between the rankings and the World Cup finals? Also, did you know that Morocco is the lowest-ranked team to qualify for the semi-finals over the last 25 years (*ranked #23)? Link in the comments.

/r/dataisbeautiful
https://redd.it/zneh00


Status of gay marriage in Europe

/r/MapPorn
https://redd.it/znkhnd


Child abuse in the U.S. - victims by perpetrator relationship 2020
https://www.statista.com/statistics/254893/child-abuse-in-the-us-by-perpetrator-relationship/

/r/dataisbeautiful
https://redd.it/znno50


So far 2022 has had the second widest range of daily average temperatures in the Central England temperature series. This shows the 5 years with the widest and narrowest range of temperatures in the series from 1772. [OC]

/r/dataisbeautiful
https://redd.it/znccpl


[P] XetHub: We scaled Git to support 1 TB repos

Thanks to everyone who replied to our [earlier post requesting pre-launch product feedback](https://www.reddit.com/r/mlops/comments/zd7hqy/feedback_requested_new_data_storage_tool_for/)! We’re excited to announce that we’ve now publicly launched [XetHub](https://xethub.com/?utm_source=reddit&utm_medium=organic&utm_campaign=xethub-intro&utm_content=link), a collaborative storage platform for data management.

I’ve been in the MLOps space for ~10 years, and data is still the hardest unsolved problem. Code is versioned using Git, data is stored somewhere else, and context often lives in a third location like Slack or GDocs.

This is why we built XetHub, a platform that enables teams to treat data like code, using Git.

Unlike Git LFS, XetHub doesn’t just store the files. It uses content-defined chunking and Merkle Trees to dedupe against everything in history, allowing small changes in large files to be stored compactly. Here’s how it works: [https://xethub.com/assets/docs/how-xet-deduplication-works](https://xethub.com/assets/docs/how-xet-deduplication-works)
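To make the deduplication idea concrete, here is a minimal Python sketch of content-defined chunking with a content-addressed chunk store and a Merkle root over the chunk hashes. It is not XetHub's implementation: the gear-style rolling hash, the chunk size limits, and the names `chunk`, `ChunkStore`, and `merkle_root` are all assumptions made for illustration.

```python
import hashlib
import random

# All constants below are illustrative assumptions, not XetHub's settings.
MIN_CHUNK = 64        # never cut before this many bytes
MAX_CHUNK = 1024      # always cut once a chunk reaches this many bytes
BOUNDARY_BITS = 8     # cut when the top 8 hash bits are zero (about 1 in 256)
MASK64 = (1 << 64) - 1

_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # one random value per byte


def chunk(data: bytes):
    """Split data at content-defined boundaries using a gear rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # The left shift ages out old bytes: h depends on the last 64 bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        at_boundary = length >= MIN_CHUNK and (h >> (64 - BOUNDARY_BITS)) == 0
        if at_boundary or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class ChunkStore:
    """Content-addressed store: each unique chunk is kept once, keyed by hash."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes):
        """Store data; return (list of chunk hashes, newly stored byte count)."""
        recipe, new_bytes = [], 0
        for piece in chunk(data):
            key = hashlib.sha256(piece).hexdigest()
            if key not in self.blobs:
                self.blobs[key] = piece
                new_bytes += len(piece)
            recipe.append(key)
        return recipe, new_bytes

    def get(self, recipe):
        """Rebuild the original bytes from a list of chunk hashes."""
        return b"".join(self.blobs[key] for key in recipe)


def merkle_root(keys):
    """Hash pairs of chunk hashes upward until a single root remains."""
    level = [bytes.fromhex(k) for k in keys] or [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()


if __name__ == "__main__":
    rng = random.Random(1)
    original = bytes(rng.getrandbits(8) for _ in range(50_000))
    edited = original[:2_000] + b"small edit" + original[2_000:]
    store = ChunkStore()
    _, first = store.put(original)
    _, second = store.put(edited)
    print("bytes stored for v1:", first, "extra bytes for v2:", second)
```

Because boundaries depend on the bytes themselves rather than on fixed offsets, an insertion near the start of a file shifts everything after it but typically changes only the chunks around the edit. The remaining chunks hash to keys that are already in the store.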

XetHub includes a GitHub-like web interface that provides automatic CSV summaries and allows custom visualizations using Vega. And we know how painful downloading a huge repository can get, so we built Git-Xet mount—which, in seconds, provides a user-mode filesystem view over the repo.

Today, XetHub works for 1 TB repositories, and we plan to scale to 100 TB in the next year. Our implementation is in Rust (client & cache + storage) and our web application is written in Go.

XetHub is available today for Linux & Mac (Windows coming soon) and we’d love for you to try it out!

More info here:

* [https://xetdata.com/blog/2022/12/13/introducing-xethub](https://xetdata.com/blog/2022/12/13/introducing-xethub)
* [https://xetdata.com/blog/2022/10/15/why-xetdata](https://xetdata.com/blog/2022/10/15/why-xetdata)
* Hacker News discussion (launched on Show HN at #1): [https://news.ycombinator.com/item?id=33969908](https://news.ycombinator.com/item?id=33969908)

https://preview.redd.it/t9tf3kt5i96a1.png?width=1740&format=png&auto=webp&s=184dd57d9f3d4e1dea94f8ab02211f663e214e84

/r/MachineLearning
https://redd.it/znfgap


[OC] How long is each US president's Wikipedia page?

/r/dataisbeautiful
https://redd.it/zms0td


Easy to build and high-end visualizations for Google Slides and Notion

Hello community,

For all those who, like me, were struggling to build visualizations in Google Slides or simply felt they lacked the high-quality charts they needed, Rollstack has created a simple and powerful charting and visualization tool for Google Slides and Notion.

Its users, mostly analytics, strategy, bizops, finance, marketing, and sales teams, especially enjoy the massive time savings when building charts, slides, and documents.

Here's a short product demo.

What is your current experience building charts in Google Slides and Notion? Let me know.

/r/visualization
https://redd.it/zmjfvp


[OC] Fast fashion companies add new items to their sites all the time. Shein is the worst, with 60,000 new items each month.

/r/dataisbeautiful
https://redd.it/zmiezz


WLB suddenly turned toxic

Everything was nice and the WLB was good. Then the company was acquired by a larger fish and the WLB got much worse. I am currently on a project where the manager expects us to work all day long. He himself works until 2-3 AM. I don't even want to know why or how; that will always be a mystery.

He keeps saying this is critical and has a tight deadline.
I wish I could just say f### this criticality and these tight deadlines. I can't work 12-14 hours every day and exhaust myself. My vision literally blurs and I get severe headaches after the 10th or 11th hour.

This has been going on for 3 weeks straight now, and he keeps telling us we need to pick up the pace and match his level of speed and commitment. He literally asks us to be "robots" and keep working.

I did hear of past employees changing teams because of his way of working.

PS: There is a time difference of 6 hours, and he keeps messaging us on MS Teams for updates and asks for calls and updates at midnight!!!

/r/datascience
https://redd.it/zmr3an


Areas Under Arab control at one point or another in Europe

/r/MapPorn
https://redd.it/zmojg8


Global Distribution of Penguins

/r/MapPorn
https://redd.it/zmbf0m


The symmetry of the orbits of a double pendulum, dropped from rest, at every initial angle. [OC]

/r/mathpics
https://redd.it/zlh45q


[P] Image search with localization and open-vocabulary reranking.

TL;DR

Image search with open vocabulary localization using both index and search time methods.

Article (no paywall): https://medium.com/@jesse_894/image-search-with-localization-and-open-vocabulary-reranking-using-marqo-yolox-clip-and-owl-vit-9c636350bf66?source=friends_link&sk=b4e94d9d4095a2b8b60c5d1904a60825

Markdown: https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchLocalization/article.md

Code: https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchLocalization/index_all_data.py

I wanted a few options for getting localization into image search (at index time and at search time). I immediately thought of using a region proposal network (RPN) from Mask R-CNN to create patches that can also be indexed and searched (which adds the localisation). I figured it might be somewhat agnostic to classes. I did not want to use mmdetection or detectron2 because of their dependencies, and pulling them in just for the RPN was not worth it. I was encouraged by the PyTorch native implementations of detection/segmentation models but ended up finding YOLOX the best.

I also implemented one based on the self-attention maps from DINO-trained ViTs. This worked pretty well when the attention maps were combined with some traditional computer vision to get bounding boxes. It seemed an OK compromise between domain specialization and location specificity. I did not try any saliency or gradient-based methods, as I was not sure about generalization and speed, respectively. I know LAVIS has an implementation of Grad-CAM, and it seems to work well in the plug'n'play VQA.

For the indexing I cropped the images based on the proposed bounding boxes. I did not test blending methods but feel this might be better as more context can be in the image. If anyone has a perspective on this I would love to hear it.
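As a rough illustration of the index-time cropping step, the sketch below cuts one patch per proposed bounding box and keeps the box next to the patch, so that a later search hit on a patch can be mapped back to a location. It is only a sketch under assumptions: the image is a plain nested list of pixel values, boxes are `(x_min, y_min, x_max, y_max)` with exclusive maxima, and `clamp_box` and `crop_patches` are hypothetical helper names, not functions from Marqo or the linked code.

```python
from typing import List, Sequence, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max), max values exclusive.
Box = Tuple[int, int, int, int]


def clamp_box(box: Box, width: int, height: int) -> Box:
    """Clip a proposed box to the image bounds."""
    x0, y0, x1, y1 = box
    return (max(0, x0), max(0, y0), min(width, x1), min(height, y1))


def crop_patches(image: Sequence[Sequence[int]],
                 boxes: Sequence[Box]) -> List[Tuple[Box, List[List[int]]]]:
    """Return one (clipped box, patch) pair per box that overlaps the image."""
    height = len(image)
    width = len(image[0]) if height else 0
    patches = []
    for box in boxes:
        x0, y0, x1, y1 = clamp_box(box, width, height)
        if x1 <= x0 or y1 <= y0:
            continue  # box lies outside the image or has no area
        patch = [list(row[x0:x1]) for row in image[y0:y1]]
        patches.append(((x0, y0, x1, y1), patch))
    return patches


if __name__ == "__main__":
    # A toy 4x6 "image" whose pixel value encodes its row and column.
    image = [[r * 10 + c for c in range(6)] for r in range(4)]
    for box, patch in crop_patches(image, [(1, 1, 3, 3), (4, 2, 10, 10)]):
        print(box, patch)
```

In a real pipeline the patches would come from an image library and each one would be embedded and indexed together with its box; the toy nested list here only keeps the example dependency-free.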

For localisation at search time I ended up using OWL-ViT. This worked really well. I did not try Detic or CLIPseg but would be interested to hear if anyone else has tried these?

/r/MachineLearning
https://redd.it/zmigt1


What is Apache Arrow? by pandas creator Wes McKinney
https://youtu.be/DTqGMRYcEt0

/r/bigdata
https://redd.it/zl19xa


Sexual Racism Experienced by Asian American LGBTQ+ Men in Online Dating (Asian-American & Pacific Islander men who have sex with men)

Open to all Asian American & Pacific Islander men who have sex with men (including transmasculine individuals and those who are nonbinary but use man/male as an identifier) and have experience with online dating.

Survey Link: https://umassboston.co1.qualtrics.com/jfe/form/SV_bpvn9N6m3FdAiPQ

This study is sponsored by the UMass Boston Department of Psychology and supervised by Dr. David Pantalone. Our LGBTQIA+ affirmative research lab is dedicated to advancing the health of LGBTQIA+ communities. The study has been approved by the UMass Boston Institutional Review Board (IRB#2021228). For any questions, please contact the principal investigator, Christopher Chiu, at cchiu.umbstudy@gmail.com.

/r/SampleSize
https://redd.it/zni7jo


Order 4 projective plane with parabolas

/r/mathpics
https://redd.it/zhfbeg


If you've held Bitcoin for five years, you're now sitting on a negative return [OC]

/r/dataisbeautiful
https://redd.it/znpurr


[OC] Top 10 YouTube Channels

/r/dataisbeautiful
https://redd.it/znkp2n


[OC] The US leads the way by a mile in government space budgets, with the Artemis mission sending humans back to the moon for the first time in 50 years

/r/dataisbeautiful
https://redd.it/znb677


ROAD FATALITIES IN EUROPE

/r/MapPorn
https://redd.it/znanhn


[OC] The U.S. spends one third of its tax revenue on its military

/r/dataisbeautiful
https://redd.it/zn2l2l


US States and Canadian Provinces' total sales tax.

/r/MapPorn
https://redd.it/zn72gr


How do you abbreviate cumulative in your feature names?

Not trolling, but I want to know how you all abbreviate "cumulative x" in a feature name. For example, cumulative streamed minutes for the past 7 days could be cum_streamed_min_L7. I feel uncomfortable putting that name in presentations.
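For readers unsure what the feature in question is, it can be read as a trailing 7-day sum of daily streamed minutes. The Python sketch below shows one way to compute it; the function name `trailing_sum` and the sample numbers are made up for illustration and do not come from the post.

```python
from collections import deque


def trailing_sum(values, window=7):
    """For each day, sum that day's value and the previous window-1 values."""
    recent = deque()   # values currently inside the window
    running = 0        # their sum, updated incrementally
    out = []
    for v in values:
        recent.append(v)
        running += v
        if len(recent) > window:
            running -= recent.popleft()  # drop the value that left the window
        out.append(running)
    return out


if __name__ == "__main__":
    # Hypothetical daily streamed minutes for one user.
    daily_minutes = [10, 0, 25, 5, 0, 30, 15, 20, 5]
    print(trailing_sum(daily_minutes))
```

The first six outputs cover fewer than 7 days, so whether to keep or mask them is a separate modelling choice.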

/r/datascience
https://redd.it/zmuchu


[OC] Military expenditure (% of GDP) of the U.S. from 1993 to 2020

/r/dataisbeautiful
https://redd.it/zn2s5f


[D] Trying to find paper about n-grams in early transformer layers

I remember reading a paper a while back that showed early attention layers in a transformer could be replaced with a simpler mechanism since most heads only modeled small n-grams. I think they used some kind of pooling?

Wondering if anyone knows which paper that was and has any thoughts about it since then. Thanks!

/r/MachineLearning
https://redd.it/zmoxp7


shoe color frequency

/r/dataisugly
https://redd.it/zml4e6


Are simulations done by data scientists or someone else?

[Picture of a traffic simulation in Unreal Engine](https://preview.redd.it/qfep38a8rv5a1.png?width=800&format=png&auto=webp&v=enabled&s=8237dce33de4a01dc3f07787f7c88f902aef791f)

I put together a blog post this morning (https://whiteowleducation.substack.com/p/why-are-simulations-the-future-of) that builds on a Reddit discussion from yesterday.

I am genuinely curious though. Are any of you using simulations for your day-to-day work?

I ask because I see reports of the following:

* Nvidia using simulations for weather prediction
* BMW using simulations in order to optimize factory layout
* There have been recent discoveries in Nuclear Fusion, and I have to believe that simulations were used to help set up those experiments.
* I even see traffic Jams being simulated in Unreal Engine

Traffic Jam simulations are even photorealistic.

Long story short, it seems like people are doing simulations, but would this go under the "data science" job title, or is there a different profession that does this kind of work?

/r/datascience
https://redd.it/zlt5nt


[OC] Mean name length of World Cup players

/r/dataisbeautiful
https://redd.it/zm3jm0
