snakers4 | Technologies

Telegram channel snakers4 - Spark in me

2278

Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.


Spark in me

Open Source Collective

⛓ Read between the lines ... only people from more equal countries will be accepted.

⚠️ Also kind of unsettling ... that the majority of OSC community supported FOSS projects (most of them from OECD countries) ... cannot appoint 2 maintainers.

☭ It is just hilarious that they have to outdo themselves in double-speak to make the FOSS community put on their own chains.


Spark in me

ML Data Fair Use

... is still a grey zone and a form of Western exploitation, molding the whole world in their image / system / etc:

By offering Copilot as an alternative interface to a large body of open-source code, Microsoft is doing more than severing the legal relationship between open-source authors and users. Arguably, Microsoft is creating a new walled garden that will inhibit programmers from discovering traditional open-source communities. Or at the very least, remove any incentive to do so. Over time, this process will starve these communities. User attention and engagement will be shifted into the walled garden of Copilot and away from the open-source projects themselves—away from their source repos, their issue trackers, their mailing lists, their discussion boards. This shift in energy will be a painful, permanent loss to open source.


https://githubcopilotinvestigation.com/?mc_cid=b1a21b7286

Unless you have some corporate / deep pocket sponsors and / or you are doing something truly unique, there is LESS AND LESS incentive to publish FOSS.

I like the story of libraries like nmslib ... where they get all the accolades possible, but no visible financial benefits. The system just does not function this way.


Spark in me

ML Digest 2022-10

📌
Digest link

📌 This month's spotlight goes to:

- The State of AI Report itself
- My Lazy Take on State of AI Report 2022

#digest


Spark in me

Oh capitalism

https://digitstodollars.com/2022/10/27/how-the-might-have-fallen/

https://www.semianalysis.com/p/arm-changes-business-model-oem-partners


Spark in me

Why pay "Open"AI when another company boasts a valuation of 50% of your market cap and publishes a better model ... for free?


Spark in me

Silero VAD V4

📌 Major improvements:

- Major quality improvements;
- Improved performance (2-3x faster under some scenarios);
- Added ONNX support both for 8 kHz and 16 kHz;


Spark in me

AMD HPC Vendor Lock-in

Forgot to drop 3 lines about AMD.

With an R&D budget about 10% of Nvidia's or Intel's, in 2018-2020 they managed to win the CPU market of PC power users (ML / CG / VFX etc) with their ThreadRippers.

Then they proceeded with ThreadRipper PRO, which was a buggy fail, while their enterprise EPYC line was a great success.

And as usual, with success comes vendor lock-in. New ML hardware platforms from SuperMicro for AMD EPYC processors ... are sold only assembled with A100 or H100 GPUs.

Why? Because you can charge 2-3x markup on "assembled" + trade wars.

Sad but true. Capitalism never changes. They always ride the wave of "enthusiasts and power users" to ultimately betray them and charge 10x for basically the same hardware.


Spark in me

https://youtu.be/spUNpyF58BY


Spark in me

https://youtu.be/SHTOI0KtZnU


Spark in me

When looking at WorldBank, WEF and some consulting company reports and white-papers I always wondered if anybody reads them.

Here is a possible answer - No
https://img.washingtonpost.com/blogs/wonkblog/files/2014/05/pdfs.jpg

They do not understand that making content more reachable and SEO-friendly helps long-term. But SEO-friendly websites are usually full of bullshit.

#internet


Spark in me

If after updating your Ubuntu packages your dockerized application suddenly fails to see the GPU(s), then you should migrate to nvidia-docker-2.

Links:
https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)

Do not forget to read the section "Removing nvidia-docker 1.0".
In my case all was solved simply by copy-pasting their commands:

# https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
# migrate to NVIDIA docker 2

# remove all containers that use nvidia-docker 1.0 volumes, then purge the old package
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge nvidia-docker

# add the NVIDIA package repository (this list file is for Ubuntu 16.04)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# install nvidia-docker2 and make the Docker daemon reload its configuration
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd

# testing
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

#linux
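
If you want the application itself to report whether it sees a GPU, something like the sketch below may help. It is not from the NVIDIA docs: the function name `gpu_visible` is made up for illustration, and it only assumes that `nvidia-smi` is on the PATH inside the container when the NVIDIA runtime is working.

```python
import shutil
import subprocess


def gpu_visible(timeout_s: float = 10.0) -> bool:
    """Return True if `nvidia-smi -L` runs and lists at least one GPU.

    Hypothetical helper: a missing binary, a non-zero exit code or a
    timeout are all treated as "no GPU visible".
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The binary is not on PATH (e.g. the container was started
        # without the NVIDIA runtime).
        return False
    try:
        result = subprocess.run(
            [exe, "-L"],  # -L lists the GPUs, one per line
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU " in result.stdout


if __name__ == "__main__":
    print("GPU visible:", gpu_visible())
```

Calling this at container start-up and failing fast is usually easier to debug than a framework error much later.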


Spark in me

Kaggle stats for 2017
- http://blog.kaggle.com/2018/01/22/reviewing-2017-and-previewing-2018/

I literally choked when I read this:
- $1.5MM competition with TSA to identify threat objects from body scans
- this was a competition where only US citizens were granted prizes => 10 stacked resnets won
- $1.2MM competition with Zillow to improve the Zestimate home valuation algorithm - this has 2 stages, first stage prize was US$50k
- $1MM competition with NIH and Booz Allen to diagnose lung cancer from CT scans - this one was really great - but I did not know much back then, it was early 2017 =(

Also I am not a great data scientist per se, but just comparing the amount of cringe and shitty train/test splits - DoubleData is much better than Kaggle in terms of data SCIENCE.

#data_science


Spark in me

A couple more articles about idiomatic pandas
- https://tomaugspurger.github.io/modern-4-performance
- https://tomaugspurger.github.io/modern-5-tidy

What was useful for me
- Faster reading of the dataframes
- Stack, unstack
- Melt
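
A toy illustration of melt / stack / unstack, in case the names do not ring a bell. The dataframe and its column names are made up and are not taken from the linked articles.

```python
import pandas as pd

# A small "wide" table: one row per city, one column per year (toy data).
wide = pd.DataFrame({"city": ["A", "B"], "2016": [1, 2], "2017": [3, 4]})

# melt: wide -> long ("tidy"), one row per (city, year) observation.
long = wide.melt(id_vars="city", var_name="year", value_name="value")

# stack: move the column labels into the innermost index level,
# giving a Series indexed by (city, year).
stacked = wide.set_index("city").stack()

# unstack: the inverse, the innermost index level goes back to columns.
restored = stacked.unstack()

if __name__ == "__main__":
    print(long)
    print(stacked)
    print(restored)
```

Roughly: melt works on columns and gives back a flat frame, while stack / unstack move labels between the column index and the row index.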

#data_science
#pandas


Spark in me

A new interesting competition on topcoder
- https://goo.gl/ix7xpx

At least at first glance)

#data_science
#deep_learning


Spark in me

So following our tweetsender (https://goo.gl/uqGRRA) (which still has some bugs), github.com/borsik helped us create a high-level Telegram-to-Twitter bot:

- Code https://github.com/snakers4/telegram2twitter
- Twitter channel https://twitter.com/AlexanderVeysov

Effectively this posts our channel to Twitter. Please star/use code if you need it.

#data_science


Spark in me

Choking Off China’s Access to the Future of AI

New U.S. Export Controls on AI and Semiconductors Mark a Transformation of U.S. Technology Competition with China

https://www.csis.org/analysis/choking-chinas-access-future-ai

PS

Wake up Mr. Anderson

I like the American arrogance kind of assuming that Taiwan is not China, that sanctions against Russia are very effective and the fact that all of these actions are lawful and not an act of open warfare.

The good thing is that the longer they are in denial, the easier it is for the rest of the world to topple the "chokeholds" over.


Spark in me

Other Digests 2022-10

Link

#digest


Spark in me

PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips

- torchaudio seems to be kind of frozen
- CUDA 10.2 and 11.3 deprecated
- BetterTransformer
- import functorch without installing another package
- Native builds for Apple's new M1 chip as a beta feature
- Python 3.11 preview
- Default ONNX opset updated to 14

Looks like they are trimming the fat since the migration to the Linux Foundation.

Links:

- https://pytorch.org/blog/PyTorch-1.13-release/
- https://github.com/pytorch/pytorch/releases/tag/v1.13.0


Spark in me

Yeah, looks like OEM model won


Spark in me

⬆️ https://habr.com/ru/post/695738/ ⬆️


Spark in me

Caliptra – First Open-Source Silicon Going Into All Datacenter Chips

https://www.semianalysis.com/p/caliptra-first-open-source-silicon

The first instance of the open-source revolution finally coming to silicon ...

I like the first comment:

> This seems very anti-open-source if it's meant to stop the owners of hardware from running their own code on it, which appears to be the case. Earlier versions of that are why the GPLv3 was written


Spark in me

What is amazing about tf and CUDA / cuDNN drivers is that the documentation is not updated when newer versions are released, and they are always changing library file names, which is annoying af.

Arguably Google and Nvidia are the richest companies in the whole DS stack - but their documentation is the worst of all the richest companies.

So if you are updating your docker container and libraries suddenly start producing weird errors - look for compatibility guidelines like this one - https://goo.gl/cF3Swy

Of course the docs and release notes will have no mention of this. Because Google.

Also Docker Hub contains all the versions of CUDA + cuDNN packaged, which helps
- https://hub.docker.com/r/nvidia/cuda/

PS
Pytorch has all this embedded into their official repo list
- http://prntscr.com/i6nfsl

Google, why do you make us suffer?

#deep_learning


Spark in me

Best link about convolution arithmetic
- https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
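
The core relations from that guide fit into a few lines. This is a minimal sketch for one spatial dimension, assuming no dilation and no output padding; the function names are mine, not from the repo.

```python
def conv_output_size(i: int, k: int, s: int = 1, p: int = 0) -> int:
    """Output size of a convolution along one dimension.

    i: input size, k: kernel size, s: stride, p: zero padding per side.
    Implements o = floor((i + 2p - k) / s) + 1.
    """
    if i + 2 * p < k:
        raise ValueError("kernel is larger than the padded input")
    return (i + 2 * p - k) // s + 1


def transposed_conv_output_size(i: int, k: int, s: int = 1, p: int = 0) -> int:
    """Output size of the matching transposed convolution (no output padding).

    Implements o = s * (i - 1) + k - 2p.
    """
    return s * (i - 1) + k - 2 * p


if __name__ == "__main__":
    # A 7x7 kernel with stride 2 and padding 3 on a 224-pixel side.
    print(conv_output_size(224, k=7, s=2, p=3))
    # A 3x3 / stride 2 / padding 1 convolution and its transposed counterpart.
    print(conv_output_size(7, k=3, s=2, p=1))
    print(transposed_conv_output_size(4, k=3, s=2, p=1))
```

Note that the transposed formula recovers the original input size only when (i + 2p - k) is divisible by the stride; otherwise several input sizes map to the same output size.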

#deep_learning


Spark in me

New dimensionality reduction technique - UMAP
- https://github.com/lmcinnes/umap

I will write more as I test it / learn more.

Works well with HDBSCAN and CNNs I guess
- https://goo.gl/9hYAXL

Usage examples
- https://goo.gl/QuYWJF

#data_science


Spark in me

Playing with HDBSCAN in practice.

What I learned. If you have a dense (non-sparse) feature vector with 1000 - 5000+ dimensions, then you should use PCA before using HDBSCAN.

Their scalability how-to (https://goo.gl/iR9HQu) does all the benchmarks on 10-dimensional vectors. In practice anything above 50-100 dimensions hit some kind of bottleneck - the memory consumption was low, the CPU consumption was also low - but pretty much nothing happened for hours.

Also if you want to have large clusters and set (https://goo.gl/eikRy4) the min_samples value to >> 100, then there will be a memory explosion due to some kind of caching issue. So if your cluster size should be 5000+, then you are compelled to use min_samples ~ 100.

#data_science


Spark in me

Last post should have contained "DrivenData". I stand corrected.


Spark in me

Key / classic CNN papers

ShuffleNet

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- a small resnet-like network that uses pointwise group convolutions, depthwise convolutions and a channel shuffle layer
- authors - Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun
- paper - http://arxiv.org/abs/1707.01083
- key
-- on ARM devices 13x faster than AlexNet
-- lower top1 error than MobileNet at 40 MFLOPs
- comparable to small versions of NASNET
- 2 ideas
-- use depth-wise convolutions for 3x3 and group convolutions for 1x1
-- use shuffle layer (reshape channels into groups, transpose, flatten back to original dimension)
- illustrations
-- shuffle idea - https://goo.gl/zhTV4E
-- building blocks - https://goo.gl/kok7bL
-- vs. key architectures https://goo.gl/4usdM9
-- vs. MobileNet https://goo.gl/rGoPWX
-- actual inference on mobile device - https://goo.gl/X6vbnd
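
The shuffle idea is easy to see on plain channel indices. Below is a framework-free sketch (the function name is mine); in a real network the same thing is done with a reshape, a transpose and a flatten on the channel axis of a tensor.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Interleave channels across groups.

    Equivalent to: reshape to (groups, channels_per_group), transpose,
    flatten. After the shuffle each new group contains channels coming
    from every original group.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # "reshape": split the flat channel list into `groups` rows
    grouped = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # "transpose" + "flatten": read the rows column by column
    return [channel for column in zip(*grouped) for channel in column]


if __name__ == "__main__":
    # 6 channels in 3 groups: [0, 1 | 2, 3 | 4, 5]
    print(channel_shuffle(list(range(6)), groups=3))
```

Without this step, stacked group convolutions would only ever see channels from their own group, so information would never mix between groups.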

#deep_learning
#data_science


Spark in me

For new (!) people on the channel:

- This channel is a practitioner's channel on the following topics: internet, data science, math, deep learning, philosophy
- Focus is on data science
- Don't get in a twist if your opinion differs. You are welcome to contact me via telegram @snakers41 and email - aveysov@gmail.com
- No bs and ads

Give us a rating:
- /channel/tchannelsbot?start=snakers4

Donations
- Buy me a coffee https://buymeacoff.ee/8oneCIN
- Direct donations - https://goo.gl/kvsovi - 5011673505 (paste this agreement number)
- Yandex - https://goo.gl/zveIOr

Our website
- http://spark-in.me
Our chat
- https://goo.gl/IS6Kzz
DS courses review
- http://goo.gl/5VGU5A
- https://spark-in.me/post/learn-data-science
GAN papers review
- https://spark-in.me/post/gan-paper-review


Spark in me

Internet Digest
- Ben Evans - https://goo.gl/TPyLoD
- Youtube tightening moderation screws for small channels - https://goo.gl/SHpC2h
- Camera strapped to plane - https://vimeo.com/240106846
- Guardian online getting profitable - https://goo.gl/CDpNFb
- Amazon testing a shop without cashiers - you just take goods and walk out - https://goo.gl/hvh63Z
- Drone saving a drowning person - https://goo.gl/RdGYDx

LOL
- And this will go down really well with Russian "cluck-cluck" devs and the culture of "trashing everything" that reigns in our IT - https://goo.gl/S5poqv


#internet
#digest


Spark in me

Pytorch in a year review
http://pytorch.org/2018/01/19/a-year-in.html

#deep_learning
