☭ Open Source Collective
⛓ Read between the lines ... only people from more equal countries will be accepted.
⚠️ Also kind of unsettling ... that the majority of OSC community supported FOSS projects (most of them from OECD countries) ... cannot appoint 2 maintainers.
☭ This is just hilarious that they have to outdo themselves in double-speak to make the FOSS community put on their own chains.
ML Data Fair Use
... is still a grey zone, and a form of Western exploitation, molding the whole world to their image / system / etc:
By offering Copilot as an alternative interface to a large body of open-source code, Microsoft is doing more than severing the legal relationship between open-source authors and users. Arguably, Microsoft is creating a new walled garden that will inhibit programmers from discovering traditional open-source communities. Or at the very least, remove any incentive to do so. Over time, this process will starve these communities. User attention and engagement will be shifted into the walled garden of Copilot and away from the open-source projects themselves—away from their source repos, their issue trackers, their mailing lists, their discussion boards. This shift in energy will be a painful, permanent loss to open source.
nmslib
... where they get all the accolades possible, but no visible financial benefits. The system just does not function this way.
ML Digest 2022-10
📌 Digest link
📌 This month spotlight goes to:
- The State of AI Report itself
- My Lazy Take on State of AI Report 2022
#digest
Oh capitalism
https://digitstodollars.com/2022/10/27/how-the-might-have-fallen/
https://www.semianalysis.com/p/arm-changes-business-model-oem-partners
Why pay "Open"AI when another company boasts a valuation of 50% of your market cap and publishes a better model ... for free?
Silero VAD V4
📌 Major improvements:
- Major quality improvements;
- Improved performance (2-3x faster under some scenarios);
- Added ONNX support both for 8 kHz and 16 kHz;
AMD HPC Vendor Lock-In
Forgot to drop 3 lines about AMD.
With an R&D budget of roughly 10% of Nvidia's or Intel's, in 2018-2020 they managed to win the power-user desktop CPU market (with their ThreadRippers) - ML / CG / VFX, etc.
Then they proceeded with ThreadRipper PRO, which was a buggy fail, while their enterprise EPYC line was a great success.
And, as usual, with success comes vendor lock-in. New ML hardware platforms from SuperMicro for AMD EPYC processors ... are sold only assembled with A100 or H100 GPUs.
Why? Because you can charge 2-3x markup on "assembled" + trade wars.
Sad but true. Capitalism never changes. They always ride the wave of "enthusiasts and power users" to ultimately betray them and charge 10x for basically the same hardware.
When looking at World Bank, WEF, and consulting-company reports and white papers, I always wondered whether anybody actually reads them.
Here is a possible answer - no:
https://img.washingtonpost.com/blogs/wonkblog/files/2014/05/pdfs.jpg
They do not understand that making content more reachable and SEO-friendly helps long-term. But SEO-friendly websites are usually full of bullshit.
#internet
If after updating your Ubuntu packages your dockerized application suddenly fails to see the GPU(s), then you should migrate to nvidia-docker-2.
Links:
https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
Do not forget to read the section "Removing nvidia-docker 1.0".
In my case all was solved simply by copy-pasting their commands:
# https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
# migrate to NVIDIA docker 2
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge nvidia-docker
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
# testing
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
#linux
Kaggle stats for 2017
- http://blog.kaggle.com/2018/01/22/reviewing-2017-and-previewing-2018/
I literally choked when I read this:
- $1.5MM competition with TSA to identify threat objects from body scans
- this was a competition where only US citizens were granted prizes => 10 stacked resnets won
- $1.2MM competition with Zillow to improve the Zestimate home valuation algorithm - this has 2 stages, first stage prize was US$50k
- $1MM competition with NIH and Booz Allen to diagnose lung cancer from CT scans - this one was really great - but I did not know much back then, it was early 2017 =(
Also, I am not a great data scientist per se, but just comparing the amount of cringe and poorly designed train/test splits - DoubleData is much better than Kaggle in terms of data SCIENCE.
#data_science
A couple more articles about idiomatic pandas
- https://tomaugspurger.github.io/modern-4-performance
- https://tomaugspurger.github.io/modern-5-tidy
What was useful for me
- Faster reading of the dataframes
- Stack, unstack
- Melt
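The reshaping tools above can be sketched on a toy frame (data and column names here are hypothetical, just to show the round trip):

```python
import pandas as pd

# Toy wide table: one row per city, one column per year
wide = pd.DataFrame({"city": ["Moscow", "Berlin"],
                     "2016": [10, 20],
                     "2017": [30, 40]})

# melt: wide -> long, one row per (city, year) pair
long = wide.melt(id_vars="city", var_name="year", value_name="value")

# stack/unstack move levels between the row index and the columns
series = long.set_index(["city", "year"])["value"]
wide_again = series.unstack("year")  # back to one column per year
```

The long form is usually what groupby/plotting wants; unstack gets you back to the wide layout.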
#data_science
#pandas
A new interesting competition on topcoder
- https://goo.gl/ix7xpx
At least at first glance)
#data_science
#deep_learning
So following our tweetsender (https://goo.gl/uqGRRA) (which still has some bugs), github.com/borsik helped us create a high-level Telegram-to-Twitter bot:
- Code https://github.com/snakers4/telegram2twitter
- Twitter channel https://twitter.com/AlexanderVeysov
Effectively, this posts our channel to Twitter. Please star / use the code if you need it.
#data_science
Choking Off China’s Access to the Future of AI
New U.S. Export Controls on AI and Semiconductors Mark a Transformation of U.S. Technology Competition with China
https://www.csis.org/analysis/choking-chinas-access-future-ai
PS
Wake up Mr. Anderson
I like the American arrogance: kind of assuming that Taiwan is not China, that sanctions against Russia are very effective, and that all of these actions are lawful and not an act of open warfare.
The good thing is that the longer they are in denial, the easier it is for the rest of the world to shake off the "chokeholds".
PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips
- torchaudio seems to be kind of frozen
- CUDA 10.2 and 11.3 deprecated
- BetterTransformer
- import functorch without installing another package
- Native builds for Apple's new M1 chip as a beta feature
- Python 3.11 preview
- Default ONNX opset updated to 14
Looks like they are trimming the fat since the migration to the Linux foundation.
Links:
- https://pytorch.org/blog/PyTorch-1.13-release/
- https://github.com/pytorch/pytorch/releases/tag/v1.13.0
Caliptra – First Open-Source Silicon Going Into All Datacenter Chips
https://www.semianalysis.com/p/caliptra-first-open-source-silicon
The first instance of the open-source revolution finally coming to silicon ...
I like the first comment:
> This seems very anti-open-source if it's meant to stop the owners of hardware from running their own code on it, which appears to be the case. Earlier versions of that are why the GPLv3 was written
What is amazing about TF and the CUDA / cuDNN drivers is that the documentation is not updated when newer versions are released - and they keep changing library file names, which is annoying af.
Arguably Google and Nvidia are the richest companies in the whole DS stack - but their documentation is the worst of them all.
So if you are updating your docker container and libraries suddenly start producing weird errors - look for compatibility guidelines like this one - https://goo.gl/cF3Swy
Of course the docs and release notes will have no mention of this. Because Google.
Also docker hub contains all versions of CUDA+cuDNN packaged, which helps
- https://hub.docker.com/r/nvidia/cuda/
PS
Pytorch has all this embedded into their official repo list
- http://prntscr.com/i6nfsl
Google, why do you make us suffer?
#deep_learning
Best link about convolution arithmetic
- https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
#deep_learning
New dimensionality reduction technique - UMAP
- https://github.com/lmcinnes/umap
I will write more as I test it / learn more.
Works well with HDBSCAN and CNNs I guess
- https://goo.gl/9hYAXL
Usage examples
- https://goo.gl/QuYWJF
#data_science
Playing with HDBSCAN in practice.
What I learned: if you have a dense feature vector, i.e. 1000-5000+ dimensions, then you should apply PCA before running HDBSCAN.
Their scalability how-to (https://goo.gl/iR9HQu) does all the benchmarks on 10-dimension vectors. In practice, anything above 50-100 dimensions hit some kind of bottleneck - memory consumption was low, CPU consumption was also low - but pretty much nothing happened for hours.
Also, if you want large clusters and set (https://goo.gl/eikRy4) the min_samples value to >> 100, there will be a memory explosion due to some kind of caching issue. So if your cluster size should be 5000+, you are compelled to use min_samples ~ 100.
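The PCA-first step above can be sketched with sklearn (synthetic data; the hdbscan call is left commented out because it is a separate package, and the min_cluster_size / min_samples values just mirror the numbers above):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2000))  # dense 2000-dim feature vectors (synthetic)

# Reduce to ~50 dims first; in my experience HDBSCAN stalls above ~50-100 dims
X_reduced = PCA(n_components=50).fit_transform(X)

# import hdbscan  # separate package: pip install hdbscan
# labels = hdbscan.HDBSCAN(min_cluster_size=5000, min_samples=100).fit_predict(X_reduced)
```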
#data_science
Key / classic CNN papers
ShuffleNet
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- a small resnet-like network that uses pointwise group convolutions, depthwise separable convolutions, and a channel shuffle layer
- authors - Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun
- paper - http://arxiv.org/abs/1707.01083
- key
-- on ARM devices, 13x faster than AlexNet
-- lower top1 error than MobileNet at 40 MFLOPs
- comparable to small versions of NASNET
- 2 ideas
-- use depthwise convolutions for the 3x3 layers and grouped convolutions for the 1x1 layers
-- use a shuffle layer (reshape into groups, transpose, flatten back to the original dimension)
- illustrations
-- shuffle idea - https://goo.gl/zhTV4E
-- building blocks - https://goo.gl/kok7bL
-- vs. key architectures https://goo.gl/4usdM9
-- vs. MobileNet https://goo.gl/rGoPWX
-- actual inference on mobile device - https://goo.gl/X6vbnd
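The shuffle layer described above fits in a few lines of numpy (a toy sketch of the idea, not the paper's code):

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """ShuffleNet-style channel shuffle on an NCHW tensor:
    reshape channels into (groups, C // groups), swap the two
    group axes, then flatten back to C channels."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channels must be divisible by groups"
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # interleave channels across groups
    return x.reshape(n, c, h, w)
```

For 8 channels and 2 groups the channel order [0..7] becomes [0, 4, 1, 5, 2, 6, 3, 7] - information now flows between the groups of the preceding grouped convolution.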
#deep_learning
#data_science
For new (!) people on the channel:
- This channel is a practitioner's channel on the following topics: internet, data science, math, deep learning, philosophy
- Focus is on data science
- Don't get bent out of shape if your opinion differs. You are welcome to contact me via telegram @snakers41 and email - aveysov@gmail.com
- No bs and ads
Give us a rating:
- /channel/tchannelsbot?start=snakers4
Donations
- Buy me a coffee https://buymeacoff.ee/8oneCIN
- Direct donations - https://goo.gl/kvsovi - 5011673505 (paste this agreement number)
- Yandex - https://goo.gl/zveIOr
Our website
- http://spark-in.me
Our chat
- https://goo.gl/IS6Kzz
DS courses review
- http://goo.gl/5VGU5A
- https://spark-in.me/post/learn-data-science
GAN papers review
- https://spark-in.me/post/gan-paper-review
Internet Digest
- Ben Evans - https://goo.gl/TPyLoD
- Youtube tightening moderation screws for small channels - https://goo.gl/SHpC2h
- Camera strapped to plane - https://vimeo.com/240106846
- Guardian online getting profitable - https://goo.gl/CDpNFb
- Amazon testing a shop without cashiers - you just take goods and walk out - https://goo.gl/hvh63Z
- Drone saving a drowning person - https://goo.gl/RdGYDx
Heh
- This will go down great with Russian "ko-ko-ko" devs and the culture of "shitting on everything" that reigns in our IT - https://goo.gl/S5poqv
#internet
#digest
Pytorch in a year review
http://pytorch.org/2018/01/19/a-year-in.html
#deep_learning