snakers4 | Technologies

Telegram channel snakers4 - Spark in me

2278

Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.


Spark in me

Kaggle launching ... MS/CNN/DS education course
- https://twitter.com/fchollet/status/955538373992591360
- https://www.kaggle.com/learn/overview

Once again, the syllabus looks OK, but I do not really have time to analyze it right now.
Please PM me if you have watched it.
If you are just starting, Andrew Ng's course would be a safer bet.

#data_science
#course


Spark in me

A list of nice-to-read articles (RU)
- Nice article about a credit scoring competition - https://goo.gl/7cy3Y1
- Feature engineering - https://goo.gl/NkoxWQ
- If you are hardware-constrained, an RPi + Movidius stick may work better for inference than an RPi alone - https://goo.gl/HC7Uj8

#data_science
#deep_learning


Spark in me

Like the twitter repost idea?

Meh – 11
👍👍👍👍👍👍👍 52%

Yes – 8
👍👍👍👍👍 38%

No – 2
👍 10%

👥 21 people voted so far.


Spark in me

Soon we will be broadcasting our channel to Twitter (and releasing our code for that)
- https://twitter.com/AlexanderVeysov

This is the first test post.


Spark in me

List of impressive ML projects in 2017
- https://habrahabr.ru/company/cloud4y/blog/346968/

The majority of them are totally impractical, of course =)

#data_science


Spark in me

Following our tweet-sender I had an idea.

Both Twitter and Telegram have APIs and Python bindings.

So why not stream our telegram channel to Twitter? If you want to help us write a class for a $$ reward - please contact me.
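
One small piece of such a class could be cutting a long channel post into tweet-sized chunks. Below is a minimal sketch of that step only; it does not call the Telegram or Twitter APIs. The function name split_for_twitter is hypothetical, and the 280-character default is an assumption about the current tweet limit. Note that it collapses line breaks into single spaces.

```python
def split_for_twitter(text, limit=280):
    """Greedily pack whitespace-separated words into chunks of at most `limit` chars."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit is hard-cut into pieces.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The word does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


post = "aaa bbb ccc ddd eee fff"
chunks = split_for_twitter(post, limit=10)
print(chunks)
```

Each chunk could then be sent as a separate tweet in a thread; links and media would need their own handling.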


Spark in me

A new fast, easy-to-use and efficient clustering algorithm - HDBSCAN. It is really amazing. I am not joking.

Quick links:
- paper https://arxiv.org/pdf/1602.03730.pdf
- library http://hdbscan.readthedocs.io/en

List of plain vanilla algorithms:
- (data) https://goo.gl/6KexoU
- K-Means (https://goo.gl/kjbA1f)
- Affinity Propagation (https://goo.gl/VrX4sy)
- Mean Shift (https://goo.gl/TekyML)
- Spectral Clustering https://goo.gl/RUifoa
- Agglomerative Clustering - https://pypi.python.org/pypi/fastcluster

Newer ones
- DBSCAN (https://goo.gl/DQK2Z3)
-- 2 steps
--- transform the space - points in dense regions are left alone, while points in sparse regions are moved further away
--- apply single linkage clustering to the transformed space, which results in a dendrogram

- HDBSCAN (https://goo.gl/XD4y8T)
-- goal was to allow varying density clusters
-- transform the space according to density (!)
-- single linkage clustering on the transformed space
-- the dendrogram is condensed by viewing splits that result in a small number of points splitting off as points ‘falling out of a cluster’

Key evaluation criteria
- Don’t be wrong!
- Intuitive parameters
- Stability
- Performance

Plain English comparison of different algorithms
- http://hdbscan.readthedocs.io/en/latest/comparing_clustering_algorithms.html
- notebook http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb

Naive benchmarking of different clustering algorithms
- http://hdbscan.readthedocs.io/en/latest/performance_and_scalability.html
- http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations-v0.7.ipynb
- speed bench http://hdbscan.readthedocs.io/en/latest/_images/performance_and_scalability_9_1.png
- huge datasets http://hdbscan.readthedocs.io/en/latest/_images/performance_and_scalability_24_1.png

How HDBSCAN works
- http://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html
- Notebook https://goo.gl/iPy23p
- Transform the space according to the density/sparsity
- Build the minimum spanning tree of the distance weighted graph - https://goo.gl/EmQwd8
- Construct a cluster hierarchy of connected components - https://goo.gl/oHZo2W
- Condense the cluster hierarchy based on minimum cluster size - https://goo.gl/awSXjC
- Extract the stable clusters from the condensed tree
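
The first two steps above (transforming the space and building the minimum spanning tree) can be sketched in plain Python. This is a toy O(n^2) illustration, not the hdbscan library's implementation. It assumes the mutual reachability distance max(core_k(a), core_k(b), d(a, b)) described in the library docs; the function names, the sample points and k=2 are made up for the example.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (itself excluded)."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores


def mutual_reachability(points, k):
    """Step 1: sparse points get pushed away, dense points stay where they are."""
    cores = core_distances(points, k)
    n = len(points)
    return [
        [
            0.0 if i == j
            else max(cores[i], cores[j], math.dist(points[i], points[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


def minimum_spanning_tree(weights):
    """Step 2: Prim's algorithm on a dense weight matrix, returns (i, j, w) edges."""
    n = len(weights)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        w, i, j = min(
            (weights[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j, w))
        in_tree.add(j)
    return edges


points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),  # dense blob
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),        # second blob
    (5.0, 30.0),                                     # isolated point
]
mst = minimum_spanning_tree(mutual_reachability(points, k=2))
heaviest = max(mst, key=lambda edge: edge[2])
print(heaviest)
```

Sorting the MST edges by weight and removing them from heaviest to lightest gives the cluster hierarchy of the next step; here the heaviest edge is the one attaching the isolated point.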

HDBSCAN also has a notion of soft clustering with a custom cluster distance that works for oddly shaped clusters
- http://hdbscan.readthedocs.io/en/latest/soft_clustering_explanation.html
- http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/How%20Soft%20Clustering%20for%20HDBSCAN%20Works.ipynb

#data_science
#clustering


Spark in me

DigitalOcean just improved its pricing plans (more storage and RAM). It is the best modern VDS provider.

Transferring your application literally takes 2 minutes
- https://goo.gl/AtVLns

#internet


Spark in me

Fast.ai about SF culture
- http://www.fast.ai/2018/01/08/startups/

Nice article about remote working
- https://hackernoon.com/the-stress-of-remote-working-38be5bdcf4da


Spark in me

A subscriber asked about being efficient in the comments under the end-of-2017 article.

Here is my rant in reply
https://spark-in.me/post/plain-efficiency

Let the war in comments begin.

#philosophy


Spark in me

Interesting links / news / reports / data

Technology
- TVs and household items being replaced by smartphones => good for ecology and resources - https://goo.gl/3nw15t
- Once again - Meltdown + Spectre - https://goo.gl/fNrZGV


Internet
- Ben Evans - https://goo.gl/usr11B
- Amazon business structure - https://goo.gl/YKAB9F - hundreds of separate business units
- Uber management planning to sell shares - https://goo.gl/yJMqgc
- Google sold 6M smart speakers in 2017 - https://goo.gl/TVnSyY
- Amazon will use Alexa ... for ads - https://goo.gl/tS3gTU
- Facebook vs fake news https://goo.gl/mabfp6
- Dark side of the Internet - moderation - https://goo.gl/gBcyXx

Mobile
- Apple cripples 3rd party AdTech - https://goo.gl/QdpWwX
- Stats about Facebook chat app - https://newsroom.fb.com/news/2017/12/messengers-2017-year-in-review/
- In the USA, Instagram is dominated by bra commercials - https://goo.gl/Ch7ipB
- Dating apps kill gay bars - https://goo.gl/qyTTk9
- App store 2017 YoY +30% revenue growth - https://goo.gl/xQFBxz
- 50%+ households in the USA are wireless only - https://goo.gl/WUXNRY

ML / DS
- If you have not seen WaveNet speech generation examples go here - https://goo.gl/kbjWXJ
- Apple Maps vs Google Maps - https://goo.gl/yMNth3
-- Looks like Google is constantly using some processing and ML to enhance its maps
-- 3D buildings, small buildings, areas of interest etc
-- Timeline http://prntscr.com/i0kf4x
- Solid state LIDARs will be much cheaper - https://goo.gl/YZomWc
- Creepy ML - Google street images => car models => predictions about race / income / job per household / address / zip-code - https://goo.gl/mTXyW5
- An astronomer shared his experience after spending 3 years getting a Data Science degree - https://goo.gl/KgTmNp

#digest


Spark in me

https://youtu.be/YuIIjLr6vUA


Spark in me

Interesting datasets from Kaggle

Predict breast cancer from slide images
https://goo.gl/rDxrpZ

High quality academic dataset of 26k images of 41 fruits
https://goo.gl/JLWvLD


Gorgeous illustration of different network algorithms
https://goo.gl/z7oori

Crowd-sourced translation of parallel sentence pairs
https://goo.gl/7ky8Vw


5 years of hourly weather data for 36 cities
https://goo.gl/jjkRSq

#data_science
#datasets


Spark in me

A small hack for using multi-line python CLI commands via bash.

Just paste your long python command into script.sh:
python3 train_satellites.py \
--arch linknet34 --batch-size 16 \
--imsize 320 --preset mul_urban --augs True \
--workers 6 --epochs 30 --start-epoch 0 \
--seed 42 --print-freq 50 \
--lr 1e-4 --optimizer adam \
--tensorboard True --tensorboard_images True --lognumber test

Then just:
sh script.sh
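
On the Python side, flags like these are typically read with argparse. The sketch below is hypothetical: train_satellites.py itself is not shown here, so the parser, its defaults and the str2bool helper are assumptions for illustration. It covers only a few of the flags and shows one common pitfall with values such as "--augs True": with type=bool, any non-empty string (including "False") parses as True, so an explicit converter is safer.

```python
import argparse


def str2bool(value):
    """Convert 'True'/'False'-style strings to bool (type=bool would not)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    # Hypothetical subset of the flags used in the command above.
    parser = argparse.ArgumentParser(description="hypothetical training CLI")
    parser.add_argument("--arch", default="linknet34")
    parser.add_argument("--batch-size", type=int, default=16)
    parser.add_argument("--epochs", type=int, default=30)
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--augs", type=str2bool, default=False)
    return parser


# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(
    ["--arch", "linknet34", "--batch-size", "16", "--lr", "1e-4", "--augs", "True"]
)
print(args)
```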
#data_science


Spark in me

https://youtu.be/_BPJFFkxSbw


Spark in me

Tested bcolz on a simple task: how fast can it process 1M feature vectors of shape (1, 3476) from a CNN? It also looks like it provides 2-3x compression straight out of the box. Nice.

Blazingly fast!
- https://goo.gl/z1MKmH

#data_science


Spark in me

2017 DS/ML digest 1

Have not done digests for quite some time =)

1. Annual digests
1.1 Google Brain: part one - https://goo.gl/VQhZmP, part two - https://goo.gl/XkTRhp
Highlights
- Speech generation https://goo.gl/MEDv7M
- Speech recognition https://goo.gl/tCEkVz
- Auto ML https://goo.gl/fx2FuP
-- NASNET - https://goo.gl/becAET

1.2 Posted before, but the WildML 2017 summary is also awesome - https://goo.gl/ZFtFVT

2. Datasets
→ YouTube-8M (https://goo.gl/nyP9gp): >7 million YouTube videos annotated with 4,716 different classes
→ YouTube-Bounding Boxes (https://goo.gl/c3K6YY): 5 million bounding boxes from 210,000 YouTube videos
→ Speech Commands Dataset (https://goo.gl/TWsTi8): thousands of speakers saying short command words
→ AudioSet (https://goo.gl/TVA3LJ): 2 million 10-second YouTube clips labeled with 527 different sound events
→ Atomic Visual Actions (AVA) (https://goo.gl/Ba4U73): 210,000 action labels across 57,000 video clips
→ Open Images (https://goo.gl/2Xj8Xd): 9M creative-commons licensed images labeled with 6000 classes
→ Open Images with Bounding Boxes (https://goo.gl/qRkvMy): 1.2M bounding boxes for 600 classes
→ QuickDraw dataset (https://goo.gl/FSsfYm)

3. Uber on a genetic approach to neural networks - https://eng.uber.com/deep-neuroevolution/

#digest
#data_science
#deep_learning
#machine_learning


Spark in me

@vote Like the twitter repost idea?


Spark in me

Just found out about Facebook's fastText
- https://github.com/facebookresearch/fastText

Seems to be really promising

#data_science
#nlp


Spark in me

Wine 3.0 - the third major version of the Windows system call emulator has been released. This system underlies the ports of most old games to Mac and Linux. As of today it is also the only decent way to run Microsoft Office and the latest Photoshop on Linux. I am always amazed that the team has had the enthusiasm to keep developing this product for more than ten years now. My big congratulations to the team!

This release brings nearly complete compatibility with the base levels of DirectX/3D 11 and Android support.

https://www.winehq.org/news/2018011801


Spark in me

Nice presentation to learn about Semantic Segmentation
http://slides.com/vladimiriglovikov/title-texttitle-text/fullscreen#/0/5
https://www.youtube.com/watch?v=MYp3OwkiJAs

#data_science
#deep_learning


Spark in me

Internet digest
- Ben Evans - https://goo.gl/Cymhkf
- New post about chain effects in retail / TV / technology - https://goo.gl/gwuynK
- 39M smart speakers in the US https://goo.gl/nkvUc4
- US$1bn ticketing IPO in China - https://goo.gl/Zt1CmZ

Social Media
- FB updates its news feed algorithm to promote content you are more likely to interact with
https://newsroom.fb.com/news/2018/01/news-feed-fyi-bringing-people-closer-together/

Trivia
- Magnetic disks work after 30 years - https://goo.gl/oWoaWi
- Self-driving cars are being DEPLOYED for the SECOND time, in a district of retired people - https://goo.gl/AKowqX

#internet
#digest


Spark in me

TF speech competition ended.
- https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/leaderboard
In my opinion it was a very interesting domain, but on day one it was apparent that there was a public repo with 87% accuracy. So I guess 90% is a decent improvement, but judging by the team sizes it is just stacking. Also, in such competitions there is no chance of winning money. Also also - this was just blatant TF marketing.

New competitions
- https://www.kaggle.com/c/data-science-bowl-2018
-- This year it sucks - small prizes, small data, it will be just stacking 100 Unets =( Last year I was too inexperienced to participate =(

- https://goo.gl/qXPUoG - Intel Movidius competition. It also sucks - because you have to use only limited types of hardware and software. Basically this is a marketing campaign

#data_science
#competitions


Spark in me

Next hobby project?

Something more community related - like making a pack of ML-themed stickers – 3
👍👍👍👍👍👍👍 75%

Satellite imaging => roads => road graphs – 1
👍👍 25%

GANs for specific domain search problem
▫️ 0%

Your idea (message me)
▫️ 0%

👥 4 people voted so far.


Spark in me

Amazing spoofers
- looks like they scrape domains periodically
- looks like they know basic domain registration timelines
- I chose to protect my whois information => they scrape websites for emails
- https://pics.spark-in.me/upload/26e127ea9c9aed8d004528f1d599defe.jpg

#internet


Spark in me

Interesting - somebody is developing linear regression bindings for PyTorch
- https://twitter.com/i/web/status/950712702325936128

#data_science


Spark in me

Now I know how to make my remote learning rig perfect - add a hardware reboot watchdog
https://aminux.wordpress.com/2018/01/12/usb-watchdogs-opendev-vs-bitroleum/

#hardware


Spark in me

A nice post with ML predictions for 2018
- https://blog.goodaudience.com/ai-in-2018-for-researchers-8955df0caaf9

#data_science


Spark in me

Trick for image preprocessing - histogram equalization
- http://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_equalize.html
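
For intuition, the classic CDF-based version of this trick can be sketched in plain Python. This is not scikit-image's implementation, just a minimal illustration for an 8-bit grayscale image stored as nested lists; the function name and the tiny sample image are made up.

```python
def equalize(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    flat = [v for row in image for v in row]

    # Histogram and cumulative distribution of intensity values.
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)

    n = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]

    # Map each level so that the cumulative distribution becomes roughly linear.
    lut = [
        round(max(c - cdf_min, 0) / (n - cdf_min) * (levels - 1))
        for c in cdf
    ]
    return [[lut[v] for v in row] for row in image]


# Low-contrast image: all values sit in the narrow range 50..53.
image = [[50, 50, 51], [52, 52, 53]]
result = equalize(image)
print(result)
```

After equalization the same pixels span the full 0..255 range, which is the contrast stretch the linked example shows on real photos.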

#cv


Spark in me

A US$1 million prize, US-citizens-only Kaggle challenge ... for just stacking ResNets?
- https://www.kaggle.com/c/passenger-screening-algorithm-challenge/discussion/45805

America is fucked up bad...

Also notice the shake-up and top scores
- Public https://goo.gl/2utoDC
- Private https://goo.gl/GXpnWe

#data_science
#sick_sad_worlds
