[D] Simple Questions Thread
Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
Thanks to everyone for answering questions in the previous thread!
/r/MachineLearning
https://redd.it/140fmf3
TokenMonster Ungreedy ~ 35% faster inference and 35% increased context-length for large language models (compared to tiktoken). Benchmarks included
**From the** [**GitHub**](https://github.com/alasdairforsythe/tokenmonster)**:**
TokenMonster is an ungreedy tokenizer and vocabulary builder, outperforming tiktoken by 35%. In fact, TokenMonster's smallest 24000 vocabulary consistently uses fewer tokens than tiktoken's largest 100256 vocabulary to tokenize the same text. Save the tokens! [See benchmark](https://github.com/alasdairforsythe/tokenmonster/blob/main/benchmark).
Given a text dataset, a vocabulary-size and a maximum-token-length, TokenMonster selects the tokens that optimally represent your dataset at that vocabulary size. It can do this at reasonable speed (within 24 hours) on server hardware, at a cost of around $8. [Prebuilt vocabularies](https://github.com/alasdairforsythe/tokenmonster#prebuilt-vocabularies) are provided, as well as tools to train your own vocabularies & native implementations in Go, Python & Javascript for tokenization and detokenization using the prebuilt or your own vocabularies.
You can [test TokenMonster in your browser here](https://bot.co/tokenmonster/), tokenizing live in native Javascript.
TokenMonster is a novel approach to tokenization with broad-ranging use potential, but its primary motivation is to increase the inference speed and context-length of large language models. By selecting better tokens, text can be represented with 35% fewer tokens than with other modern tokenizing methods, increasing inference and training speed, as well as the length of text that fits in the context, by 35%. The code-optimized tokenizers do even better, [see for yourself](https://bot.co/tokenmonster/).
I also believe that TokenMonster vocabularies will improve the comprehension of Large Language Models. For more details see [The Philosophy of Tokenization](https://github.com/alasdairforsythe/tokenmonster#the-philosophy-of-tokenization).
Features
* Outperforms other tokenization algorithms ([benchmark](https://github.com/alasdairforsythe/tokenmonster/blob/main/benchmark))
* Longer text generation at faster speed
* Selects the optimal vocabulary
* Ungreedy
* Supports UTF-8, UTF-16 and binary
* Successfully identifies words, subwords, common phrases and figures of speech by itself
* Works with HTML tags, sequential spaces, tabs, etc. without wasting context
* Averages 5.5 characters per token
* No GPU needed
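To illustrate what "ungreedy" buys you, here is a minimal sketch that is not TokenMonster's implementation: it contrasts greedy longest-match tokenization with a dynamic-programming segmentation that minimizes the token count. The toy vocabulary, `max_len`, and function names are assumptions chosen for illustration.

```python
def greedy_tokenize(text, vocab, max_len):
    """Greedy longest-match: always take the longest vocab entry at the current position."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no token covers position {i}")
    return tokens


def optimal_tokenize(text, vocab, max_len):
    """Dynamic programming: the fewest tokens that exactly cover the text."""
    n = len(text)
    inf = float("inf")
    best = [0] + [inf] * n   # best[j] = fewest tokens needed for text[:j]
    back = [0] * (n + 1)     # back[j] = start index of the last token ending at j
    for j in range(1, n + 1):
        for length in range(1, min(max_len, j) + 1):
            if text[j - length:j] in vocab and best[j - length] + 1 < best[j]:
                best[j] = best[j - length] + 1
                back[j] = j - length
    if best[n] == inf:
        raise ValueError("text cannot be covered by the vocabulary")
    tokens, j = [], n
    while j > 0:             # walk the back-pointers to recover the segmentation
        tokens.append(text[back[j]:j])
        j = back[j]
    return tokens[::-1]


vocab = {"a", "b", "c", "d", "e", "ab", "abc", "cde"}
greedy = greedy_tokenize("abcde", vocab, max_len=3)    # ['abc', 'd', 'e']
optimal = optimal_tokenize("abcde", vocab, max_len=3)  # ['ab', 'cde']
```

Committing to the longest match ("abc") forces two single-character tokens afterwards, while a slightly shorter first token lets a longer one follow.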
/r/LanguageTechnology
https://redd.it/140evta
PlaNeRF: SVD Unsupervised 3D Plane Regularization for NeRF Large-Scale Scene Reconstruction
By: Fusang Wang, Arnaud Louys, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou
tl;dr: SVD-based plane regularization + SSIM supervision
https://arxiv.org/pdf/2305.16914.pdf
>Neural Radiance Fields (NeRF) enable 3D scene reconstruction from 2D images and camera poses for Novel View Synthesis (NVS). Although NeRF can produce photorealistic results, it often suffers from overfitting to training views, leading to poor geometry reconstruction, especially in low-texture areas. This limitation restricts many important applications which require accurate geometry, such as extrapolated NVS, HD mapping and scene editing. To address this limitation, we propose a new method to improve NeRF’s 3D structure using only RGB images and semantic maps. Our approach introduces a novel plane regularization based on Singular Value Decomposition (SVD), that does not rely on any geometric prior. In addition, we leverage the Structural Similarity Index Measure (SSIM) in our loss design to properly initialize the volumetric representation of NeRF. Quantitative and qualitative results show that our method outperforms popular regularization approaches in accurate geometry reconstruction for large-scale outdoor scenes and achieves SoTA rendering quality on the KITTI-360 NVS benchmark.
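The abstract does not spell out the loss, so this is only a generic sketch of SVD plane fitting, assuming you already have 3D points from a region the semantic map labels as planar (e.g. road). The smallest singular value of the centered points measures how far they deviate from a plane; the function names and the normalization are assumptions, not the paper's formulation. A NeRF training loop would use a differentiable SVD (e.g. in PyTorch) on rendered depth points instead of NumPy.

```python
import numpy as np


def plane_fit_svd(points):
    """Fit a plane to an (N, 3) point set; returns (centroid, unit normal, singular values)."""
    centroid = points.mean(axis=0)
    _, s, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]  # direction of least variance is the plane normal
    return centroid, normal, s


def planarity_loss(points):
    """Smallest singular value relative to their sum: ~0 for coplanar points, larger otherwise."""
    _, _, s = plane_fit_svd(points)
    return s[-1] / s.sum()


rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(200, 2))
flat = np.column_stack([xy, np.zeros(200)])                       # points on z = 0
bumpy = np.column_stack([xy, 0.1 * rng.standard_normal(200)])     # noisy "road" surface

_, normal, _ = plane_fit_svd(flat)
flat_loss = planarity_loss(flat)
bumpy_loss = planarity_loss(bumpy)
```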
https://preview.redd.it/s9vwi4g0103b1.png?width=1408&format=png&auto=webp&v=enabled&s=2e6b4d2ecde7e451efcbe7ab549ca013caa64544
/r/computervision
https://redd.it/13vo6aw
I made a free online text-to-speech tool as an implementation of Meta's Massively Multilingual Speech (MMS) – Supports 1144 Languages and Dialects!
https://www.mmstts.com
/r/LanguageTechnology
https://redd.it/13qxvtt
(Pt. 3) Neural Networks Temporal Logic Verification with STL Net
https://youtube.com/watch?v=Jts45lJKiRI&feature=share
/r/deeplearning
https://redd.it/13o3f4c
Domain specific chatbot. Semantic search isn't enough.
Hi guys, I'm struggling to find a reliable solution to this specific problem.
I have a huge dataset of chat conversations covering several topics. I want to ask questions about these conversations and retrieve information from them, chatbot-style.
I have tried semantic search with ChatGPT to answer questions about these conversations. The problem is that semantic search only returns the most similar sentences and doesn't ‘read’ all the conversations. That is enough to answer very specific questions, but not generic ones. For example, if I ask “What are these people saying about person X?”, it returns only the top sentences by semantic similarity, which don't tell the whole story. LLMs have a token limit, so I can’t send the whole dataset as context.
Is there any approach to giving a reliable answer based on reading all the messages?
Any ideas on how to approach this problem?
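One common way to make a model "read" everything despite the token limit is map-reduce summarization: split the messages into chunks that fit the context, extract question-relevant notes from every chunk, then answer from the combined notes. This is a minimal sketch; `llm` is a hypothetical callable (prompt in, text out) standing in for whatever model API you use, and the character budget is an assumed proxy for a token budget.

```python
from typing import Callable, List


def chunk_messages(messages: List[str], max_chars: int) -> List[List[str]]:
    """Group consecutive messages into chunks whose total length stays under max_chars.
    A single message longer than max_chars ends up alone in its own chunk."""
    chunks, current, size = [], [], 0
    for msg in messages:
        if current and size + len(msg) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(msg)
        size += len(msg)
    if current:
        chunks.append(current)
    return chunks


def map_reduce_answer(question: str, messages: List[str],
                      llm: Callable[[str], str], max_chars: int = 8000) -> str:
    # Map: every chunk is read, so no message is skipped the way top-k retrieval skips them.
    notes = [
        llm(f"Question: {question}\nConversation excerpt:\n" + "\n".join(chunk)
            + "\nWrite notes relevant to the question.")
        for chunk in chunk_messages(messages, max_chars)
    ]
    # Reduce: combine the notes. With very large datasets the notes themselves may
    # exceed the limit, in which case this step has to be applied recursively.
    return llm(f"Question: {question}\nNotes:\n" + "\n".join(notes)
               + "\nAnswer using only these notes.")
```

The trade-off is cost: every question triggers one model call per chunk, so caching per-chunk summaries or pre-filtering by topic is often needed in practice.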
/r/LanguageTechnology
https://redd.it/13grik5
RelPose++: Recovering 6D Poses from Sparse-view Observations
https://amyxlase.github.io/relpose-plus-plus/
By: Amy Lin, Jason Y. Zhang, Deva Ramanan, Shubham Tulsiani, Carnegie Mellon University
>Estimating 6D Camera Poses from Sparse Views. RelPose++ extracts per-image features (with positionally encoded image index and bounding box parameters) and jointly processes these features using a Transformer. We used an energy-based framework to recover coherent sets of camera rotations by using a score-predictor for pairs of relative rotations. RelPose++ also predicts camera translations by defining an appropriate coordinate system that decouples the ambiguity in rotation estimation from translation prediction. Altogether, RelPose++ is able to predict accurate 6D camera poses from 2-8 images.
/r/computervision
https://redd.it/13co28p
[R] Multiple GPUs
I want to train a large network and I have 4 GPUs on the server. My model currently trains on just one GPU, and training fails with a “CUDA out of memory” error. How can I train my model on all available GPUs? Is it complex or easy to do? Do you have any ideas for solving this error?
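Note that data parallelism (PyTorch's `DistributedDataParallel`) keeps a full copy of the model on every GPU, so it only helps with OOM if the model itself fits on one GPU and you shrink the per-GPU batch; sharding approaches such as `FullyShardedDataParallel`, or splitting the model across devices, address a model that is too big. Below is a minimal sketch of naive model parallelism, assuming a toy two-part network (the layer sizes are illustrative); it falls back to CPU when fewer than two GPUs are visible so it still runs.

```python
import torch
import torch.nn as nn


class TwoDeviceNet(nn.Module):
    """Naive model parallelism: first part of the network on dev0, second part on dev1."""

    def __init__(self, dev0: str, dev1: str):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to(dev0)
        self.part2 = nn.Linear(4096, 10).to(dev1)

    def forward(self, x):
        x = self.part1(x.to(self.dev0))
        # Move the activations to the second device before running the second part.
        return self.part2(x.to(self.dev1))


if torch.cuda.device_count() >= 2:
    dev0, dev1 = "cuda:0", "cuda:1"
else:
    dev0, dev1 = "cpu", "cpu"  # fallback so the sketch runs anywhere

model = TwoDeviceNet(dev0, dev1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 1024)
y = torch.randint(0, 10, (8,))
logits = model(x)
loss = nn.functional.cross_entropy(logits, y.to(logits.device))
opt.zero_grad()
loss.backward()  # autograd handles the cross-device graph
opt.step()
```

With this naive split only one GPU is busy at a time; pipeline schedules or FSDP improve utilization at the cost of more setup.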
/r/deeplearning
https://redd.it/13awigr
I made a beginner's mistake
Sharing this so maybe it'll save at least one of you the same trouble...
I used to use OpenCV a lot, and OpenCV's resize function takes the target size as (width, height).
I'm doing my graduation internship. I've been benchmarking SOTA segmentation models and trying things out to develop the company's algorithm, and I managed to improve performance by 8% (mIoU) and cut inference time by 30%.
Only problem: my tutor ran some trainings with his code, just to verify, and his results were better than mine (with my architecture implementation). Everything was the same: loss function, hyperparameters, data augmentation... everything except one thing. I used the torchvision resize function, and guess what, for torchvision it's (height, width).
I was training my models on a fucked up dataset cuz I had swapped height and width.
The good news is that I have more than 10% improvement now... the bad news is that tomorrow I have to tell my director that I fucked up from the start and that I have to redo the benchmarks.
I feel like shit tbh, stupidest mistake ever
/r/computervision
https://redd.it/1360a14
Your First Recommendation System: From Data Preparation to ML Debugging and Improvements Assessment
https://towardsdatascience.com/your-first-recommendation-system-from-data-preparation-to-ml-debugging-and-improvements-assessment-eb628573436
/r/deeplearning
https://redd.it/131bigx
Training YOLOV8 on Custom Dataset in batches of 128!
yolo task=detect mode=train model=yolov8x.pt epochs=100 batch=128 data=data/trash_piles.yaml device='0,1'
Training with some rather large batches on dual RTX 6000 Ada GPUs.
https://preview.redd.it/8np0jv15hjva1.png?width=1310&format=png&auto=webp&v=enabled&s=d68a23c96a0e0b788bc2512634cfc66ffe64d45e
/r/deeplearning
https://redd.it/12vr6ag
[R] 🚀🧠 Introducing 3 New LoRA Models Trained with LLaMA on the OASST Dataset at 2048 seq length! 📊🔥
We are super excited to announce the release of 3 brand new LoRA models trained using the LLaMA model! These state-of-the-art models have been trained on the full 2048 sequence length for 4 epochs, using the OASST dataset. 🌐💡
Shoutout to LAION and Open-Assistant for giving us early research access to the dataset 🎉
Check out this and more on our FREE Gumroad (serpai/chat-llama) if you want to sign up for future releases and guides as well.
Check out our website for a post with more info: https://serp.ai/chat-llama/
- LoRA-7B 🚀
- LoRA-13B 💥
- LoRA-30B 🌌
We can't wait to see what amazing things you'll be able to accomplish with these new models! 🌟 So, feel free to share your experiences, ask questions, or discuss the potential applications for these models. 🧪🔬
Happy experimenting, and let's revolutionize the world of machine learning together! 💻🌍
Check out our GitHub for LLaMA LoRA training repos, inference GUIs, chat plugins (which you can also use with LLaMA), and more.
Cheers! 🍻
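For readers new to LoRA, here is a minimal NumPy sketch of the idea behind these adapters: the pretrained weight W stays frozen and only a low-rank update B·A, scaled by alpha/r, is trained. The rank, alpha, and layer sizes below are illustrative assumptions, not the settings used for these releases.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init: adapter starts as a no-op


def lora_forward(x, W, A, B, alpha, r):
    """Base path plus scaled low-rank update: x W^T + (alpha / r) x A^T B^T."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


def merge(W, A, B, alpha, r):
    """Fold the adapter into a single weight matrix for inference."""
    return W + (alpha / r) * B @ A


x = rng.standard_normal((4, d_in))
y = lora_forward(x, W, A, B, alpha, r)

full_params = W.size           # parameters a full fine-tune would update
lora_params = A.size + B.size  # parameters LoRA actually trains
```

Because the update is linear, a trained adapter can be merged into W after training, so inference costs the same as the base model.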
/r/MachineLearning
https://redd.it/12rds2h
How we picked a vector database for our open-source app
https://stablecog.com/blog/the-best-vector-database-for-stablecogs-semantic-search
/r/deeplearning
https://redd.it/12krpup
Why Do You Think a Model Like GPT-4 Works So Well in non-English Languages?
I don't know what other "3rd party data" GPT-4 was trained on, but I wonder if part of the success of these models is that languages, when projected into a large embedding space, are fairly isomorphic. Humans (generally) share universal traits: senses, family structures, a diurnal rhythm, etc. There's variation in how you apologize or express social closeness in each language, but all human beings have to do similar things, like apologizing or expressing social distance.
There's some direct evidence of this in language pragmatics: an English rhetorical taxonomy that is useful for many NLP applications has been bootstrapped into Arabic and worked well for clustering. The idea is that the sociocultural moves we make in English (sounding more or less certain, expressing emotion, distinguishing between the abstract and the concrete) are made in all languages, differing at the lexical level but the same at the lexicogrammatical level (there's a Russian version that works well in testing, but there are no publicly available papers on it).
What's your take on this?
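One standard way to probe the "fairly isomorphic" hypothesis for word embeddings is orthogonal Procrustes alignment: if one language's space is close to a rotation of another's, a single orthogonal map fits paired anchor words with low residual. The sketch below uses purely synthetic embeddings (a hidden rotation plus noise) to show the mechanics; it is not evidence about GPT-4 or any real language pair.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 32
X = rng.standard_normal((n, d))                   # "language A" embeddings of n anchor words (synthetic)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # hidden rotation between the two spaces
Y = X @ Q + 0.01 * rng.standard_normal((n, d))    # "language B" embeddings (synthetic)


def procrustes(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F: W = U V^T, where U S V^T = svd(X^T Y)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


W = procrustes(X, Y)
relative_residual = np.linalg.norm(X @ W - Y) / np.linalg.norm(Y)
```

On real bilingual embeddings the residual is much larger than in this toy, and how large it is per language pair is one way researchers quantify how isomorphic the spaces really are.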
/r/LanguageTechnology
https://redd.it/12glwwh
How OpenAI’s Andrej Karpathy Made One of the Best Tutorials in Deep Learning
I'd like you to check out my review of Andrej Karpathy's amazing work explaining how GPT is built (0ssamaak0/how-open-ais-andrej-karpathy-made-one-of-the-best-tutorials-in-deep-learning-e6b6445a2d05).
GitHub Repo for code & more details
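The core building block such a GPT tutorial arrives at is masked (causal) self-attention. Here is a minimal single-head sketch in NumPy rather than PyTorch so it stays self-contained; the sizes and random weight matrices are illustrative assumptions.

```python
import numpy as np


def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (T, C) sequence."""
    T = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (T, T) scaled dot products
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal = future tokens
    scores = np.where(mask, -np.inf, scores)          # a token may not attend to the future
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
T, C, H = 5, 16, 8  # sequence length, embedding size, head size
x = rng.standard_normal((T, C))
Wq, Wk, Wv = (rng.standard_normal((C, H)) for _ in range(3))
out, w = causal_self_attention(x, Wq, Wk, Wv)
```

Because of the mask, the first position can only attend to itself, so its output is exactly its own value vector.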
https://preview.redd.it/z204zwtzn44b1.png?width=720&format=png&auto=webp&v=enabled&s=58f7ff9cdbe418064d77162d71386f6037669e9f
/r/deeplearning
https://redd.it/141282u
[D] Simple Questions Thread
Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
Thanks to everyone for answering questions in the previous thread!
/r/MachineLearning
https://redd.it/13nx7t0
Research directions not related to optimizing Transformers (compute, memory, etc.)?
Honestly, I'm a bit fed up with the strong focus on squeezing the last bit of performance out of transformers. To lighten my mood, I wanted to ask the community whether they've come across something interesting or different in their area.
For example, I've found "Thinking Like Transformers" to be an enlightening fresh take.
/r/LanguageTechnology
https://redd.it/13t2dzt
Python tools to load, save, split, and convert computer vision datasets | link in comment
/r/computervision
https://redd.it/13kvq5n
[D] Simple Questions Thread
Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
Thanks to everyone for answering questions in the previous thread!
/r/MachineLearning
https://redd.it/13as0ej
DirectStorage - Loading data to the GPU directly from the SSD, almost without using the CPU
Hello forum!
Is it possible in major frameworks (TF or PyTorch) to load all data onto the GPU directly from a fast SSD, without going through the CPU? Is it possible today? I read about it 2-3 years ago; as far as I remember, Linux was the first to offer such an option.
I am talking about DirectStorage:
Nvidia: How to connect GPUs direct to SSDs for a speed boost • The Register
https://www.theregister.com/2022/03/14/nvidia_gpu_data/
GPUDirect Storage: A Direct Path Between Storage and GPU Memory | NVIDIA Technical Blog
https://developer.nvidia.com/blog/gpudirect-storage/
DirectStorage API: Does it allow direct SSD-to-GPU data streaming? | ResetEra
https://www.resetera.com/threads/directstorage-api-does-it-allow-direct-ssd-to-gpu-data-streaming.278642/
Is it possible today? I am asking because I am thinking about getting a fast disk, like the Samsung 990 PRO 2TB.
I have read that NVMe is needed; SATA3 is not sufficient.
Regards
/r/deeplearning
https://redd.it/13aks7g
Jetson/DeepStream Learning Resources
I recently got into the world of Nvidia DeepStream and Jetson modules and I'm finding it difficult to find solid explanations of how all of this stuff works together. Most of the videos on YouTube are marketing stuff, 2 hour long seminars, or people just flashing a Jetson module and then opening the demo apps.
I am a self-taught Python programmer of about 8 years but came from the 3D world, so I have a pretty abstract understanding of ML. I really want to understand this stuff at a lower level, but I feel like I am getting tripped up by the VAST number of terms and tools in the Nvidia CV space.
So does anyone know of any really good resources for learning this stuff? I'm more of a visual learner, so videos would be great, but well-done documentation can sometimes be just as good.
I know there's the Nvidia DLI courses but most cost money and seem pretty pricey. Is it worth it or is it just more dry talking that leaves you more confused after? Any specific ones that people would recommend?
I would really appreciate any advice anyone can give. Feel free to dump as many links as you have saved lol. Sorry for the long post. Thanks in advance!
/r/computervision
https://redd.it/13867rb
How to get started in a language not that common, not in Latin script?
Hi all,
I have been trying to work on improving NLP for Gujarati and am at my wit’s end about how to begin/contribute.
The language isn't written in a common script, and there’s a lot more that can be done here. I need help figuring out the right way/place to contribute.
My skills - English and Gujarati, Python, ML and NLP
Interests - trying to uplift Gujarati translation/audio/generative models from where it is today!
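A small, practical first step when building Gujarati corpora is filtering scraped text by script. This sketch assumes the Unicode "Gujarati" block (U+0A80 to U+0AFF) and NFC normalization; the 0.5 threshold and the function names are illustrative choices, not an established pipeline.

```python
import unicodedata
from typing import List

GUJARATI_START, GUJARATI_END = 0x0A80, 0x0AFF  # Unicode "Gujarati" block


def is_gujarati(ch: str) -> bool:
    return GUJARATI_START <= ord(ch) <= GUJARATI_END


def gujarati_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in the Gujarati block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_gujarati(c) for c in chars) / len(chars)


def filter_gujarati_lines(lines: List[str], threshold: float = 0.5) -> List[str]:
    """NFC-normalize each line and keep those that are mostly Gujarati script."""
    normalized = (unicodedata.normalize("NFC", line) for line in lines)
    return [line for line in normalized if gujarati_ratio(line) >= threshold]


word = "\u0a97\u0ac1\u0a9c\u0ab0\u0abe\u0aa4\u0ac0"  # "ગુજરાતી" (the word "Gujarati")
mixed = word + " NLP"                                # 7 Gujarati code points + 3 Latin letters
```

Note that vowel signs like U+0AC1 are separate code points, so character counts differ from what a reader perceives as letters; this matters when choosing tokenizers and sequence lengths.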
/r/LanguageTechnology
https://redd.it/13403we
“Track-Anything”: video object tracking & segmentation tool
https://github.com/gaomingqi/Track-Anything
Track-Anything is a flexible and interactive tool for video object tracking and segmentation. It is built on Segment Anything and can track and segment anything specified through user clicks alone. During tracking, users can flexibly change the objects they want to track or correct the region of interest if there are any ambiguities. These characteristics make Track-Anything suitable for:
* Video object tracking and segmentation with shot changes.
* Visualized development and data annotation for video object tracking and segmentation.
* Object-centric downstream video tasks, such as video inpainting and editing.
/r/computervision
https://redd.it/12z1av5
Segment Anything Model (SAM) explained in detail
Here is a video explaining the latest SAM model from Meta AI. It covers the model training, data engine, SA-1B data collection and finally the results. https://youtu.be/qa3uK3Ewd9Q
Hope it's useful.
/r/deeplearning
https://redd.it/12ssm42
Reminder: Use the report button and read the rules!
/r/MachineLearning
https://redd.it/120f4oy
[D] Simple Questions Thread
Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
Thanks to everyone for answering questions in the previous thread!
/r/MachineLearning
https://redd.it/12gls93
Meta's new Segment Anything Model Explained
https://youtu.be/bx0He5eE8fE
/r/deeplearning
https://redd.it/12dn9t5