Coding LLaMA 2 from scratch in PyTorch, with step by step explanation of KV Cache, Grouped Query Attention, Rotary Positional Embedding, RMS Normalization, SwiGLU and much more!
https://www.youtube.com/watch?v=oM4VmoabDAI
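For a flavor of one of the components the video walks through, RMS normalization can be sketched in a few lines of plain Python (an illustrative sketch, not the video's actual PyTorch code):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by the reciprocal of its root-mean-square.

    Unlike LayerNorm, no mean is subtracted and no bias is added;
    only a learned per-dimension gain `weight` is applied.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

# With unit gains, the normalized vector has RMS ~= 1.
out = rms_norm([1.0, 2.0, 3.0, 4.0], [1.0] * 4)
```

In LLaMA this replaces LayerNorm before each attention and feed-forward block; dropping the mean-centering step makes it cheaper while working just as well in practice.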
/r/deeplearning
https://redd.it/168onwq
Do you really need strong math (and ML) knowledge to be an NLP engineer?
Let me explain a bit. I come from a humanities bachelor's degree, but with a strong passion for linguistics. I wanted to specialize in computational linguistics, but gradually I also became very interested in NLP and NLP-related jobs. That being said, I hope the repressed computer engineers don't show up now lol
I'm about to start a master's degree called “Digital Humanities”, which is actually only about language technologies. The program includes subjects like NLP, computational linguistics, data mining, programming, data analysis, etc. I know that the Machine Learning (ML) course is fundamental for NLP, but the university's ML course requires strong math foundations and is designed for those with a bachelor's degree in computer science or computer engineering. So I had thought about giving it up and instead taking the course called “Computational Intelligence and Deep Learning”, which focuses more on topics like fuzzy logic and especially artificial neural networks, RNNs, etc., without requiring those initial math foundations.
And maybe also adding an algorithms class (a good class, but not too advanced) as an additional foundation for NLP.
And then I might study ML on my own through online courses, like the Stanford one on Coursera.
Or would it be better for me to study the math (linear algebra, integral and differential calculus, functions) and attempt the ML exam? Keep in mind that I've already taken a statistics course and enjoyed it, but honestly I don't have much motivation to study math extensively, especially because I might invest all that effort for nothing: I might only find jobs like data linguist or computational linguist (given my background in humanistic informatics), where strong math and ML knowledge are not necessary.
Certainly, my career goal in NLP isn't to research new algorithms and statistical models; I want to make more use of my linguistics knowledge in NLP, but not only to do annotation.
I've noticed there are many people working as "NLP engineers" who directly apply algorithms: many practical NLP tasks can be accomplished using existing libraries and tools without delving deep into the underlying mathematical concepts. So obviously you need to know algorithms and deep learning, but you don't need to go too deep into the math research, right?
Or would it be better for me to just give up and focus solely on computational linguistics?
/r/LanguageTechnology
https://redd.it/165epjv
Getting data from physical circular chart.
/r/computervision
https://redd.it/162xdyo
Is CV evolving beyond bounding boxes?
Hi all - we (a team of Stanford researchers) wrote a new blog post on "Video Analysis Beyond Bounding Boxes" collecting some of our thoughts on the direction the CV field is heading.
We're actively researching & developing in this space, so we would love to hear feedback on this vision for the future of CV and video analysis.
/r/computervision
https://redd.it/15ydds0
Your Neural Network Doesn't Know What It Doesn't Know
Hi everyone,
I made a repo collecting every high-quality source on out-of-distribution (OOD) detection I could find, ranging from articles and talks for beginners to research papers at top conferences. It also has a primer if you are not familiar with the topic. Check it out, and give it a star to support me if you find it helpful. Thanks a lot ;)
https://github.com/continuousml
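As a flavor of the topic, the classic maximum-softmax-probability baseline for OOD detection can be sketched in plain Python (my own illustration, not code from the repo): a flat softmax distribution signals the network is unsure, which often indicates an out-of-distribution input.

```python
import math

def softmax(logits):
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def msp_score(logits):
    """Maximum softmax probability: a low score suggests OOD input."""
    return max(softmax(logits))

in_dist = msp_score([8.0, 0.5, 0.2])  # peaked logits -> confident -> high score
ood = msp_score([1.1, 1.0, 0.9])      # flat logits -> uncertain -> low score
```

Stronger detectors exist (energy scores, Mahalanobis distance, etc., as the repo's primer presumably covers), but this baseline shows why a vanilla network "doesn't know what it doesn't know": the softmax can still be overconfident far from the training data.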
/r/computervision
https://redd.it/15q8mx0
Looking for good learning sources around generative AI, specifically LLM
Are there any good video sources that explain all the concepts associated with generative AI (e.g. RL, RLHF, transformers, etc.) from the ground up in extremely simple language (using analogies/stories that would be familiar to, say, a 10-12 year old)? I would also prefer channels that explain the concepts in a sequential order (so they're easy to follow) and make short, crisp videos.
If yes, could you kindly comment below with your suggestions? If not, could you comment on whether something like that would be useful to you, and ideally why?
Big thanks in advance 🙏
/r/deeplearning
https://redd.it/15hdu5v
D Simple Questions Thread
Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
Thanks to everyone for answering questions in the previous thread!
/r/MachineLearning
https://redd.it/15dnok8
Promptify 2.0: More Structured, More Powerful LLMs with Prompt-Optimization, Prompt-Engineering, and Structured Json Parsing with GPT-n Models! 🚀
Hello fellow coders and AI enthusiasts! First up, a huge thank you for making Promptify a hit with over **2.3k+ stars on GitHub**! 🌟 Back in 2022, we were the first to tackle the common challenge of uncontrolled, unstructured outputs from large language models like GPT-3, and your support has pushed us to keep improving. Today, we're thrilled to share some major updates that make Promptify even more powerful:
* **Unified Architecture 🧭**: Introducing Prompter, Model & Pipeline Solution
* **Detailed Output Logs 📔**: Comprehensive structured JSON format output within the log folder.
* **Wider Model Support 🤝:** Supporting models from OpenAI, Azure, Cohere, Anthropic, Huggingface and more - think of it as your universal language model adapter.
* **Robust Parser 🦸♂️**: Parser to handle incomplete or unstructured JSON outputs from any LLMs.
* **Ready-Made Jinja Templates 📝:** Jinja prompt templates for NER, Text Classification, QA, Relation-Extraction, Tabular data, etc.
* **Database Integration 🔗**: Direct Promptify-to-MongoDB integration coming soon. Stay tuned!
* **Effortless Embedding Generation 🧬**: Generate embeddings from various LLMs effortlessly with the new update.
Check out the examples and take Promptify for a spin on GitHub. If you like what you see, we'd be honored if you gave us a star!
**Github**: [https://github.com/promptslab/Promptify](https://github.com/promptslab/Promptify)
Thank you again for your support - here's to more structured AI!
from promptify import Prompter, OpenAI, Pipeline
sentence = "The patient is a 93-year-old female with a medical..."
model = OpenAI(api_key)
prompter = Prompter('ner.jinja')  # ready-made NER template
pipe = Pipeline(prompter, model)
result = pipe.fit(sentence, domain="medical", labels=None)
Output:
[
  {"E": "93-year-old", "T": "Age"},
  {"E": "chronic right hip pain", "T": "Medical Condition"},
  {"E": "osteoporosis", "T": "Medical Condition"},
  {"E": "hypertension", "T": "Medical Condition"},
  {"E": "depression", "T": "Medical Condition"},
  {"E": "chronic atrial fibrillation", "T": "Medical Condition"},
  {"E": "severe nausea and vomiting", "T": "Symptom"},
  {"E": "urinary tract infection", "T": "Medical Condition"},
  {"Branch": "Internal Medicine", "Group": "Geriatrics"}
]
/r/LanguageTechnology
https://redd.it/15dfttb
D Simple Questions Thread
Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
Thanks to everyone for answering questions in the previous thread!
/r/MachineLearning
https://redd.it/1518fj5
How essential are strong math and statistics skills for NLP Engineers?
My initial belief was that math and stats would be extremely vital for this field, but I'm seeing mixed information online. Ironically, even Google Bard stated that math and stats are not vital (though I can't help but think that this is inaccurate).
Can anyone confirm and give some feedback? What are the needed core skills?
/r/LanguageTechnology
https://redd.it/150sew0
I Hit 700K Views in 3 Months with my Opensource Shorts automation framework, ShortGPT
/r/computervision
https://redd.it/150mzll
A Comparison of Large Language Models (LLMs) in Biomedical Domain
https://provectus.com/blog/comparison-large-language-models-biomedical-domain/
/r/LanguageTechnology
https://redd.it/14x5cge
P TomoSAM, a 3D Slicer extension using SAM to aid the segmentation of 3D data from tomography or other imaging techniques
We are a team at NASA working on modeling the material response of Thermal Protection Systems (TPS). We developed this tool to streamline the segmentation process of micro-tomography data, a necessary step before using the physics solvers within PuMA. However, we believe that TomoSAM is general enough to be useful in other fields, such as medical imaging. The release is fully open-source and you can find more information in the links below:
TomoSAM extension within 3D Slicer
🔗 Github: https://github.com/fsemerar/SlicerTomoSAM
🔗 YouTube tutorial: https://www.youtube.com/watch?v=4nXCYrvBSjk
🔗 Publication: https://arxiv.org/abs/2306.08609
🔬 TomoSAM combines the power of Segment Anything Model (SAM), a cutting-edge deep learning model, with the capabilities of 3D Slicer, a software platform useful for visualization and segmentation.
💡 SAM is a promptable deep learning model developed by Meta AI that can identify objects and generate image masks in a zero-shot manner, requiring only a few user clicks.
⚙️ This integration reduces the need for laborious manual segmentation processes, saving significant time and effort for researchers working with volumetric data.
📄 Our paper outlines the methodology and showcases the capabilities of TomoSAM.
TomoSAM's usage, architecture, and communication system
Feel free to reach out if you have any questions or comments! 🚀
/r/MachineLearning
https://redd.it/14sroe6
Additional Resources
Hi everyone,
After an [extended blackout](https://old.reddit.com/r/MachineLearning/comments/146ue8q/rmachinelearning_is_joining_the_reddit_blackout), we've decided to reopen the sub since it became pretty clear that if we didn't then the [admins would likely replace us and just reopen](https://kbin.social/m/machinelearning/t/68966/r-MachineLearning-finally-received-a-warning-from-u-ModCodeOfConduct) anyway.
We know lots of you contacted us during the blackout trying to understand how to stay up to date with the latest ML research, news, and discussions. For that reason we are providing additional resources below that either exclusively focus on ml or often discuss ml:
* ~~[taggernews](http://www.taggernews.com/tags/ai/machine%20learning/) - an ml powered classifier for [hackernews](https://news.ycombinator.com/) posts tagged as ai/ml~~
* use [hackernews](https://news.ycombinator.com/) RSS feeds like this one to keep up with posted research https://hnrss.org/newest?q=arxiv+OR+cvpr+OR+aaai+OR+iclr+OR+icml+OR+neurips+OR+emnlp+OR+acl
* reddit has rss feeds for just about any link by just adding `.rss` to the end of the url, so you can follow r/machinelearning using https://reddit.com/r/machinelearning.rss or even follow all posts that link to arxiv using https://reddit.com/domain/arxiv.org.rss
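The `.rss` trick above composes mechanically, and reddit serves the feeds as Atom XML, so the stdlib is enough to consume them. A small sketch (the helper names are mine, and the feed here is a stub rather than a live fetch):

```python
from xml.etree import ElementTree

def reddit_rss_url(path: str) -> str:
    """Turn any reddit path into its RSS feed URL by appending `.rss`."""
    return f"https://reddit.com/{path.strip('/')}.rss"

# Reddit serves Atom feeds; entry titles live under the Atom namespace.
ATOM = "{http://www.w3.org/2005/Atom}"

def entry_titles(feed_xml: str) -> list[str]:
    root = ElementTree.fromstring(feed_xml)
    return [e.findtext(f"{ATOM}title") for e in root.iter(f"{ATOM}entry")]

# Stub feed standing in for a live response from the URL above.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Post one</title></entry>
  <entry><title>Post two</title></entry>
</feed>"""

url = reddit_rss_url("r/machinelearning")
titles = entry_titles(sample)
```

Point the same parser at the hnrss.org query feed above and you get a single pipeline covering both sites.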
* [lobste.rs/t/ai](https://lobste.rs/t/ai) - posts tagged as ai on [lobste.rs](https://lobste.rs/t/ai)
* [m/machinelearning](https://kbin.social/m/machinelearning/) - a nascent space for ml discussion on [kbin](https://kbin.social)
You can also find a more thorough list of subs with additional resources here: https://sub.rehab
If you have additional resources you think would be useful, please comment below and we can add them to the list.
EDIT: removed taggernews since it's long defunct
/r/MachineLearning
https://redd.it/14ionyi
R My master's research has beaten state-of-the-art and I am not sure what to do about it
Hello,
My research (Dissertation for MSc in AI) on applying LLMs to drug binding affinity prediction has beaten previous state-of-the-art in single sequence prediction tasks.
My method yields a correlation of 0.7079 for SMILES and 0.7007 for AA-pockets, which improves upon the previous state-of-the-art correlations of 0.485 and 0.501, respectively. The prior state-of-the-art is described and documented in the paper: "Improved Protein−Ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference" -> https://pubs.acs.org/doi/10.1021/acs.jcim.0c01306.
However, I don't really know what to do with this information. I did not have a supervisor leading this research whom I could discuss it with. My supervisor was in a different field from ML (my university assigns supervisors semi-randomly, and I was given someone who focuses on string algorithms), so we agreed that I would go down my own path (for the past year), as I really wanted to undertake LLM research. Unfortunately, then, no one I know is knowledgeable in the field. My work is currently being marked (I submitted it 2 weeks ago) and I won't get any feedback until November.
My idea is to put it on arXiv, but that's it really. As an MSc student I'm still pretty new to research, so I'm unsure what to do next. Any advice would be useful.
The GitHub to my work can be found here (still a bit of a WIP) https://github.com/jacobmcasey/large-language-models-for-protein-ligand-binding/tree/main
/r/MachineLearning
https://redd.it/169mdnf
Introducing Code Llama, a state-of-the-art large language model for coding
https://ai.meta.com/blog/code-llama-large-language-model-coding/
/r/deeplearning
https://redd.it/1605opp
Fast CV App: Cross Platform Computer Vision Using Multiprocessing
**Why is this relevant to computer vision?**
In my project I show that a pure Python app that does 1080p at 30fps on both Windows and Mac is possible. It's good for prototyping and testing (especially if you can later move to a C variant and make it really fast), and, I hope, in the future for making "serious" apps.
I'm sharing this because I have never seen anybody talk about using multiprocessing, data compression, and a pure Python GUI packaged for Windows/Mac in the context of computer vision. This might be due to people on reddit/discord/stack exchange just not talking about it, but I really do think this information is locked away with industry professionals.
This is probably because people don't need it if they have a team of people working on a qt frontend and have another team working on computer vision specifically.
I haven't seen anybody working on this information publicly. All the good stuff is closed source in big corporations:
* examples: Mediapipe's slack channel require a google email: https://github.com/google/mediapipe/issues/779#issuecomment-1101212500
* I DEFINITELY do not have access to instagram filters or very specifically how they apply their filter processing. What I do know is that their more complex filters are not 30 fps at all on mobile phones.
* I can't recall off the top of my head other industry standard pose estimation apps that have open source code/documentation...
**What is my project?**
Here I show with Fast CV App that it is possible and that there is room for improvement. For example, I could "blit buffer" to a shared datatype instead of uploading the whole frame to shared memory, or even convert to YUV so that blit buffer on the kivy frontend is even faster, etc etc.
**How it works**
I gave up on threading because I just could not get mediapipe threading on 1080p frames to hit 30fps. As noted in the mediapipe docs, it actually drops frames to maintain framerate. I go one step further and actually analyze each frame. I do that by cheating: reading the future frames using opencv/ffmpeg, sending future frames to a multiprocessing subprocess to analyze, then receiving frames in kivy to display at the right time. This is where data compression kicks in, because inter-process communication was hell on this pipeline, taking up ~20-30ms, which basically negated the benefits of multiprocessing. This delay meant that instead of 3-4 subprocesses being sufficient, you needed to run ~6-8 subprocesses, which is just not ok. I was stumped on this problem for ~3 months until I realized I could use a compression library like blosc to make the 1080p frames I was sending and receiving go from 6MB to 3.8MB, spending ~5ms on IPC for a task that previously took ~20-30ms. In hindsight, I think this step is actually a basic solution and probably an industry standard, but all the multiprocessing tutorials never talked about compression, so I never thought about it.
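The compression step described above is easy to sketch; here stdlib `zlib` stands in for blosc (blosc is much faster, but the idea is identical): compress before putting the frame on the pipe, decompress on the other side.

```python
import zlib

# Synthetic 1080p BGR frame (~6.2 MB). Real camera frames have spatial
# redundancy, so they compress well; this all-zeros frame is the extreme case.
frame = bytes(1920 * 1080 * 3)

# Sender side: compress before the inter-process transfer.
payload = zlib.compress(frame, level=1)  # low level = fast, favoring throughput

# Receiver side: decompress after the pipe/queue hop.
restored = zlib.decompress(payload)

ratio = len(frame) / len(payload)
```

The win is that IPC cost scales with bytes moved: shrinking 6 MB to under 4 MB (or far less on redundant frames) cuts the per-frame transfer time more than the compress/decompress work adds.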
A couple tricks/hints:
* try/except blocks with a print(<error message here>, flush=True) were pretty good at catching silent errors from multiprocessing subprocesses
* start your multiprocessing code AFTER an `if __name__ == "__main__":` check or a similar guard so that you don't infinitely spawn subprocesses.
**Fast CV App links**
Github link:
https://github.com/AccelQuasarDragon/FastCVApp
Multiprocessing/Threading Analysis Video:
https://youtu.be/7-UdBUSfafo
Getting Started:
https://youtu.be/YnhHaKEx7pY
Thanks for your time and have a great day, hope this helps even one person out. Good luck!
/r/computervision
https://redd.it/15wdp3o
OpenAI Notebooks which are really helpful.
The OpenAI cookbook is one of the most underrated and underused developer resources available today. Here are 7 notebooks you should know about:
1. Improve LLM reliability:
https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md
2. Embedding long text inputs:
https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb
3. Dynamic masks with DALLE:
https://github.com/openai/openai-cookbook/blob/main/examples/dalle/How_to_create_dynamic_masks_with_DALL-E_and_Segment_Anything.ipynb
4. Function calling to find places nearby:
https://github.com/openai/openai-cookbook/blob/main/examples/Function_calling_finding_nearby_places.ipynb
5. Visualize embeddings in 3D:
https://github.com/openai/openai-cookbook/blob/main/examples/Visualizing_embeddings_in_3D.ipynb
6. Pre- and post-processing of Whisper transcripts:
https://github.com/openai/openai-cookbook/blob/main/examples/Whisper_processing_guide.ipynb
7. Search, Retrieval, and Chat:
https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_a_search_API.ipynb
Big thanks to the creators of these notebooks!
/r/deeplearning
https://redd.it/15rihgo
D How to stay on the cutting edge of applied ML/AI while doing my PhD?
A lot of my PhD work will use different types of ML/NN approaches to characterize problems in my field. It's kind of weird, since in my undergrad I came from a more traditional science background where we worked from papers written 2-20 years ago. Since a lot of these architectures are updating so fast, I wanted to see if there's a good way to keep up with the latest information so my work won't be outdated by the time I publish. Is there a general workflow that those of you in the field follow?
/r/MachineLearning
https://redd.it/15lnt4g
resources to learn about training LLMs?
I'd like to train a mini-LLM on a CPU just to get some experience with LLM training. Do y'all have any resources/links to relevant tutorials? I've looked around myself, but I couldn't find too many in-depth tutorials. I'm also interested in building my own toy LLM from scratch, just for better understanding.
/r/deeplearning
https://redd.it/15j3ls5
D NeurIPS 2023 Paper Reviews
NeurIPS 2023 paper reviews are visible on OpenReview. See this tweet. I thought I'd create a discussion thread for us to discuss any issues/complaints/celebrations or anything else.
There is so much noise in the reviews every year. Some good work that the authors are proud of might get a low score because of the noisy system, given how large NeurIPS has grown in recent years. We should keep in mind that the work is still valuable no matter what the score is.
/r/MachineLearning
https://redd.it/15fo7td
Attention Is Off By One
https://www.evanmiller.org/attention-is-off-by-one.html
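The linked post argues for adding 1 to the softmax denominator ("quiet attention", sometimes written softmax₁), so an attention head can assign near-zero total weight instead of being forced to distribute a full unit of attention. A minimal sketch of that variant (my own illustration based on that idea, not code from the post; the max-shift stabilization trick is omitted for clarity):

```python
import math

def softmax_one(logits):
    """Softmax with +1 in the denominator: weights can sum to < 1,
    letting a head effectively attend to nothing."""
    exps = [math.exp(v) for v in logits]
    s = 1.0 + sum(exps)
    return [e / s for e in exps]

# Uniformly large-negative logits -> all weights near zero,
# whereas ordinary softmax would still be forced to sum to 1.
quiet = softmax_one([-10.0, -10.0, -10.0])
```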
/r/deeplearning
https://redd.it/158xmbw
YoloV8 Body Pose Estimation TensorRT C++ Tutorial (link in comments)
/r/computervision
https://redd.it/156v3e5
Meta/Facebook just released Llama 2
https://huggingface.co/models?other=llama-2
/r/LanguageTechnology
https://redd.it/1533kuf
Questions about Transformers
I just started reading about the Transformer model. I have barely scratched the surface of this concept. For starters, I have the following 2 questions:
1. How are positional encodings incorporated in the transformer model? I see that immediately after the word embedding they have positional encoding, but I'm not getting in which part of the entire network it is being used.
2. For a given sentence, the weight matrices of the query, key and value all have the length of the sentence itself as one of their dimensions. But the length of the sentence is variable, so how do they handle this when they pass in subsequent sentences?
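A tiny dependency-free sketch addresses both questions (my own illustration, not from the post). For (1): the sinusoidal positional encoding is simply added element-wise to the word embeddings before the first attention layer. For (2): the learned projections W_q, W_k, W_v have shape (d_model, d_k), which does not mention sentence length at all; only the activations (and the attention score matrix) grow with the sentence, so the same weights handle any length.

```python
import math

d_model, d_k = 4, 3

def positional_encoding(seq_len, d_model):
    """Sinusoidal PE from "Attention Is All You Need"; works for any length."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((i // 2 * 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# A fixed (d_model x d_k) projection; its shape never depends on seq_len.
W_q = [[0.1 * (r + c) for c in range(d_k)] for r in range(d_model)]

for seq_len in (2, 5):                     # two "sentences" of different lengths
    emb = [[1.0] * d_model for _ in range(seq_len)]
    x = [[e + p for e, p in zip(er, pr)]   # (1) PE added element-wise to embeddings
         for er, pr in zip(emb, positional_encoding(seq_len, d_model))]
    q = matmul(x, W_q)                     # (2) shape (seq_len, d_k), same W_q
```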
/r/computervision
https://redd.it/14xutf2
CoViz - A Neural Network Playground built with WebGPU🔥(Compute) and ReactFlow
/r/deeplearning
https://redd.it/14ri42r
LLMOps.space - curated resources related to LLM & LLMOps
LLMOps.space is a community for LLM enthusiasts, researchers, and practitioners. The community focuses on content, discussions, and events around deploying LLMs into production. 🚀
This includes:
✅ 50+ LLMOps companies
📅 Upcoming events
📚 Educational resources
👩💻 Open-source LLM modules
💰 Funding news
Check out the LLMOps community website:
http://llmops.space/
/r/deeplearning
https://redd.it/14qlpzi
Realistic personal projects to demonstrate knowledge of 3D computer vision
I currently work as an ML engineer with a focus on computer vision. I'm interested in pursuing jobs related to photogrammetry/3D reconstruction/computer graphics and am looking for advice on how to land those kinds of jobs. I have a Master's degree and, ideally, would not want to go back for a PhD.
I have picked up Multiple View Geometry by Hartley and Zisserman and plan on working through the book. However, I'm also interested in gaining more hands-on/practical experience in this area. What are some realistic projects I could work on that would showcase my knowledge of 3D vision?
/r/computervision
https://redd.it/14mciux