Good replacement for TensorFlow's Object Detection API
The TF Object Detection API has been deprecated for a while now, but I really liked the fact that it provided a standardized interface to train and test multiple model architectures. I was wondering if there is a popular alternative today?
I know the new big boy in object detection is YOLOv8, so maybe I should just switch to using that model and ecosystem instead.
Edit: Never mind, Ultralytics and YOLOv8 slap, I will be using them from now on.
/r/computervision
https://redd.it/10uq4c5
How to visualize CNN feature maps?
I have been working on CNNs but can't figure out how to visualize the feature maps between layers.
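One framework-agnostic way to think about it: a feature map is just the output of one layer for one filter, so visualizing it comes down to keeping that intermediate output and rescaling it to an image range. The sketch below does this in plain Python on a toy input. The 6x6 image, the hand-written `vertical_edge` kernel, and all helper names are made up for illustration; in a real model you would capture the layer output from the framework instead (in PyTorch, for example, this is commonly done with `register_forward_hook`) and display it with an image plotting library.

```python
from typing import List

Image = List[List[float]]


def conv2d(image: Image, kernel: Image) -> Image:
    """Valid (no padding) 2D cross-correlation, the operation CNN layers apply."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap: Image) -> Image:
    """Clamp negative activations to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def to_grayscale(fmap: Image) -> List[List[int]]:
    """Min-max rescale one feature map to 0..255 so it can be shown as an image."""
    flat = [v for row in fmap for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a constant map
    return [[round(255 * (v - lo) / span) for v in row] for row in fmap]


def render_ascii(gray: List[List[int]]) -> str:
    """Draw a grayscale map with characters, darkest to brightest."""
    shades = " .:-=+*#%@"
    return "\n".join(
        "".join(shades[v * (len(shades) - 1) // 255] for v in row) for row in gray
    )


# Toy 6x6 input (assumption): dark left half, bright right half.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
# Hypothetical kernel that responds to dark-to-bright vertical edges.
vertical_edge = [[-1.0, 0.0, 1.0]] * 3

feature_map = relu(conv2d(image, vertical_edge))
gray = to_grayscale(feature_map)
print(render_ascii(gray))
```

The bright columns in the printed map line up with the edge in the input, which is the kind of pattern to look for when inspecting early-layer feature maps.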
/r/deeplearning
https://redd.it/10q44ld
[D] Simple Questions Thread
Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
Thanks to everyone for answering questions in the previous thread!
/r/MachineLearning
https://redd.it/10cn8pw
[P] paper-hero: Yet Another Paper Search Tool
Hi guys, thanks for reading this post. I built a simplistic paper search tool that integrates ACL Anthology, arXiv API, and DBLP API.
Github address: [Spico197/paper-hero](https://github.com/Spico197/paper-hero)
**Motivation:** I'm majoring in NLP and I'd like to search for papers with "Event Extraction" in their titles in specific proceedings (e.g. ACL, EMNLP).
**Challenge:** There are lots of search tools and APIs, but few of them provide field-specific searches, like authors, titles, abstracts, and venues.
**Methodology:** I integrate ACL Anthology, arXiv API, and DBLP API, and provide a two-stage search toolkit, which first stores target papers via the official fuzzy search API, and then matches specific fields.
**Advantages:** This tool satisfies my need to stockpile papers, and it can dump checklists in markdown format or complete paper information in jsonl. AND and OR logic is supported in search queries.
**Limitations:** This tool is based on simple string matching, so you have to know some of the terminology in the target fields.
You are warmly welcome to have a try and feel free to drop me an issue!
from src.interfaces.aclanthology import AclanthologyPaperList
from src.utils import dump_paper_list_to_markdown_checklist
if __name__ == "__main__":
    # use `bash scripts/get_aclanthology.sh` to download and prepare anthology data first
    paper_list = AclanthologyPaperList("cache/aclanthology.json")
    ee_query = {
        "title": [
            # Any of the strings below is matched
            ["information extraction"],
            ["event", "extraction"],  # title must include `event` and `extraction`
            ["event", "argument", "extraction"],
            ["event", "detection"],
            ["event", "classification"],
            ["event", "tracking"],
            ["event", "relation", "extraction"],
        ],
        # Besides the title constraint, venue must also meet the needs
        "venue": [
            ["acl"],
            ["emnlp"],
            ["naacl"],
            ["coling"],
            ["findings"],
            ["tacl"],
            ["cl"],
        ],
    }
    ee_papers = paper_list.search(ee_query)
    dump_paper_list_to_markdown_checklist(ee_papers, "results/ee-paper-list.md")
[markdown checklist](https://preview.redd.it/myy4kbut15da1.png?width=2038&format=png&auto=webp&v=enabled&s=4fc3cacedd22bf6290bef3d94ec00bdfe16f61c7)
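For readers wondering how a query like the one above is interpreted, here is a minimal sketch of the matching rule as this post describes it: within a field, any one group may match (OR); within a group, every term must appear (AND); and every field in the query must be satisfied. This is not paper-hero's actual implementation. The `matches` and `search` helpers, the dict-based paper records, and the case-insensitive substring comparison are all assumptions made for illustration, and the sample papers are invented.

```python
from typing import Dict, List

Query = Dict[str, List[List[str]]]


def matches(paper: Dict[str, str], query: Query) -> bool:
    """Return True if the paper satisfies every field constraint in the query."""
    for field, groups in query.items():
        text = paper.get(field, "").lower()
        # OR across groups, AND across the terms inside one group
        if not any(all(term.lower() in text for term in group) for group in groups):
            return False
    return True


def search(papers: List[Dict[str, str]], query: Query) -> List[Dict[str, str]]:
    """Keep only the papers that satisfy the query."""
    return [paper for paper in papers if matches(paper, query)]


# Invented toy records, not real search results.
papers = [
    {"title": "Joint Event Extraction via Structured Prediction", "venue": "ACL"},
    {"title": "Event Detection with Trigger-Aware Networks", "venue": "AAAI"},
    {"title": "A Survey of Machine Translation", "venue": "EMNLP"},
]
query = {
    "title": [["event", "extraction"], ["event", "detection"]],
    "venue": [["acl"], ["emnlp"]],
}
hits = search(papers, query)
print([paper["title"] for paper in hits])
```

Because this sketch uses plain substring checks, a short term such as `acl` would also match a venue string like `naacl`, which is one reason the limitation about knowing the target terminology matters.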
/r/MachineLearning
https://redd.it/10gp7rm
Automatic generation of image-segmentation mask pairs with StableDiffusion
/r/computervision
https://redd.it/107h6at
Train YOLOv8 ObjectDetection on Custom Dataset Tutorial
/r/computervision
https://redd.it/108616o
VizWiz Launches 4 AI Challenges to help blind/low vision community
Greetings!
We are pleased to announce the fourth annual VizWiz Grand Challenge workshop, which will be held in conjunction with CVPR 2023. The workshop is running 4 AI Challenges to drive the development of assistive technologies for people who are blind or low-vision. Please share this post with those who might be interested in participating.
This workshop is motivated in part by our observation that people who are blind have relied on (human-based) visual assistance services to learn about images and videos they capture for over a decade. We introduce visual question answering, few shot recognition, and object localization dataset challenges for the AI community to represent authentic use cases. A few more details:
· Friday, May 5: submissions of algorithm results due to the evaluation server
· Monday, June 19: results will be announced at the VizWiz Grand Challenge workshop at CVPR 2023
· Visual Question Answering (VQA) Challenge
· VQA Answer Grounding Challenge
· Few-Shot Object Recognition Challenge
· Salient Object Detection Challenge
We are looking forward to your participation in the Challenges this year!
/r/computervision
https://redd.it/10anp57
Other than "Multiple View Geometry in Computer Vision" by Hartley & Zisserman, what are the most essential books(!) for 3D Vision?
I love the book by Hartley & Zisserman and was wondering if there are other, similarly essential books for someone interested in getting into 3D Vision. Any suggestions?
/r/computervision
https://redd.it/10ashym
Photorealistic human image editing using attention with GANs
/r/computervision
https://redd.it/10bw49g
Computer Vision News, the magazine of the algorithm community - January 2023
Dear all,
Here is Computer Vision News of January 2023.
It includes reviews of 2 Best Paper Award winning research papers.
Read 44 pages about AI, Deep Learning, Computer Vision and more - with code!
Read online version for free (recommended)
PDF version
Free subscription on page 44.
Enjoy!
https://preview.redd.it/c0q3fax2k7ca1.jpg?width=400&format=pjpg&auto=webp&v=enabled&s=686185794db8bad40417f77399de94bd5edda595
/r/computervision
https://redd.it/10cjrad
[P] I built Adrenaline, a debugger that fixes errors and explains them with GPT-3
/r/MachineLearning
https://redd.it/106q6m9
[OC] Revision of my last Country Distribution + EU aggregated into one slice
/r/dataisbeautiful
https://redd.it/106j8l0
[Discussion] Is there any alternative to deep learning?
Deep learning is increasingly becoming the default face of modern AI. So my question is: are there any other machine learning theories or ideas, different from deep learning, that have the potential to be big in the future?
/r/MachineLearning
https://redd.it/105syyz
My brother told me that the sum of all natural numbers is -1/12. I didn't believe it. Then he showed me how Ramanujan did it [YouTube]. I was not satisfied, so I tried it myself with another method and its variations. Can you guys review it?
/r/mathpics
https://redd.it/1033426
Fine tuning mt5
How do I fine-tune an MT5 model for generating Bengali paraphrases? I have enough datasets but I can't find a working script to fine-tune an MT5 model.
/r/LanguageTechnology
https://redd.it/10rvura
Easily Build Your Own GPT from Scratch using AWS: A Comprehensive Guide for Domain Adaptation
🔥🤖Get ready to train your own GPT-2 model from scratch using AWS SageMaker!🤖🔥
This comprehensive guide will take you through the entire process of creating a custom-built GPT-2 model, tailored to your specific domain or industry. 💻
You'll learn how to acquire and prepare raw data, create custom vocabularies and tokenizers, pre-train large language models, and evaluate the performance of your custom model. 📈
Not only that, but you'll also delve into the intricacies of training a GPT-2 model to generate cohesive news articles related to the COVID-19 pandemic! 🦠
And the best part? It comes with 9 Jupyter notebooks and all the necessary Python scripts to help you get started right away! 🚀
You'll also gain a solid understanding of key concepts like generative AI, foundational models, language alignment, and prompt engineering with a focus on GPT. 💡 https://tinyurl.com/hvrjkm5r
/r/LanguageTechnology
https://redd.it/10ohy1m
The ChatGPT Cheat Sheet
😁 Happy to introduce one of the most comprehensive ChatGPT cheat sheets: a 30-page paper highlighting various prompts to manage ChatGPT for generating text. The document not only highlights what ChatGPT can generate but also how it can generate it! Here is the TOC:
1. NLP Tasks
2. Code
3. Structured Output Styles
4. Unstructured Output Styles
5. Media Types
6. Meta ChatGPT
7. Expert Prompting
Google Doc: https://drive.google.com/file/d/1OcHn2NWWnLGBCBLYsHg7xdOMVsehiuBK/view?usp=share_link
/r/LanguageTechnology
https://redd.it/10k67l1
DensePose From WiFi
By Jiaqi Geng, Dong Huang, Fernando De la Torre
https://arxiv.org/abs/2301.00250
>Advances in computer vision and machine learning techniques have led to significant development in 2D and 3D human pose estimation from RGB cameras, LiDAR, and radars. However, human pose estimation from images is adversely affected by occlusion and lighting, which are common in many scenarios of interest. Radar and LiDAR technologies, on the other hand, need specialized hardware that is expensive and power-intensive. Furthermore, placing these sensors in non-public areas raises significant privacy concerns. To address these limitations, recent research has explored the use of WiFi antennas (1D sensors) for body segmentation and key-point body detection. This paper further expands on the use of the WiFi signal in combination with deep learning architectures, commonly used in computer vision, to estimate dense human pose correspondence. We developed a deep neural network that maps the phase and amplitude of WiFi signals to UV coordinates within 24 human regions. The results of the study reveal that our model can estimate the dense pose of multiple subjects, with comparable performance to image-based approaches, by utilizing WiFi signals as the only input. This paves the way for low-cost, broadly accessible, and privacy-preserving algorithms for human sensing.
/r/computervision
https://redd.it/10eg0d6
Using computer vision to find shortest paths on cross stitching patterns (code in comments)
/r/computervision
https://redd.it/108emlz
Nvidia DeepStream 101: A beginner’s guide to real-time computer vision
https://chirag4798.medium.com/nvidia-deepstream-101-a-beginners-guide-to-real-time-computer-vision-afefcb5d7fba?source=friends_link&sk=b5bdfe8e2fb1b387ac3db8b8c08b5e7f
/r/computervision
https://redd.it/109fi7a
Laptop with GPU for Work vs Cloud, Best Practices
Hey guys, in my last job as an ML CV Engineer, we were given laptops with dedicated GPU for our work and I hated it. I think there is no point explaining why crappy gaming laptops (even expensive ones) can be worse than some good-quality laptops without GPU, especially if you care about portability. Of course, we had certain cloud solutions for model training, but these laptops were always justified as "something you can quickly check and debug things before starting long training runs on the server".
Now I have a similar role in a new company. By default they offer similar GPU laptops to ML Engineers, but we agreed that I will get a machine without a GPU and see how it goes.
That got me thinking: how do you cope with cases where you need to quickly experiment with or debug ongoing code changes in GPU-intensive applications? Do you connect to your cloud instances and do everything there, or maybe have a separate company server, or something else? I can hardly believe that a gaming laptop is the best solution we've come up with so far for ML CV Researchers/Engineers. I would be interested to read your takes on that.
/r/computervision
https://redd.it/10boise
text to 3d open source blender addon
Open source pipeline setup to generate 3D. It seems they didn't finish it, but it looks better using Point-E, with DMTet for the mesh and dream for the texture. Firework-Games-AI-Division/dmt-meshes (github.com)
UPDATE: was prompting an alien ship, found an alien inside the ship... shiiit
https://preview.redd.it/lipgio7o41ca1.jpg?width=813&format=pjpg&auto=webp&v=enabled&s=9de794ab03445954f272f89b1da8e7c2fa92fbdd
/r/computervision
https://redd.it/10b52ao
Introducing Visionner (Your image dataset toolkit)
Hi guys my name is Charles, and I'm the creator of Visionner.
Visionner is an open source Python package that helps you import, normalize, save, and manage your custom image dataset for your computer vision task.
Why?
Because most of the time, when we learn to create computer vision models, we just use the built-in TensorFlow or PyTorch datasets, but in real-world projects we need to use custom datasets. I was surprised to find that the difficult part is not choosing a model architecture but importing and normalizing my custom dataset so it can be passed to the network.
That is why I decided to automate this step with Visionner.
You can check the source code on my GitHub: https://github.com/charleslf2/Visionner
You can view some showcases on the Visionner webpage: https://charleslf2.github.io/Visionner/
Some outputs:
Import your image for any supervised computer vision tasks
Visualize the first 10 images of your dataset
Visualize your labels and save your custom dataset
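To make the "import and normalize" step concrete, here is a minimal standard-library sketch of the general idea. It does not use or reproduce Visionner's API. It assumes a hypothetical folder-per-class layout (for example `root/cats/*.jpg`, `root/dogs/*.jpg`), and the `index_dataset` and `normalize_pixels` helpers are invented names for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_dataset(root: Path) -> Tuple[List[Tuple[Path, int]], Dict[str, int]]:
    """Pair every image file with an integer label derived from its folder name."""
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_index = {name: i for i, name in enumerate(class_names)}
    samples = [
        (path, class_to_index[name])
        for name in class_names
        for path in sorted((root / name).iterdir())
        if path.suffix.lower() in IMAGE_SUFFIXES
    ]
    return samples, class_to_index


def normalize_pixels(pixels: List[int]) -> List[float]:
    """Scale 8-bit pixel values from 0..255 to 0.0..1.0."""
    return [value / 255.0 for value in pixels]


# Build a throwaway dataset of empty placeholder files to show the indexing.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = {"cats": ["a.jpg", "b.png"], "dogs": ["c.jpg", "notes.txt"]}
    for class_name, file_names in layout.items():
        (root / class_name).mkdir()
        for file_name in file_names:
            (root / class_name / file_name).touch()
    samples, class_to_index = index_dataset(root)
    labels = [label for _, label in samples]

print(class_to_index, labels)
print(normalize_pixels([0, 255]))
```

Decoding the image files into pixel arrays is left out here because it needs an imaging library; the sketch only covers the labeling and scaling steps.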
/r/computervision
https://redd.it/10cet92
How good is the new YOLO? (YOLOv8)
A brief review of YOLOv8's capabilities; the link is below (no mailwall):
https://www.flyps.io/blog/a-new-yolo-is-here-yolov8
https://i.redd.it/zc8538mhgmba1.gif
/r/computervision
https://redd.it/10a0uvz
[OC] Distribution of 19 Types of Berries Native to North America + Approx. Berry Diversity/Density in NA
/r/dataisbeautiful
https://redd.it/106rh4r
When you can't afford a trip to the United States
/r/MapPorn
https://redd.it/1069x6y
[OC] Desktop Search Engine Global Market Share
/r/dataisbeautiful
https://redd.it/105zq9j