Estimation of Z height of a ball in flight.
/r/computervision
https://redd.it/129s8jr
OpenAI API for text extraction
Hi, I have a corpus of several extracted and labeled items. I want to use these to find similar items in an unseen long text document using an openAI endpoint. Is there something like semantic search but with learned embeddings? Which route should I take? Thank you in advance.
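One route matching what is described above is embedding-based semantic search: embed the labeled items once, embed chunks of the long document, and rank by cosine similarity. A minimal sketch, assuming the 2023-era `openai` Python package with an API key in the environment (the item and chunk strings are placeholders):
```
import numpy as np
import openai

def embed(texts, model="text-embedding-ada-002"):
    # One API call per batch of texts; returns an (n, d) array of embeddings.
    resp = openai.Embedding.create(model=model, input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

labeled_items = ["extracted item one", "extracted item two"]        # your labeled corpus
chunks = ["first passage of the long document", "second passage"]   # sliding-window chunks

item_vecs = embed(labeled_items)
chunk_vecs = embed(chunks)

# Cosine similarity between every document chunk and every labeled item.
sims = chunk_vecs @ item_vecs.T
sims /= np.linalg.norm(chunk_vecs, axis=1, keepdims=True) * np.linalg.norm(item_vecs, axis=1)

for i, chunk in enumerate(chunks):
    j = int(sims[i].argmax())
    print(f"{chunk[:40]!r} -> closest labeled item: {labeled_items[j]!r} ({sims[i, j]:.2f})")
```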
/r/LanguageTechnology
https://redd.it/124h7sy
(Soon) NLP graduate and feel completely inferior on the job market
I am a master's student in NLP/computational linguistics and currently looking for jobs after graduation. Prepare for a long panicked post; I hope this is the right place to ask/vent.
Both my bachelor's and master's were specialized NLP degrees. The bachelor's especially was pretty general: I took all the same intro to linguistics (syntax, phonetics, morphology etc.) classes as the theoretical linguists. I had a lot of „traditional" NLP methods such as parsing based on formal languages, automata theory, search algorithms. Basic maths, statistics, linear algebra. Specialized seminars on coreference, sentiment analysis etc., but those were mostly in the style of reading-papers-and-discussing-them. My master's offered more technical and applied courses, but I did not feel well prepared, since I never learned how to program neural networks myself except for a very basic numpy- and pandas-based classifier, yet suddenly everyone was talking about transformer models. I had theoretical ML classes, but somehow we were just expected to know how to implement the methods in our projects too? I am now doing my thesis, where I am using an existing system (PyTorch-based) and adapting and tuning it for a slightly different task. While I (thought I) knew how to program and the basics of how machine learning works, the reality is I feel soooo out of place. I have a hard time even understanding the PyTorch documentation, and I feel like there are a million things to consider. Shapes don't match, CUDA runs out of memory, and suddenly I need to do gradient clipping, which I feel I was taught about in 30 minutes maybe 2 years ago. I usually make it work somehow after 5 nervous breakdowns, but I constantly feel like I am half-assing everything, just trying to get it to run at least. If I were to build such a system, even a way simpler one, from scratch, I would die.
Now looking at jobs, most of those that advertise with NLP require „practical machine learning experience with frameworks such as TensorFlow, PyTorch…“, and nearly every job is also equally directed at graduates from EITHER data science, mathematics, computer science, NLP … How can I keep up with data scientists in this aspect? Did I mess up by not practicing how to actually code and understand complex systems during my degree? I know a few other students who expressed similar concerns, at least from my school. I definitely see potential for me in areas with highly specialized use cases/messy/non-standard data, but wonder if this really needed >3 years of linguistic basics. Will employers actually care about my linguistic background compared to a data scientist with some NLP experience? Currently I feel like I would have done better doing a data science degree and then taking a few classes on linguistics later on to specialize…. I guess I will find a job one way or another but I am already scared of interviews because of these inadequacies.
/r/LanguageTechnology
https://redd.it/11zvsnj
Best GPUs for pretraining roBERTa-size LLMs with a $50K budget, 4x RTX A6000 v.s. 4x A6000 ADA v.s. 2x A100 80GB
Hi folks,
Our lab plans to purchase a server with some decent GPUs to perform some pretraining tasks on program code. We won't work on very large LLMs, and we may not even try the T5 model. Currently, we want to first try the roBERTa model. We have a $50K budget, and it's our first time purchasing GPU servers.
I did some preliminary research and found that the suggested GPU is the A6000 ADA, which has 48 GB of GPU memory, according to https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/. Since our tasks require lots of GPU memory, we think a GPU with more than 32 GB will be good for us. So our alternative choices are RTX A6000 and A100 80GB HBM2 cards.
Based on these, we got three server specs from Exxact ( https://www.exxactcorp.com/TWS-115999024/configurator), (1) a $43K spec with 4 A6000 ADA cards, (2) a $32K spec with 4 RTX A6000 cards, and (3) a $41K spec with 2 A100 80GB cards. The other parts in the specs, e.g., CPU and RAM, are almost the same. I have attached the specs in screenshots.
Now, I have some questions.
1. The A6000 ADA removed NVLink (https://forums.developer.nvidia.com/t/rtx-a6000-ada-no-more-nv-link-even-on-pro-gpus/230874), which is very important for performance boosting and GPU memory pooling. Given that, is it still a good choice to have multiple A6000 ADA cards in one server?
2. The A6000 ADA is a very new GPU improved from the RTX A6000. But the RTX A6000 has NVLink, which means the server's GPU memory can reach 48 * 4 GB when connecting 4 RTX A6000 cards. However, we are going to use the GPU server for several years, and for IT products it's usually better to purchase the latest ones. Is that true for GPU cards? Also, the A6000 ADA has more Tensor and CUDA cores than the RTX A6000.
3. For the A100 80GB spec, we can only have 2 cards considering the budget. For LLM pretraining, more cards usually mean more parallelism and faster training. Based on my research, the A6000 ADA has comparable performance to the A100 on DL benchmarks. Is this A100 80GB spec a good choice?
4. Apart from the specs mentioned above, what else would you recommend for our pretraining tasks, especially regarding GPUs?
Thanks for your time! We really appreciate any suggestions.
/r/deeplearning
https://redd.it/11vb220
Finno-Ugric open-source machine translation
We here at the University of Tartu created an NMT engine for 23 Finno-Ugric languages, targeting low-resource languages: Livonian, Komi, Udmurt, Võro and several others. Most of the covered low-res languages are not part of Meta's M2M100 or NLLB, nor are they part of Google Translate, Bing Translator or DeepL yet.
FairSeq translation model and full list of supported languages here: https://huggingface.co/tartuNLP/smugri3-finno-ugric-nmt. Online demo here: https://translate.ut.ee/. Submitting corrected translations is also supported, in case you speak any of these languages - we are hoping to use the feedback to improve translation quality in the near future.
/r/LanguageTechnology
https://redd.it/11r0izu
Why are face filters faster on phones, when my simple OpenCV script to detect circles can't even get up to 20 fps on an i9 with a 3060?
I know it's not an apples-to-apples comparison, but my point is: how does a mobile app produce 30 to 60 fps face detection and image processing, when OpenCV on an i9 can't even do simple image processing fast enough?
Sorry for this dumb question, I am just curious to know...
/r/computervision
https://redd.it/11mfe0k
NLP Research fully remote?
Hi everyone,
Is anyone doing NLP Research for companies/start-ups/research labs fully remote? If so:
1. How is your experience so far?
2. Are such openings common?
3. How did you find yours?
4. Are you still able to publish in top venues?
5. Can you still advance in your career?
Any information is greatly appreciated!
Thank you!
/r/LanguageTechnology
https://redd.it/11iv3kn
SpikeGPT: 230M-parameter Spiking Neural Network trained to be a language model
https://arxiv.org/abs/2302.13939v1
/r/MachineLearning
https://redd.it/11eqinv
We used text-to-location models to find Twitter mentions of "Rihanna" and "Riri" during the Super Bowl
/r/deeplearning
https://redd.it/119zfxw
I am working on a salient feature extractor, to allow future farmers to collect training data about invasive weed species directly from their fields.
/r/computervision
https://redd.it/116yood
[R] Hitchhiker’s Guide to Super-Resolution: Introduction and Recent Advances
I'm glad to share with you our Open Access survey paper about image super-resolution:
https://ieeexplore.ieee.org/abstract/document/10041995
The goal of this work is to give an overview of the abundance of publications in image super-resolution, give an introduction for new researchers, and open thriving discussions as well as point to potential future directions to advance the field :)
/r/MachineLearning
https://redd.it/11287zf
Good replacement for TensorFlow's Object Detection API
The TF Object Detection API has been deprecated for a while now, but I really liked the fact that it provided a standardized interface to train and test multiple model architectures. I was wondering if there is a popular alternative today?
I know the new big boy in object detection is YOLOv8, so maybe I should just switch to using that model and ecosystem instead.
Edit: Never mind, Ultralytics and yolov8 slaps, I will be using that from now on.
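For anyone following the same route, a minimal sketch of the Ultralytics interface (the weights file and dataset YAML below are the library's stock examples, not from the original post):
```
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                   # pretrained nano detection model
model.train(data="coco128.yaml", epochs=10)  # standardized training interface
metrics = model.val()                        # evaluate on the dataset's val split
results = model("https://ultralytics.com/images/bus.jpg")  # run inference on an image
```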
/r/computervision
https://redd.it/10uq4c5
3D Brain Tumor Segmentation in PyTorch using U-Net & Eigenvector projection for color invariance
[https://github.com/Saswatm123/3D-Brain-Tumor-Segmentation-PyTorch](https://github.com/Saswatm123/3D-Brain-Tumor-Segmentation-PyTorch)
**Preface: I'm looking for a CV or ML/MLE job since getting my last offer rescinded in December.**
I created a 3D brain tumor segmentation model using PyTorch that takes in human brain MRI scan data as input and outputs a segmentation map of the part of the brain that has a tumor, if one is present.
[Input images example](https://i.redd.it/e4pf5o20mura1.gif)
[Predicted segmentation map for previous input](https://i.redd.it/dii6ijo3mura1.gif)
There are many more pictures/GIFs in the README, but I would like to briefly go over the project here.
I settled on a U-Net architecture after experimenting with a couple of other architecture styles. The skip connections really make a difference in the pixel-by-pixel accuracy of the segmentation map produced.
One interesting issue that took a bit of extra math to solve concerned image color. The images would often be random hues, since these MRI images come from different machines. For example, one brain image might be blue-on-black, while another might be orange-on-green. Tumorous tissue is detected by a deviation in color, as can be seen in the first GIF, where even an untrained eye can pick out the tumor location; the specific hue does not matter. Examples of this multiple-hue effect can be seen in the README. This not only increases the dimensionality of the problem space, but also often overactivates the residual connections. For example, if they are used to lower-intensity color schemes (blue on black), a brighter color scheme (orange on green) would create an almost fully activated segmentation map, since the first skip connection would simply forward the image to the last couple of layers. I needed a way to create color invariance in images. This is usually solved by grayscaling the image, which takes the L2 norm per pixel and uses that as a "brightness" value. However, this does not work for this use case. An L2 norm takes the shell of a 3D sphere & compresses it to a single point. This means that points on the same sphere shell, but separate from each other ([0,1,1], [0,1,0], [1,1,0]), would all be considered the same, and a tumor would go undetected. We need to maintain 3D distance between points while ignoring the actual color.
Solution: We view each image as a 5D point cloud of (x, y, R, G, B), where (x, y) are the coordinates per pixel, and (R, G, B) are the values for the pixel. We may ignore (x, y) for now and focus on the (R, G, B) values. Color invariance while maintaining shape is now simply a problem of scale, translation, and rotation invariance of a 3D point cloud.
Translation invariance is trivial - we simply center the means per axis. This means that any configuration of this point cloud that has the same shape, but is translated differently, maps to the same position.
Rotation invariance can then be achieved by taking the eigenvectors of our centered point cloud's covariance, ordering them by eigenvalue, and mapping them to axes (largest = axis 0, second largest = axis 1, etc.). We can then simply rotate our point cloud according to this eigenvector projection. This ends up being a 1-sample PCA, where the sample is the point cloud image. The README shows a table of various images with this technique applied to them, along with their point cloud representations.
This technique helps my model beat the human accuracy benchmark. The problem of residual channels being overwhelmed/thrown off by various color schemes is not an issue anymore.
I prefer solutions involving invariance & explicit bias over augmentation because augmentation is exponential in time & space. If there are 5 factors with 3 levels each that we wish to make our model robust to, the extra multiplier is 3^5, and we can get rid of this with some ML craftiness. The augmented solution is also much more vulnerable to adversarial attacks in a way that an explicitly invariant model is not.
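A minimal sketch of the eigenvector projection described above (not the repo's exact code; the function name is made up): treat each image's pixels as an (N, 3) RGB point cloud, center it, and rotate it onto the eigenvectors of its covariance, ordered by eigenvalue.
```
import torch

def eigen_project(image):
    """image: (3, H, W) tensor -> color-aligned (3, H, W) tensor."""
    c, h, w = image.shape
    points = image.reshape(c, -1).T                  # (N, 3) RGB point cloud
    points = points - points.mean(dim=0)             # translation invariance
    cov = points.T @ points / (points.shape[0] - 1)  # (3, 3) covariance of the colors
    eigvals, eigvecs = torch.linalg.eigh(cov)        # eigenvalues in ascending order
    order = torch.argsort(eigvals, descending=True)  # largest eigenvalue -> axis 0
    rotated = points @ eigvecs[:, order]             # rotate onto the principal axes
    return rotated.T.reshape(c, h, w)
```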
The loss functions I used were DICE & Tversky. DICE measures the overlap (closely related to Intersection over Union) between our predicted segmentation map and the ground truth segmentation map (code in repo & below).
```
def DICE_loss(input, target, eps= 1e-5):
    '''
    Args:
        input:
            Predicted Tensor to gauge accuracy of. Same size as target.
        target:
            Target Tensor to use as ground truth. Same size as input.
        eps:
            Smoothing value to ensure no division by zero.
    Desc:
        DICE Loss function computes 1 - DICE coefficient. DICE coefficient
        is a representation of Intersection over Union. Formula is:
            2 * |Input && Target| / ( |Input| + |Target| )
        for |...| symbolizing cardinality of a set.
        Since input can include soft probabilities as well as hard 1/0,
        the cardinality of an input is the sum.
    Returns:
        Tensor containing 1 - DICE coefficient, optimal when minimized @ 0
    '''
    intersection = (input * target).view(input.shape[0], -1).sum(axis= -1)
    union = input.view(input.shape[0], -1).sum(axis= -1) + target.view(target.shape[0], -1).sum(axis= -1)
    return (1 - 2*intersection/(union + eps) ).sum()
```
Tversky is similar, but more fine tuned. Tversky breaks down the Union term into False Positive + False Negative + True Positive. We can then add alpha & beta parameters to the False Positive & False Negative terms & guide our model's learning dynamically based on the mistakes it is making. Here is the code (also in repo).
```
def tversky_loss(input, target, eps= 1, alpha= .5, beta= .5):
    '''
    Args:
        input:
            Predicted Tensor to gauge accuracy of. Same size as target.
        target:
            Target Tensor to use as ground truth. Same size as input.
        eps:
            Smoothing value to ensure no division by zero.
        alpha:
            Weight to put on False Positives. Higher value penalizes more.
            Value of .5 for alpha & beta results in standard DICE loss.
        beta:
            Weight to put on False Negatives. Higher value penalizes more.
            Value of .5 for alpha & beta results in standard DICE loss.
    Desc:
        Tversky Loss is DICE Loss (IoU) with separate weights put on
        False Positives and False Negatives. The Union calculation for
        the denominator is framed as:
            Union = True Positive + False Positive + False Negative
        This allows us to put separate weights on False Positives and
        False Negatives, leading to the calculation:
            Union = True Positive + alpha * False Positive + beta * False Negative
        Values of .5 for both parameters create the standard DICE loss.
        Values lie in domain (0, inf).
    Returns:
        Tensor containing 1 - Tversky coefficient, optimal when minimized @ 0.
    '''
    # Flattens mask to single binary image since all 3 channels are the same
    # for all masks in the batch
    target = target[:,0,:,:].reshape(-1)
    input = input.reshape(-1)
    true_pos = (input * target).sum()
    false_pos = ( (1-target) * input).sum()
    false_neg = (target * (1-input) ).sum()
    tversky_coef = (true_pos + eps) / (true_pos + alpha*false_pos + beta*false_neg + eps)
    return 1 - tversky_coef
```
The model, like I mentioned before, is a simple U-Net architecture that looks like this:
[Image created in NN-SVG & MS Paint](https://preview.redd.it/m6eo4v1pzura1.png?width=1236&format=png&auto=webp&v=enabled&s=7596f1d05baed4a8a613ab7b3d43181f2f4c41cc)
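For readers who want a concrete picture of the encoder/decoder with skip connections, here is a minimal U-Net sketch (illustrative only; the actual model in the repo is larger and tuned for the MRI data, and the channel sizes here are made up):
```
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, out_ch=1):
        super().__init__()
        self.enc1 = double_conv(in_ch, 16)
        self.enc2 = double_conv(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = double_conv(64, 32)              # 32 upsampled + 32 from skip
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = double_conv(32, 16)              # 16 upsampled + 16 from skip
        self.head = nn.Conv2d(16, out_ch, 1)

    def forward(self, x):
        e1 = self.enc1(x)                            # skip connection source 1
        e2 = self.enc2(self.pool(e1))                # skip connection source 2
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))          # per-pixel tumor probability

# Sanity check: TinyUNet()(torch.randn(1, 3, 64, 64)).shape == (1, 1, 64, 64)
```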
The PyTorch code for the model can be found in the [repo](https://github.com/Saswatm123/3D-Brain-Tumor-Segmentation-PyTorch), and the [README](https://github.com/Saswatm123/3D-Brain-Tumor-Segmentation-PyTorch) has more in-depth images of everything explained here. Thanks for reading, and I am open to hearing about job opportunities at the moment :)
/r/computervision
https://redd.it/12bhdfz
Vectorised Object Mapping for Neural Field SLAM
By: **Xin Kong** **Shikun Liu** **Marwan Taher** **Andrew Davison** **Dyson Robotics Lab** **Imperial College London**
https://kxhit.github.io/vMAP
TL;DR: We present vMAP, an object-level real-time mapping system, with each object represented by a separate MLP neural field model, and object models are optimised in parallel via vectorised training.
We present vMAP, an object-level dense SLAM system using neural field representations. Each object is represented by a small MLP, enabling efficient, watertight object modelling without the need for 3D priors. As an RGB-D camera browses a scene with no prior information, vMAP detects object instances on-the-fly and dynamically adds them to its map. Specifically, thanks to the power of vectorised training, vMAP can optimise as many as 50 individual objects in a single scene, with an extremely efficient training speed of a 5 Hz map update. We experimentally demonstrate significantly improved scene-level and object-level reconstruction quality compared to prior neural field SLAM systems.
https://preview.redd.it/0nlr8i5bnwqa1.png?width=2404&format=png&auto=webp&v=enabled&s=57e8b4daae8de3592ed54a6af25936098d0546ab
/r/computervision
https://redd.it/126scqw
Should I specialize in NLP considering the advent of Large Language Models?
I am feeling that most of the cutting-edge research work is being done in a handful of companies. In that case, what does the future look like, say, 5 years down the line for somebody specialising in NLP research? It seems like models like ChatGPT can do many NLP tasks and are so far ahead of the curve that it will be difficult to beat them. How do job prospects look in NLP?
/r/LanguageTechnology
https://redd.it/121gv4c
Whisper Open AI: How to not include silences in the timestamps returned?
Hello
I'm using Whisper, and the timestamps include the silence.
When I have a video with a speaker starting his speech at second 10, the first timestamp I get back starts at second 1 instead of second 10.
Here is my config:
POST /v1/audio/transcriptions
Config
```
{
  model: "whisper-1",
  file: "...mp3",
  response_format: "srt",
  prompt: "Hello, welcome to my lecture"
}
```
Output:
```
1
00:00:01,000 --> 00:00:14,000
Why are there both successful and struggling entrepreneurs?

2
00:00:15,000 --> 00:00:23,000
Many customers prefer to watch videos to enjoy online content.

3
00:00:24,000 --> 00:00:32,000
an other sentences.
```
* I believe `1` should be `00:00:10,000 --> 00:00:14,000`, since no one is talking at all for the first 10 seconds.
* Also, for `3`, the speaker starts talking again at second 28, but I'm getting a timestamp at second 24. The silence is simply included in the timestamp by Whisper.
Any idea how I could fix that, maybe using a prompt?
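One workaround, independent of the prompt (a sketch, not from the original post; it assumes pydub with ffmpeg installed and a hypothetical `lecture.mp3`): detect where speech actually occurs locally and snap each returned segment start forward to the nearest detected speech onset.
```
from pydub import AudioSegment
from pydub.silence import detect_nonsilent

audio = AudioSegment.from_file("lecture.mp3")

# [start_ms, end_ms] ranges that actually contain sound above the threshold.
speech_ranges = detect_nonsilent(audio, min_silence_len=500,
                                 silence_thresh=audio.dBFS - 16)

def snap_to_speech(seg_start_ms, seg_end_ms):
    """Move a segment start forward if it falls inside a silent region."""
    for start, end in speech_ranges:
        if start <= seg_start_ms < end:        # already starts during speech
            return seg_start_ms, seg_end_ms
        if seg_start_ms < start < seg_end_ms:  # starts in silence: snap forward
            return start, seg_end_ms
    return seg_start_ms, seg_end_ms

# e.g. Whisper's first segment (1 s to 14 s) would be snapped to roughly (10 s, 14 s)
print(snap_to_speech(1000, 14000))
```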
Thanks!
/r/LanguageTechnology
https://redd.it/11xdnvd
Fine tune BERT for domain-specific information retrieval.
Hi guys, I'm a little lost on how to start a little side project.
So I want to take a BERT model, fine-tune it on additional information about a specific domain it was not initially trained on, and then it should be able to answer questions regarding that topic. The way I understand it, I would need to put an additional question-answering head on top of the fine-tuned model in order for it to be able to answer questions and not just output "random" sentences that are merely related to my query. Is this thinking correct?
I question this because all I find on the internet is fine-tuning a model on QA data, that is, a labeled dataset with questions and answers. My dataset, on the other hand, consists of only text data, hence the title "information retrieval".
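A sketch of the two-stage idea described above, assuming Hugging Face `transformers`/`datasets` (file paths and hyperparameters are placeholders): first continue masked-language-model pretraining on the raw domain text, then put a question-answering head on top. Note the QA head is randomly initialized, so it still needs some labeled question/answer pairs (e.g. SQuAD-style) before it can answer questions rather than just retrieve related sentences.
```
from transformers import (AutoTokenizer, AutoModelForMaskedLM, AutoModelForQuestionAnswering,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Stage 1: domain-adaptive pretraining on unlabeled domain text (one passage per line).
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = raw.map(lambda b: tokenizer(b["text"], truncation=True, max_length=256),
                    batched=True, remove_columns=["text"])
mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
Trainer(model=mlm_model,
        args=TrainingArguments(output_dir="bert-domain", num_train_epochs=1),
        train_dataset=tokenized["train"], data_collator=collator).train()
mlm_model.save_pretrained("bert-domain")
tokenizer.save_pretrained("bert-domain")

# Stage 2: load the adapted encoder with a (new) QA head and fine-tune it on labeled QA pairs.
qa_model = AutoModelForQuestionAnswering.from_pretrained("bert-domain")
```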
Thanks for your insights!
/r/LanguageTechnology
https://redd.it/11sxkj0
Is deep learning really so annoying?
I'm currently a master's student considering doing a PhD in computer vision afterwards (I haven't fully decided yet).
I really like 3D geometry. I also find some recent works combining 3D geometry with deep learning very elegant.
However, training deep networks annoys me. I feel like it blocks my productivity, and I don't have the patience to wait for days to see how one parameter change affects the model. It seems that I don't produce much; I mostly test. Also, since an NN is a black box, its behaviour seems super random.
I'm wondering:
1. Is this the sign that I should simply look for a 3D geometry job in industry? Or can it be that the frustration goes away when one gains experience in deep learning?
2. Over time, does one develop better intuition in deep learning about what works and what doesn't or does it remain largely guesswork?
Thanks in advance!
/r/computervision
https://redd.it/11pqwoi
[D] Simple Questions Thread
Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
Thanks to everyone for answering questions in the previous thread!
/r/MachineLearning
https://redd.it/11ckopj
Research ActiveLab: Active Learning with Data Re-Labeling
I’m excited to share ActiveLab, a better algorithm for practical active learning.
https://preview.redd.it/g4yvrdyrkdla1.png?width=1544&format=png&auto=webp&v=enabled&s=33ce49d75f26590a1b86fd59c98462c7359016da
I recently published a paper introducing this novel method and an open-source Python implementation that is easy-to-use for all data types (image, text, tabular, audio, etc). For data scientists, I’ve made a quick Jupyter tutorial to run ActiveLab on your own data. For ML researchers, I’ve made all of our benchmarking code available for reproducibility so you can see for yourself how effective ActiveLab is in practice.
Labeled data is key to train models, but data annotators often make mistakes. One can collect multiple annotations per datapoint to get a more reliable consensus label, but this is expensive! To train the best ML model with the least data labeling, a key question is: which new data should I label, or which of my current labels should be checked again?
https://preview.redd.it/wvm5sskokdla1.png?width=960&format=png&auto=webp&v=enabled&s=3c6000bdbfc28217bf8f0f4d0910bf65f12d6cbd
ActiveLab automatically answers this question for you, allowing you to train the most accurate ML model via a smaller number of total annotations than required to reach similar accuracy with popular active learning methods. ActiveLab is highly practical — it runs quickly and works with: any type of ML model, batch settings where many examples are (re)labeled before model retraining, and settings where multiple annotators can label an example (or just one annotator).
If you're interested in reading more, check out my blogpost: https://cleanlab.ai/blog/active-learning/
/r/MachineLearning
https://redd.it/11gb5aq
If a model was trained on low-resolution images, how well is it expected to generalize at test time to high-resolution images?
Lately, I have seen some examples of research using CIFAR and FER2013 (Facial Expression Recognition).
Both sets have low-resolution images, respectively 32x32 and 48x48.
It seems to me that most studies using these datasets report good performance on test sets that have similar resolutions and come from the same data pool. But I have doubts about whether a model trained on low-resolution images will generalize well to different datasets with higher resolution.
My question is:
Does anyone have experience with this, having trained on low-resolution data and then tested on a different dataset with higher resolution?
Are there any studies that have addressed this question?
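The most common practical route is to resize the high-resolution test images down to the training resolution at evaluation time. A tiny sketch of that check (not from the post; the folder path and the already-trained classifier `model` are placeholders), assuming a torchvision-style image folder:
```
import torch
from torchvision import datasets, transforms

def make_loader(size):
    # Resize the high-resolution test images down to the model's training resolution.
    tfm = transforms.Compose([transforms.Resize((size, size)), transforms.ToTensor()])
    ds = datasets.ImageFolder("high_res_testset/", transform=tfm)
    return torch.utils.data.DataLoader(ds, batch_size=64)

@torch.no_grad()
def accuracy(model, loader):
    model.eval()
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

# e.g. accuracy(cifar_model, make_loader(32)) evaluates a CIFAR-trained model
# on the external high-resolution test set, downscaled to 32x32.
```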
Thank you very much in advance for your input!
/r/computervision
https://redd.it/11bmpda
Real-Time-Object-Counting-on-Jetson-Nano
https://github.com/R-Mahmoudi/Real-Time-Object-Counting-on-Jetson-Nano
/r/deeplearning
https://redd.it/1174qgv
Open sourcing Rerun: A toolbox for visualizing Computer Vision
Today we're making the Rerun open source project public. Links to docs and repo on rerun.io
Rerun beta: Visualize Computer Vision
Rerun is now installable as `pip install rerun-sdk` for Python users and `cargo add rerun` for Rust users. C/C++ support is planned but not there yet.
Rerun is an SDK for logging data like images, tensors and point clouds, paired with an app that builds visualizations around that data. We built Rerun for computer vision and robotics developers. It makes it easy to debug, explore and understand internal state and data with minimal code. The point is to make it much easier to build computer vision and robotics solutions for the real world.
Rerun is in beta. It is already quite powerful and useful. A couple of great teams have been using it for several months as both their main internal debugging tool and as a way to show off their systems to customers and investors. However, we're just getting started and have lots of exciting features in the pipeline.
We are also open for contributions now and are all looking forward to hearing your feedback!
Visualization of a sparse 3D reconstruction done with COLMAP
/r/computervision
https://redd.it/112w0br
[R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research
/r/MachineLearning
https://redd.it/110s8ui
[N] Getty Images Claims Stable Diffusion Has Stolen 12 Million Copyrighted Images, Demands $150,000 For Each Image
From Article:
Getty Images' new lawsuit claims that Stability AI, the company behind Stable Diffusion's AI image generator, stole 12 million Getty images with their captions, metadata, and copyrights "without permission" to "train its Stable Diffusion algorithm."
The company has asked the court to order Stability AI to remove violating images from its website and pay $150,000 for each.
However, it would be difficult to prove all the violations. Getty submitted over 7,000 images, metadata, and copyright registrations used by Stable Diffusion.
/r/MachineLearning
https://redd.it/10w6g7n
Fine-tuning mT5
How do I fine-tune an mT5 model for generating Bengali paraphrases? I have enough datasets but I can't find a working script to fine-tune an mT5 model.
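Since the question is about getting a working script, here is a minimal sketch using Hugging Face `transformers` (the checkpoint, file name, column names, and hyperparameters are placeholders to adapt to your data):
```
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)
from datasets import load_dataset

model_name = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# CSV with "source" and "paraphrase" columns of Bengali sentence pairs (assumed format).
data = load_dataset("csv", data_files={"train": "bn_paraphrases.csv"})

def preprocess(batch):
    model_inputs = tokenizer(batch["source"], max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["paraphrase"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = data.map(preprocess, batched=True, remove_columns=data["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="mt5-bn-paraphrase", num_train_epochs=3,
                                  per_device_train_batch_size=8, learning_rate=3e-4),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```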
/r/LanguageTechnology
https://redd.it/10rvura