Datacomp - Search News

US judge dismisses nationwide renters' lawsuit over mobile home lots

WASHINGTON, Dec 5 (Reuters) - A group of companies that lease land for mobile homes has convinced a federal judge in Chicago to dismiss a proposed nationwide class action accusing them of conspiring ...

IEEE

Scaling Laws for Data Filtering—Data Curation Cannot be Compute Agnostic

Abstract: Vision-language models (VLMs) are trained for thousands of GPU hours on carefully selected subsets of massive web scrapes. For instance, the LAION public dataset retained only about 10% of ...

GitHub

for Visual Generation and Understanding

This repo implements UniTok, a unified visual tokenizer well-suited for both generation and understanding tasks. It is compatible with autoregressive generative models (e.g. LlamaGen), multimodal ...

GitHub

Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.

Also supports saving captions for URL+caption datasets. If you believe in making reusable tools to make data easy to use for ML and you would like to contribute, please join the DataToML chat. For all ...
