Main

related bits

0

processing priority

3

site type

5 (wiki-type site, growing by topic rather than chronologically)

review version

11

html import

20 (imported)

Events

first seen date

2024-09-15 03:53:34

expired found date

-

created at

2024-09-15 03:53:34

updated at

2024-11-09 14:08:22

Domain name statistics

length

20

crc

3525

tld

86

nm parts

0

nm random digits

0

nm rare letters

0

Connections

is subdomain of id

87719371 (github.io)

previous id

0

replaced with id

0

related id

-

dns primary id

0

dns alternative id

0

lifecycle status

0 (unclassified, or currently active)

Subdomains and pages

deleted subdomains

0

page imported products

0

page imported random

0

page imported parking

0

Error counters

count skipped due to recent timeouts on the same server IP

0

count content received but rejected due to 11-799

0

count dns errors

0

count cert errors

0

count timeouts

0

count http 429

0

count http 404

0

count http 403

0

count http 5xx

0

next operation date

-

Server

server bits

server ip

-

Mainpage statistics

mp import status

20

mp rejected date

-

mp saved date

-

mp size orig

12380

mp size raw text

2092

mp inner links count

0

mp inner links status

1 (no links)

Open Graph

title

description

T-MARS: Improving Visual Representations by Circumventing Text Feature Learning

image

site name

author

updated

2026-03-10 05:45:12

raw text

T-MARS: Improving Visual Representations by Circumventing Text Feature Learning. Pratyush Maini * 1, Sachin Goyal * 1, Zachary Lipton 1, Zico Kolter 1,2, Aditi Raghunathan 1. 1 Carnegie Mellon University, 2 Bosch Center for AI. Summary. TLDR: We propose an algorithm to filter web datasets used for training CLIP in order to learn better visual representations and achieve state-of-the-art zero-shot accuracy on vision tasks. Goal: 1. Vision-language models like CLIP are trained on web-crawled image-caption pairs. 2. We aim to filter these web datasets for better visual representation learning and improved zero-shot performance. 3. Filtering out bad samples allows computational resources to be allocated to useful datapoints. A Look at the LAION Dataset: 1. Our analysis shows an interesting observation: a large fraction of ima...

Text analysis

redirect type

0 (-)

block type

0 (no issues)

detected language

0 (awaiting analysis)

category id

Other (16)

index version

1

spam phrases

0

Text statistics

text nonlatin

0

text cyrillic

0

text characters

1567

text words

297

text unique words

177

text lines

45

text sentences

21

text paragraphs

6

text words per sentence

14

text matched phrases

0

text matched dictionaries

0

RSS

rss path

rss status

1 (priority 1 already searched, no matches found)

rss found date

-

rss size orig

0

rss items

0

rss spam phrases

0

rss detected language

0 (awaiting analysis)

inbefore feed id

-

inbefore status

0 (new)

Sitemap

sitemap path

sitemap status

1 (priority 1 already searched, no matches found)

sitemap review version

2

sitemap urls count

0

sitemap urls adult

0

sitemap filtered products

0

sitemap filtered videos

0

sitemap found date

-

sitemap process date

-

sitemap first import date

-

sitemap last import date

-