ClueWeb

From CMU -- Language Technologies Institute -- HPC Wiki
Revision as of 12:21, 31 October 2024 by 172.26.59.166 (talk) (How to Get Access to the dataset)



ClueWeb22 is the newest dataset in the Lemur Project's ClueWeb line, which supports research on information retrieval, natural language processing, and related human language technologies. It was developed by the Lemur Project with significant assistance and support from Microsoft Corporation.


How to Get Access to the Dataset


You will need to sign the ClueWeb Individual License and submit it to your faculty sponsor or PhD advisor, who is responsible for keeping track of who in their group is using the data.

Once your faculty sponsor or PhD advisor has your signed ClueWeb Individual License, submit a user group change request to be added to the clueweb22_dataset group. Use the HPC Cluster User Account Request Form, available at https://lti.cs.cmu.edu/misc-pages/intranet-forms.html.
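
After the request has been processed, you may want to confirm that your account is in the group. The following is a minimal sketch, not an official procedure: it assumes you are logged in to a Linux node of the cluster and that clueweb22_dataset is an ordinary Unix group resolvable through the system group database. The helper name `in_group` is hypothetical, introduced only for illustration.

```python
import grp
import os

# Group name as given on this page.
GROUP_NAME = "clueweb22_dataset"


def in_group(group_name, gids=None):
    """Return True if one of the given group IDs belongs to group_name.

    When gids is None, the current process's primary and supplementary
    group IDs are used.
    """
    if gids is None:
        gids = set(os.getgroups()) | {os.getgid()}
    try:
        target_gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group is not known on this machine.
        return False
    return target_gid in gids


if __name__ == "__main__":
    if in_group(GROUP_NAME):
        print(f"This session is in the {GROUP_NAME} group.")
    else:
        print(f"This session is not in the {GROUP_NAME} group.")
```

Group changes on Linux generally apply only to new login sessions, so a negative result right after approval may simply mean you need to log out and back in.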