Downloading the datacomp_large dataset: a snapshot_download usage example

Dataset: https://huggingface.co/datasets/mlfoundations/datacomp_large
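Before you run the script, it can help to preview which repository files `allow_patterns="*.parquet"` will select. The sketch below assumes that huggingface_hub filters repository paths with fnmatch-style globs. Under that assumption `*` also matches `/`, so parquet files in subfolders are included. `select_files` and `list_parquet_files` are hypothetical helper names. `HfApi.list_repo_files` is a real huggingface_hub method.

```python
import fnmatch
from typing import Iterable, List, Sequence, Union


def select_files(paths: Iterable[str], allow_patterns: Union[str, Sequence[str]]) -> List[str]:
    """Keep the repo paths that match at least one glob in allow_patterns.

    Assumed to approximate huggingface_hub's fnmatch-based filtering:
    '*' also matches '/', so "*.parquet" selects files in subfolders too.
    """
    if isinstance(allow_patterns, str):
        allow_patterns = [allow_patterns]
    return [p for p in paths if any(fnmatch.fnmatch(p, pat) for pat in allow_patterns)]


def list_parquet_files(repo_id: str) -> List[str]:
    """Hypothetical helper: list a dataset repo's files over the network
    and keep only the ones that "*.parquet" would select."""
    from huggingface_hub import HfApi  # imported lazily; needs network access

    files = HfApi().list_repo_files(repo_id, repo_type="dataset")
    return select_files(files, "*.parquet")
```

For example, `len(list_parquet_files("mlfoundations/datacomp_large"))` should roughly match the number of files the script downloads.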

import os
from huggingface_hub import snapshot_download

def download_parquet_files(repo_id, output_dir):
    """
    Download .parquet files from a Hugging Face dataset repository using snapshot_download.

    Args:
    - repo_id (str): The ID of the Hugging Face dataset repository.
    - output_dir (str): Directory where the .parquet files will be saved.
    """
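    # Create the output directory if it does not exist yet.
    os.makedirs(output_dir, exist_ok=True)

    # Hub cache directory, placed inside output_dir.
    cache_dir = os.path.join(output_dir, "cache")

    hf_snapshot_args = dict(
        repo_id=repo_id,
        repo_type="dataset",         # a dataset repo, not a model repo
        allow_patterns="*.parquet",  # only fetch the .parquet files
        local_dir=output_dir,        # final location of the downloaded files
        cache_dir=cache_dir,
        # local_dir_use_symlinks and resume_download are deprecated and ignored in
        # recent huggingface_hub releases (0.23+): local_dir always receives real
        # files and downloads resume automatically. Drop them if your version rejects them.
        local_dir_use_symlinks=False,  # copy real files into local_dir, no symlinks
        resume_download=True,          # resume interrupted downloads
        max_workers=16,                # number of concurrent download threads
    )

    snapshot_download(**hf_snapshot_args)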

if __name__ == "__main__":
    REPO_ID = "mlfoundations/datacomp_large"  # Replace with your dataset repo ID
    OUTPUT_DIR = "/data/xiedong/datasets_meizu/datacomp_all/large/metadata"  # Replace with your desired output directory

    download_parquet_files(REPO_ID, OUTPUT_DIR)
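After the download finishes, a quick sanity check is to count the .parquet files that landed in OUTPUT_DIR and add up their size. The sketch below uses only the standard library, and `summarize_parquet_dir` is a hypothetical helper. It skips the top-level `cache` folder (the script's cache_dir) and any hidden folder such as `.cache`. This assumes only the files placed directly under local_dir should be counted.

```python
import os
from typing import List, Tuple


def summarize_parquet_dir(root: str) -> Tuple[int, int, List[str]]:
    """Return (file count, total bytes, sorted relative paths) of the
    .parquet files under root, ignoring cache and hidden directories."""
    paths = []
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden dirs everywhere, and the script's "cache" dir at the top level only.
        dirnames[:] = [
            d for d in dirnames
            if not d.startswith(".") and not (dirpath == root and d == "cache")
        ]
        for name in filenames:
            if name.endswith(".parquet"):
                full = os.path.join(dirpath, name)
                total += os.path.getsize(full)
                paths.append(os.path.relpath(full, root))
    paths.sort()
    return len(paths), total, paths
```

Usage: `count, size, _ = summarize_parquet_dir(OUTPUT_DIR)` followed by `print(count, size / 1e9, "GB")`.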
