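Managing Data with Arrow

In the earlier post "Data mining: time to update the TCGA data", I saved the TCGA data in Arrow format because it takes little disk space, reads and writes quickly, and works across languages (all three languages I mainly use support it).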

 

Format

https://arrow.apache.org

 

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
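To make "columnar" and "zero-copy" concrete, here is a toy sketch that uses only Python's standard library. It does not use the Arrow API or the real Arrow memory layout; the column names and values are made up for illustration. It only shows the two ideas: each column lives in one contiguous typed buffer, and a view over that buffer can be read without copying it.

```python
from array import array

# Row-oriented layout: one tuple per record (hypothetical toy data).
rows = [(5.1, 3.5), (4.9, 3.0), (4.7, 3.2)]

# Column-oriented layout: each field is stored as one contiguous typed buffer.
columns = {
    "sepal_length": array("d", (r[0] for r in rows)),
    "sepal_width": array("d", (r[1] for r in rows)),
}

# A memoryview exposes the column's buffer without copying it.
view = memoryview(columns["sepal_length"])
tail = view[1:]  # slicing a memoryview does not copy either

# A write through the original array is visible through the view,
# which shows that both names refer to the same memory.
columns["sepal_length"][1] = 9.9
shares_memory = tail[0] == 9.9

# A column-wise computation only touches the one buffer it needs.
mean_width = sum(columns["sepal_width"]) / len(columns["sepal_width"])
```

Arrow applies the same ideas at a much larger scale, with a standardized layout that different languages can read directly from the same bytes.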

 

Supported Languages

Arrow's libraries implement the format and provide building blocks for a range of use cases, including high performance analytics. Many popular projects use Arrow to ship columnar data efficiently or as the basis for analytic engines.

 

Libraries are available for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

 

Ecosystem

Apache Arrow is software created by and for the developer community. We are dedicated to open, kind communication and consensus decision-making. Our committers come from a range of organizations and backgrounds, and we welcome all to participate with us.

 

R

install.packages("arrow")

library(arrow)

# write iris to iris.arrow, compressed with zstd (level 1)

arrow::write_ipc_file(iris, "iris.arrow", compression = "zstd", compression_level = 1)

# read iris.arrow back as a data frame

iris <- arrow::read_ipc_file("iris.arrow")

 

Python

# conda install -y pandas pyarrow

import pandas as pd

# read iris.arrow (written by the R example above) into a DataFrame

iris = pd.read_feather('iris.arrow')

# write iris back to iris.arrow, compressed with zstd (level 1)

iris.to_feather('iris.arrow', compression='zstd', compression_level=1)

 

Julia

using Pkg

Pkg.add(["Arrow","DataFrames"])

 

using Arrow, DataFrames

# read iris.arrow as DataFrame

iris = Arrow.Table("iris.arrow") |> DataFrame

# write iris to iris.arrow, compressed with zstd; ntasks=8 allows up to 8 buffered write tasks (start Julia with multiple threads to benefit)

Arrow.write("iris.arrow", iris; compress=:zstd, ntasks=8)
