CSVWriter in parquet-go

Code:

package main

import (
	"github.com/xitongsys/parquet-go-source/local"
	"github.com/xitongsys/parquet-go/writer"
	"log"
)

func main() {
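	md := []string{
		"name=Name, type=BYTE_ARRAY, convertedtype=UTF8, encoding=PLAIN",
		// This entry triggers the error: LIST is not accepted as a type here.
		"name=address, type=LIST, valuetype=BYTE_ARRAY, valueconvertedtype=UTF8",
	}

	// Open the output file, then create the CSV writer (the last argument, 4, is the np parallelism setting).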
	fw, err := local.NewLocalFileWriter("csv.parquet")
	if err != nil {
		log.Println("Can't open file", err)
		return
	}
	pw, err := writer.NewCSVWriter(md, fw, 4)
	if err != nil {
		log.Println("Can't create csv writer", err)
		return
	}

	num := 10
	for i := 0; i < num; i++ {
		data2 := []interface{}{
			"Student Name",
			[]string{"string1", "string2", "string3"},
		}
		if err = pw.Write(data2); err != nil {
			log.Println("Write error", err)
		}
	}
	if err = pw.WriteStop(); err != nil {
		log.Println("WriteStop error", err)
	}
	log.Println("Write Finished")
	if err = fw.Close(); err != nil {
		log.Println("Close error", err)
	}
}

Running this code fails at this line:

pw, err := writer.NewCSVWriter(md, fw, 4)

with the following error:

failed to create schema from tag map: type LIST: not a valid Type string

Digging into it, the cause is that CSVWriter does not support the LIST type.

The error is raised here, where the `type` value of each metadata string is converted:

if t, err := parquet.TypeFromString(info.Type); err == nil {
	schema.Type = &t
} else {
	return nil, fmt.Errorf("type " + info.Type + ": " + err.Error())
}
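Here `info.Type` is the value of the `type=` key in one metadata string, which is a comma-separated list of `key=value` pairs. The stdlib-only sketch below uses a hypothetical `parseTag` helper (a simplification for illustration, not parquet-go's own parser) to show that the second entry of `md` yields the type string "LIST", which is what ends up being passed to `parquet.TypeFromString`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTag is a hypothetical, simplified parser for metadata strings such as
// "name=address, type=LIST, valuetype=BYTE_ARRAY". It is NOT parquet-go's
// implementation; it only illustrates how the type string is extracted.
func parseTag(tag string) map[string]string {
	kv := make(map[string]string)
	for _, part := range strings.Split(tag, ",") {
		pair := strings.SplitN(strings.TrimSpace(part), "=", 2)
		if len(pair) != 2 {
			continue // skip malformed fragments in this sketch
		}
		// Keys are lowercased and trimmed; values are only trimmed.
		key := strings.ToLower(strings.TrimSpace(pair[0]))
		kv[key] = strings.TrimSpace(pair[1])
	}
	return kv
}

func main() {
	md := []string{
		"name=Name, type=BYTE_ARRAY, convertedtype=UTF8, encoding=PLAIN",
		"name=address, type=LIST, valuetype=BYTE_ARRAY, valueconvertedtype=UTF8",
	}
	for _, tag := range md {
		info := parseTag(tag)
		// For the "address" column this prints type "LIST".
		fmt.Printf("%s -> type %q\n", info["name"], info["type"])
	}
}
```

Running it prints `Name -> type "BYTE_ARRAY"` followed by `address -> type "LIST"`.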

Stepping into parquet.TypeFromString():

func TypeFromString(s string) (Type, error) {
	switch s {
	case "BOOLEAN":
		return Type_BOOLEAN, nil
	case "INT32":
		return Type_INT32, nil
	case "INT64":
		return Type_INT64, nil
	case "INT96":
		return Type_INT96, nil
	case "FLOAT":
		return Type_FLOAT, nil
	case "DOUBLE":
		return Type_DOUBLE, nil
	case "BYTE_ARRAY":
		return Type_BYTE_ARRAY, nil
	case "FIXED_LEN_BYTE_ARRAY":
		return Type_FIXED_LEN_BYTE_ARRAY, nil
	}
	return Type(0), fmt.Errorf("not a valid Type string")
}

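As you can see, the switch handles neither LIST nor MAP. Only the eight types listed above are accepted; any other value, including LIST and MAP, returns the "not a valid Type string" error.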
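The eight accepted values are Parquet's physical types. In the Parquet format, LIST and MAP are not physical types but converted (logical) type annotations applied to group nodes, which is why a physical-type lookup cannot accept them. The stdlib-only sketch below mirrors the switch as a set lookup; `physicalTypes` and `isPhysicalType` are hypothetical names, not part of parquet-go, and could serve as a pre-flight check on `type` values before calling `writer.NewCSVWriter`.

```go
package main

import "fmt"

// physicalTypes mirrors the cases accepted by parquet.TypeFromString above.
// LIST and MAP are deliberately absent: in Parquet they are converted/logical
// type annotations on group nodes, not physical types.
var physicalTypes = map[string]bool{
	"BOOLEAN":              true,
	"INT32":                true,
	"INT64":                true,
	"INT96":                true,
	"FLOAT":                true,
	"DOUBLE":               true,
	"BYTE_ARRAY":           true,
	"FIXED_LEN_BYTE_ARRAY": true,
}

// isPhysicalType is a hypothetical pre-flight check: it reports whether the
// switch in TypeFromString would accept s. Like the switch, it is case-sensitive.
func isPhysicalType(s string) bool {
	return physicalTypes[s]
}

func main() {
	for _, t := range []string{"BYTE_ARRAY", "LIST", "MAP"} {
		fmt.Printf("%-10s accepted: %v\n", t, isPhysicalType(t))
	}
}
```

With this check, BYTE_ARRAY is reported as accepted while LIST and MAP are rejected, matching the behavior of the switch.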
