CSV Parsing

I had always assumed that CSV simply separates values with commas (,), so calling row.split(',') would be enough to break a row into its fields.

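It turns out that was too simple. CSV has one more rule: if a value contains a double quote (") or a comma (,), the whole value must be enclosed in double quotes, and every double quote inside it must be replaced with two double quotes.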

For example, the value hello,world is written in CSV as "hello,world"

The value say "hello" is written in CSV as "say ""hello"""
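The writing side of this rule can be sketched in a few lines. This is a minimal illustration covering only the two cases mentioned above (commas and double quotes); the function name escape_field is made up for the example.

```python
def escape_field(value: str) -> str:
    """Quote one CSV field following the rule described above (hypothetical helper)."""
    if '"' in value or ',' in value:
        # Double every embedded quote, then wrap the whole value in quotes.
        return '"' + value.replace('"', '""') + '"'
    return value


print(escape_field('hello,world'))   # "hello,world"
print(escape_field('say "hello"'))   # "say ""hello"""
```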

As a result, the split function cannot be used directly when parsing.
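A quick experiment shows the problem. The sample row below is made up for illustration; it holds three fields, but a plain split on commas cuts the first one in half and leaves the quotes in place.

```python
# Three logical fields: hello,world / say "hello" / 42
row = '"hello,world","say ""hello""",42'

naive_fields = row.split(',')
print(naive_fields)
# ['"hello', 'world"', '"say ""hello"""', '42']
```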

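Following these rules, the parser should follow the state diagram below:

(Some of the labels are upside down; please bear with it.)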

Implemented in Python, it looks like this:

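class CsvInterpreter:
    # Transition table: state -> {input class: (next state, append char to buffer?)}
    # "0" start of a field, "1" inside a quoted field, "2" inside an unquoted field,
    # "3" just saw a quote inside a quoted field, "4" end of line (terminal).
    # A newline inside quotes is rejected because input is parsed one line at a time.
    _lines = {"0": {"quot": ("1", False), "other": ("2", True),
                    "comma": ("0", False), "enter": ("4", False)},  # empty field
              "1": {"other": ("1", True), "quot": ("3", False), "comma": ("1", True)},
              "2": {"other": ("2", True), "comma": ("0", False), "enter": ("4", False)},
              "3": {"quot": ("1", True), "comma": ("0", False), "enter": ("4", False)},
              # State "5" is not reachable from any other state.
              "5": {"comma": ("0", False), "enter": ("4", False)}}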

    def _init(self):
        self._chars = ""
        self._current_status = "0"
        self._buffer = ""
        self._container = []

    def __init__(self):
        self._buffer = None
        self._chars = None
        self._container = None
        self._current_status = None
        self._init()

    def _next(self, char: str) -> None:
        if char == '"':
            cond = "quot"
        elif char == ',':
            cond = "comma"
        elif char == '\n':
            cond = "enter"
        else:
            cond = "other"
        if cond not in self._lines[self._current_status]:
            raise ValueError("Malformed CSV", self._chars, self._container, self._buffer, self._current_status, cond, char)
        else:
            next_status, if_input = self._lines[self._current_status][cond]
            self._current_status = next_status
            if if_input:
                self._buffer += char

    def split(self, line: str) -> list:
        self._init()
        self._chars = line
        for char in line:
            self._next(char)
            if self._current_status in ["0", "4"]:
                self._container.append(self._buffer)
                self._buffer = ""
            if self._current_status == "4":
                return self._container
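        # The line ended without '\n'. Accept it if the last field is empty ("0"),
        # unquoted ("2"), or a closed quoted field ("3"); "1" is an unterminated quote.
        if self._current_status not in ["0", "2", "3"]:
            raise ValueError("Malformed CSV")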
        self._container.append(self._buffer)
        return self._container

Usage:

csv_interpreter = CsvInterpreter()
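# '*.csv' is a placeholder: open() does not expand wildcards, so substitute a real file path.
with open('*.csv', 'r', encoding='utf-8') as f:
    for line in f:  # iterate lazily instead of loading every line with readlines()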
        row = csv_interpreter.split(line)
        print(row)
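As a cross-check, Python's standard csv module applies the same quoting rules, so its output can serve as a reference for a hand-written parser. The sample line below is invented for illustration, and io.StringIO stands in for a real file.

```python
import csv
import io

# One line containing a quoted comma, escaped quotes, and a plain field.
sample = '"hello,world","say ""hello""",42\n'

rows = list(csv.reader(io.StringIO(sample)))
print(rows)  # [['hello,world', 'say "hello"', '42']]
```

When reading from a real file, the csv documentation recommends opening it with newline='' so that line breaks embedded in quoted fields are handled correctly.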
