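A Tour of Cool Python Libraries: The Inseparable Couple Libraries (05)

1. Ctrl+C: this shortcut means "copy". After you select a cell, a row, a column, or an entire worksheet in Excel, pressing Ctrl+C copies the selection to the system clipboard, ready to be pasted.
2. Ctrl+V: this shortcut means "paste". Once Ctrl+C has put content on the clipboard, pressing Ctrl+V pastes it at another location in Excel. The two operations usually happen back to back, so Ctrl+C and Ctrl+V are like a "couple" that always appears together.

Beyond this familiar "couple", Excel has many other shortcuts that help users work more efficiently, but they generally do not pair up the way Ctrl+C and Ctrl+V do.

Today, though, I will not dwell on "couple keys". Instead I want to introduce a pair of "couple libraries" in Python: the third-party libraries xlrd and xlwt.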

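I. Origins of xlrd

xlrd is a library for reading Excel files in Python. In its name, "xl" stands for Excel and "rd" for read (this reading of the name is only a guess). Its developer is John Machin.

Development of xlrd began in 2005 as an open-source Python project (download: xlrd official site).

Excel files matter a great deal in data processing and analysis, and xlrd filled a gap in Python's Excel support: it lets users read the contents of an Excel file from Python and carry on with further processing and analysis.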

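II. Pros and cons of xlrd

1. Pros

1-1. Support for multiple Excel file formats

xlrd reads `.xls` files and, in older versions, `.xlsx` files as well, so with those versions users can read data stored in either format.

1-2. Efficiency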

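xlrd is written in pure Python, not C. It still reads small and medium-sized `.xls` files quickly, and for large workbooks the `on_demand` and `ragged_rows` options of `open_workbook` can cut loading work and memory use.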

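1-3. Open source

xlrd is fully open source, and its source code is available on platforms such as GitHub, so anyone can modify and extend it to suit their needs.

1-4. Simple to use

xlrd offers a small, direct API for getting cell values, row and column counts, and so on, which makes reading data from Excel files straightforward.

1-5. Good compatibility

xlrd supports a range of Python versions, namely Python 2.7 and Python 3.4 or later (3.0-3.3 are not supported), which gives users a wide choice. The exact range depends on the xlrd release.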

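2. Cons

2-1. Limited support for .xlsx

Releases after xlrd 1.2.0 (from around 2020) no longer support `.xlsx`, which limits xlrd's usefulness for files from newer Excel versions, where `.xlsx` is the main format.

2-2. Narrow feature set

xlrd focuses on reading data from Excel files and offers no way to write or modify them. For tasks that need writing or modification, users must combine it with other libraries such as `openpyxl` or `xlwt`.

2-3. Infrequent updates and maintenance

Because xlrd concentrates on reading, and its scope has shrunk as `.xlsx` became widespread, it is likely to be updated and maintained relatively infrequently.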

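2-4. Reliance on other libraries

xlrd itself has no required third-party dependencies, but because it only reads files, a complete workflow (reading `.xlsx` with current releases, or writing files) usually needs additional libraries, which adds some complexity and uncertainty for users.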

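In short, xlrd is open source, simple to use, and efficient at reading Excel files, but it has drawbacks: limited `.xlsx` support, a narrow feature set, and infrequent updates and maintenance. Users should weigh these against their own needs when choosing it.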

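III. xlrd versions

The Python versions that xlrd supports depend on the xlrd release. The main releases are described below:

1. xlrd 1.2.0

1-1. Supports Python 2.7, or Python 3.4 and later (not 3.0-3.3).
1-2. This release supports the xlsx format and is widely used, because it handles small to medium-sized Excel files with good performance.

2. xlrd 2.0.1

2-1. Supports Python 2.7, or Python 3.6 and later (not 3.0-3.5).
2-2. This release no longer supports xlsx and reads only the older xls format, because xlsx support was removed starting with xlrd 2.0.

3. xlrd3 (unofficial name)

xlrd3 is an open-source extension of xlrd that adds support for the xlsx format. Note that xlrd3 is not an official xlrd name (download: GitHub - Dragon2fly/xlrd3).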
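
Since xlrd 2.0 and later read only xls, code that accepts both formats has to decide which library to use before opening a file. The snippet below is a minimal sketch of that decision based on the file extension. The `READER_BY_SUFFIX` table and the `pick_reader` helper are hypothetical names introduced for illustration, and `openpyxl` is just one possible choice for xlsx.

```python
from pathlib import Path

# Hypothetical mapping for illustration: which library to use per extension.
READER_BY_SUFFIX = {
    ".xls": "xlrd",       # legacy binary format, still read by xlrd 2.x
    ".xlsx": "openpyxl",  # xlrd 2.x no longer reads this format
}


def pick_reader(path):
    """Return the name of the library to use for the given spreadsheet path."""
    suffix = Path(path).suffix.lower()
    try:
        return READER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported spreadsheet extension: {suffix!r}") from None


print(pick_reader("report.xls"))   # xlrd
print(pick_reader("Report.XLSX"))  # openpyxl
```

The extension is only a hint. A renamed file will fool this check, so inspecting the file content is more reliable when the source of the file is not trusted.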

IV. How to learn xlrd well

1. List xlrd's attributes and methods

Calling print(dir(xlrd)) lists all attributes and methods of the xlrd package:

# ['Book', 'FILE_FORMAT_DESCRIPTIONS', 'FMLA_TYPE_ARRAY', 'FMLA_TYPE_CELL', 'FMLA_TYPE_COND_FMT', 'FMLA_TYPE_DATA_VAL',
# 'FMLA_TYPE_NAME', 'FMLA_TYPE_SHARED', 'Operand', 'PEEK_SIZE', 'Ref3D', 'XLDateError', 'XLRDError', 'XLS_SIGNATURE',
# 'XL_CELL_BLANK', 'XL_CELL_BOOLEAN', 'XL_CELL_DATE', 'XL_CELL_EMPTY', 'XL_CELL_ERROR', 'XL_CELL_NUMBER', 'XL_CELL_TEXT', 'ZIP_SIGNATURE', 
# '__VERSION__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', 
# '__spec__', '__version__', 
# 'biff_text_from_num', 'biffh', 'book', 'cellname', 'cellnameabs', 'colname', 'compdoc', 'count_records', 'decompile_formula', 
# 'dump', 'dump_formula', 'empty_cell', 'error_text_from_code', 'evaluate_name_formula', 'formatting', 'formula', 'info', 
# 'inspect_format', 'oBOOL', 'oERR', 'oNUM', 'oREF', 'oREL', 'oSTRG', 'oUNK', 'okind_dict', 'open_workbook', 'open_workbook_xls', 
# 'os', 'pprint', 'rangename3d', 'rangename3drel', 'sheet', 'sys', 'timemachine', 'xldate', 'xldate_as_datetime', 'xldate_as_tuple', 'zipfile']
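
The raw `dir()` listing above mixes public names with dunder attributes. As a small sketch, the helper below filters out names that start with an underscore. `public_names` is a hypothetical helper, and the demonstration uses the standard-library `json` module as a stand-in so that the snippet runs even where xlrd is not installed; passing `"xlrd"` instead would give the filtered version of the listing above.

```python
import importlib


def public_names(module_name):
    """Return the names in dir(module) that do not start with an underscore."""
    module = importlib.import_module(module_name)
    return [name for name in dir(module) if not name.startswith("_")]


# Stand-in demonstration with a standard-library module.
print(public_names("json"))
```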

2. Get xlrd's help text

Calling help(xlrd) prints the package's help text:

Help on package xlrd:

NAME
    xlrd

DESCRIPTION
    # Copyright (c) 2005-2012 Stephen John Machin, Lingfo Pty Ltd
    # This module is part of the xlrd package, which is released under a
    # BSD-style licence.

PACKAGE CONTENTS
    biffh
    book
    compdoc
    formatting
    formula
    info
    sheet
    timemachine
    xldate

FUNCTIONS
    count_records(filename, outfile=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
        For debugging and analysis: summarise the file's BIFF records.
        ie: produce a sorted file of ``(record_name, count)``.
        
        :param filename: The path to the file to be summarised.
        :param outfile: An open file, to which the summary is written.
    
    dump(filename, outfile=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, unnumbered=False)
        For debugging: dump an XLS file's BIFF records in char & hex.
        
        :param filename: The path to the file to be dumped.
        :param outfile: An open file, to which the dump is written.
        :param unnumbered: If true, omit offsets (for meaningful diffs).
    
    inspect_format(path=None, content=None)
        Inspect the content at the supplied path or the :class:`bytes` content provided
        and return the file's type as a :class:`str`, or ``None`` if it cannot
        be determined.
        
        :param path:
          A :class:`string <str>` path containing the content to inspect.
          ``~`` will be expanded.
        
        :param content:
          The :class:`bytes` content to inspect.
        
        :returns:
           A :class:`str`, or ``None`` if the format cannot be determined.
           The return value can always be looked up in :data:`FILE_FORMAT_DESCRIPTIONS`
           to return a human-readable description of the format found.
    
    open_workbook(filename=None, logfile=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, verbosity=0, use_mmap=True, file_contents=None, encoding_override=None, formatting_info=False, on_demand=False, ragged_rows=False, ignore_workbook_corruption=False)
        Open a spreadsheet file for data extraction.
        
        :param filename: The path to the spreadsheet file to be opened.
        
        :param logfile: An open file to which messages and diagnostics are written.
        
        :param verbosity: Increases the volume of trace material written to the
                          logfile.
        
        :param use_mmap:
        
          Whether to use the mmap module is determined heuristically.
          Use this arg to override the result.
        
          Current heuristic: mmap is used if it exists.
        
        :param file_contents:
        
          A string or an :class:`mmap.mmap` object or some other behave-alike
          object. If ``file_contents`` is supplied, ``filename`` will not be used,
          except (possibly) in messages.
        
        :param encoding_override:
        
          Used to overcome missing or bad codepage information
          in older-version files. See :doc:`unicode`.
        
        :param formatting_info:
        
          The default is ``False``, which saves memory.
          In this case, "Blank" cells, which are those with their own formatting
          information but no data, are treated as empty by ignoring the file's
          ``BLANK`` and ``MULBLANK`` records.
          This cuts off any bottom or right "margin" of rows of empty or blank
          cells.
          Only :meth:`~xlrd.sheet.Sheet.cell_value` and
          :meth:`~xlrd.sheet.Sheet.cell_type` are available.
        
          When ``True``, formatting information will be read from the spreadsheet
          file. This provides all cells, including empty and blank cells.
          Formatting information is available for each cell.
        
          Note that this will raise a NotImplementedError when used with an
          xlsx file.
        
        :param on_demand:
        
          Governs whether sheets are all loaded initially or when demanded
          by the caller. See :doc:`on_demand`.
        
        :param ragged_rows:
        
          The default of ``False`` means all rows are padded out with empty cells so
          that all rows have the same size as found in
          :attr:`~xlrd.sheet.Sheet.ncols`.
        
          ``True`` means that there are no empty cells at the ends of rows.
          This can result in substantial memory savings if rows are of widely
          varying sizes. See also the :meth:`~xlrd.sheet.Sheet.row_len` method.
        
        
        :param ignore_workbook_corruption:
        
          This option allows to read corrupted workbooks.
          When ``False`` you may face CompDocError: Workbook corruption.
          When ``True`` that exception will be ignored.
        
        :returns: An instance of the :class:`~xlrd.book.Book` class.

DATA
    FILE_FORMAT_DESCRIPTIONS = {'xls': 'Excel xls', 'xlsb': 'Excel 2007 xl...
    FMLA_TYPE_ARRAY = 4
    FMLA_TYPE_CELL = 1
    FMLA_TYPE_COND_FMT = 8
    FMLA_TYPE_DATA_VAL = 16
    FMLA_TYPE_NAME = 32
    FMLA_TYPE_SHARED = 2
    PEEK_SIZE = 8
    XLS_SIGNATURE = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
    XL_CELL_BLANK = 6
    XL_CELL_BOOLEAN = 4
    XL_CELL_DATE = 3
    XL_CELL_EMPTY = 0
    XL_CELL_ERROR = 5
    XL_CELL_NUMBER = 2
    XL_CELL_TEXT = 1
    ZIP_SIGNATURE = b'PK\x03\x04'
    __VERSION__ = '2.0.1'
    biff_text_from_num = {0: '(not BIFF)', 20: '2.0', 21: '2.1', 30: '3', ...
    empty_cell = empty:''
    error_text_from_code = {0: '#NULL!', 7: '#DIV/0!', 15: '#VALUE!', 23: ...
    oBOOL = 3
    oERR = 4
    oNUM = 2
    oREF = -1
    oREL = -2
    oSTRG = 1
    oUNK = 0
    okind_dict = {-2: 'oREL', -1: 'oREF', 0: 'oUNK', 1: 'oSTRG', 2: 'oNUM'...

VERSION
    2.0.1

FILE
    e:\python_workspace\pythonproject\lib\site-packages\xlrd\__init__.py
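
The DATA section above lists `XLS_SIGNATURE`, `ZIP_SIGNATURE` and `PEEK_SIZE`, which hint at how a file's format can be recognised from its first bytes. The snippet below is a simplified sketch of that idea, not xlrd's actual `inspect_format` implementation: it only tells an xls-style file from a zip-based one, whereas the real function returns more specific results. `sniff_container` is a hypothetical name, and the constants are copied from the help output.

```python
# Constants copied from the DATA section of the help output.
XLS_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
ZIP_SIGNATURE = b"PK\x03\x04"
PEEK_SIZE = 8


def sniff_container(content):
    """Guess the container type from the leading bytes of a file.

    Returns "xls" for the legacy signature, "zip" for zip-based
    files (xlsx is one of them), or None when neither matches.
    """
    peek = content[:PEEK_SIZE]
    if peek.startswith(XLS_SIGNATURE):
        return "xls"
    if peek.startswith(ZIP_SIGNATURE):
        return "zip"
    return None


print(sniff_container(XLS_SIGNATURE + b"rest of file"))  # xls
print(sniff_container(ZIP_SIGNATURE + b"\x00" * 10))     # zip
print(sniff_container(b"a,b,c\n1,2,3\n"))                # None
```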

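3. Usage in detail

3-13. The xlrd.book.Book.sheet_by_name method

3-13-1. Syntax

sheet_by_name(self, sheet_name)
    :param sheet_name: Name of the sheet required.
    :returns: A :class:`~xlrd.sheet.Sheet`.

3-13-2. Parameters

3-13-2-1. self (required): a reference to the instance itself; Python passes it automatically when the method is called on an instance.

3-13-2-2. sheet_name (required): a string giving the name of the worksheet to retrieve.

3-13-3. Purpose

Gets a worksheet object by worksheet name.

3-13-4. Return value

3-13-4-1. If a worksheet with the given name exists, that worksheet object is returned (an instance of xlrd.sheet.Sheet).

3-13-4-2. If no worksheet has the given name, an exception is raised (xlrd.biffh.XLRDError).

3-13-5. Notes

None

3-13-6. Usage

# 13. The xlrd.book.Book.sheet_by_name method
import xlrd
# Open the Excel file
workbook = xlrd.open_workbook('example.xls')
# Get a worksheet by name
sheet = workbook.sheet_by_name('Sheet1')
# The sheet object can now be used to read the data in that worksheet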
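
Because a missing name raises an exception, it can be convenient to check the list of names first. The snippet below is a minimal sketch of that pattern. `get_sheet_or_none` and `demo` are hypothetical helpers, and `demo` is not called automatically because it assumes xlrd is installed and that an `example.xls` file exists.

```python
def get_sheet_or_none(book, sheet_name):
    """Return the named sheet, or None if the workbook has no such sheet."""
    # sheet_names() lists every worksheet name in the workbook.
    if sheet_name in book.sheet_names():
        return book.sheet_by_name(sheet_name)
    return None


def demo(path="example.xls"):
    """Not called automatically: needs xlrd and an existing .xls file."""
    import xlrd

    book = xlrd.open_workbook(path)
    sheet = get_sheet_or_none(book, "Sheet1")
    print("Sheet1 not found" if sheet is None else sheet.name)
```

Catching the exception around `sheet_by_name` works just as well. Checking the names first simply keeps the lookup free of xlrd-specific exception imports.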

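3-14. The xlrd.book.Book.sheets method

3-14-1. Syntax

sheets(self)
    :returns: A list of all sheets in the book.
    All sheets not already loaded will be loaded.

3-14-2. Parameters

3-14-2-1. self (required): a reference to the instance itself; Python passes it automatically when the method is called on an instance.

3-14-3. Purpose

Gets all worksheet objects in an Excel workbook.

3-14-4. Return value

Returns a Python list containing every worksheet object in the workbook. Each one is an instance of xlrd.sheet.Sheet and represents one worksheet in the Excel file.

3-14-5. Notes

None

3-14-6. Usage

# 14. The xlrd.book.Book.sheets method
import xlrd
# Open the Excel file
workbook = xlrd.open_workbook('example.xls')
# Get the list of all worksheet objects
sheets = workbook.sheets()
# Loop over the worksheets
for sheet in sheets:
    print(sheet.name)  # name of the worksheet
    print(sheet.nrows)  # number of rows in the worksheet
    print(sheet.ncols)  # number of columns in the worksheet
    # ... other operations ...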
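
The same loop can be wrapped in a function that returns the figures instead of printing them, which makes them easier to reuse. This is a small sketch: `summarize_sheets` and `demo` are hypothetical helpers, and `demo` is not called automatically because it assumes xlrd is installed and that an `example.xls` file exists.

```python
def summarize_sheets(book):
    """Return one (name, nrows, ncols) tuple per worksheet in the workbook."""
    return [(sheet.name, sheet.nrows, sheet.ncols) for sheet in book.sheets()]


def demo(path="example.xls"):
    """Not called automatically: needs xlrd and an existing .xls file."""
    import xlrd

    for name, nrows, ncols in summarize_sheets(xlrd.open_workbook(path)):
        print(f"{name}: {nrows} rows x {ncols} columns")
```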

3-15. The xlrd.book.Book.sheet_names method

3-15-1. Syntax

sheet_names(self)
    :returns:
      A list of the names of all the worksheets in the workbook file.
      This information is available even when no sheets have yet been
      loaded.

3-15-2. Parameters

3-15-2-1. self (required): a reference to the instance itself; Python passes it automatically when the method is called on an instance.

3-15-3. Purpose

Gets the names of all worksheets in an Excel workbook.

3-15-4. Return value

Returns a Python list of the names of all worksheets in the workbook. Each name is a string.

3-15-5. Notes

None

3-15-6. Usage

# 15. The xlrd.book.Book.sheet_names method
import xlrd
# Open the Excel file
workbook = xlrd.open_workbook('example.xls')
# Get the list of all worksheet names
sheet_names = workbook.sheet_names()
# Loop over the names and print them
for name in sheet_names:
    print(name)
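
Since the names are available before any sheet is loaded, they can be used to pick out only the sheets of interest. The sketch below filters names by a keyword and then fetches the matching sheets. `sheets_matching` and `demo` are hypothetical helpers; `demo` is not called automatically because it assumes xlrd is installed and that an `example.xls` file exists. It passes the `on_demand=True` option shown in the `open_workbook` help text.

```python
def sheets_matching(book, keyword):
    """Return the sheets whose name contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    names = [name for name in book.sheet_names() if keyword in name.lower()]
    return [book.sheet_by_name(name) for name in names]


def demo(path="example.xls"):
    """Not called automatically: needs xlrd and an existing .xls file."""
    import xlrd

    # on_demand=True defers loading each sheet until it is requested.
    book = xlrd.open_workbook(path, on_demand=True)
    for sheet in sheets_matching(book, "sheet"):
        print(sheet.name, sheet.nrows)
```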

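3-16. The xlrd.book.Book.sheet_by_index method

3-16-1. Syntax

sheet_by_index(self, sheetx)
    :param sheetx: Sheet index in ``range(nsheets)``
    :returns: A :class:`~xlrd.sheet.Sheet`.

3-16-2. Parameters

3-16-2-1. self (required): a reference to the instance itself; Python passes it automatically when the method is called on an instance.

3-16-2-2. sheetx (required): a non-negative integer giving the worksheet's index, counted from 0.

3-16-3. Purpose

Gets a worksheet object from an Excel workbook by index.

3-16-4. Return value

Returns an instance of xlrd.sheet.Sheet, which represents one worksheet in the Excel file.

3-16-5. Notes

The index must be a non-negative integer in range(nsheets), that is, 0 <= sheetx < nsheets, where nsheets is the number of worksheets in the workbook. If the index is beyond that range, xlrd raises an IndexError.

3-16-6. Usage

# 16. The xlrd.book.Book.sheet_by_index method
import xlrd
# Open the Excel file
workbook = xlrd.open_workbook('example.xls')
# Get a worksheet object by index
sheet = workbook.sheet_by_index(0)  # first worksheet; indexes start at 0
# The sheet object can now be used to read the data in that worksheet
print(sheet.name)  # name of the worksheet
print(sheet.nrows)  # number of rows in the worksheet
print(sheet.ncols)  # number of columns in the worksheet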
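
An explicit range check before the call gives a clearer error message than the bare IndexError. The sketch below relies on the `nsheets` attribute named in the docstring's `range(nsheets)`. `sheet_at` and `demo` are hypothetical helpers, and `demo` is not called automatically because it assumes xlrd is installed and that an `example.xls` file exists.

```python
def sheet_at(book, index):
    """Return the sheet at index, checking it against book.nsheets first."""
    if not 0 <= index < book.nsheets:
        raise IndexError(
            f"sheet index {index} out of range for {book.nsheets} sheet(s)"
        )
    return book.sheet_by_index(index)


def demo(path="example.xls"):
    """Not called automatically: needs xlrd and an existing .xls file."""
    import xlrd

    book = xlrd.open_workbook(path)
    last = sheet_at(book, book.nsheets - 1)  # last worksheet in the workbook
    print(last.name)
```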

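3-17. The xlrd.book.Book.dump method

3-17-1. Syntax

dump(self, f=None, header=None, footer=None, indent=0)
    :param f: open file object, to which the dump is written
    :param header: text to write before the dump
    :param footer: text to write after the dump
    :param indent: number of leading spaces (for recursive calls)

3-17-2. Parameters

3-17-2-1. self (required): a reference to the instance itself; Python passes it automatically when the method is called on an instance.

3-17-2-2. f (optional): an open file object to which the dump is written.

3-17-2-3. header (optional): a string written before the dump.

3-17-2-4. footer (optional): a string written after the dump.

3-17-2-5. indent (optional): an integer giving the number of leading spaces written before each line of the dump; it is used for recursive calls on nested objects. The default is 0, meaning no indentation.

3-17-3. Purpose

Writes a plain-text dump of the object's attributes, mainly useful for debugging, to the given file or output stream.

3-17-4. Return value

Returns nothing (that is, None). Its purpose is to write data somewhere, not to produce a result.

3-17-5. Notes

None

3-17-6. Usage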

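# 17. The xlrd.book.Book.dump method
# Note: MyClass is a stand-in that only mimics the dump(f, header, footer, indent)
# signature. It is not xlrd's Book, and it formats its data as JSON for illustration.
import sys
import json
class MyClass:
    def __init__(self, data):
        self.data = data
    def dump(self, f=None, header="Data Dump", footer="End of Data Dump", indent=2):
        if f is None:
            f = sys.stdout  # fall back to standard output
        f.write(header + "\n")
        # Convert self.data to a string via data_to_string;
        # the trailing newline puts the footer on its own line
        f.write(self.data_to_string(indent=indent) + "\n")
        f.write(footer + "\n")
    def data_to_string(self, indent):
        # Use json.dumps to turn self.data into a formatted string
        return json.dumps(self.data, indent=indent, ensure_ascii=False)
if __name__ == '__main__':
    obj = MyClass({"key": "value"})
    obj.dump(indent=4)  # write to the console with a 4-space indent
# Output:
# Data Dump
# {
#     "key": "value"
# }
# End of Data Dump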
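
To call `dump` on a real workbook, any open file-like object can be passed as `f`. The sketch below captures the dump in a string through `io.StringIO`, using only the parameters shown in the signature above. `dump_to_string` and `demo` are hypothetical helpers, and `demo` is not called automatically because it assumes xlrd is installed and that an `example.xls` file exists.

```python
import io


def dump_to_string(obj, header=None, footer=None):
    """Call obj.dump() with an in-memory buffer and return the text written."""
    buffer = io.StringIO()
    obj.dump(f=buffer, header=header, footer=footer)
    return buffer.getvalue()


def demo(path="example.xls"):
    """Not called automatically: needs xlrd and an existing .xls file."""
    import xlrd

    book = xlrd.open_workbook(path)
    print(dump_to_string(book, header="Book dump", footer="End of dump"))
```

Capturing the text this way makes it easy to search the dump or save it next to a bug report.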