Python Cool Libraries Tour: The Third-Party Library Pandas (029)

# 74. The pandas.api.interchange.from_dataframe function
pandas.api.interchange.from_dataframe(df, allow_copy=True)
Build a pd.DataFrame from any DataFrame supporting the interchange protocol.

Parameters:
df
DataFrameXchg
Object supporting the interchange protocol, i.e. __dataframe__ method.

allow_copy
bool, default: True
Whether to allow copying the memory to perform the conversion (if false then zero-copy approach is requested).

Returns:
pd.DataFrame

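74-2. Parameters

74-2-1. df (required): A DataFrame-like object holding the data to convert to a pandas DataFrame. It can be any object that implements the DataFrame interchange protocol, i.e. exposes a __dataframe__ method, such as a DataFrame from another library (e.g. Dask or Vaex).

74-2-2. allow_copy (optional, default True): Whether the conversion may copy data. If True, the function copies memory where needed to complete the conversion. If False, a zero-copy conversion is requested, which can improve performance and reduce memory use, but the conversion may fail when it cannot be done without copying.

74-3. Purpose

A pandas API function that imports a DataFrame from another library through the interchange protocol and converts it to a pandas DataFrame.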
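As a rough illustration of the conversion described above, the sketch below follows the pattern shown in the pandas documentation. A pandas DataFrame stands in for a DataFrame from another library (the name df_other is an assumption made for this example), its interchange object is obtained through __dataframe__, and from_dataframe turns a column subset back into a pandas DataFrame.

```python
import pandas as pd
from pandas.api.interchange import from_dataframe

# Stand-in for a DataFrame produced by another library (hypothetical name).
df_other = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Any object exposing __dataframe__ supports the interchange protocol.
interchange_object = df_other.__dataframe__()
print(list(interchange_object.column_names()))

# Select one column on the interchange object, then convert it to pandas.
df_pandas = from_dataframe(interchange_object.select_columns_by_name(['A']))
print(type(df_pandas))
print(df_pandas)
```

The result is an ordinary pandas DataFrame, so all the usual pandas methods are available on it.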

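74-4. Return value

Returns a pandas DataFrame (pd.DataFrame) built from the input object. It is the input, not the result, that supports the interchange protocol, a common standard for exchanging data between DataFrame libraries.

74-5. Notes

None

74-6. Usage

74-6-1. Data preparation

None

74-6-2. Code example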

# 74. The pandas.api.interchange.from_dataframe function
import pandas as pd
from pandas.api.interchange import from_dataframe
# Create a sample DataFrame
data = {
    'Name': ['Myelsa', 'Bryce', 'Jimmy'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Convert with from_dataframe; the result is a regular pandas DataFrame
result_df = from_dataframe(df, allow_copy=True)
# Print the type and contents of the converted DataFrame
print(type(result_df))
print(result_df)

74-6-3. Output

# 74. The pandas.api.interchange.from_dataframe function
# <class 'pandas.core.frame.DataFrame'>
#      Name  Age         City
# 0  Myelsa   25     New York
# 1   Bryce   30  Los Angeles
# 2   Jimmy   35      Chicago

75. The pandas.Series class

75-1. Syntax

# 75. The pandas.Series class
pandas.Series(data=None, index=None, dtype: 'Dtype | None' = None, name=None, copy: 'bool | None' = None, fastpath: 'bool | lib.NoDefault' = <no_default>) -> 'None'

One-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object
supports both integer- and label-based indexing and provides a host of
methods for performing operations involving the index. Statistical
methods from ndarray have been overridden to automatically exclude
missing data (currently represented as NaN).

Operations between Series (+, -, /, *, **) align values based on their
associated index values-- they need not be the same length. The result
index will be the sorted union of the two indexes.

Parameters
----------
data : array-like, Iterable, dict, or scalar value
   Contains data stored in Series. If data is a dict, argument order is
   maintained.
index : array-like or Index (1d)
   Values must be hashable and have the same length as `data`.
   Non-unique index values are allowed. Will default to
   RangeIndex (0, 1, 2, ..., n) if not provided. If data is dict-like
   and index is None, then the keys in the data are used as the index. If the
   index is not None, the resulting Series is reindexed with the index values.
dtype : str, numpy.dtype, or ExtensionDtype, optional
   Data type for the output Series. If not specified, this will be
   inferred from `data`.
   See the user guide section on dtypes (basics.dtypes) for more usages.
name : Hashable, default None
   The name to give to the Series.
copy : bool, default False
   Copy input data. Only affects Series or 1d ndarray input. See examples.

Notes
-----
Please see the User Guide section on Series (basics.series) for more information.
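The index alignment mentioned in the docstring can be seen in a small sketch. The two Series below are made up for illustration; they have different lengths and only share the label 'b'.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'd'])

# Values are matched by label, not by position; labels present in only
# one operand produce NaN in the result.
total = s1 + s2
print(total)
```

Only 'b' exists in both operands, so it is the only label with a numeric result; the result index is the union of both indexes.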

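75-2. Parameters

75-2-1. data (optional, default None): The data for the Series. It can be a list, a NumPy array, a dict, or a scalar value (such as a single number). A scalar is repeated for every label in index.

75-2-2. index (optional, default None): The index labels of the Series. If omitted, a default integer index starting at 0 is generated. For list or array data it must have the same length as data; for dict data the Series is reindexed to these labels.

75-2-3. dtype (optional, default None): The data type of the Series. If omitted, pandas infers it from data.

75-2-4. name (optional, default None): The name of the Series, which makes it easier to reference, for example as a column label in a DataFrame.

75-2-5. copy (optional, default None): If True, the input data is copied so that later changes to the Series do not affect the original input. It only affects Series or 1-D ndarray input.

75-2-6. fastpath (optional): An internal parameter used for performance optimization; users normally do not need to set it.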
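Two of the parameters above are easier to grasp with a short sketch: how index interacts with dict data, and what copy=True guarantees for ndarray input. The variable names are chosen for this example only.

```python
import numpy as np
import pandas as pd

# Dict data with an explicit index: the Series is reindexed to the given
# labels, so 'a' is dropped and the unknown label 'x' becomes NaN.
reindexed = pd.Series({'a': 1, 'b': 2, 'c': 3}, index=['b', 'c', 'x'])
print(reindexed)

# copy=True gives the Series its own data, so writing to the Series
# leaves the source ndarray untouched.
source = np.array([1, 2, 3])
copied = pd.Series(source, copy=True, name='copied')
copied.iloc[0] = 999
print(source)
print(copied.name)
```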

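75-3. Purpose

pandas.Series is one of the most basic data structures in pandas. It resembles a one-dimensional array and can hold data of any type (integers, floats, strings, and so on). Its constructor builds a Series from many kinds of input.

75-4. Return value

The constructor returns a pandas Series object with the following characteristics:

75-4-1. One-dimensional: a Series is one-dimensional and can be viewed as a labeled array.

75-4-2. Index: each element has a corresponding label (its index) through which the data can be accessed.

75-4-3. Data type: a Series has a single dtype for all of its elements; values of mixed types are stored under the general object dtype.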
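The three characteristics above can be checked with a minimal sketch. The sample values are arbitrary and only serve to show label access and how the single dtype is chosen.

```python
import pandas as pd

ints = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
mixed_numbers = pd.Series([1, 2.5])
mixed_types = pd.Series([1, 'two', 3.0])

print(ints.ndim)            # a Series always has one dimension
print(ints['b'])            # access by label
print(ints.iloc[1])         # access by position
print(mixed_numbers.dtype)  # integers are upcast to float
print(mixed_types.dtype)    # falls back to object
```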

75-5. Notes

None

75-6. Usage

75-6-1. Data preparation

None

75-6-2. Code example

# 75. The pandas.Series class
# 75-1. Create a Series from a list
import pandas as pd
data = [1, 2, 3, 4, 5]
series1 = pd.Series(data)
print(series1, end='\n\n')

# 75-2. Create a Series from a dict
import pandas as pd
data = {'a': 1, 'b': 2, 'c': 3}
series2 = pd.Series(data)
print(series2, end='\n\n')

# 75-3. Specify the index and the data type
import pandas as pd
data = [1.5, 2.5, 3.5]
index = ['a', 'b', 'c']
series3 = pd.Series(data, index=index, dtype=float, name='Example Series')
print(series3, end='\n\n')

# 75-4. Create a Series from a scalar value
import pandas as pd
scalar_data = 10
series4 = pd.Series(scalar_data, index=['a', 'b', 'c'])
print(series4)

75-6-3. Output

# 75. The pandas.Series class
# 75-1. Create a Series from a list
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# dtype: int64

# 75-2. Create a Series from a dict
# a    1
# b    2
# c    3
# dtype: int64

# 75-3. Specify the index and the data type
# a    1.5
# b    2.5
# c    3.5
# Name: Example Series, dtype: float64

# 75-4. Create a Series from a scalar value
# a    10
# b    10
# c    10
# dtype: int64

76. The pandas.Series.index attribute

76-1. Syntax

# 76. The pandas.Series.index attribute
pandas.Series.index
The index (axis labels) of the Series.

The index of a Series is used to label and identify each element of the underlying data. The index can be thought of as an immutable ordered set (technically a multi-set, as it may contain duplicate labels), and is used to index and align data in pandas.

Returns:
Index
The index labels of the Series.

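76-2. Parameters

None

76-3. Purpose

Provides access to the index of the data in a Series.

76-4. Return value

Returns a pandas.Index object containing the index label of every data point in the Series.

76-5. Notes

In pandas, a Series is a one-dimensional labeled array that can hold any data type (in practice it usually holds values of a single type), and every element has an associated index label.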
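The docstring in 76-1 describes the index as immutable and allows duplicate labels. A minimal sketch of both properties, using made-up data, might look like this:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'a', 'b'])

# Duplicate labels are allowed; selecting 'a' returns both matching rows.
had_duplicates = not s.index.is_unique
print(had_duplicates)
print(s['a'])

# Individual labels cannot be changed in place ...
try:
    s.index[0] = 'z'
    in_place_failed = False
except TypeError as exc:
    in_place_failed = True
    print('TypeError:', exc)

# ... but the whole index can be replaced with one of the same length.
s.index = ['x', 'y', 'z']
print(s.index.tolist())
```

Replacing the whole index relabels the existing values in order; it does not reorder or realign the data.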

76-6. Usage

76-6-1. Data preparation

None

76-6-2. Code example

# 76. The pandas.Series.index attribute
import pandas as pd
# Create a Series with a custom index
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
# Access the index attribute of the Series
index_obj = s.index
# 76-1. Print the type of index_obj
print(type(index_obj), end='\n\n')

# 76-2. Print the contents of index_obj
print(index_obj, end='\n\n')

# 76-3. Convert the index to a list
index_list = index_obj.tolist()
print(index_list, end='\n\n')

# 76-4. Get the index as a NumPy array
index_array = index_obj.values
print(index_array)

76-6-3. Output

# 76. The pandas.Series.index attribute
# 76-1. Print the type of index_obj
# <class 'pandas.core.indexes.base.Index'>

# 76-2. Print the contents of index_obj
# Index(['a', 'b', 'c', 'd'], dtype='object')

# 76-3. Convert the index to a list
# ['a', 'b', 'c', 'd']

# 76-4. Get the index as a NumPy array
# ['a' 'b' 'c' 'd']

77. The pandas.Series.array attribute

77-1. Syntax

# 77. The pandas.Series.array attribute
pandas.Series.array
The ExtensionArray of the data backing this Series or Index.

Returns:
ExtensionArray
An ExtensionArray of the values stored within. For extension types, this is the actual array. For NumPy native types, this is a thin (no copy) wrapper around numpy.ndarray.

.array differs from .values, which may require converting the data to a different form.

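77-2. Parameters

None

77-3. Purpose

Gets the underlying array that backs the data stored in a Series.

77-4. Return value

The return value depends on the type of the data in the Series:

77-4-1. For NumPy-native data (such as integers and floats), .array returns a NumpyExtensionArray, a thin wrapper around the internal NumPy ndarray that does not copy the data. The returned array therefore shares memory with the Series unless the data is explicitly copied.
77-4-2. For extension-type data (such as categoricals, timestamps, and intervals), .array returns the actual ExtensionArray. These arrays exist to support data types that are not native to NumPy; they offer an interface similar to a NumPy array, with extra features or attributes suited to the specific type.

77-5. Notes

Characteristics of the return value:

77-5-1. Type dependence: the concrete type of the return value depends on the type of the data in the Series.

77-5-2. Memory sharing (for NumPy-native types): in most cases the returned array shares memory with the Series, which avoids unnecessary copies.

77-5-3. Flexibility: by exposing the underlying array, .array allows lower-level operations or optimizations, although this is usually not the routine usage pandas recommends.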
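To make the type dependence above concrete, the following sketch prints the class that .array returns for three sample Series. The sample data is invented for this example, and the exact wrapper class name for NumPy-native data can differ between pandas versions, so the code only relies on the common ExtensionArray base class.

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray

numeric = pd.Series([1, 2, 3])
categorical = pd.Series(['x', 'y', 'x'], dtype='category')
timestamps = pd.Series(pd.date_range('2024-01-01', periods=3))

samples = [('numeric', numeric), ('categorical', categorical), ('timestamps', timestamps)]
for name, ser in samples:
    arr = ser.array
    # Every result is an ExtensionArray, but the concrete class varies.
    print(name, type(arr).__name__, isinstance(arr, ExtensionArray))
```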

77-6. Usage

77-6-1. Data preparation

None

77-6-2. Code example

# 77. The pandas.Series.array attribute
import pandas as pd
# Create a simple Series (holding NumPy-native data)
s_numpy = pd.Series([1, 2, 3, 4])
# Get the underlying array through the .array attribute
arr_numpy = s_numpy.array
print(arr_numpy)
print(type(arr_numpy))

77-6-3. Output

# 77. The pandas.Series.array attribute
# <NumpyExtensionArray>
# [1, 2, 3, 4]
# Length: 4, dtype: int64
# <class 'pandas.core.arrays.numpy_.NumpyExtensionArray'>

78. The pandas.Series.values attribute

78-1. Syntax

# 78. The pandas.Series.values attribute
pandas.Series.values
Return Series as ndarray or ndarray-like depending on the dtype.

Warning

We recommend using Series.array or Series.to_numpy(), depending on whether you need a reference to the underlying data or a NumPy array.

Returns:
numpy.ndarray or ndarray-like

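78-2. Parameters

None

78-3. Purpose

Gets the NumPy representation of the data in a Series.

78-4. Return value

Returns a NumPy ndarray containing all the data in the Series, without the index. For some extension dtypes the result is an ndarray-like object instead (for example, a Categorical for categorical data).

78-5. Notes

The .values attribute is a quick way to get the data of a Series, especially when passing it to a function or library that expects a NumPy array. Note, however, that for NumPy-backed dtypes the returned array can share memory with the Series, so modifying the array may also modify the Series, as the example in 78-6 shows. When pandas' Copy-on-Write mode is enabled, the returned array is read-only instead and writing to it raises an error. To obtain an independent array, copy it explicitly.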
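The warning quoted in 78-1 recommends Series.to_numpy() when a NumPy array is needed. The sketch below, using sample data made up for this example, shows an explicit copy that is safe to modify and a case where .values is ndarray-like rather than an ndarray.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# to_numpy(copy=True) always returns an independent ndarray.
independent = s.to_numpy(copy=True)
independent[0] = 100
print(s.tolist())

# For a categorical Series, .values is ndarray-like rather than an ndarray.
cat = pd.Series(['x', 'y', 'x'], dtype='category')
print(type(cat.values).__name__)
print(type(cat.to_numpy()).__name__)
```

Writing to the copy leaves the Series unchanged, which is the behavior to rely on when the array is handed to code that may modify it.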

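78-6. Usage

78-6-1. Data preparation

None

78-6-2. Code example

# 78. The pandas.Series.values attribute
import pandas as pd
# Create a simple Series
s = pd.Series([1, 2, 3, 4])
# Get the NumPy array through the .values attribute
np_array = s.values
# Print the result
print(np_array)
print(type(np_array))
# Modify the NumPy array. For a NumPy-backed dtype such as int64 the array
# shares memory with the Series, so this also changes the Series
np_array[0] = 10
# Check the Series: its first element is now 10
print(s)
# To get an array that cannot affect the original Series, take a copy with .copy()
np_array_copy = s.values.copy()
np_array_copy[0] = 100
print(s)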

78-6-3. Output

# 78. The pandas.Series.values attribute
# [1 2 3 4]
# <class 'numpy.ndarray'>
# 0    10
# 1     2
# 2     3
# 3     4
# dtype: int64
# 0    10
# 1     2
# 2     3
# 3     4
# dtype: int64