A Tour of Cool Python Libraries: the Third-Party Library Pandas (032)

# 91. pandas.Series.set_flags method
pandas.Series.set_flags(*, copy=False, allows_duplicate_labels=None)
Return a new object with updated flags.

Parameters:
copy : bool, default False
Specify if a copy of the object should be made.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

allows_duplicate_labels : bool, optional
Whether the returned object allows duplicate labels.

Returns:
Series or DataFrame
The same type as the caller.

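91-2. Parameters

91-2-1. copy (optional, default False): whether to copy the underlying data when building the returned object. set_flags always returns a new Series and never changes the flags of the original. With copy=False the new object may share its data with the original; with copy=True the data is copied first. As the note above says, the keyword is ignored under Copy-on-Write and will be removed in a future version.

91-2-2. allows_duplicate_labels (optional, default None): whether the returned Series allows duplicate index labels. None leaves the current flag unchanged. A Series allows duplicate labels by default, so the setting only changes behavior when it is False: pandas then raises DuplicateLabelError if the Series contains duplicate labels.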

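91-3. Purpose

Returns a new Series with updated flags (currently allows_duplicate_labels).

91-4. Return value

Returns a new Series (the same type as the caller) that carries the updated flags.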

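91-5. Notes

The method never modifies the original Series in place. It always returns a new object, and copy only controls whether that object gets its own copy of the data. To work with the updated flags, use the returned Series; the original keeps its previous flags.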
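The behavior described above can be checked with a minimal sketch. The variable names (unique, strict, duplicated, rejected) are illustrative only; the example assumes a pandas version that provides Series.flags and pandas.errors.DuplicateLabelError (1.2 or later).

```python
import pandas as pd
from pandas.errors import DuplicateLabelError

unique = pd.Series([10, 20, 30], index=["a", "b", "c"])
strict = unique.set_flags(allows_duplicate_labels=False)

# set_flags returned a new object; the original keeps its default flag.
print(unique.flags.allows_duplicate_labels)  # True
print(strict.flags.allows_duplicate_labels)  # False

# A Series that already has duplicate labels cannot be marked as duplicate-free.
duplicated = pd.Series([1, 2, 3], index=["a", "b", "b"])
try:
    duplicated.set_flags(allows_duplicate_labels=False)
    rejected = False
except DuplicateLabelError:
    rejected = True
print(rejected)  # True
```

Setting the flag to False is the case that actually changes behavior, since True is already the default.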

91-6. Usage

91-6-1. Data preparation

None.

91-6-2. Code example

# 91. pandas.Series.set_flags method
import pandas as pd
# Create a sample Series
data = [10, 20, 30, 40]
index = ['a', 'b', 'c', 'd']
series = pd.Series(data, index=index)
# set_flags returns a new Series; allows_duplicate_labels=True is also the default flag value
series_with_flags = series.set_flags(allows_duplicate_labels=True)
# Show the original Series and the Series returned by set_flags
print("Original Series:")
print(series)
print("\nSeries with flags:")
print(series_with_flags)
# Add a duplicate label with the public pd.concat API to confirm that duplicates are allowed
series_with_duplicate_labels = pd.concat([series_with_flags, pd.Series([50], index=['a'])])
print("\nSeries with duplicate labels:")
print(series_with_duplicate_labels)

91-6-3. Output

# 91. pandas.Series.set_flags method
# Original Series:
# a    10
# b    20
# c    30
# d    40
# dtype: int64
# 
# Series with flags:
# a    10
# b    20
# c    30
# d    40
# dtype: int64
# 
# Series with duplicate labels:
# a    10
# b    20
# c    30
# d    40
# a    50
# dtype: int64

92. pandas.Series.astype method

92-1. Syntax

# 92. pandas.Series.astype method
pandas.Series.astype(dtype, copy=None, errors='raise')
Cast a pandas object to a specified dtype.

Parameters:
dtype : str, data type, Series or Mapping of column name -> data type
Use a str, numpy.dtype, pandas.ExtensionDtype or Python type to cast entire pandas object to the same type. Alternatively, use a mapping, e.g. {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.

copy : bool, default True
Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

errors : {‘raise’, ‘ignore’}, default ‘raise’
Control raising of exceptions on invalid data for provided dtype.

raise : allow exceptions to be raised

ignore : suppress exceptions. On error return original object.

Returns:
same type as caller

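92-2. Parameters

92-2-1. dtype (required): the target data type. It can be a string name (for example 'int64', 'float64', 'datetime64[ns]'), a Python type (int, float, str), a NumPy dtype (for example numpy.int64, numpy.float32), or a pandas extension dtype.

92-2-2. copy (optional; None in the signature, documented default True): when True, astype returns a copy of the data. With copy=False the result may share data with the original, for example when the Series already has the requested dtype, so later changes to one can show up in the other. The keyword is ignored under Copy-on-Write.

92-2-3. errors (optional, default 'raise'): how to handle errors during the conversion. The accepted values are:

92-2-3-1. 'raise': the default; an exception is raised if the conversion fails.

92-2-3-2. 'ignore': exceptions are suppressed; on error the original object is returned.

Note: astype does not accept 'coerce'. To turn unconvertible values into missing values (NaN), use pandas.to_numeric or pandas.to_datetime with errors='coerce'.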
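The difference between the errors options of astype and the coercing converters can be sketched as follows. The Series raw and the variables raised and coerced are made up for illustration; pd.to_numeric is a separate pandas function, not part of astype.

```python
import pandas as pd

raw = pd.Series(["1", "2", "x"], dtype=object)

# errors='raise' (the default): the unconvertible value "x" raises ValueError.
try:
    raw.astype("int64")
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# pd.to_numeric supports errors='coerce' and turns "x" into NaN.
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced.tolist())  # [1.0, 2.0, nan]
```

Because NaN is a float, the coerced result has dtype float64 rather than int64.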

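92-3. Purpose

Converts the data of a Series to the specified data type.

92-4. Return value

Returns a new Series with the converted dtype; astype never changes the dtype of the original Series in place. With copy=False, the returned Series may share its data with the original when no conversion is needed.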
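A short sketch, with illustrative names, of the point that astype returns a new object and leaves the caller untouched:

```python
import pandas as pd

prices = pd.Series([10, 20, 30])
as_float = prices.astype("float64")

# The original keeps its dtype; only the returned Series is float64.
print(prices.dtype)        # int64
print(as_float.dtype)      # float64
print(as_float is prices)  # False
```

To keep the converted data, assign the result, for example prices = prices.astype("float64").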

92-5. Notes

None.

92-6. Usage

92-6-1. Data preparation

None.

92-6-2. Code example

# 92. pandas.Series.astype method
import pandas as pd
# Create a sample Series
data = [10, 20, 30, 40]
index = ['a', 'b', 'c', 'd']
series = pd.Series(data, index=index)
# Convert the dtype to float and request a copy
series_float = series.astype('float', copy=True)
# Show the converted Series
print(series_float, end='\n\n')
# Cast to int64 with copy=False; the dtype already matches, so nothing needs converting
series_int64 = series.astype('int64', copy=False)
# Show the resulting Series
print(series_int64)

92-6-3. Output

# 92. pandas.Series.astype method
# a    10.0
# b    20.0
# c    30.0
# d    40.0
# dtype: float64
# 
# a    10
# b    20
# c    30
# d    40
# dtype: int64

93. pandas.Series.convert_dtypes method

93-1. Syntax

# 93. pandas.Series.convert_dtypes method
pandas.Series.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True, dtype_backend='numpy_nullable')
Convert columns to the best possible dtypes using dtypes supporting pd.NA.

Parameters:
infer_objects : bool, default True
Whether object dtypes should be converted to the best possible types.

convert_string : bool, default True
Whether object dtypes should be converted to StringDtype().

convert_integer : bool, default True
Whether, if possible, conversion can be done to integer extension types.

convert_boolean : bool, default True
Whether object dtypes should be converted to BooleanDtypes().

convert_floating : bool, default True
Whether, if possible, conversion can be done to floating extension types. If convert_integer is also True, preference will be given to integer dtypes if the floats can be faithfully casted to integers.

dtype_backend : {‘numpy_nullable’, ‘pyarrow’}, default ‘numpy_nullable’
Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:

"numpy_nullable": returns nullable-dtype-backed DataFrame (default).

"pyarrow": returns pyarrow-backed nullable ArrowDtype DataFrame.

New in version 2.0.

Returns:
Series or DataFrame
Copy of input object with new dtype.

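93-2. Parameters

93-2-1. infer_objects (optional, default True): whether object dtypes should be converted to the best possible types.

93-2-2. convert_string (optional, default True): whether object dtypes should be converted to StringDtype.

93-2-3. convert_integer (optional, default True): whether, where possible, values can be converted to integer extension types such as Int64.

93-2-4. convert_boolean (optional, default True): whether object dtypes should be converted to BooleanDtype.

93-2-5. convert_floating (optional, default True): whether, where possible, values can be converted to floating extension types such as Float64. If convert_integer is also True, integer dtypes are preferred when the floats can be cast to integers without loss.

93-2-6. dtype_backend (optional, default 'numpy_nullable'): the backend for the resulting dtypes, either 'numpy_nullable' (nullable extension dtypes) or 'pyarrow' (ArrowDtype). New in pandas 2.0 and, according to the documentation above, still experimental.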

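93-3. Purpose

Converts a Series (or each column of a DataFrame) to the best possible dtype that supports pd.NA, so that missing values are represented consistently.

93-4. Return value

Returns a copy of the input Series with the new dtype, typically a pandas extension dtype such as Int64, Float64, string, or boolean. If no better dtype can be found, the dtype is unchanged.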

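93-5. Notes

93-5-1. Automatic type inference: the method infers a single best dtype for the whole Series, not for each element. For example, an object Series holding only strings becomes string, an int64 Series becomes Int64, and a float64 Series becomes Float64 (or Int64 when every value is a whole number). A Series that mixes incompatible types stays object.

93-5-2. Memory: converting object data to a dedicated dtype can reduce memory use, although this is not guaranteed, because nullable dtypes store an additional mask alongside the values.

93-5-3. Nullable types: the conversion uses nullable dtypes (such as Int64, Float64, and boolean), which represent missing values as pd.NA instead of forcing a change of dtype the way their non-nullable counterparts do.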
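The conversions listed in these notes can be sketched with three small Series. The names ints, texts, and flags are illustrative, and the example assumes the default dtype_backend='numpy_nullable'.

```python
import numpy as np
import pandas as pd

# The missing value forces float64 here, although every value is a whole number.
ints = pd.Series([1, 2, None])
texts = pd.Series(["a", "b", np.nan], dtype=object)
flags = pd.Series([True, False, np.nan], dtype=object)

ints_converted = ints.convert_dtypes()
texts_converted = texts.convert_dtypes()
flags_converted = flags.convert_dtypes()

print(ints.dtype, "->", ints_converted.dtype)    # float64 -> Int64
print(texts.dtype, "->", texts_converted.dtype)
print(flags.dtype, "->", flags_converted.dtype)  # object -> boolean
```

In each result the missing entry is stored as pd.NA, and the integer values are no longer forced into a float dtype.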

93-6. Usage

93-6-1. Data preparation

None.

93-6-2. Code example

# 93. pandas.Series.convert_dtypes method
import pandas as pd
# Create a Series that mixes several value types
data = pd.Series([1, 2.5, '3', True])
# Original dtype
print("Original dtype:")
print(data.dtypes)
# Apply convert_dtypes
converted_data = data.convert_dtypes()
# Dtype after conversion; a Series has a single dtype, and mixed values stay object
print("Converted dtype:")
print(converted_data.dtypes)

93-6-3. Output

# 93. pandas.Series.convert_dtypes method
# Original dtype:
# object
# Converted dtype:
# object

94. pandas.Series.infer_objects method

94-1. Syntax

# 94. pandas.Series.infer_objects method
pandas.Series.infer_objects(copy=None)
Attempt to infer better dtypes for object columns.

Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.

Parameters:
copy : bool, default True
Whether to make a copy for non-object or non-inferable columns or Series.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:
same type as input object

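94-2. Parameters

94-2-1. copy (optional; None in the signature, documented default True): whether to make a copy when the Series is not of object dtype or no better dtype can be inferred. infer_objects never modifies the original Series in place. The values behave as follows:

94-2-1-1. None (default): treated as True unless Copy-on-Write is enabled.

94-2-1-2. True: the returned Series always holds its own copy of the data.

94-2-1-3. False: when nothing needs converting, the returned Series may share data with the original, so later changes to one can affect the other.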

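94-3. Purpose

Attempts to infer a better dtype for a Series of object dtype, using the same rules as normal Series construction. Non-object Series and object Series that cannot be converted are left unchanged. It is useful when data that is actually numeric has ended up stored as object.

94-4. Return value

Returns a Series of the same type as the input. Its dtype is the inferred dtype when a conversion is possible, and object otherwise.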
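A minimal sketch of a case where inference succeeds and one where it does not; both Series are made up for illustration.

```python
import pandas as pd

# Integers that ended up stored with object dtype.
numbers_as_object = pd.Series([1, 2, 3], dtype=object)
# A string among the values prevents any conversion.
mixed = pd.Series([1, "two", 3.0], dtype=object)

inferred = numbers_as_object.infer_objects()
unchanged = mixed.infer_objects()

print(numbers_as_object.dtype, "->", inferred.dtype)  # object -> int64
print(mixed.dtype, "->", unchanged.dtype)             # object -> object
```

This is a soft conversion: strings that look like numbers, such as '3', are not parsed. Converting those calls for astype or pd.to_numeric instead.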

94-5. Notes

None.

94-6. Usage

94-6-1. Data preparation

None.

94-6-2. Code example

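# 94. pandas.Series.infer_objects method
import pandas as pd
# Create a Series with mixed value types (object dtype)
data = pd.Series([1, 2.0, '3', 4.5])
# Show the type of each element before inference
print("Element types before inference:")
print(data.apply(type))
# Apply infer_objects; the string '3' prevents a conversion, so the Series stays object
data = data.infer_objects()
# Show the type of each element after inference
print("\nElement types after inference:")
print(data.apply(type))

94-6-3. Output

# 94. pandas.Series.infer_objects method
# Element types before inference:
# 0      <class 'int'>
# 1    <class 'float'>
# 2      <class 'str'>
# 3    <class 'float'>
# dtype: object
#
# Element types after inference:
# 0      <class 'int'>
# 1    <class 'float'>
# 2      <class 'str'>
# 3    <class 'float'>
# dtype: object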

95. pandas.Series.copy method

95-1. Syntax

# 95. pandas.Series.copy method
pandas.Series.copy(deep=True)
Make a copy of this object’s indices and data.

When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).

When deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).

Note

The deep=False behaviour as described above will change in pandas 3.0. Copy-on-Write will be enabled by default, which means that the “shallow” copy that is returned with deep=False will still avoid making an eager copy, but changes to the data of the original will no longer be reflected in the shallow copy (or vice versa). Instead, it makes use of a lazy (deferred) copy mechanism that will copy the data only when any changes to the original or shallow copy are made.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Parameters:
deep : bool, default True
Make a deep copy, including a copy of the data and the indices. With deep=False neither the indices nor the data are copied.

Returns:
Series or DataFrame
Object type matches caller.

Notes

When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).

While Index objects are copied when deep=True, the underlying numpy array is not copied for performance reasons. Since Index is immutable, the underlying data can be safely shared and a copy is not needed.

Since pandas is not thread safe, see the gotchas when copying in a threading environment.

When copy_on_write in pandas config is set to True, the copy_on_write config takes effect even when deep=False. This means that any changes to the copied data would make a new copy of the data upon write (and vice versa). Changes made to either the original or copied variable would not be reflected in the counterpart. See Copy_on_Write for more information.

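95-2. Parameters

95-2-1. deep (optional, default True): when True, copy() creates a new Series with its own copy of the data (the underlying NumPy array), so changing values in one Series does not affect the other. When False, the new Series only references the data and index of the original; without Copy-on-Write, changes to the data of one are visible in the other.

95-3. Purpose

Creates a copy of a Series.

95-4. Return value

Returns a new Series.

95-5. Notes

copy(deep=True) copies the data and the index of the Series, so the copy is independent of the original for ordinary values such as numbers. It does not copy Python objects recursively: for an object-dtype Series only the references are copied, so a nested object (for example a list stored in the Series) is still shared, and mutating it in place is visible through both Series.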
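The limit of a deep copy for nested Python objects can be sketched as follows. The Series nested and the variable independent are illustrative; rebuilding the Series element by element with copy.deepcopy is one possible workaround, not a pandas feature.

```python
import copy

import pandas as pd

nested = pd.Series([[1, 2], [3, 4]])
deep = nested.copy(deep=True)

# Mutating a nested list in place is visible through the deep copy as well,
# because only the references to the lists were copied.
nested[0][0] = 10
print(deep[0])  # [10, 2]

# Copying each element recursively yields fully independent values.
independent = pd.Series([copy.deepcopy(v) for v in nested], index=nested.index)
nested[1][0] = 30
print(independent[1])  # [3, 4]
```

For Series of numbers, strings, or other immutable values this distinction does not matter, and copy(deep=True) is sufficient.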

95-6. Usage

95-6-1. Data preparation

None.

95-6-2. Code example

# 95. pandas.Series.copy method
import pandas as pd
# Create the original Series
original_series = pd.Series([1, 2, 3, 4, 5])
# Make a deep copy
copied_series = original_series.copy(deep=True)
# Modify a value in the original
original_series[0] = 100
# Print both objects to check whether the copy was affected
print("Original Series:")
print(original_series)
print("\nCopied Series:")
print(copied_series)

95-6-3. Output

# 95. pandas.Series.copy method
# Original Series:
# 0    100
# 1      2
# 2      3
# 3      4
# 4      5
# dtype: int64
#
# Copied Series:
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# dtype: int64