📜  Merge Two Pandas DataFrames on the Nearest DateTime

📅  Last modified: 2022-05-13 01:55:30.896000             🧑  Author: Mango

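In this article, we discuss how to merge Pandas DataFrames on the nearest DateTime. Before you can merge DataFrames, you need to know how to create them; for that, refer to the article Create a Pandas DataFrame. Once the DataFrames have been created, they can be merged with pandas' merge_asof() function.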
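For orientation, here is a minimal sketch of a merge_asof() call. The DataFrames `quotes` and `trades` and their values are hypothetical, not the article's data. With the default settings, each row of the first DataFrame is paired with the last row of the second DataFrame whose key is less than or equal to its own key.

```python
import pandas as pd

# Hypothetical quotes, sorted by "time"
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.000",
                            "2020-03-25 13:30:00.010",
                            "2020-03-25 13:30:00.020"]),
    "bid": [10.0, 10.5, 11.0],
})

# Hypothetical trades, also sorted by "time"
trades = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.005",
                            "2020-03-25 13:30:00.020"]),
    "price": [10.2, 11.1],
})

# For each trade, take the last quote whose time is <= the trade time
merged = pd.merge_asof(trades, quotes, on="time")
print(merged)
```

The trade at .005 picks up the quote from .000, and the trade at .020 picks up the quote with the identical timestamp, because exact matches are allowed by default.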

Notes:

  •  To learn more about this function, see the article Python | pandas.merge_asof() function.
  • Both DataFrames must be sorted by the key.
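To illustrate the sorting requirement, the sketch below (hypothetical frames and values) first calls merge_asof() on a DataFrame whose key is not sorted, which pandas rejects with a ValueError, and then repeats the merge after sorting both sides with sort_values().

```python
import pandas as pd

# Hypothetical frames; "left" is deliberately NOT sorted by "time"
left = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.030",
                            "2020-03-25 13:30:00.010"]),
    "value": [1, 2],
})
right = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.000",
                            "2020-03-25 13:30:00.020"]),
    "other": ["a", "b"],
})

raised = False
try:
    pd.merge_asof(left, right, on="time")
except ValueError as err:
    # pandas rejects an "on" key that is not sorted in ascending order
    raised = True
    print("ValueError:", err)

# Sorting both frames by the key first makes the merge valid
merged = pd.merge_asof(left.sort_values("time"),
                       right.sort_values("time"),
                       on="time")
print(merged)
```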

Step-by-step approach

Step 1: Import the pandas library

To do this, we need to import the pandas library.



import pandas as pd

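Step 2: Create the DataFrames

In this step, we create the DataFrames with the function pd.DataFrame(). We create two DataFrames, one named left and the other named right, because our goal is to merge the two DataFrames on the nearest DateTime. The complete code for this step and the next one is given after Step 3.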
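The article builds every timestamp with pd.Timestamp(...). As a more compact alternative, assuming the times are available as strings (for example, read from a file), a whole list can be parsed with pd.to_datetime(). The sketch below rebuilds a shortened, hypothetical version of the MSFT rows of the left DataFrame that way, so that the merge key is a real datetime column rather than plain strings.

```python
import pandas as pd

# Shortened, hypothetical version of the article's left DataFrame:
# the timestamps are given as strings and parsed in a single call
left = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.023",
                            "2020-03-25 13:30:00.030",
                            "2020-03-25 13:30:00.041"]),
    "ticker": ["MSFT", "MSFT", "MSFT"],
    "bid": [51.95, 51.97, 51.99],
})

# The "time" column now has a datetime64 dtype, suitable as a merge_asof key
print(left.dtypes)
```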

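Step 3: Merge the DataFrames and print the result

In this step, the DataFrames are merged with the function pd.merge_asof(). The result of merge_asof() is stored in a variable, which is then printed with print().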

Python3
# Importing the required package
import pandas as pd
  
# Creating the DataFrame of left side
left = pd.DataFrame({
    
    "time": [pd.Timestamp("2020-03-25 13:30:00.023"),
             pd.Timestamp("2020-03-25 13:30:00.023"),
             pd.Timestamp("2020-03-25 13:30:00.030"),
             pd.Timestamp("2020-03-25 13:30:00.041"),
             pd.Timestamp("2020-03-25 13:30:00.048"),
             pd.Timestamp("2020-03-25 13:30:00.049"),
             pd.Timestamp("2020-03-25 13:30:00.072"),
             pd.Timestamp("2020-03-25 13:30:00.075")
             ],
    
    "ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG",
               "AAPL", "GOOG", "MSFT"],
    
    "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 
            720.50, 52.01],
    
    "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 
            720.88, 52.03]
})
  
# Creating the Dataframe of right side
right = pd.DataFrame({
    "time": [
        pd.Timestamp("2020-03-25 13:30:00.023"),
        pd.Timestamp("2020-03-25 13:30:00.038"),
        pd.Timestamp("2020-03-25 13:30:00.048"),
        pd.Timestamp("2020-03-25 13:30:00.048"),
        pd.Timestamp("2020-03-25 13:30:00.048")
    ],
    "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
    
    "price": [51.95, 51.95, 720.77, 720.92, 98.0],
    
    "quantity": [75, 155, 100, 100, 100]
})
  
# Applying merge_asof on data and store it
# in a variable
merged_dataframe = pd.merge_asof(right, left, on="time",
                                 by="ticker")
  
# print the variable
print(merged_dataframe)


Output:

Example 1: This time we swap the positions of the left and right DataFrames in the merge_asof() call.

Python3

# Importing the required package
import pandas as pd
# Creating the DataFrame of left side
left = pd.DataFrame({
    "time": [pd.Timestamp("2020-03-25 13:30:00.023"),
             pd.Timestamp("2020-03-25 13:30:00.023"),
             pd.Timestamp("2020-03-25 13:30:00.030"),
             pd.Timestamp("2020-03-25 13:30:00.041"),
             pd.Timestamp("2020-03-25 13:30:00.048"),
             pd.Timestamp("2020-03-25 13:30:00.049"),
             pd.Timestamp("2020-03-25 13:30:00.072"),
             pd.Timestamp("2020-03-25 13:30:00.075")
             ],
    "ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG",
               "AAPL", "GOOG", "MSFT"],
    
    "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99,
            720.50, 52.01],
    
    "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01,
            720.88, 52.03]
})
  
# Creating the Dataframe of right side
right = pd.DataFrame({
    "time": [
        pd.Timestamp("2020-03-25 13:30:00.023"),
        pd.Timestamp("2020-03-25 13:30:00.038"),
        pd.Timestamp("2020-03-25 13:30:00.048"),
        pd.Timestamp("2020-03-25 13:30:00.048"),
        pd.Timestamp("2020-03-25 13:30:00.048")
    ],
    "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
    
    "price": [51.95, 51.95, 720.77, 720.92, 98.0],
    
    "quantity": [75, 155, 100, 100, 100]
})
  
# Applying merge_asof on data and store it 
# in a variable
merged_dataframe = pd.merge_asof(left, right, on="time",
                                 by="ticker")
  
# print the variable
print(merged_dataframe)

Output:

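Note: Comparing the two outputs shows that when the right DataFrame is passed first, the result has 5 rows, the same number as right, and when the left DataFrame is passed first, the result has as many rows as left. In other words, merge_asof() behaves like a left join, except that it matches on the nearest key rather than on equal keys. With the default direction="backward", the "nearest" key is the last key in the second DataFrame that is less than or equal to the key in the first DataFrame.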
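Which row counts as "nearest" is controlled by merge_asof()'s `direction` parameter, which accepts "backward" (the default), "forward" and "nearest". The sketch below compares the three settings on small, hypothetical `quotes` and `trades` frames and also records that the number of rows always follows the first (left) argument.

```python
import pandas as pd

# Hypothetical data: two quotes 10 ms apart and two trades in between
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.000",
                            "2020-03-25 13:30:00.010"]),
    "bid": [10.0, 10.5],
})
trades = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.002",
                            "2020-03-25 13:30:00.008"]),
    "price": [10.1, 10.4],
})

results = {}
row_counts = {}
for direction in ["backward", "forward", "nearest"]:
    merged = pd.merge_asof(trades, quotes, on="time", direction=direction)
    results[direction] = merged["bid"].tolist()
    # Like a left join: one output row per row of `trades`
    row_counts[direction] = len(merged)
    print(direction, results[direction])

# backward [10.0, 10.0]
# forward [10.5, 10.5]
# nearest [10.0, 10.5]
```

Only direction="nearest" picks whichever quote is closest in absolute time; the default looks strictly at the past (or the same instant).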



Example 2: We only perform the asof match within 2ms between the quote time and the trade time.

Python3

# Importing the required package
import pandas as pd
  
# Creating the DataFrame of left side
left = pd.DataFrame({
    "time": [pd.Timestamp("2020-03-25 13:30:00.023"),
             pd.Timestamp("2020-03-25 13:30:00.023"),
             pd.Timestamp("2020-03-25 13:30:00.030"),
             pd.Timestamp("2020-03-25 13:30:00.041"),
             pd.Timestamp("2020-03-25 13:30:00.048"),
             pd.Timestamp("2020-03-25 13:30:00.049"),
             pd.Timestamp("2020-03-25 13:30:00.072"),
             pd.Timestamp("2020-03-25 13:30:00.075")
             ],
    "ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG",
               "AAPL", "GOOG", "MSFT"],
    
    "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99,
            720.50, 52.01],
    
    "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 
            720.88, 52.03]
})
  
# Creating the Dataframe of right side
right = pd.DataFrame({
    "time": [
        pd.Timestamp("2020-03-25 13:30:00.023"),
        pd.Timestamp("2020-03-25 13:30:00.038"),
        pd.Timestamp("2020-03-25 13:30:00.048"),
        pd.Timestamp("2020-03-25 13:30:00.048"),
        pd.Timestamp("2020-03-25 13:30:00.048")
    ],
    "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
    
    "price": [51.95, 51.95, 720.77, 720.92, 98.0],
    
    "quantity": [75, 155, 100, 100, 100]
})
  
# Applying merge_asof on data and store it
# in a variable
merged_dataframe = pd.merge_asof(left, right, on="time", by="ticker",
                                 tolerance=pd.Timedelta("2ms"))
  
# print the variable
print(merged_dataframe)

Output:
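To isolate what `tolerance` does, here is a reduced sketch on hypothetical `quotes` and `trades` frames: a match is only accepted if the matched key lies within the given pd.Timedelta of the row's own key; otherwise the columns from the second DataFrame are filled with NaN.

```python
import pandas as pd

# Hypothetical data: a single quote and two later trades
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.000"]),
    "bid": [10.0],
})
trades = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.001",
                            "2020-03-25 13:30:00.005"]),
    "price": [10.1, 10.2],
})

# The trade at .001 is 1 ms after the quote (within 2 ms) and matches;
# the trade at .005 is 5 ms after it, so its "bid" becomes NaN
merged = pd.merge_asof(trades, quotes, on="time",
                       tolerance=pd.Timedelta("2ms"))
print(merged)
```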

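Example 3: We only perform the asof match within 2ms between the quote time and the trade time, and we exclude exact matches on time. However, earlier data within the tolerance is still carried forward.

Python3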

# Importing the required package
import pandas as pd
  
# Creating the DataFrame of left side
left = pd.DataFrame({
    "time": [pd.Timestamp("2020-03-25 13:30:00.023"),
             pd.Timestamp("2020-03-25 13:30:00.023"),
             pd.Timestamp("2020-03-25 13:30:00.030"),
             pd.Timestamp("2020-03-25 13:30:00.041"),
             pd.Timestamp("2020-03-25 13:30:00.048"),
             pd.Timestamp("2020-03-25 13:30:00.049"),
             pd.Timestamp("2020-03-25 13:30:00.072"),
             pd.Timestamp("2020-03-25 13:30:00.075")
             ],
    "ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG",
               "AAPL", "GOOG", "MSFT"],
    
    "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99,
            720.50, 52.01],
    
    "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 
            720.88, 52.03]
})
  
# Creating the Dataframe of right side
right = pd.DataFrame({
    "time": [
        pd.Timestamp("2020-03-25 13:30:00.023"),
        pd.Timestamp("2020-03-25 13:30:00.038"),
        pd.Timestamp("2020-03-25 13:30:00.048"),
        pd.Timestamp("2020-03-25 13:30:00.048"),
        pd.Timestamp("2020-03-25 13:30:00.048")
    ],
    
    "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
    
    "price": [51.95, 51.95, 720.77, 720.92, 98.0],
    
    "quantity": [75, 155, 100, 100, 100]
})
  
# Applying merge_asof on data and store it
# in a variable
merged_dataframe = pd.merge_asof(left, right, on="time", by="ticker",
                                 tolerance=pd.Timedelta("2ms"),
                                 allow_exact_matches=False)
  
# print the variable
print(merged_dataframe)

Output:
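The effect of `allow_exact_matches=False` can be seen in isolation with a hypothetical trade that has the same timestamp as a quote. With the default (True) the identical timestamp matches; with False only strictly earlier rows are considered; combined with a 2 ms tolerance, the earlier quote is too old and the result is NaN.

```python
import pandas as pd

# Hypothetical data: a trade at exactly the same time as the second quote
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.000",
                            "2020-03-25 13:30:00.005"]),
    "bid": [10.0, 10.5],
})
trades = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.005"]),
    "price": [10.4],
})

# Default: the quote with the identical timestamp is used
with_exact = pd.merge_asof(trades, quotes, on="time")
# Exact matches excluded: fall back to the strictly earlier quote (.000)
without_exact = pd.merge_asof(trades, quotes, on="time",
                              allow_exact_matches=False)
# Exact matches excluded AND a 2 ms tolerance: .000 is 5 ms old, so NaN
strict_and_close = pd.merge_asof(trades, quotes, on="time",
                                 tolerance=pd.Timedelta("2ms"),
                                 allow_exact_matches=False)

print(with_exact["bid"].tolist())     # [10.5]
print(without_exact["bid"].tolist())  # [10.0]
print(strict_and_close)
```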

Example 4: Using the same DataFrame on both sides. Here the left DataFrame is passed as both arguments.

Python3

# Importing the required package
import pandas as pd
  
# Creating the DataFrame of left side
left = pd.DataFrame({
    "time": [pd.Timestamp("2020-03-25 13:30:00.023"),
             pd.Timestamp("2020-03-25 13:30:00.023"),
             pd.Timestamp("2020-03-25 13:30:00.030"),
             pd.Timestamp("2020-03-25 13:30:00.041"),
             pd.Timestamp("2020-03-25 13:30:00.048"),
             pd.Timestamp("2020-03-25 13:30:00.049"),
             pd.Timestamp("2020-03-25 13:30:00.072"),
             pd.Timestamp("2020-03-25 13:30:00.075")
             ],
    
    "ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG",
               "AAPL", "GOOG", "MSFT"],
    
    "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99,
            720.50, 52.01],
    
    "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 
            720.88, 52.03]
})
  
# Creating the Dataframe of right side
right = pd.DataFrame({
    "time": [
        pd.Timestamp("2020-03-25 13:30:00.023"),
        pd.Timestamp("2020-03-25 13:30:00.038"),
        pd.Timestamp("2020-03-25 13:30:00.048"),
        pd.Timestamp("2020-03-25 13:30:00.048"),
        pd.Timestamp("2020-03-25 13:30:00.048")
    ],
    
    "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
    
    "price": [51.95, 51.95, 720.77, 720.92, 98.0],
    
    "quantity": [75, 155, 100, 100, 100]
})
  
# Applying merge_asof on data and store it
# in a variable
merged_dataframe = pd.merge_asof(left, left, on="time", 
                                 by="ticker")
  
# print the variable
print(merged_dataframe)

Output:

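The same DataFrame is treated as two frames, one marked x and the other marked y, so the overlapping columns receive the suffixes _x and _y: bid_x, bid_y, ask_x and ask_y.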
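The `_x`/`_y` endings come from merge_asof()'s `suffixes` parameter, whose default is ("_x", "_y"). A small sketch on a shortened, hypothetical copy of the left DataFrame shows how to choose clearer names; since exact matches are allowed by default, each row of a self-merge matches itself.

```python
import pandas as pd

# Shortened, hypothetical copy of the article's left DataFrame
left = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.023",
                            "2020-03-25 13:30:00.030"]),
    "ticker": ["MSFT", "MSFT"],
    "bid": [51.95, 51.97],
})

# Replace the default ("_x", "_y") suffixes with more descriptive ones
merged = pd.merge_asof(left, left, on="time", by="ticker",
                       suffixes=("_left", "_right"))
print(merged.columns.tolist())
print(merged)
```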