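Problem description:
The raw data is stored in a .txt file in the following format:
How can content stored in the format above be converted into the format below, which is easier for UniCOQE to process?
Note:
Indices start from 0.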
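To make the 0-based convention concrete, here is a minimal sketch. It assumes that an index refers to the position of a word after splitting the sentence on whitespace; the helper name `index_tokens` and the sample sentence are hypothetical and do not come from the dataset.

```python
def index_tokens(sentence):
    """Pair every whitespace-separated word with its 0-based position."""
    # Assumption: indices count whitespace-separated words, starting at 0.
    return list(enumerate(sentence.split()))


# Hypothetical sentence, used only for illustration.
for index, word in index_tokens("the new model is quieter"):
    print(index, word)
```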
Solution
generated_path = "/home/qtxu/UniCOQE_20230812/data/tuple/car/train_three_combined.txt"  # original file
Unicoqe_path = "/home/qtxu/UniCOQE_20230812/data/tuple/car/train.txt"  # output file after conversion

with open(generated_path, 'r') as fr, open(Unicoqe_path, 'w') as fw:
    for line in fr:
        try:
            # A sentence line has exactly two tab-separated fields: sentence and label.
            sent, label = line.strip().split("\t")
            fw.write(sent + "####")
        except ValueError:
            # Any other line is treated as a tuple annotation whose first three
            # ';'-separated fields are the subject, object and aspect spans.
            span_index = []
            cur_span = line.strip()[1:-1].split(';')
            sub, obj, asp = cur_span[0], cur_span[1], cur_span[2]
            for part in [sub, obj, asp]:
                # Each element has the form "index&word"; keep only the integer index.
                part_index = [int(index) for index, word in (pair.split('&') for pair in part.strip()[1:-1].split())]
                span_index.append(part_index)
            fw.write(str(span_index) + "\n")
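The parsing of a single annotation line can also be pulled out into a small function, which makes it easy to check on one line before converting a whole file. The sketch below is only an illustration: the name `parse_tuple_line` and the sample line are hypothetical, and the `index&word` layout is inferred from the script above rather than from the dataset itself.

```python
def parse_tuple_line(line):
    """Convert '[[i&w ...];[i&w ...];[i&w ...]]' into three lists of integer indices."""
    # Drop the outer brackets and keep the first three ';'-separated spans.
    parts = line.strip()[1:-1].split(";")[:3]
    span_index = []
    for part in parts:
        # Drop the inner brackets, then split the span into "index&word" tokens.
        tokens = part.strip()[1:-1].split()
        # Keep only the text before the first '&', which is the index.
        span_index.append([int(token.split("&")[0]) for token in tokens])
    return span_index


# Hypothetical annotation line, not taken from the real data.
sample = "[[0&camera];[3&old 4&model];[6&battery 7&life]]"
print(parse_tuple_line(sample))  # [[0], [3, 4], [6, 7]]
```

Unlike the script above, this version takes whatever precedes the first `&`, so a token with more than one `&` does not raise an unpacking error. An empty span such as `[]` yields an empty list.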