Error when implementing a regex function on a list

I am trying to apply regular expressions to a list of POS tags in Python in order to find the tense form of a sentence. I wrote the code below to implement this.

Data preprocessing:

from nltk import word_tokenize, pos_tag
import nltk

text = "He will have been doing his homework." 

tokenized = word_tokenize(text)
tagged = pos_tag(tokenized)
tags = []
for i in range(len(tagged)):
    t = tagged[i]
    tags.append(t[1])
print(tags)

The regex grammar to be applied:

grammar = r"""
Future_Perfect_Continuous: {<MD><VB><VBN><VBG>}
Future_Continuous:         {<MD><VB><VBG>}
Future_Perfect:            {<MD><VB><VBN>}
Past_Perfect_Continuous:   {<VBD><VBN><VBG>}
Present_Perfect_Continuous:{<VBP|VBZ><VBN><VBG>}
Future_Indefinite:         {<MD><VB>}
Past_Continuous:           {<VBD><VBG>}
Past_Perfect:              {<VBD><VBN>}
Present_Continuous:        {<VBZ|VBP><VBG>}
Present_Perfect:           {<VBZ|VBP><VBN>}
Past_Indefinite:           {<VBD>}
Present_Indefinite:        {<VBZ>|<VBP>}"""

Function that applies the regex grammar to the list tags:

def check_grammar(grammar, tags):
    cp = nltk.RegexpParser(grammar)
    result = cp.parse(tags)
    print(result)
    result.draw()
 
check_grammar(grammar, tags)

But it returned the following error:

Traceback (most recent call last):
  File "/home/samar/Desktop/twitter_tense/main.py", line 35, in <module>
    check_grammar(grammar, tags)
  File "/home/samar/Desktop/twitter_tense/main.py", line 31, in check_grammar
    result = cp.parse(tags)
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 1276, in parse
    chunk_struct = parser.parse(chunk_struct, trace=trace)
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 1083, in parse
    chunkstr = ChunkString(chunk_struct)
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 95, in __init__
    tags = [self._tag(tok) for tok in self._pieces]
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 95, in <listcomp>
    tags = [self._tag(tok) for tok in self._pieces]
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 105, in _tag
    raise ValueError("chunk structures must contain tagged " "tokens or trees")
ValueError: chunk structures must contain tagged tokens or trees
Best answer:

Your call to cp.parse() expects every token in the sentence to be tagged; however, the tags list you created contains only the tags and not the tokens themselves, hence your ValueError. The solution is to pass the output of the pos_tag() call (i.e. tagged) to your check_grammar() call instead.
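To make the difference concrete, here is a minimal, self-contained sketch (the values are hard-coded from the tagger output shown below, so it runs without calling the tagger; it is an illustration, not part of the original fix):

import nltk

# Plain tag strings -- the shape `tags` had, and what triggers the ValueError:
tags = ['PRP', 'MD', 'VB', 'VBN', 'VBG', 'PRP$', 'NN', '.']

# (token, tag) tuples -- the shape RegexpParser.parse() accepts:
tagged = [('He', 'PRP'), ('will', 'MD'), ('have', 'VB'), ('been', 'VBN'),
          ('doing', 'VBG'), ('his', 'PRP$'), ('homework', 'NN'), ('.', '.')]

cp = nltk.RegexpParser(r"Future_Perfect_Continuous: {<MD><VB><VBN><VBG>}")
print(cp.parse(tagged))  # parses fine
# cp.parse(tags)         # would raise the same ValueError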

Solution

from nltk import word_tokenize, pos_tag
import nltk

text = "He will have been doing his homework." 
tokenized = word_tokenize(text)
tagged = pos_tag(tokenized)
print(tagged)
# Output
>>> [('He', 'PRP'), ('will', 'MD'), ('have', 'VB'), ('been', 'VBN'), ('doing', 'VBG'), ('his', 'PRP$'), ('homework', 'NN'), ('.', '.')]

my_grammar = r"""
Future_Perfect_Continuous: {<MD><VB><VBN><VBG>}
Future_Continuous:         {<MD><VB><VBG>}
Future_Perfect:            {<MD><VB><VBN>}
Past_Perfect_Continuous:   {<VBD><VBN><VBG>}
Present_Perfect_Continuous:{<VBP|VBZ><VBN><VBG>}
Future_Indefinite:         {<MD><VB>}
Past_Continuous:           {<VBD><VBG>}
Past_Perfect:              {<VBD><VBN>}
Present_Continuous:        {<VBZ|VBP><VBG>}
Present_Perfect:           {<VBZ|VBP><VBN>}
Past_Indefinite:           {<VBD>}
Present_Indefinite:        {<VBZ>|<VBP>}"""


def check_grammar(grammar, tags):
    cp = nltk.RegexpParser(grammar)
    result = cp.parse(tags)
    print(result)
    result.draw()


check_grammar(my_grammar, tagged)

Output

>>> (S
>>>   He/PRP
>>>   (Future_Perfect_Continuous will/MD have/VB been/VBN doing/VBG)
>>>   his/PRP$
>>>   homework/NN
>>>   ./.)
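If you want the detected tense as a string rather than a drawn tree, one possible extension (a sketch, not part of the original answer) is to walk the subtrees of the parse result and collect their labels:

def detect_tenses(grammar, tagged_sentence):
    """Return the labels of all tense chunks found in a tagged sentence."""
    cp = nltk.RegexpParser(grammar)
    tree = cp.parse(tagged_sentence)
    # Each matched rule becomes a subtree whose label is the rule name;
    # the root of the result is labelled 'S', so skip it.
    return [subtree.label() for subtree in tree.subtrees() if subtree.label() != "S"]

print(detect_tenses(my_grammar, tagged))
# ['Future_Perfect_Continuous']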
