I am working with existing data science code for fine-tuning a BERT model. The problem I am facing is in the part of the code that tries to format our data as a PyTorch data.Dataset object:
class MeditationsDataset(torch.utils.data.Dataset):
    def _init_(self, encodings, *args, **kwargs):
        self.encodings = encodings
    def _getitem_(self, idx):
        return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
    def _len_(self):
        return len(self.encodings.input_ids)

dataset = MeditationsDataset(inputs)
When I run the code, I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-144-41fc3213bc25> in <module>()
----> 1 dataset = MeditationsDataset(inputs)
/usr/lib/python3.7/typing.py in __new__(cls, *args, **kwds)
819 obj = super().__new__(cls)
820 else:
--> 821 obj = super().__new__(cls, *args, **kwds)
822 return obj
823
TypeError: object.__new__() takes exactly one argument (the type to instantiate)
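I have searched for this error, but unfortunately I am not familiar with either PyTorch or OOP, so I cannot fix it myself. Could you tell me what I should add to or remove from this code so that it runs? Thanks a lot in advance.
In case it is needed, our data looks like this: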
{'input_ids': tensor([[ 2, 1021, 1005, ..., 0, 0, 0],
[ 2, 1021, 1005, ..., 0, 0, 0],
[ 2, 1021, 1005, ..., 0, 0, 0],
...,
[ 2, 1021, 1005, ..., 0, 0, 0],
[ 2, 103, 1005, ..., 0, 0, 0],
[ 2, 4, 0, ..., 0, 0, 0]]),
'token_type_ids': tensor([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]),
'attention_mask': tensor([[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
...,
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 0, ..., 0, 0, 0]]),
'labels': tensor([[ 2, 1021, 1005, ..., 0, 0, 0],
[ 2, 1021, 1005, ..., 0, 0, 0],
[ 2, 1021, 1005, ..., 0, 0, 0],
...,
[ 2, 1021, 1005, ..., 0, 0, 0],
[ 2, 1021, 1005, ..., 0, 0, 0],
[ 2, 4, 0, ..., 0, 0, 0]])}
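Special methods in Python use a double-underscore prefix and suffix. Your class defines _init_, _getitem_ and _len_ with single underscores, so Python never treats _init_ as the constructor and the inputs argument falls through to object.__new__(), which raises the TypeError. In your case, to implement data.Dataset, you must define __init__, __getitem__ and __len__: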
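A minimal sketch of the corrected class follows. It makes two changes beyond the underscores, both of them assumptions rather than requirements: it uses torch.as_tensor so that values that are already tensors are not copied again, and it indexes encodings["input_ids"] so that a plain dict works as well as a tokenizer output. The sample_inputs dict is a small hypothetical stand-in for the real inputs shown in the question.

```python
import torch
from torch.utils.data import Dataset


class MeditationsDataset(Dataset):
    def __init__(self, encodings):
        # Double underscores: Python calls this when the object is created.
        self.encodings = encodings

    def __getitem__(self, idx):
        # Return one example as a dict with one tensor per field.
        return {key: torch.as_tensor(val[idx]) for key, val in self.encodings.items()}

    def __len__(self):
        # The number of examples is the number of rows in input_ids.
        return len(self.encodings["input_ids"])


# Hypothetical stand-in for the real `inputs` from the question.
sample_inputs = {
    "input_ids": torch.tensor([[2, 1021, 0], [2, 4, 0]]),
    "attention_mask": torch.tensor([[1, 1, 0], [1, 1, 0]]),
}

dataset = MeditationsDataset(sample_inputs)
print(len(dataset))
print(dataset[1]["input_ids"])
```

With the real data, dataset = MeditationsDataset(inputs) should then construct without the TypeError, and the object can be passed to a torch.utils.data.DataLoader as usual.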