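A gensim KeyedVectors object is essentially a mapping between keys and vectors. Each vector is identified by its lookup key, most often a short string token, so it is usually a mapping of {str => 1D numpy array}.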
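To make clear what kind of structure I am looking for, here is a minimal sketch of the idea in plain Python. This is not gensim's implementation: the class name `TinyKeyedVectors` and its methods are hypothetical, and it uses the standard-library `array` type in place of numpy arrays so that it runs without extra dependencies.

```python
from array import array
from math import sqrt


class TinyKeyedVectors:
    """Hypothetical illustration only: a key -> 1-D float vector mapping."""

    def __init__(self, vector_size):
        self.vector_size = vector_size
        self._vectors = {}  # str key -> array('f') of length vector_size

    def add_vector(self, key, values):
        # Reject vectors whose length does not match the declared size.
        if len(values) != self.vector_size:
            raise ValueError(
                "expected %d values, got %d" % (self.vector_size, len(values))
            )
        self._vectors[key] = array("f", values)

    def __getitem__(self, key):
        # Look up a vector by its key; raises KeyError for unknown keys.
        return self._vectors[key]

    def __contains__(self, key):
        return key in self._vectors

    def similarity(self, key_a, key_b):
        # Cosine similarity between two stored vectors (assumes non-zero vectors).
        a, b = self[key_a], self[key_b]
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sqrt(sum(x * x for x in a))
        norm_b = sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)


# Toy data, invented for illustration.
kv = TinyKeyedVectors(vector_size=3)
kv.add_vector("cat", [1.0, 0.0, 0.0])
kv.add_vector("dog", [1.0, 0.0, 0.0])
kv.add_vector("car", [0.0, 1.0, 0.0])
```

The point of the sketch is that lookups only need the key-to-vector table, with none of the training state a full model carries.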
Besides providing access to that mapping, KeyedVectors also has a smaller RAM footprint than a full model. See the comparison here:
Capability | KeyedVectors | Full model | Note
--- | --- | --- | ---
Continue training vectors | ❌ | ✅ | You need the full model to train or update vectors.
Smaller objects | ✅ | ❌ | KeyedVectors are smaller and need less RAM, because they don't need to store the model state that enables training.
Save/load fastText/word2vec format | ✅ | ❌ | Vectors exported by the Facebook and Google tools do not support further training, but you can still load them into KeyedVectors.
Append new vectors | ✅ | ✅ | Add new-vector entries to the mapping dynamically.
Concurrency | ✅ | ✅ | Thread-safe, allows concurrent vector queries.
Shared RAM | ✅ | ✅ | Multiple processes can re-use the same data, keeping only a single copy in RAM using mmap.
Fast load | ✅ | ✅ | Supports mmap to load data from disk instantaneously.
Is there an equivalent structure in the Java deeplearning4j Word2Vec implementation? I could not find anything similar in the documentation.