npm: cannot use loadTokenizer with TensorFlow.js

4sup72z8  posted on 2023-01-31  in Other

This is my first time working with TensorFlow.js. I am trying to tokenize my sentences in JavaScript with the Universal Sentence Encoder. GitHub reference

$ npm install @tensorflow/tfjs @tensorflow-models/universal-sentence-encoder

Running this command produced a package-lock.json, which I moved into the same directory as my index.html file. The directory looks like this:

/*
  Folder
    |_index.html
    |_package-lock.json
    |_index.js
    |_index.css
*/

index.html

<head>
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder"></script>   
  <script src="index.js" defer></script> 
</head>

index.js

function tokenizePad(text){
    text = use.loadTokenizer().then(tokenizer => {
        tokenizer.encode(text); 
    });
    return text;
}

text = "I enjoy my holiday very much."
var tokenized = tokenizePad(text); //error

The error message in the console is:

Uncaught TypeError: use.loadTokenizer is not a function

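Is there a way to fix this, or an alternative way to achieve the same thing? I want to convert my string into something like [341,4125,8,140,31,19,54,......], as described in the GitHub reference link.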

jgzswidk  #1

I struggled with this too, and ended up with the following:

use.load().then(useObj => {
    // The loaded object carries both the underlying model and its tokenizer.
    const model = useObj.model;
    const tokenizer = useObj.tokenizer;

    const text = "I enjoy my holiday very much.";
    const tokenized = tokenizer.encode(text);

    console.log(tokenized); //[7933, 2222, 0, 109, 7933, 2222, 0, 154, 2174, 48, 7933, 2222, 0, 1272, 7933, 2222, 0, 645, 336, 944, 7933, 2222, 0, 5568, 7933, 2222, 0, 47, 1788, 6]
});

This is at the character level, though. Let me know if you find a way to get word-based encoding.
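
For reference, the same approach can be wrapped in a function that actually hands the token ids back to the caller, which the tokenizePad function in the question does not do (it returns a Promise, and its .then callback returns nothing). This is only a rough sketch: tokenizeWith is a hypothetical helper name, and it assumes the object resolved by load() exposes a tokenizer property with an encode method, as in the snippet above. The library is passed in as a parameter so that the function does not depend on a global.

```javascript
// Hypothetical helper: resolves to the token ids for `text`.
// `useLib` is any object with a load() method whose result exposes
// `tokenizer.encode(text)` (assumed here, based on the answer above).
async function tokenizeWith(useLib, text) {
  const useObj = await useLib.load();
  return useObj.tokenizer.encode(text);
}

// In the browser, with the two <script> tags from the question loaded:
//   tokenizeWith(use, "I enjoy my holiday very much.").then(ids => console.log(ids));
```

Because the function is async, the caller still has to await it or use .then; the ids are never available synchronously.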

1zmg4dgp  #2

This would be more robust and elegant. First, import it as follows:

import * as use from '@tensorflow-models/universal-sentence-encoder';

Then use it through use:

// Load the model.
use.load().then(model => {
  // Embed an array of sentences.
  const sentences = [
    'Hello.',
    'How are you?'
  ];
  model.embed(sentences).then(embeddings => {
    // `embeddings` is a 2D tensor consisting of the 512-dimensional embeddings for each sentence.
    // So in this example `embeddings` has the shape [2, 512].
    embeddings.print(true /* verbose */);
  });
});
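
Note that embed returns sentence embeddings, not token ids like the ones shown in the question. If plain JavaScript arrays are more convenient than a tensor, the same calls can be written with async/await, roughly as below. This is a sketch under assumptions: embedSentences is a hypothetical helper name, and the library is passed in as a parameter rather than imported.

```javascript
// Hypothetical helper: resolves to one array of numbers per sentence.
// `useLib` is expected to behave like the `use` import above:
// load() resolves to a model whose embed() resolves to a 2D tensor.
async function embedSentences(useLib, sentences) {
  const model = await useLib.load();
  const embeddings = await model.embed(sentences);
  try {
    // Copy the tensor contents into nested JavaScript arrays.
    return await embeddings.array();
  } finally {
    // Release the tensor's memory once the values have been copied out.
    embeddings.dispose();
  }
}

// With the import from this answer:
//   embedSentences(use, ['Hello.', 'How are you?']).then(rows => console.log(rows.length));
```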
