winforms - Why does a Wikipedia API query return HTML content instead of XML?

Posted by lp0sw83n on 2022-12-04 in Other


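I'm trying to use the Wikipedia API in my application to display information about whatever is typed into a TextBox.
When a Button is clicked, the API should return an XML document based on the text in the TextBox.
However, when I use WebClient's DownloadString() method, it returns HTML content instead of XML. Why is that?
When I open the same URL in a web browser, it opens and displays correctly.
Here is my code: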

private void button1_Click_1(object sender, EventArgs e)
{
    var webclient = new WebClient();
    var pageSourceCode = webclient.DownloadString("http://id.wikipedia.org/w/api.php?Format=xml&action=query&prop=extracts&titles=" + textBox1.Text + "&redirects=true");

    var doc = new XmlDocument();
    doc.LoadXml(pageSourceCode);

    //This line causes an exception, because it's HTML
    var fnode = doc.GetElementsByTagName("extract")[0];

    try
    {
        string ss = fnode.InnerText;
        Regex regex = new Regex("\\<[^\\>]*\\>");
        string.Format("Before: {0}", ss);
        ss = regex.Replace(ss, string.Empty);
        string result = string.Format(ss);
        richTextBox1.Text = result;
    }
    catch (Exception)
    {
        richTextBox1.Text = "error";
    }
}

I don't understand why the content is HTML.

winforms

Source: https://stackoverflow.com/questions/74634101/why-a-wikipedia-api-query-returns-html-content-instead-of-xml

1 Answer


mzaanser (answer #1)

The query parameters are case-sensitive.

https://id.wikipedia.org/w/api.php?
  Format=xml&action=query&prop=extracts&titles=" + textBox1.Text + "&redirects=true"

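Here, the format parameter is written with a capital letter, Format, so the parser doesn't recognize it and assumes that you didn't specify a format.
In that case, the API returns an HTML page that reports the successful query, notes that no format was specified, and shows the query result formatted as JSON.
The protocol should also be set to https.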
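As a sketch of the fix, the hypothetical helper below (`WikiQuery.BuildExtractsUrl` is not part of the original answer) builds the same query over https with every parameter name in lowercase. It also escapes the title with `Uri.EscapeDataString`, which is an addition of mine, so that titles containing spaces or `&` cannot break the query string.

```csharp
using System;

public static class WikiQuery
{
    // Hypothetical helper: builds the extracts query with lowercase
    // parameter names, since the API does not treat "Format" as "format".
    public static string BuildExtractsUrl(string title)
    {
        return "https://id.wikipedia.org/w/api.php"
            + "?format=xml"
            + "&action=query"
            + "&prop=extracts"
            + "&redirects=true"
            // Escape the user's input so spaces, '&' or '#' in the
            // title cannot corrupt the rest of the query string.
            + "&titles=" + Uri.EscapeDataString(title);
    }
}
```

In the click handler this would be used as `webclient.DownloadString(WikiQuery.BuildExtractsUrl(textBox1.Text))`.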
Also, since WebClient is disposable (it implements IDisposable), declare it with a using statement:

string pageSourceCode = string.Empty;

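// WebClient lives in System.Net; the using block disposes it when done
using (var client = new WebClient()) {
    pageSourceCode = client.DownloadString(
        "https://id.wikipedia.org/w/api.php?format=xml&action=query&prop=extracts&titles=" + 
        Uri.EscapeDataString(textBox1.Text) +   // escape spaces, '&', etc. in the title
        "&redirects=true");
}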

if (string.IsNullOrEmpty(pageSourceCode)) return;

// The rest
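The `// The rest` placeholder stands for the parsing code from the question. A minimal sketch of that step using LINQ to XML, assuming (as the question's own `GetElementsByTagName("extract")` call does) that the XML response contains an `extract` element whose text is HTML markup. `ExtractParser.GetExtractText` is a hypothetical name, and the HTML check at the top is only a heuristic that fails early with a clear message instead of letting the XML parser throw.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class ExtractParser
{
    // Hypothetical helper: returns the tag-stripped text of the first
    // <extract> element, or null when the response contains none.
    public static string? GetExtractText(string responseBody)
    {
        // Heuristic guard: if the body looks like an HTML page, the query
        // was probably sent without a recognized format parameter.
        string start = responseBody.TrimStart();
        if (start.StartsWith("<!DOCTYPE html", StringComparison.OrdinalIgnoreCase) ||
            start.StartsWith("<html", StringComparison.OrdinalIgnoreCase))
        {
            throw new InvalidOperationException(
                "Received HTML instead of XML; check that the query uses lowercase 'format=xml'.");
        }

        XDocument doc = XDocument.Parse(responseBody);
        XElement? extract = doc.Descendants("extract").FirstOrDefault();
        if (extract == null)
        {
            return null;
        }

        // The element's text is itself HTML markup; strip the tags the
        // same way the question's Regex does.
        return Regex.Replace(extract.Value, "<[^>]*>", string.Empty);
    }
}
```

Returning null rather than throwing when no `extract` element exists keeps the "page not found" case separate from the "wrong format" case.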

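I suggest replacing WebClient with HttpClient as soon as possible, since WebClient is marked obsolete in newer versions of .NET (starting with .NET 6, warning SYSLIB0014).
With HttpClient, your code would look something like this
(note: the HttpClient object is created only once and can be reused for many requests):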

using System.Net.Http;

// Simplified but functional declaration 
private static readonly HttpClient client = new HttpClient();
private async void button1_Click(object sender, EventArgs e)
{
    string pageSourceCode = await client.GetStringAsync("[The query]");

    if (string.IsNullOrEmpty(pageSourceCode)) return;

    // The rest
}
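A slightly fuller sketch of the same shared-client pattern follows; the `[The query]` placeholder above is not filled in here. Two assumptions go beyond the original answer: it sends a User-Agent header, because Wikimedia's User-Agent policy asks API clients to identify themselves (the product name and contact address below are hypothetical placeholders), and it calls `EnsureSuccessStatusCode` so an HTTP error surfaces as an exception rather than as an unexpected response body. `WikiClient` and `GetQueryAsync` are hypothetical names.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class WikiClient
{
    // One shared instance, created once and reused for every request.
    public static readonly HttpClient Client = CreateClient();

    private static HttpClient CreateClient()
    {
        var client = new HttpClient();
        // Hypothetical product name and contact; replace with your own.
        client.DefaultRequestHeaders.UserAgent.ParseAdd(
            "WikiExtractDemo/1.0 (contact@example.com)");
        return client;
    }

    // Downloads the response body; throws HttpRequestException on 4xx/5xx.
    public static async Task<string> GetQueryAsync(string url)
    {
        using HttpResponseMessage response = await Client.GetAsync(url);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Inside the `async void` click handler this would be called as `string pageSourceCode = await WikiClient.GetQueryAsync(url);`.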

Answered 2022-12-04
