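I'm having a problem with jsoup: it throws a malformed URL error. If I hard-code the URL into the program it works fine, but if I read a CSV file into a List<String[]> and then loop over each value in the list, it fails. For example, hard-coding http://www.clubmark.org.uk/ in the program works, but the same value read from the CSV into the List<String[]> fails.
The stack trace is: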
Exception in thread "restartedMain" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.boot.devtools.restart.RestartLauncher.run(RestartLauncher.java:49)
Caused by: java.lang.IllegalArgumentException: Malformed URL: http://www.clubmark.org.uk/
at org.jsoup.helper.HttpConnection.url(HttpConnection.java:131)
at org.jsoup.helper.HttpConnection.connect(HttpConnection.java:70)
at org.jsoup.Jsoup.connect(Jsoup.java:73)
at com.domainModel.DownloadImages.findImages(DownloadImages.java:43)
at com.workingprojects.WebScraperApplication.main(WebScraperApplication.java:40)
My main class is:
@SpringBootApplication
@EntityScan({"com.bootstrap","com.domainModel"})
@ComponentScan({"com.bootstrap","com.domainModel"})
public class WebScraperApplication {
    public static void main(String[] args) throws IOException, CsvException {
        SpringApplication.run(WebScraperApplication.class, args);
        DownloadImages downloadImages = new DownloadImages();
        ReadCSV readCSV = new ReadCSV();
        ArrayList<String[]> urls = (ArrayList<String[]>) readCSV.csvReader("C:\\link1.csv");
        // Only the first row is processed for now
        for (int i = 0; i < 1; i++) {
            String[] thisURLObject = urls.get(0);
            String thisURL = thisURLObject[0];
            String status = downloadImages.findImages(thisURL, "C:\\Users\\xxx\\images");
            System.out.println(thisURL + status);
        }
        System.out.println("finished");
    }
}
The class that fetches the images, and where the problem shows up, is:
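package com.domainModel;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import lombok.Getter;
import lombok.Setter;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;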
public class DownloadImages {
    // The URL of the website
    @Getter @Setter
    private String webSiteURL;
    // The path of the folder that you want to save the images to
    @Getter @Setter
    private String folderPath;
    public String findImages(String webSiteURL, String folderPath) {
        try {
            // Connect to the website and get the HTML
            Document doc = Jsoup.connect(webSiteURL).get();
            // Get all elements with the img tag
            Elements img = doc.getElementsByTag("img");
            System.out.println("Images is" + img.size());
            String folderNameWk2 = webSiteURL.replace(".html", "");
            String folderNameWk3 = folderNameWk2.replace("http://", "");
            Path path = Paths.get(folderPath + folderNameWk3);
            Files.createDirectories(path);
            String path1 = path.toString();
            System.out.println("The path is " + path1);
            int counter = 0;
            for (Element el : img) {
                String docName = String.valueOf(counter) + ".jpeg";
                // For each element, get the absolute src URL
                String src = el.absUrl("src");
                System.out.println("Image Found!");
                System.out.println("src attribute is : " + src);
                getImages(src, path1, docName);
                counter = counter + 1;
            }
        } catch (IOException ex) {
            System.err.println("There was an error");
            System.out.println(ex);
            // Logger.getLogger(DownloadImages.class.getName()).log(Level.SEVERE, null, ex);
        }
        return "complete";
    }
    private void getImages(String src, String folderPath, String docName) throws IOException {
        // Strip a trailing slash so the last path segment can be extracted
        if (src.endsWith("/")) {
            src = src.substring(0, src.length() - 1);
        }
        // Extract the name of the image from the src attribute
        int indexname = src.lastIndexOf("/");
        String name = src.substring(indexname + 1);
        System.out.println(name);
        // Open a URL stream and copy it byte by byte to the target file;
        // try-with-resources closes both streams even if the copy fails
        URL url = new URL(src);
        try (InputStream in = url.openStream();
             OutputStream out = new BufferedOutputStream(new FileOutputStream(folderPath + "/" + docName))) {
            for (int b; (b = in.read()) != -1;) {
                out.write(b);
            }
        }
    }
    /**
     * @param webSiteURL
     * @param folderPath
     */
    public DownloadImages(String webSiteURL, String folderPath) {
        super();
        this.webSiteURL = webSiteURL;
        this.folderPath = folderPath;
    }
    /**
     *
     */
    public DownloadImages() {
        super();
    }
}
And the class which gets the CSV file is
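package com.domainModel;
import com.opencsv.CSVReader;
import com.opencsv.exceptions.CsvException;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
public class ReadCSV {
    // Reads every row of the CSV file; each row is returned as an array of cell values
    public List<String[]> csvReader(String fileName) throws IOException, CsvException {
        try (CSVReader reader = new CSVReader(new FileReader(fileName))) {
            List<String[]> r = reader.readAll();
            return r;
        }
    }
}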
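I'm reasonably sure the problem lies in the format of what I'm passing from the list, but when I inspect the values they certainly look like strings.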
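One way to test that suspicion is to compare the value from the list with the hard-coded literal character by character, since two strings can print identically and still differ by an invisible character. The sketch below is only an illustration: the class name `CodePointDump` and the sample value with a leading byte order mark (U+FEFF) are assumptions, not something the stack trace confirms.

```java
import java.util.stream.Collectors;

class CodePointDump {
    // Renders every code point as U+XXXX so invisible characters stand out.
    static String dump(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String hardCoded = "http://www.clubmark.org.uk/";
        // Hypothetical value: the same URL preceded by a byte order mark,
        // which some editors write at the start of a text file.
        String fromFile = "\uFEFF" + hardCoded;

        // The two strings look the same when printed but are not equal.
        System.out.println("equal: " + hardCoded.equals(fromFile));
        System.out.println("lengths: " + hardCoded.length() + " vs " + fromFile.length());
        // Dumping the first few code points shows where they differ.
        System.out.println(dump(fromFile.substring(0, 5)));
    }
}
```

Calling `dump(thisURL)` on the value taken from the list, instead of the sample string, would show whether anything precedes the `h` of `http`.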
The first two lines of the CSV file:
http://www.clubmark.org.uk/, http://www.designit-uk.com/,
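In the sample above the second cell starts with a space, and the trailing comma likely yields an empty last cell, so it may help to normalize each cell before handing it to `Jsoup.connect`. A minimal sketch, assuming the cells only need a leading byte order mark and surrounding whitespace removed; `UrlCellCleaner`, `clean`, and `cleanRow` are hypothetical names, not part of jsoup or OpenCSV:

```java
import java.util.ArrayList;
import java.util.List;

class UrlCellCleaner {
    // Removes surrounding whitespace and a leading byte order mark (U+FEFF).
    static String clean(String cell) {
        if (cell == null) {
            return "";
        }
        String s = cell.trim();
        if (s.startsWith("\uFEFF")) {
            s = s.substring(1).trim();
        }
        return s;
    }

    // Cleans every cell of one CSV row and drops cells that end up empty.
    static List<String> cleanRow(String[] row) {
        List<String> urls = new ArrayList<>();
        for (String cell : row) {
            String cleaned = clean(cell);
            if (!cleaned.isEmpty()) {
                urls.add(cleaned);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical row modeled on the CSV sample: a BOM before the first
        // value, a space before the second, and an empty trailing cell.
        String[] row = {"\uFEFFhttp://www.clubmark.org.uk/", " http://www.designit-uk.com/", ""};
        for (String url : cleanRow(row)) {
            System.out.println("[" + url + "]");
        }
    }
}
```

In the main class, each `String[]` returned by `csvReader` could be passed through `cleanRow` before calling `findImages`.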
Image of the first two lines of the data in Notepad