I need to read a website URL and put the HTML of the page into a string. Next, I need to look for web addresses inside that string and put them into another string. For now, I just need help getting the HTML into a string. Thanks in advance. I have the following code. Is it correct?

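import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.apache.commons.io.IOUtils; // Apache Commons IO

URL url = new URL("http://www.example.com/");
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
String encoding = con.getContentEncoding(); // value of the Content-Encoding header
encoding = encoding == null ? "UTF-8" : encoding;
String body = IOUtils.toString(in, encoding);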
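One thing to watch in that code: `URLConnection.getContentEncoding()` returns the `Content-Encoding` header, which names a compression scheme such as `gzip`, not a character set. The charset is normally a parameter of the `Content-Type` header (`text/html; charset=ISO-8859-1`), which you can get with `getContentType()`. Here is a minimal sketch of that idea using only the JDK (Java 9+ for `readAllBytes`); the class and helper names `PageReader`, `charsetFromContentType`, `readAll` and `fetch` are made up for illustration, and falling back to UTF-8 is an assumption, not something the server guarantees.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class PageReader {

    // Hypothetical helper: pulls the charset out of a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Falls back to UTF-8 (an assumption) when no
    // charset is declared or this JVM does not support the declared name.
    static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported name: use the default below
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Reads the whole stream and decodes the bytes with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    // Same steps as the question, but the charset comes from Content-Type.
    static String fetch(String address) throws IOException {
        URLConnection con = URI.create(address).toURL().openConnection();
        Charset charset = charsetFromContentType(con.getContentType());
        try (InputStream in = con.getInputStream()) {
            return readAll(in, charset);
        }
    }
}
```

Usage would be `String body = PageReader.fetch("http://www.example.com/");`. Pages can also declare their charset in a `<meta>` tag instead of the header; this sketch does not look at that.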

I have personally tried the Jericho HTML Parser library, which turned out to be very handy. It lets you walk through the HTML tags of a document and read the tags' attributes. For example, to get the URLs of all links (check the exact syntax in the documentation):

import java.net.URL;
import java.util.List;
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.Source;

Source source = new Source(new URL("http://..."));
List<Element> elementList = source.getAllElements(); // loads all HTML elements into a list
for (Element element : elementList) {
    if (element.getName().equals(HTMLElementName.A)) { // <a> tag; Jericho reports names in lower case
        String segment = element.toString(); // the whole element, e.g. "<a href=...>...</a>"
        String url = element.getAttributeValue("href"); // the URL of the link
    }
}

I would suggest the jsoup HTML parser: http://jsoup.org/download (you will need the .jar file). Once you have it, grabbing the HTML is fairly simple. You can write:

String html = Jsoup.connect("http://url.com").get().html();
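If you would rather not add a jar just for downloading, Java 11 and later ship `java.net.http.HttpClient`. A minimal sketch, assuming Java 11+; the class name `HtmlFetcher` and method `fetchHtml` are hypothetical, and treating any non-2xx status as an error is a design choice of this sketch. Unlike jsoup, this only returns the raw text and does not parse it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class HtmlFetcher {

    // One client can be reused for many requests; follow ordinary redirects.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Downloads the page at `address` and returns the body as a String.
    // BodyHandlers.ofString() decodes with the charset named in the Content-Type
    // header, or UTF-8 if there is none (or it is unsupported).
    static String fetchHtml(String address) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(address)).GET().build();
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        int status = response.statusCode();
        if (status < 200 || status >= 300) {
            throw new IOException("Unexpected HTTP status " + status);
        }
        return response.body();
    }
}
```

Usage: `String html = HtmlFetcher.fetchHtml("http://url.com");`.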

To find your URLs, traverse the string as you would any normal string, for example with the Scanner class, which is easy to use. You could do something like this (look in the API docs for the details as well):

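import java.util.Scanner;

Scanner in = new Scanner(html);
String links = "";
while (in.hasNextLine()) {
    String line = in.nextLine();
    if (line.contains("yoursearchingkeyword")) {
        int start = line.indexOf("http");
        int end = line.indexOf("</a>");
        if (start >= 0 && end > start) { // skip lines that lack either marker
            links += line.substring(start, end) + "\n"; // from "http" up to "</a>", link text included
        }
    }
}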

The links string should then contain your links.
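The line-by-line version picks up at most one link per line and, because it stops at `</a>`, also captures the `">link text` part. A variation on the same `indexOf` idea keeps searching within each line and stops at the closing quote of the `href` value. This is a sketch under assumptions: `LinkFinder` and `findLinks` are hypothetical names, only double-quoted, lower-case `href="..."` attributes are recognised, and it is still plain string searching, so a real parser such as Jericho or jsoup is more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LinkFinder {

    // Hypothetical helper: returns the value of every href="..." attribute in `html`,
    // scanning line by line. Single-quoted, unquoted or upper-case HREF attributes
    // are not recognised.
    static List<String> findLinks(String html) {
        List<String> links = new ArrayList<>();
        Scanner in = new Scanner(html);
        while (in.hasNextLine()) {
            String line = in.nextLine();
            int from = 0;
            while (true) {
                int attr = line.indexOf("href=\"", from);
                if (attr < 0) {
                    break; // no more links on this line
                }
                int start = attr + "href=\"".length();
                int end = line.indexOf('"', start);
                if (end < 0) {
                    break; // unterminated value: give up on this line
                }
                links.add(line.substring(start, end));
                from = end + 1; // continue after this link
            }
        }
        in.close();
        return links;
    }
}
```

To get everything into one string as in the question, join the list: `String links = String.join("\n", LinkFinder.findLinks(html));`.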