Page content is loaded with JavaScript and Jsoup doesn't see it

One block on the page is filled with content by JavaScript and after loading page with Jsoup there is none of that inforamtion. Is there a way to get also JavaScript generated content when parsing page with Jsoup?

Can't paste page code here, since it is too long:

Here's element which content I need: <div id='tags_list'></div>

I need to get this information in Java. Preferably using Jsoup. Element is field with help of JavaScript:

<div id="tags_list">
    <a href="/tagsc0t20099.html" style="font-size:14;">?????????</a>
    <a href="/tagsc0t1879.html" style="font-size:14;">Sr</a>
    <a href="/tagsc0t3140.html" style="font-size:14;">??????????????</a>

Java code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;


public class Test
    public static void main( String[] args )
            Document Doc = Jsoup.connect( "" ).get();
            Elements Tags = "#tags_list a" );

            for ( Element Tag : Tags )
                System.out.println( Tag.text() );
        catch ( IOException e )



JSoup is an HTML parser, not some kind of embedded browser engine. This means that it's completely unaware of any content that is added to the DOM by Javascript after the initial page load.

To get access to that type of content you will need an embedded browser component, there are a number of discussions on SO regarding that kind of component, eg Is there a way to embed a browser in Java?


Solved in my case with com.codeborne.phantomjsdriver NOTE: it is groovy code.


          <version> <here goes last version> </version>


import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.openqa.selenium.WebDriver
import org.openqa.selenium.phantomjs.PhantomJSDriver

class PhantomJsUtils {
    private static String filePath = 'data/temp/';

    public static Document renderPage(String filePath) {
        System.setProperty("phantomjs.binary.path", 'libs/phantomjs') // path to bin file. NOTE: platform dependent
        WebDriver ghostDriver = new PhantomJSDriver();
        try {
            return Jsoup.parse(ghostDriver.getPageSource());
        } finally {

    public static Document renderPage(Document doc) {
        String tmpFileName = "$filePath${Calendar.getInstance().timeInMillis}.html";
        FileUtils.writeToFile(tmpFileName, doc.toString());
        return renderPage(tmpFileName);


Document doc = PhantomJsUtils.renderPage(Jsoup.parse(yourSource))

You need to understand what is happening :

  • When you query a page from a website, whether using Jsoup or your browser, what gets sent back to you is some HTML. Jsoup is able to parse that.
  • However, most websites include Javascript in that HTML, or linked from that HTML, which will populate the page with content. Your browser is able to execute the Javascript, and thus populate the page. Jsoup is not.

The way to understand this is the following : parsing HTML code is easy. Executing Javascript code and updating corresponding HTML code is a lot more complex, and is the work of a browser.

Here are some solutions for this kind of problems:

  1. If you can find what are the Ajax calls that Javascript code is making, that is loading content, you might be able to use the URL of these calls with Jsoup. In order to do that, use Developer Tools from your browser. But this is not guaranteed to work:

    • it might be that the url is dynamic, and depends on what is on the page at that time
    • if the content is not public, cookies will be involved, and simply querying the resource URL will not be enough
  2. In these cases, you will need to "simulate" the work of a browser. Fortunately, such tools exist. The one I know, and recommend, is PhantomJS. It works with Javascript, and you would need to launch it from Java by starting a new process. If you want to stick to Java, this post lists some Java alternatives.


I fact there is a "way"! Maybe it is more "a workaround" than a "way... The code below checks both for meta attribute "REFRESH" and javascript redirects... If either of them exists RedirectedUrl variable is set. So you know your target... Then you can retrieve the target page and go on...

    String RedirectedUrl=null;
    Elements meta ="html head meta");
    if (meta.attr("http-equiv").contains("REFRESH")) {
        RedirectedUrl = meta.attr("content").split("=")[1];
    } else {
        if (page.toString().contains("window.location.href")) {
            meta ="script");
            for (Element script:meta) {
                String s =;
                if (!s.isEmpty() && s.startsWith("window.location.href")) {
                    int start = s.indexOf("=");
                    int end = s.indexOf(";");
                    if (start>0 && end >start) {
                        s = s.substring(start+1,end);
                        s =s.replace("'", "").replace("\"", "");        
                        RedirectedUrl = s.trim();

... now retrieve the redirected page again...

After specifying user agent, my problem is solved.


Is there a way to get also javascript generated content when parsing page with Jsoup?

I am going to guess NO, thinking about how difficult this would be, without building an entire javascript interpreter in Java.


It is possible by combining JSoup with another framework to interpret the webpage, in my example here I'm using HtmlUnit.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;


WebClient webClient = new WebClient();
HtmlPage myPage = webClient.getPage(URL);

Document document = Jsoup.parse(myPage.asXml());
Elements otherLinks ="a[href]");


Document Doc = Jsoup.connect(url)
    .header("Accept-Encoding", "gzip, deflate")
    .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0")


You can use a combination of JSoup and HtmlUnit to get the page contents after JavaScript scripts are done loading.



Simple Example From file

// load page using HTML Unit and fire scripts
WebClient webClient2 = new WebClient();
HtmlPage myPage = webClient2.getPage(new File("page.html").toURI().toURL());

// convert page to generated HTML and convert to document
Document doc = Jsoup.parse(myPage.asXml());

// iterate row and col
for (Element row :"table#data > tbody > tr"))
    for (Element col :"td"))
        // print results

// clean up resources        

A Complex Example: Load login, get Session and CSRF, then post and wait for home page to finish loading (15 seconds)


import org.jsoup.Connection;
import org.jsoup.Connection.Method;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.HttpMethod;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

//JSoup load Login Page and get Session Details
Connection.Response res = Jsoup.connect("https://loginpage").method(Method.GET).execute();

String sessionId = res.cookie("findSESSION");
String csrf = res.cookie("findCSRF");

HttpCookie cookie = new HttpCookie("findCSRF", csrf);

WebClient webClient = new WebClient();
            new URL("https://url"),

// Add other cookies/ Session ...

webClient.setAjaxController(new NicelyResynchronizingAjaxController());
// Wait time

URL url = new URL("https://login.path");
        WebRequest requestSettings = new WebRequest(url, HttpMethod.POST);

HtmlPage page = webClient.getPage(requestSettings);

// Wait
synchronized (page) {
    try {
    } catch (InterruptedException e) {

// Parse logged in page as needed
Document doc = Jsoup.parse(page.asXml());


Recent Questions

Top Questions

Home Tags Terms of Service Privacy Policy DMCA Contact Us

©2020 All rights reserved.