Blocked when trying to get the content of HTML

I am trying to get the HTML content of a website, but it is blocked because the page requires JavaScript.

Here is the implementation:

- (void)viewDidLoad
{
  NSURL *htmlUrl = [NSURL URLWithString:@"https://color.adobe.com/explore/most-popular/?time=week"];
  NSStringEncoding htmlEncoding = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingUTF8);

  NSString *htmlString = [NSString stringWithContentsOfURL:htmlUrl encoding:htmlEncoding error:nil];

  NSLog(@"%@",htmlString);

  NSData *htmlData = [htmlString dataUsingEncoding:NSUTF8StringEncoding];
  TFHpple *htmlHpple = [TFHpple hppleWithHTMLData:htmlData];
}

Part of the NSLog output:

  <h1>JavaScript Disabled</h1>
  <p>Adobe Color CC requires JavaScript in order to load properly. Please enable JavaScript in your browser and reload the page.</p>
</li>
<li>
  <h1>JavaScript est désactivé</h1>
  <p>Pour pouvoir se charger correctement, Adobe Color CC requiert JavaScript. Veuillez activer JavaScript dans votre navigateur et recharger la page.</p>
  JavaScript ist erforderlich, damit Adobe Color CC ordnungsgemäß geladen wird. Aktivieren Sie JavaScript im Browser und laden Sie die Seite neu.
  </p>
</li>
<li>
  <h1>JavaScript ?????</h1>
  <p>Adobe Color CC ??????????????JavaScript ???????????????? JavaScript ???????????????????????</p>
</li>
<li>
  <h1>JavaScript desactivado</h1>
  <p>Para que Adobe Color CC pueda cargarse correctamente, se requiere JavaScript. Active JavaScript en el navegador y vuelva a cargar la página.</p>
</li>

That is not what I want.

When you open the URL in a browser, you can see lots of colors, and that is what I want to fetch and parse, not this:

<h1>JavaScript Disabled</h1> <p>Adobe Color CC requires JavaScript in order to load properly. Please enable JavaScript in your browser and reload the page.</p>

Answers:

Answer

From what I can see at the URL you provided, the plain HTTP request returns only part of the web page; the rest, including the colors, is loaded afterwards through AJAX requests. So when you fetch the page with a simple
NSString *htmlString = [NSString stringWithContentsOfURL:htmlUrl encoding:htmlEncoding error:nil];

you get only that initial part, not the full HTML shown in a web browser. If you want the full HTML, load the URL in a UIWebView and, once the web view has finished loading, read the HTML string with

NSString *htmlString = [webView stringByEvaluatingJavaScriptFromString:@"document.documentElement.outerHTML"];

This gives you the full HTML as seen in the web browser, which you can then parse for whatever you need.

Important note: To find out when the web view has finished its AJAX loading, you will have to inject some JavaScript into the web view that calls back into your delegate when the AJAX requests complete. Or, just to verify my code, you can simply wait about 20 seconds, by which time the AJAX requests have most likely completed, inside the web view delegate method:

- (void)webViewDidFinishLoad:(UIWebView *)webView
{
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(20 * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
        NSString *htmlString = [webView stringByEvaluatingJavaScriptFromString:@"document.documentElement.outerHTML"];
        NSLog(@"%@", htmlString);
    });
}
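
The note above mentions injecting JavaScript to detect when the AJAX requests have finished. A minimal sketch of such a script follows, assuming the page loads its data through XMLHttpRequest (not fetch) and that the hook is installed before the requests of interest are sent. The names installAjaxHook and onIdle and the ajaxdone:// scheme are hypothetical, chosen for illustration; they are not part of UIWebView or of the page.

```javascript
// Hypothetical helper: wraps XMLHttpRequest.prototype.send on the given
// window-like object and calls onIdle whenever the number of in-flight
// requests drops back to zero.
function installAjaxHook(win, onIdle) {
  var pending = 0;
  var proto = win.XMLHttpRequest.prototype;
  var originalSend = proto.send;

  proto.send = function () {
    pending += 1;
    // 'loadend' fires once per request, whether it succeeded, failed, or was aborted.
    this.addEventListener('loadend', function () {
      pending -= 1;
      if (pending === 0) {
        onIdle();
      }
    });
    return originalSend.apply(this, arguments);
  };
}

// Possible use inside the web view (assumed custom scheme, see note below):
// installAjaxHook(window, function () { window.location.href = 'ajaxdone://idle'; });
```

One way to wire this up would be to pass the script to stringByEvaluatingJavaScriptFromString: and have the delegate's webView:shouldStartLoadWithRequest:navigationType: watch for the ajaxdone scheme, read document.documentElement.outerHTML at that point, and return NO. Requests already in flight before the hook is installed are not counted, and onIdle can fire more than once if the page issues requests one after another, so treat this as a starting point.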

Hope this is what you want. One more thing: JavaScript support is enabled by default in the web view, so you do not need to turn it on yourself.

Answer

If you want to extract data from the HTML, you can try something like this:

TFHpple *htmlHpple = [TFHpple hppleWithHTMLData:htmlData];

// After this, add the lines below to extract the data
NSString *htmlXpathQueryString = @"//h1";
// Query the parser created above (the original snippet referenced an undefined tutorialsParser)
NSArray *htmlNodes = [htmlHpple searchWithXPathQuery:htmlXpathQueryString];
for (TFHppleElement *element in htmlNodes) {
    NSLog(@"%@", [[element firstChild] content]);
}

For more details, refer to "How to parse Html Data".
