...

An MD5 hash of the script file referenced by the x-proxy-phantomjs-script-url HTTP header. It is used to verify that the script was downloaded correctly. A script that does not match the supplied MD5 hash will not be run.
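The hash can be computed ahead of time from the exact contents of the script that the URL serves. The following is a minimal sketch, assuming a Node.js environment on the requesting side and assuming the hash is supplied as a hexadecimal string; the function names `md5Hex` and `matchesHash` are illustrative and not part of the proxy service.

```javascript
var crypto = require('crypto');

// Compute the lowercase hexadecimal MD5 digest of a script's contents.
// Assumption: the script is UTF-8 text and the service expects hex encoding.
function md5Hex(scriptSource) {
    return crypto.createHash('md5').update(scriptSource, 'utf8').digest('hex');
}

// Check a script against a supplied hash, ignoring letter case and
// surrounding whitespace in the supplied value.
function matchesHash(scriptSource, expectedHash) {
    return md5Hex(scriptSource) === String(expectedHash).trim().toLowerCase();
}
```

Hash the file exactly as it is served: a changed line ending or a trailing newline produces a different digest, and the script would then be rejected.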

Response structure

...

A script can return data from PhantomJS in two ways: as plain text or as JSON-encoded output.

Plain-text data is passed through as-is, and an HTTP 200 OK status code is generated automatically when the result is returned (see example 2).

To return a different status code or custom HTTP headers, output a JSON object with the following structure instead (see example 1):

Code Block
languagejs
{
  body: null,
  headers: null,
  statusCode: null,
  statusMessage: null,
  httpVersion: null
}

...
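As a minimal sketch of the JSON form, the helper below fills in the fields listed above and serializes them for output. The name `buildResponse` and the example values are hypothetical. The headers are assumed to be a plain object of name/value pairs, as in example 1, and `httpVersion` is left at `null` because its expected format is not described here.

```javascript
// Hypothetical helper: build the structured response and serialize it.
function buildResponse(body, statusCode, statusMessage, headers) {
    var result = {
        body: body,
        headers: headers || null, // assumed: object mapping header name to value
        statusCode: statusCode,
        statusMessage: statusMessage,
        httpVersion: null // left unset; format not documented in this section
    };
    return JSON.stringify(result);
}

// Example values only. Inside a PhantomJS script, the string would be
// written with console.log() and followed by phantom.exit().
var output = buildResponse('Page not found', 404, 'Not Found', { 'Content-Type': 'text/plain' });
console.log(output);
```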

Possible issues with the current implementation

Example scripts

For an extended function reference, see the PhantomJS API documentation and examples.

Example 1: Get full page

Search engines cannot extract content directly from websites that are essentially JavaScript applications. For those search engines, or for SEO applications, it is therefore desirable to obtain the full source code of the page. The following code retrieves the page source after JavaScript manipulation, together with the HTTP headers and status code, and returns it to the requester.

Code Block
languagejs
var page = require('webpage').create(),
    system = require('system'),
    address;

address = system.args[1]; // The URL that is submitted to the proxy service

var result = { // Standard response structure, see the Response structure section in the documentation
    body: null,
    headers: null,
    statusCode: null,
    statusMessage: null,
    httpVersion: null
};

page.onResourceReceived = function (response) { // Used to obtain response headers and status code from the loaded page
    var i;
    if (decodeURIComponent(response.url) === address) { // Verify that this is the actual page and not one of its internal resources
        result.headers = {};
        for (i = 0; i < response.headers.length; i++) { // response.headers is an array of name/value pairs
            result.headers[response.headers[i].name] = response.headers[i].value; // Clone headers into the final response
        }

        // Clone HTTP status code and text into the final response
        result.statusCode = response.status;
        result.statusMessage = response.statusText;
    }
};

// A single callback handles both outcomes, so the script writes exactly one result and exits once
page.open(address, function (status) {
    if (status !== 'success') { // Handle failures
        console.log('FAILED loading the address');
        phantom.exit();
        return;
    }

    // Page load including all internal assets has completed
    result.body = page.content; // Clone page HTML source code (as manipulated by any internal JS scripts) into the final response

    // Write out final response and exit
    console.log(JSON.stringify(result));
    phantom.exit();
});

Example 2: Retrieve URLs from Google results

The following code navigates to a submitted Google results page (e.g. http://www.google.com/search?q=example) and returns a plain-text list of the page addresses found on that page.

...