Simple PHP procedural HTTP GET wrapper

As many of the (few) readers of my blog from years ago know, I often work in PHP: it’s simpler than Perl for a small project, and it’s available on nearly every UNIX host these days. For those who are not familiar, it’s essentially ASP for UNIX hosts.

I figured I’d offer a rather trivial routine that’s useful for anyone who either uses libcURL in PHP or has PHP’s URL wrappers enabled. The code below has been reformatted to fit most browsers. I usually abuse ternary operators, but I wanted this to be legible for most PHP coders:

<?php

/***
 *** get_http_content() by Shawn Holwegner
 *** This function is placed under the GPL; I'm sure there are several
 *** thousand like it, some may be prettier.
 *** If you use this, please acknowledge so.
 *** Support and expansion is available for cash prizes.
 *** v 1.0: 4/13/2007 - ssh - Implemented full wrapper for cURL &
 ***                           file_get_contents() with AUTH_BASIC
 ***                           user:pass token support.
 ***/

function get_http_content($url, $fakeReferer = FALSE, $fullURLHack = FALSE,
    $userAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
    $saveHeaders = FALSE, $followRedirects = TRUE, $softTimeout = 4,
    $hardTimeout = 8)
{
    // If we can't parse the URL, it's invalid: Quit here.
    $urlArray = parse_url($url);
    if ($urlArray === FALSE
        || !isset($urlArray['scheme'], $urlArray['host']))
        return FALSE;

    // First, look for, and hope that we have cURL.
    if (function_exists('curl_init')) {
        // Create the session before setting any options on it.
        $curl_session = curl_init();

        // If we are not using the default port, set it up.
        if (isset($urlArray['port']))
            curl_setopt($curl_session, CURLOPT_PORT, $urlArray['port']);

        // Set up our user/pass session if required.
        if (isset($urlArray['user']) && isset($urlArray['pass'])) {
            // We want to allow transparent sending of the user/pass
            // combo on redirects.
            curl_setopt($curl_session, CURLOPT_UNRESTRICTED_AUTH, TRUE);
            // Pass cURL our username:password combo in "user:pass" format.
            curl_setopt($curl_session, CURLOPT_USERPWD,
                $urlArray['user'] . ":" . $urlArray['pass']);
            // Accept Basic Authentication.
            curl_setopt($curl_session, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
        }

        // We're going to force ourselves to use GET, in case the
        // previous request was a POST/HEAD.
        curl_setopt($curl_session, CURLOPT_HTTPGET, TRUE);
        // Our 'soft connection' timeout (4 sec by default).
        curl_setopt($curl_session, CURLOPT_CONNECTTIMEOUT,
            (int) $softTimeout);
        // Force failure after this time (8 sec by default).
        curl_setopt($curl_session, CURLOPT_TIMEOUT, (int) $hardTimeout);
        // Set our UserAgent to look like a real browser, IE7 on Vista.
        curl_setopt($curl_session, CURLOPT_USERAGENT, $userAgent);

        // If we want to save our header (debugging purposes).
        if ($saveHeaders)
            curl_setopt($curl_session, CURLOPT_HEADER, TRUE);

        // Do we want to follow redirects? Only set this when the URL
        // carries no credentials.
        if (!isset($urlArray['user']) && !isset($urlArray['pass']))
            curl_setopt($curl_session, CURLOPT_FOLLOWLOCATION,
                $followRedirects);

        // Let cURL create its own referrers, else create our own.
        if ($followRedirects && !$fakeReferer) {
            // If we follow redirects, by default, use cURL's referrers.
            curl_setopt($curl_session, CURLOPT_AUTOREFERER, TRUE);
        } else {
            // parse_url() paths already start with "/".
            $refPath = isset($urlArray['path']) ? $urlArray['path'] : "/";
            // XXX ---- This breaks HTTP Spec, and SHOULD NOT BE USED!
            if ($fullURLHack) {
                // Hack to rebuild URL without any builtin PHP functions.
                if (isset($urlArray['query']))
                    $refPath .= "?" . $urlArray['query'];
                if (isset($urlArray['fragment']))
                    $refPath .= "#" . $urlArray['fragment'];
            }
            // The fake referrer is rebuilt from our path, if not
            // previously overloaded with the fullURLHack above.
            curl_setopt($curl_session, CURLOPT_REFERER,
                $urlArray['scheme'] . "://" . $urlArray['host'] . $refPath);
            if ($followRedirects)
                // We want to follow redirects, with fake referrer.
                curl_setopt($curl_session, CURLOPT_AUTOREFERER, TRUE);
        }

        // This is not threaded, and we don't care. Return transfer.
        curl_setopt($curl_session, CURLOPT_RETURNTRANSFER, TRUE);
        // Set our request, finally.
        curl_setopt($curl_session, CURLOPT_URL, $url);
        // Actually do it.
        $retString = curl_exec($curl_session);
        // Clean up, and close.
        curl_close($curl_session);
    // Ok, no cURL. Fine.
    } else {
        // If file_get_contents() exists, and we can get URLs.
        // Note test for == equality, and not ===, as 0 == FALSE,
        // but are not always the same type. (cry)
        if (function_exists('file_get_contents')
            && (ini_get('allow_url_fopen') == "1")) {
            // Set ourselves up as a browser; no user agent, and
            // defaults, are often bad.
            @ini_set('user_agent', $userAgent);
            // Get it, or fail out.
            $retString = file_get_contents($url);
            if (empty($retString))
                return FALSE;
        } else {
            // We can't retrieve the URL; no, we're not going to
            // write a socket wrapper.
            // Should return a different error, but obviously,
            // I don't want to return an array.
            return FALSE;
        }
    }
    return $retString;
}

?>


The above will use libcURL if it’s available, and otherwise PHP’s built-in file_get_contents() with a URL wrapper. It’s hardly extensive, but it should fetch most data properly; it even supports basic user authentication.
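The basic authentication support hangs on parse_url() splitting credentials out of the URL. Here is a minimal sketch of that step on its own; the URL, user name and password are made up for illustration and do not come from the function above:

```php
<?php
// Hypothetical URL for illustration only.
$parts = parse_url('http://alice:s3cret@example.com:8080/private/report.html?id=7');

// parse_url() hands back the credentials as separate 'user' and 'pass'
// keys; cURL's CURLOPT_USERPWD wants them joined as "user:pass".
$userpwd = isset($parts['user'], $parts['pass'])
    ? $parts['user'] . ':' . $parts['pass']
    : null;

echo $userpwd, "\n";        // alice:s3cret
echo $parts['host'], "\n";  // example.com
echo $parts['port'], "\n";  // 8080
```

If either half of the credentials is missing, $userpwd stays null, which matches the wrapper only setting up authentication when both are present.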

Note that this does not support a raw, socket-only HTTP request. I’ve implemented those before, and there’s virtually no need to do so today.
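Since there is no socket fallback, a host with neither cURL nor URL wrappers simply gets FALSE back. A caller who wants to know that up front could check first. The helper below is a hypothetical sketch, not part of the original function, and it assumes the filter extension is loaded (it is by default):

```php
<?php
// Hypothetical helper: reports which transport a wrapper like the one
// above could use on this host, or null if there is none.
function available_http_transport()
{
    if (function_exists('curl_init')) {
        return 'curl';
    }
    // ini_get() returns a string such as "1", "On" or "", so normalise
    // it to a real boolean instead of comparing against one spelling.
    if (function_exists('file_get_contents')
        && filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return 'fopen';
    }
    return null;
}

$transport = available_http_transport();
echo $transport === null ? "no HTTP transport available" : $transport, "\n";
```

The order mirrors the wrapper: cURL first, URL wrappers second, and nothing after that.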

This system is not extensive enough for persistent connections; it’s not designed to be. It is a procedural function, not a class, and there’s no reason for it to be OOP yet.

Note the warning about $fullURLHack: it reports the FULL URL, query string and fragment included, as the referrer. That is not what the HTTP spec allows: the specification says a Referer must not include the fragment, and by default this function ends the referrer at the path, before the query. The hack allows for debugging, but should NOT be used in production.
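To make the difference concrete, here is a small sketch of the two referrer forms. build_referer() is a hypothetical helper written for this illustration, and the example URL is made up; it mirrors the logic described above rather than reproducing the function's exact code:

```php
<?php
// Hypothetical helper: rebuilds a referrer from a URL, either ending
// at the path (default) or including query and fragment (the "hack").
function build_referer($url, $fullURL = false)
{
    $p = parse_url($url);
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        return false; // Not an absolute URL we can work with.
    }
    // parse_url() paths already start with "/".
    $referer = $p['scheme'] . '://' . $p['host']
             . (isset($p['path']) ? $p['path'] : '/');
    if ($fullURL) {
        if (isset($p['query'])) {
            $referer .= '?' . $p['query'];
        }
        if (isset($p['fragment'])) {
            $referer .= '#' . $p['fragment'];
        }
    }
    return $referer;
}

$url = 'http://example.com/news/item.php?id=42#comments';
echo build_referer($url), "\n";        // http://example.com/news/item.php
echo build_referer($url, true), "\n";  // http://example.com/news/item.php?id=42#comments
```

The second form is the one that should stay out of production, since it leaks the fragment to the remote server.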

...It’s a pretty long wrapper for a smarter file_get_contents(), I know. Welcome to Enterprise grade code.