One Facebook feature I like is that whenever I paste a URL while updating my status, it grabs the primary details of the page and shows them immediately. This makes it more likely that a person viewing the post will click on it, since he/she gets the summary right away.
I thought of mimicking similar functionality using PHP. So why not try it?
Level: Beginner
You should be familiar with the DOMDocument class of PHP. If not, please have a look at DOMDocument on php.net. The manual is more than enough to learn DOMDocument.
We are going to use two very basic functions of DOMDocument, viz.:
1. loadHTMLFile
2. getElementsByTagName
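To get a feel for these calls before touching a remote page, here is a minimal sketch that parses an inline HTML string. It assumes loadHTML (the string-based sibling of loadHTMLFile) and uses made-up sample markup; the variable names are illustrative only.

```php
<?php
// Sample markup, made up for illustration.
$html = '<html><head><title>Sample page</title></head>'
      . '<body><img src="a.png"><img src="b.png"></body></html>';

$doc = new DOMDocument();
// loadHTML() parses a string; loadHTMLFile() does the same for a path or URL.
$doc->loadHTML($html);

// getElementsByTagName() returns a DOMNodeList that can be iterated or indexed.
$titleText = $doc->getElementsByTagName('title')->item(0)->nodeValue;

$imageSources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $imageSources[] = $img->getAttribute('src');
}

echo $titleText, "\n";
echo implode(', ', $imageSources), "\n";
```

The same two calls are all that the rest of the walkthrough relies on.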
We are going to create this in 3 very simple steps:
- Get a URL for which we want to fetch the details
- With the use of DOMDocument fetch the details
- Print the essential details
We are going to create a class for this, so that we can reuse it wherever required.
TO START WITH:
We will create a very simple HTML form where the user will enter the URL:
<!DOCTYPE HTML>
<html>
<head>
<title>Page Extractor</title>
</head>
<body>
<form method="get" action="">
<input type="text" name="url" />
</form>
<div class="pageinfo">
<!-- the extracted details will be printed here -->
</div>
</body>
</html>
Also, we will use the following class skeleton to write the feature:
// A very basic class that remembers the first instance created.
// Note: PHP ignores any value returned from a constructor, so "new Extractor"
// always yields a new object; this is not a strict singleton.
class Extractor
{
    private static $instance;
    public $url;

    public function __construct()
    {
        if(!self::$instance)
        {
            self::$instance = $this;
        }
    }

    private function getUrl()
    {
        // it will check for the url
    }

    private function extractPageDetails()
    {
        // it will actually extract the details using DOMDocument
    }

    private function printDetails($pageDetails)
    {
        // it will just organise the fetched/extracted details
    }

    public function getPageDetails()
    {
        // this is the only function exposed to the public
    }
}
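If a single shared instance is really required, keep in mind that `new` always builds a fresh object, whatever the constructor returns. The following is a minimal sketch of the usual stricter pattern; the class name SingleExtractor and the method name getInstance are hypothetical and not part of the original skeleton.

```php
<?php
// Hypothetical class name; only the instantiation pattern matters here.
class SingleExtractor
{
    private static $instance = null;

    // A private constructor prevents "new SingleExtractor" from outside.
    private function __construct()
    {
    }

    // Cloning would create a second object, so it is blocked as well.
    private function __clone()
    {
    }

    public static function getInstance()
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }
}

$first = SingleExtractor::getInstance();
$second = SingleExtractor::getInstance();
var_dump($first === $second);
```

For a small page like this one, a plain object created once at the top of the script is usually enough, so treat the pattern as optional.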
Let's move on to the steps now:
Step 1: Get a URL for which we want to fetch the details
We will simply check whether the user entered a URL before submitting the form, in the following way.
private function getUrl()
{
    // you should also validate the URL here before using it
    if(isset($_GET['url']) && $_GET['url'] != '')
    {
        return $_GET['url'];
    }
    else
    {
        return false;
    }
}
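The validation mentioned in the comment could look roughly like the sketch below. The function name validatePreviewUrl is hypothetical; the idea is to accept only absolute http/https URLs, because loadHTMLFile will otherwise happily open local paths and other stream wrappers.

```php
<?php
/**
 * Hypothetical helper (not part of the original class): accepts only
 * absolute http/https URLs and returns false for anything else.
 */
function validatePreviewUrl($candidate)
{
    $candidate = trim((string) $candidate);

    // filter_var() rejects strings that are not syntactically valid URLs.
    if (filter_var($candidate, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Restrict the scheme so that file:// or php:// wrappers are never loaded.
    $scheme = strtolower((string) parse_url($candidate, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return false;
    }

    return $candidate;
}

var_dump(validatePreviewUrl('http://example.com/page'));
var_dump(validatePreviewUrl('/etc/passwd'));
```

This is a syntax and scheme check only. It does not look at where the host resolves, so a public-facing deployment would likely need further restrictions.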
Step 2: With the use of DOMDocument fetch the details
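We will use loadHTMLFile to load the URL and then perform operations to get the details out of it. Note that loading a remote URL this way requires allow_url_fopen to be enabled in the PHP configuration.
We are basically looking for the following things: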
- title of the page
- images on the page
- meta information about the page
private function extractPageDetails()
{
    $arrDetails = array();
    $doc = new DOMDocument();
    // added @ to suppress the warnings raised by malformed HTML;
    // loadHTMLFile returns false when the page cannot be loaded at all
    if(!@$doc->loadHTMLFile($this->url))
    {
        return $arrDetails;
    }
    foreach($doc->getElementsByTagName('title') as $title)
    {
        // use the first <title> only
        $arrDetails['title'] = trim($title->nodeValue);
        break;
    }
    foreach($doc->getElementsByTagName('meta') as $meta)
    {
        // there are a lot of meta tags available as name:content pairs, so
        // we collect them all in an array and decide later which ones to use;
        // meta tags without a name attribute (e.g. charset) are skipped
        $name = strtolower($meta->getAttribute('name'));
        if($name !== '')
        {
            $arrDetails['meta'][$name] = $meta->getAttribute('content');
        }
    }
    foreach($doc->getElementsByTagName('img') as $img)
    {
        // we fetch all the images on the page and put them in an array
        $arrDetails['images'][] = $img->getAttribute('src');
    }
    return $arrDetails;
}
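Many pages also publish their preview data as Open Graph tags, which use a property attribute instead of name and are therefore not picked up by a lookup on name. As a rough sketch, assuming a hypothetical helper called extractOpenGraph and made-up sample markup, they could be collected like this:

```php
<?php
/**
 * Hypothetical helper: collects <meta property="og:..."> values from a
 * DOMDocument that has already been loaded.
 */
function extractOpenGraph(DOMDocument $doc)
{
    $openGraph = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        // Keep only properties with the og: prefix that carry a content value.
        if (strpos($property, 'og:') === 0 && $meta->hasAttribute('content')) {
            $openGraph[substr($property, 3)] = $meta->getAttribute('content');
        }
    }
    return $openGraph;
}

// Sample markup, made up for illustration.
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><head>'
    . '<meta name="description" content="Plain description">'
    . '<meta property="og:title" content="Shared title">'
    . '<meta property="og:image" content="http://example.com/cover.png">'
    . '</head><body></body></html>'
);

print_r(extractOpenGraph($doc));
```

When present, values such as og:title and og:image tend to be better preview candidates than the first image found on the page, so they could be preferred and the generic tags used as a fallback.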
Step 3: Print the essential details.
This is just the function that prints the details fetched in Step 2. Since the values come from an arbitrary remote page, they are escaped before being written into our own HTML.
private function printDetails($pageDetails)
{
    $strHTML = '';
    if(isset($pageDetails['title']))
    {
        $strHTML .= '<h2>'.htmlspecialchars($pageDetails['title'], ENT_QUOTES).'</h2>';
    }
    $strHTML .= '<div>';
    if(isset($pageDetails['images'][0]))
    {
        $strHTML .= '<img src="'.htmlspecialchars($pageDetails['images'][0], ENT_QUOTES).'" />';
    }
    if(isset($pageDetails['meta']['description']))
    {
        $strHTML .= '<p>';
        $strHTML .= htmlspecialchars($pageDetails['meta']['description'], ENT_QUOTES);
        $strHTML .= '</p>';
    }
    $strHTML .= '</div>';
    return $strHTML;
}
Here is how the complete program will look. I have added some very basic CSS to it.
<?php
// FB like Page details extractor from URL with PHP DOMDocument
class Extractor
{
    private static $instance;
    public $url;

    public function __construct()
    {
        $this->url = $this->getUrl();
        // PHP ignores any value returned from a constructor, so this only
        // remembers the first instance created; it is not a strict singleton
        if(!self::$instance)
        {
            self::$instance = $this;
        }
    }

    private function getUrl()
    {
        // you should also validate the URL here before using it
        if(isset($_GET['url']) && $_GET['url'] != '')
        {
            return $_GET['url'];
        }
        else
        {
            return false;
        }
    }

    private function extractPageDetails()
    {
        $arrDetails = array();
        $doc = new DOMDocument();
        // @ suppresses the warnings raised by malformed HTML;
        // loadHTMLFile returns false when the page cannot be loaded at all
        if(!@$doc->loadHTMLFile($this->url))
        {
            return $arrDetails;
        }
        foreach($doc->getElementsByTagName('title') as $title)
        {
            // use the first <title> only
            $arrDetails['title'] = trim($title->nodeValue);
            break;
        }
        foreach($doc->getElementsByTagName('meta') as $meta)
        {
            // skip meta tags without a name attribute (e.g. charset)
            $name = strtolower($meta->getAttribute('name'));
            if($name !== '')
            {
                $arrDetails['meta'][$name] = $meta->getAttribute('content');
            }
        }
        foreach($doc->getElementsByTagName('img') as $img)
        {
            $arrDetails['images'][] = $img->getAttribute('src');
        }
        return $arrDetails;
    }

    private function printDetails($pageDetails)
    {
        // everything taken from the remote page is escaped before printing
        $strHTML = '';
        if(isset($pageDetails['title']))
        {
            $strHTML .= '<h2>'.htmlspecialchars($pageDetails['title'], ENT_QUOTES).'</h2>';
        }
        $strHTML .= '<div>';
        if(isset($pageDetails['images'][0]))
        {
            $strHTML .= '<img src="'.htmlspecialchars($pageDetails['images'][0], ENT_QUOTES).'" />';
        }
        if(isset($pageDetails['meta']['description']))
        {
            $strHTML .= '<p>';
            $strHTML .= htmlspecialchars($pageDetails['meta']['description'], ENT_QUOTES);
            $strHTML .= '</p>';
        }
        $strHTML .= '</div>';
        return $strHTML;
    }

    public function getPageDetails()
    {
        if($this->url)
        {
            $pageDetails = $this->extractPageDetails();
            return $this->printDetails($pageDetails);
        }
        else
        {
            return '';
        }
    }
}
$ext = new Extractor;
?>
<!DOCTYPE HTML>
<html>
<head>
<title>Page Extractor</title>
<style type="text/css">
h2,p,img,div{ padding:0;margin:0 }
.pageinfo{ background:#eee; font-family: Tahoma; padding:10px; overflow:hidden; }
.pageinfo h2{ }
.pageinfo img{ float:left; height:100px; width:100px; }
.pageinfo p{ float:left }
</style>
</head>
<body>
<form method="get" action="">
<input type="text" name="url" />
</form>
<div class="pageinfo">
<?php
if($ext->url)
{
echo $ext->getPageDetails();
}
?>
</div>
</body>
</html>
Well, that's it. This is aimed at the beginner level. It's not even a tutorial, just an introduction to the DOMDocument class and how it can be used.