Getting website title and description

Getting a website's title and description is easy. PHP's built-in file_get_contents function, together with a regular expression, lets us capture the title and description of any page without complex methods, provided the page has them. For pages with no description, a simple excerpt function is also provided.
Getting the site title:

function getMetaTitle($content){
    // Match the text between <title> and </title>, tolerating stray whitespace inside the tags
    $pattern = '|<[\s]*title[\s]*>([^<]+)<[\s]*/[\s]*title[\s]*>|Ui';
    if (preg_match($pattern, $content, $match))
        return $match[1];
    else
        return false;
}

The code above returns the text enclosed by the <title> and </title> tags. It returns boolean false if the page has no title.
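Titles on real pages often span several lines or contain entities such as &amp;amp;. Here is a minimal sketch of a cleanup step for the value returned above. Note that normalizeTitle is a hypothetical helper name, not part of the original script, and the UTF-8 charset is an assumption.

```php
<?php
// Hypothetical helper (not part of the original script): tidy a raw <title> value.
function normalizeTitle($rawTitle) {
    if ($rawTitle === false) {
        return false; // nothing was matched upstream
    }
    // Decode entities such as &amp; (assumes the page text is UTF-8)
    $title = html_entity_decode($rawTitle, ENT_QUOTES, 'UTF-8');
    // Collapse line breaks, tabs and repeated spaces into single spaces
    $title = preg_replace('/\s+/', ' ', $title);
    return trim($title);
}

echo normalizeTitle("\n  Tom &amp; Jerry\n  Fan Site "); // Tom & Jerry Fan Site
```

It could be chained as normalizeTitle(getMetaTitle($content)), since a false result is passed through unchanged.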
Getting the meta description:

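function getMetaDescription($content) {
    $metaDescription = false;
    // One pattern for double-quoted attributes, one for single-quoted
    $metaDescriptionPatterns = array(
        '/<meta[^>]*name="description"[^>]*content="([^"]*)"[^>]*>/Ui',
        "/<meta[^>]*name='description'[^>]*content='([^']*)'[^>]*>/Ui"
    );
    foreach ($metaDescriptionPatterns as $pattern) {
        if (preg_match($pattern, $content, $match)) {
            $metaDescription = $match[1];
            break; // stop at the first pattern that matches
        }
    }
    return $metaDescription;
}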

The code above returns the content of the description meta tag, whether its attributes are quoted with single or double quotes. It returns boolean false if there isn't one. In that case we could use an excerpt, such as the first sentence on the page, as the description instead. Getting an excerpt is not very efficient, though, and I had some trouble with my code. Please feel free to leave a comment if you can optimize it.
Getting the first website sentence:

function getExcerpt($content) {
    // Match every <p>...</p> block; "s" lets "." span line breaks, "U" keeps the match short
    if (!preg_match_all('|<p\b[^>]*>(.*)</p>|Uis', $content, $paragraphs))
        return false;
    foreach ($paragraphs[1] as $paragraph) {
        // Strip nested tags first, then decode entities such as &amp;
        $text = trim(html_entity_decode(strip_tags($paragraph)));
        // Return the first run of text that ends with a period
        if (preg_match('/([^.]+\.)/', $text, $matches))
            return trim($matches[1]);
    }
    return false;
}

The code above scans the page for <p> tags, strips the HTML inside them, and returns the first phrase that ends with a period.
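The fallback described earlier (use the meta description when there is one, otherwise the excerpt) can be sketched as a small helper. chooseDescription is a hypothetical name, not part of the original script. It only assumes that both inputs are either a string or boolean false, as returned by getMetaDescription() and getExcerpt().

```php
<?php
// Hypothetical helper: pick the text to show as a page description.
// $description and $excerpt are either a string or false.
function chooseDescription($description, $excerpt) {
    // Prefer the meta description unless it is missing or blank
    if ($description !== false && trim($description) !== '') {
        return trim($description);
    }
    // Otherwise fall back to the excerpt, if there is one
    if ($excerpt !== false && trim($excerpt) !== '') {
        return trim($excerpt);
    }
    return false; // neither source produced anything
}

var_dump(chooseDescription(false, ' First sentence of the page. '));
```

Treating a blank description the same as a missing one is a design choice: some sites ship an empty content attribute, and an excerpt is more useful than an empty string.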
Here's some sample code to test our script:

<?php
$url = 'http://www.tildemark.com/';
$content = file_get_contents($url);
$title = getMetaTitle($content);
$description = getMetaDescription($content);
$excerpt = getExcerpt($content);
print "title: $title ";
print "<br />";
print "description: $description ";
print "<br />";
print "excerpt: $excerpt";
?>
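
The sample assumes the fetch always succeeds. file_get_contents returns false (and raises a warning) when the URL cannot be read, so a small wrapper may help. This is only a sketch: fetchPage, the 10 second timeout and the user agent string are all made up for illustration.

```php
<?php
// Hypothetical helper: fetch a page and return false on failure instead of raising warnings.
function fetchPage($url, $timeoutSeconds = 10) {
    $context = stream_context_create(array(
        'http' => array(
            'timeout'    => $timeoutSeconds,     // give up on slow servers
            'user_agent' => 'TitleScraper/0.1',  // made-up agent string
        ),
    ));
    // @ hides the warning on failure; the return value is checked instead
    $content = @file_get_contents($url, false, $context);
    return ($content === false || $content === '') ? false : $content;
}
```

With this in place, the test script could call fetchPage($url) and skip the three extraction functions when the result is false.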

You may download a working copy of the title and description scraper script.
Thank you for the comment:
Yes, indeed. We could use the built-in get_meta_tags function to get the website description without any knowledge of regular expressions. Here's how:

<?php
$meta_data= get_meta_tags('http://www.tildemark.com/');
echo $meta_data['description'];
?>

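Aside from the description, you can also get the author, keywords and geo position meta data using get_meta_tags(). The array keys are the lowercased name attributes, with special characters replaced by underscores (for example 'geo_position').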
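
To see which keys come back, here is a small sketch that runs offline. get_meta_tags accepts a local file name as well as a URL, so the example writes a made-up sample page (the author name and keyword values are invented) to a temporary file first.

```php
<?php
// Build a small sample page; the meta values are invented for this example
$html = "<html><head>\n"
      . "<meta name=\"Author\" content=\"Jane Doe\">\n"
      . "<meta name=\"keywords\" content=\"php, scraping\">\n"
      . "<meta name=\"geo.position\" content=\"49.33;-86.59\">\n"
      . "</head><body></body></html>\n";
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);

$meta = get_meta_tags($file);
unlink($file);

// Keys are lowercased and "." becomes "_"; check with isset() before reading a key
foreach (array('author', 'keywords', 'geo_position', 'description') as $key) {
    $value = isset($meta[$key]) ? $meta[$key] : '(not set)';
    print "$key: $value\n";
}
```

The isset() check matters because a page without a given tag simply has no such key, and reading it directly would raise a notice or warning.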

13 Comments
  1. You did everything and the code is wonderful, but it's hard to understand for newcomers to regular expressions… so it's better to use a built-in PHP function for extracting meta tag contents… this is how to extract the contents of the description tag…
    simply use "get_meta_tags"

  2. I'm not sure if the get_meta_tags() function uses the cURL extension to fetch the contents of the website whose meta data is to be parsed. If so, then this function is so handy that I would put it on my list of favorite PHP functions.
    On the other hand, if get_meta_tags() is not using cURL, it may be worth noting the benchmark details of this function. That way, we could reconsider writing a function that does the same thing but uses cURL to fetch the contents of the page to be parsed, as the author of this blog is doing.

  3. Yeah. get_meta_tags is a bit slower. But, on the brighter side, it's much easier.

  4. Thank you tildemark. I need a title fetch script. Thank you very much.

  5. great code

  6. Awesome Code Dude!!!!
    Can you share a keyword function too…

  7. You can get keywords using the php function:
    get_meta_tags($url);

  8. Thanks. This code runs slowly but it works.

  9. This is a fantastic script. I would like to integrate this script with a search script I’ve found. Can anyone suggest a way I could modify this to collect the titles of every page in a site?

  10. Great!!!!!!!!!

  11. Why use regex to parse HTML when there are countless XML libraries that are built to do exactly that?
    HTML is not a regular language, any poorly formed markup will generally cause a regex like this to fail.
    Regex is never the right tool for this job.
    http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege

  12. Well, PHP provides various options for fetching web content. file_get_contents with preg_match is one of them. But I would rather use the PHP curl functions to make your search faster.

  13. Hi,

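    And you'll need iconv for correct encoding of the data.

    For example, to "UTF-8":

    if (!empty($title)) {
        // The original pattern was lost; this one is an assumed stand-in that captures the charset in group 2
        if (preg_match('/<meta[^>]+charset\s*=\s*(["\']?)([\w-]+)/i', $contents, $matches)) {
            $charset = strtoupper(trim($matches[2]));
            if (!empty($charset) && $charset != "UTF-8")
                $title = @iconv($charset, "UTF-8", $title);
        }
    }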
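
Following up on the suggestion in comment 11 to use a parser instead of regular expressions, here is a rough sketch based on PHP's DOM extension. extractTitleAndDescription is a hypothetical name, not part of the original script. The sketch assumes the DOM extension is enabled, and it does no charset handling: loadHTML guesses the encoding from the page itself.

```php
<?php
// Sketch of a DOM-based alternative to the regex functions (hypothetical helper name).
function extractTitleAndDescription($html) {
    $result = array('title' => false, 'description' => false);
    if (trim($html) === '') {
        return $result; // loadHTML rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // First <title> element, if any
    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $result['title'] = trim($titles->item(0)->textContent);
    }
    // First <meta> whose name is "description", compared case-insensitively
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $result['description'] = trim($meta->getAttribute('content'));
            break;
        }
    }
    return $result;
}
```

Both values stay false when the page lacks them, which matches the return convention of getMetaTitle() and getMetaDescription().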
