FileShareFreak

THE SOURCE FOR BITTORRENT & P2P TIPS, TRICKS AND INFO

How to Create a Torrent Tracker, Part 4: Search Engine

One of the most important parts of a torrent site is definitely the search engine. It is probably the first thing people are going to use, and a bug free search form will greatly improve user experience. In this tutorial, we are going to implement a Browse section including torrent details columns, paging, and a form which will be used to filter results based on search queries.

Torrents table

As a first step, we build an HTML table containing all of our torrents. I assume you have added the name and size fields in your xbt_files table.

<table>
<tr><th>Date</th><th>Name</th><th>Size</th><th>Seeds</th><th>Leechers</th></tr> <?php
$q=mysql_query(“SELECT name,size,seeders,leechers,ctime FROM xbt_files”);
while($t=mysql_fetch_row($q))
{
?>
<tr><td><?php echo date(“j M y”,$t->ctime) ?></td><td><?php echo $t->name ?></td><td><?php echo $t->size ?></td><td><?php echo $t->seeds ?></td><td><?php echo $t->leechers ?></td></tr>
<?php
}
?>

</table>

This code will show a table listing all the torrent records contained in our database. It will seem good as long as we don’t have more than 25 torrents on our tracker. Otherwise, we are definitely going to need to split our table in different pages. We are going to edit our SQL query using the LIMIT clause and show a navigation bar below the table containing links to pages.

<?php

define(‘ROWS_PER_PAGE’,25); $page=intval($_GET[‘p’]);
if($page1) $page=1;
$offset=($page-1)*ROWS_PER_PAGE;

// main query
$q=mysql_query(“SELECT SQL_CALC_FOUND_ROWS name,size,seeders,leechers,ctime FROM xbt_files LIMIT $offset,”.ROWS_PER_PAGE);

$num_rows=mysql_fetch_row(mysql_query(“SELECT FOUND_ROWS()”));
$max_page=ceil($num_rows/ROWS_PER_PAGE);

if($max_page > 0 && $page>$max_page)
        $page=1;

//
// … table code…
//

// build navigation bar code
$nav=”;
for($p=1; $p$max_page; $p++)
{
        $start=(($p-1)*ROWS_PER_PAGE)+1;
        $end=$p*ROWS_PER_PAGE;
        if($end>$num_rows)
                $end=$num_rows;

        if ($p == $page)
        {
                $nav .= “{$start} – {$end} | “; // no need to create a link to current page
        }
        else
        {
                $page_url=‘browse.php?p=’.$p;

                $nav .= ‘[](%E2%80%99</div>).$page_url.‘”>’.”{$start} – {$end}”.‘ | ‘;
        }
}

$nav=rtrim($nav,‘ | ‘);

?>

<?php if(empty($nav)) : ?>

<div id=”page-navigation”>
<?php echo $nav ?>
</div>

<?php endif ?>

Note that, after the SELECT query with the LIMIT clause, we make another query using the FOUND_ROWS() function in order to get the number of rows found with the previous query. Then we divide that value by the maximum number of rows per page we want to show, which gives us the number of total pages for this query; using this data, we are able to build a navigation bar.
As an exercise, you could add a sort parameter which will control the order of the results by using the ORDER BY clause.

Simple search form

Now that we have built a table capable of showing torrents, we are ready to implement a way to filter our entries. Here’s the code for our bare bones search form.

In our PHP script we are going to handle the q parameter. As a first example, we are going to try the simplest way to filter rows with MySQL, which is using the LIKE clause on the torrent’s name field. Let’s add this code at the beginning of our script:

<?php
$search=$_GET[‘q’]; $search=[mysql_real_escape_string][9]($search); ?>

And then we edit our main query:

<?php
$q=[mysql_query][1](“SELECT SQL_CALC_FOUND_ROWS name,size,seeders,leechers,ctime FROM xbt_files WHERE name LIKE ‘%”.$search.”%’ LIMIT $offset,”.ROWS_PER_PAGE); ?>

With this code, searching for “pear” will show only torrents that contain the word “pear” in their names. But what if we have a torrent containing “appearance”? It will be shown as well. This is because we put the “%” symbol before and after the search term. If we didn’t do so, our query would have returned only torrents named exactly “pear”, which is definitely something we don’t want.

As we have seen, the LIKE clause is very limited and is not suitable for a real world search engine. So we are going to explore a different way to search in strings: regular expressions.

A step further: RegEx

Regular expressions are a way of identifying strings of text through search patterns written in a formal language. Think of them as a much more advanced version of wildcards. MySQL has a built-in RegEx processor which can be used through the REGEXP function.

In the following snippet we are going to modify our query in order to use regexes for searching. We will also look for spaces in our search keyword and split the words into an array. Additionally, since our query is getting long, it will be split into multiple parts which we’ll eventually concatenate into a new string, so that our code will be a bit easier to read.

<?php

$search=$_GET[‘q’];
$search=mysql_real_escape_string($search);

// split query string into an array of words
$search=explode(‘ ‘,$search);

$sql=”SELECT SQL_CALC_FOUND_ROWS name,size,seeders,leechers,ctime FROM xbt_files “;

// where clause
$sql.=”WHERE name LIKE ‘%”.$search.”%’ “;

// regexp

if(empty($search))
{
        $sql.=‘(’;

        foreach($search as $word)
        {
                $word=trim($word);

                $sql.= “(`name` REGEXP ‘[[:<: />

.$word.”[[:>:]]’ OR `description` REGEXP ‘[[:<: />

.$word.”[[:>:]]’) AND “;
        }

        $sql=rtrim($sql,‘ AND ‘);
        $sql.=‘)’;
}

// limit
$sql.=”LIMIT $offset,”.ROWS_PER_PAGE;

$q=mysql_query($sql);

// … rest of code …

?>

This code will basically search for the specified keyword in the torrent’s name and description through MySQL’s regex engine. “[[:” and “[[:>:]]” are used for word boundaries, so that if you are looking for “lane” it won’t match “miscellaneous”.

Word stemming

There’s a new problem. What if our torrent contains the word “domain” and we are searching for “domains”? In order for this case to work, we need to use the stem of the keyword. Fortunately, this is exactly what the Porter Stemmer algorithm does.

The Porter Stemming Algorithm was developed by Martin Porter for reducing English words to their word stems. For example, the word “connections” would be reduced to its stem form “connect.”

In order to use it, first of all we have to include the library in our PHP file; then it’s simply a matter of adding this code before the call to the explode function.

<?php

require_once ‘class.stemmer.inc’;

// …

$stemmer=new Stemmer();
$search=$stemmer->stem_list($search);

?>

Filtering useless stuff

In a real world search engine, we may want to filter things such as symbols and short words.
So, right after the explode call, let’s add this if statement which will exclude words shorter than 3 characters:

<?php

$words=array();
foreach($search as $word)
{
        if(strlen($word)>=3)
                $words[]=$word;
}

$search=$words;

?>

Then we are ready to deal with useless symbols and punctuation. Let’s add this code right after the mysql_real_escape_string call:

<?php

var $symbols = array(‘/’,‘\’,’\”,‘”’,‘,’,‘.’,‘,‘>’,
‘?’,‘;’,‘:’,‘[‘,’]‘,‘{’,‘}’,‘|’,‘=’,‘+’,‘-’,‘_’,‘)’,‘(’,‘*’,
‘&’,‘^’,‘%’,‘$’,‘#’,‘@’,‘!’,‘~’,‘`’);

for ($i = 0; $i sizeof($this->symbols); $i++)
{
        $search=str_replace($this->symbols[$i],‘ ‘,$search);
}

?>

Conclusion

And that’s it — we are done with our simple and bare bones search engine. In a real world scenario, you will want to add your own custom things; for example in some cases you may want to extend acronyms (”ff” to “final fantasy” and so on), filter a list of custom words and stuff like that. This is all up to you and it depends on what you need for your specific website.

See you in the next tutorial and happy coding!

Article republished with permission from Valadiléne.org - source here.

  1. anonymous Says:

    What is the point f these articles? No one is going to create a new torrent source and if they did then they wouldn’t be following a guide since they wuld want something new and fresh. why don’t you teach us how to code for tbdev or gazelle instead!

  2. iNDEx Says:

    Yeah !

    Thanks ! ;)

  3. R3P71L14_C0R4X Says:

    @01, as far as these go its just like sharing the source code for tbdev or gazelle, both of which are neither new or fresh.

    on top of this they also have installers, so if you want to learn how to code for “tbdev or gazelle” just read a book or watch a video on PHP, then start coding.

    heck, you dont even need to know how to code, just browse the respective forums for mods, once installed, then copy and paste. did i mention they also offer free support?

    we dont need sharky to cover those.

  4. PwnMyPic Image Host Says:

    @ 01 - Point of these is to educate. Knowledge is power my friend, and even if it looks pointless, its knowlege that is worth having. :) Someone will find this very useful.

  5. Velocityy Says:

    @01 and 03, sharky didn’t post this. Hangman did.

  6. nnnnnn Says:

    This type of articles are always boring to read. Something more interesting would be better even though it might be a tough job writing all these.

    Hope part 5 is not coming very soon.

  7. lol Says:

    Man, this is some of the most amateur PHP I have ever seen.

    And LIKE is not a search engine, go implement Lucene like a real man.

    Also, the crusty old mysql_* functions are a pretty clear indicator of someone who doesn’t know what they’re doing. Try using mysqli, or PDO if you’re feeling brave, like someone who knows what they’re doing and isn’t copying out of tutorials.

    I’m quite disappointed in the overall lack of intelligence and quality in this series of posts.

  8. Hangman Says:

    Are you serious man? MySQLi? PDO? Who cares about which functions I’m using to perform the query? That’s just example code!

    Lucene? What’s the point of a search engine tutorial then? I guess you just wanted to show some random hate. Thanks!

  9. LOL Says:

    Well, deploying lucene would be the single most difficult thing in any of your tutorials here. Implying that MySQL LIKE is a search engine is a clear indication of your lack of expertise.

    As far as MySQLi/PDO vs. the crusty old mysql_* functions, it makes a huge difference. Why? OOP interfaces are nice, for one, as well as using prepared statements instead of mres operations every time (or you can rely on magic quotes, lolz) someone inputs anything not only saves you a lot of developer time, it makes your code inherently more secure. And both mysqli/pdo outbench mysql_* impressively.

    If there was any doubt the complete lack of skill you present, the disgusting mixture of markup and business logic removes it.

    There’s yet to be a quality tracker codebase written and I really doubt I’m alone as a developer in regarding that as a hurdle that can’t be overcome in participating in the fun of running a site. I had a look at tbdev (which this seems closer to) and I suppose it isn’t bad for early PHP4 era code, but it’s no good now. And gazelle is almost as bad, if not worse in some ways (like their complete lack of regard for the memory usage of memcached, their over-reliance on it, or their complete lack of regard for the relationality of their database).

    Must say I’m disappointed in you, pirates.

  10. Hangman Says:

    The point of that *example* code is not the mysql_query call, it’s the SQL query which gets executed. The reader can choose the set of database functions he/she likes, but that’s unrelated to this tutorial.
    Sure, MySQLi/PDO outbench mysql_*, so what? Is that important in that *example* code which is about a completely different thing?

    Also, actually in the article I’m saying NOT to use LIKE. I talk about regular expressions later. If you think you can’t make a search engine with that, well, look at this http://vgmtorrents.net/browse.

    Cheers!

  11. E Says:

    Why don’t you start showing us a real tracker source code LOL

  12. bash Says:

    Hey this stuff is fantastic, im doing my 3rd year undergrad project on a bittorrent network and this stuff helps alot. keep it up. when is the next tutorial coming. any other places you would recommend to learn about this subject

  13. Laffin Says:

    Actually the code is nice and bare bones. But I dont like the mixture of functional coding and OOP. I prefer straight functional coding over OOP, but that’s a personal preference stemming from C/C++.
    I think the hate comments, are kinda petty and they dont understand a tutorial/guide isnt meant to teach/show advanced programming skills. It provides for a bare/basic structure to learn from. It does the author no good if the intended audience doesnt understand.

    Anyways, thumbs up, and luck with the rest

Comments