<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: PHP Spam detection project</title>
	<link>http://www.thyphp.com/php-spam-detection-project.html</link>
	<description>All about your PHP</description>
	<pubDate>Wed, 08 Sep 2010 16:56:01 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.1</generator>
		<item>
		<title>By: Irvin</title>
		<link>http://www.thyphp.com/php-spam-detection-project.html#comment-1776</link>
		<dc:creator>Irvin</dc:creator>
		<pubDate>Wed, 07 Jan 2009 20:06:50 +0000</pubDate>
		<guid>http://www.thyphp.com/php-spam-detection-project.html#comment-1776</guid>
		<description>8s9NDGxNCfL0K</description>
		<content:encoded><![CDATA[<p>8s9NDGxNCfL0K</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David</title>
		<link>http://www.thyphp.com/php-spam-detection-project.html#comment-1774</link>
		<dc:creator>David</dc:creator>
		<pubDate>Sun, 28 Dec 2008 00:35:51 +0000</pubDate>
		<guid>http://www.thyphp.com/php-spam-detection-project.html#comment-1774</guid>
		<description>Thanks for the PHP libraries - I need to implement something like this and your scripts will help in looking into text algorithms.</description>
		<content:encoded><![CDATA[<p>Thanks for the PHP libraries - I need to implement something like this and your scripts will help in looking into text algorithms.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andries Louw Wolthuizen</title>
		<link>http://www.thyphp.com/php-spam-detection-project.html#comment-1760</link>
		<dc:creator>Andries Louw Wolthuizen</dc:creator>
		<pubDate>Fri, 15 Aug 2008 23:31:52 +0000</pubDate>
		<guid>http://www.thyphp.com/php-spam-detection-project.html#comment-1760</guid>
		<description>Oh, and please, please, don't make this system a "WordPress-only plugin", but make it a function that can be implemented in every system.

It would also help to make the knowledge-database file universal, so that anyone could write a script in their own programming language that uses this file, because it is not easy to parse a PHP-serialized array with (for example) ASP.

I would personally use the function by storing all messages (like Akismet does) in a queue, and checking 5-10 messages per minute for spam with a cron job to keep the load low.</description>
		<content:encoded><![CDATA[<p>Oh, and please, please, don&#8217;t make this system a &#8220;WordPress-only plugin&#8221;, but make it a function that can be implemented in every system.</p>
<p>It would also help to make the knowledge-database file universal, so that anyone could write a script in their own programming language that uses this file, because it is not easy to parse a PHP-serialized array with (for example) ASP.</p>
<p>I would personally use the function by storing all messages (like Akismet does) in a queue, and checking 5-10 messages per minute for spam with a cron job to keep the load low.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andries Louw Wolthuizen</title>
		<link>http://www.thyphp.com/php-spam-detection-project.html#comment-1759</link>
		<dc:creator>Andries Louw Wolthuizen</dc:creator>
		<pubDate>Fri, 15 Aug 2008 23:15:10 +0000</pubDate>
		<guid>http://www.thyphp.com/php-spam-detection-project.html#comment-1759</guid>
		<description>Excellent idea! I was thinking of developing the same concept. I manage over 100 sites, and (almost) every one of them has a guestbook, comments, or some other user-input form that needs spam checking.

My idea was to let users report spam that passed the detection and send it to a central interface so I could double-check it, and to edit the rules for spam detection in a file that every website downloads periodically.

The power of Akismet is the central server: spammers don't know its algorithm, and the spam definitions can be updated frequently and fast. At the same time that is a big disadvantage, because making a connection to Akismet (or any other not-in-your-LAN server) is slow (Sander already mentioned that).

You could start simple by providing a central, downloadable knowledge database via a big free host like SourceForge, Google Code, or something else, so that we can check for updates, download them, and use them in combination with your script.

Later you can add an option to report something as spam/ham, but you'll have to find a method that keeps transfers small and not too frequent.

Maybe it can be achieved by doing the system in reverse: we provide URLs to our ratings on messages, stored in an HTTP-accessible file, and your server can download them whenever you want and/or have time to evaluate them.

A big advantage of fetching ratings from your users on demand is that your server can decide when it has the resources and time to download (and parse) them.

If you let users push the ratings, your server will get enormous amounts of requests to handle at the "peak times" of the internet.</description>
		<content:encoded><![CDATA[<p>Excellent idea! I was thinking of developing the same concept. I manage over 100 sites, and (almost) every one of them has a guestbook, comments, or some other user-input form that needs spam checking.</p>
<p>My idea was to let users report spam that passed the detection and send it to a central interface so I could double-check it, and to edit the rules for spam detection in a file that every website downloads periodically.</p>
<p>The power of Akismet is the central server: spammers don&#8217;t know its algorithm, and the spam definitions can be updated frequently and fast. At the same time that is a big disadvantage, because making a connection to Akismet (or any other not-in-your-LAN server) is slow (Sander already mentioned that).</p>
<p>You could start simple by providing a central, downloadable knowledge database via a big free host like SourceForge, Google Code, or something else, so that we can check for updates, download them, and use them in combination with your script.</p>
<p>Later you can add an option to report something as spam/ham, but you&#8217;ll have to find a method that keeps transfers small and not too frequent.</p>
<p>Maybe it can be achieved by doing the system in reverse: we provide URLs to our ratings on messages, stored in an HTTP-accessible file, and your server can download them whenever you want and/or have time to evaluate them.</p>
<p>A big advantage of fetching ratings from your users on demand is that your server can decide when it has the resources and time to download (and parse) them.</p>
<p>If you let users push the ratings, your server will get enormous amounts of requests to handle at the &#8220;peak times&#8221; of the internet.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: buggedcom</title>
		<link>http://www.thyphp.com/php-spam-detection-project.html#comment-1758</link>
		<dc:creator>buggedcom</dc:creator>
		<pubDate>Fri, 15 Aug 2008 18:02:19 +0000</pubDate>
		<guid>http://www.thyphp.com/php-spam-detection-project.html#comment-1758</guid>
		<description>An additional way to check for spam is to check URL links against a blacklist. Add this to the n-gram-based detection and this could eventually get as good as Akismet.</description>
		<content:encoded><![CDATA[<p>An additional way to check for spam is to check URL links against a blacklist. Add this to the n-gram-based detection and this could eventually get as good as Akismet.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sander</title>
		<link>http://www.thyphp.com/php-spam-detection-project.html#comment-1757</link>
		<dc:creator>Sander</dc:creator>
		<pubDate>Tue, 12 Aug 2008 08:13:52 +0000</pubDate>
		<guid>http://www.thyphp.com/php-spam-detection-project.html#comment-1757</guid>
		<description>If this idea becomes real, feel free to contact me! I'm the host of a large forum community (running our own software) and we're battling against spam! The main problem is that servers like Akismet are too slow to check every forum post/reply for spam, so we need fast in-house software.</description>
		<content:encoded><![CDATA[<p>If this idea becomes real, feel free to contact me! I&#8217;m the host of a large forum community (running our own software) and we&#8217;re battling against spam! The main problem is that servers like Akismet are too slow to check every forum post/reply for spam, so we need fast in-house software.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
