Forum spam consists of advertisements and abusive or otherwise unneeded messages posted on Internet forums. It is generally posted by automated spambots.
Types of spam
Most spambot forum spam consists of links, with the dual goals of increasing search engine visibility in highly competitive areas such as weight loss, pharmaceuticals, gambling, pornography, real estate or loans, and generating more traffic for these commercial websites. Some of these links contain code to track the spambot's identity if a sale goes through, when the spammer behind the spambot works on commission.
Spam posts may contain anything from a single link to dozens of links. Text content is minimal, usually innocuous and unrelated to the forum's topic. Full banner advertisements have also been reported.
Alternatively, the spam links are posted in the user's signature, in which case the spambot will never post. The link sits quietly in the signature field, where it is more likely to be harvested by search engine spiders than discovered by forum administrators and moderators.
A particularly destructive form of forum spam attack uses an automated posting script such as Xrumer to insert redirect domains into comments. These domains redirect the user to pornographic websites. If the user clicks on an image or attempts to close the site, a purported ActiveX codec is downloaded, which is in fact the Zlob trojan.
Effects of spam
Spam prevention and deletion measurably increase the workload of forum administrators and moderators. The time and resources spent keeping a forum spam-free add significantly to the labour cost and the skill required to run a public forum. Marginally profitable or smaller forums may be permanently closed by their administrators. Forums that do not require registration are becoming rare.
Spam prevention
- Flood control: This forces users to wait for a short interval between making posts to the forum, thus preventing spambots from flooding the forum with repeated spam messages.
- Registration control:
  - Some forums employ CAPTCHA (visual confirmation) routines on their registration pages to prevent spambots from carrying out automated registrations. Simple CAPTCHA systems that display alphanumeric characters have proven vulnerable to optical character recognition software, but those that scramble the characters appear to be far more effective.
  - An alternative is textual confirmation, promoted by bbAntiSpam: the user must answer a random question to prove that he or she is not a spambot.
- Authoritative voice: Use an external filtering service, such as Akismet, to get a verdict on whether submitted data is spam.
- Posting limits: Limit posting to registered users and/or require that the user pass a CAPTCHA test before posting.
- Registration restrictions: Applying careful restrictions can significantly reduce bogus and spambot registrations. One approach is to deny registration from certain domain extensions that are a major source of spambots, such as .ru, .br, .biz, or from free e-mail address providers such as "gawab.com". Another, more labor-intensive, approach is manual examination of new registrants. This examination looks at several indicators. First, spambots often delay email confirmation by several hours, while humans will confirm promptly. Second, spambots tend to create user names that are unique and unlikely to already be used in the forum, preferring "John84731" or "JohnbassKeepsie" to the much more common "John". Third, a search engine query for a spambot's login name often turns up hundreds, if not thousands, of profiles using that name, sometimes with the telltale spam post or a "banned" label.
- Changing technical details of the forum software to confuse bots - for example, changing "agreed=true" to "mode=agreed" in the registration page of phpBB.
- Block posts or registrations that contain certain blacklisted words.
- Be wary of IP addresses used by untrusted posters (anonymous posters or newly registered users). A useful technique for proactive detection of well-known spammer proxies is to query a search engine for the IP address: a known proxy will show up on pages that specialize in listing proxies.
- Some forums also have their own "spam subforums" to direct spam off their main site.
- Some forums have the signature option disabled.
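Two of the measures listed above, flood control and blocking blacklisted words, are simple enough to sketch in a few lines of Python. The following is a minimal illustration only, not taken from any particular forum package: the function name check_post, the 30-second interval and the word list are all assumptions chosen for the example.

```python
import re
import time

# Assumed values for illustration; a real forum would make these configurable.
MIN_INTERVAL_SECONDS = 30.0
BLACKLIST = {"viagra", "casino"}

# Maps a user id to the time of that user's last accepted post.
_last_post_time = {}


def check_post(user_id, text, now=None):
    """Return (accepted, reason) for a proposed post.

    `now` may be supplied for testing; otherwise a monotonic clock is used.
    """
    if now is None:
        now = time.monotonic()

    # Blacklist check: reject the post if any whole word is on the list.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    if words & BLACKLIST:
        return False, "blacklisted word"

    # Flood control: reject the post if the same user posted too recently.
    last = _last_post_time.get(user_id)
    if last is not None and now - last < MIN_INTERVAL_SECONDS:
        return False, "flood control"

    # Only accepted posts reset the user's waiting interval.
    _last_post_time[user_id] = now
    return True, "ok"
```

In this sketch a rejected post does not reset the waiting interval, and matching is done on whole words only, so a blacklisted word embedded in a longer string would not be caught. Both are design choices of the example rather than properties of any real forum software.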
Causes of page widening (sometimes called page stretching or just stretching) include:
- a wide image;
- a very long string of characters without breaks;
- a long line with the specification that the browser should not break it (for instance, use of the HTML tags <pre> or <nobr>);
- a table with many columns, in particular if columns contain a long word (the minimum width of a column is the width of the longest word in it);
- a table where the HTML specifies a large width.
The author of a web page may have failed to consider that the user:
- may have a lower screen resolution
- may be using a larger font
- may be viewing several pages in more than one window at the same time
- may be using a PDA
- may be using a mobile phone.
All these may cause a wide page requiring horizontal scrolling.
Page widening by trolls
Page widening is also done deliberately by Internet trolls on many message boards and forums, for example Slashdot. This form of trolling causes a web page to widen to a ridiculous width, to the point where one cannot read the text without constantly scrolling left and right.
The first true page widening was an accident. Someone posted a UNIX directory listing.
Slashdot implemented a fix for this page widening, which was mostly known for affecting HTML display in Internet Explorer and Netscape browsers, but only after a considerable time had passed. Specifically, Internet Explorer's word-wrap code would not break a line before a word starting with a period and would place all the words on one line and thus widen the page. The then "alternative" browser, Opera, was not affected.
This exploit relies on the fact that, when line breaking is properly implemented, some characters "prohibit line break before" them, as per the Unicode specification. A fix for this problem also exists for phpBB.
Less than a week later, a new widening troll appeared.
That widener was also fixed, by a filter that automatically inserts a space into postings after a certain number of consecutive characters. This is a source of constant frustration to users who post working URLs or segments of code that are automatically broken when they hit submit.
(However, this filter does not affect the contents of Slashdot's link tags; because they do not appear on screen, they cannot widen the page. The filter does not touch them, and unless the target rejects visitors coming from Slashdot, they link properly.)
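The kind of filter described above, which inserts a space after a certain number of consecutive characters, can be sketched as follows. This is an illustrative Python version only and not Slashdot's actual code: the function name break_long_runs and the default limit of 50 characters are assumptions.

```python
import re

# Assumed limit for illustration; the real threshold is not given in the text.
MAX_RUN = 50


def break_long_runs(text, max_run=MAX_RUN):
    """Insert a space after every `max_run` consecutive non-space characters.

    A space is only inserted when more non-space characters follow, so text
    that already fits within the limit is returned unchanged.
    """
    pattern = re.compile(r"(\S{%d})(?=\S)" % max_run)
    return pattern.sub(r"\1 ", text)
```

For example, break_long_runs("http://example.com/abcdefgh", 10) returns "http://exa mple.com/a bcdefgh", which shows why such a filter frustrates users who post URLs or code in the visible text of a message.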