<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Scrape Performance Tweaks</title>
	<atom:link href="http://blog.verselogic.net/archives/2004/scrape-performance-tweaks/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.verselogic.net/archives/2004/scrape-performance-tweaks/</link>
	<description>The personal blog of Alan J Castonguay.</description>
	<pubDate>Sat, 22 Nov 2008 16:39:58 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7-hemorrhage</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: ooklah</title>
		<link>http://blog.verselogic.net/archives/2004/scrape-performance-tweaks/#comment-26</link>
		<dc:creator>ooklah</dc:creator>
		<pubDate>Wed, 03 Nov 2004 18:10:55 +0000</pubDate>
		<guid isPermaLink="false">http://blog.b4k4.ath.cx/archives/2004/10/20/scrape-performance-tweaks/#comment-26</guid>
		<description>You're downloading all of the mt forums... What possible use does this have, and why?</description>
		<content:encoded><![CDATA[<p>You&#8217;re downloading all of the mt forums&#8230; What possible use does this have, and why?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: codepoetica</title>
		<link>http://blog.verselogic.net/archives/2004/scrape-performance-tweaks/#comment-19</link>
		<dc:creator>codepoetica</dc:creator>
		<pubDate>Thu, 21 Oct 2004 15:49:51 +0000</pubDate>
		<guid isPermaLink="false">http://blog.b4k4.ath.cx/archives/2004/10/20/scrape-performance-tweaks/#comment-19</guid>
		<description>I deemed looking for breaks in consecutive numbering too complex a routine, so I opted for something a little simpler, if a bit more processor-hungry.

Before downloading possible-thread-X, I look through the list of posts I've already downloaded. If a post with ID number X is already there, then X can't be an undownloaded, actually existing thread. Thus, I skip it and increment X until this test fails.

This doesn't indicate an actual thread number, as there could be threads that &lt;em&gt;were&lt;/em&gt; there but have since been deleted. It does reduce the number of unnecessary downloads, though, by not pulling down pages that are already known to contain no threads.</description>
		<content:encoded><![CDATA[<p>I deemed looking for breaks in consecutive numbering too complex a routine, so I opted for something a little simpler, if a bit more processor-hungry.</p>
<p>Before downloading possible-thread-X, I look through the list of posts I&#8217;ve already downloaded. If a post with ID number X is already there, then X can&#8217;t be an undownloaded, actually existing thread. Thus, I skip it and increment X until this test fails.</p>
<p>This doesn&#8217;t indicate an actual thread number, as there could be threads that <em>were</em> there but have since been deleted. It does reduce the number of unnecessary downloads, though, by not pulling down pages that are already known to contain no threads.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay Random</title>
		<link>http://blog.verselogic.net/archives/2004/scrape-performance-tweaks/#comment-18</link>
		<dc:creator>Jay Random</dc:creator>
		<pubDate>Thu, 21 Oct 2004 07:10:02 +0000</pubDate>
		<guid isPermaLink="false">http://blog.b4k4.ath.cx/archives/2004/10/20/scrape-performance-tweaks/#comment-18</guid>
		<description>&lt;em&gt;Thus, we can start downloading threads numbering at the first number greater than the last thread downloaded, which is not found in the local list.&lt;/em&gt;

Are the post numbers within threads really all consecutive? That would be impressive, considering that they weren't under UBB. Do you scan for breaks in consecutive numbering in the local list before setting the highest post number therein as the next potential thread number to retrieve?

Or am I mis-parsing how you described your shortcut?</description>
		<content:encoded><![CDATA[<p><em>Thus, we can start downloading threads numbering at the first number greater than the last thread downloaded, which is not found in the local list.</em></p>
<p>Are the post numbers within threads really all consecutive? That would be impressive, considering that they weren&#8217;t under UBB. Do you scan for breaks in consecutive numbering in the local list before setting the highest post number therein as the next potential thread number to retrieve?</p>
<p>Or am I mis-parsing how you described your shortcut?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Plains in the Dreaming &#187; Scrape Progress: 100%!</title>
		<link>http://blog.verselogic.net/archives/2004/scrape-performance-tweaks/#comment-25</link>
		<dc:creator>Plains in the Dreaming &#187; Scrape Progress: 100%!</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.b4k4.ath.cx/archives/2004/10/20/scrape-performance-tweaks/#comment-25</guid>
		<description>[...] We have attained completion. Our &lt;a href="http://b4k4.ath.cx/wordpress/archives/2004/10/20/scrape-performance-tweaks/"&gt;goal&lt;/a&gt; of downloading the first 1.66 million (possibly existing) forum [...]</description>
		<content:encoded><![CDATA[<p>[...] We have attained completion. Our <a href="http://b4k4.ath.cx/wordpress/archives/2004/10/20/scrape-performance-tweaks/">goal</a> of downloading the first 1.66 million (possibly existing) forum [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
