ASP.NET - Crawler In C# Or In VB.NET?

Jul 24, 2009

Does anyone have a link to working crawler sample code written in either C# or VB.NET?
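
Until someone shares a link, here is a minimal single-page sketch in VB.NET of the core idea: download the HTML and pull out the links. The URL is a placeholder, and the regex is a crude stand-in for a real HTML parser.

Imports System.Net
Imports System.Text.RegularExpressions

Module MinimalCrawler
    Sub Main()
        ' Download the raw HTML of one page.
        Dim client As New WebClient()
        Dim html As String = client.DownloadString("http://example.com/")

        ' Crudely extract absolute links with a regex.
        For Each m As Match In Regex.Matches(html, "href\s*=\s*""(http[^""]+)""", RegexOptions.IgnoreCase)
            Console.WriteLine(m.Groups(1).Value)
        Next
    End Sub
End Module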

View 3 Replies


Implementing Multithreaded Crawler Using The Single Thread Crawler Code

Feb 1, 2010

I would like to implement a multithreaded crawler using the single-thread crawler code I have now. Basically, I read the URLs from a text file, take each one, and crawl and parse it. I know the basics of creating a thread and assigning a procedure to it, but I am not sure how to implement the following: I need at least 3 threads, and I need to assign a URL to each thread from a list of URLs; each thread then needs to fetch and parse its page before adding the contents to a database.

[Code]...
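
One common pattern for this, sketched below with illustrative names (UrlQueue, CrawlWorker) rather than the original code: load the URLs into a shared queue guarded by a lock, then let a fixed pool of three threads pull from it until it is empty.

Imports System.IO
Imports System.Net
Imports System.Threading
Imports System.Collections.Generic

Module ThreadedCrawler
    Private ReadOnly UrlQueue As New Queue(Of String)
    Private ReadOnly QueueLock As New Object()

    Sub Main()
        ' Load the URLs from the text file into the shared queue.
        For Each line As String In File.ReadAllLines("urls.txt")
            UrlQueue.Enqueue(line)
        Next

        ' Start three workers and wait for them all to finish.
        Dim workers As New List(Of Thread)
        For n As Integer = 1 To 3
            Dim t As New Thread(AddressOf CrawlWorker)
            t.Start()
            workers.Add(t)
        Next
        For Each t As Thread In workers
            t.Join()
        Next
    End Sub

    Private Sub CrawlWorker()
        Do
            Dim url As String
            SyncLock QueueLock
                If UrlQueue.Count = 0 Then Exit Do ' queue drained: this worker is done
                url = UrlQueue.Dequeue()
            End SyncLock

            ' Fetch the page; parsing and the database insert would go here.
            Dim html As String = New WebClient().DownloadString(url)
            Console.WriteLine("Thread {0} fetched {1} chars from {2}", Thread.CurrentThread.ManagedThreadId, html.Length, url)
        Loop
    End Sub
End Module

Pulling from a shared queue, rather than pre-assigning URLs, keeps all three threads busy even when some pages take much longer to fetch than others.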

View 4 Replies

Create A Web Crawler?

Jan 18, 2010

How do I create a web crawler using VB.NET?
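
A hedged sketch of one way to start: a breadth-first loop with a visited set, a placeholder start URL, and a hard cap so the crawl cannot run away.

Imports System.Net
Imports System.Text.RegularExpressions
Imports System.Collections.Generic

Module SimpleCrawler
    Sub Main()
        Dim pending As New Queue(Of String)
        Dim visited As New HashSet(Of String)
        pending.Enqueue("http://example.com/") ' placeholder start page

        ' Crawl breadth-first, stopping after 100 pages as a safety cap.
        While pending.Count > 0 AndAlso visited.Count < 100
            Dim url As String = pending.Dequeue()
            If Not visited.Add(url) Then Continue While ' already crawled

            Try
                Dim html As String = New WebClient().DownloadString(url)
                ' Queue every absolute link found on the page.
                For Each m As Match In Regex.Matches(html, "href\s*=\s*""(http[^""]+)""", RegexOptions.IgnoreCase)
                    pending.Enqueue(m.Groups(1).Value)
                Next
            Catch ex As WebException
                Console.WriteLine("Failed: " & url)
            End Try
        End While
    End Sub
End Module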

View 5 Replies

Creating A Simple Web Crawler?

Oct 21, 2010

I'm attempting to do a school project dealing with a simple web crawler. I have a form with an embedded web browser control that loads a web page with all the available courses. I have a text box and a button designed for a user to search for a specific course by using the four-letter department abbreviation. The page I have loaded has all the department abbreviations as hyperlinks. Is it possible to search the page for the four-letter abbreviation specified by the user and, if the search finds a corresponding hyperlink, open it? Then I would use a loop to repeat the process of opening each class offered by that department and obtaining information such as course name, section number, and so forth.
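
Yes, the WebBrowser control exposes the loaded page through its Document property, so you can scan Document.Links for the abbreviation. A sketch of the button handler, with illustrative control names (WebBrowser1, txtDept, btnSearch) standing in for whatever the real form uses:

' Lives inside the form class that hosts the WebBrowser control.
Private Sub btnSearch_Click(sender As Object, e As EventArgs) Handles btnSearch.Click
    Dim dept As String = txtDept.Text.Trim().ToUpper() ' e.g. "MATH"
    If WebBrowser1.Document Is Nothing Then Return ' page not loaded yet

    ' Look for a hyperlink whose text is the four-letter abbreviation.
    For Each link As HtmlElement In WebBrowser1.Document.Links
        If link.InnerText IsNot Nothing AndAlso link.InnerText.Trim().ToUpper() = dept Then
            WebBrowser1.Navigate(link.GetAttribute("href")) ' open the department page
            Exit For
        End If
    Next
End Sub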

View 2 Replies

Implement Multithreaded Crawler - Add A Unique URL To Each Thread To Process

Feb 1, 2010

I would like to implement a multithreaded crawler using the single-thread crawler code I have now. Basically, I read the URLs from a text file, take each one, and crawl and parse it. I know the basics of creating a thread and assigning a procedure to it, but I am not sure how to implement the following:

I need at least 3 threads, and I need to assign a URL to each thread from a list of URLs; each thread then needs to fetch and parse its page before adding the contents to a database. [Code] Now the code may not make sense, but what I need to do is assign a unique URL to each thread to go and process.
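
A sketch of one way to hand each thread its own URL, assuming nothing about the original code: create each thread over a parameterized start method and pass the URL into Thread.Start, so each worker owns exactly one URL.

Imports System.Net
Imports System.Threading
Imports System.Collections.Generic

Module PerThreadCrawler
    Sub Main()
        ' Illustrative URL list; in the real program these come from the text file.
        Dim urls() As String = {"http://example.com/a", "http://example.com/b", "http://example.com/c"}

        Dim workers As New List(Of Thread)
        For Each url As String In urls
            ' AddressOf Crawl binds to ParameterizedThreadStart, so Start can carry the URL.
            Dim t As New Thread(AddressOf Crawl)
            t.Start(url)
            workers.Add(t)
        Next
        For Each t As Thread In workers
            t.Join() ' wait for every worker to finish
        Next
    End Sub

    Private Sub Crawl(state As Object)
        Dim url As String = CStr(state)
        Dim html As String = New WebClient().DownloadString(url)
        Console.WriteLine("Fetched {0} chars from {1}", html.Length, url)
        ' Parsing and the database insert would follow here.
    End Sub
End Module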

View 4 Replies

Website Crawler - Creating Recursive Function To Get All The Site Links

Aug 26, 2009

I'm trying to build a website crawler and I'm having a bit of a problem creating a recursive function to get all the site links. Can anyone provide a link to an example?
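
A hedged sketch of such a function: recurse from a start page, carry a visited set so the same URL is never fetched twice, and cap the depth so the recursion terminates (the start URL and depth are placeholders).

Imports System.Net
Imports System.Text.RegularExpressions
Imports System.Collections.Generic

Module RecursiveCrawler
    Private ReadOnly Visited As New HashSet(Of String)

    Sub Main()
        CollectLinks("http://example.com/", 2) ' placeholder start page, depth 2
        For Each url As String In Visited
            Console.WriteLine(url)
        Next
    End Sub

    Private Sub CollectLinks(url As String, depthLeft As Integer)
        ' Stop on repeats or when the depth budget is spent.
        If depthLeft < 0 OrElse Not Visited.Add(url) Then Return

        Dim html As String
        Try
            html = New WebClient().DownloadString(url)
        Catch ex As WebException
            Return ' skip pages that fail to load
        End Try

        ' Recurse into every absolute link found on this page.
        For Each m As Match In Regex.Matches(html, "href\s*=\s*""(http[^""]+)""", RegexOptions.IgnoreCase)
            CollectLinks(m.Groups(1).Value, depthLeft - 1)
        Next
    End Sub
End Module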

View 2 Replies

VS 2008 - Multithreaded Crawler - Each Time A New Thread Accesses One Of The Lists The Content Is Changed

Mar 18, 2010

I have written a multithreaded crawler, and the process is simply creating threads and having them access a list of URLs to crawl. They then access the URLs and parse the HTML content. All this seems to work fine. It is when I need to write to tables in a database that I experience issues. I have declared 2 ArrayLists that will contain the content each thread parses. The first ArrayList simply holds the RSS feed links, and the other ArrayList contains the different posts. I then use a For Each loop to iterate over one while sequentially incrementing an index into the other and writing to the database. My problem is that each time a new thread accesses one of the lists the content is changed, and this affects the iteration. I tried using nested loops, but that did not work, and this works fine using a single thread.

Here is my code:

SyncLock dlock
    For Each rsslink As String In finallinks ' iterate the links list...
        postlink = finalposts.Item(i)        ' ...while indexing the posts list in parallel

[CODE]...

finallinks and finalposts are the two ArrayLists. I did not include the rest of the code, which shows the threads working, but this is the essential part where my error occurs, which is basically here: postlink = finalposts.Item(i) followed by i = i + 1.

ERROR: Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index

I tried copying it to a new list but that doesn't work.
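
A hedged sketch of a fix, reusing the names from the post (dlock, finallinks, finalposts) with the writer side and WriteToDatabase as illustrative placeholders: every thread that touches the lists must take the same lock, both lists should be updated together so they stay the same length, and the database loop should iterate over a snapshot taken under the lock instead of the live lists.

' Writer side, inside each crawler thread: add to both lists under one lock
' so they can never get out of step.
SyncLock dlock
    finallinks.Add(rsslink)
    finalposts.Add(post)
End SyncLock

' Reader side, the database loop: snapshot both lists under the lock,
' then iterate the copies so other threads cannot change them mid-loop.
Dim linksCopy As ArrayList
Dim postsCopy As ArrayList
SyncLock dlock
    linksCopy = CType(finallinks.Clone(), ArrayList)
    postsCopy = CType(finalposts.Clone(), ArrayList)
End SyncLock

For i As Integer = 0 To Math.Min(linksCopy.Count, postsCopy.Count) - 1
    Dim rsslink As String = CStr(linksCopy(i))
    Dim postlink As String = CStr(postsCopy(i))
    ' WriteToDatabase(rsslink, postlink) ' illustrative placeholder
Next

Longer term, a single list of (link, post) pairs would remove the fragile parallel index entirely.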

View 9 Replies

Copyright 2005-15 www.BigResource.com. All rights reserved.