ASP.NET - Crawler In C# Or In .NET?
Jul 24, 2009
Can anyone share a link to working crawler sample code written in either C# or VB.NET?
Create a web crawler using VB.NET?
I'm attempting a school project involving a simple web crawler. I have a form with an embedded web browser control that loads a web page with all the available courses. I have a text box and a button designed to let a user search for a specific course by its four-letter department abbreviation. The page I have loaded shows all the department abbreviations as hyperlinks. Is it possible to search the page for the four-letter abbreviation the user specified and, if the search finds a corresponding hyperlink, open it? I would then use a loop to repeat the process, opening each class offered by that department and obtaining information such as course name, section number and so forth.
I would like to implement a multithreaded crawler using the single-threaded crawler code I have now. Basically, I read the URLs from a text file, then take each one and crawl and parse it. I know the basics of creating a thread and assigning work to it, but I'm not sure how to implement the following: I need at least 3 threads, each assigned a URL from a list of URLs, and each thread needs to fetch its URL and parse it before adding the contents to a database.
[Code]
The code may not make sense, but what I need to do is give each thread a unique URL to process.
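One possible way to give each thread a unique URL is to put all the URLs into a shared thread-safe queue and let every worker pull the next unclaimed one. Below is a minimal C# sketch of that idea, assuming .NET 4 or later (for System.Collections.Concurrent). The names CrawlerWorkers, Run and ProcessUrl are hypothetical; ProcessUrl only stands in for the existing single-threaded fetch/parse/store code, and the example.com URLs are placeholders for the lines read from the text file.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class CrawlerWorkers
{
    // Hypothetical stand-in for the existing fetch + parse + database code.
    public static string ProcessUrl(string url)
    {
        return "processed " + url;
    }

    // Starts workerCount threads; each one keeps taking the next unclaimed URL
    // until the queue is empty.
    public static List<string> Run(IEnumerable<string> urls, int workerCount)
    {
        var pending = new ConcurrentQueue<string>(urls);
        var results = new ConcurrentBag<string>();
        var threads = new List<Thread>();

        for (int n = 0; n < workerCount; n++)
        {
            var worker = new Thread(() =>
            {
                string url;
                // TryDequeue is atomic, so no two threads receive the same URL.
                while (pending.TryDequeue(out url))
                {
                    results.Add(ProcessUrl(url));
                }
            });
            threads.Add(worker);
            worker.Start();
        }

        // Wait for all workers to finish before returning.
        foreach (var thread in threads) thread.Join();
        return new List<string>(results);
    }

    public static void Main()
    {
        // Placeholder URLs; the real program would read these from the text file.
        var urls = new[]
        {
            "http://example.com/a", "http://example.com/b",
            "http://example.com/c", "http://example.com/d"
        };
        foreach (var line in Run(urls, 3)) Console.WriteLine(line);
    }
}
```

With this layout the number of threads and the number of URLs are independent, so 3 threads can work through a list of any length. The order of the results is not guaranteed.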
I'm trying to build a website crawler and I'm having a bit of a problem creating a recursive function to get all the links on a site. Can anyone provide a link to an example?
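For the recursive part, the usual trick is to keep a set of already visited URLs so the recursion stops when pages link back to each other. Here is a rough C# sketch under some assumptions: SiteLinkCollector, Crawl and the fetchHtml delegate are hypothetical names, links are pulled out with a simple regular expression (a real crawler would probably want a proper HTML parser), and the pages are served from an in-memory dictionary so the example runs without a network.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SiteLinkCollector
{
    // Naive href matcher, good enough for a sketch only.
    private static readonly Regex HrefPattern =
        new Regex("href\\s*=\\s*[\"']([^\"']+)[\"']", RegexOptions.IgnoreCase);

    // Visits 'page', then recurses into every same-host link found on it.
    // 'visited' ends up holding every URL the crawl attempted.
    public static void Crawl(Uri page, Func<Uri, string> fetchHtml,
                             HashSet<string> visited, int depthLeft)
    {
        // HashSet.Add returns false for a URL seen before, which breaks cycles.
        if (depthLeft < 0 || !visited.Add(page.AbsoluteUri)) return;

        string html = fetchHtml(page);
        if (html == null) return; // page could not be loaded

        foreach (Match m in HrefPattern.Matches(html))
        {
            Uri link;
            // Resolve relative links such as "/a" or "b.html" against the page.
            if (!Uri.TryCreate(page, m.Groups[1].Value, out link)) continue;
            if (link.Host != page.Host) continue; // stay on the same site
            // Drop any "#fragment" so the same page is not crawled twice.
            link = new Uri(link.GetLeftPart(UriPartial.Query));
            Crawl(link, fetchHtml, visited, depthLeft - 1);
        }
    }

    public static void Main()
    {
        // Hypothetical three-page site used instead of real HTTP requests.
        var site = new Dictionary<string, string>
        {
            { "http://example.com/", "<a href=\"/a\">A</a> <a href='b.html'>B</a>" },
            { "http://example.com/a", "<a href=\"/\">home</a> <a href=\"http://other.example/x\">ext</a>" },
            { "http://example.com/b.html", "" }
        };
        Func<Uri, string> fetch = u =>
        {
            string html;
            return site.TryGetValue(u.AbsoluteUri, out html) ? html : null;
        };

        var visited = new HashSet<string>();
        Crawl(new Uri("http://example.com/"), fetch, visited, 5);
        foreach (var url in visited) Console.WriteLine(url);
    }
}
```

To crawl a real site, fetchHtml could wrap something like WebClient.DownloadString with error handling. The depth limit is there as a safety net against very deep sites.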
I have written a multithreaded crawler. The process simply creates threads and has them access a list of URLs to crawl; they then fetch the URLs and parse the HTML content. All this seems to work fine. The issues start when I need to write to tables in a database. I have declared 2 ArrayLists that will contain the content each thread parses. The first ArrayList holds the RSS feed links and the other holds the different posts. I then use a For Each loop to iterate over one while sequentially incrementing an index into the other, writing to the database as I go. My problem is that each time a new thread accesses one of the lists, the content changes and this affects the iteration. I tried using nested loops, but that did not work either. The same code works fine with a single thread.
Here is my code:
SyncLock dlock
For Each rsslink As String In finallinks
postlink = finalposts.Item(i)
[CODE]...
finallinks and finalposts are the two ArrayLists. I did not include the rest of the code, which shows the threads working, but this is the essential part where my error occurs, namely at these statements: postlink = finalposts.Item(i) and i = i + 1
ERROR: Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index
I tried copying it to a new list, but that doesn't work either.
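Since the full code is not shown, the exact cause can't be confirmed, but two parallel lists indexed by a shared counter can easily get out of step when several threads add to them. One way around that is to store each RSS link together with its post as a single item in one list, and to copy the list under the same lock before looping over it. The C# sketch below illustrates this; FeedPost, PairedResults, Add and TakeSnapshot are hypothetical names, and the database write is left as a placeholder. C#'s lock statement plays the role of SyncLock in VB.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// One RSS link and one post kept together, so they cannot drift apart.
public sealed class FeedPost
{
    public readonly string RssLink;
    public readonly string PostLink;

    public FeedPost(string rssLink, string postLink)
    {
        RssLink = rssLink;
        PostLink = postLink;
    }
}

public static class PairedResults
{
    private static readonly object Gate = new object();
    private static readonly List<FeedPost> Results = new List<FeedPost>();

    // Called by the crawler threads whenever they have parsed a post.
    public static void Add(string rssLink, string postLink)
    {
        lock (Gate) { Results.Add(new FeedPost(rssLink, postLink)); }
    }

    // Copies and clears the shared list while holding the lock. The caller
    // then iterates over its own private copy, which no other thread can change.
    public static List<FeedPost> TakeSnapshot()
    {
        lock (Gate)
        {
            var copy = new List<FeedPost>(Results);
            Results.Clear();
            return copy;
        }
    }

    public static void Main()
    {
        var threads = new List<Thread>();
        for (int n = 0; n < 3; n++)
        {
            int id = n; // per-thread copy of the loop variable
            var worker = new Thread(() =>
            {
                for (int k = 0; k < 100; k++)
                    Add("feed" + id, "post" + id + "-" + k);
            });
            threads.Add(worker);
            worker.Start();
        }
        foreach (var thread in threads) thread.Join();

        List<FeedPost> batch = TakeSnapshot();
        foreach (FeedPost item in batch)
        {
            // Placeholder: write item.RssLink and item.PostLink to the database here.
        }
        Console.WriteLine(batch.Count);
    }
}
```

Because every item carries both values, there is no second list and no index i to keep in sync, so the out-of-range condition cannot arise from the pairing itself.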