VS 2008 - Multithreaded Crawler - Each Time A New Thread Accesses One Of The Lists The Content Is Changed
Mar 18, 2010
I have written a multithreaded crawler. The process simply creates threads and has them work through a list of URLs to crawl; each thread fetches its URLs and parses the HTML content. All of this seems to work fine. The issues start when I need to write to tables in a database. I have two declared ArrayLists that hold the content each thread parses: the first contains the RSS feed links and the second contains the posts. I then use a For Each loop to iterate over one while incrementing an index into the other, writing to the database as I go. My problem is that each time a new thread accesses one of the lists, the content changes, and this affects the iteration. I tried nested loops earlier, but that did not work. The same code works fine with a single thread.
Here is my code:
SyncLock dlock
For Each rsslink As String In finallinks
postlink = finalposts.Item(i)
[CODE]...
finallinks and finalposts are the two ArrayLists. I did not include the rest of the code, which shows the threads working, but this is the essential part where my error occurs, namely at postlink = finalposts.Item(i) followed by i = i + 1.
ERROR: Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index
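One possible way to avoid the mismatch, sketched under assumptions: instead of two parallel ArrayLists that can grow at different moments, each thread adds the link and the post together as one entry, and the database writer works on a copy taken under the same lock. The names results, resultsLock, AddResult, and TakeSnapshot are hypothetical and do not come from the original code.

```vbnet
Imports System
Imports System.Collections.Generic

Module PairedResultsSketch
    ' Hypothetical shared container: one entry holds both values,
    ' so the link and its post can never get out of step.
    Private ReadOnly results As New List(Of KeyValuePair(Of String, String))()
    Private ReadOnly resultsLock As New Object()

    ' Called by a crawler thread each time it has parsed one post.
    Sub AddResult(ByVal rssLink As String, ByVal postLink As String)
        SyncLock resultsLock
            results.Add(New KeyValuePair(Of String, String)(rssLink, postLink))
        End SyncLock
    End Sub

    ' Copies and clears the shared list under the lock, so the caller
    ' can iterate the copy while other threads keep adding entries.
    Function TakeSnapshot() As List(Of KeyValuePair(Of String, String))
        SyncLock resultsLock
            Dim copy As New List(Of KeyValuePair(Of String, String))(results)
            results.Clear()
            Return copy
        End SyncLock
    End Function

    Sub Main()
        AddResult("http://example.com/feed", "http://example.com/post/1")
        For Each pair As KeyValuePair(Of String, String) In TakeSnapshot()
            ' The database write would go here; printing stands in for it.
            Console.WriteLine(pair.Key & " -> " & pair.Value)
        Next
    End Sub
End Module
```

The lock has to cover every read and write of the shared list, not only the loop, otherwise another thread can still change the collection while it is being iterated.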
I would like to implement a multithreaded crawler using the single-threaded crawler code I have now. Basically, I read the URLs from a text file, then take each one and crawl and parse it. I know the basics of creating a thread and assigning a procedure to it, but I am not sure how to implement the following:
I need at least 3 threads, and I need to assign a URL to each thread from a list of URLs; each thread then needs to fetch and parse its URL before adding the contents to a database. [Code] The code may not make sense, but what I need to do is give each thread a unique URL to process.
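A minimal sketch of one way to hand each thread a unique URL, assuming the URLs have already been read into a shared queue: every worker takes the next URL under a lock, so no two threads can receive the same one. The names urlQueue, queueLock, TryTakeUrl, and Worker are hypothetical, and the fetch, parse, and database steps are reduced to a placeholder.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading

Module UrlWorkerSketch
    ' Hypothetical shared work queue and the lock that guards it.
    Private ReadOnly urlQueue As New Queue(Of String)()
    Private ReadOnly queueLock As New Object()

    ' Returns the next URL, or Nothing once the queue is empty.
    Private Function TryTakeUrl() As String
        SyncLock queueLock
            If urlQueue.Count = 0 Then Return Nothing
            Return urlQueue.Dequeue()
        End SyncLock
    End Function

    ' Each thread keeps taking URLs until none are left.
    Private Sub Worker()
        Dim url As String = TryTakeUrl()
        While url IsNot Nothing
            ' Placeholder for fetching, parsing, and saving the page.
            Console.WriteLine("Thread " & Thread.CurrentThread.ManagedThreadId.ToString() & " processing " & url)
            url = TryTakeUrl()
        End While
    End Sub

    Sub Main()
        ' Stand-in for the URLs read from the text file.
        For Each u As String In New String() {"http://example.com/a", "http://example.com/b", "http://example.com/c", "http://example.com/d"}
            urlQueue.Enqueue(u)
        Next

        Dim workers As New List(Of Thread)()
        For n As Integer = 1 To 3
            Dim t As New Thread(AddressOf Worker)
            t.Start()
            workers.Add(t)
        Next

        ' Wait for all three workers to finish.
        For Each t As Thread In workers
            t.Join()
        Next
    End Sub
End Module
```

With this shape the number of threads and the number of URLs are independent, so the same code works for 3 threads or 30.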
My routine to check for updates is run as a separate process. Exiting the application is required to update, so when an update is found, a dialog asks the user whether they want to exit now. If they do, the code (from the update thread) calls Application.Exit(). However, if the FormClosed event of any form that needs to be closed needs to access its controls, an invalid cross-thread operation is detected (which sounds pretty logical).
If urlQueue.Count > 0 Then
    Debug.Assert(currentFarm.isBusy = False)
    Debug.Assert(currentFarm.WebClient.IsBusy = False)
[CODE]...
The commented code won't work. That's because by the time the thread starts, the variable currentFarm has changed. So I created a new function and cached currentFarm in a function parameter. Now there is a copy of the currentFarm address on the stack (an object variable is always a reference), and that address is the one passed to the System.Threading.Thread constructor.
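Another way to give each thread its own value, sketched here as an alternative rather than as the poster's code: pass the value as the argument of Thread.Start, which hands it to a ParameterizedThreadStart procedure. The farm names and ProcessFarm are hypothetical stand-ins for the currentFarm objects.

```vbnet
Imports System
Imports System.Threading

Module ThreadStateSketch
    ' Receives the value that was passed to Thread.Start for this thread.
    Private Sub ProcessFarm(ByVal state As Object)
        Dim farmName As String = CStr(state)
        Console.WriteLine("Working on " & farmName)
    End Sub

    Sub Main()
        ' Hypothetical stand-ins for the objects each thread should handle.
        Dim farms() As String = {"farm-1", "farm-2", "farm-3"}
        Dim threads(farms.Length - 1) As Thread

        For i As Integer = 0 To farms.Length - 1
            threads(i) = New Thread(New ParameterizedThreadStart(AddressOf ProcessFarm))
            ' The argument is captured now, so later changes to the loop
            ' variable cannot affect what this thread sees.
            threads(i).Start(farms(i))
        Next

        For Each t As Thread In threads
            t.Join()
        Next
    End Sub
End Module
```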
OK, so after a week of aggravation, hair pulling, and disaster trying to figure out what should be simple tasks, I'm beaten down! I wanted to create an application that would manage the user's own collection of video games (lame, I know). You add your games to a ListView as items, each with an image and some subitems, and you can view details, search your games using a textbox, and so on. I hope you get the idea.
What I cannot figure out, or find by combing the internet, is how to save all the new items the user adds, including the item, all of its subitems, and the picture that's associated with the item. Instead of posting the code, I posted the application source code itself here: Source Code. I also started a project here: Codeplex
Later I was thinking of adding a "wish list" feature where you could add the games you want to get later and tag each with a release date, so that when the date comes up, it will let you know, maybe through some kind of alarm or a textbox somewhere. If you guys want to make it better or fix it, go ahead. I just want this app to save upon exit; pointing me in the right direction would be very much appreciated!
I want to detect when the user has changed the content of a Textbox.
The following does not work for me, because: When the user navigates through the database (using my navigation buttons), the textbox is also changed, but NOT by the user making input. I would only like to know when the user EDITED the Textbox.
Private Sub mskStudentNumber_TextChanged(ByVal sender As Object, ByVal e As System.EventArgs) Handles mskStudentNumber.TextChanged
    If Not Movement Then ' If not navigation
How can I allow a ProgressBar to be changed from a separate thread? My code keeps saying that I cannot access it because it was created on a separate thread:
[Code]...
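One common approach, shown as a sketch and not as the poster's elided code: let a BackgroundWorker do the work and report progress, since its ProgressChanged event is raised on the thread that created the form, where touching the ProgressBar is allowed. ProgressForm, bar, and worker are hypothetical names, and Thread.Sleep stands in for the real work.

```vbnet
Imports System
Imports System.ComponentModel
Imports System.Threading
Imports System.Windows.Forms

Public Class ProgressForm
    Inherits Form

    Private ReadOnly bar As New ProgressBar()
    Private WithEvents worker As New BackgroundWorker()

    Public Sub New()
        bar.Dock = DockStyle.Top
        Controls.Add(bar)
        worker.WorkerReportsProgress = True
    End Sub

    ' Start the background work once the form is visible.
    Protected Overrides Sub OnShown(ByVal e As EventArgs)
        MyBase.OnShown(e)
        worker.RunWorkerAsync()
    End Sub

    ' Runs on a thread-pool thread: no control access here.
    Private Sub worker_DoWork(ByVal sender As Object, ByVal e As DoWorkEventArgs) Handles worker.DoWork
        For percent As Integer = 1 To 100
            Thread.Sleep(20) ' stand-in for one unit of real work
            worker.ReportProgress(percent)
        Next
    End Sub

    ' Runs on the UI thread, so updating the control is safe.
    Private Sub worker_ProgressChanged(ByVal sender As Object, ByVal e As ProgressChangedEventArgs) Handles worker.ProgressChanged
        bar.Value = e.ProgressPercentage
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New ProgressForm())
    End Sub
End Class
```

If the work must stay on a manually created Thread, the equivalent is to check the control's InvokeRequired property and call Invoke with a delegate that sets the value.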
I have this code that checks whether a thread or page has changed, but it always says it has changed, because it does: I'm currently checking the whole HTML page, and I don't know a better way. Here's my code.
Public Class Form1
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
The code has gone bad. It is in debug mode and jumping around everywhere. Error = "The process or thread has changed since last step."
Here is the code. I just create the file "start.csv" to initialize the shell command. This is very unstable for some reason. When I run line by line in debug mode it starts jumping around irrationally. It gives me versions of the above error.
Option Strict On
Option Explicit On
Imports System.IO
Module Module1
    Public WithEvents Tmr As New Timers.Timer
[Code] .....
I changed my Username on my computer from i.e. "xxxxxx x xxxxx" to "Dennis"
During a compile, VB 2008 gave me the following error: Error reading icon 'C:\Users\xxxxxx x xxxxx\AppData\Roaming\Microsoft\VBExpress\9.0\VSProjectApplication.ico' -- The system cannot find the path specified.
The file "VSProjectApplication.ico" is in 'C:\Users\Dennis\AppData\Roaming\Microsoft\VBExpress\9.0'
Somewhere in the VB2008 "configuration" files I believe that I need to manually change the old path to the new path.
I found the old path in the ".suo" file. The ".suo" file is not a text file. How can I edit the .suo file, save it so that it will work?
This was a problem I already had in VB6. It's hard to multithread in VB6, so I migrated my code to VB.NET. How can I multithread these two loops so that they run at the same time? At the moment, when I start my second loop, my first loop stops executing, and it only starts again when the second loop has finished.
[code]...
I have 2 GSM modems connected to USB ports. This code sends an SMS to all the contacts in my ListBox.
Is there a way to check if the date time picker has been changed? I have a date time picker set to allow a person to put in their date of birth and I have it set to no greater date than 12/1/1992, but I need to check that the user has actually selected a date before the form is submitted. Can it be checked?
Is there an event that I can handle that is fired when the computers system time is changed?
My program timestamps things, and I want to make sure the time is correct. So when the program starts, I will run a query (SELECT GETDATE()) from my SQL server, and record the difference, in seconds, between the SQL server time, and the time returned from NOW(). Then when I timestamp things, I will just adjust the time. I cannot run a query every time. But, if someone changes the clock on the computer, that would throw the timestamp off. So I need a way to know when this has happened, so I can recalculate the difference.
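A sketch of one way to be told about clock changes: the Microsoft.Win32.SystemEvents class exposes a TimeChanged event. This example assumes a console host for brevity; OnTimeChanged is a hypothetical handler, and the recalculation of the SQL Server offset is left as a placeholder.

```vbnet
Imports System
Imports Microsoft.Win32

Module ClockChangeSketch
    ' Hypothetical handler: the place to re-query the server and
    ' recompute the stored offset between server time and local time.
    Private Sub OnTimeChanged(ByVal sender As Object, ByVal e As EventArgs)
        Console.WriteLine("System time changed; recalculate the offset here.")
    End Sub

    Sub Main()
        AddHandler SystemEvents.TimeChanged, AddressOf OnTimeChanged

        Console.WriteLine("Change the system clock, then press Enter to quit.")
        Console.ReadLine()

        ' SystemEvents are Shared events, so detach the handler when done
        ' to avoid keeping the subscriber alive.
        RemoveHandler SystemEvents.TimeChanged, AddressOf OnTimeChanged
    End Sub
End Module
```

The handler may be raised on a different thread from the one that subscribed, so any shared offset variable it updates should be protected accordingly.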
I notice that every time the WinForm first loads, it triggers the SelectedIndexChanged event of 2 comboboxes (one combobox on each tab).
I am trying to find a way to stop the SelectedIndexChanged event from firing when the WinForm first loads, so that it only fires after the specific tab has been selected.
I'm attempting to do a school project dealing with a simple web crawler. I have a form with an embedded WebBrowser control that loads a web page with all the available courses. I have a text box and a button designed for a user to search for a specific course by using the four-letter department abbreviation. The page that I have loaded has all the department abbreviations as hyperlinks. Is it possible to search the page using the four-letter abbreviation specified by the user and, if the search finds a corresponding hyperlink, open it? Then I would use a loop to repeat the process of opening each class offered by that department and obtaining information such as course name, section number, and so forth.
I have two lists. One is a list of names; the other is a list of how many times each name in the first list is found in the database.
So
Nick John Jim Jack
is the first list, and
10 13 13 2
is the second list: Nick had 10, John 13, and so on. I want to sort the second list from large to small, but keep each index in the first list linked to the correct number of calls. I do it this way so I can:
For x = 0 To num_of_names
    entry = lst_name(x) & "-" & lst_count(x)
Next x
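A minimal sketch of one way to keep the two lists linked while sorting, assuming both are held as arrays (the original lst_name and lst_count may be a different collection type): Array.Sort has an overload that sorts a key array and reorders a second array to match. It sorts ascending, so both arrays are reversed afterwards to get large to small.

```vbnet
Imports System

Module ParallelSortSketch
    Sub Main()
        ' Sample data taken from the question.
        Dim names() As String = {"Nick", "John", "Jim", "Jack"}
        Dim counts() As Integer = {10, 13, 13, 2}

        ' Sorts counts ascending and moves each name along with its count.
        Array.Sort(counts, names)

        ' Reverse both arrays the same way to get descending order.
        Array.Reverse(counts)
        Array.Reverse(names)

        For x As Integer = 0 To names.Length - 1
            Console.WriteLine(names(x) & "-" & counts(x).ToString())
        Next
    End Sub
End Module
```

Array.Sort is not a stable sort, so names with equal counts (John and Jim here) may come out in either order.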
I am redesigning an application that I have previously written which accesses an SQL database, pulls records, creates an object hierarchy based on a bunch of custom classes that I created, and displays them in a TreeView control. When the user selects a node, a tabbed page displays the information about the object underlying the TreeNode selected.
I have since begun experimenting with data binding to my objects, as I have recently redesigned the objects to communicate directly with the DataRow that each object is based on, instead of storing the values in a private variable, thus:
[Code]...
I think it is preferable to let the data binding management objects do the heavy lifting of getting the values, displaying them in the controls, and writing new values back to the objects (and thus the DataSet) as needed. How can I accomplish this? I have added the objects as data sources in the Data Sources window and added the objects' elements to the tabbed pages. How do I bind those controls to the item that is specified by selecting the TreeNode?
I'm trying to terminate a Threading.Thread using Thread.Abort. The thread runs a download connection, so it is usually in the middle of Socket.Receive, Socket.Send, or Socket.Connect when it is aborted. I just want to terminate the thread no matter what. Thread.Abort raises an MDA exception, so I ticked it off in the debug Exceptions menu. Now it doesn't raise an exception, but the thread simply won't terminate. My program won't close unless I press Stop in the debugger, and I can't pause downloads because I can't terminate the thread. Why isn't this code working?
Dim vT As Threading.Thread
For Each vT In clsDownloader.DownloadThreads
    If Not vT Is Nothing Then vT.Abort()
[code].....
As for the exception warning that aborting a thread from another thread is dangerous: how is it possible to send a message from the main thread to another thread telling it to abort itself? As far as I know, the only way to abort one thread from another is to just kill it (because I don't know of any way for one thread to communicate with another).
I'm trying to build a website crawler, and I'm having a bit of a problem creating a recursive function to get all of a site's links. Can anyone provide a link to an example?
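One way for a main thread to ask a worker to stop itself, as a sketch under assumptions: both share a ManualResetEvent, the main thread sets it, and the worker checks it between chunks of work. stopRequested and DownloadLoop are hypothetical names, and the download is simulated with a message.

```vbnet
Imports System
Imports System.Threading

Module CooperativeStopSketch
    ' Shared signal: unset while running, set when a stop is requested.
    Private ReadOnly stopRequested As New ManualResetEvent(False)

    Private Sub DownloadLoop()
        ' WaitOne returns True as soon as the signal is set; otherwise it
        ' returns False after 100 ms and the loop does another chunk.
        While Not stopRequested.WaitOne(100, False)
            Console.WriteLine("Downloading a chunk...")
        End While
        Console.WriteLine("Worker exiting cleanly.")
    End Sub

    Sub Main()
        Dim worker As New Thread(AddressOf DownloadLoop)
        worker.Start()

        Thread.Sleep(350)      ' let the worker run briefly
        stopRequested.Set()    ' ask the worker to stop
        worker.Join()          ' wait until it has actually finished
    End Sub
End Module
```

A worker that is blocked inside a socket call will not see the signal until that call returns, so in practice this is usually combined with socket timeouts or with closing the socket from the main thread, which makes the blocked call fail and return.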
I'm firing a thread to unzip a file. I need some way of indicating to the parent that the unzip thread has ended. I've tried raising an event in the unzip thread handled by a sub in the parent, but this runs the sub on the unzip thread. With ApartmentState set to STA, so that I could use a save dialog, this was OK until I displayed a message box, which was hidden behind the main UI (and whatever I did, I could not get it to the front).
I have a thread that keeps the application asleep for some time after the application processes some data.
[Code]....
Currently, the application hangs if I press the close button while Thread.Sleep() is running. How can I make the close button close the application even while Thread.Sleep() is running?
I have a VB.NET solution, just upgraded from 3.5 to 4.0. One of the classes has a private field:
Private _Projection As ICalculatedPath
At runtime, any time the class containing that field accesses that field, I get a FieldAccessException. The first time that field happens to get accessed is a null check in a method, and one of the things I randomly tried is changing the above line to:
Private _Projection As ICalculatedPath = Nothing
When I do this, I get the FieldAccessException on that line, saying that the class's .ctor() cannot access that field. I've also tried making the field Protected and Public, cleaning and rebuilding the solution, restarting VS, targeting x86 and .NET 4.0 specifically on every project in the solution, and other nonsensical measures to get rid of this exception, but to no avail. This code worked fine before the upgrade, of course.