VS 2010: Search for Strings Within Large Text Blocks (e.g. HTML Source)?
May 3, 2012
I'm developing an app for WP7 and Win7 that will get info extracted directly from particular websites. The app will download the HTML source and parse through it to find the required strings. The strings may not be enclosed in tags. Note that multiple instances of each string need to be found. I've tried a few very rudimentary approaches, and although they work, they are extremely slow.
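If the current approach calls `String.IndexOf(String)` in a loop, part of the cost may be that this overload does a culture-sensitive comparison by default. A minimal sketch (module and function names are hypothetical) that collects every non-overlapping match with an ordinal comparison, continuing each search just past the previous match:

```vbnet
Imports System
Imports System.Collections.Generic

Public Module TextSearch
    ' Returns the start index of every non-overlapping occurrence of value in text.
    ' StringComparison.Ordinal skips culture-aware rules, which is typically faster on large strings.
    Public Function FindAllIndexes(text As String, value As String) As List(Of Integer)
        If String.IsNullOrEmpty(value) Then Throw New ArgumentException("value must not be empty", "value")
        Dim indexes As New List(Of Integer)()
        Dim position As Integer = text.IndexOf(value, 0, StringComparison.Ordinal)
        Do While position >= 0
            indexes.Add(position)
            ' Resume searching right after the current match.
            position = text.IndexOf(value, position + value.Length, StringComparison.Ordinal)
        Loop
        Return indexes
    End Function
End Module
```

The returned positions can then be passed to `Substring` to pull out the text around each match.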
I am trying to parse a very large text file for certain strings. The text file is part of the level-making software for an old game I play. It basically contains all the information the level designer software needs, but the only important bit is the 'texture information'. What I'm trying to create is a little program that parses the text files and shows the user a list of every texture in that file. The problem is that the strings denoting textures are not easy to find, and I can't think of any sensible and fast way to get them...
I'm trying to set it up so there's a big list of items, for example:
item1 item2 item3
[Code]....
I was thinking of a ListBox, but I'm not too sure how to use one. I'd like the user to be able to scroll down through the list OR use a search. How could I search the list? (I'd use a TextBox for the search term and a Button to run the search.)
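A ListBox already has `FindString`, which returns the index of the first item that starts with the given text (or `ListBox.NoMatches`). If the search should match anywhere inside an item, a small helper like the hypothetical `FindItemIndex` below works on any list of strings:

```vbnet
Imports System
Imports System.Collections.Generic

Public Module ItemSearch
    ' Returns the index of the first item at or after startIndex that contains searchText
    ' (case-insensitive), or -1 if no item does.
    Public Function FindItemIndex(items As IList(Of String), searchText As String, startIndex As Integer) As Integer
        For i As Integer = Math.Max(startIndex, 0) To items.Count - 1
            If items(i).IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0 Then Return i
        Next
        Return -1
    End Function
End Module
```

In the search Button's Click handler, something like `Dim i As Integer = FindItemIndex(ListBox1.Items.Cast(Of String)().ToList(), TextBox1.Text, ListBox1.SelectedIndex + 1)` followed by `If i >= 0 Then ListBox1.SelectedIndex = i` would select the next match. This needs `Imports System.Linq`, and the control names are assumptions.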
My text file is a settings file, and I know I will be adding more to it in the future. Right now it reads "autoplay=false;mintray=false;". I am currently using probably the most inefficient way to check both of those settings:
If contents = "autoplay=false;mintray=false;" Then
    ' ...
ElseIf contents = "autoplay=false;mintray=true;" Then
    ' ...
ElseIf contents = "autoplay=true;mintray=false;" Then
    ' ...
ElseIf contents = "autoplay=true;mintray=true;" Then
    ' ...
End If
How can I read the file, find each word from the equals sign to the semicolon, and store each of them in a variable identified by the text before the equals sign? Something like this:
Also, how can I edit the settings in the text file without having to overwrite everything every time I save to it? For instance, finding mintray in the text file and changing it to true, instead of overwriting the file with "autoplay=false;mintray=true;".
This is my current reading code:
Dim fs As New FileStream("C:\myfile.txt", FileMode.Open, FileAccess.Read)
Dim d As New StreamReader(fs)
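Rather than comparing the whole line against every combination, the contents can be split on ";" and then on the first "=", giving a dictionary keyed by setting name. For saving, a plain text file can't have one value swapped in place when its length changes (for example "false" to "true"), so the usual approach for a small settings file is to read it, change the one value, and write it all back. The sketch below assumes the "key=value;" format shown above; `ParseSettings`, `UpdateSetting` and `WriteSetting` are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Public Module SettingsFile
    ' Parses "key=value;key=value;" into a dictionary keyed by the text before "=".
    Public Function ParseSettings(contents As String) As Dictionary(Of String, String)
        Dim settings As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        For Each pair As String In contents.Split(New Char() {";"c}, StringSplitOptions.RemoveEmptyEntries)
            Dim equalsAt As Integer = pair.IndexOf("="c)
            If equalsAt <= 0 Then Continue For ' skip malformed entries with no key
            settings(pair.Substring(0, equalsAt).Trim()) = pair.Substring(equalsAt + 1).Trim()
        Next
        Return settings
    End Function

    ' Returns contents with one key's value replaced, or with the pair appended if the key is missing.
    Public Function UpdateSetting(contents As String, key As String, newValue As String) As String
        ' (^|;) anchors the key so that "tray=" can never match inside "mintray=".
        Dim pattern As String = "(^|;)" & Regex.Escape(key) & "=[^;]*"
        If Regex.IsMatch(contents, pattern) Then
            ' A MatchEvaluator keeps any "$" in newValue from being read as a substitution.
            Return Regex.Replace(contents, pattern, Function(m As Match) m.Groups(1).Value & key & "=" & newValue)
        End If
        If contents.Length > 0 AndAlso Not contents.EndsWith(";", StringComparison.Ordinal) Then contents &= ";"
        Return contents & key & "=" & newValue & ";"
    End Function

    ' Reads the whole (small) file, changes one setting and writes the result back.
    Public Sub WriteSetting(path As String, key As String, newValue As String)
        Dim contents As String = If(File.Exists(path), File.ReadAllText(path), "")
        File.WriteAllText(path, UpdateSetting(contents, key, newValue))
    End Sub
End Module
```

Reading a value then looks like `Dim autoplay As Boolean = Boolean.Parse(ParseSettings(File.ReadAllText(path))("autoplay"))`, and saving one like `WriteSetting(path, "mintray", "true")`.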
I am working on a program that connects to a site using the HttpWebRequest class. How do I get the contents of an HTML tag using HttpWebRequest? The tag whose data I want to grab from the page and put into the ListView is:
<p id='myid'></p><p id='myid'>The name of the data</p>
Here's the full code:
Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
    'Address of URL
    Dim URL As String = "http://www.mysite.com/"
    Dim request As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)
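One way to do it, assuming the target paragraphs always look like `<p id='myid'>...</p>` with `id` as the first attribute: read the response into a string with a StreamReader, then use a regular expression to pull out each paragraph's text. `DownloadHtml` and `GetParagraphTexts` are hypothetical names, and a regex is only a reasonable tool here because the markup is this predictable.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

Public Module PageScraper
    ' Downloads the page with HttpWebRequest and returns its HTML as a string.
    Public Function DownloadHtml(url As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        Using response As WebResponse = request.GetResponse()
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    ' Returns the text of every non-empty <p id='...'> element with the given id.
    Public Function GetParagraphTexts(html As String, id As String) As List(Of String)
        ' Lazy (.*?) stops at the first </p>, so each paragraph is matched separately.
        Dim pattern As String = "<p\s+id\s*=\s*['""]" & Regex.Escape(id) & "['""]\s*>(.*?)</p>"
        Dim results As New List(Of String)()
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
            Dim text As String = m.Groups(1).Value.Trim()
            If text.Length > 0 Then results.Add(text)
        Next
        Return results
    End Function
End Module
```

In Form1_Load, each string returned by `GetParagraphTexts(DownloadHtml(URL), "myid")` could then be added with `ListView1.Items.Add(text)`.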
Couldn't think of a better title. (Background on the problem and me:) Okay, so, this is my first question/post here, so hi. Now that that's done with, here is the information pertinent to my problem. I'm fairly new to VB (and to programming as well, aside from screwing around with C++ and learning assembler (well, attempting is the better word) God knows how many years ago), and I have only seriously been programming for a little under half a year, so my skill level is about at that stage. I've only used VB.NET, nothing older. Depending on the time of day and whether I'm home or at school, I fluctuate between VB Express and Visual Studio 2008. This program I'm having trouble with was on a test I took yesterday (I took the problem home with me because I really wanted to figure out what was wrong with it).
The stipulations of the test were:
- No For Each loops
- No built-in sorting or searching functions
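The actual test problem isn't shown, but as an illustration of code that respects both stipulations, here is a hypothetical insertion sort and binary search written only with For and Do While loops, with no For Each and no Array.Sort or Array.BinarySearch:

```vbnet
Imports System

Public Module ManualAlgorithms
    ' Sorts the array in place in ascending order.
    Public Sub InsertionSort(values As Integer())
        For i As Integer = 1 To values.Length - 1
            Dim current As Integer = values(i)
            Dim j As Integer = i - 1
            ' Shift larger elements one slot to the right to make room for current.
            Do While j >= 0 AndAlso values(j) > current
                values(j + 1) = values(j)
                j -= 1
            Loop
            values(j + 1) = current
        Next
    End Sub

    ' Searches a sorted array; returns the index of target, or -1 if it is absent.
    Public Function BinarySearch(values As Integer(), target As Integer) As Integer
        Dim low As Integer = 0
        Dim high As Integer = values.Length - 1
        Do While low <= high
            Dim middle As Integer = low + (high - low) \ 2
            If values(middle) = target Then
                Return middle
            ElseIf values(middle) < target Then
                low = middle + 1
            Else
                high = middle - 1
            End If
        Loop
        Return -1
    End Function
End Module
```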
I need a bit of advice. I am working on a project with a ListBox. I know how to read the strings from the text file using the ReadLine method, but I have no idea how to split each line at the commas, keep only the third field and ignore the other fields, so that selecting an item in the ListBox returns the matching string from the corresponding line. Here are example items in the ListBox:
random item 1
random item 2
random item 3
random item 4
random item 5
And here is an example text file:
example strings one, any strings 1, any random strings 1, other strings 1, final end of strings 1
example strings two, any strings 2, any random strings 2, other strings 2, final end of strings 2
example strings three, any strings 3, any random strings 3, other strings 3, final end of strings 3
example strings four, any strings 4, any random strings 4, other strings 4, final end of strings 4
example strings five, any strings 5, any random strings 5, other strings 5, final end of strings 5
The ListBox displays the example items. If I select "random item 1", the program should read the first line of the text file and extract the third comma-separated field on that line (any random strings 1), ignoring the fourth and fifth fields (other strings 1, final end of strings 1). If I select "random item 2", it should read the second line and extract "any random strings 2", ignoring "other strings 2, final end of strings 2", and so on. In other words, selecting each ListBox item should return the third field from the corresponding line of the text file and ignore the rest.
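Because the ListBox items and the file lines are in the same order, the ListBox's SelectedIndex can serve as the line number. The sketch below (hypothetical names; it assumes no field itself contains a comma) splits the chosen line on commas and keeps only the third field, `fields(2)`:

```vbnet
Imports System
Imports System.IO

Public Module CommaFields
    ' Returns the third comma-separated field, trimmed, or Nothing if the line has fewer than three.
    Public Function GetThirdField(line As String) As String
        Dim fields As String() = line.Split(","c)
        If fields.Length < 3 Then Return Nothing
        Return fields(2).Trim()
    End Function

    ' Reads the file and returns the third field of line lineIndex (0-based, like ListBox.SelectedIndex).
    Public Function GetThirdFieldForItem(path As String, lineIndex As Integer) As String
        Dim lines As String() = File.ReadAllLines(path)
        If lineIndex < 0 OrElse lineIndex >= lines.Length Then Return Nothing
        Return GetThirdField(lines(lineIndex))
    End Function
End Module
```

In the ListBox's SelectedIndexChanged handler, `Label1.Text = GetThirdFieldForItem("items.txt", ListBox1.SelectedIndex)` would show "any random strings 1" when "random item 1" is selected; the file name and Label1 are assumptions.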
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
    Dim PageElements As HtmlElement = WebBrowser1.Document.GetElementById("rso")
    ' GetElementById returns Nothing when no element has that id, so check before using it.
    If PageElements IsNot Nothing Then
        TextBox2.Text = TextBox2.Text & PageElements.InnerText & Environment.NewLine
    End If
End Sub
This may sound really stupid, but I have to ask because I can't find the answer anywhere. I have an application where the user will need to sign up for a new user account on the website [URL]. However, when I use the Firefox add-on Firebug to view the HTML, I get something totally different from what I see when I just right-click the site and view the page source.
What I am trying to do is get the captcha from the website and display it in a PictureBox in the application, so the user can view and solve the captcha, and then the app posts the answer back to the service for a response.
Here is the source that I am getting using Firefox's Firebug to inspect the element:
<td> <input type="hidden" value="Oo3Jo1I8bgzK68agMqo3s79ZZib2OkbK" name="iden"> <img class="capimage" src="/captcha/Oo3Jo1I8bgzK68agMqo3s79ZZib2OkbK.png" alt="i wonder if these things even work"> </td>
[Code]...
Why would the two show me different versions of the HTML?
And how could I grab that image with WebClient and display it in a PictureBox?
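On the first question: Firebug shows the live DOM, including anything JavaScript added after the page loaded, while View Page Source shows the HTML exactly as the server sent it, and WebClient only ever sees the latter. If the `<img class="capimage">` tag is in the downloaded HTML, a sketch like the one below (hypothetical names; it assumes `class` appears before `src`, as in the Firebug output) can find the image's relative URL, resolve it against the page address and download the bytes:

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Public Module CaptchaHelper
    ' Finds the src of <img class="capimage" ...> and resolves it against the page's URL.
    ' Returns Nothing if the tag is not in the HTML (for example, if JavaScript inserts it later).
    Public Function GetCaptchaUrl(html As String, pageUrl As String) As String
        Dim m As Match = Regex.Match(html, "<img[^>]*class=""capimage""[^>]*src=""([^""]+)""", RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        Return New Uri(New Uri(pageUrl), m.Groups(1).Value).AbsoluteUri
    End Function

    ' Downloads the raw image bytes so they can be loaded into a PictureBox.
    Public Function DownloadCaptcha(imageUrl As String) As Byte()
        Using client As New WebClient()
            Return client.DownloadData(imageUrl)
        End Using
    End Function
End Module
```

`PictureBox1.Image = Image.FromStream(New MemoryStream(DownloadCaptcha(url)))` would then display it; leave the MemoryStream open while the Image is in use. Captchas are often tied to a session cookie, and WebClient does not keep cookies between requests by itself, so posting the answer back may need extra handling.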
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim StrInput As String = Display.Text
    Dim firstInteger, secondInteger As Integer
    firstInteger = StrInput.IndexOf("ad_list_link", 0)
    ' Start the second search after the first match; starting at firstInteger would find the same match again.
    secondInteger = StrInput.IndexOf("ad_list_link", firstInteger + "ad_list_link".Length)
[Code]...
I need to extract string z from a web page's source, but I'm having trouble cutting away the code around it.
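When the text around the wanted value is fixed while the value itself changes, one approach is to find the fixed text on either side with IndexOf and take the Substring between them, repeating from the end of each match to collect every occurrence. The markers are whatever surrounds the value in your page; `GetAllBetween` is a hypothetical name:

```vbnet
Imports System
Imports System.Collections.Generic

Public Module TextBetween
    ' Returns every substring that lies between startMarker and the next endMarker after it.
    Public Function GetAllBetween(source As String, startMarker As String, endMarker As String) As List(Of String)
        If String.IsNullOrEmpty(startMarker) OrElse String.IsNullOrEmpty(endMarker) Then
            Throw New ArgumentException("markers must not be empty")
        End If
        Dim results As New List(Of String)()
        Dim position As Integer = 0
        Do
            Dim startIndex As Integer = source.IndexOf(startMarker, position, StringComparison.Ordinal)
            If startIndex < 0 Then Exit Do
            startIndex += startMarker.Length ' the value begins right after the start marker
            Dim endIndex As Integer = source.IndexOf(endMarker, startIndex, StringComparison.Ordinal)
            If endIndex < 0 Then Exit Do
            results.Add(source.Substring(startIndex, endIndex - startIndex))
            position = endIndex + endMarker.Length ' continue after this match
        Loop
        Return results
    End Function
End Module
```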
It's quite simple, but I'm having the hardest time writing something that does what I want. I have one ListBox with three predefined strings, each at its own index. I'm trying to search a TextBox to see whether any of the strings in my ListBox appear in it. So far I have no working code, so I'm asking the professionals for help.
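This is the reverse of a normal list search: loop over the ListBox items and check whether each one occurs in the TextBox text. A minimal sketch, assuming a case-insensitive match is wanted (`FindTermsInText` is a hypothetical name):

```vbnet
Imports System
Imports System.Collections.Generic

Public Module TermSearch
    ' Returns every term that occurs somewhere in text, ignoring case, in the order given.
    Public Function FindTermsInText(text As String, terms As IEnumerable(Of String)) As List(Of String)
        Dim found As New List(Of String)()
        For Each term As String In terms
            If text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0 Then found.Add(term)
        Next
        Return found
    End Function
End Module
```

`FindTermsInText(TextBox1.Text, ListBox1.Items.Cast(Of String)())` returns the ListBox strings that were found (this needs `Imports System.Linq`; the control names are assumptions).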
I'm trying to extract the text fields in between the markup, but the text is always changing, so I'm not sure how to keep this dynamic. I then need to put them into the proper text boxes.
So text box 1 might be "Date:", and then it pulls the date.
There are multiple listings, so I need it to loop until the closing </table> tag.
[code] The two parts I've coloured red change. I need to grab the first part, which is the link, but I'm not sure how to do this. I've used regex before, but it doesn't look possible to use it here; there are about 25 of these in the source.
I decided it's time to migrate from VB6 to VB.NET and am currently translating my radio player, but I have run into several issues, so I might be on this forum for the rest of the day... Anyway, in VB6 I used Inet to get the source of a PHP page that showed the song currently playing on the SHOUTcast stream. I can't seem to get Inet to work, and figured I would just look for another method so I wouldn't need to ship msinet.ocx with the finished product.
How can I get the source of the PHP page and store it in a variable or a TextBox/Label?
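In VB.NET the System.Net.WebClient class can take over from Inet here, so msinet.ocx isn't needed. A minimal sketch (the module name is hypothetical, and UTF-8 is an assumption about how the page is encoded):

```vbnet
Imports System
Imports System.Net
Imports System.Text

Public Module PageSource
    ' Downloads the page and returns its source as a string.
    Public Function GetPageSource(url As String) As String
        Using client As New WebClient()
            client.Encoding = Encoding.UTF8 ' assumption: the PHP page is served as UTF-8
            Return client.DownloadString(url)
        End Using
    End Function
End Module
```

`Label1.Text = GetPageSource(phpUrl)`, where `phpUrl` is your page's address, stores the result in a label. `DownloadString` blocks until the download finishes; if that freezes the UI, `DownloadStringAsync` with the `DownloadStringCompleted` event is an alternative.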
I have a quick question. I am building an ASP.NET website with a SQL Server 2008 database. On one of my pages I display all of the entries in the database, with a search text box at the top. How would I go about taking the value the user entered and using that string to display only the entries that match it? Here is my code for the .aspx file:
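A common approach is to pass the search text to SQL Server as a parameter in a LIKE query rather than concatenating it into the SQL string, which also protects against SQL injection. In the sketch below, the table `Entries` and column `Name` are hypothetical placeholders for your own schema, and `EscapeLike` makes `%`, `_` and `[` in the user's input match literally:

```vbnet
Imports System
Imports System.Data.SqlClient

Public Module EntrySearch
    ' Makes %, _ and [ in the user's text match literally inside a LIKE pattern.
    Public Function EscapeLike(term As String) As String
        Return term.Replace("[", "[[]").Replace("%", "[%]").Replace("_", "[_]")
    End Function

    ' Builds a parameterized search; "Entries" and "Name" are placeholder table/column names.
    Public Function BuildSearchCommand(connection As SqlConnection, term As String) As SqlCommand
        Dim cmd As New SqlCommand("SELECT * FROM Entries WHERE Name LIKE @term", connection)
        ' The user's text travels as a parameter value, never as part of the SQL text.
        cmd.Parameters.AddWithValue("@term", "%" & EscapeLike(term) & "%")
        Return cmd
    End Function
End Module
```

In the search button's Click handler, the command could fill a `DataTable` through a `SqlDataAdapter`, which is then bound to the control that currently shows all entries via `DataSource` and `DataBind()`.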
But when I use WebClient.DownloadString to read the source code into a TextBox, I only get this:
<div id="webResults"> </div>
There's nothing; all of the web results have been removed. How come I can view the code in my browser but not in my application? I even used InStr to confirm that the results weren't contained in the returned code.
I have a simple HTML viewer, and I would like to add the ability to search an open document for a text value I specify. Below is an example I found on MSDN. VB gives me an error: 'Selection' is not a member of 'System.Windows.Forms.Application'.
Private Sub SelectionFind()
    Dim findText As String = "find me"
    Application.Selection.Find.ClearFormatting()
    If Application.Selection.Find.Execute(findText) = True Then
[code]....
What I would like to search is the content of a WebBrowser control.
How do I replace HTML character codes with the characters they stand for? I would show you an example of the HTML output, but VBForums automatically converts the characters, so there's no point. I want to replace all the & #40; (without the space) codes and so on with their correct replacements, e.g. ( in this case. I would also like a short way to do this, as I will be using it multiple times. Basically, I would like the source to look exactly as it does when you view source in Firefox, not with all the special characters left encoded the way Visual Studio shows them.
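The .NET Framework (4.0 and later) already includes a decoder for these character references: System.Net.WebUtility.HtmlDecode turns `&#40;` into `(`, `&amp;` into `&`, and so on (System.Web.HttpUtility.HtmlDecode does the same in ASP.NET projects). A thin wrapper with a hypothetical name, so it can be reused:

```vbnet
Imports System
Imports System.Net

Public Module HtmlText
    ' Converts character references such as &#40; or &amp; into the characters they represent.
    Public Function DecodeEntities(html As String) As String
        Return WebUtility.HtmlDecode(html)
    End Function
End Module
```

Decoding a whole page also turns `&lt;` and `&gt;` inside text into real `<` and `>` characters, so it is safer to decode the text you extract rather than the markup you still need to parse.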
Using a C++ DLL, I can search a directory of 12,000 blocks (512-byte records) in 0.46 seconds. That's moving. I can search it with the following code:
Imports System.IO.DirectoryInfo
Imports Microsoft.VisualBasic.FileIO
Imports System.Globalization
[code]....
My question is: that's pretty fast, so why the enormous speed difference between C++ and VB? "Because VB is managed code" doesn't really answer the question.
Is there a way to space out the source code of a web page so that each tag is on its own line, without having to search for each tag ending and then insert a new line after it?
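A single regular-expression replace can do most of this: wherever a `>` is followed (after optional whitespace) by a `<`, put a line break between them. This is a rough formatter rather than a real pretty-printer; it leaves an element with text, such as `<p>Hi</p>`, on one line and would also change whitespace inside `<pre>` blocks. The function name is hypothetical:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module HtmlFormatter
    ' Inserts a line break between any two tags that are separated only by whitespace.
    Public Function PutTagsOnSeparateLines(html As String) As String
        Return Regex.Replace(html, ">\s*<", ">" & Environment.NewLine & "<")
    End Function
End Module
```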
I am trying to save a value from an input tag in some HTML source code. The tag looks like this:
<input name="user_status" value="3" />
I have the page source in a variable (pageSourceCode) and need to work out some regex to get the value (3 in this example). I have this so far: [Code] It works fine most of the time; however, this code is used to process source code from multiple sites (which use the same platform), and sometimes the input tag includes other attributes, or the attributes are in a different order, e.g.:
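One way to make the match independent of attribute order is to work in two steps: first find the whole `<input>` tag whose `name` is `user_status`, then look for `value` inside just that tag. A sketch with a hypothetical function name; it assumes attribute values are quoted and contain no quote characters:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module InputValueReader
    ' Returns the value attribute of the <input> whose name is fieldName, or Nothing if not found.
    ' Attributes may appear in any order and alongside other attributes.
    Public Function GetInputValue(pageSource As String, fieldName As String) As String
        ' Step 1: the whole tag; [^>]* keeps the match inside a single <input ...>.
        Dim tagPattern As String = "<input\b[^>]*\sname\s*=\s*[""']" & Regex.Escape(fieldName) & "[""'][^>]*>"
        Dim tag As Match = Regex.Match(pageSource, tagPattern, RegexOptions.IgnoreCase)
        If Not tag.Success Then Return Nothing
        ' Step 2: the value attribute within that tag only.
        Dim valueMatch As Match = Regex.Match(tag.Value, "\svalue\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase)
        If Not valueMatch.Success Then Return Nothing
        Return valueMatch.Groups(1).Value
    End Function
End Module
```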
I recently had to use my "String converter" to convert 20 lines of text into a single line of VB.NET string code. For example:
This is line one
This is line two
This is line three "with extra stuff"
[Code]...
"This is line one" & vbCrLf & "This is line two" & vbCrLf & "This is line three " & chr(34) & "with extra stuff" & chr(34) & vbCrLf & vbCrLf & "Empty line above me"
What is the best way to represent these kinds of strings? For example, when you have to display a long message, or just a label whose information changes. I was thinking of some sort of text file collection, but it seems a little pointless to have 100 text files of 5-6 lines each.
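One option that avoids hand-built `& vbCrLf &` chains is to keep each line as an element of a String array and join them once. Inside a VB string literal a double quote is written as `""`, so no Chr(34) is needed. For long messages that change often, project resources (My.Resources) are another option. The function name below is hypothetical:

```vbnet
Imports System
Imports Microsoft.VisualBasic

Public Module MessageText
    ' Builds the multi-line message from one array element per line.
    Public Function BuildMessage() As String
        Dim lines As String() = {
            "This is line one",
            "This is line two",
            "This is line three ""with extra stuff""",
            "",
            "Empty line above me"
        }
        Return String.Join(vbCrLf, lines)
    End Function
End Module
```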
Imagine a very large HTML file with, of course, lots of HTML tags. I cannot load the entire file into memory. My intention is to extract all the indexes of the <p> and </p> strings. How should I achieve this?
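Reading through a TextReader one character at a time keeps only a small sliding window in memory, however large the file is. The sketch below (hypothetical names) records the character offset at which each pattern starts; these are character offsets, not byte offsets, and the comparison is case-sensitive, so `<P>` would not be counted:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Public Module TagScanner
    ' Returns, for each pattern, the zero-based character offsets where it starts in the reader's text.
    ' Only a window as long as the longest pattern is kept in memory.
    Public Function FindOffsets(reader As TextReader, ParamArray patterns As String()) As Dictionary(Of String, List(Of Long))
        Dim results As New Dictionary(Of String, List(Of Long))()
        Dim maxLength As Integer = 0
        For Each pattern As String In patterns
            results(pattern) = New List(Of Long)()
            maxLength = Math.Max(maxLength, pattern.Length)
        Next

        Dim window As New StringBuilder()
        Dim position As Long = 0 ' offset of the character just read
        Dim c As Integer = reader.Read()
        Do While c <> -1
            window.Append(Convert.ToChar(c))
            If window.Length > maxLength Then window.Remove(0, 1)
            For Each pattern As String In patterns
                If WindowEndsWith(window, pattern) Then results(pattern).Add(position - pattern.Length + 1)
            Next
            position += 1
            c = reader.Read()
        Loop
        Return results
    End Function

    ' Compares the last characters of the window with the pattern without allocating a new string.
    Private Function WindowEndsWith(window As StringBuilder, pattern As String) As Boolean
        If pattern.Length = 0 OrElse window.Length < pattern.Length Then Return False
        Dim offset As Integer = window.Length - pattern.Length
        For k As Integer = 0 To pattern.Length - 1
            If window(offset + k) <> pattern(k) Then Return False
        Next
        Return True
    End Function
End Module
```

For a file on disk, pass `New StreamReader(path)` (inside a `Using` block) as the reader, e.g. `FindOffsets(reader, "<p>", "</p>")`.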
1. My program has to read text out of a text file. I use some HTML tags to preserve the formatting. For example, I might have the line "this is a <b> line </b> of <b> text </b>", where "line" and "text" are bold. How do I make the string print to a RichTextBox with only "line" and "text" in bold? I would use the RichTextBox.SaveFile method, but the program works by reading a group of RichTextBoxes in a FlowLayoutPanel and appending them to a single text file.
2. How would I extract the text from between two strings? For instance, I have created special tags for use in my program. These tags tell the program where to add controls. Say my string was:
"[IMG= "dog.jpg" /] this is a picture of a dog. [IMG= "cat.jpg"] this is a picture of a cat."
For each occurrence of the string "[IMG=", I would need it to find the corresponding "/]" and extract the text between the two. I could maybe do something with a substring function. I don't know.
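Both questions can be handled with regular expressions. For question 1, splitting on the `<b>`/`</b>` tags with a capturing group keeps the tags in the result, so the text can be turned into (text, isBold) segments. For question 2, note that in the example the second tag ends with `]` rather than `/]`, so the pattern below makes the slash optional; each match's `Index` also tells you where the tag sits in the string. The names are hypothetical:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module MarkupTags
    ' Question 1: turns "a <b>b</b> c" into (text, isBold) segments.
    Public Function ParseBold(text As String) As List(Of KeyValuePair(Of String, Boolean))
        Dim segments As New List(Of KeyValuePair(Of String, Boolean))()
        Dim bold As Boolean = False
        ' The capturing group makes Regex.Split keep the <b> and </b> tags in the result.
        For Each part As String In Regex.Split(text, "(</?b>)", RegexOptions.IgnoreCase)
            If String.Equals(part, "<b>", StringComparison.OrdinalIgnoreCase) Then
                bold = True
            ElseIf String.Equals(part, "</b>", StringComparison.OrdinalIgnoreCase) Then
                bold = False
            ElseIf part.Length > 0 Then
                segments.Add(New KeyValuePair(Of String, Boolean)(part, bold))
            End If
        Next
        Return segments
    End Function

    ' Question 2: returns the file name from every [IMG= "file"] or [IMG= "file" /] tag.
    Public Function GetImageFiles(text As String) As List(Of String)
        Dim files As New List(Of String)()
        For Each m As Match In Regex.Matches(text, "\[IMG=\s*""([^""]*)""\s*/?\]")
            files.Add(m.Groups(1).Value)
        Next
        Return files
    End Function
End Module
```

To display the segments, for each one set `RichTextBox1.SelectionStart = RichTextBox1.TextLength`, set `SelectionFont` to a bold or regular `Font` depending on the segment's `Value`, and assign the segment's `Key` to `SelectedText`.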
I have two ArrayLists of the same size; their size depends on the information gathered by previous functions. They range from 2 to 45 items in length, and both always have the same length. I am trying to match a string in one array to a string in the second array; when they match, I add the item to a list.
Here is my code:
Do Until i = Arraylenght
    info = Replace(myAL(s), " ", "")
    SortedArrayList(m) = Replace(SortedArrayList(m), " ", "")
    SortedLine = Split(SortedArrayList(m), "Price=")
    If myAL(s).Contains(SortedLine(1)) Then
[Code] .....
This code works for arrays of up to 4 items, but with larger arrays, the moment it reaches the 5th item I get this error: Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index
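That message is what ArrayList and List(Of T) throw when an index is greater than or equal to Count, so one likely cause is that one of the counters (s or m) runs past the end of its list; the full loop isn't shown, so this is a guess. The sketch below is a hypothetical rewrite that takes both loop bounds from each list's own Count and skips lines with no "Price=" part. It uses List/IList instead of ArrayList and assumes "add Item" means the matching item from the first list:

```vbnet
Imports System
Imports System.Collections.Generic

Public Module PriceMatcher
    ' For each line containing "Price=", adds every item from the first list that contains that price.
    Public Function MatchByPrice(items As IList(Of String), sortedLines As IList(Of String)) As List(Of String)
        Dim matches As New List(Of String)()
        For m As Integer = 0 To sortedLines.Count - 1
            Dim parts As String() = sortedLines(m).Replace(" ", "").Split(New String() {"Price="}, StringSplitOptions.None)
            ' Without "Price=" there is no parts(1); an empty price would match everything.
            If parts.Length < 2 OrElse parts(1).Length = 0 Then Continue For
            Dim price As String = parts(1)
            For s As Integer = 0 To items.Count - 1
                If items(s).Replace(" ", "").Contains(price) Then matches.Add(items(s))
            Next
        Next
        Return matches
    End Function
End Module
```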
OK, so say the string is: <a href "[URL]">abc</a>. I just want to get the text "abc" from between the <a href "[URL]"> and the </a>. How would I do so? I don't think I can use the Split function, can I?
I am trying to implement a web service, but I am receiving this error: Client found response content type of 'text/html', but expected 'text/xml'. The request failed with the error message:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml">