I just spent about 2 hours searching this forum on this topic, but I need some advice. I am looking to extract certain data from HTML source code that I have downloaded into a text file; it's about 9KB in size. I am looking to keep all email addresses found. How would this work, or what would be the best method to use? This is what I would like to extract and write to another file:
I would like to be able to parse vb.net code files, so I can examine the collection of Subs, Functions (and their contents, including comments), private variables, etc. I can open the actual source code files. So for example, if I have:
So the following code will return the version number, which currently is 6.59, which is what I'm after. [Code] But then I remembered that releases are done as follows: 6.59, 6.59b, 6.59c, 6.60, 6.60b etc. So when the b version of 6.59 is released the parser will still return 6.59. So how can I make this code better?
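One possible direction for the version question above, as a minimal sketch: the original code is not shown, so the function name `ExtractVersion` and the sample strings are assumptions for illustration. The idea is to let the pattern accept one optional lowercase letter after the numeric part.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module VersionDemo
    ' Hypothetical helper: returns the first version-like token
    ' (e.g. "6.59" or "6.59b"), or an empty string if none is found.
    Function ExtractVersion(ByVal text As String) As String
        ' \d+\.\d+ matches the numeric part; [a-z]? allows one optional trailing letter.
        Dim m As Match = Regex.Match(text, "\d+\.\d+[a-z]?")
        If m.Success Then Return m.Value
        Return ""
    End Function

    Sub Main()
        ' Sample strings are made up for this sketch, not taken from the real page.
        Console.WriteLine(ExtractVersion("Current version: 6.59"))   ' prints 6.59
        Console.WriteLine(ExtractVersion("Current version: 6.59b"))  ' prints 6.59b
    End Sub
End Module
```

If the version can be followed directly by other letters on the real page, the pattern would need a tighter boundary than this sketch uses.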
[code] The two parts I've coloured red change. I need to grab the first part, which is the link, but I'm not sure how to do this. I've used regex before and it doesn't look possible to use it on this; there are about 25 of these in the source.
I have used examples from threads here on how to open and convert Word documents to HTML in order to parse them. I got it all working great using the Office interop library, but I used an example Word document with some text in it and it worked fine. Now, with the actual Word documents that I need to parse, which come in all types of formatting and irregular formats, I got it to convert to HTML fine. But the actual HTML does not make sense when I look at it, and I am not sure how to parse it. For example:
Dim wc As New System.Net.WebClient()
Dim p As New System.Net.WebProxy()
Dim test As String
wc.Encoding = System.Text.Encoding.GetEncoding("utf-8")
p.Credentials = System.Net.CredentialCache.DefaultCredentials
wc.Proxy = p
I am trying to save a value from an input tag in some HTML source code. The tag looks like so:
<input name="user_status" value="3" />
I have the page source in a variable (pageSourceCode), and need to work out some regex to get the value (3 in this example). I have this so far: [Code] Which works fine most of the time, however this code is used to process source code from multiple sites (that use the same platform), and sometimes there are other attributes included in the input tag, or they are in a different order, eg:
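For the input tag question above, one way to stop depending on attribute order is a lookahead that checks for the name attribute anywhere in the tag, followed by a capture of the value attribute. This is only a sketch: the original regex is not shown, `GetUserStatus` is a hypothetical helper, the second sample tag is invented, and the pattern assumes double-quoted attribute values.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module InputValueDemo
    ' The lookahead (?=...) confirms the tag carries name="user_status" somewhere;
    ' the remainder captures the value attribute wherever it sits in the tag.
    Private Const Pattern As String = "<input\b(?=[^>]*\sname\s*=\s*""user_status"")[^>]*\svalue\s*=\s*""([^""]*)"""

    ' Hypothetical helper: returns the captured value, or Nothing if the tag is absent.
    Function GetUserStatus(ByVal pageSourceCode As String) As String
        Dim m As Match = Regex.Match(pageSourceCode, Pattern, RegexOptions.IgnoreCase)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        ' First tag is the one from the question; the second is a made-up variant.
        Console.WriteLine(GetUserStatus("<input name=""user_status"" value=""3"" />"))                        ' prints 3
        Console.WriteLine(GetUserStatus("<input type=""hidden"" value=""7"" id=""s"" name=""user_status"">"))  ' prints 7
    End Sub
End Module
```

An HTML parser is usually more robust than regex when the markup varies a lot between sites.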
What I am trying to do is extract information between two tags in some HTML from the source of a website. The contents of the text between the two tags will always be different. The code I currently have is:
However, the match is returning "[14]", including the brackets, and I do not understand why. I have surrounded the \d with parentheses, which should mean that this is the data I want to capture.
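A likely explanation, sketched under assumptions since the original pattern is not shown: in .NET, `Match.Value` is always the whole matched text, while the text captured by the first pair of parentheses is in `Groups(1)`. The pattern and input string below are invented for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CaptureDemo
    Sub Main()
        ' Hypothetical input and pattern; the brackets are escaped, the digits are captured.
        Dim input As String = "Item [14] of 20"
        Dim m As Match = Regex.Match(input, "\[(\d+)\]")

        If m.Success Then
            Console.WriteLine(m.Value)            ' whole match: [14]
            Console.WriteLine(m.Groups(1).Value)  ' first capture group only: 14
        End If
    End Sub
End Module
```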
Trying to parse a text file for records starting with an RH space and a date. I need to return the entire line. I expect to find about 6000 in the file. Example of a full record:
RH 09/27/08 11:49 11:49:00.224 COA292 H393 2664FB753 178 -54.82 8.98 C 431 264 13 040 34 24.45-074 58 57.93 H
Snippet of text file:
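A minimal sketch of one approach to the RH records: test each line against an anchored pattern and keep the whole line when it matches. It assumes the date is always shaped like MM/DD/YY as in the example record; `FindRhRecords` and the non-matching sample lines are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RhRecordDemo
    ' Matches lines that begin with "RH", one space, then a date shaped like 09/27/08.
    Private ReadOnly RhLine As New Regex("^RH \d{2}/\d{2}/\d{2}\b")

    ' Hypothetical helper: returns every full line that starts with an RH record header.
    Function FindRhRecords(ByVal lines As IEnumerable(Of String)) As List(Of String)
        Dim found As New List(Of String)
        For Each line As String In lines
            If RhLine.IsMatch(line) Then found.Add(line)
        Next
        Return found
    End Function

    Sub Main()
        ' Only the first sample line follows the format shown in the question.
        Dim sample As String() = New String() { _
            "RH 09/27/08 11:49 11:49:00.224 COA292 H393", _
            "XX 09/27/08 some other record", _
            "RH header without a date"}

        For Each record As String In FindRhRecords(sample)
            Console.WriteLine(record)
        Next
    End Sub
End Module
```

For a real file, the lines could come from `System.IO.File.ReadAllLines(path)` instead of the sample array.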
I'm parsing a fixed-length file with .NET 3.5 and Regex. This file is from a bank. The customer name sometimes contains characters from this set: &,(),[],',"". These are the characters I've encountered so far, but there could be others. Because of this my regex is failing. My regex is [A-Za-z0-9s-.,'""""(){}[]]{35}. Is there any wildcard I can use for special characters rather than specifying them individually? I also tried . but it didn't work.
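For the fixed-length question above: outside a character class, `.` matches any character except a newline, so `.{35}` accepts a 35-character field whatever punctuation it holds. Inside square brackets a dot is just a literal dot, which may be why the earlier attempt did not help. The sketch below assumes a made-up layout (a 10-digit account number followed by the 35-character name), since the real record layout is not shown, and also reads the same field with `Substring`.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FixedWidthDemo
    ' Hypothetical layout: 10-digit account number, then a 35-character customer name.
    Private ReadOnly RecordPattern As New Regex("^(?<account>\d{10})(?<name>.{35})")

    Sub Main()
        ' Build a made-up record whose name field contains several special characters.
        Dim customerName As String = "O'BRIEN & SONS (HOLDINGS) [UK]".PadRight(35)
        Dim record As String = "0012345678" & customerName & "000123.45"

        Dim m As Match = RecordPattern.Match(record)
        If m.Success Then
            ' Named group "name" holds the 35-character field; trim the padding.
            Console.WriteLine(m.Groups("name").Value.TrimEnd())
        End If

        ' The same field read by position, without any regex.
        Console.WriteLine(record.Substring(10, 35).TrimEnd())
    End Sub
End Module
```

When every field has a known offset and width, `Substring` is often the simpler choice for fixed-length records.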
I'm having some trouble putting the pieces together. First of all, I'm currently using the WebBrowser component, but would be plenty happy with HtmlAgilityPack if it had some decent documentation; for a newbie at VB.Net, it's a rough road.
What I'd like to do is grab all the h3's with the "this-class" class and stash them into an array (one in each array element). I'd then like to search through each one and see which has "And Another Title", which I already have the code to do. I just don't know how to do the first bit.
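A rough sketch of the first bit using HtmlAgilityPack, assuming the package is referenced in the project. The helper name `GetHeadings` and the sample HTML are invented, and the XPath below only matches elements whose class attribute is exactly "this-class".

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module H3Demo
    ' Hypothetical helper: returns the text of every <h3 class="this-class"> as an array.
    Function GetHeadings(ByVal html As String) As String()
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim results As New List(Of String)
        ' SelectNodes returns Nothing when no node matches, so check before looping.
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//h3[@class='this-class']")
        If nodes IsNot Nothing Then
            For Each node As HtmlNode In nodes
                results.Add(node.InnerText.Trim())
            Next
        End If
        Return results.ToArray()
    End Function

    Sub Main()
        ' Made-up markup for illustration only.
        Dim html As String = "<h3 class='this-class'>A Title</h3>" & _
                             "<h3 class='other'>Skip Me</h3>" & _
                             "<h3 class='this-class'>And Another Title</h3>"

        For Each heading As String In GetHeadings(html)
            Console.WriteLine(heading)
        Next
    End Sub
End Module
```

With the WebBrowser control, the page source could be passed in from `WebBrowser1.DocumentText` once the document has finished loading.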
Imports System.Web
Imports System.Net
Imports System.Net.ServicePointManager

Public Class GetSource
    Function GetHtml(ByVal strPage As String) As String
tryAgain:
[Code]...
What I've got here is vb.net code where I parse the website for its HTML. This function works fine. The question is this:
1. If I run 100 threads with this function at the same time, will it work?
2. Won't it affect my internet connection as well?
I don't want to waste time creating threads and code a hundred times, so if you know the answer, please advise me on what I should do instead.
I need to parse a web page for blocks that contain open trouble tickets. The web page displays several unresolved tickets and each one is inside an HTML division labeled "issue-status". I've written the following code, which does find the blocks, but when I try to parse a block's children to get its element fields (date opened, person requesting, history...) it instead pulls every element from the web page, not just the children. Is there a way to just parse the sub-fields under a particular DIV?
Code:
Dim theElementCollection As HtmlElementCollection
Dim strResult As String = ""
I'm a PHP/MySQL/HTML guy, but in the course of my work, I sometimes have to delve into Gatesland. I am working in VS2005 developing reports, and occasionally I have to write some custom code. This code is in (I believe) VB.NET. I avoid this as much as possible. It is my belief that if you have to use custom code in a report, you're doing something wrong with the DB, or with your query. Now, my boss (for reasons unknown) is storing data in the database as HTML. This data is historical, having a month and a dollar amount, and comes in a form like this: [code] I know this breaks even 1NF. I did not design the database. I simply must suffer under its schema. See, the developer did this so that he could just read in a field, and dump it straight out to an echo/print statement when forming up the HTML. Unfortunately for me (the report developer), HTML shows up as verbose text if I dump it out as a field in a text field in VS2005. So, I need to strip out the HTML tags, and replace them with appropriate values.
I am first trying to strip out the <th> data, and print it out with appropriate line feeds and carriage returns. This is the code I am trying to use: [code] Now, far from doing what I intend it to do, it simply returns the jubilant result "#Error". Wonderful. I'm sure the client will be happy. There must be some simple syntax errors or something going on there, but I am nowhere near an expert with VB.NET. I've used VBA extensively, but the last time I used it was about 3 years ago. I'm hoping I can cash in some of that positive rep I've got, and get some much needed help in the dark wilderness of Microsoftia.
I have a folder that gets a lot of HTML files dumped into it. I have to read each file and parse it to extract information. What I need is to be able to load the HTML into an HTMLDocument, but I'm having trouble. Here's my code so far:
I have saved some HTML pages from the web. Now I want to parse some specific data; I mean I want to retrieve some specific part from the HTML page using VB/C# code. How do I go about it? I am using this code to read the HTML file. All I want to do now is to save the specifications to the database.
1. How do I select the specifications and display them in a ListBox?
I've been programming in VB.NET 2005, 2008 and now 2010 for almost 2 years. Just casual little applications, nothing big. In this project I need to parse links from a web page. It doesn't quite work though: it parses the names only and no links. I'll give you my code, let's say for a random page:
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    TextBox1.Multiline = True
    WebBrowser1.Navigate("http:www.buyfixuse.com")
[code]....
If I activate this function in my application, then instead of links to the two blog posts on that website, it only gives out the text that is related to these links - (more...)