Skip to main content

Parse CSV or other separated strings

I had to work with Comma Separated Values (CSV) files the last few days and updated some code to make it easier to use with LINQ. Quick review, a CSV file is a file that has all of the values separated by a comma. Values that contain a comma are typically qualified with double quotes. For example:

Abraham, Lincoln, 03/4/1861, 04/15/1865, "Republican, National Union"

Basically, I created a class called SeparatedString that performs all the functionality of parsing the data so that I can retrieve it later. When the Text property is set it figures out the location of all relevant commas and loads them into a List. The GetValue method then uses the index to grab the occurrence of the comma that precedes the value to be retrieved.

Public Class SeparatedString
    ''' <summary>
    ''' Contains the position of all separator characters in the string
    ''' </summary>
    ''' <remarks></remarks>
    Private _Separators As New List(Of Integer)

    ''' <summary>
    ''' The CSV text of the string.
    ''' </summary>
    ''' <remarks></remarks>
    Private _Text As String
    Public Property Text() As String
        Get
            Return _Text
        End Get
        Set(ByVal value As String)
            _Text = value
            LoadSeparatorPositions()
        End Set
    End Property

    Private _Qualifier As Char = Chr(34)
    ''' <summary>
    ''' The character used to qualify text strings.
    ''' </summary>
    ''' <value>The character used to qualify text strings.</value>
    ''' <returns>Char</returns>
    ''' <remarks>
    ''' <para>
    ''' Default value: double quotes - chr(34)
    ''' </para>
    ''' Depending on the tool used to create the string, text
    ''' strings may be qualified by a character, typically double quotes.
    ''' <example>
    ''' "John", "Quincy", "Adams", 01/01/2003, 54
    ''' </example>
    ''' </remarks>
    Public Property Qualifier() As Char
        Get
            Return _Qualifier
        End Get
        Set(ByVal value As Char)
            _Qualifier = value
        End Set
    End Property

    Private _Separator As Char = ","
    ''' <summary>
    ''' The character used to separate values.
    ''' </summary>
    ''' <value>The character used to separate values.</value>
    ''' <returns>Char</returns>
    ''' <remarks>
    ''' <para>
    ''' Default value: comma
    ''' </para>
    ''' <example>
    ''' "John", "Quincy", "Adams", 01/01/2003, 54
    ''' </example>
    ''' </remarks>
    Public Property Separator() As Char
        Get
            Return _Separator
        End Get
        Set(ByVal value As Char)
            _Separator = value
        End Set
    End Property

    ''' <summary>
    ''' Loads the list of separator positions.
    ''' </summary>
    ''' <remarks>
    ''' The _Separators list is cleared and loaded with the
    ''' position of all separators. If the string is empty
    ''' or contains no unqualified separators, the list is
    ''' empty.
    ''' <para>
    ''' NOTE: Separators found within qualifiers are not
    ''' stored.
    ''' </para>
    ''' </remarks>
    Private Sub LoadSeparatorPositions()
        _Separators.Clear()
        If _Text.Trim.Length < 1 Then Return

        Dim IsInQualifier = False
        Dim CurChar As Char
        'Dim x = 0
        For x = 0 To _Text.Length - 1
            CurChar = _Text.Substring(x, 1)
            If CurChar = _Separator And Not IsInQualifier Then _Separators.Add(x)
            If CurChar = _Qualifier Then IsInQualifier = Not IsInQualifier
        Next
    End Sub

    ''' <summary>
    ''' Gets the value at the specified separator position.
    ''' </summary>
    ''' <param name="occur">The occurance of the separator that begins the value.</param>
    ''' <returns>String</returns>
    ''' <remarks>
    ''' NOTE: The list of values is zero based unless the string begins with a separator.
    ''' <example>
    ''' <![CDATA[
    ''' Dim sStr As New SeparatedString(Chr(34) & "John" & Chr(34) & ", " & Chr(34) & "Quincy" & Chr(34) & ", " & Chr(34) & "Adams" & Chr(34) & ", 07/11/1767, 80")
    ''' Console.WriteLine("First Name: " & sStr.GetValue(0))
    ''' Console.WriteLine("Middle Name: " & sStr.GetValue(1))
    ''' Console.WriteLine("Last Name: " & sStr.GetValue(2))
    ''' Console.WriteLine("Date Of Birth: " & sStr.GetValue(3))
    ''' Console.WriteLine("Age: " & sStr.GetValue(4))
    ''' ]]> 
    ''' ' You'll need to remove the CDATA start and end.
    ''' '
    ''' 'This example produces the following results:
    ''' 'First Name: John
    ''' 'Middle Name: Quincy
    ''' 'Last Name: Adams
    ''' 'Date Of Birth: 07/11/1767
    ''' 'Age: 80
    ''' </example>
    ''' </remarks>
    Public Function GetValue(ByVal occur As Integer) As String
        If occur > _Separators.Count Then Return String.Empty

        Dim start = 0
        If occur > 0 Then start = _Separators(occur - 1) + 1

        Dim value As String = ""
        If occur < _Separators.Count Then
            value = _Text.Substring(start, _Separators(occur) - start)
        Else
            value = _Text.Substring(start)
        End If
        value = value.Trim
        value = value.Replace(_Qualifier, "")
        If value.Length < 1 Then Return String.Empty
        Return value
    End Function

    ''' <summary>
    ''' Creates a new instance of the SeparatedString class.
    ''' </summary>
    ''' <remarks></remarks>
    Public Sub New()
        Separator = ","
        Qualifier = Chr(34)
    End Sub

    ''' <summary>
    ''' Creates a new instance of the SeparatedString class with the specified value.
    ''' </summary>
    ''' <remarks></remarks>
    Public Sub New(ByVal value As String)
        Text = value
    End Sub

    ''' <summary>
    ''' Creates a new instance of the SeparatedString class with the specified value, separator and qualifier.
    ''' </summary>
    ''' <remarks></remarks>
    Public Sub New(ByVal value As String, ByVal sep As Char, ByVal qual As Char)
        Separator = sep
        Qualifier = qual
        Text = value
    End Sub
End Class

You can simply use the class out of the box or inherit from SeparatedString and create a class that has meaningful properties that can be used with LINQ.

Public Class President
    Inherits SDG.FileHandler.SeparatedString

    Public Const FirstNameKey As Int16 = 0
    Public Const LastNameKey As Int16 = 1
    Public Const TermStartKey As Int16 = 2
    Public Const TermEndKey As Int16 = 3
    Public Const PartyKey As Int16 = 4

    Public ReadOnly Property FirstName() As String
        Get
            Return GetValue(FirstNameKey)
        End Get
    End Property

    Public ReadOnly Property LastName() As String
        Get
            Return GetValue(LastNameKey)
        End Get
    End Property

    Public ReadOnly Property Party() As String
        Get
            Return GetValue(PartyKey)
        End Get
    End Property

    Public ReadOnly Property TermEnd() As Date
        Get
            If IsDate(GetValue(TermEndKey)) Then Return GetValue(TermEndKey)
            Return Date.MinValue
        End Get
    End Property

    Public ReadOnly Property TermStart() As Date
        Get
            If IsDate(GetValue(TermStartKey)) Then Return GetValue(TermStartKey)
            Return Date.MinValue
        End Get
    End Property

    Public Sub New()
        MyBase.New()
    End Sub

    Public Sub New(ByVal value As String)
        MyBase.New(value)
    End Sub
End Class

Putting it all together, I simply read the file and create a List. Since LINQ can be used on generic Lists, we're all set. The following reads a CSV file of US Presidents and returns those inaugurated in January.

Dim presidents As New List(Of President)

Using dataFile = New StreamReader(TextBox1.Text)
    Dim line As String
    Do
        line = dataFile.ReadLine
        If line Is Nothing Then Exit Do
        If line.Trim.Length < 2 Then Exit Do
        presidents.Add(New President(line))
    Loop Until line Is Nothing
End Using

Dim query = From p In presidents Where p.TermStart.Month = 1 Select p

GridView1.DataSource = query.ToList
        

You can change the separator and/or qualifier. I made a conscious decision to not provide a  mechanism to override the check for a qualifier. Simply use a character that will not appear in your file to get around this check.

Comments

Lisa J. Fuentes said…
I tried to do the same way, but it still produced an error. After a long search, I found one more way how to open csv file https://wikiext.com/csv. And this method suited me more. My problem has been fixed and I can continue my work further.

Popular posts from this blog

Migrating Legacy Apps to the New SimpleMembership Provider

Asp.Net MVC4 uses the new SimpleMembership provider, changes the table structure and adds a new hashing algorithm. The reasons for the changes can be found in this article by Jon Galloway. This article shows how to migrate your existing apps to the new provider.I’m assuming that you stored your passwords in the unrecoverable SHA-1 format. If you didn’t, then you’ll have to change a couple of things. All of my apps are done this way so… I’m also assuming that you have created the basic skeleton of the new app and ran it once so the correct tables will be created.First, we’ll look at the new tables. Previously, we had all of those aspnet_xxxxxx tables. Here’s the new ones.UserProfileContains all of the elements relevant to the user. This is a combination of the aspnet_Users table and the aspnet_Profiles table.webpages_MembershipStores the password info when not using OAuth, Live, Facebook, etc. This table is somewhat of a match to the aspnet_Membership table.webpages_OAuthMembershipStor…

JavaScript function to automatically add slashes to date

In converting an old Windows app to a browser app, the user wanted to be able to enter dates without the slashes. Here's a simple jscript: 1:// Function to convert short date string (MMddyy) 2:// or (MMddyyyy) to a date string (mm/dd/yyyy). 3:// txtBox is the actual textbox control 4:// with the value to be processed. 5:function FixShortDate(txtBox) { 6:if (txtBox == null) { 7:return'' } 8: 9:var re = new RegExp(/(\d{6})(\d{2})?/); 10: 11:if (re.test(txtBox.value)) 12: { 13:if (txtBox.value.length == 8) { 14: txtBox.value = txtBox.value.substring(0, 2) + '/' + txtBox.value.substring(2, 4) + '/' + txtBox.value.substring(4, 8) 15: } 16: 17:if (txtBox.value.length == 6) { 18:if (txtBox.value.substring(4, 6) < 20)

Get Asp.Net Profile properties from Sql

Ever wanted to include the profile information from an Asp.Net profile in a query? It’s not that hard once you understand the structure. I’ve written a little function that does all the work. Note: I’m using Sql Server as my repository.

First we need to understand how the profile data is stored. Looking at the aspnet_Profile table, we can see that it stores the information in two columns: PropertyNames and PropertyValuesString.

Looking at PropertyNames we can see that it has a basic structure of Property Name, Data Type, Starting Position and Length. For example, in the string “FirstName:S:0:4:Phone:S:4:10:LastName:S:14:5:” we can see that FirstName is of type string, starts at position 0 and has a length of 4. Notice the zero base for the starting position, we need to correct for that in our function. This means in the PropertyValuesString “John2175551212Smith”, we would start with the first position and proceed 4 characters to get the name.

You might be thinking …