F# Data: CSV Parser and Reader
The F# CSV Type Provider is built on top of an efficient CSV parser written in F#. There's also a simple API that can be used to access values dynamically.
When working with well-defined CSV documents, it is easier to use the type provider, but in a more dynamic scenario or when writing quick and simple scripts, the parser might be a simpler option.
Loading CSV documents
To load a sample CSV document, we first need to reference the FSharp.Data.dll
library
(when using F# Interactive) or to add reference to a project.
1: 2: |
#r "../../../bin/FSharp.Data.dll" open FSharp.Data |
The FSharp.Data
namespace contains the CsvFile
type that provides two static methods
for loading data. The Parse
method can be used if we have the data in a string
value.
The Load
method allows reading the data from a file or from a web resource (and there's
also an asynchronous AsyncLoad
version). The following sample calls Load
with a URL that
points to a live CSV file on the Yahoo finance web site:
1: 2: 3: 4: 5: 6: |
// Download the stock prices let msft = CsvFile.Load("http://ichart.finance.yahoo.com/table.csv?s=MSFT").Cache() // Print the prices in the HLOC format for row in msft.Rows do printfn "HLOC: (%s, %s, %s)" (row.GetColumn "High") (row.GetColumn "Low") (row.GetColumn "Date") |
Note that unlike CsvProvider
, CsvFile
works in streaming mode for performance reasons, which means
that Rows
can only be iterated once. If you need to iterate multiple times, use the Cache
method,
but please note that this will increase memory usage and should not be used in large datasets.
Using CSV extensions
Now we look at a number of extensions that become available after
opening the FSharp.Data.CsvExtensions
namespace. Once opened, we can write:
row?column
uses the dynamic operator to obtain the column value namedcolumn
; alternatively, you can also use an indexerrow.[column]
.value.AsBoolean()
returns the value as boolean if it is eithertrue
orfalse
value.AsInteger()
returns the value as integer if it is numeric and can be converted to an integer;value.AsInteger64()
,value.AsDecimal()
andvalue.AsFloat()
behave similarly.value.AsDateTime()
returns the value as aDateTime
value using either the ISO 8601 format, or using the\/Date(...)\/
JSON format containing number of milliseconds since 1/1/1970.value.AsGuid()
returns the value as aGuid
value.
Methods that may need to parse a numeric value or date (such as AsFloat
and
AsDateTime
) receive an optional culture parameter.
The following example shows how to process the sample previous CSV sample using these extensions:
1: 2: 3: 4: |
open FSharp.Data.CsvExtensions for row in msft.Rows do printfn "HLOC: (%f, %M, %O)" (row.["High"].AsFloat()) (row?Low.AsDecimal()) (row?Date.AsDateTime()) |
Transforming CSV files
In addition to reading, CsvFiles
also has support for transforming CSV files. The operations
available are Filter
, Take
, TakeWhile
, Skip
, SkipWhile
, and Truncate
. After transforming
you can save the results by using one of the overloads of the Save
method. You can choose different
separator and quote characters when saving.
1: 2: 3: 4: |
// Saving the first 10 stock prices where the closing price is higher than the opening price in TSV format: msft.Filter(fun row -> row?Close.AsFloat() > row?Open.AsFloat()) .Truncate(10) .SaveToString('\t') |
Related articles
- F# Data: CSV Type Provider - discusses F# type provider that provides type-safe access to CSV data
- API Reference: CsvFile class
- API Reference: CsvRow class
- API Reference: CsvExtensions module
Full name: CsvFile.msft
inherit CsvFile<CsvRow>
private new : readerFunc:Func<TextReader> * ?separators:string * ?quote:char * ?hasHeaders:bool * ?ignoreErrors:bool -> CsvFile
member private GetColumnIndex : columnName:string -> int
static member AsyncLoad : uri:string * ?separators:string * ?quote:char * ?hasHeaders:bool * ?ignoreErrors:bool -> Async<CsvFile>
static member Load : uri:string * ?separators:string * ?quote:char * ?hasHeaders:bool * ?ignoreErrors:bool -> CsvFile
static member Load : reader:TextReader * ?separators:string * ?quote:char * ?hasHeaders:bool * ?ignoreErrors:bool -> CsvFile
static member Load : stream:Stream * ?separators:string * ?quote:char * ?hasHeaders:bool * ?ignoreErrors:bool -> CsvFile
static member Parse : text:string * ?separators:string * ?quote:char * ?hasHeaders:bool * ?ignoreErrors:bool -> CsvFile
Full name: FSharp.Data.CsvFile
static member CsvFile.Load : reader:System.IO.TextReader * ?separators:string * ?quote:char * ?hasHeaders:bool * ?ignoreErrors:bool -> CsvFile
static member CsvFile.Load : stream:System.IO.Stream * ?separators:string * ?quote:char * ?hasHeaders:bool * ?ignoreErrors:bool -> CsvFile
Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printfn
member CsvRow.GetColumn : index:int -> string
from FSharp.Data