F# Data: JSON Type Provider
This article demonstrates how to use the JSON type provider to access JSON files in a statically typed way. We first look at how the structure is inferred and then demonstrate the provider by parsing data from WorldBank, Twitter and GitHub.
The JSON type provider provides statically typed access to JSON documents. It takes a sample document as input (or a document containing a JSON array of samples). The generated type can then be used to read files with the same structure. If the loaded file does not match the structure of the sample, a runtime error may occur (but only when accessing, for example, an element that does not exist).
Introducing the provider
The type provider is located in the FSharp.Data.dll assembly. Assuming the assembly is located in the ../../../bin directory, we can load it in F# Interactive as follows:
```fsharp
#r "../../../bin/FSharp.Data.dll"
open FSharp.Data
```
Inferring types from the sample
The JsonProvider<...> takes one required static parameter of type string. The parameter can be either a sample string or a sample file (relative to the current folder, or accessible online via http or https). It is unlikely that this could lead to ambiguities. Further optional static parameters, such as SampleIsList and RootName, are used later in this article.
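To make the ambiguity argument concrete, here is a rough sketch of how the three kinds of sample could be told apart. The SampleKind type and the classifySample function are hypothetical illustrations only; they are not how FSharp.Data actually resolves its sample parameter.

```fsharp
open System

// Hypothetical heuristic, NOT FSharp.Data's implementation: it only shows why
// inline JSON, a web address and a file path rarely look alike.
type SampleKind =
    | InlineJson
    | WebAddress
    | FilePath

let classifySample (sample: string) =
    let trimmed = sample.Trim()
    let startsWith (prefix: string) = trimmed.StartsWith(prefix, StringComparison.Ordinal)
    // A JSON document starts with an object or an array
    if startsWith "{" || startsWith "[" then InlineJson
    // Online samples are recognized by their scheme
    elif startsWith "http://" || startsWith "https://" then WebAddress
    // Anything else is treated as a (relative) file path
    else FilePath

let kinds =
    [ """ { "name":"John", "age":94 } """
      "https://api.github.com/repos/fsharp/FSharp.Data/issues"
      "../data/WorldBank.json" ]
    |> List.map classifySample
// kinds = [InlineJson; WebAddress; FilePath]
```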
The following sample passes a small JSON string to the provider:
```fsharp
type Simple = JsonProvider<""" { "name":"John", "age":94 } """>
let simple = Simple.Parse(""" { "name":"Tomas", "age":4 } """)
simple.Age
simple.Name
```
You can see that the generated type has two properties: Age of type int and Name of type string. The provider successfully infers the types from the sample and exposes the fields as properties (with PascalCase names, following standard .NET naming conventions).
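The exact renaming rules are internal to the provider. As a simplified sketch under that assumption, the following hypothetical toPascalCase function splits a JSON name on underscores, dashes and spaces and capitalizes each part, which is enough to reproduce names used later in this article, such as retweet_count becoming RetweetCount.

```fsharp
open System

// Simplified sketch (an assumption, not FSharp.Data's real naming code):
// split a JSON name on '_', '-' and ' ' and upper-case each part's first letter.
let toPascalCase (jsonName: string) =
    jsonName.Split([| '_'; '-'; ' ' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.map (fun (part: string) ->
        string (Char.ToUpperInvariant(part.[0])) + part.Substring(1))
    |> String.concat ""

// toPascalCase "name"          = "Name"
// toPascalCase "retweet_count" = "RetweetCount"
```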
Inferring numeric types
In the previous case, the sample document simply contained an integer, so the provider inferred the type int. Sometimes, the types in the sample document (or a list of samples) may not match exactly. For example, a list may mix integers and floats:
```fsharp
type Numbers = JsonProvider<""" [1, 2, 3, 3.14] """>
let nums = Numbers.Parse(""" [1.2, 45.1, 98.2, 5] """)
let total = nums |> Seq.sum
```
When the sample is a collection, the type provider generates a type that can store all values in the sample. In this case, the resulting type is decimal, because one of the values is not an integer. In general, the provider supports the numeric types int, int64, decimal and float, and prefers them in that order.
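The preference order can be pictured as picking the widest type that any element of the collection needs. The sketch below models that idea with a hypothetical NumericKind type; it is a simplification, not the provider's inference code (for instance, it classifies each literal only by whether it parses as each type).

```fsharp
open System
open System.Globalization

// Hypothetical model of the preference order int < int64 < decimal < float.
// F# compares union cases by declaration order, so List.max picks the widest.
type NumericKind =
    | IntKind
    | Int64Kind
    | DecimalKind
    | FloatKind

/// Classify one JSON number literal by the narrowest type that can hold it.
let classify (literal: string) =
    let inv = CultureInfo.InvariantCulture
    if fst (Int32.TryParse(literal, NumberStyles.Integer, inv)) then IntKind
    elif fst (Int64.TryParse(literal, NumberStyles.Integer, inv)) then Int64Kind
    elif fst (Decimal.TryParse(literal, NumberStyles.Float, inv)) then DecimalKind
    else FloatKind // e.g. "1e400" overflows decimal

/// The whole (non-empty) collection gets the widest kind any element requires.
let unify (literals: string list) =
    literals |> List.map classify |> List.max

// unify [ "1"; "2"; "3"; "3.14" ] = DecimalKind
```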
Other primitive types cannot be combined into a single type, for example when a list contains both numbers and strings. In this case, the provider generates two properties that return the values matching each of the types:
```fsharp
type Mixed = JsonProvider<""" [1, 2, "hello", "world"] """>
let mixed = Mixed.Parse(""" [4, 5, "hello", "world" ] """)

mixed.Numbers |> Seq.sum
mixed.Strings |> String.concat ", "
```
As you can see, the Mixed type has properties Numbers and Strings that return only the int and string values from the collection, respectively. This gives us nice type-safe access to the values, but not in the original order. If order matters, you can use the mixed.JsonValue property to get the underlying JsonValue and process it dynamically, as described in the documentation for JsonValue.
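A plain F# model may help show what the two views do. Here a hypothetical MixedItem union stands in for the JSON values and List.choose extracts each kind. Each resulting list keeps the relative order of its own elements, but the interleaving of numbers and strings is lost.

```fsharp
// Hypothetical model of a heterogeneous JSON array (not the generated type).
type MixedItem =
    | Number of int
    | Text of string

let items = [ Number 4; Number 5; Text "hello"; Text "world" ]

// Roughly what mixed.Numbers gives: only the integers
let numbers = items |> List.choose (function Number n -> Some n | _ -> None)

// Roughly what mixed.Strings gives: only the strings
let strings = items |> List.choose (function Text s -> Some s | _ -> None)

let total = numbers |> List.sum            // 9
let joined = strings |> String.concat ", " // "hello, world"
```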
Inferring record types
Now, let's look at a sample JSON document that contains a list of records. The following example uses two records: one with name and age, and the second with just name. If a property is missing, the provider infers it as optional.
If we just want to use the same text that was used for the schema at runtime, we can use the GetSamples method:
```fsharp
type People = JsonProvider<""" [{ "name":"John", "age":94 }, { "name":"Tomas" }] """>

for item in People.GetSamples() do
  printf "%s " item.Name
  item.Age |> Option.iter (printf "(%d)")
  printfn ""
```
The inferred type for items is a collection of (anonymous) JSON entities, each of which has properties Name and Age. As Age is not available for all records in the sample data set, it is inferred as option<int>. The sample above uses Option.iter to print the value only when it is available.
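In ordinary F# terms, each item behaves roughly like a record with an optional field. The Person record below is a hand-written, hypothetical stand-in for the generated type, and describe returns strings instead of printing so the result is easy to inspect.

```fsharp
// Hypothetical stand-in for the inferred item type: "age" is missing from
// one sample, so Age is modeled as an option.
type Person = { Name: string; Age: int option }

let people = [ { Name = "John"; Age = Some 94 }; { Name = "Tomas"; Age = None } ]

// Same idea as the loop above: show the age only when it is present.
let describe (person: Person) =
    match person.Age with
    | Some age -> sprintf "%s (%d)" person.Name age
    | None -> person.Name

let lines = people |> List.map describe
// lines = [ "John (94)"; "Tomas" ]
```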
In the previous case, the values of individual properties had a common type: string for the Name property and a numeric type for Age. However, what if a property of a record can have multiple different types? In that case, the type provider behaves as follows:
```fsharp
type Values = JsonProvider<""" [{"value":94 }, {"value":"Tomas" }] """>

for item in Values.GetSamples() do
  match item.Value.Number, item.Value.String with
  | Some num, _ -> printfn "Numeric: %d" num
  | _, Some str -> printfn "Text: %s" str
  | _ -> printfn "Some other value!"
```
Here, the Value property is either a number or a string. The type provider generates a type that has an optional property for each possible type, so we can use simple pattern matching on the option<int> and option<string> values to distinguish between the two cases. This is similar to the handling of heterogeneous arrays.
Note that we have a GetSamples method because the sample is a JSON list. If it were a JSON object, we would have a GetSample method instead.
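The Values example can be mimicked with a hypothetical record that has one optional field per JSON type seen for "value". This is only a model of the generated type, not its actual definition; the match is the same one used in the loop above.

```fsharp
// Hypothetical model of the generated Value type: one optional field
// for each JSON type that occurs for "value" in the samples.
type NumberOrString = { Number: int option; String: string option }

let describeValue (value: NumberOrString) =
    match value.Number, value.String with
    | Some num, _ -> sprintf "Numeric: %d" num
    | _, Some str -> sprintf "Text: %s" str
    | _ -> "Some other value!"

let outputs =
    [ { Number = Some 94; String = None }
      { Number = None; String = Some "Tomas" } ]
    |> List.map describeValue
// outputs = [ "Numeric: 94"; "Text: Tomas" ]
```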
Loading WorldBank data
Let's now use the type provider to process some real data. We use a data set returned by the WorldBank, which has (roughly) the following structure:
```json
[ { "page": 1, "pages": 1, "total": 53 },
  [ { "indicator": {"value": "Central government debt, total (% of GDP)"},
      "country": {"id":"CZ","value":"Czech Republic"},
      "value":null,"decimal":"1","date":"2000"},
    { "indicator": {"value": "Central government debt, total (% of GDP)"},
      "country": {"id":"CZ","value":"Czech Republic"},
      "value":"16.6567773464055","decimal":"1","date":"2010"} ] ]
```
The response to a request contains an array with two items. The first item is a record with general information about the response (page, total pages, etc.) and the second item is another array that contains the actual data points. For every data point, we get some information and the actual value. Note that the value is passed as a string (for some unknown reason). Even though it is wrapped in quotes, the provider looks at the contents of the string and infers a numeric type, so we do not need to convert it manually (the same happens for date, which the code below prints as an integer).
The following sample generates a type based on the data/WorldBank.json file and loads it:
```fsharp
type WorldBank = JsonProvider<"../data/WorldBank.json">
let doc = WorldBank.GetSample()
```
Note that we can also load the data directly from the web, both in the Load method and in the type provider sample parameter, and there is an asynchronous AsyncLoad method available too:
```fsharp
let docAsync = WorldBank.AsyncLoad("http://api.worldbank.org/country/cz/indicator/GC.DOD.TOTL.GD.ZS?format=json")
```
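Because AsyncLoad returns an F# asynchronous workflow (Async<_>), nothing should be downloaded until the workflow is started, for example with Async.RunSynchronously or with let! inside another async block. The sketch below demonstrates this behavior without network access; fakeLoad is a hypothetical stand-in for AsyncLoad that merely counts how often its body runs.

```fsharp
// Counts how many times the workflow body has actually executed.
let loads = ref 0

// Hypothetical stand-in for an async loader such as AsyncLoad.
let fakeLoad (url: string) : Async<int> =
    async {
        loads.Value <- loads.Value + 1 // pretend a download happens here
        return url.Length
    }

let pending = fakeLoad "http://example.com" // only builds the workflow
let countBefore = loads.Value               // 0: nothing has run yet
let length = Async.RunSynchronously pending // starting it runs the body once
let countAfter = loads.Value                // 1
```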
The doc is an array of heterogeneous types, so the provider generates a type with Record and Array properties that return the record and the array, respectively. Note that the provider infers that there is only one record and one array, so each property returns a single value rather than a collection. We can print the data set as follows:
```fsharp
// Print general information
let info = doc.Record
printfn "Showing page %d of %d. Total records %d"
  info.Page info.Pages info.Total

// Print all data points
for record in doc.Array do
  record.Value |> Option.iter (fun value ->
    printfn "%d: %f" record.Date value)
```
When printing the data points, some of the values might be missing (in the input, the value is null instead of a valid number). This is another example of a heterogeneous type: the value is either a number or null. The provider therefore infers record.Value as an optional numeric value, and the code above uses Option.iter to print the result only when the data point is available.
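If you ever need to perform this conversion by hand, for example when reading the raw JsonValue, a minimal sketch might look like the following. The parseValue helper is hypothetical; it assumes None stands for a JSON null and parses the text with the invariant culture.

```fsharp
open System
open System.Globalization

// Hypothetical helper: turn an optional raw string (None = JSON null)
// into an optional decimal, treating unparsable text as missing.
let parseValue (raw: string option) : decimal option =
    raw
    |> Option.bind (fun text ->
        match Decimal.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture) with
        | true, value -> Some value
        | false, _ -> None)

// parseValue (Some "16.6567773464055") = Some 16.6567773464055M
// parseValue None = None
```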
Parsing Twitter stream
We now look at how to parse tweets returned by the Twitter API.
Tweets are quite heterogeneous, so we infer the structure from a list of inputs rather than from just a single input. To do that, we use the file data/TwitterStream.json (containing a list of tweets) and pass the optional parameter SampleIsList=true, which tells the provider that the sample is actually a list of samples:
```fsharp
type Tweet = JsonProvider<"../data/TwitterStream.json", SampleIsList=true>
let text = (omitted)
let tweet = Tweet.Parse(text)

printfn "%s (retweeted %d times):\n%s"
  tweet.User.Value.Name tweet.RetweetCount.Value tweet.Text.Value
```
After creating the Tweet type, we parse a single sample tweet and print some details about it. As you can see, the tweet.User property has been inferred as optional (meaning that a tweet might not have an author?), so we unsafely get the value using the Value property, which throws an exception if the value is missing. The RetweetCount and Text properties may also be missing, so we access them unsafely as well.
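When a missing member is a real possibility, the Option module or pattern matching is a safer choice than .Value. The records below are hypothetical stand-ins for the generated Tweet type, used only to show the technique.

```fsharp
// Hypothetical records mirroring the optional members used above.
type User = { Name: string }
type TweetInfo = { User: User option; RetweetCount: int option; Text: string option }

// Fall back to defaults instead of throwing when a member is missing.
let summarize (tweet: TweetInfo) =
    let author = tweet.User |> Option.map (fun u -> u.Name) |> Option.defaultValue "(unknown)"
    let retweets = tweet.RetweetCount |> Option.defaultValue 0
    let body = tweet.Text |> Option.defaultValue ""
    sprintf "%s (retweeted %d times):\n%s" author retweets body

let full = summarize { User = Some { Name = "fsharp" }; RetweetCount = Some 3; Text = Some "hi" }
let empty = summarize { User = None; RetweetCount = None; Text = None }
```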
Getting and creating GitHub issues
In this example, we will create JSON in addition to consuming it. Let's start by listing the 5 most recently updated open issues in the FSharp.Data repo.
```fsharp
type GitHub = JsonProvider<"https://api.github.com/repos/fsharp/FSharp.Data/issues">

let topRecentlyUpdatedIssues =
  GitHub.GetSamples()
  |> Seq.filter (fun issue -> issue.State = "open")
  |> Seq.sortBy (fun issue -> System.DateTime.Now - issue.UpdatedAt)
  |> Seq.truncate 5

for issue in topRecentlyUpdatedIssues do
  printfn "#%d %s" issue.Number issue.Title
```
And now let's create a new issue. Looking at the documentation at http://developer.github.com/v3/issues/#create-an-issue, we see that we need to post a JSON value similar to this:
```fsharp
[<Literal>]
let issueSample = """
{
  "title": "Found a bug",
  "body": "I'm having a problem with this.",
  "assignee": "octocat",
  "milestone": 1,
  "labels": [
    "Label1",
    "Label2"
  ]
}
"""
```
This JSON is different from what we got for each issue in the previous API call, so we'll define a new type based on this sample, create an instance, and send a POST request:
```fsharp
type GitHubIssue = JsonProvider<issueSample, RootName="issue">

let newIssue = GitHubIssue.Issue("Test issue",
                                 "This is a test issue created in F# Data documentation",
                                 assignee = "",
                                 labels = [| |],
                                 milestone = 0)
newIssue.JsonValue.Request "https://api.github.com/repos/fsharp/FSharp.Data/issues"
```
Related articles
- F# Data: JSON Parser and Reader - provides more information about working with JSON values dynamically.
- API Reference: JsonProvider type provider
- API Reference: JsonValue discriminated union