F# Data: Freebase Provider

The Freebase graph database contains information on over 23 million entities, covering subjects that range from books and movies to historical figures and events to chemical elements, along with rich interconnections between the entities.

The Freebase type provider puts this information at your fingertips, giving you strongly-typed access to a treasure trove of data. This article provides a brief introduction showing some of the features.

This type provider is also used on the Try F# web site in the "Data Science" tutorial, so you can find more examples there. The Visual Studio F# Team Blog also has a series of four blog posts about it, and a recorded demo by Don Syme is available.

Introducing the provider

The following example loads the FSharp.Data.dll library (in F# Interactive) and initializes a connection to Freebase using the GetDataContext method:

#r "../../../bin/FSharp.Data.dll"
open FSharp.Data

let data = FreebaseData.GetDataContext()

Exploring Freebase data

Now you can explore the Freebase data schema by typing data. and exploring the available data sources in the autocomplete. For example, the following snippet retrieves the Chemical Elements and then looks at the details of Hydrogen:

let elements = data.``Science and Technology``.Chemistry.``Chemical Elements``

let all = elements |> Seq.toList
printfn "Elements found: %d" (Seq.length all)

let hydrogen = elements.Individuals.Hydrogen
printfn "Atomic number: %A" hydrogen.``Atomic number``

Generating test cases

There is a lot of different data available on Freebase, and you can use it for all kinds of purposes. The following snippet uses the database of celebrities to generate realistic names for testing purposes. First, we obtain two arrays: one containing 100 first names (based on names of celebrities) and another containing 100 surnames (from a Freebase list of last names):

let firstnames = 
    data.Society.Celebrities.Celebrities
    |> Seq.truncate 100
    |> Seq.map (fun celeb -> celeb.Name.Split([|' '|]).[0])
    |> Array.ofSeq

let surnames = 
    data.Society.People.``Family names``
    |> Seq.truncate 100
    |> Seq.map (fun name -> name.Name)
    |> Array.ofSeq

To generate realistic test case data, we now write a helper function that picks a random element from an array, and then concatenate a random first name with a random surname:

let randomElement = 
    let random = new System.Random()
    fun (arr : string[]) -> arr.[random.Next(arr.Length)]

for i in 0 .. 10 do
  let name = 
    (randomElement firstnames) + " " +
    (randomElement surnames)
  printfn "%s" name
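
The same technique can be tried without a Freebase connection. The following is a minimal sketch, assuming small hand-written arrays in place of the downloaded data; the names `sampleFirstNames`, `sampleSurnames`, `makePicker` and the fixed seed are hypothetical and were chosen only for illustration:

```fsharp
// Hypothetical placeholder data standing in for the arrays fetched from Freebase
let sampleFirstNames = [| "Ada"; "Grace"; "Alan" |]
let sampleSurnames = [| "Lovelace"; "Hopper"; "Turing" |]

// Build a picker that closes over a single Random instance.
// The fixed seed is an assumption of this sketch; it makes runs repeatable.
let makePicker (seed : int) =
    let random = System.Random(seed)
    fun (arr : string[]) -> arr.[random.Next(arr.Length)]

let pick = makePicker 42

// Combine a random first name with a random surname, five times
let sampleNames =
    [ for _ in 1 .. 5 -> pick sampleFirstNames + " " + pick sampleSurnames ]

sampleNames |> List.iter (printfn "%s")
```

Creating the Random instance once and capturing it in the returned function avoids re-seeding the generator on every call, which is the same design choice the randomElement helper makes.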

Querying Freebase data

In the previous examples, we used Seq functions to work with the collections returned by the Freebase type provider. This works in simple cases, but it is inefficient if we need to filter the data or perform other querying tasks.

However, the Freebase provider includes support for querying. Queries written using the F# 3.0 LINQ syntax are translated to MQL (the query language used by Freebase). This means you can write queries in F# 3.0 with auto-completion and strong typing and still have them execute efficiently on the server, at least for the queries that can be translated to MQL.

The following example returns stars, together with their distance from Earth (stars without known distance are skipped):

let astronomy = data.``Science and Technology``.Astronomy

query { for e in astronomy.Stars do 
        where e.Distance.HasValue
        select (e.Name, e.Distance) } 
      |> Seq.toList

To make the example shorter, we first defined a variable astronomy that represents the domain of astronomical data. We also need to add Seq.toList to the end to actually execute the query and get results back in a list.

The following query returns stars that have a known distance and are close to Earth:

query { for e in astronomy.Stars do 
        where (e.Distance.Value < 4.011384e+18<_>)
        select e } 
      |> Seq.toList

The query language supports a number of advanced operators in addition to simple where and select. For example, we can sort the stars by distance from Earth and then select the 10 closest stars:

query { for e in astronomy.Stars do 
        sortBy e.Distance.Value
        take 10
        select e } 
      |> Seq.toList
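
The where, sortBy and take operators behave the same way over ordinary in-memory collections, which makes it possible to experiment with the query syntax offline. The following is a minimal sketch, assuming a hypothetical `SampleStar` record in place of the provider's star type; the distances are rough figures in light-years used only for illustration, and nothing here is translated to MQL:

```fsharp
open System

// Hypothetical in-memory records standing in for Freebase star entities
type SampleStar = { Name : string; Distance : Nullable<float> }

let sampleStars =
    [ { Name = "Sirius"; Distance = Nullable 8.6 }
      { Name = "Unknown"; Distance = Nullable<float>() }
      { Name = "Proxima Centauri"; Distance = Nullable 4.2 }
      { Name = "Vega"; Distance = Nullable 25.0 } ]

// Skip stars without a distance, sort the rest, and keep the two closest
let closest =
    query { for s in sampleStars do
            where s.Distance.HasValue
            sortBy s.Distance.Value
            take 2
            select s.Name }
    |> Seq.toList

printfn "%A" closest
```

Filtering on HasValue before reading Value matters here: reading Value on an empty Nullable would throw when the query runs in memory.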

Freebase query operators

In addition to the standard F# 3.0 query operators, the namespace FSharp.Data.FreebaseOperators defines several Freebase-specific operators such as ApproximatelyMatches, ApproximatelyOneOf, ApproximateCount and Count. These are translated to specific MQL operations.

For example, the following snippet uses Count and ApproximateCount to count the number of US presidents (in this case, ApproximateCount is not very useful, because counting the exact number is efficient enough):

open FSharp.Data.FreebaseOperators

data.Society.Government.``US Presidents``.Count()
data.Society.Government.``US Presidents``.ApproximateCount()

The ApproximatelyMatches operator can be used, for example, when working with strings. The following snippet searches for books that have a name approximately matching the specified string:

let topBooksWithNameContaining (s:string) = 
    query { for book in data.``Arts and Entertainment``.Books.Books do
            where (book.Name.ApproximatelyMatches s)
            take 10 
            select book.Name }
 
topBooksWithNameContaining "1984" |> Seq.toList

Units of Measure

Units of measure are supported. For example, the Atomic mass property of chemical elements is automatically converted to SI units and exposed in kilograms. This is statically tracked in the F# type system using units of measure.

Here is an example from data about cyclones and hurricanes:

open Microsoft.FSharp.Data.UnitSystems.SI.UnitNames
open Microsoft.FSharp.Data.UnitSystems.SI.UnitSymbols

let cyclones = data.``Science and Technology``.Meteorology.``Tropical Cyclones``

// The type here is float<metre/second>, since the Freebase project uses normalized SI units
let topWind = cyclones.Individuals10.``Hurricane Sandy``.``Highest winds``

We can convert this figure to kilometres per hour (about 185 km/h) like this:

let distanceTravelledByWindInAnHour : float = topWind * 3600.0<second> / 1000.0<meter>
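
To see why the result is a plain float, it helps to run the same arithmetic on a literal value. The following is a minimal sketch, assuming a hypothetical wind speed of 51.4 m/s (chosen so that the result lands near the figure quoted above, not taken from Freebase):

```fsharp
open Microsoft.FSharp.Data.UnitSystems.SI.UnitNames

// Hypothetical wind speed standing in for the value returned by the provider
let sampleWind : float<metre/second> = 51.4<metre/second>

// (metre/second) * second / metre cancels every unit, so the result is a plain float
let kmPerHour : float = sampleWind * 3600.0<second> / 1000.0<metre>

printfn "%.1f km/h" kmPerHour
```

If either literal were written without its unit, the type annotation on `kmPerHour` would no longer match and the compiler would reject the line, which is the static tracking described above.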

Advanced provider features

The Freebase type provider has a number of features and it is beyond the scope of this introduction to discuss all of them. Some of the aspects were already demonstrated and more documentation can be found in the articles linked in the introduction. To give a brief summary, here is a list of features:

  • Many queries are translated efficiently into the MQL language. Those that cannot be translated execute on the client side by default.
  • A selection of sample individuals is given under the Individuals entry for each collection of objects. This allows you to program against strongly named individuals such as Hydrogen or Bob Dylan.
  • Custom Freebase operators such as approximate counts and approximate string matching are supported.
  • Image URLs are provided via the GetImages() method, and the first image is provided using the MainImage property.
  • Snapshot dates for Freebase are supported. This means that you can view the state of the Freebase database on a specific date (also meaning that your application will not break when the schema changes).
  • Optional client-side caching of schema information makes type checking quick and efficient.
  • If you want to query larger amounts of Freebase data, you can register at Google and obtain a custom API key. The key can be passed as a static parameter to the type provider.

Providing an API key

The Freebase API is rate limited, and initially you use a quota that is available for debugging purposes. A (403) Forbidden error indicates that you have hit the rate limit. To go further, you need an API key with the Freebase service enabled, which gives you 100,000 requests/day. The F# Data Library also provides the FreebaseDataProvider type, which allows you to specify several static parameters, including the API key:

[<Literal>]
let FreebaseApiKey = "<enter your freebase-enabled google API key here>"

//type FreebaseDataWithKey = FreebaseDataProvider<Key=FreebaseApiKey>
//let dataWithKey = FreebaseDataWithKey.GetDataContext()

Alternatively, you can set the FREEBASE_API_KEY environment variable, which will be used if you don't specify the Key parameter.
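
Before relying on the environment variable, it can be worth checking that the current process actually sees it. The following is a minimal sketch using the standard .NET Environment API; `describeApiKey` is a hypothetical helper written for this example and is not part of FSharp.Data:

```fsharp
open System

// Classify a raw environment value; null or blank is treated as "not set".
// describeApiKey is a hypothetical helper, not part of FSharp.Data.
let describeApiKey (value : string) =
    if String.IsNullOrWhiteSpace value then
        "FREEBASE_API_KEY is not set"
    else
        sprintf "FREEBASE_API_KEY is set (%d characters)" value.Length

// GetEnvironmentVariable returns null when the variable is missing
Environment.GetEnvironmentVariable "FREEBASE_API_KEY"
|> describeApiKey
|> printfn "%s"
```

The sketch prints only the length of the key rather than the key itself, so the output is safe to paste into a bug report.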

Further Individuals

As you saw above, individual entities can be addressed through the Individuals property. By default the first 1000 individuals are returned by Freebase. Three other versions of individuals exist - Individuals10 (containing 10,000 individuals), Individuals100 (containing 100,000 individuals) and IndividualsAZ (containing individuals bucketed by first letter of their name, with each bucket containing up to 10,000 individuals). Together these help provide alternative, more stable ways of scaling to larger tables, but where navigation may be slower.

data.``Science and Technology``.Astronomy.Stars.Individuals10.``Alpha Centauri A``

data.``Science and Technology``.Astronomy.Stars.IndividualsAZ.A.``Alpha Centauri A``

For example, there are at least 3,921,979 books in Freebase:

data.``Arts and Entertainment``.Books.Books.ApproximateCount()

Listing the first 100,000 reveals the Bible but is very, very slow:

// data.``Arts and Entertainment``.Books.Books.Individuals100.``The Bible``

This provides a stable and more efficient way of addressing that specific book:

data.``Arts and Entertainment``.Books.Books.IndividualsAZ.T.``The Bible``
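
The idea of bucketing by first letter can be illustrated with an ordinary list. The following is a minimal sketch of the concept only, assuming a handful of hypothetical titles; it is not how the provider implements IndividualsAZ:

```fsharp
// Hypothetical titles standing in for Freebase book names
let sampleTitles = [ "The Bible"; "Dracula"; "The Hobbit"; "Emma"; "Dune" ]

// Group names by their upper-cased first letter, one bucket per letter
let buckets =
    sampleTitles
    |> List.groupBy (fun title -> System.Char.ToUpperInvariant title.[0])
    |> Map.ofList

// Looking up one bucket only touches the names filed under that letter
printfn "%A" buckets.['T']
```

Each lookup deals with one small bucket instead of the whole collection, which is why navigating through a letter scales better than listing everything.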

Debugging MQL queries

If you want to understand how the Freebase type provider works, or if you want to debug a performance issue, it can be useful to see the requests that the provider sends to Freebase. You can do this by subscribing to the SendingQuery and SendingRequest events. The former fires for each overall Freebase MQL query, whose text can be run in the Freebase query editor. The latter fires for individual REST requests, including cursor-advancing requests and documentation requests.

data.DataContext.SendingQuery.Add (fun e -> 
  printfn "query: %A" e.QueryText)

data.DataContext.SendingRequest.Add (fun e -> 
  printfn "request: %A" e.RequestUri)

data.``Science and Technology``.Chemistry.
     ``Chemical Elements``.Individuals.Hydrogen.``Atomic mass``.Mass
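
The Add calls above use the standard F# event mechanism. The following is a minimal sketch of that pattern in isolation, assuming hypothetical `SampleQueryArgs` and `SampleContext` types that stand in for the provider's own types; only Event, Publish, Trigger and Add are real F# core library members:

```fsharp
// Hypothetical event arguments carrying the text of a query
type SampleQueryArgs(queryText : string) =
    member this.QueryText = queryText

// Hypothetical context that raises an event before "sending" a query
type SampleContext() =
    let sendingQuery = Event<SampleQueryArgs>()
    // Publish exposes the event so callers can attach handlers with Add
    member this.SendingQuery = sendingQuery.Publish
    member this.Run(queryText : string) =
        sendingQuery.Trigger(SampleQueryArgs queryText)

let context = SampleContext()
let seen = ResizeArray<string>()

// The handler both records and prints every query text it observes
context.SendingQuery.Add(fun e ->
    seen.Add e.QueryText
    printfn "query: %s" e.QueryText)

context.Run "sample MQL text"
```

Handlers attached this way run synchronously each time the event is triggered, so a logging handler sees requests in the order they are sent.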
