While collecting and working with data is a burning passion for some, for many it is simply another critical business requirement that is only getting more complicated. In this session, data guru Evelina Gabasova showed participants how the F# language can help make the task of working with data a little more manageable.
Gabasova is definitely a member of the passionate group of data workers. She is a postdoctoral researcher at MRC Cancer Unit, and works with a lot of data. But she is just as interested as anyone else in making the task of working with that data easier.
“I study the genome…and that can get very complicated,” she said. Gabasova said that she relies heavily on the use of F# to work with the massive amounts of data she comes across in her research, particularly because of its strong ability to parse scripts through active patterns. Active patterns allow coders to define input data with names that can be used in a pattern matching expression.
The features are strong with F#
In order to demonstrate how effective F# is, Gabasova demonstrated how, using publically available copies of the Star Wars scripts and an API called SWAPI – the “Star Wars API” – she was able to determine exactly who the most important character in the Star Wars universe is. SWAPI, which calls itself the “world’s first quantified and programmatically-accessible data source for all the data from the Star Wars canon universe,” was able to provide Gabasova with detailed information like characters’ heights, birth dates and other intimate details.
One of the biggest reasons Gabasova advocates F# is because of the availability of built-in type providers that can automatically provide the types, properties and methods you need to work directly with tables in, say, a SQL database. In this way, those working with diverse information sources without having to manually write repetitive lines of code or add on files with a code generator. And she proved how this worked by showing us how easy it was for her to determine the average height of a stormtrooper (5′ 9″, I believe?) and verify that Luke actually was a little short for a stormtrooper.
“F# makes it easy to specify certain attribute,” Gabasova said. “Type providers are amazing!”
May the visualizations be with you
One of Gabasova’s major points was the importance of visualizations. Without visualizations, she said, it is not possible to glean the insights you may want from your data. She then went on to reveal the in-depth visualizations she put together outlining the key “social network analysis” factors that determine the importance of a character, such as centrality and density of network.
“Whenever you do data analysis, always visualize it,” Gabasova said. “Always.”
…who is the most important character in Star Wars according to Gabasova’s research? In her words, “Darth Vader still rules the universe”.
Gabasova then went on to explain how these same F# techniques can be used within organizations to analyze the network connections that exist within their own companies. By taking data from things Slack communications, email and other social platforms, it may be a lot easier to garner serious insights into the social structure of your company with the help of F#’s unique features. However, she insists that F# can really be used for all types of data and projects.
“I would just encourage you to play with the data you have,” she said.
Sameer C. Thiruthikad, a software developer at the Qatar Foundationan and attendee of Gabasova’s talk, said he enjoyed the talk and the way Gabasova chose to present it.
“It was good,” he said. “It was really interesting because they used Star Wars to tell the story.”
Thiruthikad’s team currently makes use of C#, and said they face the very common challenge of organizing and visualizing large sets of data. But he said the session encouraged him to try and use F# going forward as a solution.
“I will surely get into F#,” he said. “I’ll just test the waters and see if there’s anything interesting there.”