2014-09-15

Designing a faceted search for solar data

Ian Ruotsala

Abstract

In library, information and computer science, an ontology is a network of relationships. A well-known subset of an ontology is a "taxonomy," which is an ontology consisting of "is a" relationships. Biological taxonomies such as "a human is a primate is a mammal is an animal" provide a concrete example. While taxonomies have often been used by software developers as a basis for categorizing a set of items, more general ontologies can provide a richer description of relationships within a set.
Faceted searches provide a way to navigate through instances of a more general ontology and have been increasingly used by Internet merchants such as Amazon to provide intuitive ways of searching and browsing through their products. For this project, an implementation of faceted search was attempted for solar physics data. Using a web browser as a platform, a JavaScript program was to be developed in which a database of solar events (e.g. flares, sunspots) and observations (e.g. which instrument was used, which astronomer cataloged the event) would be queried so that data could be accessed in a dynamic, intuitive way.
Keywords:
faceted search, ontology, search, JavaScript, software development, web application, helioinformatics, solar physics

Background: ontologies and faceted search

Taxonomies are constrained ontologies

Imagine one were to enumerate a set of animal species. One could classify the species according to their types, e.g. a human is a primate, which is a mammal, which is an animal.

This is an example of a taxonomy, a network of "is a" relationships.
A taxonomy is indeed useful, yet one can imagine many relationships among species other than type. One could describe "lion preys upon gazelle" or "hyena competes with lion". This more thorough network of relationships is an example of an ontology.
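To make the distinction concrete, here is a minimal sketch in Haskell. Every name in it (Relation, Fact, ontology, taxonomy, ancestors) is made up for illustration, and the "is a" facts about lions, gazelles and hyenas are assumed for the example. The ontology is a list of subject-relation-object triples, and the taxonomy is simply the subset whose relation is "is a".

```haskell
-- Hypothetical names throughout; a sketch, not taken from any library.
data Relation = IsA | PreysUpon | CompetesWith
  deriving (Eq, Show)

-- One edge of the network: subject, relation, object.
type Fact = (String, Relation, String)

-- Illustrative facts only.
ontology :: [Fact]
ontology =
  [ ("lion", IsA, "mammal")
  , ("gazelle", IsA, "mammal")
  , ("hyena", IsA, "mammal")
  , ("mammal", IsA, "animal")
  , ("lion", PreysUpon, "gazelle")
  , ("hyena", CompetesWith, "lion")
  ]

-- The taxonomy is the part of the ontology made of "is a" edges only.
taxonomy :: [Fact] -> [Fact]
taxonomy facts = [f | f@(_, IsA, _) <- facts]

-- Follow "is a" edges upward from a starting term.
-- Assumes the "is a" edges contain no cycles.
ancestors :: [Fact] -> String -> [String]
ancestors facts x =
  concat [parent : ancestors facts parent | (child, IsA, parent) <- facts, child == x]
```

In GHCi, ancestors ontology "lion" gives ["mammal","animal"], while the two non-"is a" facts only show up when the whole ontology is consulted.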

Faceted search is implemented by Internet merchants

It is common for Internet merchants to categorize their merchandise in an ontology, so that a faceted search may be implemented. Whereas traditional type classification encouraged hierarchical browsing according to a single relationship ("is a"), faceted search is a way of browsing through an ontology laterally, using many different relationships.
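As a rough sketch of the idea, assuming a made-up catalogue in which each item is just a map from facet name to value (Item, items, select and facetCounts are all hypothetical names), narrowing by facets amounts to filtering on any combination of them, and counting the remaining values of a facet tells the interface what to offer next.

```haskell
import qualified Data.Map.Strict as Map

-- An item is described by facet name -> value (hypothetical catalogue).
type Item = Map.Map String String

items :: [Item]
items = map Map.fromList
  [ [("kind", "book"), ("format", "paperback"), ("language", "English")]
  , [("kind", "book"), ("format", "hardcover"), ("language", "French")]
  , [("kind", "film"), ("format", "DVD"), ("language", "English")]
  ]

-- Keep the items that agree with every selected facet value.
select :: [(String, String)] -> [Item] -> [Item]
select chosen = filter matches
  where matches item = all (\(k, v) -> Map.lookup k item == Just v) chosen

-- Count how many items carry each value of one facet,
-- e.g. to label the choices offered for the next narrowing step.
facetCounts :: String -> [Item] -> Map.Map String Int
facetCounts facet xs =
  Map.fromListWith (+) [(v, 1) | Just v <- map (Map.lookup facet) xs]
```

Selecting [("language", "English")] keeps two of the three items, and facetCounts "kind" on that result reports one book and one film. No facet is privileged, which is what separates this from browsing a single "is a" hierarchy.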

Helioinformatics

With increasingly cheap computer storage and processing, scientific studies generate ever more massive amounts of data. In solar physics, the field known as "helioinformatics" uses various tools of information technology, including ontologies, to manage that data better.

Ontology of solar data

During the summer of 2009, I interned at the Lockheed Martin Solar & Astrophysics Laboratory (LMSAL). LMSAL has a database of solar data which is accessible via a web API (application programming interface). My task over that summer and into this school year has been to implement a faceted search for their solar data using that API.

Implementation challenges and achievements

Background: description of current system

LMSAL maintains a database of solar events and observations, the Heliophysics Event Registry (HER). The HER can be queried through a web API, which returns a JSON file describing the events and observations that fall within the specified parameters, along with their relationships to other events and observations. JSON is a data-interchange format (à la XML) as well as a subset of JavaScript, which makes it especially convenient to use from that language. My faceted search was to be incorporated into their present search interface at http://www.lmsal.com/helio-informatics/hpkb/

Initial approach

The initial idea was to structure the search as a tree, with a description of the currently searched set at the root and, at each leaf node, a description simple enough to be expressed as a query to the more limited web API. Various intermediary nodes would mediate between the leaves and the root.

Changing approach

I think that I did not fully understand how faceted searching worked when I decided on the initial approach. Fortunately, I decided on a better approach; unfortunately, it was not until midway through my project. With my new approach, the user would be presented with every event and observation within a timespan. The user could then narrow the set down by selecting specific features.
A challenge arose because the API returns at most 200 events per query. My way around this was to have each of the roughly twenty event types issue its own query; so, e.g., the "solar flare" type would only search for solar flares within a timespan, the sunspot type only for sunspots, and so on. Once returned, the various events would be stored together so that events of different types but with the same properties (spots and flares both have grid locations, for example) could be found in the appropriate search.
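The workaround can be sketched roughly as follows. This is not the project's JavaScript and does not call the real API: the Event record, its field names, and cappedQuery (a stand-in that imitates the per-query cap over an in-memory list) are all assumptions made for illustration.

```haskell
-- Hypothetical event record; the field names are not the HER's.
data Event = Event
  { eventType    :: String
  , gridLocation :: String
  } deriving (Eq, Show)

-- The per-query cap described in the text.
queryLimit :: Int
queryLimit = 200

-- Stand-in for the web API: given an event type, return events of that type.
type Query = String -> [Event]

-- Imitates the cap: at most queryLimit events of the requested type come back.
cappedQuery :: [Event] -> Query
cappedQuery db t = take queryLimit [e | e <- db, eventType e == t]

-- One query per event type, with the results pooled into a single list.
fetchAll :: Query -> [String] -> [Event]
fetchAll query = concatMap query

-- A facet that cuts across types: everything at a given grid location.
atLocation :: String -> [Event] -> [Event]
atLocation loc = filter ((== loc) . gridLocation)
```

With 150 flares and 150 sunspots in the timespan, fetchAll returns all 300 events, and atLocation then finds flares and spots side by side. The cap still applies within each type, so a type with more than 200 events in the timespan would be truncated unless the timespan were also split.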

Obstacles to completion

There were a variety of hindrances that kept this project behind schedule. Most significant among them, I underestimated the steepness of JavaScript's learning curve. JavaScript is described as having "Java-like syntax," and I foolishly thought that, since I could already program in Java, JavaScript would easily come to me. Despite the surface similarity of the two languages, they are, in fact, quite dissimilar: JavaScript is intended to be run on web browsers, while Java is run on a JVM; contrary to Java, there are no classes in JavaScript, only instantiated objects; and so on.

Coda

Sadly, this project is not near completion as of this writing. It was, however, a valuable learning experience. My plan is to continue working on this project even after the end of the school quarter, with the hope that, with a modicum of extra effort, I can complete it.

Thanks

I would like to thank the many people whose insight, assistance and understanding brought this project along, including:

EJ Zita (The Evergreen State College) for providing the opportunity and preparation for the solar physics internship

Neal Hurlburt (Lockheed Martin Solar & Astrophysics Laboratory) for his guidance on the direction of this project

Neal Nelson, Sherri Shulman and Richard Weiss (The Evergreen State College), my professors, for helping me mature as a programmer and student of computer science.

2010-10-23

Discovery Park blackberry clearing

This is a write-up I did for a day of volunteering while I was at Edmonds Community College.


I arrived at EdCC at noon for the carpool, one of about a dozen volunteers braving the wind and rain to clear an invasive blackberry species from around the Daybreak Star Indian Cultural Center in Discovery Park in Seattle. Our small convoy made its way through Shoreline, then Greenwood, Ballard and Magnolia until we found ourselves in the lovely verdancy of the park. It was pleasantly serene despite the driving rain and the fact that it's located in a city of half a million. In fact, the low-hanging clouds evoked an eerie beauty by obscuring the other side of Puget Sound; I, being a science fiction nerd, jokingly entertained the thought that we had passed through a warp in space and were actually by the ocean, rather than the Sound.









We entered the Center and signed into a guestbook that, among other things, provided us with insurance from the City of Seattle in case any of us were hurt while working (though everyone seemed quite safety-conscious). A volunteer in front of me, from a group not from EdCC, asked a friend what their address was. I was a bit surprised that someone would have to ask what their address was, and wondered if they had just moved. When it was my turn to sign in, I saw why he didn't know what to put for an address: he belonged to a group of volunteers from Tent City 4. Contrary to the stereotype of homeless people as listless alcoholics, the volunteers from Tent City 4 ended up working as hard as us students.



Before beginning, Human Ecology instructor and HELP Club faculty advisor Tom Murphy gave us a brief history of Daybreak Star Center. We were then led outside, back into the rain, where Tom Murphy and EdCC student Johnny Robbins explained to us the difference between the invasive Himalayan blackberry and the native salmonberry and other native berries. The Himalayan blackberry had been imported to the Pacific Northwest to produce jam and wine, but escaped into the wild and began to dominate native species that had no adaptations against it. So, we picked up hoes and clippers, and set to work against the Himalayan.



I, along with several other people, used the hoes to uproot the blackberries--leaving behind the roots would just allow the plant to grow back. Then, the group with the clippers would follow, cutting the uprooted plant into small pieces. Finally, the blackberry pieces would be collected for mulching--if we allowed large pieces to lie on the ground, they would reroot, making a whole new bush (when told this, I thought of starfish or grey goo).



We did this for three hours, getting soaked in the process. But, besides the sense of accomplishment imbued by maintaining the natural ecosystem against human mistakes, we were rewarded with pizza at the end of the work. I unfortunately wasn't able to get any pictures of the weed clearing, since heavy rain isn't the best thing for digital cameras, but I did get a few of the pizza dinner.



I, along with Johnny Robbins and EdCC student Kacie McCarty, rode home in CWU student and HELP Club president Garrett Jenkins' car. As we were about to cross the Fremont Bridge, we noticed a man with a stalled car, alligator clips on his battery, but no one stopping to give him a jump. So, Garrett stopped, clamped the wires to his battery, and gave the guy a jump. As our car and the formerly-stalled car went their separate ways, someone (I can't remember which of us) suggested that doing one good deed (i.e. maintaining the ecology) puts you in the mood to do another good deed. I think this is partly true, but also think that just being in a car with friends makes you more likely to stop in the middle of a crowded road to give assistance; whereas most single-occupancy drivers were probably in a fairly foul mood, stuck in traffic with only the radio to entertain them, we had been having an engaging conversation. So, the extra few minutes to give a stranger a jump didn't feel onerous at all when we were among friends.



I am actually fairly introverted, though it may not seem so when I'm sharing my thoughts on a public blog. So, for a shy person like me especially, I think this is an example of something, even more rewarding than pizza, that happens to me when volunteering: establishing friendships.

2010-10-09

self-evaluation: CSF

At TESC students write self-evaluations of their progress/work at the end of quarters. Here is my self-eval from June of 2009 for Computer Science Foundations, Evergreen's "CS 101"-type class.


I have had programming classes prior to taking this class, so I already had a modicum of experience with computer science coming into CSF. Even with this prior knowledge, I have learned a delightful number of new concepts and skills in these two quarters. I think what is especially exciting is that this is merely an introductory class, meaning there is still so much more I can learn about this field. This was hardly an easy class; it demanded a significant investment of time and thought. I think, however, the fact that it was such a difficult class makes it all the more rewarding now that I have successfully completed it.

The class was quite well-balanced between theory and practice. The discrete math component was definitely the most theoretical. Although my success at some future programming job may not depend upon how well I can write a proof, the discrete math section was very useful in helping me to understand how computer science actually is just that: a science and not just another way of saying "computer programming". In the seminar section, we studied how computing technology interacted with the real world, both on the design-side by studying open source vs. closed source environments; and on the user-side, by considering how software affects society as a whole. I think it is quite helpful to be reminded that we actually do affect the rest of the world, that we are not cloistered in some ivory tower. The hardware component was probably the most challenging for me. I think it is useful to know how real-world computers function at low, unabstracted levels; but, I must admit that working at such low levels can sometimes be tedious. The Java component provided the most practical skills, in terms of what I assume my tasks at a future programming job would be. In this section, we began at such basic levels as loop structures, but very quickly progressed to more advanced topics, e.g. the various data structures and OOP paradigms. 

One particularly interesting project I worked on this quarter was an airport simulator written in Java. In fact, I ended up doing much more than was required. It began when early in the quarter my professors made off-hand references to it. I found it appealing in part because I have a somewhat nerdy obsession with transportation, whether it be planes, city buses or trains. Maybe a week later they gave some preliminary assignments, but I decided I'd start the entire project *right then*. So, I checked out some civil engineering books from the library while doing Internet research on how airports are set up and run. I wanted this "simulation" to be a fairly accurate model of reality. Though some people may have regarded reading a bunch of engineering texts as a bore and a hassle, I thought it was a great opportunity to gain some domain knowledge. I sometimes wish there were a concept of a "liberal science major," like liberal arts major, except that you studied to be a generalist in science and engineering instead of the humanities--I love studying computation and making software, but there are so many other fields in science and technology I have studied little, and where I naturally want to sweep away my ignorance.

Sieve

I have been learning Haskell for a mere two weeks in class, but am already quite delighted with it. Here is one of the toy programs I wrote today.




{-
sieve.hs, Haskell implementation of sieve of Eratosthenes:
http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes

this program is somewhat sub-optimal--e.g., I find all the composite numbers to n * p, rather than n.
I blame the fact that
1. this is totally for fun
2. I'm still very n00b at Haskell
-}

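-- All primes up to n: every candidate that is not in the list of composites.
doSieve :: Int -> [Int]
doSieve n = [p | p <- [2 .. n], p `notElem` composites]
  where composites = compositeNumb n 2

-- Multiples of p, p + 1, ... for every p with p^2 <= n (these run past n).
compositeNumb :: Int -> Int -> [Int]
compositeNumb n p =
  if n < p^2
    then []
    else [p * mults | mults <- [2 .. n]] ++ compositeNumb n (p + 1)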
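The header comment admits that the composites run up to n * p rather than n. One possible way to stop at n is sketched below. It is a hypothetical variant with new names (boundedComposites, boundedSieve), it keeps the same "collect the composites, then filter them out" approach, and it is still not the classic in-place sieve.

```haskell
-- Hypothetical variant of doSieve; the names are new.
-- Multiples of each p with p * p <= n, starting at 2 * p and never exceeding n.
boundedComposites :: Int -> [Int]
boundedComposites n =
  [m | p <- takeWhile (\q -> q * q <= n) [2 ..], m <- [2 * p, 3 * p .. n]]

-- Primes up to n: the candidates that never appear among the composites.
boundedSieve :: Int -> [Int]
boundedSieve n = [p | p <- [2 .. n], p `notElem` composites]
  where composites = boundedComposites n
```

In GHCi, boundedSieve 30 gives [2,3,5,7,11,13,17,19,23,29]. The membership test is still a linear scan of a list, so this trims the wasted multiples without changing the overall character of the toy.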