Motivation

About 10 years ago I realized that I knew nothing at all about history. This revelation annoyed me, so I started reading books, one after the other: the history of England, France, the Habsburgs, the Russians, ... However, after finishing a number of books, I started to feel frustrated.

I don't know if you have read the book One Hundred Years of Solitude by Gabriel García Márquez. The edition I had included a family tree. The Buendía family is quite large, and without a family tree one could easily get lost. I concluded that a simple family tree keeps me in context, so that I feel confident while reading the story.

History is more complicated than the Buendía family, and can hardly be visualized with a tree. In real life, people and dynasties are more tightly coupled to each other, and understanding the relations among them is of utmost importance.

As an example, let's consider a famous English-Spanish conflict. The successors of the English king Henry VIII faced a serious conflict with Spain. The king of Spain happened to be a Habsburg at that time. You can encounter this story in many different books: in the history of England, the history of Spain, the history of the Habsburgs, ... In the different books you read the same thing from different angles. Many countries and many dynasties are involved, and one can easily get lost. But if we had a tool that visualized the 'context' for us, we could glance at it while reading and regain our confidence. I desperately needed such a tool, one with which I could visualize people, relations and events, and which would put me into 'context'.

For the time being, the Wiking tool solves my problem partially. To explain how, let's consider the Henry VIII case. The following figure shows what I would generate for myself with Wiking when reading about the aforementioned conflict:

After playing with the tool for 5 minutes, I was able to prepare this figure. I was interested in the English-Spanish conflict, but I found even more. I can now understand lots of details about the relations among people and dynasties. Surprisingly, there are not that many hops between the Tudors, the Austrian and Spanish Habsburgs, the Aragons, the Bourbons, ... Every time I read about these times, I can have a quick look at this figure, and I am in the 'context'. I know who is who, and I don't get lost in the details.

The Wiking tool got its name from Wikipedia Kings. It is a tool for exploring Wikipedia, much like the Vikings explored the seas. It helps you improve your understanding of history by giving you a visual window onto Wikipedia, through which you can easily browse its excellent content.

I have several other tools in mind. They will also come to the L3 Portal in the future.

Usage

Using the Wiking tool is very simple. Just enter the name of a Wikipedia article about a historically or politically 'important' person into the textbox on the right, and press Enter. The corresponding Wikipedia article will be retrieved and parsed, and the person will be visualized together with her/his connections. The visualization is based on a so-called force-directed graph. The nodes behave like planets with gravity and particles with electric charge. By left-clicking a node and dragging it to the desired position, we can fix that node in place. When right-clicking a node, a context menu is displayed with the following options:

Double-clicking a node selects it. After selecting several nodes, right-clicking on the display window brings up another context menu. It has a single option, which hides all the selected nodes. This is a convenience feature.

The thick lines between people mean a spouse relationship, and the thin lines with arrows mean a parent-child relationship. The arrow points from the child to the parent.
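To give a feeling for how a force-directed graph settles into shape, here is a minimal Python sketch of one layout iteration. It is not the code used by the Wiking tool. It assumes a common formulation in which every pair of nodes repels like electric charges and every line (spouse or parent-child) pulls its two ends together like a spring. The function name layout_step and all of its parameters are hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Point = Tuple[float, float]


def layout_step(
    pos: Dict[str, Point],
    edges: Iterable[Tuple[str, str]],
    fixed: Set[str] = frozenset(),
    repulsion: float = 1.0,
    spring: float = 0.1,
    rest_length: float = 1.0,
    step: float = 0.1,
) -> Dict[str, Point]:
    """Return new positions after one iteration of a simple force model."""
    force = {name: [0.0, 0.0] for name in pos}
    names = list(pos)

    # Every pair of nodes pushes each other apart (inverse-square repulsion).
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            push = repulsion / (dist * dist)
            force[a][0] += push * dx / dist
            force[a][1] += push * dy / dist
            force[b][0] -= push * dx / dist
            force[b][1] -= push * dy / dist

    # Every line acts like a spring pulling its ends towards rest_length.
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        dist = math.hypot(dx, dy) or 1e-9
        pull = spring * (dist - rest_length)
        force[a][0] += pull * dx / dist
        force[a][1] += pull * dy / dist
        force[b][0] -= pull * dx / dist
        force[b][1] -= pull * dy / dist

    # Nodes fixed by dragging keep their position; the others move.
    return {
        name: pos[name] if name in fixed else (
            pos[name][0] + step * force[name][0],
            pos[name][1] + step * force[name][1],
        )
        for name in pos
    }
```

Calling layout_step repeatedly moves connected people closer together and unrelated people further apart, while the nodes listed in fixed stay where the user dropped them.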

How it works

Every time you request a wiki page (either by entering the page name into the textbox, or by right click + Extract on a person), a Wikipedia page has to be fetched and processed. However, every wiki page is read from Wikipedia and parsed only once. Once a page is read, the parsed info gets cached in our system. The next time someone asks for the same person, the data is retrieved from our local cache, and we won't contact Wikipedia. Consequently, the more we use the Wiking tool, the more Wikipedia pages get cached locally, and the faster our application will be (getting data from our local cache is significantly faster than reading pages from Wikipedia).
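The caching idea described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the cache is an in-memory dictionary rather than the real storage, and the class name PageCache and the fetch_and_parse callback are hypothetical names.

```python
from typing import Callable, Dict


class PageCache:
    """Read-through cache: each page is fetched and parsed at most once."""

    def __init__(self, fetch_and_parse: Callable[[str], dict]) -> None:
        # fetch_and_parse stands in for "read the page from Wikipedia and parse it".
        self._fetch_and_parse = fetch_and_parse
        self._store: Dict[str, dict] = {}
        self.remote_reads = 0  # counts how often Wikipedia had to be contacted

    def get(self, title: str) -> dict:
        if title not in self._store:
            # Cache miss: contact Wikipedia once and remember the parsed result.
            self.remote_reads += 1
            self._store[title] = self._fetch_and_parse(title)
        # Cache hit (or freshly filled entry): served locally.
        return self._store[title]
```

Asking for the same title twice triggers only one remote read; the second call is answered from the local store.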

Technical background

When the user requests a person, the following sequence is executed:

Wikipedia pages about historical people typically contain an infobox in the top right corner. At the moment we parse infoboxes with the template names 'royalty', 'nobility', 'monarch', 'officeholder', 'president', 'person' and 'prime minister'. These templates contain more or less the same information, so the parser logic is the same for all of them.

It is important to note that reading one wiki page results in more than one node in the Neo4j database and in the web browser window. The infobox of a requested page contains references to many other people: issue, spouses, parents, predecessors, successors. We parse, store and visualize all of these people. However, there is an important difference: the main person, about whom the page is written, has more information on the page: date of birth, date of death, roles (e.g. king of England), ... The referred people have only a name. For instance, if we parse the page about Henry VIII, we have lots of information about him, but only the names of the people related to him. We store all this data in Neo4j. The next time someone requests a person related to Henry VIII, we see that Neo4j holds only partial data, so we fetch the appropriate Wikipedia page, parse it, and merge the result into Neo4j.

This means that every node in Neo4j has a lifecycle. At first it might contain only limited data (a name), and later full data. There is one additional lifecycle step. We want to be able to determine the gender of people, but this information is not present in the infoboxes. What we can do is identify the father and mother of the current person when we read a wiki page, and mark the father and mother nodes in Neo4j as 'male' and 'female' respectively.
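To illustrate the infobox step, here is a deliberately naive Python sketch. It is not the parser used by the Wiking tool. It assumes that the infobox has one parameter per line and contains no nested templates, which real wikitext often violates. The function name parse_infobox is hypothetical; the template names are the ones listed above.

```python
import re
from typing import Dict, List, Optional

SUPPORTED = {"royalty", "nobility", "monarch", "officeholder",
             "president", "person", "prime minister"}

# Matches [[Target]], [[Target|label]] and [[Target#section|label]];
# only the link target is captured.
LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def parse_infobox(wikitext: str) -> Optional[Dict[str, List[str]]]:
    """Map each infobox field to the list of article titles it links to."""
    start = re.search(r"\{\{\s*Infobox\s+([^|\n}]+)", wikitext, re.IGNORECASE)
    if not start or start.group(1).strip().lower() not in SUPPORTED:
        return None  # no infobox, or a template we do not handle

    fields: Dict[str, List[str]] = {}
    for line in wikitext[start.end():].splitlines():
        line = line.strip()
        if line.startswith("}}"):
            break  # end of the infobox (assumes no nested templates)
        field = re.match(r"\|\s*([\w ]+?)\s*=\s*(.*)", line)
        if field:
            targets = [t.strip() for t in LINK.findall(field.group(2))]
            fields[field.group(1).lower()] = targets
    return fields
```

Each linked title found in fields such as father, mother or spouse would become a name-only node, while the remaining fields describe the main person of the page.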

To recap, each node in Neo4j (and in the web browser) has a lifecycle. A node might have basic information only, full data, and additional gender information. This explains why some nodes are blue or red in the browser, and others are gray.
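The node lifecycle can be modelled roughly as follows. This Python sketch uses a plain dictionary in place of Neo4j, and every name in it (Person, PeopleGraph, merge_page, colour) is hypothetical. The colour mapping (blue for male, red for female, gray for unknown) is an assumption, not something confirmed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Person:
    name: str
    born: Optional[str] = None
    died: Optional[str] = None
    roles: List[str] = field(default_factory=list)
    gender: Optional[str] = None  # only known once seen as a father or mother
    full: bool = False            # True once the person's own page was parsed


class PeopleGraph:
    def __init__(self) -> None:
        self.people: Dict[str, Person] = {}

    def stub(self, name: str) -> Person:
        """Return the existing node, or create a name-only one."""
        return self.people.setdefault(name, Person(name))

    def merge_page(self, name: str, born: str, died: str, roles: List[str],
                   father: Optional[str] = None,
                   mother: Optional[str] = None) -> Person:
        """Merge a parsed page: upgrade the main person, mark the parents."""
        person = self.stub(name)
        person.born, person.died = born, died
        person.roles, person.full = list(roles), True
        if father:
            self.stub(father).gender = "male"
        if mother:
            self.stub(mother).gender = "female"
        return person


def colour(person: Person) -> str:
    # Assumed mapping, for illustration only.
    return {"male": "blue", "female": "red"}.get(person.gender, "gray")
```

In this model, parsing the page of Henry VIII leaves him gray with full data, while his father becomes a blue, name-only node that is upgraded to full data when his own page is requested later.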

Next steps, known issues

I have dozens of ideas on how to improve the Wiking tool. All these improvements are on the agenda, and I will implement them one by one to have a decent tool at the end of the road.

Usability

The tool in its current form lacks many convenience features, such as

Mobile devices support

Currently the tool is not designed for mobile devices. It gives the best user experience on laptops and desktop computers. I tested it on a 10.5" Android tablet: it works, but the user experience is not good yet.

Languages

Only the English Wikipedia can be browsed at the moment. It would be a nice feature to support other languages too.

Better Wikitext parsing

The wikitext parser I implemented is very basic. It should be revised significantly. If more than one alias points to the same Wikipedia article, the Wiking tool visualizes them as separate people.

Neo4j issues

The tool uses the default Neo4j transaction isolation level, Read Committed. In very rare cases this can lead to deadlocks. If 2 users simultaneously request 2 Wikipedia articles that are related to each other, and neither of them exists in Neo4j yet, then after wikitext parsing the 2 graphs are persisted to Neo4j in parallel. As these new graphs overlap, it can happen that of 2 overlapping nodes, one is created (and locked) by the first graph and the other by the second graph, and then the two transactions wait for each other. I don't think this case will come up often.
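One common way to avoid this kind of deadlock in general is to make every writer take its locks in the same order. The Python sketch below illustrates the idea with ordinary thread locks. It is not what the Wiking tool does today and it does not use the Neo4j API; NodeLockTable and persist are hypothetical names, and the locks merely stand in for the per-node locks a database would take.

```python
import threading
from typing import Callable, Dict, Iterable, List


class NodeLockTable:
    """In-process stand-in for per-node write locks."""

    def __init__(self) -> None:
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, name: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(name, threading.Lock())

    def persist(self, node_names: Iterable[str],
                write: Callable[[List[str]], None]) -> None:
        # Sorting gives all writers the same locking order, so two
        # overlapping graphs can no longer wait for each other in a cycle.
        ordered = sorted(set(node_names))
        locks = [self._lock_for(name) for name in ordered]
        for lock in locks:
            lock.acquire()
        try:
            write(ordered)
        finally:
            for lock in reversed(locks):
                lock.release()
```

With this ordering, two requests that touch the same two people in opposite order simply run one after the other.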

Intelligent graph queries

Once we have synchronized a large amount of data from Wikipedia to the local Neo4j database, we could run graph queries on the data. The Neo4j data model is a graph, and lots of nice queries are possible, such as finding the shortest path between two people, ... There is a third tab next to "WIKING" and "ABOUT" called "CYPHER". Those familiar with Cypher can issue Cypher queries against Neo4j there. This feature is included for those who are more interested in the internals.
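To show what a shortest path between people means, here is a small Python sketch that runs a breadth-first search over a hand-entered list of relations. In the tool itself such a question would be asked of Neo4j; the function name shortest_path and the tiny sample data are only illustrative, and spouse and parent-child lines are treated alike as undirected links.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def shortest_path(links: Iterable[Tuple[str, str]],
                  start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns a path with the fewest hops, or None."""
    neighbours: Dict[str, Set[str]] = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back through the predecessors to rebuild the path.
            path: List[str] = []
            step: Optional[str] = current
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for nxt in neighbours.get(current, ()):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


# A hand-picked subset of relations, not data exported from the tool.
LINKS = [
    ("Henry VIII", "Catherine of Aragon"),
    ("Catherine of Aragon", "Ferdinand II of Aragon"),
    ("Joanna of Castile", "Ferdinand II of Aragon"),
    ("Charles V, Holy Roman Emperor", "Joanna of Castile"),
    ("Philip II of Spain", "Charles V, Holy Roman Emperor"),
    ("Mary I of England", "Henry VIII"),
    ("Mary I of England", "Philip II of Spain"),
]
```

Calling shortest_path(LINKS, "Henry VIII", "Philip II of Spain") finds the two-hop connection through Mary I of England rather than the longer route through the house of Aragon.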

More Wikipedia content utilized

At the moment we visualize the family relationships between people. We don't yet parse and visualize the predecessor and successor information correctly. It would be a nice feature to ask for all the members of a certain dynasty, or all the English kings, ... We also don't parse events such as the Thirty Years' War or the Battle of Waterloo. It would be nice to request information based on such events too. There is lots of interesting information in Wikipedia that we could utilize.

Wikidata

I was not aware of the Wikidata project when implementing the wiki parser. With Wikidata it is likely easier to get structured Wikipedia data. I will consider using it in the future.

Contribution

If you are interested in the Wiking tool, there are a number of ways to contribute and make it better:

Contact

Please contact me with any questions, suggestions or concerns.

ivan.brencsics@gmail.com