Tutorial: How to bulk import Wikipedia data into the Neo4j graph database programmatically

Lately I have been getting my hands dirty with Neo4j, the world's leading graph database. So as not to mess up any production data, I thought it would be nice to import a large sample dataset into Neo4j and play around with it while learning how Neo4j works. So I wrote a simple tool, Neo4jDataImport, which first downloads a Wikipedia sample dataset (you can choose a small or a large file; the Wikipedia dataset averages around 10 GB uncompressed), then parses it and imports it into a Neo4j database. For details on how to build and run the program, please refer to the README file included with it.
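The post doesn't show how Neo4jDataImport parses the dump, so what follows is only a sketch of one way to do it in Python with the standard library: stream `<page>` elements from a MediaWiki XML dump with `xml.etree.ElementTree.iterparse`, pull `[[wikilinks]]` out of each page's wikitext with a regular expression, and group the links into batches for a parameterised `UNWIND` statement. The `:Page` label and `title` property mirror the Cypher query used later in this post; the `:Link` relationship type, the batch size and every function name here are assumptions, not the tool's actual code.

```python
import io
import re
import xml.etree.ElementTree as ET
from typing import Iterator, Optional

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]], capturing only
# "Target". Same-page anchors like [[#History]] are skipped; nested links are not handled.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

# Hypothetical import statement: the :Link relationship type is an assumption.
IMPORT_CYPHER = """
UNWIND $rows AS row
MERGE (a:Page {title: row.src})
MERGE (b:Page {title: row.dst})
MERGE (a)-[:Link]->(b)
"""


def local_name(tag: str) -> str:
    """Strip the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source) -> Iterator[tuple[str, str]]:
    """Stream (title, wikitext) pairs from a MediaWiki XML dump, page by page."""
    title: Optional[str] = None
    text = ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title:
                yield title, text
            elem.clear()  # drop the finished page's children to keep memory usage low
            title, text = None, ""


def extract_links(wikitext: str) -> list[str]:
    """Return the link targets found in a page's wikitext, in order of appearance."""
    return [m.group(1).strip() for m in WIKILINK.finditer(wikitext)]


def link_batches(source, size: int = 1000) -> Iterator[list[dict]]:
    """Group {src, dst} link rows into lists of at most `size`, ready for UNWIND $rows."""
    batch: list[dict] = []
    for title, text in iter_pages(source):
        for target in extract_links(text):
            batch.append({"src": title, "dst": target})
            if len(batch) >= size:
                yield batch
                batch = []
    if batch:
        yield batch


# A tiny hand-written dump in the MediaWiki export format, for trying the sketch out.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Konitineniti</title>
    <revision><text>See [[Tonga]] and [[Pacific Ocean#Islands|the ocean]].</text></revision>
  </page>
  <page><title>Tonga</title>
    <revision><text>[[Konitineniti]]</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for rows in link_batches(io.BytesIO(SAMPLE_DUMP), size=2):
        print(rows)
```

Each batch could then be sent with the official `neo4j` Python driver via `session.run(IMPORT_CYPHER, rows=batch)`. Creating a uniqueness constraint on `:Page(title)` before importing keeps the `MERGE` lookups from scanning every page.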

After running the program to import the Wikipedia data into a Neo4j database, we have a lot of data to play with. Personally, I really love the Neo4j browser, which can be used to query and visualise the imported graph. To talk to the heart of Neo4j, we need to know the basic syntax of Cypher, its query language. For example, I ran this very simple Cypher query to get the node Konitineniti from the sample Wikipedia dataset that I had just imported:

MATCH (p0:Page {title:'Konitineniti'}) -[Link]- (p:Page) RETURN p0, p
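When running the same lookup from code rather than the browser, it is safer to pass the title as a query parameter than to build the query string by concatenation. The Python sketch below assumes a session object with a `run(query, **params)` method, like the one in the official `neo4j` driver, and the `$title` parameter syntax of current Neo4j releases; `neighbour_titles` and `CypherSession` are hypothetical names.

```python
from typing import Any, Iterable, Mapping, Protocol

# Same pattern as the browser query: any relationship, in either direction,
# between the named page and its neighbours. DISTINCT removes pages linked both ways.
NEIGHBOURS_QUERY = """
MATCH (p0:Page {title: $title})-[]-(p:Page)
RETURN DISTINCT p.title AS title
ORDER BY title
"""


class CypherSession(Protocol):
    """Anything exposing a neo4j-driver-style run(query, **params) method."""

    def run(self, query: str, **params: Any) -> Iterable[Mapping[str, Any]]: ...


def neighbour_titles(session: CypherSession, title: str) -> list[str]:
    """Return the titles of all pages linked to or from `title`."""
    result = session.run(NEIGHBOURS_QUERY, title=title)
    return [record["title"] for record in result]
```

With the real driver, a session obtained from `GraphDatabase.driver(uri, auth=(user, password)).session()` can be passed in as-is, since its `Record` objects support `record["title"]` lookups.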

[Screenshot: result of the query in the Neo4j browser]

One very cool feature is that we can see visually how this node is linked to others:

[Screenshot: relationship graph of the node in the Neo4j web admin]

Of course, I have only scratched the surface of the features Neo4j provides and of its Cypher query language. To learn more about the query syntax, refer to http://neo4j.com/docs/stable/cypher-query-lang.html

The Neo4jDataImport tool can be downloaded and built from:
https://github.com/wwken/MISC/tree/master/Neo4jDataImport
