Tutorial: How to bulk import Wikipedia data into the Neo4j graph database programmatically

Lately I have been getting my hands dirty with Neo4j, the world's leading graph database. To avoid messing up production data, I wanted a large sample dataset imported into Neo4j to play with while learning how Neo4j works. So I wrote a simple tool, Neo4jDataImport, which first downloads a Wikipedia sample dataset (you choose a small or a big file; the average size of a wiki dataset is around 10 GB uncompressed), then digests it and imports it into a Neo4j database. For details on how to build and run the program, please refer to the README file inside.
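To give a feel for what the "digest" step involves, here is a minimal sketch in Python. It is not the code of Neo4jDataImport: the inline sample is a heavily simplified stand-in for a real Wikipedia XML dump (real dumps carry an XML namespace and far more fields), and the names SAMPLE_DUMP, LINK_PATTERN and extract_links are made up for illustration. It only shows the idea of turning pages and their [[wiki links]] into (source, target) pairs that could later become nodes and relationships.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a Wikipedia XML dump.
SAMPLE_DUMP = """<mediawiki>
  <page><title>A</title><revision><text>See [[B]] and [[C|the C page]].</text></revision></page>
  <page><title>B</title><revision><text>Back to [[A]].</text></revision></page>
</mediawiki>"""

# Capture the link target of [[Target]] or [[Target|label]].
LINK_PATTERN = re.compile(r"\[\[([^\]|#]+)")


def extract_links(dump_xml):
    """Return a list of (source_title, target_title) pairs found in the dump."""
    root = ET.fromstring(dump_xml)
    edges = []
    for page in root.iter("page"):
        title = page.findtext("title")
        text = page.findtext("revision/text") or ""
        for target in LINK_PATTERN.findall(text):
            edges.append((title, target.strip()))
    return edges


if __name__ == "__main__":
    for source, target in extract_links(SAMPLE_DUMP):
        print(source, "->", target)
```

For a real 10 GB dump, the file would need to be streamed rather than loaded in one go (for example with xml.etree.ElementTree.iterparse), but the extraction idea stays the same.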

After running the program to import the Wikipedia data into the Neo4j database, we have a lot of data to play with. Personally, I really love the Neo4j browser, which can be used to query and visualise the imported graph. We need to know the basic syntax of Cypher queries in order to talk to the heart of Neo4j. For example, I ran this very simple Cypher query to get the node Konitineniti from the Wikipedia sample dataset that I just imported:

MATCH (p0:Page {title:'Konitineniti'}) -[Link]- (p:Page) RETURN p0, p

[Screenshot: 11-19-2014_neo4jAdmin]

One very cool feature is that we can see visually how this node is linked to others:

[Screenshot: 11-19-2014_WebAdmin_relationshipgraph]

Of course, I have only scratched the surface of Neo4j's features and its Cypher query language. To learn more about the query syntax, refer to http://neo4j.com/docs/stable/cypher-query-lang.html
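When the same kind of query is issued from a program rather than typed into the browser, it is usually safer to pass the page title as a parameter instead of pasting it into the query text. The sketch below only builds the query text and the parameter dictionary; the function name neighbours_query is made up for illustration. It assumes the $title parameter syntax of recent Neo4j releases (older releases wrote parameters as {title}).

```python
def neighbours_query(title):
    """Return Cypher text and parameters for a page and its direct neighbours."""
    # The title is passed as a parameter, so it never appears in the query text.
    query = (
        "MATCH (p0:Page {title: $title})-[link]-(p:Page) "
        "RETURN p0, p"
    )
    return query, {"title": title}


if __name__ == "__main__":
    query, params = neighbours_query("Konitineniti")
    print(query)
    print(params)
```

With the official Neo4j Python driver, such a pair would typically be handed to session.run(query, params); that part is left out here because it needs a running database.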

The Neo4jDataImport tool can be downloaded and built from:
https://github.com/wwken/MISC/tree/master/Neo4jDataImport


Tutorial – Python unit test with Eclipse (1)

In this article, I will walk through how to set up and run a Python unit test with Eclipse.

Prerequisite: PyDev has been installed in Eclipse. If not, open Eclipse, go to Help -> Eclipse Marketplace, search for ‘PyDev’ and install it as below

[Screenshot 1]

Now we are ready to create a Python unit test. To start with, let’s create a new PyDev Project to hold the project source and the unit tests.

1) Go to File -> New and, in the New window, choose PyDev Project as below

[Screenshot 2]

and give the project a name, such as TestPython

2) In the project TestPython, create a new Python module to hold all our unit tests.

[Screenshot 3]

and enter test as the package and testCalculator as the name

3) We now have the generated package test and two files created. In theory, we can put as many unit tests in this package as we like. In testCalculator.py, put the following code:
import unittest

class TestCalc(unittest.TestCase):

    def testAdd(self):
        print("it is a test")
        result = True
        self.assertEqual(result, True, "Ohno")

[Screenshot 4]

Basically, in the above code, we created a test class, TestCalc, which extends unittest.TestCase as its base class, so TestCalc has all the testing API available, such as self.assertEqual. For more information on the unittest API, see https://docs.python.org/3/library/unittest.html
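The test above always passes because it compares True with True. As a slightly more realistic sketch, the example below tests a small add function; that function and the names TestCalcAdd and run_tests are made up for illustration and are not part of the project created in Eclipse. It uses assertEqual for a normal result and assertRaises for an expected error.

```python
import unittest


def add(a, b):
    """Hypothetical function under test; not part of the original tutorial."""
    return a + b


class TestCalcAdd(unittest.TestCase):

    def test_add_integers(self):
        # The third argument is the message shown if the assertion fails.
        self.assertEqual(add(2, 3), 5, "2 + 3 should be 5")

    def test_add_rejects_mixed_types(self):
        # Adding an int and a str raises TypeError in Python.
        with self.assertRaises(TypeError):
            add(1, "1")


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCalcAdd)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_tests()
```

Method names must start with test for unittest to pick them up, which is why testAdd in the earlier code is found as well.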

4) The last step is to right-click testCalculator.py in the Package Explorer and choose Run As -> Python unit-test. If the setup is correct, we should see something like this:

[Screenshot 5]

And yes, congrats! You have just created a basic Python unit test!

Stay tuned, more to come later (such as running it from the command line instead)!

Tutorial: Setting up a python debugging environment in Eclipse

For any savvy programmer, it is vital to have a debugger at hand during development. Today I spent over two hours configuring the Python debugger in the Eclipse IDE. (Yes, I felt the need to set it up as I have just started dealing with massive Python scripts at work.) After nailing it down, I would like to share how I set it up.

Prerequisites:

1) Eclipse IDE installed. (At the time of writing, I am using Eclipse Luna 4.4.)
2) PyDev Eclipse plugin installed. (http://pydev.org/download.html)

Then, depending on what we want to debug, we can do one of the following:
–To debug a remote program
1) Inside Eclipse, start the remote debug server. If we can’t find it in the toolbar, we can go to Window > Customize perspective > Command groups availability > PyDev debug

2) In the external Python script, put these two lines at the beginning:

import sys; sys.path.append(r'/Users/ken/eclipse/plugins/org.python.pydev_3.0.0.1388187472/pysrc')  # assuming this is the PyDev installation path
import pydevd

3) In the external Python script, put this line wherever you want the program to pause in the debugger:
pydevd.settrace()

4) Inside Eclipse, go to the Debug perspective

5) There you go: the execution should pause at the place where you put the statement in step 3) above
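Put together, steps 2) and 3) can be wrapped in a small helper so the script still runs on machines where PyDev is not installed. This is only a sketch: the helper name pause_in_debugger is made up, the path is the example path from step 2) and must be adjusted to the local PyDev installation, and the only PyDev call used is pydevd.settrace() as shown above.

```python
import sys

# Example path from step 2); adjust to the local PyDev installation.
PYSRC = r'/Users/ken/eclipse/plugins/org.python.pydev_3.0.0.1388187472/pysrc'


def pause_in_debugger(pysrc_path=PYSRC):
    """Try to break into the PyDev remote debugger; return True on success."""
    if pysrc_path not in sys.path:
        sys.path.append(pysrc_path)
    try:
        import pydevd
    except ImportError:
        # PyDev is not available here, so carry on without a debugger.
        return False
    pydevd.settrace()
    return True


def main():
    total = sum(range(10))
    # Execution pauses here if the debug server in Eclipse is running.
    attached = pause_in_debugger()
    print("total:", total, "debugger attached:", attached)


if __name__ == "__main__":
    main()
```

The remote debug server from step 1) has to be started in Eclipse before the script reaches the call, otherwise settrace() has nothing to connect to.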

–To debug a program inside Eclipse
This is much easier than debugging a remote program. It is pretty much like debugging a Java program in Eclipse.

1) Create a debug configuration: Go to Run -> Debug Configurations -> Python Run, create a profile accordingly

2) Hit Debug