SPLURGE: Scholars Portal Library Usage-Based Recommendation Generation Engine

Amazon.ca has a "customers who bought this item also bought" feature that recommends things to you that you might be interested in. LibraryThing has it too: the recommendations for What's Bred in the Bone by Robertson Davies include books by Margaret Laurence, Carol Shields, Michael Ondaatje, Peter Ackroyd, John Fowles, and David Lodge, as well as other Davies works.

Library catalogues don't have any such feature, but they should. And libraries are sitting on the circulation and usage data that makes it possible. (BiblioCommons does have a Similar Titles feature, but it's a closed commercial product aimed at public libraries, and anyway the titles are added by hand.)

SPLURGE will collect usage data from OCUL members and build a recommendation engine that can be integrated into any member's catalogue. The code will be made available under the GNU General Public License and the data will be made available under an open data license.

Mailing list

There is a SPLURGE mailing list at Google Groups that anyone can join. Either subscribe through Google or email [email protected]. The archives are public.

Installation

Required packages

You will need to install some software packages to use SPLURGE. (Ubuntu commands in brackets.)

Git for version control (sudo apt-get install git)
PostgreSQL (sudo apt-get install postgresql)
PostgreSQL development libraries (sudo apt-get install libpq-dev)
Python (sudo apt-get install python)
Python development libraries (sudo apt-get install python-dev)
pip (sudo apt-get install python-pip)
psycopg2, Python module for talking to PostgreSQL (sudo pip install psycopg2)
flask, a simple Python framework for web applications (sudo pip install flask)

Setting up PostgreSQL

First, install it. For example, on Ubuntu (see the Ubuntu documentation on PostgreSQL for full details, or your own system's documentation if you use a different operating system):

$ sudo apt-get install postgresql

At the shell, create the splurge_user account and the splurge database:

$ sudo -u postgres createuser
Enter name of role to add: splurge_user
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n
$ sudo -u postgres createdb splurge
$ sudo -u postgres psql
psql (9.1.5)
Type "help" for help.

postgres=# alter user splurge_user with encrypted password 'splurge';
ALTER ROLE
postgres=# grant all privileges on database splurge to splurge_user;
GRANT

Note that the password is set to "splurge". For your local testing, this is fine. In production this should be different.

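Now set up the database. The schema file path is relative to the top of the SPLURGE checkout, so run this from there (see Download SPLURGE below):

$ psql -d splurge -h localhost -U splurge_user -W -f app/db/schema_dump.sql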

If you ever need to reset the splurge database back to zero and start over, run this command again.

Later, if you want to dump out the database, run

$ pg_dump -U splurge_user splurge

Set the SPLURGE_DB_PASSWORD environment variable

The password for splurge_user needs to be set in the SPLURGE_DB_PASSWORD environment variable before running anything more. (This makes it easier to share code.) Before going on, run this (if you use bash) but, of course, use whatever password you set:

$ export SPLURGE_DB_PASSWORD=splurge

You could add this line to a login file such as .bashrc.

Test that it was set properly by running

$ echo $SPLURGE_DB_PASSWORD
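
For illustration, here is a minimal Python sketch of how a script could pick up that password from the environment. The function name db_settings is hypothetical and this is not necessarily how tool.py does it; the database name, user, and host are the ones used in the PostgreSQL setup above.

```python
import os


def db_settings(environ=None):
    """Build connection settings from the environment (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    password = environ.get("SPLURGE_DB_PASSWORD")
    if not password:
        # Fail early with a clear message instead of a confusing login error.
        raise RuntimeError("SPLURGE_DB_PASSWORD is not set")
    return {
        "dbname": "splurge",       # database created with createdb above
        "user": "splurge_user",    # role created with createuser above
        "password": password,
        "host": "localhost",
    }
```

With psycopg2 installed, the result can be passed straight to psycopg2.connect(**db_settings()).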

Download SPLURGE

Download SPLURGE from the git repository:

$ git clone https://github.com/splurge/splurge.git
$ cd splurge

Loading in data

This assumes that you're a developer and have downloaded all of the data files from Scholars Portal into the app/splurge/data/ directory. Subdirectories of data use the following directory and file structure:

institution-name/YYYY-mm-dd/items.txt
institution-name/YYYY-mm-dd/transactions.txt
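
As a rough sketch of what that layout implies, the following Python 3 function walks a data directory and lists every dated drop that has both files. The name find_data_drops is made up for this example, and this is not the loader that tool.py uses.

```python
from datetime import datetime
from pathlib import Path


def find_data_drops(data_root):
    """Return (institution, date, items_path, transactions_path) tuples."""
    drops = []
    for items_file in sorted(Path(data_root).glob("*/*/items.txt")):
        date_dir = items_file.parent
        try:
            # Directory names are expected to look like 2012-09-30.
            drop_date = datetime.strptime(date_dir.name, "%Y-%m-%d").date()
        except ValueError:
            continue  # skip directories that are not dated drops
        transactions_file = date_dir / "transactions.txt"
        if not transactions_file.is_file():
            continue  # a drop needs both files to be usable
        drops.append(
            (date_dir.parent.name, drop_date, items_file, transactions_file)
        )
    return drops
```

Calling find_data_drops("app/splurge/data") from the top of the checkout would list what is available to load.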

Run this:

$ ./tool.py --update_database

This will take a while.

Test it:

$ ./tool.py --test

Running the web service

$ ./tool.py --little_server

Then go to http://localhost:3000/static/index.html and try it out.

The service is running at http://localhost:3000/splurge_service/

Test ISBNs

0679723951
0691090254
9780140137941
0374270325
9780674991453
9780674992405
0521482631

TO DO

Put this under a GPLv2 license (with "or later") (discuss)
Figure out how best to handle new data uploads, and automate that process so that when new files are uploaded they are automatically loaded.
Use xISBN and thingISBN: given book X, look up other manifestations of the same work, then look for and dedupe recommendations for all of them. Instead of offering recommendations based on one edition of a work, it would offer them based on the work.
Go beyond ISBNs into other standard numbers, such as LCCN and OCLCnums
Go beyond standard numbers!
Use for collection development purposes: give collection librarians a way of looking up what's recommended for a given book and seeing if it's in their collection. (Talk to collection librarians about what exactly they'd want.)
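
One small piece of the edition-matching item above can be sketched without any outside service: the test ISBNs mix 10- and 13-digit forms, so the same edition can appear under two numbers. The Python sketch below normalizes both forms to ISBN-13 and drops duplicates. The function names are hypothetical, nothing here is in SPLURGE yet, and it only matches the same edition; grouping different editions of a work would still need something like xISBN or thingISBN.

```python
def to_isbn13(isbn):
    """Normalize an ISBN-10 or ISBN-13 string to a bare 13-digit string.

    Check digits on the input are not validated.
    """
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) != 10 or not digits[:9].isdigit():
        raise ValueError("not an ISBN-10 or ISBN-13: %r" % isbn)
    # Drop the ISBN-10 check digit, prefix 978, and compute the ISBN-13
    # check digit with alternating weights 1 and 3.
    body = "978" + digits[:9]
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)


def dedupe_isbns(isbns):
    """Return normalized ISBN-13s in first-seen order, without repeats."""
    seen = set()
    unique = []
    for isbn in isbns:
        key = to_isbn13(isbn)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```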

Background

We plan to implement in Ontario something close to the JISC project called MOSAIC (Making Our Shared Activity Information Count). The documents there describe what they did, and our plan is based on that.

MOSAIC Data Collection: A Guide
MOSAIC Final Report (and Appendices)
Also MOSAIC Demonstration Links, from a software contest they ran to find new, interesting uses for their data. (The examples here go beyond the Recommendation Engine idea, but are worth looking at to see other possible future directions.)

The JISC MOSAIC wiki has code and data examples.

The JISC project grew out of work done by Dave Pattern (Library Systems Manager) and others at the University of Huddersfield. They made usage data available under an Open Data Commons License.

Data
README
Pattern explains things in Free book usage data from the University of Huddersfield
Pattern summarized it all in March 2011 in Sliding Down the Long Tail.

Updated 13 Feb: The SALT Recommender API is doing what we want to do, and JISC's planned SALT 2 project is a consortial approach like the one OCUL would take:

Pattern's Sliding Down the Long Tail describes the logic we'll need to follow.

Tim Spalding implemented a similar feature at LibraryThing. When asked on Twitter how it worked, he said "The best code is just statistics" and "Given random distribution how many of book X would you expect? How many did you find?"

In conversation, both Pattern and Spalding mentioned the Harry Potter effect: some books are so popular with everyone that they need to be damped down. Everyone reading Freud or Ferlinghetti, Feynman or Foucault, is probably also reading J.K. Rowling, but that doesn't mean Harry Potter and the Goblet of Fire should be recommended to people looking at Totem and Taboo or Madness and Civilization.
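
To make the "expected versus found" idea concrete, here is a small Python sketch. It is one reading of Spalding's remark, not his code or Pattern's, and the function name and input shape (a list of patron and item pairs) are assumptions for the example. Each candidate is scored by how many patrons borrowed it together with the target, divided by how many would be expected if borrowing were independent. A book that everyone borrows scores close to 1, which damps the Harry Potter effect.

```python
from collections import defaultdict


def recommend(transactions, target, min_observed=1):
    """Rank items co-borrowed with target by observed / expected count.

    transactions is an iterable of (patron, item) pairs.
    """
    borrowers = defaultdict(set)  # item -> set of patrons who borrowed it
    patrons = set()
    for patron, item in transactions:
        borrowers[item].add(patron)
        patrons.add(patron)

    target_patrons = borrowers.get(target, set())
    if not target_patrons:
        return []

    total = len(patrons)
    scores = []
    for item, who in borrowers.items():
        if item == target:
            continue
        observed = len(who & target_patrons)
        if observed < min_observed:
            continue
        # Under independence, the share of target borrowers who also
        # borrowed this item equals the item's share of all patrons.
        expected = len(target_patrons) * len(who) / total
        scores.append((item, observed / expected))

    # Highest ratio first; ties broken by title for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores
```

With real data, min_observed would likely need to be higher than 1, since a ratio built on a single shared borrower is mostly noise.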
