I don't like writing this in my Python scripts every time:
import json, os, requests
photos_url = 'https://jsonplaceholder.typicode.com/photos'
photos_path = 'photos.json'
# Fetch the file if it doesn't exist yet.
if not os.path.isfile(photos_path):
res = requests.get(photos_url)
with open(photos_path, 'wb') as f:
f.write(res.content)
# Then, use the local copy.
with open('photos.json') as f:
photos = json.load(f)
print(photos[0]['title'])So this little library lets you write:
import sourced
photos_url = 'https://jsonplaceholder.typicode.com/photos'
photos = sourced.json('photos.json', url=photos_url)
print(photos[0]['title'])And offers some other handy behavior.
Pass url, and optionally headers, to fetch and decode files from the web.
Use encoding to set the encoding for text and JSON. It defaults to utf-8.
t = sourced.text('merci.txt', url=text_url, headers={'Accept-Language': 'fr'})
s = sourced.text('sozai.txt', url=sjis_url, encoding='shift_jis')
j = sourced.json('okay.json', url=json_url, headers={'Authorization': token})
b = sourced.binary('data.bin', url=binary_url)
assert isinstance(s, str)
assert isinstance(j, dict)
assert isinstance(b, bytes)Alternatively, use create=file_creating_function. This function should
return a deserialized result (so, a dict rather than a string, for JSON).
def default_json(): return {'meow': 123}
j = sourced.json('my.json', create=default_json)Pass max_age='2 weeks' to invalidate cached files after 2 weeks.
pytimeparse is used to parse
the provided time delta.
stuff = sourced.json('stuff.json', url=my_url, max_age='5 days')When you only need part of the data at url, you can provide a JSON path
to select the parts you want. The JSON path library used is
jsonpath-rw.
Use find=path to get an array of matches:
sourced.json('titles.json', url=url, find='[:10].title')Use pick=path to keep just the first match:
sourced.json('titles.json', url=url, pick='[?(@.id == 99)]')If url is a list, the results are concatenated with +. This is
useful for text or JSON endpoints that return arrays.
sourced.json('combi.json', url=[url1, url2])
sourced.text('combi.txt', url=[url3, url4])If url is https://blah.com/foo/%p1, then requests are made for
https://blah.com/foo/1, https://blah.com/foo/2, … until the result is
empty text / an empty JSON array.
Use %p0 to start numbering pages from 0 instead.
sourced.json('tags.json', url='https://testbooru.donmai.us/tags.json?limit=500&page=%p1')If using JSON and next_path is a JSON path, it is used to retrieve a URL
for the next page until it is null.
kanji = sourced.json('kanji.json',
url='https://api.wanikani.com/v2/subjects?types=kanji',
headers={'Wanikani-Revision': '20170710',
'Authorization': 'Bearer %s' % wk_token},
find='data[*].data.characters', next_page='pages.next_url')If next_path is a function, it is called with the decoded (e.g. JSON object)
result to get the next page URL.
f = lambda json: base_url + '&start_from=' + json['next_page_start']
sourced.json('a.json', url=base_url, next_page=f)