Text match metadata

On GitHub, you can use the context provided by code snippets and highlights in search results. The Search API offers additional metadata that allows you to highlight the matching search terms when displaying search results.

Requests can opt to receive those text fragments in the response, and every fragment is accompanied by numeric offsets identifying the exact location of each matching search term.

To get this metadata in your search results, specify the text-match media type in your Accept header.

application/vnd.github.v3.text-match+json
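
As a minimal sketch in Python, the media type can be attached to a search request like this. The helper name `build_search_request` and the hostname `github.example.com` are placeholders chosen for illustration, not part of the API. The request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request

TEXT_MATCH_MEDIA_TYPE = "application/vnd.github.v3.text-match+json"


def build_search_request(base_url, query):
    """Build (but do not send) an issue search request that opts in to text-match metadata.

    base_url is an assumption: the API root of your instance, for example
    "https://github.example.com/api/v3" (hypothetical hostname).
    """
    params = urllib.parse.urlencode({"q": query})
    url = f"{base_url}/search/issues?{params}"
    # The Accept header is what asks the API to include text_matches.
    return urllib.request.Request(url, headers={"Accept": TEXT_MATCH_MEDIA_TYPE})


request = build_search_request("https://github.example.com/api/v3", "windows label:bug")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the resulting object to `urllib.request.urlopen` would send it. Authentication, if your instance requires it, is not covered in this sketch.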

When you provide the text-match media type, each search result in the JSON payload includes an extra key called text_matches. It describes where your search terms occur within the text and which property contains them. Each object in the text_matches array has the following attributes:

| Name | Description |
| --- | --- |
| object_url | The URL for the resource that contains a string property matching one of the search terms. |
| object_type | The name for the type of resource that exists at the given object_url. |
| property | The name of a property of the resource that exists at object_url. That property is a string that matches one of the search terms. (In the JSON returned from object_url, the full content for the fragment will be found in the property with this name.) |
| fragment | A subset of the value of property. This is the text fragment that matches one or more of the search terms. |
| matches | An array of one or more search terms that are present in fragment. Each entry gives the matched text and its indices. The indices (i.e., "offsets") are relative to the fragment. (They are not relative to the full content of property.) |
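
The following Python sketch shows one way to use these attributes to highlight search terms in a fragment. The function name `highlight` and the bracket markers are illustrative choices. It assumes the second index is exclusive, as the example response later in this section suggests, and that offsets count characters, which holds for ASCII text but is unverified for other content.

```python
def highlight(fragment, matches, start="[", end="]"):
    """Wrap each matched term in `fragment` with start/end markers.

    `matches` is the list found under one text_matches entry. Each item's
    "indices" pair is treated as [begin, stop) relative to the fragment
    (the exclusive end is an assumption inferred from sample data).
    """
    pieces = []
    cursor = 0
    # Process matches left to right so slices never overlap.
    for match in sorted(matches, key=lambda m: m["indices"][0]):
        begin, stop = match["indices"]
        pieces.append(fragment[cursor:begin])
        pieces.append(start + fragment[begin:stop] + end)
        cursor = stop
    pieces.append(fragment[cursor:])
    return "".join(pieces)


fragment = "comprehensive windows font I know of)."
matches = [{"text": "windows", "indices": [14, 21]}]
print(highlight(fragment, matches))  # comprehensive [windows] font I know of).
```

For HTML output, the markers could be swapped for tags such as `<em>` and `</em>`, with the surrounding text escaped first.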

Example

Using cURL and the example issue search above, the API request looks like this:

curl -H 'Accept: application/vnd.github.v3.text-match+json' \
'http(s)://[hostname]/api/v3/search/issues?q=windows+label:bug+language:python+state:open&sort=created&order=asc'

The response will include a text_matches array for each search result. In the JSON below, we have two objects in the text_matches array.

The first text match occurred in the body property of the issue. We see a fragment of text from the issue body. The search term (windows) appears twice within that fragment, and we have the indices for each occurrence.

The second text match occurred in the body property of one of the issue's comments. The object_url is the URL of that issue comment, and the fragment is an excerpt from the comment body. The search term (windows) appears once within that fragment, matched case-insensitively as "Windows".

{
  "text_matches": [
    {
      "object_url": "https://api.github.com/repositories/215335/issues/132",
      "object_type": "Issue",
      "property": "body",
      "fragment": "comprehensive windows font I know of).\n\nIf we can find a commonly distributed windows font that supports them then no problem (we can use html font tags) but otherwise the '(21)' style is probably better.\n",
      "matches": [
        {
          "text": "windows",
          "indices": [
            14,
            21
          ]
        },
        {
          "text": "windows",
          "indices": [
            78,
            85
          ]
        }
      ]
    },
    {
      "object_url": "https://api.github.com/repositories/215335/issues/comments/25688",
      "object_type": "IssueComment",
      "property": "body",
      "fragment": " right after that are a bit broken IMHO :). I suppose we could have some hack that maxes out at whatever the font does...\n\nI'll check what the state of play is on Windows.\n",
      "matches": [
        {
          "text": "Windows",
          "indices": [
            163,
            170
          ]
        }
      ]
    }
  ]
}
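
To check how the indices relate to each fragment, the sketch below slices the matched text back out of the sample data. The function name `matched_terms` is hypothetical, and the first fragment is shortened for brevity. The indices are treated as a start offset and an exclusive end offset, which is an inference from this sample rather than a documented guarantee.

```python
def matched_terms(search_result):
    """Return (object_type, property, matched_text) for every match in a result.

    matched_text is sliced from the fragment using the reported indices,
    treating the second index as exclusive (an inference from the sample).
    """
    found = []
    for text_match in search_result.get("text_matches", []):
        fragment = text_match["fragment"]
        for match in text_match["matches"]:
            begin, stop = match["indices"]
            found.append(
                (text_match["object_type"], text_match["property"], fragment[begin:stop])
            )
    return found


# Sample data from the response above; the first fragment is truncated after
# the second occurrence of "windows" to keep the example short.
sample_result = {
    "text_matches": [
        {
            "object_type": "Issue",
            "property": "body",
            "fragment": "comprehensive windows font I know of).\n\nIf we can find a commonly distributed windows font",
            "matches": [
                {"text": "windows", "indices": [14, 21]},
                {"text": "windows", "indices": [78, 85]},
            ],
        },
        {
            "object_type": "IssueComment",
            "property": "body",
            "fragment": " right after that are a bit broken IMHO :). I suppose we could have some hack that maxes out at whatever the font does...\n\nI'll check what the state of play is on Windows.\n",
            "matches": [{"text": "Windows", "indices": [163, 170]}],
        },
    ]
}

for object_type, prop, text in matched_terms(sample_result):
    print(object_type, prop, text)
```

Each sliced string equals the text value reported for that match, which is a quick sanity check that the offsets are being applied to the fragment rather than to the full property value.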