
Goals and progress

Gaurav Vaidya edited this page Dec 31, 2018 · 192 revisions

Page for setting goals and tracking project progress, updated weekly

Future goals

December 31 to January 1, 2019

December 24 to 28, 2018

December 17 to 21, 2018

December 10 to 14, 2018

November 19 to 23, 2018

November 12 to 16, 2018

November 5 to 9, 2018

  • Continue working on JPhyloRef and Clade Ontology with a goal of producing lists of passing and failing phyloreferences and producing the Clade Ontology as an OWL ontology
    • Decide whether to retain the Python library or to replace it with the Javascript library for converting PHYX to JSON-LD.
    • Extend JPhyloRef to support running tests directly in JSON-LD rather than RDF/XML.
    • Extend JPhyloRef/Python library/Javascript library to support validating and categorizing phyloreferences.
    • Start writing scripts to combine JSON-LD files into a single ontology.
  • Polish JPhyloRef PRs and submit for review: phyloref/jphyloref#30, phyloref/jphyloref#24
  • Get PDFs for our sticker and T-shirt design from Kim Schoonover (who designed our logo)
  • Make requested fixes to phyloref/curation-tool#109

October 29 to November 2, 2018

  • Made a list of information in Regnum that should be included in the Clade Ontology: phyloref/clade-ontology#46
  • Finish cleaning up JPhyloRef: PR #24, PR #30, PR #32
  • Continue working on JPhyloRef and Clade Ontology with a goal of producing lists of passing and failing phyloreferences and producing the Clade Ontology as an OWL ontology
  • Get PDFs for our sticker and T-shirt design from Kim Schoonover (who designed our logo)
  • Submit PRs for review once previous PRs have been reviewed: PR #109

October 22 to 26, 2018

  • Complete immigration paperwork to allow me to work for another two years
  • Work on the Clade Ontology: improve JPhyloRef and the test suites, produce lists of passing and failing phyloreferences and start producing the Clade Ontology as an OWL ontology
  • Get PDFs for our T-shirt design from Kim Schoonover (who designed our logo)
  • Polished several Curation Tool pull requests: PR #105, PR #107, PR #108, PR #110, PR #111
  • Started cleaning up JPhyloRef: PR #26, PR #29, PR #33

October 15 to 19, 2018

October 8 to 12, 2018

October 1 to 5, 2018

Gaurav's goals

September 10 to 14, 2018

Gaurav's goals

August 20 to 24, 2018

Gaurav's goals

  • Continue working on TDWG presentation and send out a draft of the Powerpoint before Thursday.
  • Improve new term requests to the Phyloref Ontology
  • Organize embargoed phyloreferences and submit for review. This will require converting the Clade Ontology to use the faster FaCT++ reasoner (phyloref/clade-ontology#43)
  • Get to New Zealand in one piece.

August 13 to 17, 2018

Gaurav's goals

August 6 to 10, 2018

Gaurav's goals

July 30 to Aug 3, 2018

Gaurav's goals

July 23 to July 27, 2018

Gaurav's goals

July 16 to July 20, 2018

Gaurav's goals

July 9 to July 13, 2018

Gaurav's goals

July 2 to July 6, 2018

Gaurav's goals

June 25 to June 29, 2018

Gaurav's goals

June 18 to June 22, 2018

Gaurav's goals

June 11 to June 15, 2018

Gaurav's goals

June 4 to June 8, 2018

Guanyang's goals

Gaurav's goals

May 29 to June 01, 2018

Guanyang's goals

  • Write a curation protocol/documentation.
  • Curate two papers.
  • Compile use cases.

Gaurav's goals

May 21 to 25, 2018

Gaurav's goals

May 14 to 18, 2018

Gaurav's goals

May 07 to May 11, 2018

Guanyang's updates and goals

Updates

  • Curated Huisman et al. (2003), which included:
    • Transcribing the graphical tree to Newick format, using a newly invented manual method.
    • Entering data into curation tool and generating a JSON file.

Goals

  • Curate additional definitions/papers
  • Compile previously proposed use cases
  • Finalize TDWG talks schedule

Gaurav's goals

Apr 30 to May 04, 2018

Guanyang's updates and goals

Updates

  • Curated definitions in Brochu (2003)
  • Presented on curation experiment and documented issues concerning node labeling
  • Drafted AB meeting blog and revised it twice.
  • Curate definitions in Regnum and document time expenses.
  • Compile list of use cases developed or proposed in a hackmd document.
  • Finalize TDWG talk schedule.

Gaurav's goals

Apr 23 to 27, 2018

Guanyang's goals

  • Write blog for AB meeting.
  • Curate Brochu 2003.
  • Curate another paper with a larger phylogeny.
  • Discuss use cases with Gaurav.

Gaurav's goals

Apr 16 to 20, 2018

Guanyang was in China.

Apr 09 to 13, 2018

Guanyang's goals

Gaurav's goals

Apr 02 to 06, 2018

Guanyang's goals

  • Close blog pull request and publish it.
  • Write up several use cases and share with team.
  • Review TDWG2018 abstracts.

Gaurav's goals

March 26 to 30, 2018

Gaurav's goals

  • Apply for travel funding through the FLMNH.
  • Go through meeting notes for the third f2f meeting, extract all tasks, and add them to Github as issues.
  • Add a section on the Curation Tool to the third f2f meeting blog post.
  • Reorganize Github Projects on the basis of the updated software development plan for the next six months. In particular:
    • Rename the Curation Workflow to Ontology of Phyloreferences (phyloref/curation-workflow#19), and reorganize its Github projects
    • The focus of development should be Curation Tool 0.1, which is a complete enough Curation Tool to create PHYX files that can be incorporated into the ontology of phyloreferences.
    • This will be followed by the Curation Tool 0.2, which will incorporate reasoning directly into the application.
    • We can take our time with Curation Tool 1.0, which will incorporate user interface feedback from users and create a more polished curation experience.
  • Cleaned up Curation Tool specifier matching pull request and submitted to Hilmar for review (phyloref/curation-tool#10)
  • Start moving terms into the formal ontology by filing issues.
  • (Probably next week) Fully curate five papers into the Curation Workflow, with no outstanding test failures

March 19 to 23, 2018

Guanyang's updates and goals

Updates (Mar 12-16):

  • Finished TDWG talk abstract. Link here.
  • Discussed with Gaurav about upcoming f2f meeting and designated tasks.
  • Manually mapped some of Wilcox & Hillis's Rana clade definitions to Che et al. (2007). Contacted John Wiens and obtained a data file of his phylogeny from Wiens et al. (2009).

Goals/tasks

  • Prepare for f2f meeting.
  • F2f meeting.

March 12 to March 16, 2018

Guanyang's updates and goals

Updates:

  • TDWG talk abstract 70% done. Talk will focus on OTT-VTO taxonomy mapping and data integration/synthesis with Phenoscape, GBIF and Traitbank.
  • Contacted ~30 researchers to solicit talks. Received 10 applications and 3 abstract submissions.
  • Corresponded with Scott Chamberlain by email about a tool to perform "phylogeny-aware" data queries. He is considering writing an R package to do this and giving a TDWG talk.

Goal/tasks for this week:

  • Finish TDWG abstract.
  • Prepare for upcoming f2f meeting: (1) aligning perspectives, (2) phyloreferencing: theoretical and practical considerations, (3) pathway towards developing potential use cases, and (4) tractable research problems. More specifically, gather a list of research questions as a base from which we will develop use cases.
  • Review submitted TDWG abstracts.
  • Apply clade definitions of Wilcox & Hillis (2005) to Che et al. (2007) and Wiens et al. (2009), two newer and larger phylogenies.
  • Review a PeerJ manuscript on Asilidae phylogeny.

Gaurav's goals

March 5 to 9, 2018

Guanyang's goals

  • Continue recruiting speakers for the TDWG symposium.
  • Finish two TDWG abstracts, one on OTT-VTO mapping for own symposium, and another on data and knowledge for a different symposium (probably as contributed).
  • Conference call with Hilmar and Emily (already happened on Mon).

Gaurav's goals

Feb 26 to Mar 02, 2018

Guanyang

Updates

  • Met with TDWG symposium co-organizers online, finalized a draft call for abstracts and distributed it via Evoldir, Taxacom and Twitter.
  • Skimmed Hillis & Wilcox (2005) and Hillis (2007).
  • Met with David Hillis and talked about phyloreferencing and other aspects of phylogenetic nomenclature during a formal meeting and at socials.
  • Discussed with other researchers the prospect of using a rank-free taxonomy to organize natural history collections. I have been thinking a little about applying phyloreferencing to collections. See Facebook and Twitter threads.

Goals/tasks

  • Send emails to recruit symposium speakers. List of names here.
  • Distribute Call for Abstracts more broadly, including various society FB groups and listservs.
  • Decide on a topic for presentation at TDWG2018. See last week's log for possibilities.
  • Write a small report on the meeting with D. Hillis and share it internally.
  • Prepare a write-up on the issues of integrating data in the face of unstable clade content.

Gaurav's goals

Feb 19 to Feb 23, 2018

Guanyang's updates and goals

I spent most of my effort on digging into the case of Basidiomycota and organizing the TDWG2018 symposium. After some discussions within our project team and with David Hibbett, we may conclude that the same definition may resolve to clades with different compositions on different phylogenies. We assert that the clade definition remains unchanged. What might be the practical implications of this theoretical stance? What questions might potential stakeholders of our phyloreferencing project (biologists, taxonomists and biodiversity researchers) ask when presented with such a viewpoint? More generally, how does phyloreferencing enable better systematics and biodiversity research when its theoretical underpinning diverges quite significantly from "traditional" thinking? To explore those issues, it will be worthwhile to investigate the following questions. (1) How does changing clade composition affect data integration of organismal traits? (2) How should a clade name be used on different phylogenies? (3) Node-based and branch-based definitions of the same name may resolve to different clades. What are the philosophical and empirical bases for composing a definition and choosing between the two methods? (4) Can phyloreferencing facilitate comparison of clade compositions? I would welcome any comments on those questions, especially regarding how they may drive the development of actual phyloreferencing use cases.

For this week, I am going to focus primarily on TDWG organizing as well as preparing one or more talk abstracts for that meeting.

  • [x] TDWG2018 organizing - conference call, distribute funding application form and invite speakers.
  • TDWG2018 abstract preparation. Some possible presentation topics:
    • OTT/VTO/GBIF taxonomy matching and phylogeny-driven data integration.
    • Integrate phylogenetic information into biodiversity databases.
    • Rank-free taxonomy in biodiversity databases.
    • Closing the gap between data and knowledge - how do we discover new natural history knowledge in biodiversity data?
  • [x] With David Hillis visiting our lab on Friday, I would like to take a close look at Rana.

Gaurav's goals

Feb 12 to Feb 16, 2018

Guanyang's updates and goals

I queried clade definitions stored in Phyloregnum against the Open Tree of Life (OTL), focusing on node-based definitions (based on internal specifiers). The results of the queries are recorded here. While most queries returned a clade (a node that represents the Most Recent Common Ancestor, or MRCA, of the specifiers used in the query) in OTL with the same name as the clade definition, some returned a clade without a name, or with a different name. An interesting case is Basidiomycota. A query using the four specifiers listed in Phyloregnum returned a clade (or MRCA) labeled "h2007-1", which includes Basidiomycota, Ascomycota and Entorrhizomycota. After some reading, I found out what is going on here. Entorrhiza casparyana was formerly part of Basidiomycota, but it has been reclassified into a recently proposed and named phylum, "Entorrhizomycota" (Bauer et al., 2015), which is placed as the sister to Dikarya (Basidiomycota and Ascomycota). OTL reflects this taxonomic treatment and that phylogenetic relationship. One might expect a query using the four specifiers of the former, more broadly defined Basidiomycota to concern only two lineages in OTL, namely the redefined, narrow Basidiomycota and the new phylum Entorrhizomycota, but not Ascomycota. However, Basidiomycota s.s. (sensu stricto) and Entorrhizomycota together are paraphyletic with respect to Ascomycota, so the MRCA has to be the node that has all three lineages as descendants, which is h2007-1 (no idea where that label comes from). I can draw some simple trees to illustrate this, if that would help.

This Basidiomycota case demonstrates how a clade definition can identify different clades when the underlying phylogeny changes. It is true that the definition did not change, but it seems clear that the same definition, with the same specifiers, references two different clades on two different phylogenies. This seems to contradict the goal of phyloreferencing, i.e., creating stable, precise clade definitions. Can somebody offer a different view on how to interpret this, or why this might not be a problem, or how phyloreferencing will address this?
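The point can be sketched with two toy trees. This is only an illustration under stated assumptions: the topologies are simplified, the tips `basidiomycete_sp` and `ascomycete_sp` are hypothetical placeholders rather than the actual Phyloregnum specifiers, and the helper functions `path_to_root`, `mrca` and `tips` are not part of any Phyloref or Open Tree library.

```python
# Toy trees stored as child -> parent maps. They are simplified
# illustrations, not the published phylogenies or the OTL synthetic tree.
OLD_TREE = {
    "Entorrhiza_casparyana": "Basidiomycota",
    "basidiomycete_sp": "Basidiomycota",   # hypothetical placeholder tip
    "Basidiomycota": "Dikarya",
    "ascomycete_sp": "Ascomycota",         # hypothetical placeholder tip
    "Ascomycota": "Dikarya",
}
NEW_TREE = {
    "Entorrhiza_casparyana": "Entorrhizomycota",
    "Entorrhizomycota": "h2007-1",
    "basidiomycete_sp": "Basidiomycota",
    "Basidiomycota": "Dikarya",
    "ascomycete_sp": "Ascomycota",
    "Ascomycota": "Dikarya",
    "Dikarya": "h2007-1",
}


def path_to_root(tree, node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in tree:
        node = tree[node]
        path.append(node)
    return path


def mrca(tree, specifiers):
    """Return the most recent common ancestor of all specifiers."""
    paths = [path_to_root(tree, s) for s in specifiers]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first specifier, the first shared node is the MRCA.
    for node in paths[0]:
        if node in shared:
            return node
    raise ValueError("specifiers share no common ancestor")


def tips(tree, node):
    """Return the set of tips (leaves) descended from `node`."""
    parents = set(tree.values())
    return {t for t in tree
            if t not in parents and node in path_to_root(tree, t)}


# The same node-based definition, applied to both trees.
SPECIFIERS = ["Entorrhiza_casparyana", "basidiomycete_sp"]

for name, tree in [("old", OLD_TREE), ("new", NEW_TREE)]:
    node = mrca(tree, SPECIFIERS)
    print(name, node, sorted(tips(tree, node)))
```

On the old tree the definition resolves to the node labeled Basidiomycota; on the new tree the same specifiers resolve to h2007-1, whose descendants also include the ascomycete tip.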

Another challenge is the feasibility of comparing the OTL clade composition with that of the original or intended clade. We cannot assume that a clade with the same name is necessarily the "same" clade. If that assumption held, we would not have the messy issue of the same name referring to different things, which is quite commonplace. The original publication of a clade definition usually includes just a small phylogeny with some "higher-level" taxa labeled, rather than listing all of their descendants. This leaves lots of room for interpretation as to how the lineages on the original phylogeny compare to those in OTL. Also, that comparison is difficult to make manually, as OTL contains hundreds or thousands of tips for any given clade (at the "higher" level). One idea of phyloreferencing is to give a computable definition (or concept) to a clade, and I have been pondering how we can use phyloreferences to compare clades computationally, but I could use a little help there. Can we put these issues on the agenda for our Wednesday conference call?
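One simple way to start comparing clade compositions computationally is to compare the sets of tips under each clade. The sketch below is an assumption-laden starting point, not a method the project has adopted: it presumes both tip sets use the same taxon identifiers (which is exactly the hard part when the original phylogeny labels only higher-level taxa), and `compare_clades` and the tip names are hypothetical.

```python
def compare_clades(tips_a, tips_b):
    """Summarize how two clades overlap, given their sets of tip identifiers."""
    a, b = set(tips_a), set(tips_b)
    union = a | b
    return {
        "shared": a & b,
        "only_in_a": a - b,
        "only_in_b": b - a,
        # Jaccard index: 1.0 means identical composition, 0.0 means disjoint.
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Hypothetical tip sets for the "same" named clade on two phylogenies.
original_clade = {"taxon_1", "taxon_2", "taxon_3"}
otl_clade = {"taxon_2", "taxon_3", "taxon_4", "taxon_5"}

report = compare_clades(original_clade, otl_clade)
print(sorted(report["shared"]), report["jaccard"])
```

A report like this distinguishes congruent clades (no tips unique to either side) from clades that merely share a name.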

Goals for the week

For this week, I would like to continue doing the Phyloregnum queries. Besides documenting the search results, a specific goal is to explore whether and how phyloreferencing can be used to compare clade compositions. This is something biologists will find highly desirable.

  • Continue Phyloregnum definition queries against OTL.
  • Explore if and how phyloreferencing can be used to compare clade compositions.
  • (Unfinished goal from previous week) Understand the nature of phylogenetic clade definitions (e.g., intensional or ostensive). Read Ghiselin, 1984, Ghiselin, 1995, and Rieppel, 2006. Skim Stanford & Kitcher

Gaurav's goals

Feb 05 to Feb 09, 2018

Guanyang's updates and goals

During the past week, I started looking at clade definitions stored in Phyloregnum and explored the possibility of using those to retrieve clades from the Open Tree's synthetic tree. I played with Open Tree's API to learn how to find the MRCA (most recent common ancestor) of a set of OTT ids, or a node in the synthetic tree. It is possible to locate a clade using a set of internal specifiers, but I will need to investigate further the relationship between the clade recovered from Open Tree and the one defined in the original publication. Just because a node in the Open Tree bears the same name as the original clade does not necessarily mean they are the "same" or congruent clade.
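As a rough sketch of such a query, the snippet below builds a request for the MRCA of a set of OTT ids. It assumes the Open Tree of Life v3 endpoint `tree_of_life/mrca` with an `ott_ids` parameter; check the current API documentation before relying on it. The function names `build_mrca_request` and `fetch_mrca` are hypothetical, and the ids in the usage example are placeholders, not real specifiers.

```python
import json
import urllib.request

# Assumed Open Tree of Life v3 endpoint; verify against the API docs.
MRCA_URL = "https://api.opentreeoflife.org/v3/tree_of_life/mrca"


def build_mrca_request(ott_ids):
    """Build (but do not send) a POST request asking for the MRCA of OTT ids."""
    payload = json.dumps({"ott_ids": list(ott_ids)}).encode("utf-8")
    return urllib.request.Request(
        MRCA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_mrca(ott_ids, timeout=30):
    """Send the request and return the decoded JSON response (needs network)."""
    request = build_mrca_request(ott_ids)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder ids for illustration only; substitute the OTT ids of the
# internal specifiers from a clade definition.
request = build_mrca_request([111, 222])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Calling `fetch_mrca` with real OTT ids returns a JSON document describing the MRCA node, which can then be compared against the clade intended by the original definition.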

For the coming weeks, I hope to nail down a concrete project that I can present at TDWG 2018. This will be something along the lines of reconciling taxonomy and phylogenies, using Open Tree and clade definitions. Abstract submission is due on Mar 12.

  • Retrieve clades from Open Tree based on clade definitions stored in Phyloregnum. Understand concept relationship.
  • For the OTT-VTO matching project, drill down into the Scleropages problem. [Did not work on this]
  • (Unfinished goal from previous week) Understand the nature of phylogenetic clade definitions (e.g., intensional or ostensive). Read Ghiselin, 1984, Ghiselin, 1995, and Rieppel, 2006. Skim Stanford & Kitcher

Gaurav's goals

Jan 29 to Feb 2, 2018

Guanyang's updates and goals

During the last week, other than preparing the TDWG symposium abstract, I worked mainly on the OTT (Open Tree Taxonomy) project. I formulated research questions and started exploring study cases to address these questions. For this week, I'd like to continue working on this project. I plan to focus on three groups: Scleropages fishes, one group of insects (probably Reduviidae, with which I'm rather familiar), and Campanulaceae (or Campanula). I think this project will illustrate the problems of matching phylogeny with taxonomy and make a plea for phylogenetically defined names. I will also document how the findings may inform phyloreferencing software development.

Gaurav's goals

Jan 22-26, 2018

Guanyang's updates and goals

Updates. Met with Nico and had a conference call with the phyloref team. Discussed and clarified some issues regarding the design of use cases. It was agreed that outreach to GoLife projects will be best done after we have a working tool and some worked-out use cases. I will focus on further developing the Phenoscape project, i.e., mapping two different taxonomies, but will do so from a phyloref perspective. What I would also like to do is devise dummy use cases of phyloreferences, which should demonstrate the range of expected applications and issues. I prepared a TDWG 2018 symposium proposal. Other than those, I prioritized a job phone interview last week, and I think it went pretty well.

Goals.

  • Finalize TDWG symposium proposal. Contact potential speakers (if needed). 1 day
  • For the Opentree-Phenoscape taxonomy mapping project, find one manageable but interesting fish group which fulfills the following criteria: 1) it has a phylogeny and/or the phylogeny has been updated, 2) that phylogeny has been used to study evolutionary questions, 3) taxonomic history of that group can be traced easily. Apply phyloreferences to this group. 2 days
  • Finish writing the review on Berendsohn & Kennedy's approach on concept taxonomy. 2 days
  • Understand the nature of phylogenetic clade definitions (intensional or ostensive). Read Ghiselin, 1984, Ghiselin, 1995, and Rieppel, 2006. Skim Stanford & Kitcher

Gaurav's goals

Jan 16-19, 2018

Guanyang's goals

  • Phone interview
  • Continue working on the open tree taxonomy project developed at Phenoscape hackathon
  • TDWG symposium proposal/abstract

Gaurav's goals

Jan 08-12, 2018

Guanyang's updates and this week's goals

Updates:

  • I re-read some of Berendsohn's writings and also took a closer look at his two database models -- IOPI and MoReTax, which used concept taxonomy. The former is like a checklist, but occurrences of a name from different sources (taxonomic concepts, essentially) are curated separately and there are some concept relationship statements. It also links out to other databases, but only via name strings (not concepts). The second model describes concept relationships in a more sophisticated way and how the "transmittability of linked information" could be achieved (a kind of data integration, I suppose). I am in the process of summarizing these findings. I would like to do a bit more investigation, but not too much, into whether and how these data models were used.

  • I started writing down my questions and thoughts about our project, but also more generally about taxonomy and biodiversity data integration. Link to the doc will be shared via email.

  • Read the Introduction and some of the M&M of Franz et al.'s latest manuscript on reconciling two bird phylogenies, which feels more or less like another test case of running Euler/X. The authors claim at least two major conceptual advances: concept congruence at higher levels can be attained in spite of non-overlapping taxon sampling, and diverging perspectives can be reconciled and verbalized, which according to the authors is a more powerful method/language for integrating evolving phylogenomic research.

This week's goals:

  • Continue developing the concept taxonomy critique
    • Develop a more thorough analysis of two particular issues: (1) Why did Berendsohn and Kennedy's concept taxonomy models not get widely used in specimen databases? The history appears not to be well documented. I might have to approach Berendsohn himself for a direct conversation. Not a huge priority, though (meaning that the critique can still be developed even if I fail to present a full historical analysis of Berendsohn's concept taxonomy). (2) For each of the perceived challenges facing concept taxonomy that I have outlined, I would like to expand the writing into something more substantial. What I have is somewhat superficial and constitutes opinions, rather than a critique. More specifically, I will need to illustrate the challenges using examples, hypothetical or actual.
  • Learn how to write on own computer and push commits to Github.
  • Continue developing the list of questions I have about our project.

Gaurav's goals

Jan 02-05, 2018

Guanyang's goals:

  • Continue writing the critique of Concept Taxonomy, more specifically,
    • Read and review several papers (Kennedy et al., 2005; Berendsohn) and survey databases that had or attempted to implement some aspects of Concept Taxonomy (Berendsohn's plant/bryophyte database, GBIF, Fishbase).
    • Try to understand and write about why concept taxonomy sensu Berendsohn & Kennedy did not get adopted by major databases.
    • Read Nico Franz's latest MS on reconciling two bird phylogenomic studies.
  • Write down a list of questions that I have about the Phyloreferencing project.
  • Learn how to use Github to manage projects. Watch this video

Gaurav's goals:

  • Get feedback on my three pull requests, with the goal of getting the blog post published this weekend or early next week.
  • Develop an initial design for the Curation Tool that can be reviewed by the other team members.
  • Start developing Curation Tool, version 0.1.
  • Add support for specimen matching to the Curation Workflow: phyloref/curation-workflow#8
  • Add support for identifying unmatched specifiers in the Curation Workflow: phyloref/curation-workflow#9
