Evaluating the utility of linear measurements to identify isolated tooth loci of extinct Hyracoidea

.


Introduction
The fragmentary nature of the paleontological and archeological record means that much of the evidence for past vertebrate life is in isolated skeletal remains (Korth 1979;Badgley 1986;Cassiliano 1997;Cuvier 2009).A number of biological attributes can be inferred from isolated elements, including reconstructions of body size, diet, locomotor mode, and growth rate, as well as the spatiotemporal range of a taxon (Janis 1990;Bever 2005;Boyer 2008;Weaver et al. 2022).
However, accuracy of proxies rests on identification of both species and serial position, two main sources of uncertainty that contribute morphological variation.The task of identifying serial position includes the correct position, or locus, for elements in serially homologous sequences like teeth in tooth rows and vertebrae in vertebral columns (Polly and Head 2004;Zack 2012;Calede and Glusman 2017;Engelman and Croft 2022).In depositional environments where associated teeth can be scarce, the inability to correctly identify isolated elements to specific loci can cause the exclusion of records from datasets, limiting the scope of ecological inference such as relative abundance based on minimum number of individuals, or evolutionary inference about the evolution of modularity and integration between members of such serially homologous sequences (Naylor and Marcus 1994;Pickford 1994; Borths and Stevens 2019).
Solving the locus identification part of the identification problem makes the species identification problem easier to solve.For mammalian molars, the serial position problem may be easier to solve first because the number of possible states is more constrained, as opposed to the unknown number of species that may be present in new or understudied assemblages.Most crown mammals have either three (Eutheria) or four (Metatheria) molar loci, creating three or four discrete groups for any tooth locus identification problem.In contrast to the predictable number of locus categories, the number of potential species in an assemblage is an unknown.It is rarely clear from the outset how many species are present, and how similar or dissimilar they might be expected to be from one another.
In some taxonomic systems, identifying tooth loci is a challenge.Some serial homologues differ very little from each other, requiring significant expertise to distinguish (Clemens 1966).Features that distinguish between morphologically similar serial homologues are rarely the presence or absence of discrete features, but instead changes in the location, size, or shape of one feature relative to another (Böhmer et al. 2015;Head and Polly 2015;Jones et al. 2018).Other serial homologues often differ from each other in a characteristic pattern shared by members of a single developmental field, or module (Butler 1939;Young et al. 2015).However, the characteristic pattern can and does evolve, limiting its predictive power in cases of new or poorly known species (Kimura et al. 2013).
For example, the molar sizes along a toothrow often vary in a unidirectional, proportional pattern allowing the prediction of the size of one molar based on the other two (Kavanagh et al. 2007).However, the exact direction and proportion is not constant across mammals (Polly 2007).Within some clades, such as Rodentia, these relationships between tooth loci vary widely (Labonne et al. 2012).These unknown species-specific relationships between tooth loci may be stronger than any general diagnostic trait, rendering a proposed, general trait invalid for use when species identities are unknown.
In this paper, the goal is to evaluate a set of classification criteria that can be used to assign individual molars to serial position, regardless of whether other sources of variation in teeth have been completely identified.In addition to variation between different species, additional sources of variation potentially include sexual dimorphism, plasticity including age-related differences, and damage to the tooth in vivo or taphonomically (Benazzi et al. 2011).These sources of variation may interact with one another, creating a more complex problem.Larger sample sizes rich with information on each variable, or even all but one variable, could solve the problem.However, fossil samples are classically data-deficient in this regard, both in terms of small sample sizes and lack of information about variables such as sex and ontogeny.Effective validation methods must identify traits that avoid the problem either by being rich in information or by being robust to the influence of extraneous sources of information.Unfortunately, otherwise powerful methods that rely on large training datasets, such as machine learning or artificial intel-ligence, are fundamentally poorly suited for cases in the fossil record limited by very small sample sizes (Chicco 2017).
A phylogenetic bracket may help assess sensitivity of the trait to other potentially confounding influences that cannot be minimized, although the tool is not always applicable (Witmer 1995).In scenarios that are likely to be permanently data-deficient, like fossils, the phylogenetic bracket may be a better tool than more sophisticated statistical treatment of species-specific trait diagnoses.Rather than try to enumerate all potentially confounding sources of variation in the target sample of isolated, data-deficient species, the bracket validates a relationship in more datarich species (Witmer 1995).It then makes the assumption that members of the minimum clade defined by that bracket share the same relationship validity, based on assuming phylogenetic conservatism.For example, if a trait distinguishes loci in bracketing species, then the trait is also inferred to be diagnostic for new species within the clade.As such, it is considered robust to species-specific departures from the trait-locus relationship.If the trait-locus relationship is insensitive to potentially confounding influences like ontogeny and dimorphism in the bracketing species, it is inferred to also be robust to these influences in other, more data deficient members of the clade.Inferences from the phylogenetic bracket can be applied to a sample even when the number of species in the sample is unknown.However, use of the phylogenetic bracket makes assumptions about the applicability of trait relationships from one taxon to another, including assumptions of little phylogenetic change in trait relationships that may not be valid over large spans of evolutionary change.It is important to test the assumption of phylogenetic conservatism through phylogenetic comparative methods, because some traits are remarkably labile and do not meet this assumption (Blomberg et al. 2003).
Hyracoid molars represent a biologically realistic, unsolved challenge of distinguishing morphologically similar serial homologues that is suitable for this kind of approach (Pickford 1994;Pickford et al. 1994;Tsujikawa and Pickford 2006;Barrow 2011).Other examples of this problem include the molars of multiple clades of herbivorous mammals or vertebrate clades with weakly regionalized vertebral columns (Fortelius 1985;Polly and Head 2004;Head and Polly 2015).Results of prior morphometric studies suggest that individual hyracoid molar tooth loci can be discriminated (Barrow et al. 2008;Barrow 2011), but to date these assessments have limited application, lacking published guidance that can be applied to other studies.
We focus our efforts in this case study on distinguishing first from second molars.In many taxa, the first two lower molars are the most challenging cheek teeth to identify in isolation because their number and general arrangement of features is identical.At best, they differ in subtle proportions and relative sizes.The crowns of lower third molars (m3) are identifiable by a distinctively enlarged hypoconulid (Rasmussen and Simons 1988;Barrow 2011;Asher et al. 2017).However, in hyracoids an enlarged m3 hypoconulid feature is not universally shared among all sampled taxa, and therefore in certain cases third molars can also pose a challenge (Asher et al. 2017).Upper first and second molars present a similar challenge (Pickford et al. 1994).The only proposed diagnostic feature that might be shared among all hyracoids has been the relative size of each tooth in the toothrow (Barrow 2011), a feature known to overlap between loci and therefore still leave the position of some isolated teeth ambiguous (Pickford 1994).In this state of knowledge, isolated teeth of some hyracoids remain ambiguously identified, especially those that could be either first or second molars (Pickford 1994;Pickford et al. 1994).
Hyracoids also represent the limitations of a straightforward, extant phylogenetic bracket.Extant species represent only a narrow range of the phylogenetic and morphological diversity that has existed in hyracoids (Fig. 1;Seiffert 2007;Cooper et al. 2014).Most species of hyracoids are extinct and represented only by fossils.Extant species, such as Procavia capensis, have multiple derived features, including high crowns and highly reduced m3 hypoconulids, compared to most of the extinct taxa to which we wanted to apply results (Fig. 1; Asher et al. 2017).The closest relatives of Hyracoidea, Sirenia, and Proboscidea, which might help form an extant phylogenetic bracket, also have unsuitably highly derived dental morphology (Stanhope et al. 1998).In short, one reason that tooth loci of extinct species of hyracoid are difficult to diagnose is that a reasonable extant phylogenetic bracket for inferring their likely diagnostic traits does not exist.
However, a number of extinct hyracoids are known from associated dentitions, and present an opportunity to generate a suitable phylogenetic bracket that could be applied to other, extinct hyracoid taxa known from isolated teeth.A focus on molars also illustrates one strength of using a phylogenetic bracket to limit potentially interacting sources of variation.Molar crown morphology is often highly heritable, and does not change after eruption during ontogeny, other than change due to tooth wear (Bader 1965;Ungar and Williamson 2000;Polly and Mock 2017).Mammalian molar teeth are not generally sexually dimorphic in shape, although they may be size dimorphic in species that are generally size dimorphic (Wood et al. 1991;Uchida 1998;Miller et al. 2009).Within Hyracoidea specifically, extant and extinct species are weakly sexually dimorphic in cranial size but not in molar shape (Meyer 1973;Yom-Tov 1993).Mandibles and incisors are shape dimorphic, but molars are not known to be (Whitworth 1954;Meyer 1973).
We follow three main steps in evaluating identification criteria and illustrating how they can be used in future studies.First, we statistically test whether the proposed criteria render molar loci distinguishable.Second, we use phylogenetic comparative methods to evaluate whether or not a phylogenetic bracket can be applied to these criteria.Third, we illustrate a hypothetical application of the criteria to an example sample of fossils.

Material and methods
Sampling.-Oursampling made use of publically available data regarding hyrax molars (Tables 1, 2).We searched the repository MorphoSource.org(Boyer et al. 2016) for avail-Fig.1. Phylogenetic tree and tooth size distribution in hyracoids (topology from Cooper et al. 2014).Taxa are time-scaled along the x-axis of the tree to reflect fossil occurrences based on the literature, with branches rescaled between these tip dates and a root age estimated at 70.1 million years.Taxa in bold text were included in analyses.Minimum monophyletic clade including taxa in bold indicates the range of the phylogenetic bracket applied for both length and width measures (base of clade indicated by black star).Minimum monophyletic clade for length measures from the literature is indicated by a white star at the base of the clade.Shapes to the right of tips indicate whether there is a significant fit with a model of ascending (increasing) tooth size down the molar row.Abbreviations: M, upper molars; m, lower molars.able micro-CT (µCT) scans of specimens of Hyracoidea with associated teeth, through which molar position could be unambiguously determined (Asher et al. 2017).These specimens consisted of the extant rock hyrax, Procavia cap-ensis, and Eocene-Oligocene specimens from the Fayum, Egypt.The sample of P. capensis includes specimens from Cameroon, Ethiopia, Kenya, South Africa, Sudan, Uganda, and unknown collecting localities.In some cases not all molars in the sequence were preserved, had erupted or remained undamaged enough for a specific type of measurement, resulting in uneven sample sizes of species represented per locus (Table 1).In cases where both the left and right form of a tooth was present in a specimen, we sampled both and took the average of any downstream quantitative measures to reduce the impact of measurement error (Yezerinac et al. 1992;Vitek et al. 2017).We extracted 3D surfaces of each specimen from a µCT scan image stack using Avizo 9.0 software, and exported them in .plyformat (FEI Visualization Science Group, Berlin).Directly measured trait data were collected in MeshLab (Cignoni et al. 2008).Specimen numbers are listed in SOM 1 (Supplementary Online Material available at http://app.pan.pl/SOM/app69-Vitek_Princehouse_SOM.pdf, also reposited on Dryad at DOI: 10.5061/dryad.n2z34tn40).
In addition, we searched the published literature for reported measurements of fossil hyracoid molars.We hypothesized that the commonly reported length and width measurements might be sufficient to evaluate at least some potentially diagnostic traits (Matsumoto 1921(Matsumoto , 1926;;Teilhard de Chardin and Licent 1936;Whitworth 1954;Meyer 1973;Sudre 1979;Rasmussen and Simons 1988, 1991, 2000;Court and Mahboubi 1993;Pickford 1994Pickford , 2005Pickford , 2009;;Rasmussen et al. 1996;Tsujikawa and Pickford 2006;Pickford et al. 2008;Rasmussen and Gutierrez 2009;Barrow et al. 2010Barrow et al. , 2012;;Tabuce 2016;Pickford and Senut 2018).We excluded measurements of specimens assigned to species that were subsequently considered junior synonyms of multiple other species, which meant that species assignment of any particular specimen was unclear (Rasmussen and Gutierrez 2010).Tables of collected measurements are provided in SOM 2.
Traits were each evaluated for their utility as diagnostic indicators of tooth position in the context of intraspecific variation where possible.Ideal sample size was unclear for a study meant to apply to multiple species but focused on between-locus variation.Prior work has shown that for measurements such as lengths and areas, average values can adequately characterize between-locus relationships within a population, and as few as two complete toothrows can accurately represent the mean of between-locus values (Vitek et al. 2020).Prior pilot work supports the use of 1-3 specimens as representative of a species in studies with between-species applications (Gutzwiller and Hunter 2015).However, other studies focused on within-species variation required sample sizes >5 (Polly 2003).Given this uncertainty, we did not impose sample size limitations on taxa.Future work could more directly test the relationship between optimum sample size and within-species levels of variation for similar studies.
Trait choice.-Wedeveloped a list of potentially locusdiagnostic traits based on a review of hyracoid-specific differences in molar position size and shape, as well as differences documented more broadly in Eutheria.Increase in Table 1.Sample sizes of hyracoid molars from MorphoSource.orgincluded in analyses.

Species m1 m2 m3 M1 M2 M3
Table 2. Samples sizes of length and width measurements of hyracoid molars recovered from the published literature and this study.
relative tooth size was previously proposed to differentiate molar loci of hyracoids (Seiffert 2007;Barrow 2011).To test hypotheses that some aspect of relative crown size differentiated loci, we collected length data from each tooth.
From those measurements we calculated a ratio of tooth length relative to the length of the first molar (m1 or M1) in the tooth row.A second proposed trait from the literature is that the width of the trigonid decreases down the lower molar row in many, though not all, eutherians (Butler 1939;Novacek et al. 1985;Kondo et al. 1994).We wanted to test the utility of this trait, but did not want to develop measurements that could only be observed in pristine, unworn specimens because such measurements would have limited applicability.We therefore operationalized relative trigonid width as trigonid base width divided by talonid base width (Fig. 2).
Variation in traits between loci of upper and lower molar rows covaries, providing an additional potential line of evidence for choosing diagnostic traits (Butler 1937;Gómez-Robles and Polly 2012).For example, the upper molar lophs measured here occlude with lophids that intersect with points of maximum width of the trigonid and talonid on lower molars (Janis 1979;Gheerbrant et al. 2016).These lophs and lophids covary in shape because of their functional, occluding relationship (Marshall and Butler 1966).If trigonid width were potentially diagnostic for lower molar positions, the occluding metaloph width might be diagnostic for upper molar positions.To better understand what might be diagnostic about changing upper molar proportions, we tested three potential traits: the ratios of width of the crown at paraloph level relative to crown length, width of the crown at metaloph level relative to crown length, and the length of each tooth relative to M1 length in the same toothrow.To measure the lengths of these lophs while trying to avoid the influence of tooth wear changing the apparent position of cusp tips on the crown, lophs were measured in terms of their maximum transverse length, or the distance from the lingual border of one cusp to the lingual border of its buccolingual pair.
Measurement and analysis.-First,we tested for significant difference in trait values between tooth loci using analysis of variance (ANOVA).We included a term for species along with tooth locus in the model, as well as an interaction effect between the two to test the hypothesis that differences between loci may themselves evolve, resulting in species-specific differences (or lack thereof) between tooth loci.
A common next step would be a canonical variates analysis (CVA) or similar classification test (Bookstein 2018).However, in this study such an approach applied to the entire sample as a single unit would be inappropriate because of the possibility that the trait value relationships between loci evolves.CVA models a constant, unchanging relationship between loci regardless of species identity.Evolution resulting in species-specific linear relationships could not be accurately captured in a single linear model.In three species for which standard deviations could be estimated for all three tooth loci for all measurements, we conducted species-specific CVA using the "MASS" package (Venables and Ripley 2002) to test the utility of these metrics for species-specific locus classification.Input variables were length of each tooth, which served as a measure of size, and a ratio describing relative width of each tooth.For lower molars, the ratio was relative trigonid width.For upper molars, the ratio was width at the paraloph divided by width at the metaloph.Ratios were preferable to absolute values because the ratios remove information about size, helping prevent the results from being the consequence of size alone.Such a result would not fulfill the goal because it has already been shown that size is not a sufficient source of evidence for tooth position (Pickford 1994).Leave-one-out cross-validation was used to as an assessment of the model's ability to accurately classify loci within species.
A more complex classification analysis of the entire sample, like classification and regression trees (CART), and other various machine learning or classification analyses were also not ideal in this case, despite their statistical power (Kuhn and Johnson 2013).They require complete enumeration of possible species in the sample, which is not known in cases where locus identification is a step in determining number of species.An approach that relies on knowing this information would not have achieved our goal of developing criteria that could be used in the absence of knowledge about species identity.
Instead, we assessed the reliability of a more conservative set of relationships, given the potential for species-specific differences in exact degree of difference between tooth locus.First, we assessed the degree to which all species followed a single ascending-descending developmental model based on preliminary visual assessment of the data and knowledge of mammalian tooth development (Polly 2007;Young et al. 2015).These models predict that developmental modules, such as a molar tooth field, share covariance relationships with a racheting effect, even if the exact developmental mechanism is unknown (Kavanagh et al. 2007;Polly 2007).We assessed the degree to which hyracoids share a single overarching developmental pattern by examining consistency in the directionality of trait value differences down the molar row.We calculated the number of taxa that did not share this relationship as a simple proportion out of the total number of species.
Then, we assessed the consistency with which molar loci significantly differed using a particular trait from one taxon.We modeled the distribution of each trait for each tooth position as a normal distribution, lacking information to support use of an alternative distribution (Jaynes and Bretthorst 2003).We used variation in the best-sampled taxon, P. capensis, as a model for standard deviation that we applied to all taxa and tooth positions.We thought it appropriate to apply measures of variation from one taxa to other taxa because in studies of other clades, within-clade variation in variance is relatively low (Gingerich and Schoeninger 1979).We chose the sample of P. capensis as a model because as large a sample size as possible would better represent the true amount of variation in an underlying population.The large geographic scope of sampling of the P. capensis sample makes it more likely to be an overestimate rather than an underestimate of the true variance in other hyracoid taxa.We also considered that an overestimate of variation was a more conservative and better error to make in this case than an underestimate of variation.For each species, we z-transformed the differences between traits among subsequent pairs of loci (first molar versus second molar, second molar versus third molar) by taking the difference between the two per-locus mean values and dividing by the P. capensis standard deviation for the latter molar.Given this model of trait values as a standard normal distribution, we then calculated the probability of finding a larger z-value given the hypothesis that the two locus means were actually drawn from the same distribution.We report those probability values as p-values.The underlying hypothesis being tested is that most of these tests will have significant p-values, and therefore it was appropriate to correct p-values for each batch of species-specific tests for each measurement using a Benjamini-Hochberg false discovery rate test (Benjamini and Hochberg 1995;Bender and Lange 2001).All analyses were conducted in R version 4.3.1 (R Core Team 2015), and copies of scripts as used are included in supplementary data.
Phylogenetic conservatism.-Testingthe hypothesis of phylogenetic conservatism in locus-specific trait values was required before using a phylogenetic bracket to apply values to poorly known species.Two different phylogenetic comparative methods were used to test two slightly different hypotheses.The first, Blomberg et al.'s K is a standard measure of phylogenetic signal in continuous, univariate traits (Blomberg et al. 2003).A value significantly greater than 1 indicates significant phylogenetic retention, a specific form of conservatism in which trait values change less than expected for a Brownian-motion model of trait evolution (Cooper et al. 2010).We estimated Blomberg et al.'s K for ratio values using the phylogenetic relationships between hyracoids recovered by the most recent published phylogenetic analysis (Cooper et al. 2014).This topology does not include all measured hyracoids.An expanded phylogenetic analysis is beyond the scope of this particular study, but is part of ongoing research.We used tip dates listed in SOM 1: table 1 (Matsumoto 1921(Matsumoto , 1926;;Sudre 1979;Drake et al. 1988;Rasmussen and Simons 1988, 1991, 2000;Feibel and Brown 1991;Pickford 1994Pickford , 2009;;Seiffert 2006;Tsujikawa and Pickford 2006;Pickford et al. 2008;Barrow et al. 2010;Rasmussen and Gutierrez 2010;Leakey et al. 2011;Coster et al. 2012;Cote et al. 2018;Walker 2019;Heritage and Seiffert 2022) and a root age of 70.1 million years (Heritage et al. 2021;Heritage and Seiffert 2022) to scale branches and randomly resolved polytomies using the "RRphylo" package (Castiglione et al. 2021).
However, our interest was not in the conservatism of a specific set of trait values, but rather conservatism in the pattern of relationships between loci.That is, we were interested in testing for significant phylogenetic retention in the categorical state of the m2 being significantly longer than the m1, and other similar traits.Two metrics exist for evaluating phylogenetic signal in categorical data.The first, Cooper's D, does not perform well in cases of high trait prevalence, or when a high proportion of taxa all share one character state, which was the case for our observed traits (Fritz and Purvis 2010).The second, Borges et al.'s δ, may be appropriate to the hypothesis, but its properties in cases of phylogenetic retention have not yet been explored to our knowledge (Borges et al. 2019).We decided to explore the properties of Borges et al.'s δ under simulated models of phylogenetic retention to see if it could be used in a statistical test.
In its original development, δ is calculated based on estimates of ancestral states in a phylogenetic tree with a distribution of tip values.The δ value is higher when there is less entropy, or more information, or less uncertainty, in estimates of ancestral states.Possible values of δ have no theoretical upper bound.Instead, the original approach tests for the existence of phylogenetic signal by comparing the observed δ to a distribution of δ estimated from randomly shuffling trait values between tips, simulating a case of no phylogenetic signal (Borges et al. 2019).Such a test would not be informative in differentiating a case of phylogenetic retention from phylogenetic signal fitting a model of Brownian motion.In the case of phylogenetic retention of categorical traits, more tips have the same character states than expected from a hypothesis of trait evolution by Brownian motion.Shuffling a nearly invariant trait among tips would not result in a test of an alternative model.The null hypothesis in this case contains phylogenetic signal, so comparing observed results to a test of no phylogenetic signal also does not provide an adequate test.In the scenario of retention, δ should be high because there is little uncertainty in the ancestral state estimate of an invariant or nearly invariant character, but it is not clear if it will result in a higher value than the range of values produced by phylogenetic signal of evolution following Brownian motion along a topology.
An appropriate test would be comparison of a null hypothesis of trait evolution by Brownian motion versus an alternative hypothesis of trait retention.The two hypotheses should differ in the character states observed, not just the distribution of states across tips.Therefore, we explored the behavior of Borges et al.'s δ (Borges et al. 2019) under this testing model.To do so, we used the empirical topology of Cooper et al. (2014) as described above.On this topology we simulated binary character evolution under a model of Brownian motion 1000 times to generate a distribution of character states and associated Borges et al.'s δ values that fit the null hypothesis (Borges et al. 2019).To provide an indication of the probability that any of our observed trait distributions could have come from this null model, we calculated the number of simulations that (i) produced δ equal to or larger than the observed value, and (ii) had as many or more taxa with the derived state of the binary character compared to the observed value.Each of these proportions is presented as p-values.
To explore the expected behavior of δ in cases of phylogenetic retention, we iteratively made each simulation more and more similar to a case of character invariance.For each simulation, from root to tip, we changed one tip value to the derived character state, then re-calculated Borges et al.'s δ.We repeated this process until all but one tip had the derived character state, which models the maximum amount of phylogenetic retention possible because the statistic can-not be calculated for an invariant character.We compared our observed result to this set of simulations graphically.We conducted all phylogenetic comparative analyses only for the relative length proportions because these values could be calculated from a larger number of tips on the tree.Other proportions were only calculated for 10 or fewer tips, and in that situation δ does not produce reliable estimates (Borges et al. 2019).
Case study.-Wechose the lower molars of Meroehyrax kyongoi from the sites of Losodok (Lothidok) and Nakwai in the Turkana Basin, Kenya in the Turkana Basin, Kenya (Rasmussen and Gutierrez 2009) as a case study for three main reasons.First, they represent a realistic challenge, because the authors noted that without the late discovery of an associated dentition, the assignment of the isolated tooth fossils to position would have been much more tentative (Rasmussen and Gutierrez 2009).Second, these fossils are described in the only hyracoid publication we could find that reported both trigonid and talonid width measures, permitting us to use the full range of diagnostic criteria.Third, they represent realistic examples in the fossil record where some minimum level of variation can be investigated, but sample size is too small for investigation by machine learning or other promising methods (Chicco 2017).We analyzed only the lower molars because the upper molars only had three teeth with complete reported measurements (Rasmussen and Gutierrez 2009).
In Meroehyrax kyongoi, the m3 has a distinctly expanded hypoconulid, making isolated m1s and m2s most difficult to distinguish, like in many other hyracoid species (Rasmussen and Gutierrez 2009).
We applied trait relationships that showed evidence of phylogenetic conservatism to this sample.If specific continuous ratio values showed conservatism, we applied those directly to the sample.If more categorical patterns for indices showed conservatism, we plotted relevant measures in bivariate plots, then visually inspected the plots for clusters of specimens that showed congruence with categorical patterns for both indices.Fig. 3. Distribution of values for a set of univariate, potentially locus-diagnostic traits (A, length vs. m1 length; B, trigonid width vs. talonic width) described in Fig. 2 in lower molars of a range of hyracoid species.

Results
Lower molar locus differentiability.-Thetwo traits, length relative to m1 length and trigonid width relative to talonid width, appear to differentiate tooth loci (Fig. 3) and significantly differ between tooth positions in the di-rectly-measured sample (Table 3).Species also differ significantly in relative length of tooth positions in the larger sample of literature-derived measurements (Table 4).The published measurements were not sufficient to assess potential differences in relative trigonid width for almost all taxa.In both the directly-measured and literature-derived samples, relative length has a significant interaction with species identity (Tables 3, 4).
The standard deviation in both trait values for each tooth position is listed in Table 5. Mean trait values consistently increased down the tooth row in both traits for all species measured in this study (Table 6), but length did not consistently increase for all species with published measurements in the literature (Table 7).Procavia pliocenica and Seggurius amourensis both had smaller reported m2s than m1s.Traits did not always differ significantly between loci  when using modeled standard deviation and a significance threshold of p = 0.05 (Table 6).First and second molars often differed in relative trigonid widths, but second and third molars rarely did.Both pairs of molars frequently, but not universally, differed in relative length.Three species, Procavia capensis, Saghatherium bowni, and Thyrohyrax domorictus, had a minimum sample size needed to estimate a standard deviation for all three loci, and therefore perform a CVA within each species.Visualization of overlap in trait values between loci (Fig. 4) matched different levels of classification accuracy between species (Table 8).All models were significantly more accurate than their associated no-information rate (p < 0.03).The no-information rate is calculated based on the category with the highest proportion of predicted specimens out of the total dataset.For example, the no-information rate for lower molars of Procavia capensis is 0.375, based on the model predicting 9 teeth to be m1 and 9 teeth being predicted to be m2, out of a total of 24 teeth.The model for Procavia capensis had relatively low classification accuracy, consistent with wide overlap in values between loci.In contrast, models for Saghatherium bowni and Thyrohyrax domorictus accurately classified all but one specimen each.
Upper molar locus differentiability.-Of the four traits, three visually differentiate tooth loci and significantly differ between tooth loci: length relative to M1 length, relative metaloph to paraloph width, and metaloph width relative to length (Fig. 5, Table 3).Paraloph width relative to length did not differ significantly between loci in the sample and was not considered further.A larger sample of species drawn from the literature also differ significantly in relative length  of tooth positions (Table 4).Measurements from the literature were not sufficient to assess other potentially diagnostic traits, similar to limitation on lower molar relative trigonid widths.In the larger sample from the literature, relative length has a significant interaction with species identity.Variance in both trait values for each tooth position is listed in Table 5. Mean trait values for relative length and relative width generally, but not universally, increased down the molar row (Tables 7, 9).In six species, Dendorhyrax samueli, Meroehyrax kyongoi, Procavia capensis, Procavia pliocenica, Thyrohyrax domorictus, and Thyrohyrax pygmaeus, the M3 was shorter than the M2.The M2 was consistently significantly longer than the M1 across species, but the M3 was rarely significantly longer than the M2.The width of the metaloph relative to paraloph width generally decreased down the tooth row, but rarely to a significant degree.Metaloph width relative to crown length did not differ significantly between positions for any taxon.
Three species, Procavia capensis, Saghatherium bowni, and Thyrohyrax meyeri, had a minimum sample size needed to estimate a standard deviation for all three loci, and therefore perform a CVA within each species.None of the three models were able to classify specimens more more accurately than the no-information rate (Table 8, Fig. 6).It is possible that these results were due to small sample sizes in Saghatherium bowni and Thyrohyrax meyeri, although sample sizes don't differ much between upper and lower molars of Saghatherium bowni.
Phylogenetic conservatism.-Continuousratio values of relative widths contained phylogenetic signal not significantly different from that expected under Brownian motion, with only one exception (Table 10).The relative widths of the upper third molar showed a signal of phylogenetic retention.Simulations produced a wide range of Borges et al.'s δ values (Fig. 7), similar to the range previously documented for other simulations (Borges et al. 2019).The distribution of δ produced by simulations of Brownian motion appeared to be unimodal, right skewed and spread across a range of character frequencies.As simulations increased in amount of phylogenetic retention, the average value of δ also appeared to increase, but only slightly compared to the range of δ values.The distribution of simulated δ values also become more strongly bimodal as retention increased.Values of δ finally converged on a much narrower range of values when all but one tip had the derived character state.
In comparison to simulations, the state of the m2 significantly longer than m1 did not have significantly higher δ values than expected from Brownian motion (p = 0.061), but did have a proportion of tips with a single state greater than any observed value in simulation (p < 2.2 * 10 -16 ).The state of m3 significantly longer than m2 could not be statistically tested because it was completely invariant across our observed tips, where all third lower molars were significantly longer than second lower molars.The state of the M2 sig-nificantly longer than M1 did not have significantly higher δ values than expected from Brownian motion (p = 0.204), but did have a significantly greater proportion of tips with a single state (p = 0.002).The state of the M3 significantly longer than M2 was similar (p = 0.195, p = 0.004, respectively).
Case study.-Phylogeneticinertia results overall supported the application of a phylogenetic bracket to lower molar proportions of Meroehyrax kyongoi.In this case, we considered it valid to assume that molar length and relative trigonid width increases along the tooth row, even without observations of any associated molars.Prior locus identifications of isolated molars of Meroehyrax kyongoi were also consistent with an overall pattern of increasing relative trigonid widths along with relative lengths (Fig. 8), consistent with statistical analyses of fit of hyracoids to ascending models of tooth length (Table 7).As a case study of how prior analyses might be used to draw inferences about tooth positions of isolated specimens, the following results necessarily include some inferences that would go in a Discussion section in another context.Notably, two associated pairs of specimens match the pattern of directionality expected from other, better known species of hyracoids (KNM-NW 22547A and KNM-NW 22547B, KNM-NW18249 respectively).These three specimens must each be an m1 and m2 because none has the enlarged hypoconulid of an m3 (Rasmussen and Gutierrez 2009).Based on those inferred positions, the only way to designate non-overlapping clusters corresponding to published identifications of m1s and m2s would require a wide distribution of m2 relative trigonid widths.However, if KNM-NW 22554C were re-identified as an m1 then ranges of trait values would match other species of hyracoids more closely.If two specimens, KNM-NW 22549A (a left molar) and KNM-NW 22549G (a right molar), are associated, their proportions relative to one another make it more likely that KNM-NW 22549G is an  m2, rather than an m1 as originally designated (Rasmussen and Gutierrez 2009).Other inferred designations are consistent with published designations.

Discussion
In a prior work, unpublished eigensurface analyses contained significant differences between the shapes of upper molar tooth positions of hyracoids (Barrow et al. 2008;Barrow 2011).Although the goals, methods, and overall results were more focused on differentiating species while accounting for meristic variation, the result that upper molar tooth positions have distinctive morphology contrasts with the results of this study.In this study, we find that traits useful for discriminating between lower molar tooth positions do not discriminate between upper molar positions as well, indicating that either the shape underlying previously published results corresponds to different traits, or that statistical differences in mean shapes is accompanied by wide, overlapping variances that prevent consistent diagnosibility.
The overlap in values between some specimens also match prior observations that tooth loci overlap to some extent in their morphology (Pickford 1994).For the remainder of the discussion, we focus on lower molar tooth positions that could be better discriminated.
As expected, no single set of values proved diagnostic for tooth loci across Hyracoidea.However, the presence of differences and the directionality of differences was largely maintained and significance of those differences between at least one pair of loci was common.The exceptions to directionality of differences were found in 7% of the 30 taxa sampled, depending on how results are interpreted.For molar relative length, Seggeurius amourensis and Procavia pliocenica did not follow an ascending-descending pattern in the lower molar arcade.We suggest that the aberrant m2 value for Seggeurius amourensis reported by Court and Mahboubi (1993) is an error.Visually, the m2 appears intermediate in length between the m1 and m3, contrary to published measurements (Court and Mahboubi 1993: fig. 4).Results for P. pliocenica can be more confidently interpreted biologically.Procavia pliocenica follows a model in which molars become progressively smaller down the molar field rather than larger, but it also has some of the smallest average differences between molar locus sizes of any of the recorded taxa, indicating that the loci differ very little from each other in either direction (Table 7).Procavia is also remarkable within Hyracoidea as having a derived molar morphology lacking clear hypoconulids on its third molars.
The presence and directionality of differences between loci in certain trait values is conserved across most of the taxa sampled.It is those relative differences, as opposed to absolute cut-off values, that we consider diagnostic and useful in future studies.Our hypothesis of phylogenetic retention in the trait of loci following an ascending-descending increase model, but not in the trait of following a specific set of trait values, is supported by phylogenetic comparative methods to the extent that current methods allow.Consistent with results of a significant interaction effect between locus and species in ANOVAs, specific ratio values are not static across the tree but instead show phylogenetic signal.It is to be expected that cut-off values for differentiating loci will differ between taxa, because the exact relationships between loci evolve.Even within genera, cut-off values may be species-specific (Rasmussen and Gutierrez 2009).Unfortunately, the behavior of δ values under models of phylogenetic retention, including the apparently non-normal distribution of results, means that using δ alone to test for significant differences between a phylogenetic signal model and phylogenetic retention model is uninformative.
However, we note two observations from our simulations.First, our observed δ values were consistent with a phylogenetic retention model, even when they could not be statistically differentiated from other models.Second, the proportion of tips with the same trait value was significantly different from our null hypothesis, supporting the phylogenetic retention hypothesis.Although the results of our simulation do not provide a way to use δ to test for phylogenetic retention, we report them here as a basis for others to potentially conduct future work refining use of the statistic and exploring its other properties.
Biologically, this conservatism in hyracoids is notable compared to other mammalian clades such as Rodentia.The two clades are inferred to have existed for roughly the same length of time (Upham et al. 2019), but rodents have explored a much larger region of developmental morphospace (Labonne et al. 2012).Even certain rodentian subfamilies, which appeared more recently than hyracoids, are more variable in their relative molar lengths (Labonne et al. 2012).Considering hyracoids in a broader evolutionary sense, not just their crown-group representatives, still does not contribute the same amount of diversity in relative tooth sizes as is found in other clades.It may be worth exploring molar field conservatism vs. lability more broadly in other mammals.
The consistency of directional relationships between lower molar locus traits and the high classification accuracy of those traits for a majority of species tested supports their use as potential tools to differentiate otherwise indistinguishable tooth positions, particularly the m1 and m2.These results suggest that multiple traits should be used together, rather than relying on a single trait for diagnosis.In new or understudied taxa where specific ratios are not known, future researchers can infer that isolated lower molars that are longer and have narrower talonids relative to other, similar looking molars in the sample may represent more distal loci of the same species.There is some overlap between loci, and therefore we acknowledge that certain isolated teeth with trait values in intermediate ranges may remain impossible to unambiguously identify to a specific position.However, based on our results we expect that the position of many specimens can be resolved.The consistency also supports the utility of phylogenetic bracketing within Hyracoidea.We acknowledge that this use of a phylogenetic bracket to make assumptions about differences between loci within a clade is not necessarily novel.We think it likely that many paleontologists use this reasoning implicitly.However, identifying isolated teeth is a research step conducive to contention because of its uncertainty.For that reason, making assumptions explicit may clarify sources and treatment of uncertainty, improving the reproducibility and making disagreements about identification more productive.
Overall, in the case study of Meroehyrax kyongoi, the inferred positions based on relative lengths and relative trigonid widths are congruent with those that were made using the aid of an additional specimen with all three associated molars (Rasmussen and Gutierrez 2009).This result may not at first appear to advance knowledge because it only partially replicates what was already known, rather than presenting a completely new set of identifications.However, its improvement comes from the observation that consistent results could be obtained even in the absence of a more complete specimen preserving all three associated.It also provides an additional perspective on evidence for identifying the locus position of isolated teeth.In the case study, this additional evidence suggests an alternative tooth position for two specimens.Images of those two specimens have not been published, and with no additional published data, there is not yet a way to choose between the hypothesized positions presented here and the published hypothesis of tooth position.
The case study also allows us to illustrate ways in which our proposed identification traits could be incorporated into a fossil identification workflow.The first step, before trying to identify teeth to a specific molar locus, would be to categorize teeth into broader groups.Specifically, the goal of the first step is to identify whether teeth belong to the molar field at all or to other dental fields, then identify coarse categories of morphotypes.Coarse categories may correspond to genera or higher clades, although there is no reason they must correspond to monophyletic groups.For an example within Hyracoidea, identifying a group of Thryohyrax-or Saghatherium-like molars might be the endpoint of this first round of classification.In the case of M. kyongi, it is important to note that the first step was already completed for us.As an alternative example, had our case study been the two species of Thyrohyrax from Losodok and Nakwai (Rasmussen and Gutierrez 2009), we might have categorized them together without trying to separate into species at this first step.This step is complete when clear, discrete traits like those that can be used in a phylogenetic analysis are no longer useful for further subdividing the sample.The next step would be to take diagnostic measurements, such as tooth lengths and relative trigonid widths, then plot them in a scatterplot like those in this study.Importantly, identification work is not complete simply by inferring tooth locus or additional subdivisions of the dataset from apparent clusters alone.Potential groups of specimens should be re-examined to investigate whether additional traits support the hypothesis that the cluster is a single, coherent biological unit or the alternative hypothesis that specimens should be grouped in other ways.All lines of evidence, both qualitative and quantitative, should be presented as evidence for a final designation of tooth identity.From there, additional work may be conducted to determine if multiple species exist within a coarse morphotype, and then whether all potential species are represented by all tooth positions.It is important to note that this last part of the process is likely to be iterative and non-linear.The process of comparative morphology and identification is inextricable from the process of developing perceptual expertise (Gibson and Gibson 1955).As such, the inherent feedback between the steps of collecting observations about a set of specimens, then using that information to learn to perceive additional observations means that no simple, short set of steps can guarantee a complete solution to identification problems.

Fig. 2 .
Fig. 2. Illustration of measurements taken on lower (m1-m3, A-C) and upper (M1-M3, D-F) teeth in occlusal views to calculate potentially diagnostic traits.Measurements are illustrated on specimen UMZC H5101A, Procavia capensis.Abbreviations: LE, length; MW, width of the crown at metaloph; PW width of the crown at paraloph.

Fig. 4 .
Fig. 4. Overlap in potentially diagnostic trait values for lower molar loci of three example hyracoid taxa (A 1 , B 1 , Procavia capensis; A 2 , B 2 , Saghatherium bowni; A 3 , B 3 , Thyrohyrax domorictus).A. Length vs. proportional frequency, showing how a single trait, length, would be modeled in a univariate discriminant analysis using observed means and variances.Colored sections of the distributions show length values that are within 95% confidence intervals of the means of more than one tooth locus, indicating regions of ambiguous lengths.B. Length vs. relative width, showing scatter plots overlaid on 2D density diagrams showing the distribution of values for individual teeth.

Fig. 5 .
Fig. 5. Distribution of values for a set of univariate, potentially locus-diagnostic traits (A, length vs. M1 length; B, paraloph vs. metaloph; C, metaloph vs. length; D, paraloph vs. length) described in Fig. 2 in upper molars of a range of hyracoid species.

Fig. 6 .
Fig. 6.Overlap in potentially diagnostic trait values for upper molar loci of three example hyracoid taxa (A 1 , B 1 , Procavia capensis; A 2 , B 2 , Saghatherium bowni; A 3 , B 3 , Thyrohyrax meyeri). A. Length vs. proportional frequency, showing how a single trait, length, would be modeled in a univariate discriminant analysis using observed means and variances.Colored sections of the distributions show length values that are within 95% confidence intervals of the means of more than one tooth locus, indicating regions of ambiguous lengths.B. Length vs. relative width, showing scatter plots overlaid on 2D density diagrams showing the distribution of values for individual teeth.

Fig. 8 .
Fig.8.Overlap in potentially diagnostic trait values for a case study of isolated molars of Meroehyrax kyongoi.In contrast to Fig.4, molar locus identifications are based on occupation of space in this scatterplot.Question marks are overlaid over two specimens whose inferred tooth position conflicts with published diagnoses.In parentheses original identification in publication.

Table 3 .
Results of ANOVAs testing differences in univariate metrics between molar loci of hyracoids.Significant p-values are in bold.Abbreviations: df, degrees of freedom; F, F values; p, p-values; SS, sum of squares.

Table 5 .
Standard deviation of potentially locus-diagnostic traits in Procavia capensis.

Table 6 .
Mean locus differences and associated p-values in trait values for lower molars of hyracoids.NA values indicate that at least one necessary tooth locus was not available to measure.Significance values are corrected by a false-discovery rate correction, and those <0.05 are in bold.

Table 7 .
Lengths of second and third molars relative to the length of the first molars from previously published sources.Mean values are given in the first two columns for each arcade (by definition the first molar has a relative length to itself of 1, and is therefore not listed).Modeled significance of those differences are given in the next two columns.Significance values are corrected by a false-discovery rate correction, and those <0.05 are in bold.Relative length values that indicate divergence from an ascending-descending model of tooth sizes, are underlined.Abbreviations: C, fit to an ascending (A, increasing size) or descending (D, decreasing size) cascade model down the molar row, or to neither (N); NA, not applicable because none observed; p, p-value, ?, fit could not be evaluated because of missing data.

Table 8 .
Results of linear discriminant analysis for selected species; p-values assess null hypothesis that the accuracy is less than or equal to the no-information rate.

Table 9 .
Mean locus differences and associated p-values in trait values for upper molars of hyracoids.NA values indicate that at least one necessary tooth locus was not available to measure.Significance values are corrected by a false-discovery rate correction, and those <0.05 are in bold.Underlined values indicate those that do not follow the ascending-descending relationship of the rest of the dataset.

Table 10 .
Results of phylogenetic comparative analysis of Blomberg et al.'s (Blomberg et al. 2003) K; p-values assess likely difference from null hypothesis that the trait evolved under Brownian motion.