Add Typing to Features#1956
Merged
Merged
Conversation
Type-annotate the whole music21/features/ directory (base, jSymbolic, native, outputFormats): __init__/process return types, attribute and signature annotations, with narrowing guards where typed bodies need them. Move the dimensions=1 default into FeatureExtractor and drop the 122 redundant `self.dimensions = 1` lines from the extractors that kept it. Fix docstring grammar, capitalization, and ending punctuation across the class docstrings and description strings. Behavior notes: - Feature.vector now starts as [] rather than None (Changed in v11). - native.QualityFeature: a single key with a non-major/minor mode now falls through to key analysis instead of writing None into the vector. AI-assisted (Claude).
Make name, description, isSequential, dimensions, discrete, and normalize
class-level attributes on FeatureExtractor with non-None defaults
('', '', True, 1, True, False). FeatureExtractor.__init__ now only sets the
per-instance stream/data/feature. Feature's name/description default to ''
and isSequential/discrete to bool True. DataInstance.classLabel is now str=''
(falsy) rather than str|None, and setClassLabel takes a str.
Hoist each jSymbolic/native extractor's metadata to class attributes and
remove the now-empty __init__ methods (133 of them). Drop values that match
the new defaults; keep the non-default ones. FifthsPitchHistogramFeature's
mapping loop becomes a dict comprehension class attribute; LanguageFeature
keeps __init__ only for its LanguageDetector instance.
Net ~674 fewer lines. mypy/ruff/pylint clean; doctests and unit tests pass.
AI-assisted (Claude).
Define a SecondsMapEntry TypedDict describing the per-element dictionary returned by Stream.secondsMap (offsetSeconds, durationSeconds, endTimeSeconds, element, voiceIndex), and type _getSecondsMap to return list[SecondsMapEntry], building each entry as a typed literal. AI-assisted (Claude).
Type the boolean keyword arguments (includeClassLabel, includeId, concatenateLists, prepareStream) as bool across DataSet, DataInstance, StreamForms, and the OutputFormat classes, plus outputFmt/fp/format. addFeatureExtractors now takes type[FeatureExtractor] or a Collection of them; DataSet's featureExtractors arg is typed likewise. Parameterize the histogram helpers' Counter returns (Counter[str], [float], [int]) and type their pitches arguments as Iterable[pitch.Pitch]. Thread the new SecondsMapEntry through formSecondsMap and formBeatHistogram. In outputFormats, lineBreak defaults to '\n' (str) and the redundant 'if lineBreak is None' guards are removed; _getOutputFormatFromFilePath uses rsplit(maxsplit=1) and a single _getOutputFormat call. AI-assisted (Claude).
Add PEP 695 aliases (StreamOrPath, DataSource, ValueOrFunction) and use them to type addData, addMultipleData, DataInstance.__init__, and setClassLabel. addMultipleData's dataList accepts a Sequence of sources or a MetadataBundle. Typing surfaced two cleanups: addData's local `s` was dead (assigned, never read) so it's removed, and addMultipleData now builds normalized local lists instead of reassigning its parameters to different types. AI-assisted (Claude).
extractorsById discarded the results of featureId.replace('-', '') and
.replace(' ', ''), so hyphen/space variants such as 'p-20' or 'p 21'
silently returned no extractor. Assign the result back so the
normalization actually applies, and add a doctest covering it.
Also remove the unreachable None member from the format-membership list
in DataSet._getOutputFormat, since featureFormat is typed str.
AI-assisted (Claude)
…place no-op Per the project typing convention (no t.Any; use real types or leave untyped): - Add a `ClassValue = str|float|int` alias and thread it through `ValueOrFunction`, `DataInstance._id`/`_classValue`, `getClassValue`, `getUniqueClassValues`, `_dataSetParallelSubprocess`, and `addMultipleData` (whose `classValues` is also loosened from `Sequence[str]` to `Sequence[ValueOrFunction]`, restoring the ability to pass per-item callables). `getClassValue`/`getId` are restructured to use a local so the callable union narrows without t.Any. - Type the public utility functions' parameters: extractorById, extractorsById, vectorById, getIndex, allFeaturesAsList, and formMidiIntervalHistogram. extractorsById now narrows str-vs-iterable with isinstance (equivalent to common.isIterable, which treats str as a scalar) so the types check. - Parameterize bare list/tuple returns and add return types to DataSet.write / OutputFormat.write / the OutputFormat.getHeaderLines overrides. DataInstance.__getitem__ is deliberately left untyped (its "form" return is genuinely heterogeneous). - ARFF getHeaderLines now joins class values via str(v); the previous ','.join(values) raised TypeError on numeric class values. Also fix an unrelated unassigned str.replace no-op in lily/lilyObjects.py (_reprInternal discarded the newline-flattening result). AI-assisted (Claude)
Covers the case where DataSet class values are ints/floats rather than
strings: OutputARFF.getHeaderLines must produce `@ATTRIBUTE class {3,4}`
without raising. Before the str(v) coercion, ','.join(values) raised
`TypeError: sequence item 0: expected str instance, int found`.
AI-assisted (Claude)
Member
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
__init__: self.name = ....to be class variables..stream()to values that were returning StreamIterator but which expectedSteams.