
DDBJ

Semantic representation of annotated sequence records provided by the DNA Data Bank of Japan (DDBJ), a member of the International Nucleotide Sequence Database Collaboration (INSDC). DDBJ collects, curates, and distributes nucleotide sequence data submitted by researchers worldwide, and this dataset represents those records in RDF to enable semantic integration and advanced querying.

Dataset specifications

Tags
Genome, Gene, cDNA, Tag sequence (nucleic acid), Polymorphism, Other DNA, RNA, Sequence, Ontology/Terminology/Nomenclature, Others
Provenance Original
Registration Submitted
Data provider
  • National Institute of Genetics
Creator
  • Takatomo Fujisawa (National Institute of Genetics)
  • Toshiaki Katayama (Database Center for Life Science)
  • Yasukazu Nakamura (National Institute of Genetics)
Issued 2026-02-20
Licenses
  • As stated in the International Nucleotide Sequence Database Collaboration Policy (http://www.insdc.org/policy.html)
Version 140
Download https://rdfportal.org/download/ddbj
SPARQL Endpoint https://rdfportal.org/ddbj/sparql
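The endpoint above can also be queried programmatically. The sketch below assumes it accepts standard SPARQL 1.1 Protocol GET requests (query text in the `query` URL parameter) and returns the W3C SPARQL JSON results format when asked for it via the Accept header. `build_sparql_request` is a hypothetical helper name; it only builds the request, so nothing is sent until you call `urlopen` yourself.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL as listed on this page.
ENDPOINT = "https://rdfportal.org/ddbj/sparql"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request.

    Assumption: the endpoint honours the standard `query` parameter and the
    application/sparql-results+json media type.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
    print(req.get_method(), req.full_url)
    # To actually run it (requires network access):
    # import json, urllib.request
    # with urllib.request.urlopen(req) as resp:
    #     results = json.load(resp)
```

GET keeps the whole request in the URL, which is convenient for short queries like the examples below; very long queries may need a POST instead.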

Dataset statistics

Triples 68,495,145,543
Subjects 10,331,097,478
Properties 121
Objects 11,623,356,647
Classes 156

SPARQL example queries

Example 1

#  Retrieve the metadata of the DDBJ entry "CP002459.1".
#  Note the SPARQL endpoint for DDBJ is https://rdfportal.org/ddbj/sparql .

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX insdc:<http://ddbj.nig.ac.jp/ontologies/nucleotide/>

SELECT
  ?predicate ?object
WHERE
{
  VALUES ?entry { <http://identifiers.org/insdc/CP002459.1> }
  VALUES ?predicate { 
    insdc:comment
    insdc:definition
    insdc:division
    insdc:organism
    insdc:sequence_date
    insdc:sequence_version
    insdc:source
    insdc:taxonomy
    insdc:dblink
  }                      
  ?entry rdf:type insdc:Entry. 
  ?entry ?predicate ?object.
}
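Example 1 returns one row per (predicate, object) pair, so a predicate such as insdc:dblink can appear on several rows. A minimal client-side sketch, assuming the results are fetched in the W3C SPARQL JSON results format, groups those rows by predicate. `group_bindings` is a hypothetical helper, and the `sample` dictionary is a hand-written placeholder, not real output from the DDBJ endpoint.

```python
from __future__ import annotations

from collections import defaultdict


def group_bindings(results: dict, key_var: str, value_var: str) -> dict[str, list[str]]:
    """Group the rows of a SPARQL JSON result by one variable.

    Rows where either variable is unbound are skipped, because the JSON
    results format omits unbound variables from a row.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in results["results"]["bindings"]:
        if key_var in row and value_var in row:
            grouped[row[key_var]["value"]].append(row[value_var]["value"])
    return dict(grouped)


# Hand-written placeholder result (NOT real DDBJ data).
INSDC = "http://ddbj.nig.ac.jp/ontologies/nucleotide/"
sample = {
    "head": {"vars": ["predicate", "object"]},
    "results": {
        "bindings": [
            {"predicate": {"type": "uri", "value": INSDC + "dblink"},
             "object": {"type": "uri", "value": "http://example.org/link-a"}},
            {"predicate": {"type": "uri", "value": INSDC + "dblink"},
             "object": {"type": "uri", "value": "http://example.org/link-b"}},
            {"predicate": {"type": "uri", "value": INSDC + "division"},
             "object": {"type": "literal", "value": "placeholder"}},
        ]
    },
}

by_predicate = group_bindings(sample, "predicate", "object")
for predicate, values in by_predicate.items():
    # Print the local name of the predicate next to its values.
    print(predicate.removeprefix(INSDC), values)
```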

Example 2

#  Search for the BioProject "PRJNA244038" and its annotated sequence entries.
#  Note the SPARQL endpoint for DDBJ is https://rdfportal.org/ddbj/sparql .

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX insdc:<http://ddbj.nig.ac.jp/ontologies/nucleotide/>

SELECT
  ?sequence_version ?definition (REPLACE(STR(?topology), "http://ddbj.nig.ac.jp/ontologies/nucleotide/", "") AS ?topology_label)
WHERE
{
  VALUES ?project { <http://identifiers.org/bioproject/PRJNA244038> }
  ?entry insdc:dblink ?project.
  ?entry rdf:type insdc:Entry. 
  ?project rdf:type insdc:BioProject.
  ?entry insdc:sequence_version ?sequence_version.
  ?entry insdc:definition ?definition.
  ?entry insdc:sequence ?sequence.
  ?sequence insdc:topology ?topology.
}
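To run Example 2 for a different BioProject, only the VALUES line has to change. The sketch below is one way to do that safely; `bioproject_entries_query` is a hypothetical helper, and the accession pattern `PRJ` + two letters + digits is an assumption based on accessions such as PRJNA244038. Validating the accession before inserting it into the IRI prevents stray characters such as `>` or `}` from altering the query. The projection uses the parenthesized `(... AS ?topology_label)` form required by standard SPARQL 1.1.

```python
import re

# Assumed BioProject accession shape, e.g. PRJNA244038.
BIOPROJECT_RE = re.compile(r"PRJ[A-Z]{2}\d+")

PREFIXES = """PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX insdc:<http://ddbj.nig.ac.jp/ontologies/nucleotide/>
"""


def bioproject_entries_query(accession: str) -> str:
    """Return the Example 2 query with `accession` in the VALUES block."""
    if not BIOPROJECT_RE.fullmatch(accession):
        raise ValueError(f"not a BioProject accession: {accession!r}")
    # Braces are doubled because this is an f-string.
    return PREFIXES + f"""
SELECT
  ?sequence_version ?definition (REPLACE(STR(?topology), "http://ddbj.nig.ac.jp/ontologies/nucleotide/", "") AS ?topology_label)
WHERE
{{
  VALUES ?project {{ <http://identifiers.org/bioproject/{accession}> }}
  ?entry insdc:dblink ?project.
  ?entry rdf:type insdc:Entry.
  ?project rdf:type insdc:BioProject.
  ?entry insdc:sequence_version ?sequence_version.
  ?entry insdc:definition ?definition.
  ?entry insdc:sequence ?sequence.
  ?sequence insdc:topology ?topology.
}}
"""


if __name__ == "__main__":
    print(bioproject_entries_query("PRJNA244038"))
```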

Example 3

#  Search for the PubMed ID "21441521" and the DDBJ entries that reference it.
#  Note the SPARQL endpoint for DDBJ is https://rdfportal.org/ddbj/sparql .

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX insdc:<http://ddbj.nig.ac.jp/ontologies/nucleotide/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT
  ?sequence_version
WHERE
{
  VALUES ?pubmed { <http://identifiers.org/pubmed/21441521> }
  ?entry dcterms:references ?pubmed.
  ?entry rdf:type insdc:Entry.  
  ?pubmed rdf:type insdc:PubMed.
  ?entry insdc:sequence_version ?sequence_version.
}

Example 4

#  Search for the Taxonomy ID "1148" and its annotated sequence entries.
#  Note the SPARQL endpoint for DDBJ is https://rdfportal.org/ddbj/sparql .

PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo:<http://purl.obolibrary.org/obo/>
PREFIX insdc:<http://ddbj.nig.ac.jp/ontologies/nucleotide/>
PREFIX tax: <http://ddbj.nig.ac.jp/ontologies/taxonomy/>

SELECT
  ?organism ?sequence_version ?definition  ?sequence_date
WHERE
{
  values ?tax_id {tax:1148} 
  ?entry insdc:sequence ?sequence.
  ?sequence obo:RO_0002162 ?taxon.
  ?taxon owl:sameAs ?tax_id.
  ?entry insdc:definition ?definition .
  ?entry insdc:division ?division .
  ?entry insdc:organism ?organism .
  ?entry insdc:sequence_date ?sequence_date .
  ?entry insdc:sequence_version ?sequence_version .
}
ORDER BY ?sequence_date

Example 5

#  Find features that carry a locus_tag qualifier in the DDBJ entry "AP011615.1".
#  Note the SPARQL endpoint for DDBJ is https://rdfportal.org/ddbj/sparql .

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo:<http://purl.obolibrary.org/obo/>
PREFIX insdc:<http://ddbj.nig.ac.jp/ontologies/nucleotide/>
PREFIX faldo:  <http://biohackathon.org/resource/faldo#>

select
  ?locus_tag ?feature_type ?product ?gene_symbol
  (IF(?fstart < ?fend, ?fstart, ?fend) AS ?start)
  (IF(?fstart < ?fend, ?fend, ?fstart) AS ?end)
  (IF(?faldo_type = faldo:ForwardStrandPosition, "+", IF(?faldo_type = faldo:ReverseStrandPosition, "-", ".")) AS ?strand)
where
{ 
  values ?entry {<http://identifiers.org/insdc/AP011615.1>}.
  values ?faldo_type { faldo:ForwardStrandPosition faldo:ReverseStrandPosition }
  ?entry   insdc:sequence ?sequence .
  ?feature obo:BFO_0000050 ?sequence .
  ?sequence rdfs:subClassOf obo:SO_0000001 .
  ?feature rdf:type ?type .
  ?type rdfs:label ?feature_type .
  ?feature insdc:locus_tag ?locus_tag.
  ?feature insdc:product ?product .
  OPTIONAL { ?feature insdc:gene ?gene_symbol .}
  ?feature faldo:location ?faldo .
  ?faldo faldo:begin/rdf:type ?faldo_type .
  ?faldo faldo:begin/faldo:position ?fstart .
  ?faldo faldo:end/faldo:position ?fend .
  ?feature obo:BFO_0000051  ?parent .
}
ORDER BY ?locus_tag
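Example 5 orders each feature's FALDO begin/end positions so that start is never greater than end, and derives the strand from the type of the begin position. If you prefer to fetch the raw ?fstart, ?fend and ?faldo_type values and do this on the client, a minimal sketch mirroring those IF() expressions could look like the following. `normalize_location` is a hypothetical helper; note that SPARQL JSON results deliver positions as strings, so they are converted with int() first.

```python
from __future__ import annotations

# FALDO namespace, as declared by the faldo: prefix in Example 5.
FALDO = "http://biohackathon.org/resource/faldo#"


def normalize_location(begin: int, end: int, begin_type: str) -> tuple[int, int, str]:
    """Mirror Example 5's IF() expressions: ordered (start, end) plus strand."""
    start, stop = (begin, end) if begin < end else (end, begin)
    if begin_type == FALDO + "ForwardStrandPosition":
        strand = "+"
    elif begin_type == FALDO + "ReverseStrandPosition":
        strand = "-"
    else:
        strand = "."  # unknown or unstranded position type
    return start, stop, strand


if __name__ == "__main__":
    # Positions arrive as strings in SPARQL JSON results.
    print(normalize_location(int("900"), int("400"), FALDO + "ReverseStrandPosition"))
```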

Schema diagram

Schema diagram for ddbj